Wednesday, 8 August 2007

MSN Search API in PHP

Here's some code for accessing the MSN Live Search API from PHP. You can get a developer key here by going to Configure Applications, Create and Manage Application IDs. You'll need a Microsoft Passport to get your key.

You'll need the PHP5 SOAP extension enabled (on Windows, make sure extension=php_soap.dll is uncommented in your php.ini).

To run a search on MSN and scrape results, set up a variable called $msnsoapkey with your key in it as a string, then call this function with three parameters:

A string of your query - "site:fish.com" or "seo ranter", for example

How many results you'd like, up to a maximum of 50

The offset for the start of results; 0 means give results from number 1 to $querysize; 100 means from 101 to 100+$querysize.


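/////
// fetches from search.msn.com results for the query $query, using the API
// needs the PHP5 SOAP extension, plus a global $msnsoapkey holding your AppID

function fetchMSNResults($query, $querysize, $offset) {
    global $msnsoapkey;
    static $msnsoap;

    // only generate this WSDL proxy once
    if (!isset($msnsoap)) {
        $msnsoap = new SoapClient('http://soap.search.msn.com/webservices.asmx?wsdl');
    }

    $request = array(
        'Request' => array(
            'AppID' => $msnsoapkey,
            'Query' => $query,
            'CultureInfo' => 'en-US',
            'SafeSearch' => 'Off',
            'Flags' => '',
            'Requests' => array(
                'SourceRequest' => array(
                    'Source' => 'Web',
                    'Offset' => $offset,
                    'Count' => $querysize,   // the API caps this at 50
                    'ResultFields' => 'Url'
                )
            )
        )
    );

    $response = $msnsoap->Search($request);

    // start empty, so a query with no hits returns array() rather than null
    $results = array();

    $found = $response->Response->Responses->SourceResponse->Results;
    if (isset($found->Result)) {
        // PHP's SOAP client hands back a lone object, not an array, when there's exactly one hit
        $hits = is_array($found->Result) ? $found->Result : array($found->Result);
        foreach ($hits as $hit) {
            $results[] = $hit->Url;
        }
    }

    return $results;
}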

It returns an array of strings, each one containing a result URL. Easy! In my code library this is called from a wrapper function, which manipulates $offset and $querysize so that any number of results can be returned, within whatever the MSN API allows; you can figure one out pretty easily if you need to.
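My wrapper isn't published here, but a minimal sketch of one might look like this, assuming the 50-results-per-call ceiling described above. `fetchAllResults` is a hypothetical name, and it takes the fetcher as a callable so it can be exercised without hitting the API.

```php
<?php
// Hypothetical wrapper: pages through any fetcher with the signature
// fetch($query, $querysize, $offset) and stitches the chunks together.
function fetchAllResults($fetch, $query, $total, $perCall = 50) {
    $results = array();
    $offset = 0;
    while (count($results) < $total) {
        // ask only for what is still missing, capped at the per-call limit
        $want = min($perCall, $total - count($results));
        $chunk = call_user_func($fetch, $query, $want, $offset);
        if (empty($chunk)) {
            break; // the engine has run out of results
        }
        $results = array_merge($results, $chunk);
        $offset += count($chunk);
    }
    return array_slice($results, 0, $total);
}

// With the real function: fetchAllResults('fetchMSNResults', 'seo ranter', 120);
```

Advancing the offset by the number of URLs actually returned, rather than the number requested, keeps the paging correct if the engine ever hands back a short page.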

BuyBlogComments.com review

BuyBlogComments.com is a service launched in July 2007 and run by Jon Waraas, where you can pay to have someone leave relevant comments on a bunch of blogs in your industry, with anchor text and a URL of your choosing. This gives a mild SEO benefit (despite most blog comments having rel=nofollow applied) and a bit of traffic toward your site. Many people hire freelancers from the likes of Rent-A-Coder or Get-A-Freelancer to do this kind of work; some call it black hat, though in my opinion that's a little rude; black hat's often much cleverer.

I tried out this service, with the smallest package of 100 comments for $19.99.

When you order, the post-purchase page says you'll be contacted within 12-24 hours. I made my purchase shortly after the service launched; I'm not sure why, but I heard nothing at all for about four days, at which point I got a little concerned and emailed Jon, who reassured me that they hadn't forgotten my order, and was appropriately apologetic.

Lo and behold, a few more days later, a spreadsheet arrived with 100 URLs to blog comments. Fantastic, I thought; that was easy and quick!

So, I decided to review this list, to see how BuyBlogComments.com had done. I was a little disappointed by the details; a bit more attention to quality and thoroughness would've saved a few mistakes. I waited one week after receiving the URL list before checking it, to give bloggers time to approve comments.

One thing BuyBlogComments.com really pushed was the relevance of its blog comments - "What BuyBlogComments.Com does is pay people to write quality blog comments on quality blogs" - yet I found really generic comments, such as "dana: truly valuable post" (that's the entire comment! see mikes5starstore.typepad.com). The reporting was also really unclear in places; sometimes the links led to blank pages, or just to the blog homepage, which makes the comment hard to find when there are a lot of posts. For example, I was given this link - see if you can guess which post the comment's on:

http://sheabutterblog.blogspot.com/ (shea butter)

The entire blog is about "shea butter", with hundreds of posts!

At other times, there would be a link to a blog that's utterly irrelevant. I tested the system out with a fashion/apparel themed site; I got left this comment. The topic's not relevant (mattresses are not fashionable, despite what your local bed retailer thinks!), the grammar's bad, the spelling's actually off - how people manage to misspell commercial copywriting these days is beyond me - and the comment itself has had no thought applied to it.

BuyBlogComments.com also says that "They are hand written by excellent english (sic) speaking people in America and Canada. We have a really great system here at Buy Blog Comments." That may be true, but I don't see any claim to having native English speakers, which might explain the low price. Of course, that's not a problem until grammar and spelling mistakes appear, which they do. I'd much rather have a guarantee that comments will be grammatically correct than a guarantee of where the posters live.

Another thing I noticed was that some comment URLs were actually duplicated within the list. That's right, I only received 97 unique URLs. I contacted Jon about this (as well as a few other mistakes) and he didn't provide any kind of answer or compensation.

Anyway, my biggest problem was that 33 of the 97 comments simply were not there. There's a section in the legal part of the BuyBlogComments.com site, which Jon kindly referred me to, saying they only guarantee that comments are posted, not that they're approved. Doesn't this sound like a loophole to allow them to just give you 100 blog post URLs sans comment and say "well, we submitted one!"? I'm quite stunned at the failure rate and how badly blogs were chosen; some were obviously dead/slow blogs that were pointless to comment on. A paid service should, at the very least, review these factors before commenting, to give each comment a decent chance of approval:

Comment should contribute to the post
Post should have been made within the past week
Other comments on other posts should have been approved recently
Post shouldn't say "I am going on holiday for a while"
Comment should be grammatically correct and well-spelled
Post should already have some approved comments

There were definite cases where one or more of these suggestions had been completely ignored. I voiced my opinions to Jon, and after hearing nothing for a week, kindly prodded him; he referred me to their legal page with reference to unapproved comments, and offered nothing more in the case of duplicate / irrelevant comments.

Type	Count	Notes
Dupes	3	One comment's URL listed twice as a "successful" comment
No URL	1	Comments where there's no link
No comment	28	Comment has not been approved / never submitted
Not relevant	3	Comments on irrelevant posts / blogs
OK	56	Successfully placed comments
Blank page	1	The comment URL goes to a blank page - broken site?
Ambiguous URL	8	Couldn't find the comment; the URL given was for a large, busy blog instead of the individual post
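Tallying the table makes the yield clearer. Here's a throwaway PHP sketch using only the figures above and the two package prices quoted in this review ($19.99 when I bought, $24.99 now):

```php
<?php
// Figures copied from the results table above
$tally = array(
    'Dupes' => 3, 'No URL' => 1, 'No comment' => 28, 'Not relevant' => 3,
    'OK' => 56, 'Blank page' => 1, 'Ambiguous URL' => 8,
);

$listed = array_sum($tally);           // URLs in the report, duplicates included
$unique = $listed - $tally['Dupes'];   // distinct URLs actually reported
$placed = $tally['OK'];                // comments that were live and relevant

printf('Failure rate: %d%%' . PHP_EOL, 100 * ($listed - $placed) / $listed);

// Cost per successfully placed comment at the old and new package prices
foreach (array(19.99, 24.99) as $price) {
    printf('$%.2f package: about %.0f cents per placed comment' . PHP_EOL, $price, 100 * $price / $placed);
}
```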

I haven't reported on really short comments that bore no relevance; there were too many to account for, and BuyBlogComments.com already seemed to be doing so badly that I gave up trying to get this point corrected. They make up about 30%-40% of the comments that actually got published.

As for results: the site that was used for testing had an unusually great day for sales when the report arrived, easily making back the outlay on comments. However, we've seen fewer than 20 clicks to date from all the blog comments in total, over the four weeks since placing the order. I'm not sure if the sales have any link to the comments; the referral URLs were from natural search and Wikipedia. There has been a small but noticeable boost for the term used for the blog comments, so that's a mild plus in favour of the service, too.

So, in summary:

Pros

Saves getting one's hands dirty with a freelancer
Delivery in under ten days
Cordial and professional support
Was cheap ($19.99 for "100" comments)
Possibly great for short term sales
Some SEO benefit
Site is easy to use

Cons

A 44% failure rate (only 56 out of 100 comments ever made it)
Comments are badly written (grammar and spelling mistakes)
Comments are often not relevant ("great post!" - a bot could apply this kind of content at a much lower rate; we're paying to avoid such a low quality level)
Not always applied to relevant blogs
Contact can be slow (took over a week from payment to delivery)
Some duplicate reporting means you won't even get as many comment submissions as you pay for (only 97 unique URLs were in my report, when there should've been 100)
Price has gone up ($24.99 for "100" comments; with the same success rate as I had, that's about 45 cents per successful comment)
Next to no traffic (would you click through to see the site behind comments this inconsequential?)

I won't be returning; the yield simply isn't high enough, and quality's not as expected.

Sunday, 22 July 2007

How to make successful landing pages

I'm going to do something I generally despise here, by posting about someone else's article. It's pretty much a bookmark for me, too. If you don't want to read the article, that's fine, I'll relieve your visitors of their cash instead.

The article is 11 ways to improve landing pages by Michael Nguyen. I keep coming back to this article year on year; it's a clear and concise guide on how to create a working landing page. Thanks Michael, you saved my memory some extra weight, and placed it in my wallet instead. If you're reading this now, you've come too far - click the link (shock) and add it to your browser bookmarks.

Seriously, that's it. I've no original content in this post. Stop reading.

Thursday, 19 July 2007

Pownce invitations

I don't want my Pownce invitations. If you do, link to me and leave a comment with your email address. There are some left right now; I'll edit this post when I've run out.

Optimaliser.no

Poor old Netty - they sell undertøy (lingerie / underwear). Scroll to the bottom of their front page; see a weird thing in the bottom right? Let your mouse cursor hover over it. Their "SEO" company has placed links to all their other clients on the frontpage! I say, sue the buggers. Don't use Optimaliser.no! They're selfish scum, abusing their clients. See video below for the full insult they've levied on their unfortunate customers.

If you don't get it: they've placed links on their clients' homepages that detract from each client's optimal setup, helping Optimaliser.no's business instead.

Wednesday, 18 July 2007

SEO Theory

SEO Theory is run by a man who actually knows what he's talking about, instead of someone who loiters in forums and regurgitates hype. The latter forms around 90% of SEO blogs that I've found. There are plenty of published, authoritative resources around that Actually Tell You How Search Engines Work. There's no mystery there, and Michael Martinez is rather keen to rub this point in, with a little bile if necessary. You'll even find long debunkings, exhaustive explanations, and guides on how to do things properly, which saves me having to type all this stuff. You might even find something related to (shock horror) actual proper Internet Marketing there. Go Michael! Now read his blog, you lucky, lucky people.

.htm

I love how people try to get file extensions into three letters these days, but really, why bother? Semantic value is everything: the extension .html tells you that the file contains something that someone, somewhere believes can be vaguely shoehorned into the category of "HTML code"; .lasso tells you it might be a Lasso file, .pl Perl or Prolog. I can't think of any reason not to use .html, apart from, of course, these cases:

You haven't had your morning coffee yet.

You're using PC/MS/DR/4/whatever - DOS, which only supports 8.3 format filenames. Your webserving software also uses this. You probably use token ring cards in your network, which took four painstaking weeks to set up with TCP/IP. There are no known webservers for your OS, so you wrote one in Turbo Pascal.

You don't think there are enough TLAs in the world, and would like to create more. No, that doesn't mean Text Link Ads.

You code a special way, using HyperText Markup. It's not a Language (seriously, the way you're going at it, it really isn't).

Your 'L' key doesn't work (not really an excuse).

The idiot who set up / mangled your webserver config doesn't believe in HTML, or leaving home, or having anyone but his mother do his laundry.

Seriously, there's no need to bother with three letter extensions. You're not using DOS / Novell / whatever, your clients aren't, your webserver (if it's a good one running on a proper OS) has never even recognised extensions and so definitely doesn't care about their lengths. If you're stuck with them, there's a spanner in your works.

Rankmoron

Take a look at this atrocity of a site - Rankmon.

What on earth is it? It looks like three listings of sites that have a search function, ranked by some completely arbitrary system (which we are apparently not at all privy to), allowing us to determine things like "top organic search sites". Yahoo! and Google must be pretty disappointed at their placings of 4th and 8th, despite being specialists in organic search for the past decade. It must be degrading indeed to have put all this effort in, and still be ranked lower than Tripadvisor or eBay as an organic search engine.

However, I think their pain must pale in comparison to the top dog - Wikipedia, with their number one spot; what was it that guy from Wikipedia said? "... its readers regard Wikipedia as a search engine. It probably comes as no surprise that my spine stiffens at that concept ...".

Further, we have this spurious "growth" statistic. What on earth does it mean? No idea, it's not explained! Perhaps we can guess by sorting the list by "growth" - ah no, the site operators were too lazy to code that. Or even to run the thing through a spell checker, for that matter.

On pane two, "Fastest growing organic search sites in world", we have even greater delights. This is the part where we find out who's going to blow Google's search back into the 20th Century. Who's number 6 on the list?

Rankmon say - watch out for plumbing supply

Plumbingsupply.com! Who'd have thought it! Google, you'd better watch out. (Yes, I know it's for visibility, but just look at the page title...)

We can even dig deeper; try clicking on any of the sites you see, and you'll see where they rank for what keywords, and how important this is considered.

http://www.rankmon.com/site/index/dressupgirl.net

For example, dressupgirl.net's number 1 contributor to its high place on Rankmoron is "second for girls games in MSN Live". Wow, amazing! Second place! Looking down the list, we can see other lesser achievements. What's this one I see? "first for girl games in Google". Well, that's pretty good, but obviously not as good as being Second (oh yeah!) for such a popular, prestigious and market-share hogging engine as MSN Live Search. How on earth could you reach such an awful conclusion, and publish it on the web? I hope this stuff's still in beta, or alpha, or being coded in Notepad between classes...

Whoa there, though. Maybe I went a bit far. I think I can see a potentially useful tool here - "Competitors"! Let's take a look, and see who else is competing for little girls' dress-up games. Perhaps if we can nail these guys, dressupgirl.net can rocket all the way to number 1. On MSN. So who's there?

http://www.rankmon.com/site/competitors/dressupgirl.net

Oh man, they're screwed! Wikipedia, Amazon, About.com, Yahoo!, NeXTaG, eBay. I suppose none of those guys are dedicated sources to dressing up games, and certainly should be easy to knock out of the water. But wait just a god damned minute - haven't we seen this list before?

BS exhibit (a)

So, congrats Rankmon. Your algorithm sucks.

Wednesday, 11 July 2007

Good linkbait, Bad linkbait

Good linkbait is easy to find. Bad linkbait isn't, because nobody links to it. Get it?

Good linkbait - an iPhone in a blender (how sensationalist can you get?)

Bad linkbait - anything involving Paris Hilton. Seriously, it's so bad, it gets a JavaScript link and a nofollow. uurrghh

Tuesday, 10 July 2007

Pagerank

Pagerank is not important.

Pagerank is not important.

Ignore it now!

Why isn't it important, I hear you cry? Easy.

Public (toolbar) pagerank is utterly useless, unless you're selling links. Ignore it if you're buying. There isn't much you can do as a result of your pagerank - it's not something that generates actions for you. It's a scaling of a massive log-distributed curve into a 0-10 integer scale, plus an N/A value, and up to 4 months out of date at any one time. What on earth are you going to do with that?
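To see how coarse that squashing is, here's a sketch of mapping a log-distributed score onto 0-10. Google never published the real toolbar mapping; the base of 8 and the minimum score below are pure assumptions, picked only to illustrate the shape, and `toolbarPR` is a hypothetical name.

```php
<?php
// HYPOTHETICAL mapping: the real toolbar formula is unpublished.
// Assumes each extra toolbar point needs $base times more raw score than the last.
function toolbarPR($rawScore, $base = 8.0, $minScore = 1.0) {
    if ($rawScore < $minScore) {
        return 0;
    }
    $pr = (int) floor(log($rawScore / $minScore, $base));
    return min(10, $pr); // anything above the top bucket still shows as 10
}
```

Under these assumptions, doubling a page's raw score moves the underlying value by only a third of a point, so the integer you see often doesn't change at all.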

Although pagerank might buoy your rankings in Google, you'll never be able to define the precise effect of the pagerank of a single page.

If you do raise your pagerank, the main effect will be that you simply have higher pagerank, not necessarily higher rankings, or traffic, or money. Just a higher number in a toolbar.

It's far too easy for search engines to pick up on bought links, unless they're very well placed. Why buy a link in an effort to raise pagerank?

Google listings (shock) aren't ordered by pagerank.

Pagerank's not hugely important. I'd say, roughly 30% of your overall ranking score, based on nothing in particular. After all, it's just an openly disclosed algorithm for valuing links, inbound and outbound, based on a method for ranking academic papers; it doesn't see spam, or topics.

Please ignore forum morons who say that a PR7 link will give them PR+1. Or forum morons who ask if two PR6 links will give them PR4. These questions indicate a need for immediate education - send them to the formal description / specification of pagerank. It's published by Google themselves, so they can just shut right up.
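For reference, the formula in the original paper is PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A, C(T) is the number of links going out of T, and d is a damping factor, usually set to 0.85. Below is a minimal power-iteration sketch of that formula over a made-up three-page graph. Dangling pages (no outlinks) are simply skipped here, which is a simplification, and every link target must also appear as a key in the graph.

```php
<?php
// Toy PageRank by power iteration, using the non-normalised form
// PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.
// $links maps each page to the pages it links out to.
function pagerank(array $links, $d = 0.85, $iterations = 50) {
    $pages = array_keys($links);
    $pr = array_fill_keys($pages, 1.0);
    for ($i = 0; $i < $iterations; $i++) {
        $next = array_fill_keys($pages, 1 - $d);
        foreach ($links as $from => $targets) {
            if (count($targets) == 0) {
                continue; // dangling page: its score is dropped in this sketch
            }
            // each outbound link passes on an equal share of the page's score
            $share = $pr[$from] / count($targets);
            foreach ($targets as $to) {
                $next[$to] += $d * $share;
            }
        }
        $pr = $next;
    }
    return $pr;
}

// Made-up example graph
$graph = array(
    'home'    => array('about', 'product'),
    'about'   => array('home'),
    'product' => array('home', 'about'),
);
$scores = pagerank($graph);
print_r($scores);
```

Note there's no "PR7 + PR7 = PR8" arithmetic anywhere in it: a link passes on a damped share of the linking page's score, divided by that page's outbound link count.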

Thanks, that's all for today.

Friday, 6 July 2007

Custom PC

Take a look at this site.

http://www.kustompcs.co.uk/

It's really badly set up for SEO. Really, really badly. For example:

Who on earth searches for "kustom pc" instead of "custom pc"? Branding based on a misspelling is a big mistake.

If somebody does actually search for a "kustom" custom PC, surely they'd be after one PC, not many! So why would you buy kustompcs.co.uk? The targeted term here is "custom pc".

You can buy a custom PC from kustompcs.com or kustompcs.co.uk - great! The company's bought both domains. So why have they chosen to duplicate content between the sites, instead of setting up a permanent (301) redirect from one to the other? Awful.
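If the duplicate domain runs PHP, a permanent redirect takes only a few lines. Here's a minimal sketch, assuming www.kustompcs.com is picked as the canonical host (the .co.uk would do just as well). `canonicalRedirect` is a hypothetical helper.

```php
<?php
// Returns the URL to 301 to, or null if the request is already on the canonical host.
// The canonical host below is an assumption for illustration.
function canonicalRedirect($host, $requestUri, $canonicalHost = 'www.kustompcs.com') {
    if (strcasecmp($host, $canonicalHost) == 0) {
        return null;
    }
    // keep the path and query, so deep links land on the matching page
    return 'http://' . $canonicalHost . $requestUri;
}

// At the top of every page on the duplicate site (skipped when run from the CLI):
if (isset($_SERVER['HTTP_HOST'])) {
    $target = canonicalRedirect($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
    if ($target !== null) {
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

Preserving the path matters: redirecting every old URL to the homepage would throw away whatever weight the deep pages had picked up.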

The title tag.. oh god, the title tag. This tag is critical. Once I placed an image-based form I was working on for a client on a test server, and used the name of the campaign for the title of the page. It happened that the form got spidered; when we went to review the progress of the client's site, our marketing guys found that my creative had stolen the number 1 spot for the name of the campaign, blowing the client's site out of the water. And what have these guys done with it? "Kustom PC's"? Not only can they not spell - the grammar's awful - but there aren't /any/ keywords in there! How about "Custom PC parts and builds at kustompcs.com". Easy. Child's play, in fact.

On top of this - no meta description, no h1, it's in tables (disrupting flow and stopping important content coming to the top), the left category nav uses javascript; problems, guys. No wonder you're so lowly ranked for Custom PC ;)

Saturday, 30 June 2007

1400+ PHP Link Directories, catalogued for you

So, in advance of a little scripting, here's a list of over 1400 installations of the PHP Link Directory. I'd like to give more - there are easily 20k out there - but sadly Yahoo! and Google's search APIs don't like queries past result number 1000. In fact, they positively hurl their dummies out of their respective cradles.

The spreadsheet containing the results has the following information:

Submit URL

Cost of submission

Reciprocal link code

Whether or not the site uses a captcha

phpld-20070630.xls

It's spartan, but functional, and certainly open to further use. I was surprised by how many of these directories are wide open; further code is coming.

There's no source with this post as the abomination that created the data was truly awful, and probably still will be next time round. I even spent time in Excel updating individual entries; big to-do list entries include: add homepage pr, make backlink scraping code more accurate, de-dupe by domain and not hostname, add express submission price column, add flag to detect if unique sessions are required for submission, and detect PHPld version.
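De-duping by domain rather than hostname is the fiddly item on that list: www.example.com and dir.example.com should collapse into one entry, but a naive "last two labels" rule breaks on .co.uk. Here's a rough sketch. The tiny suffix list is an assumption for illustration only (a proper job would use the full Public Suffix List), and both function names are hypothetical.

```php
<?php
// Naive registrable-domain guess; the suffix list is a deliberately tiny stand-in.
function registrableDomain($url) {
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $labels = explode('.', $host);
    $twoLevelSuffixes = array('co.uk', 'org.uk', 'com.au'); // assumption, not exhaustive
    $keep = 2;
    if (count($labels) >= 3 && in_array(implode('.', array_slice($labels, -2)), $twoLevelSuffixes)) {
        $keep = 3; // e.g. example.co.uk rather than co.uk
    }
    return implode('.', array_slice($labels, -$keep));
}

// Keep the first submit URL seen for each domain, in original order
function dedupeByDomain(array $urls) {
    $seen = array();
    foreach ($urls as $url) {
        $domain = registrableDomain($url);
        if (!isset($seen[$domain])) {
            $seen[$domain] = $url;
        }
    }
    return array_values($seen);
}
```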

This list's probably very abusable. For example, those lovely chaps at the PHP Link Directory could abuse it to check that everyone's bought a license. In fact, this list should only contain the cheapskates that haven't paid to remove the link to the software creators, but hey, who am I to judge.

Friday, 29 June 2007

Yahoo! Stores - hard coded duplicate content

I read this post on the Yahoo! Stores blog. The Yahoo! Stores blog is there to scratch the surface of online marketing for merchants new to the scene; it's probably great for giving people an introduction to subjects and leads to follow up, but for old dogs there's not a huge amount of new information. It certainly lets us see that Yahoo!'s helping its merchants, and that they're doing well from their help and the amazing Yahoo! store system. Anyway, in this SEO-oriented post, Karl Ribas brought up some valid points, including a little intro to duplicate content:

Duplicate content was a pretty big concern at SMX, as having non-unique content on your website is quickly becoming a bigger and bigger problem for online merchants. ... From a search engine’s point-of-view, their one and only goal is to serve a variety of quality results per query, not multiple versions of the same content

Fantastic advice!

Yahoo! stores have cleverly helped us out here. As we all know, visiting the root URL - / - of a domain should really show the homepage; no redirects, no frames, just a plain and easy HTTP 200 response with some good content. And, to their credit, Yahoo! have managed this millimetre scale hurdle.

Now, we also know that in most circumstances, it's great to have a link to your homepage on every page in your site, right? After all, it's the most important page, and where people like to navigate from - so great to provide a link to in case they get lost.

Yahoo! have cottoned on to this little nugget of wisdom, and kindly added a link named "home" to the homepage of a site on every one of its sub-pages. Well, kind of. In fact, it's a hard-coded link, using the link text "home" (also hard-coded - heaven forbid anyone decides that using all-lower-case looks awful, or would prefer slightly less heterogeneous link text here):

Dear Zack,

I think you might've got your wires crossed. I'd like to change the
small H is my yahoo stores / store editor system, I don't really mind
about yahoo web hosting. There are options to change all the other
tabs, but the name for the "home" page seems kind of elusive, even
though intuitively I expected them to be in the same place. Could you
check and come back to me ?

Hello Leon,

Thank you for contacting us.

It's not possible to change the 'H' in the navigation bar because the
links are hard-coded into the store software.

We apologize for the inconvenience.

This locked-down and widely shown link points to some strange, new page that's mentioned nowhere else in the store - to "/index.html".

<ul id="nav-general"><li><a href="index.html">home</a></li>

"What's this new-fangled index.html?" I hear you cry. "Where's my homepage?". Well, don't worry! Yahoo!'s kindly duplicated your homepage content for you onto this new URL. So search engines can NOT ONLY get your stuff at the root page, as standard, but now you'll find your link weight directly split between links to / - added by you - and links to /index.html - forcibly inserted by Yahoo!.

What do Yahoo! think of this? Can we get it changed?

Hello Leon,

Thank you for writing to Yahoo! Store Support.

Although this feature is not currently available in the Yahoo! Store
software, we do consider your feedback regarding the features you'd like
to see a very important part of how our development team decides which
features to add to the Yahoo! Store software.

We do not currently have an estimated time for if or when this feature
or any other features may be released. However, we do release a regular
newsletter to all of our merchants at the following link:

http://www.insightsforum.com/

You can see previous copies of the newsletters at:

http://store.yahoo.com/vw/merchant-newsletter.html

We appreciate your feedback. We've forwarded your comments to our
development team for review.

We believe this solution should resolve your issue, if it still
persists, please call us at 1-866-800-8092.

Please do not hesitate to reply if you need further assistance.

Regards,

Andre

Thanks Andre! I'm not sure what led you to believe it should resolve my issue; I'm fairly sure you just told me that it wasn't resolvable. Have you tried visiting http://www.insightsforum.com/ ? I'll save you the trouble:

Bad Request (Invalid Hostname)

Well, maybe the archive mentioned has something useful. Let's take a look at the last post:

February 2005-
Note: Insights has switched formats. While you will continue to receive monthly newsletters, all articles are archived on the Insights Forum site rather than a single HTML file.

Thanks Yahoo!. That's pretty good.

Will you stop duplicating my content soon please?
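It's easy to check whether a store serves the same page at both / and /index.html. Here's a quick sketch that compares a hash of each body after collapsing whitespace. The fetch lines need allow_url_fopen, and www.example-store.com is a placeholder, not a real store. Pages with per-request tokens or timestamps won't hash identically even when they're effectively duplicates, so treat a mismatch with some suspicion.

```php
<?php
// True if two HTML bodies are identical once whitespace runs are collapsed.
// Hashing (rather than comparing strings directly) lets you store one short
// fingerprint per URL when checking a whole site.
function sameContent($bodyA, $bodyB) {
    $normalise = function ($html) {
        return trim(preg_replace('/\s+/', ' ', $html));
    };
    return md5($normalise($bodyA)) === md5($normalise($bodyB));
}

// Hypothetical usage against a live store (placeholder hostname):
// $root  = file_get_contents('http://www.example-store.com/');
// $index = file_get_contents('http://www.example-store.com/index.html');
// echo sameContent($root, $index) ? "duplicate\n" : "different\n";
```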

Microsoft AdCenter $50 free clicks

For new users only; a $5 account creation deposit is required. Expires in about 36 hours, so good luck.

http://www.startadcenter.com/MulttipTrav/

Teoma / Ask scraping code

Alas, Teoma's search API is down. If it ever returns, you can find great Teoma Search API documentation. For the meantime, here's code to do scraping for you:


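/////
// fetches URLs from Teoma (Ask) results for the query $query, by scraping
// string $query is the search query
// int $querysize is the number of results needed (at most 10: one page is fetched)
// int $offset says where the results should begin from (put 10 to get results 11-20)

function fetchTeomaResults($query, $querysize, $offset) {

    // Ask shows ten results per page, so convert the offset into a page number
    $page = 1 + intval($offset / 10);
    $requestUrl = 'http://www.ask.com/web?q='.urlencode($query).'&page='.$page;

    // temporarily swap the user agent that file_get_contents() sends
    $oldua = ini_set('user_agent', 'Please bring back http://xml.teoma.com/.');
    $response = file_get_contents($requestUrl);
    ini_set('user_agent', $oldua);

    // network failure or blocked request: return no results rather than warnings
    if ($response === false) {
        return array();
    }

    // result links carry ids like r0_t, r1_t ... in the results markup
    preg_match_all('|<a id="r[0-9]+_t" href="(.+?)"|', $response, $matches);

    $results = array_slice($matches[1], 0, $querysize);

    return $results;
}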

It's dirty, nasty, and many other mean things. For example,

anyone sane wouldn't enable fopen wrappers;

the user agent is a little non-standard;

there's no HTTP From: header;

it's quite possibly against Ask TOS;

$offset should be a multiple of ten, because I can't be bothered writing preference-setting code, and the number of results per page isn't controllable via the URL (as far as I can see)

scraping is never a permanent solution

- and other things. Bring back xml.teoma.com!
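Two of those gripes (the global user agent swap and the missing From: header) can be fixed by passing a stream context to file_get_contents instead of flipping ini settings. Here's a sketch; the From address is a placeholder you'd replace with your own, and `scraperContext` is a hypothetical name. It still relies on URL fopen wrappers being enabled.

```php
<?php
// Per-request HTTP options, rather than changing the global user_agent setting.
function scraperContext($userAgent, $fromAddress) {
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'header'     => 'From: ' . $fromAddress . "\r\n",
            'timeout'    => 10, // seconds
        ),
    ));
}

$ctx = scraperContext('Please bring back http://xml.teoma.com/.', 'you@example.com');
// $response = file_get_contents($requestUrl, false, $ctx);
```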

Use of fetchTeomaResults is usually wrapped up by another function, for accessing SERPs in general and aggregating results. The function signature conforms to that wrapper; otherwise we could just pass a page number instead of an offset.
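That general wrapper isn't shown here. A minimal sketch of the aggregating side, assuming every fetcher shares the ($query, $querysize, $offset) signature, might merge each engine's list and record which engines returned each URL. `aggregateResults` is a hypothetical name.

```php
<?php
// Merge results from several engines, keeping each URL once, in first-seen order.
// $fetchers maps an engine label to a callable with the ($query, $querysize, $offset) signature.
function aggregateResults(array $fetchers, $query, $querysize) {
    $merged = array();
    foreach ($fetchers as $engine => $fetch) {
        // (array) turns a null "no results" return into an empty list
        foreach ((array) call_user_func($fetch, $query, $querysize, 0) as $url) {
            if (!isset($merged[$url])) {
                $merged[$url] = array();
            }
            $merged[$url][] = $engine; // remember which engines returned this URL
        }
    }
    return $merged;
}

// e.g. aggregateResults(array('msn' => 'fetchMSNResults', 'teoma' => 'fetchTeomaResults'), 'seo ranter', 10);
```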

Enjoy.

SEO Rant