Sunday, 19 July 2009

How to view HTTP headers

HTTP headers are the part of a web page you (usually) don't see in your browser: the special data describing the page to your browser software. HTTP headers are where you'd put redirects, information about whether a file is a PNG, an HTML page or a RAR archive, and whether the browser should open the file or present it as a download - as well as many other things. The full details are in RFC 2616.
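To make that a bit more concrete, here is a rough sketch of what a raw header block looks like and how you might pick a single field out of it in the shell. The sample response is made up for illustration (it doesn't come from a real server), and sample_headers and header_value are hypothetical helper names; it assumes a POSIX shell and awk.

```shell
#!/bin/sh
# Hypothetical sample of raw response headers (not from a real server).
# Each line ends in CRLF, and a blank line marks the end of the headers.
sample_headers() {
  printf '%s\r\n' \
    'HTTP/1.1 200 OK' \
    'Content-Type: image/png' \
    'Content-Disposition: attachment; filename="logo.png"' \
    'Content-Length: 1234' \
    ''
}

# header_value NAME: read raw headers on stdin, print the value of header NAME.
# Header names are case-insensitive, so both sides are compared in lower case.
header_value() {
  awk -v name="$1" '
    { sub(/\r$/, "") }                 # strip the CR of the CRLF line ending
    {
      i = index($0, ":")               # the first colon ends the header name
      if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name)) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)      # drop whitespace after the colon
        print value
      }
    }'
}
```

For example, `sample_headers | header_value content-type` prints image/png, which is the line that tells the browser it's looking at a PNG, while the Content-Disposition line is the one asking for the file to be offered as a download.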

I'm going to cover three methods of dealing with headers today - all quick, simple and powerful.

Perl's LWP "HEAD" command

This is a commonly-found *nix command line tool, very simple in its operation, and likely already on your system. To use it, you simply enter "HEAD URL" at the command prompt, where URL is the full address (including the http:// prefix) that you want to check.

If you don't have it, you can install this as root by whipping up a CPAN console (perl -MCPAN -e shell) and running install LWP::Simple - then just follow the prompts, and opt to install the GET/HEAD aliases.

Quick and simple - but it won't report on redirects, just the final page, and you need root to install it in most circumstances.

Tamper Data

You can use Firefox to examine headers, alter HTTP requests, and find out precisely what every page is doing with this masterpiece of a plugin. If you're using LiveHttpHeaders, I suggest you immediately exchange it for Tamper Data - just as light, and much more powerful. To use this, simply enable Tamper Data in Firefox, click "Start tampering" in the new window, and then visit the page you're interested in; you don't want to go playing with the server immediately, so simply accept the request, and ignore further requests. Tada - more information than you'll ever need - including full request and response headers for everything on the page! This is also great for finding out FLV URLs and other things hidden by Flash apps.

Tamper Data has many additional functions, including page load optimisation, and far too much to cover here. Just check out this tutorial for a taster.

Command-line cURL header script

For a very verbose, quick, minimal and to the point solution, create a file called header somewhere on your *nix server, and fill it thusly (perhaps amending the PHP executable path):

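#!/usr/bin/env php
<?php
// The URL to check is the first command-line argument.
if ($argc < 2) {
    fwrite(STDERR, "Usage: " . $argv[0] . " URL\n");
    exit(1);
}
$url = $argv[1];

// Return the raw response headers for $url, one block per redirect hop.
function url_header($url) {
    global $useragent;
    global $timeout;
    if ($useragent == "") {$useragent = "Mozilla 8.0 +http://seorant.blogspot.com";}
    if ($timeout == "") {$timeout = 20;}
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_HEADER, 1);          // include headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, 1);          // send a HEAD request, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the result instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);  // give up after $timeout seconds
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

// Browser-like user agent, picked up by url_header() via the global.
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040614 Firefox/0.8';

echo $url."\n\n"; flush();
echo url_header($url);
?>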



Make the file executable (chmod u+x header) and run it with ./header URL, where URL is the full address you want to check.

This will include the full headers of every redirect as it is followed, and should get a response closer to what a browser receives than the LWP method does, since LWP sends a different useragent string.
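If you'd rather stay in the shell, roughly the same request can be sketched with the curl command-line tool, plus a little awk to boil the output down to one line per hop. Treat this as a sketch under assumptions rather than a drop-in replacement: fetch_headers and redirect_chain are made-up names, the 20 second limit and the user agent string are simply copied from the PHP script, and it assumes curl and awk are installed.

```shell
#!/bin/sh
# fetch_headers URL: send a HEAD request, follow redirects, and print the
# headers of every hop. Timeout and user agent mirror the PHP script.
fetch_headers() {
  curl -sIL --max-time 20 \
    -A 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040614 Firefox/0.8' \
    "$1"
}

# redirect_chain: read the headers of one or more hops on stdin and print
# one line per hop: the status line, plus the redirect target if there is one.
redirect_chain() {
  awk '
    function print_hop() {
      if (target != "") print status " -> " target
      else print status
    }
    { sub(/\r$/, "") }                          # strip the CR of the CRLF ending
    /^HTTP\// {                                 # a status line starts a new hop
      if (status != "") print_hop()
      status = $0; target = ""
    }
    tolower($0) ~ /^location:/ {                # remember where this hop redirects to
      target = $0
      sub(/^[^:]*:[ \t]*/, "", target)
    }
    END { if (status != "") print_hop() }'
}
```

Run it as `fetch_headers http://www.example.com/ | redirect_chain` for the summary, or drop the second half of the pipe to see every header of every hop.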

1 comment:

r00m said...

Umm.. "curl -I www.url.com" ?

 