In various circumstances I’ve needed PHP to read the HTTP headers that a web server returns. I’ve found that cURL provides an excellent solution to this.
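$htmlheader = "";

// Fetch $url and return the raw response headers as one string.
function html_header($url) {
    global $htmlheader;
    $htmlheader = "";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // cURL calls readHeader once for each header line it receives.
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'readHeader');
    // Return the body from curl_exec instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
    return $htmlheader;
}

// Header callback: append the line to the global buffer and
// tell cURL how many bytes were handled.
function readHeader($ch, $header) {
    global $htmlheader;
    $htmlheader .= $header;
    return strlen($header);
}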
The $htmlheader variable, which html_header also returns, will contain something like this:
HTTP/1.1 200 OK
Date: Thu, 23 Jul 2009 20:56:23 GMT
Server: Apache
X-Powered-By: PHP/5.2.0-8+etch13
Content-Type: text/html; charset=UTF-8
Via: 1.1 bc1-rba
Transfer-Encoding: chunked
Connection: Keep-Alive
Age: 0
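If you want individual values rather than one long string, the raw header text can be split into a status line and name/value pairs. The following is only a rough sketch: parse_raw_headers is a hypothetical helper name, it assumes PHP 7 or later, and it assumes a single response (no redirects) with no repeated header names, since a later duplicate would overwrite an earlier one.

```php
<?php
// Hypothetical helper (not part of cURL): split a raw header block into
// its status line and an array of lower-cased header names => values.
function parse_raw_headers(string $raw): array
{
    // Header lines normally end in CRLF; accept bare LF or CR as well.
    $lines = preg_split('/\r\n|\n|\r/', trim($raw));
    $status = array_shift($lines);
    $headers = [];
    foreach ($lines as $line) {
        // Skip blank lines and anything that is not "Name: value".
        if (strpos($line, ':') === false) {
            continue;
        }
        // Limit of 2 keeps colons inside the value (e.g. in dates) intact.
        list($name, $value) = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers];
}
```

Calling parse_raw_headers(html_header($url)) would then let you read, for example, $result['headers']['content-type'] directly.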
There are two functions above. html_header is the function that makes the call to the URL and returns the collected header information. The header information is actually captured by the readHeader function, which cURL calls once for each header line.
cURL expects the CURLOPT_HEADERFUNCTION callback to return the length of the header it was given, so the return value cannot carry the header text itself; that is where the need for the global variable comes in. There might be a more elegant way of doing this, perhaps by building the function into a class of its own. But I hope the above shows you how to get the information you require.
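One way to drop the global, short of writing a full class, is to hand cURL a closure that appends to a variable captured by reference. This is a sketch rather than a tested replacement: make_header_collector and fetch_headers are hypothetical names, and it assumes PHP 7 or later with the cURL extension loaded.

```php
<?php
// Hypothetical helper: build a callback for CURLOPT_HEADERFUNCTION that
// appends each header line to $lines, which the caller owns.
function make_header_collector(array &$lines): callable
{
    return function ($ch, string $header) use (&$lines): int {
        $lines[] = $header;
        // cURL requires the number of bytes handled as the return value.
        return strlen($header);
    };
}

// Hypothetical wrapper: same job as html_header, without a global.
function fetch_headers(string $url): string
{
    $lines = [];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, make_header_collector($lines));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    // Join the collected lines back into one raw header string.
    return implode('', $lines);
}
```

Because the buffer lives inside fetch_headers, two calls can no longer trample each other's results, which is the main thing a class would have bought you here.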
You will often need to fake your User Agent, not to do anything untoward, but to test various aspects of your website. Perhaps you want your website to display differently in Internet Explorer 7 than in Firefox 3.5. Whatever your reason, here is a simple solution I use for my web crawlers.
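// Placeholder address; replace it with the page you want to fetch.
$url = "http://www.example.com/";

// Firefox 2 on Windows XP. Only the string itself is sent to the server,
// so the descriptive label belongs in this comment, not in the value.
$userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6";

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $url);
// Treat HTTP status codes of 400 and above as failures.
curl_setopt($ch, CURLOPT_FAILONERROR, true);
// Follow redirects and set the Referer header automatically when doing so.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
// Return the page from curl_exec instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_VERBOSE, false);
$html = curl_exec($ch);
curl_close($ch);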
With a bit of coding you can fill up an array with all the user agent strings you can find on the Internet, and randomly use them as the user agent when using cURL.
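As a rough illustration of that idea, the sketch below picks one string at random from an array with array_rand and passes it to cURL. The names random_user_agent and crawl_with_random_agent are hypothetical, the three strings are only sample values to be replaced with your own list, and PHP 7 or later is assumed.

```php
<?php
// Sample list only; fill this with whichever user agent strings you need.
$userAgents = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1) Gecko/20090624 Firefox/3.5",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
];

// Hypothetical helper: return one entry of $agents chosen at random.
function random_user_agent(array $agents): string
{
    if (count($agents) === 0) {
        throw new InvalidArgumentException('The user agent list is empty.');
    }
    // array_rand returns a random key of the array.
    return $agents[array_rand($agents)];
}

// Hypothetical wrapper: fetch $url with a randomly chosen user agent.
// Returns the page as a string, or false if the request failed.
function crawl_with_random_agent(string $url, array $agents)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent($agents));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}
```

A call such as crawl_with_random_agent($url, $userAgents) then presents a different browser identity from one request to the next.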