Faking your User Agent with cURL

You will often need to fake your User Agent, not to do anything untoward, but to test various aspects of your website. Perhaps you want your website to display differently in Internet Explorer 7 than in Firefox 3.5. Whatever your reason, here is a simple solution I use for my web crawlers.

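$url = "http://www.example.com/"; // example URL; replace with the page you want to fetch
// The user agent string exactly as the browser would send it (no descriptive label in front)
$userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);  // sent as the User-Agent header
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);      // fail on HTTP status codes >= 400
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
curl_setopt($ch, CURLOPT_AUTOREFERER, true);      // set the Referer header when redirected
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the page instead of printing it
curl_setopt($ch, CURLOPT_VERBOSE, false);
$html = curl_exec($ch);
curl_close($ch);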

With a bit of coding you can fill up an array with all the user agent strings you can find on the Internet, and randomly use them as the user agent when using cURL.
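
The random selection described above could be sketched roughly like this. The pick_user_agent function is a hypothetical helper, and the three strings are only examples; you would collect your own list.

```php
<?php
// Hypothetical helper: returns one user agent string chosen at random,
// or null when the list is empty.
function pick_user_agent(array $agents)
{
    if (count($agents) === 0) {
        return null; // nothing to choose from
    }
    return $agents[array_rand($agents)];
}

// Example strings only; collect your own list as needed.
$userAgents = array(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Opera/9.80 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.00",
);

$userAgent = pick_user_agent($userAgents);
echo $userAgent . "\n";
```

The chosen string can then be passed to curl_setopt($ch, CURLOPT_USERAGENT, $userAgent) as in the snippet above.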

How much memory your script is using

An invaluable function is memory_get_usage. This function returns the amount of memory, in bytes, that your script is currently using.

As an example, check how much memory assigning a value to a variable uses:

  //start memory
  $mem_start = memory_get_usage();
 
  $s = "This is a test variable.";  
 
  //end memory
  $mem_end = memory_get_usage();
 
  //total used
  $used = $mem_end - $mem_start;
  echo $used;

You can use this to check if there are any potential memory problems with your script.
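
If you measure memory in more than one place, the before/after pattern can be wrapped in a small function. This is only a sketch: measure_memory and build_big_string are hypothetical names, and the numbers you get will vary between PHP versions.

```php
<?php
// Hypothetical helper: runs $callback and reports how many more bytes were
// in use after the call than before it.
function measure_memory($callback)
{
    $before = memory_get_usage();
    $result = $callback();          // keep the result so it is not freed early
    $after = memory_get_usage();
    return array('result' => $result, 'bytes' => $after - $before);
}

// Example workload: builds a string of roughly 1 MB.
function build_big_string()
{
    return str_repeat("x", 1024 * 1024);
}

$report = measure_memory('build_big_string');
echo "Used: " . $report['bytes'] . " bytes\n";
echo "Peak: " . memory_get_peak_usage() . " bytes\n";
```

memory_get_peak_usage reports the highest usage reached so far, which helps when memory is allocated and freed again between your two measurements.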

Quick and dirty template system

Before I knew about Smarty, I developed my own way of having HTML templates for the projects I worked on. Admittedly, I still use it even though Smarty is so much more powerful. When I don’t need the versatility of Smarty, I just use my own method.

I started off by having a template directory with HTML files in it. These files consist of all the HTML code I want, and have placeholders where the content will go.

<html>
<body>
<div>%header%</div>
<div>
  <div>%menu%</div>
  <div>%content%</div>
</div>
<div>%footer%</div>
</body>
</html>

As you can see, I have 4 placeholders there: %header%, %menu%, %content% and %footer%.

Now all I have to do is use str_replace to replace the placeholders with content.

Here is an example of doing that:

  $html = file_get_contents("template.html");
 
  $menu = "<ul><li>Option 1</li><li>Option 2</li></ul>";
  $header = "<h1>This is the Header</h1><br />%menu%";
  $content = "<p>This is some content.</p>";
  $footer = "<small>This is the footer</small>";
 
  $html = str_replace("%header%",$header,$html);
  $html = str_replace("%menu%",$menu,$html);
  $html = str_replace("%content%",$content,$html);
  $html = str_replace("%footer%",$footer,$html);
  echo $html;

As you can see, I also use a placeholder within another placeholder: %menu% appears inside the $header variable and gets replaced there as well. Just note that if you want to use it like that, you need to order the str_replace calls so that the outer placeholder (%header%) is replaced before the inner one (%menu%).

The resulting HTML will be:

<html>
<body>
<div><h1>This is the Header</h1><br /><ul><li>Option 1</li><li>Option 2</li></ul></div>
<div>
  <div><ul><li>Option 1</li><li>Option 2</li></ul></div>
  <div><p>This is some content.</p></div>
</div>
<div><small>This is the footer</small></div>
</body>
</html>
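
The repeated str_replace calls can also be collapsed into one, because str_replace accepts arrays and applies the replacements in array order. A minimal sketch, assuming a hypothetical render_template helper and a shortened inline template:

```php
<?php
// Hypothetical helper: fills %name% placeholders in $template.
// Replacements are applied in array order, so list a placeholder
// before any placeholder that appears inside its content.
function render_template($template, array $values)
{
    $search = array();
    $replace = array();
    foreach ($values as $name => $content) {
        $search[] = "%" . $name . "%";
        $replace[] = $content;
    }
    return str_replace($search, $replace, $template);
}

$template = "<div>%header%</div><div>%menu%</div>";
$page = render_template($template, array(
    "header" => "<h1>Title</h1>%menu%",        // contains %menu%, so it goes first
    "menu"   => "<ul><li>Option 1</li></ul>",
));
echo $page . "\n";
```

The same ordering rule applies here as with the separate calls: if "menu" were listed before "header", the %menu% inside the header would be left untouched.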

register_globals evilness

If you’re a PHP developer you know all about the register_globals directive and all the evilness that comes with it. You can probably skip this post, because I’d just like to explain to the rest of the people what it is about.

In PHP you have superglobal arrays, which store information that is passed to you either by the browser or the server. These arrays are:

  • $_COOKIE – stores information about cookies from the browser
  • $_GET – stores values passed in the URL query string (including forms submitted with method="get")
  • $_FILES – stores information on files a user wants to upload to your server
  • $_POST – stores form data submitted with method="post"
  • $_SERVER – stores various information about the server
  • $_SESSION – stores session data

When register_globals is on, PHP creates a variable for each of the items in those arrays. For example, if you have $_GET['var'], it will create $var automatically.

Look at the following examples.

HTML Form:

<form method="post" action="script.php">
  <input type="text" name="var">
  <input type="submit">
</form>

If register_globals is on, you can access it like this:

echo "The value of the \"var\" control is $var";

With register_globals off, you need to access it like this:

echo "The value of the \"var\" control is " . $_POST['var'];

Here is an example of how someone can exploit a script when register_globals is enabled, courtesy of Dreamhost:

$admin['user'] = 'foo';
$admin['pass'] = 'bar';
if($admin['user'] == $_GET['username'] AND $admin['pass'] == $_GET['password']) {
  /* Give administrator access */
}

When you first look at this, it all looks fine. It checks the username and the password, so it must be fine? Wrong.

Let’s say you access the page as page.php?admin=asdf

  • Because register_globals is on, $admin = 'asdf', because $_GET['admin'] = 'asdf'
  • $admin['user'] = 'foo'; sets the first char of 'asdf' to 'f'
  • $admin['pass'] = 'bar'; sets the first char of 'fsdf' to 'b'
  • $admin['user'] == $_GET['username'] tests if 'b' == $_GET['username']
  • $admin['pass'] == $_GET['password'] tests if 'b' == $_GET['password']

To get administrator access to this page, you simply access it as page.php?admin=asdf&username=b&password=b. See how that can affect you badly? So repeat after me: register_globals is evil.

Another example:

session_start();
if ($logged_in) {
  /* give access to things */
}

In the above example, register_globals makes $_SESSION['logged_in'] available as $logged_in. But if you call that page as page.php?logged_in=1, $_GET['logged_in'] is also registered as $logged_in and can overwrite it with the value 1. Note that this depends on the variables_order directive, which dictates which value overwrites which, but I hope you can see that register_globals can be very evil.

Another reason to stop using register_globals is that it was deprecated in PHP 5.3.0 and removed entirely in PHP 5.4.0. On current versions you cannot turn it on, because it is no longer there.
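
Whatever the register_globals setting, you can sidestep this class of problem by reading each value from the specific superglobal you expect. A minimal sketch, assuming a hypothetical is_logged_in helper and simulated session data in place of a real session:

```php
<?php
// Hypothetical helper: decides whether the visitor is logged in by looking
// only at the session data passed in, never at request parameters.
function is_logged_in(array $session)
{
    return isset($session['logged_in']) && $session['logged_in'] === true;
}

// Simulated data for illustration; in a real script you would call
// session_start() and pass $_SESSION instead.
$session = array('logged_in' => true);

var_dump(is_logged_in($session));  // flag is set in the session data
var_dump(is_logged_in(array()));   // no flag in the session data
```

Because the function never looks at $_GET, calling the page as page.php?logged_in=1 has no effect on the result.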

Get Metatags of a Website

There are many ways to get the metatags from a website, and I’ve played around with a few ideas until I got the following function working nicely. It seems to be the most accurate, and it caters for the different ways people make use of metatags. If you see a flaw, or if it’s not working for a website you’re testing it on, I’d be happy to know about it so that I can fix it.

function parsetags($url) {
    $contents = file_get_contents($url);
    $result = false;
    $title = null;
    $metaTags = null;
    preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );
    if (isset($match) && is_array($match) && count($match) > 0)  {
      $title = strip_tags($match[1]);
    }
    preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
    if (isset($match) && is_array($match) && count($match) == 3) {
      $originals = $match[0];
      $names = $match[1];
      $values = $match[2];
      if (count($originals) == count($names) && count($names) == count($values)) {
        $metaTags = array();
        for ($i=0, $limiti=count($names); $i < $limiti; $i++) {
          $metaTags[strtolower($names[$i])] = array (
            'html' => htmlentities($originals[$i]),
            'value' => $values[$i]
          );
        }
      }
    }
 
    $result = array (
      'title' => $title,
      'metaTags' => $metaTags
    );
 
    return($result);
  }

This returns an array in the form of:

  $result['title'] => 'The Title'
  $result['metaTags']['keywords']['value'] => 'keyword1, keyword2, keyword3'
  $result['metaTags']['keywords']['html'] => '&lt;meta name=&quot;keywords&quot; content=&quot;keyword1, keyword2, keyword3&quot; /&gt;' - HTML version
  $result['metaTags']['description']['value'] => 'This is the description.'
  $result['metaTags']['description']['html'] => '&lt;meta name=&quot;description&quot; content=&quot;This is the description.&quot; /&gt;' - HTML version
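
The regular expression above expects the name attribute to come before content. As a rough alternative, not the function above, the same information can be read with PHP’s DOMDocument class, which does not care about attribute order. This sketch assumes the DOM extension is installed; parse_meta_from_html is a hypothetical name and the HTML is an inline sample.

```php
<?php
// Hypothetical alternative to the regex approach: parse an HTML string with
// DOMDocument and collect the title and the named meta tags.
function parse_meta_from_html($html)
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $title = null;
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $title = trim($titles->item(0)->textContent);
    }

    $metaTags = array();
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name !== '') {
            $metaTags[$name] = $meta->getAttribute('content');
        }
    }
    return array('title' => $title, 'metaTags' => $metaTags);
}

// Sample page with content before name, which the regex would miss.
$html = '<html><head><title>The Title</title>'
      . '<meta content="keyword1, keyword2" name="Keywords" />'
      . '</head><body></body></html>';
$result = parse_meta_from_html($html);
echo $result['title'] . "\n";
echo $result['metaTags']['keywords'] . "\n";
```

PHP also ships a built-in get_meta_tags function that returns the name/content pairs of a page, which may be enough if you do not need the title or the original HTML.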

Inline StdClass Object Creation

If you’re looking to create StdClass Objects inline, you’d probably normally do this:

$my_class = new StdClass;
$my_class->variable1 = "test1";
$my_class->variable2 = "test2";

You can put all that into one line using PHP’s casting operator:

$my_class = (object) array("variable1" => "test1", "variable2" => "test2");
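
One thing to keep in mind is that the cast only converts the outer array. The sketch below, using made-up sample data, shows the difference and one possible workaround: a round trip through json_encode and json_decode.

```php
<?php
// Sample data for illustration: an array with a nested associative array.
$data = array(
    "name" => "test",
    "options" => array("colour" => "red"),
);

// The cast only converts the outer array; "options" stays an array.
$shallow = (object) $data;
var_dump(is_array($shallow->options));

// Round-tripping through JSON converts nested associative arrays as well.
$deep = json_decode(json_encode($data));
echo $deep->options->colour . "\n";
```

The JSON round trip only suits data that JSON can represent, so treat it as a convenience rather than a general conversion.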

When a function doesn’t exist

Sometimes we develop PHP scripts that require 3rd party libraries or make use of PHP4 or PHP5 specific functions. What we then forget is that when we upload a script developed for PHP5 to a server running PHP4, we’re going to end up with a lot of “function not found” errors.

Yes, PHP4 is old news, and everyone should be using PHP5 by now, but I still regularly come across the need for PHP4 development.

What I’ve started doing is writing PHP4 equivalents of the functions I need. But that is not what this post is about. What I want to tell you about is the function_exists function.

if (function_exists('FUNCTION_CALL')) {
  FUNCTION_CALL();
}

You can define your own function by doing this:

if (!function_exists('FUNCTION_CALL')) {
  function FUNCTION_CALL() {
    echo "this function does something";
  }
}

The above will only create your own function if the function doesn’t already exist.
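
As a concrete illustration of the same pattern with a real function, here is a sketch of a fallback for str_contains, which is built into PHP 8 but missing from older versions. The fallback body is my own approximation of the built-in behaviour, not official code.

```php
<?php
// Only define the function when PHP does not already provide it.
// str_contains() is built into PHP 8; older versions get this fallback.
if (!function_exists('str_contains')) {
    function str_contains($haystack, $needle)
    {
        // An empty needle is always considered to be contained.
        return $needle === '' || strpos($haystack, $needle) !== false;
    }
}

var_dump(str_contains("register_globals", "globals"));
var_dump(str_contains("register_globals", "smarty"));
```

The rest of your script can then call str_contains without caring which PHP version it runs on.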

File download through your script

There are many reasons to have someone download a file through your script rather than directly. Security and user verification are just two of them.

Quick way to do this:

$file = file_get_contents('file.zip');
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="file.zip"');
header('Content-Length: ' . strlen($file));
echo $file;

This sends the file to the visitor’s browser, which then offers it as a download. You can now go ahead and add various checks before you actually start sending the data to them.
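
One possible shape for those checks is sketched below. The function names build_download_headers and send_download are hypothetical, and the generic application/octet-stream type is an assumption; use the real type of your file if you know it.

```php
<?php
// Hypothetical helper: builds the download headers for $path, or returns
// an empty array when the file is missing or unreadable.
function build_download_headers($path, $downloadName)
{
    if (!is_file($path) || !is_readable($path)) {
        return array();
    }
    return array(
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . basename($downloadName) . '"',
        'Content-Length: ' . filesize($path),
    );
}

// Hypothetical helper: sends the file, or returns false if it cannot.
function send_download($path, $downloadName)
{
    $headers = build_download_headers($path, $downloadName);
    if (count($headers) === 0) {
        return false;
    }
    foreach ($headers as $line) {
        header($line);
    }
    readfile($path); // outputs the file without first loading it into a PHP string
    return true;
}

// Example: inspect the headers for a small temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'dl');
file_put_contents($tmp, 'hello');
print_r(build_download_headers($tmp, 'file.zip'));
unlink($tmp);
```

Any user verification would go before the call to send_download, so that nothing is sent when the check fails.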

RSS feeds made simple with SimpleXML

Getting RSS and XML to work always seemed to be a pain, at least that is how it was when I started working with it. That all changed when SimpleXML was introduced. Until I learned about SimpleXML, I always used my own little script that converted an XML feed into an associative array.

With SimpleXML though, it’s a lot simpler to work with feeds.

All you need is the following lines of code:

  $filename = "http://www.phpdeveloping.co.za/feed";
  $feed = simplexml_load_file($filename);
  var_dump($feed);

From here you will see $feed containing all the information you can get from the feed, all neatly structured into a SimpleXMLElement object.

You access the various items through:

$feed->channel->item[n]

As an example:

echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->link;
echo $feed->channel->item[0]->pubDate;

You can now iterate through all the items doing whatever you want to the data.
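
A minimal sketch of such a loop, using a small inline sample feed and simplexml_load_string so that it runs without a network connection (the feed content is made up for illustration):

```php
<?php
// Inline sample feed for illustration; a real script would load a URL
// with simplexml_load_file() as shown above.
$xml = '<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>';

$feed = simplexml_load_string($xml);

$titles = array();
foreach ($feed->channel->item as $item) {
    // Each property is itself a SimpleXMLElement, so cast it to a string
    // before storing it.
    $titles[] = (string) $item->title;
    echo $item->title . " (" . $item->link . ")\n";
}
```

The string cast matters when you store or compare values; echo performs the conversion for you.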

Credit Card validation

At one point or another everyone gets to a project where they’ve got to get credit card details from a client to process a payment. The security aspect of this is a whole book on its own, but that’s not the point of this post. I’ve always been fascinated by how ID numbers (or social security numbers) work and how they can be validated.

I decided a few months ago to do some research into writing a simple credit card verification function. Obviously this doesn’t cater for the fact that it might be a stolen or canceled card, but it will tell you whether the credit card number itself is well-formed.

First I started by finding out what the patterns are for the various credit card companies. I found this on Sitepoint. Although the author has a complete class on that site, what is the fun in it all if you don’t write your own code to understand it a bit better?

Apart from passing the Mod 10 algorithm, each credit card company has further checks:

  • Mastercard: Must have a prefix of 51 to 55, and must be 16 digits in length
  • Visa: Must have a prefix of 4, and must be either 13 or 16 digits in length
  • American Express: Must have a prefix of 34 or 37, and must be 15 digits in length
  • Diners Club: Must have a prefix of 300 to 305, 36, or 38, and must be 14 digits in length
  • Discover: Must have a prefix of 6011, and must be 16 digits in length
  • JCB: Must have a prefix of 3, 1800, or 2131, and must be either 15 or 16 digits in length

Ok, so what is the Mod 10 algorithm? It is an algorithm (also known as the Luhn algorithm) that can spot mistyped digits in the credit card number. It’s not specifically made for credit card numbers, and has other uses as well. Check out Wikipedia.org for more information. To explain it, let’s look at the steps.

Steps of the Mod 10 algorithm for the number 4111111111111111:

  1. You start off by reversing the number so it becomes 1111111111111114
  2. Now multiply every 2nd digit by 2
    1111111111111114 => 1212121212121218
  3. Add all the resulting digits together. If doubling produces a two-digit number such as 14, add its digits separately (1 + 4)
    1+2+1+2+1+2+1+2+1+2+1+2+1+2+1+8 = 30
  4. Take that sum and divide it by 10. 10 goes 3 times into 30 with a remainder of 0. If the remainder is 0, the number passes.

Taking the remainder of a division is what the MOD (modulo) operator does: it divides one number by another and gives you the remainder.

Those are the 2 tests a credit card number must pass. First checking if it’s got correct digits according to the type of credit card and then checking if the digits pass the Mod 10 algorithm.

Let’s look at the code for this.

function mod10_check($number) { 
    /* validate mod10 check on the number */
    $rcode = strrev($number);
    $csum = 0;
    for ($i = 0; $i < strlen($rcode); $i++) {
      $current_num = intval($rcode[$i]);
      if($i & 1) {  /*every 2nd digit - odd*/
        $current_num *= 2;
      }
 
      /* split the digits and add up */
      $csum += $current_num % 10; 
      if ($current_num >  9) {
        $csum += 1;
      }
    }
 
    if ($csum % 10 == 0) {
      return true;
    } else {
      return false;
    }
  }
 
  function validate_ccnumber($ccnumber) {
    /* validate; return card type if valid. */
    $c_type = "";
    $c_reg = array(
      "/^4\d{12}(\d\d\d){0,1}$/" => "visa",
      "/^5[12345]\d{14}$/" => "mastercard",
      "/^6011\d{12}$/" => "discover",
      "/^30[012345]\d{11}$/" => "diners",
      "/^3[68]\d{12}$/" => "diners",
    );
 
    foreach ($c_reg as $reg => $type) {
      if (preg_match($reg, $ccnumber)) {
        $c_type = $type;
        break;
      }
    }
 
    if (!$c_type) {
      //card type not found
      return false;
     }
 
    if (mod10_check($ccnumber)) {
      return $c_type;
    } else {
      return false;
    }
  }

I’ve split this up into 2 functions, namely mod10_check and validate_ccnumber. You can incorporate them into one function, I just found it easier this way so that I can reuse mod10_check later on separately.

So first I make use of regular expressions to determine which card type it is. I’ve only made use of the well known cards here, but it is easy to add additional card regular expressions as you see fit.

If the card type is found and checks out okay, we do the final check which is the Mod10 algorithm. If that also checks out, we return the card type from the function.

Sample usage:

if ($result = validate_ccnumber('4111111111111111')) { 
  echo $result;
} else {
  echo "invalid";
}
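
The function above leaves out American Express and JCB, which appear in the list of rules earlier. A rough sketch of how the extra patterns might look is below. The detect_card_type name is hypothetical, and the Amex and JCB patterns are my own reading of the prefix and length rules in that list (I am assuming the 15-digit JCB numbers are the ones starting with 1800 or 2131), so verify them before relying on them.

```php
<?php
// Hypothetical helper: returns the card type for a number, or false.
// It only checks prefix and length; run the Mod 10 check separately.
function detect_card_type($ccnumber)
{
    $patterns = array(
        '/^4\d{12}(\d{3})?$/'             => "visa",
        '/^5[1-5]\d{14}$/'                => "mastercard",
        '/^3[47]\d{13}$/'                 => "amex",     // assumed from the rules list
        '/^6011\d{12}$/'                  => "discover",
        '/^30[0-5]\d{11}$/'               => "diners",
        '/^3[68]\d{12}$/'                 => "diners",
        '/^(3\d{15}|(1800|2131)\d{11})$/' => "jcb",      // assumed from the rules list
    );

    foreach ($patterns as $pattern => $type) {
        if (preg_match($pattern, $ccnumber)) {
            return $type;
        }
    }
    return false; // no known card type matched
}

echo detect_card_type('378282246310005') . "\n";
echo detect_card_type('3530111333300000') . "\n";
```

The patterns do not overlap because each card type in the list has a different length or prefix, so the order of the array does not matter here.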