Archive for 'Regular Expression'

Validate domain name

Regular expressions can be confusing at times, but they are extremely powerful when coding. They make validating user input a lot easier.

The example below shows how to use a simple regular expression to check whether a URL has a valid scheme and domain name.

  $url = "http://www.phpdevelopment.co.za/";
  if (preg_match('/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i', $url)) {
    echo "Your url is fine.";
  } else {
    echo "Your url is not fine.";
  }
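
One thing worth knowing about the pattern above: it is anchored only at the start of the string, so anything may follow the part that matched. For example, "http://www.example.com:port" is accepted, because the colon is optional and the port digits are optional too. The following is a minimal sketch of a stricter variant, assuming you want to reject such input. The function name is_valid_url and the sample URLs are made up for illustration and are not part of the original snippet.

```php
<?php
// Hypothetical helper (not from the original post): the same idea as the
// pattern above, but anchored at the end so trailing junk no longer passes.
function is_valid_url($url)
{
    $pattern = '/^(?:https?|ftp):\/\/'        // scheme
             . '[A-Z0-9][A-Z0-9_-]*'          // first label of the host name
             . '(?:\.[A-Z0-9][A-Z0-9_-]*)+'   // one or more further labels
             . '(?::\d+)?'                    // optional port, digits required
             . '(?:\/\S*)?$/i';               // optional path, then end of string

    return preg_match($pattern, $url) === 1;
}

var_dump(is_valid_url('http://www.phpdevelopment.co.za/')); // bool(true)
var_dump(is_valid_url('http://www.example.com:port'));      // bool(false)
var_dump(is_valid_url('http://localhost/'));                // bool(false), no dot in the host
```

Like the original, this only checks the shape of the string. It does not tell you whether the domain actually exists.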

Find images in an HTML document

I’ve made a few website scrapers over the last few months, and have enjoyed it very much. Something I needed quite often was to extract all the objects (images, CSS files, etc.) that a website refers to.

This is the function I used to get images from an HTML document.

  function get_images($html)
  {
    $images = array();
    // Grab every src="..." (or src='...') attribute, up to the closing quote.
    preg_match_all('/(img|src)\=(\"|\')[^\"\'\>]+/i', $html, $media);
    unset($html);
    // Strip the leading src=" so that only the URL itself remains.
    $urls = preg_replace('/(img|src)(\"|\'|\=\"|\=\')(.*)/i', '$3', $media[0]);
    foreach ($urls as $url) {
      $info = pathinfo($url);
      if (isset($info['extension'])) {
        // Keep only common image extensions, ignoring case.
        if (in_array(strtolower($info['extension']), array('jpg', 'jpeg', 'gif', 'png'))) {
          array_push($images, $url);
        }
      }
    }
    return $images;
  }

This function takes the HTML content of a page as a string. You can get this using cURL or file_get_contents. It returns an array of the URLs of all the images it found on that page.
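
A quick way to try the idea without fetching a live page is to feed it a small HTML string. The sketch below is a condensed variant of the function above, not the original code: the name extract_image_urls and the sample markup are invented for illustration. It assumes you are happy to read the URL straight from a capture group, and to look only at the path part of the URL so that a query string does not hide the extension.

```php
<?php
// Hypothetical condensed variant of get_images(): the capture group returns
// the URL directly, so no second preg_replace pass is needed.
function extract_image_urls($html)
{
    $allowed = array('jpg', 'jpeg', 'gif', 'png');
    $images  = array();

    // Group 1 is the quote character, group 2 is the attribute value.
    preg_match_all('/\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $matches);

    foreach ($matches[2] as $url) {
        // Use the path only, so "photo.jpg?v=2" still counts as a JPEG.
        $path = parse_url($url, PHP_URL_PATH);
        $ext  = is_string($path) ? strtolower(pathinfo($path, PATHINFO_EXTENSION)) : '';
        if (in_array($ext, $allowed, true)) {
            $images[] = $url;
        }
    }
    return $images;
}

// Sample markup, invented for this example.
$html = '<img src="/img/logo.PNG"><script src="app.js"></script>'
      . "<img src='photo.jpg?v=2'>";

print_r(extract_image_urls($html));
```

With the sample markup, the script's app.js is skipped and the two image URLs are returned. For real pages, $html would come from cURL or file_get_contents as described above.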