Regular expressions can be confusing at times, but they are extremely powerful. Among other things, they make validating user input a lot easier.
The example below shows how to use a simple regular expression to check whether a URL (scheme, domain name and optional port) is valid.
$url = "http://www.phpdevelopment.co.za/";

if (preg_match('/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i', $url)) {
    echo "Your url is fine.";
} else {
    echo "Your url is not fine.";
}
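The pattern above contains three capture groups (scheme, host and port), so the same expression can also pull those parts out of the URL. The sketch below is only an illustration: the helper name parse_simple_url() and the port 8080 in the sample URL are assumptions, not part of the original example. Note that the pattern is anchored at the start only, so it checks the beginning of the string and accepts anything after the host.

```php
<?php
// Hypothetical helper (not from the original post): returns the scheme,
// host and port captured by the pattern, or null if the URL does not match.
function parse_simple_url($url)
{
    $pattern = '/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i';

    if (!preg_match($pattern, $url, $matches)) {
        return null;
    }

    return array(
        'scheme' => strtolower($matches[1]),
        'host'   => $matches[2],
        // Group 3 is missing from $matches when the URL has no port.
        'port'   => (isset($matches[3]) && $matches[3] !== '') ? (int) $matches[3] : null,
    );
}

// Sample URL with an assumed port, for illustration only.
$parts = parse_simple_url('http://www.phpdevelopment.co.za:8080/');
print_r($parts);
```

Passing the third argument to preg_match() is what fills $matches; the validation result itself is unchanged.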
I’ve made a few website scrapers over the last few months and have enjoyed it very much. Something I needed quite often was to extract all the objects (images, CSS files, etc.) that a website refers to.
This is the function I used to get images from an HTML document.
function get_images($html)
{
    $images = array();

    // Collect every src="..." (or src='...') fragment, up to the closing quote.
    preg_match_all('/(img|src)\=(\"|\')[^\"\'\>]+/i', $html, $media);
    unset($html);

    // Strip the leading src=" or src=' so that only the URL remains.
    $data = preg_replace('/(img|src)(\"|\'|\=\"|\=\')(.*)/i', '$3', $media[0]);

    foreach ($data as $url) {
        $info = pathinfo($url);
        // Keep only URLs that end in a known image extension.
        if (isset($info['extension']) && in_array($info['extension'], array('jpg', 'jpeg', 'gif', 'png'))) {
            array_push($images, $url);
        }
    }

    return $images;
}
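To make the two regular-expression steps easier to follow, here is a small sketch that runs them on their own. The sample markup and the variable names $sample, $fragments and $urls are made up for illustration; only the two patterns come from the function.

```php
<?php
// Hypothetical sample markup, made up for this illustration.
$sample = '<p><img src="logo.png" alt="Logo"><img src=\'photos/cat.jpg\'></p>';

// Step 1: grab each src="... fragment, stopping before the closing quote.
preg_match_all('/(img|src)\=(\"|\')[^\"\'\>]+/i', $sample, $media);
$fragments = $media[0];
// $fragments is now: ['src="logo.png', "src='photos/cat.jpg"]

// Step 2: drop the leading src=" or src=' and keep only the URL part.
$urls = preg_replace('/(img|src)(\"|\'|\=\"|\=\')(.*)/i', '$3', $fragments);
// $urls is now: ['logo.png', 'photos/cat.jpg']

print_r($urls);
```

Because the first pattern matches any src attribute, script and iframe sources are picked up as well; it is the extension check afterwards that narrows the list down to images.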
The get_images() function takes the HTML content as a string, which you can fetch using cURL or file_get_contents(). It returns an array of all the image URLs it found on that page.
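As a rough sketch of the fetching step, the helper below wraps file_get_contents() with a timeout and a user agent. The name fetch_html(), the 10 second default and the 'ExampleScraper/1.0' user agent string are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper (not from the original post): fetches a page and
// returns its HTML as a string, or null when the request fails.
function fetch_html($url, $timeout = 10)
{
    // HTTP stream context options; these only apply to http(s) URLs.
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeout,
            'user_agent' => 'ExampleScraper/1.0', // assumed value
        ),
    ));

    // Suppress the warning on failure and report it through the return value.
    $html = @file_get_contents($url, false, $context);

    return ($html === false) ? null : $html;
}

// Typical use, combined with the image function:
//   $html = fetch_html('http://www.phpdevelopment.co.za/');
//   if ($html !== null) { $images = get_images($html); }
```

Returning null on failure keeps the caller from passing false to the regular expressions by accident.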