There are many ways to get the meta tags from a website. I played around with a few ideas until I got the following function working nicely. Of the approaches I tried, it seems to be the most accurate, and it caters for the different ways people write their meta tags. If you see a flaw, or if it doesn't work for a website you're testing it on, I'd be happy to know about it so that I can fix it.
function parsetags($url)
{
    $result = false;
    $title = null;
    $metaTags = null;

    // file_get_contents() returns false when the page cannot be fetched.
    $contents = file_get_contents($url);
    if ($contents === false) {
        return $result;
    }

    // Title: allow attributes on <title> and match lazily up to </title>.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $contents, $match)) {
        $title = trim(strip_tags($match[1]));
    }

    // Meta tags: expects name before content, with double-quoted or unquoted values.
    preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);

    if (isset($match) && is_array($match) && count($match) == 3) {
        $originals = $match[0]; // full tag as found in the page
        $names = $match[1];     // value of the name attribute
        $values = $match[2];    // value of the content attribute

        if (count($originals) == count($names) && count($names) == count($values)) {
            $metaTags = array();

            for ($i = 0, $limiti = count($names); $i < $limiti; $i++) {
                // Keys are lower-cased so lookups are case-insensitive.
                $metaTags[strtolower($names[$i])] = array(
                    'html' => htmlentities($originals[$i]),
                    'value' => $values[$i]
                );
            }
        }
    }

    $result = array(
        'title' => $title,
        'metaTags' => $metaTags
    );

    return $result;
}
This returns an array in the form of:
$result['title'] => 'The Title'
$result['metaTags']['keywords']['value'] => 'keyword1, keyword2, keyword3'
$result['metaTags']['keywords']['html'] => '<meta name="keywords" content="keyword1, keyword2, keyword3" />' - HTML version (stored escaped with htmlentities)
$result['metaTags']['description']['value'] => 'This is the description.'
$result['metaTags']['description']['html'] => '<meta name="description" content="This is the description." />' - HTML version (stored escaped with htmlentities)
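The regular expression in the function expects the name attribute to come before content, so a tag written the other way round would be missed. As a minimal sketch of one way around that, the example below parses an HTML string with PHP's DOMDocument instead of a regex. This is a swapped-in technique rather than the function above: the helper name parseTagsFromHtml and the sample markup are made up for illustration, and it assumes the DOM extension is enabled and the input string is not empty.

```php
<?php
// Hypothetical helper, not part of the original function: it parses an
// HTML string with DOMDocument rather than regular expressions.
function parseTagsFromHtml(string $html): array
{
    $title = null;
    $metaTags = array();

    $doc = new DOMDocument();
    // Real-world markup is often invalid, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNodes = $doc->getElementsByTagName('title');
    if ($titleNodes->length > 0) {
        $title = trim($titleNodes->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Attribute order and quoting style do not matter here.
        $name = strtolower(trim($meta->getAttribute('name')));
        if ($name === '') {
            continue; // skip tags without a name (charset, http-equiv, ...)
        }
        $metaTags[$name] = array(
            'html'  => htmlentities($doc->saveHTML($meta)),
            'value' => $meta->getAttribute('content'),
        );
    }

    return array('title' => $title, 'metaTags' => $metaTags);
}

// Sample markup (an assumption for this sketch): content comes before
// name and both use single quotes.
$sample = "<html><head><title>The Title</title>"
        . "<meta content='keyword1, keyword2' name='keywords'>"
        . "</head><body></body></html>";

$result = parseTagsFromHtml($sample);
echo $result['title'], "\n";
echo $result['metaTags']['keywords']['value'], "\n";
```

To use it on a live page, the string returned by file_get_contents($url) could be passed in once it has been checked for false. Note that loadHTML makes its own assumptions about character encoding, so pages with non-ASCII text may need extra handling.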