<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PHP Software Developing &#187; regular expressions</title>
	<atom:link href="http://www.phpdeveloping.co.za/tag/regular-expressions/feed" rel="self" type="application/rss+xml" />
	<link>http://www.phpdeveloping.co.za</link>
	<description>for the love of PHP Development</description>
	<lastBuildDate>Tue, 29 Sep 2009 15:38:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Find images in an HTML document</title>
		<link>http://www.phpdeveloping.co.za/regular-expression/find-images-in-a-html-document.html</link>
		<comments>http://www.phpdeveloping.co.za/regular-expression/find-images-in-a-html-document.html#comments</comments>
		<pubDate>Tue, 28 Jul 2009 10:44:11 +0000</pubDate>
		<dc:creator>chris</dc:creator>
				<category><![CDATA[Regular Expression]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://www.phpdeveloping.co.za/?p=95</guid>
		<description><![CDATA[I&#8217;ve made a few website scrapers over the last few months, and have enjoyed it very much.  Something that I needed quite often was to extract all the objects (images, CSS files, etc.) that a website refers to.
This is the function I used to get images from an HTML document.

  function get_images&#40;$html&#41;
  [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve made a few website scrapers over the last few months, and have enjoyed it very much.  Something that I needed quite often was to extract all the objects (images, CSS files, etc.) that a website refers to.</p>
<p>This is the function I used to get images from an HTML document.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">  <span style="color: #000000; font-weight: bold;">function</span> get_images<span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    <span style="color: #000088;">$images</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #990000;">preg_match_all</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/(img|src)\=(\&quot;|\')[^\&quot;\'\&gt;]+/i'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$html</span><span style="color: #339933;">,</span> <span style="color: #000088;">$media</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #990000;">unset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$html</span><span style="color: #339933;">=</span><span style="color: #990000;">preg_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/(img|src)(\&quot;|\'|\=\&quot;|\=\')(.*)/i'</span><span style="color: #339933;">,</span><span style="color: #0000ff;">&quot;<span style="color: #006699; font-weight: bold;">$3</span>&quot;</span><span style="color: #339933;">,</span><span style="color: #000088;">$media</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000088;">$info</span> <span style="color: #339933;">=</span> <span style="color: #990000;">pathinfo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #990000;">isset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$info</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'extension'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$info</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'extension'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">'jpg'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">||</span>
	   <span style="color: #009900;">&#40;</span><span style="color: #000088;">$info</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'extension'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">'jpeg'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">||</span>
	   <span style="color: #009900;">&#40;</span><span style="color: #000088;">$info</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'extension'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">'gif'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">||</span>
	   <span style="color: #009900;">&#40;</span><span style="color: #000088;">$info</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'extension'</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">'png'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
	   <span style="color: #990000;">array_push</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$images</span><span style="color: #339933;">,</span> <span style="color: #000088;">$url</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #000088;">$images</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span></pre></div></div>

<p>This function takes the HTML content of a page as a string, which you can fetch using cURL or file_get_contents.  It returns an array of the image URLs (jpg, jpeg, gif and png) it found on that page.</p>
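<p>To make the input and output concrete, here is a minimal sketch of the same idea run against a small piece of markup.  Note that get_images_sketch is a hypothetical variant and not the function above: it captures the src value directly, ignores the case of the extension and drops any query string before checking it.  The sample markup and the example.com URL are made up for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper, not the original get_images(): same idea, but the URL
// is captured directly and the extension check ignores case.
function get_images_sketch($html)
{
    $images = array();
    // Capture the quoted value of every src attribute.
    preg_match_all('/src\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    foreach ($matches[1] as $url) {
        // Drop any query string before reading the extension.
        $path = parse_url($url, PHP_URL_PATH);
        $ext = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
        if (in_array($ext, array('jpg', 'jpeg', 'gif', 'png'), true)) {
            $images[] = $url;
        }
    }
    return $images;
}

// Made-up sample markup; in practice $html could come from
// file_get_contents() or cURL.
$html = '&lt;img src="http://example.com/a.JPG"&gt;&lt;script src="app.js"&gt;&lt;/script&gt;'
      . "&lt;img src='/img/logo.png?v=2'&gt;";
print_r(get_images_sketch($html));</pre></div></div>

<p>With this sample the script tag is skipped because its extension is not in the list, while both img tags are kept.  Like the function above, this sketch judges a URL only by its file extension, so images served from URLs without one will be missed.</p>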
]]></content:encoded>
			<wfw:commentRss>http://www.phpdeveloping.co.za/regular-expression/find-images-in-a-html-document.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
