php scraping regex


Question

I'm using this code to extract only the data, but it returns nothing. Could someone help?

$url = "http://www.athex.gr/content/en/announcements/companiespress/press_date.asp?dt=28/01/2011";
$input = @file_get_contents($url) or die("Could not access file: $url");
$regexp = '<td class="TDtitle">\s+<a name="(\d)+"><\/a>\s+<b>\s+<a href="(.)+">(.)+<\/a> : (.)+';

if(preg_match_all('/$regexp/si', $input, $matches, PREG_SET_ORDER)) {
    foreach($matches as $match) {
        echo '". $match[0] . "' / '" . $match[1] . "' / '" . $match[2] . "' / '" . $match[3] . "<br>"';
    }
} else {
    echo "no matches";
}


3 answers to this question

Try this. It's a lot less complicated.

$data = file_get_contents('http://www.athex.gr/content/en/announcements/companiespress/press_date.asp?dt=28/01/2011');
$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('td') as $td):
	if ($td->getAttribute('class') === 'TDtitle'):
		foreach ($td->childNodes as $ch):
			$val = trim($ch->nodeValue);
			if (!empty($val)):
				echo  $val . '<br>';
			endif;
		endforeach;
	endif;
endforeach;
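
If you would rather stay with the regex approach, the original snippet most likely fails because the pattern is written as '/$regexp/si' in single quotes. PHP does not interpolate variables in single-quoted strings, so preg_match_all() receives the literal text $regexp instead of the pattern. In addition, groups such as (.)+ and (\d)+ capture only the last repeated character, and the echo line has its quotes misplaced. Below is a minimal sketch of a corrected version. The $input markup is a hypothetical sample that I assume resembles one row of the real page, and the [^"]+ and [^<]+ character classes are my own substitution for the greedy (.)+ groups, so the pattern may need adjusting against the live HTML.

```php
<?php
// Hypothetical sample markup (an assumption, not copied from the real page).
$input = <<<HTML
<td class="TDtitle">
  <a name="12345"></a>
  <b>
    <a href="/content/en/companies/example.asp">EXAMPLE S.A.</a> : Sample announcement title
  </b>
</td>
HTML;

// The quantifier sits inside each group so the whole value is captured.
// [^"]+ and [^<]+ stop at the next quote or tag instead of running to the end of the page.
$regexp = '<td class="TDtitle">\s*<a name="(\d+)"><\/a>\s*<b>\s*<a href="([^"]+)">([^<]+)<\/a> : ([^<]+)';

// Double quotes here, so that $regexp is interpolated into the pattern.
if (preg_match_all("/$regexp/si", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] is the full match; the captured groups start at index 1.
        echo $match[1] . ' / ' . $match[2] . ' / ' . trim($match[3]) . ' / ' . trim($match[4]) . "<br>\n";
    }
} else {
    echo "no matches";
}
```

The DOMDocument version above is still the more robust choice, since a regex like this breaks as soon as the page changes its whitespace or attribute order.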

