• 0

Can't PHP screen scrape


Question

Hey guys! On my website you all helped me display the text from another website. When the source website changed, you all helped me fix it so it would work again. It's broken once again so I decided I would try to learn it on my own but I can't figure it out. Could someone post the code and then explain what's going on by any chance? :cry:

So this is the site and I would like to display the downloads from that webpage. I would like it to load the downloads everytime someone loads MY webpage so the downloads are always current.

https://addons.mozilla.org/en-US/firefox/addon/strike-49896/

Thanks again Neowinians!! :yes:

Link to comment
https://www.neowin.net/forum/topic/973286-cant-php-screen-scrape/
Share on other sites

16 answers to this question

Recommended Posts

  • 0

It is very easy using php oop.

$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;

  • 0

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0

YQL can make this easier...

Check out %27"]this query.

<?php
try{
  $response = new SimpleXMLElement(
    "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D    null,
    true
  );
}catch(Exception $e){
  $response = null;
}

$count = '0';

if(null !== $response && 1 === count($response->results)){
  $count = (string)$response->results->strong;
}

echo $count;

It's untested, so you may have to tweak.

  • 0

Yeah I tested it after seeing your post and indeed it does not work. file_get_contents does not support https. There is some tweaks to get it to work but at this point it is not worth the effort. Anthony's approach is much better.

  On 04/02/2011 at 03:48, thatguyandrew1992 said:

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0
  On 04/02/2011 at 11:54, AnthonySterling said:

YQL can make this easier...

Check out %27"]this query.


$count = '0';

It's untested, so you may have to tweak.

I appreciate your help very much. I tried it out. But that line is what defines the number that appears instead of what is from the mozilla site, and I'm not sure how to fix it :blush:

  • 0
  On 04/02/2011 at 22:04, sweetsam said:

I tried what Anthony posted and it works perfectly and the output matches the number on Mozilla's website. You might wanna check the code and make sure there are not line breaks where there shouldn't be any.

You were right! Copying it from neowin caused some line breaks! Thanks for all the help guys!!! :D

  • 0
  On 05/02/2011 at 02:40, sweetsam said:

I would recommend storing the data locally and updating it once a day. Do not load the data remotely every time your page loads because it might get you banned.

I see. I don't know how to do that though. :unsure:

  • 0

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5Dnull, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;

  • 0
  On 05/02/2011 at 17:23, sweetsam said:

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5Dnull, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;

Now I would use the code for multiple pages on my website. Should I change count.txt to countstrike.txt. Then have countsky.txt and countroyalblue.txt etc on my other pages?

  • 0

Elaborating on SweetSam's example, you could create two handy-dandy functions to help.

function get_stats($id){
  try{
    $response = new SimpleXMLElement(
      "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2F" . $id . "%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D'",
      null,
      true
    );
  }catch(Exception $e){
    $response = null;
  }

  $count = '0';

  if(null !== $response && 1 === count($response->results)){
    $count = (string)$response->results->strong;
  }

  return $count;
}

function get_cached_stats($id, $lifetime = 3600){
  $file = $id . '_stats_cache.txt';
  if(is_readable($file) && time() - filemtime($file) < (int)$lifetime){
    return file_get_contents($file);
  }
  $data = get_stats($id);
  file_put_contents($file, $data);
  return $data;
}

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • One can also use fastFetch. Gives you a quick hardware spec in a CMD or PowerShell console: https://imgur.com/a/fastfetch-kC0E3xT
    • Looking back at your message, you mention "less harsh terms." If by that you mean that requiring the pentagon to pass an audit is harsh, then to that I would say, that was already passed into law back in 1990. I wasn't planning to debate if that law was reasonable or not. If you are calling it harsh, then that is your opinion and we can disagree, but our options matter little considering it passed long ago. If you don't think the law itself is harsh, then I really can't understand how a reasonable enforcement of said law could be considered harsh either.
    • I have a strong feeling more than one store did that, and idk who is to blame more, Nintendo's terrible choice of packaging layout, or the sellers. I can't wait to see some class action lawsuit if Nintendo or said retailer doesn't step up. The question is what wonderful name we can come up with for the thing. Screengate(Staingate) already happened with Apple, so that's taken.
    • I'm definitely like this. I carry my rucksack everywhere with me and feel quite naked without it on my back. What if I need my laptop? What if the guys want to play a game of Magic? What if I need a plaster, or rubber gloves? Or my lockpicks and binoculars? Or my chessboard set? Or my portable power bank for my phone? I can't go anywhere without it, it's a life saver.
    • Bethesda's Deathloop is free to claim on the Epic Games Store this week by Pulasthi Ariyasinghe The Epic Games Store's Mega Sale promotion is in its final week, and that means mystery giveaways are soon ending too. For the finale, Epic Games has called in something big from Bethesda, with the Arkane-developed time loop adventure Deathloop coming in as the latest freebie for PC gamers. Alongside it, the indie adventure Ogu and the Secret Forest is also free. Developed by Arkane Lyon and published by Bethesda, Deathloop comes in touting an action-packed campaign involving plenty of time travel shenanigans. The game is set on the mysterious island of Blackreef, where two rival assassins, Colt and Julianna, are trapped in a time loop. The player, as Colt, has to figure out how to eliminate eight targets within a single day to escape the loop. Each of the assassinations can be taken care of in many ways, including stealth, traps, accidents, or simply going in guns blazing. Aside from gunplay, the title also makes use of supernatural systems very similar to the studio's Dishonored franchise, letting players teleport, go invisible, use telekinesis, and more. There is a multiplayer twist here too, where players, as Julianna, can invade the campaigns of others to take the role of the rival assassin, flipping the tables on the main character and his plans. As for Ogu and the Secret Forest, this is a 2024-released indie adventure featuring hand-drawn characters and various types of puzzles. The 2D game involves befriending characters across a fantasy land as baby Ogu, with plenty of exploration, puzzle solving, and boss battles available. The Deathloop and Ogu and the Secret Forest giveaways are now live on the Epic Games Store for all PC gamers. The promotion is slated to last until June 12, giving you seven days to claim a copy for your library permanently. While the summer mystery giveaways are ending, regular freebies will continue to arrive from the Epic Games Store. When this one comes to an end on Thursday, the SEGA-published humorous hospital simulation entry Two Point Hospital is incoming as the next giveaway.
  • Recent Achievements

    • Week One Done
      jbatch earned a badge
      Week One Done
    • First Post
      Yianis earned a badge
      First Post
    • Rookie
      GTRoberts went up a rank
      Rookie
    • First Post
      James courage Tabla earned a badge
      First Post
    • Reacting Well
      James courage Tabla earned a badge
      Reacting Well
  • Popular Contributors

    1. 1
      +primortal
      407
    2. 2
      +FloatingFatMan
      181
    3. 3
      snowy owl
      176
    4. 4
      ATLien_0
      171
    5. 5
      Xenon
      135
  • Tell a friend

    Love Neowin? Tell a friend!