• 0

Can't PHP screen scrape


Question

Hey guys! On my website you all helped me display the text from another website. When the source website changed, you all helped me fix it so it would work again. It's broken once again so I decided I would try to learn it on my own but I can't figure it out. Could someone post the code and then explain what's going on by any chance? :cry:

So this is the site and I would like to display the downloads from that webpage. I would like it to load the downloads everytime someone loads MY webpage so the downloads are always current.

https://addons.mozilla.org/en-US/firefox/addon/strike-49896/

Thanks again Neowinians!! :yes:

Link to comment
https://www.neowin.net/forum/topic/973286-cant-php-screen-scrape/
Share on other sites

16 answers to this question

Recommended Posts

  • 0

It is very easy using php oop.

$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;

  • 0

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0

YQL can make this easier...

Check out %27"]this query.

<?php
try{
  $response = new SimpleXMLElement(
    "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D    null,
    true
  );
}catch(Exception $e){
  $response = null;
}

$count = '0';

if(null !== $response && 1 === count($response->results)){
  $count = (string)$response->results->strong;
}

echo $count;

It's untested, so you may have to tweak.

  • 0

Yeah I tested it after seeing your post and indeed it does not work. file_get_contents does not support https. There is some tweaks to get it to work but at this point it is not worth the effort. Anthony's approach is much better.

  On 04/02/2011 at 03:48, thatguyandrew1992 said:

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0
  On 04/02/2011 at 11:54, AnthonySterling said:

YQL can make this easier...

Check out %27"]this query.


$count = '0';

It's untested, so you may have to tweak.

I appreciate your help very much. I tried it out. But that line is what defines the number that appears instead of what is from the mozilla site, and I'm not sure how to fix it :blush:

  • 0
  On 04/02/2011 at 22:04, sweetsam said:

I tried what Anthony posted and it works perfectly and the output matches the number on Mozilla's website. You might wanna check the code and make sure there are not line breaks where there shouldn't be any.

You were right! Copying it from neowin caused some line breaks! Thanks for all the help guys!!! :D

  • 0
  On 05/02/2011 at 02:40, sweetsam said:

I would recommend storing the data locally and updating it once a day. Do not load the data remotely every time your page loads because it might get you banned.

I see. I don't know how to do that though. :unsure:

  • 0

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5Dnull, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;

  • 0
  On 05/02/2011 at 17:23, sweetsam said:

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5Dnull, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;

Now I would use the code for multiple pages on my website. Should I change count.txt to countstrike.txt. Then have countsky.txt and countroyalblue.txt etc on my other pages?

  • 0

Elaborating on SweetSam's example, you could create two handy-dandy functions to help.

function get_stats($id){
  try{
    $response = new SimpleXMLElement(
      "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2F" . $id . "%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D'",
      null,
      true
    );
  }catch(Exception $e){
    $response = null;
  }

  $count = '0';

  if(null !== $response && 1 === count($response->results)){
    $count = (string)$response->results->strong;
  }

  return $count;
}

function get_cached_stats($id, $lifetime = 3600){
  $file = $id . '_stats_cache.txt';
  if(is_readable($file) && time() - filemtime($file) < (int)$lifetime){
    return file_get_contents($file);
  }
  $data = get_stats($id);
  file_put_contents($file, $data);
  return $data;
}

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Some AMD Ryzen users can get free Windows performance boost with this simple system tweak by Sayan Sen AMD understands that there is a lot of demand for its X3D processors and for good reason too, since they offer some of the best gaming experiences. As such, the company plans to launch a new 6-core Ryzen 5 9600X3D for those who may not want to spend top dollar on a 9800X3D. What makes X3D special is the densely packed last level cache (LLC) wherein the L3 (level 3) cache is 3D die-stacked such that there is a whole lot of it that the cores can access on demand all within the smallest footprint. This is said to help with latency especially, and games happen to be quite sensitive to it since they are a mixed workload and so there is a lot of to-and-fro. However, despite that fact, users have noticed micro-stuttering and freezes on Ryzen X3D CPUs. Although there is no official fix, some of the affected users have managed to resolve the issues by tweaking a motherboard setting. The tweak is related to a setting called "GLOBAL C-STATE CONTROL" (it may be called something else by your motherboard vendor) and changing it to 'Enabled' from 'Auto' could fix stuttering and lag-related issues in games. If you are not familiar with them, Processor Power Management is done through Advanced Configuration and Power Interface (ACPI) P-states or C-states. While P-states or performance states handle CPU voltage-frequency scaling, C-states deal with CPU sleep states so that some of the CPU functions, which are not necessary at that moment, are disabled. The P-states and C-states work together to make the processor run more efficiently. It helps the OS and apps determine which cores can be parked. The Global C-state control setting helps users manage not only the DF and CPU core C-states but also the I/O C-states too. For those wondering, DF here refers to Data Fabric or AMD's high bandwidth Infinity Fabric interconnect between CPUs, GPUs, and more, on AMD systems. By default, this is set to "Auto" which also means that it is "Enabled" by default. However, in the case of X3D parts, Auto may set this setting to "Disabled" and thus manually toggling it to "Enabled" may be necessary. X3D processors, the dual CCD (core complex die) ones especially, have their V-cache on a single CCD. If the CPPC (Collaborative Processor Performance Control), which lets an OS like Windows control the "preferred core" and clock speed boost, isn't working optimally to assign the correct gaming CCD, then this fix could well work. Global C-State Auto: Global C-State Enabled: We ran a benchmark on our Ryzen 9 9950X3D to see if toggling the settings would make a difference, and well, it didn't in the case of AIDA64. However, since this is a synthetic test that measures cache and memory exclusively, we can't definitively conclude that the fix will also not make a difference in the case of games. Another remedy for stuttering is to disable the monitoring of the "Power percent" metric on MSI Afterburner if you have it on. This has been a long-known issue and in fact can help you even if you are not using an X3D CPU. Source: Reddit (link1, link2) via YouTube
    • I only have one contact on WhatsApp. And that contact has sms also. I have many more contacts that use WhatsApp also, but everyone defaults to use iMessage, SMS or RCS anyway. Not a loss for me. I'm in Norway where mostly nobody uses WhatsApp.
    • Apple is boring for a kid. Only fun is browsing websites for HTML games. A PC with steam is another story. Of course if the child plays video games all day then maybe that might not be a good idea. :-)
    • Looking for a specific setting in Settings? Sorry, the option just doesn’t exist as you’d need to elevate for that. Want to do something quickly and efficiently? Nah, forced to use a “modern” interface which takes far longer to achieve what you’re looking to do. (Example: disable a NIC)
    • Yet the best laptop for all day battery life is a Mac, hands down, no contest. Windows is bloated and power management is rubbish. Search indexer. Defender. Malicious Software Removal Tool. Windows Update (+DISM). Office CTR. Telemetry. Disclaimer: I own a surface laptop studio, multiple gaming desktops, server, and a macbook pro.
  • Recent Achievements

    • One Month Later
      DecaffKnight94 earned a badge
      One Month Later
    • Dedicated
      S.P earned a badge
      Dedicated
    • One Month Later
      adxnksd42031 earned a badge
      One Month Later
    • Rising Star
      aphanic went up a rank
      Rising Star
    • Contributor
      GravityDead went up a rank
      Contributor
  • Popular Contributors

    1. 1
      +primortal
      663
    2. 2
      ATLien_0
      261
    3. 3
      Michael Scrip
      234
    4. 4
      Steven P.
      161
    5. 5
      +FloatingFatMan
      151
  • Tell a friend

    Love Neowin? Tell a friend!