Can't PHP screen scrape


Question

Hey guys! On my website you all helped me display the text from another website. When the source website changed, you all helped me fix it so it would work again. It's broken once again so I decided I would try to learn it on my own but I can't figure it out. Could someone post the code and then explain what's going on by any chance? :cry:

This is the site, and I would like to display the download count from that webpage. I would like it to load the count every time someone loads MY webpage so the number is always current.

https://addons.mozilla.org/en-US/firefox/addon/strike-49896/

Thanks again Neowinians!! :yes:


  • 0

It is very easy using PHP's object-oriented DOM classes.

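// Download the page (https URLs need PHP's openssl extension).
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

// @ suppresses the warnings loadHTML() raises on imperfect markup.
@$html->loadHTML($data);

// Print the text of every <strong class="downloads"> element.
foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes->item(0)->nodeValue;
        endif;
endforeach;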
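
Since the question asked for an explanation as well, here is a rough sketch of the same idea that runs against a hard-coded HTML string instead of the live page, so each step can be tried in isolation. The helper name `extract_downloads` and the sample markup are assumptions for illustration only; the real page's markup may differ or change.

```php
<?php
// Hypothetical helper: return the text of the first <strong class="downloads">
// element in an HTML string, or null if there is none.
function extract_downloads(string $html): ?string
{
    if ($html === '') {
        return null; // Nothing to parse (for example, the download failed).
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath: any <strong> element whose class attribute is exactly "downloads".
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//strong[@class="downloads"]');

    if ($nodes === false || $nodes->length === 0) {
        return null; // The element is missing, e.g. because the site changed its markup.
    }

    return trim($nodes->item(0)->textContent);
}

// Assumed sample markup, standing in for the downloaded page.
$sample = '<html><body><p><strong class="downloads">1,234</strong> downloads</p></body></html>';
echo extract_downloads($sample), "\n";
```

Returning null when the element is missing makes it easier to notice when the source site changes again, rather than silently printing nothing.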

  • 0

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0

YQL can make this easier...

Check out this query.

<?php
try{
  $response = new SimpleXMLElement(
    "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D'",
    null,
    true
  );
}catch(Exception $e){
  $response = null;
}

$count = '0';

if(null !== $response && 1 === count($response->results)){
  $count = (string)$response->results->strong;
}

echo $count;

It's untested, so you may have to tweak.

  • 0

Yeah, I tested it after seeing your post and indeed it does not work. file_get_contents does not support https unless PHP's openssl extension is enabled. There are some tweaks to get it to work, but at this point it is not worth the effort. Anthony's approach is much better.
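
For anyone who wants to check this on their own host, the following is a small sketch, not a definitive diagnosis. It reports whether the https stream wrapper is registered and which fetch method would be usable. The function names are made up for this example, and the fallback order (stream wrapper first, then cURL) is just one reasonable assumption.

```php
<?php
// Report whether this PHP build can open https:// URLs with file_get_contents().
// The https stream wrapper is only registered when the openssl extension is loaded.
function https_wrapper_available(): bool
{
    return in_array('https', stream_get_wrappers(), true);
}

// Hypothetical helper: name the fetch method that should work on this host.
function pick_fetch_method(): string
{
    // file_get_contents() on URLs also requires allow_url_fopen to be on.
    if (https_wrapper_available() && ini_get('allow_url_fopen')) {
        return 'file_get_contents';
    }
    if (function_exists('curl_init')) {
        return 'curl';
    }
    return 'none';
}

echo 'https wrapper: ', https_wrapper_available() ? 'yes' : 'no', "\n";
echo 'fetch method: ', pick_fetch_method(), "\n";
```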


  • 0
  On 04/02/2011 at 11:54, AnthonySterling said:

YQL can make this easier...

Check out this query.


$count = '0';

It's untested, so you may have to tweak.

I appreciate your help very much. I tried it out, but the number that appears is the one defined on that line instead of the one from the Mozilla site, and I'm not sure how to fix it :blush:

  • 0
  On 04/02/2011 at 22:04, sweetsam said:

I tried what Anthony posted and it works perfectly and the output matches the number on Mozilla's website. You might wanna check the code and make sure there are not line breaks where there shouldn't be any.

You were right! Copying it from neowin caused some line breaks! Thanks for all the help guys!!! :D
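
One way to make the long query URL less fragile when it gets copied around is to build it from a readable query and let PHP do the encoding. This is only a sketch: `build_yql_url` is a made-up name, the XPath is written without the extra spaces of the original, and the endpoint is simply the one used in the posts above, so whether it still responds is an assumption.

```php
<?php
// Hypothetical helper: build the YQL request URL from a readable query,
// so the long hand-encoded string never has to be copied by hand.
function build_yql_url(string $pageUrl, string $xpath): string
{
    $query = sprintf('select * from html where url="%s" and xpath=\'%s\'', $pageUrl, $xpath);

    // rawurlencode() percent-encodes spaces, quotes and slashes,
    // so the result cannot contain raw whitespace or line breaks.
    return 'http://query.yahooapis.com/v1/public/yql?q=' . rawurlencode($query);
}

$url = build_yql_url(
    'https://addons.mozilla.org/en-US/firefox/addon/strike-49896/',
    '//strong[@class="downloads"]'
);

echo $url, "\n";
```

The resulting string can be passed to `new SimpleXMLElement($url, 0, true)` in place of the literal URL.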

  • 0
  On 05/02/2011 at 02:40, sweetsam said:

I would recommend storing the data locally and updating it once a day. Do not load the data remotely every time your page loads because it might get you banned.

I see. I don't know how to do that though. :unsure:

  • 0

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D'", null, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;
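
One thing to watch in the snippet above: if the remote request fails, '0' is written to count.txt and shown until the cache expires. A possible variation, sketched here under the assumption that showing a stale number is better than showing zero, keeps the old file when the fetch fails. The name `cached_count` and the `$fetch` callback are hypothetical; the callback stands in for whatever code actually retrieves the count.

```php
<?php
// Hypothetical variation on the cache above. $fetch is any callable that returns
// the fresh count as a string, or null when the remote request failed.
function cached_count(string $file, int $lifetime, callable $fetch): string
{
    $hasCache = is_readable($file);

    // Fresh enough: serve the cached copy without touching the network.
    if ($hasCache && time() - filemtime($file) < $lifetime) {
        return (string) file_get_contents($file);
    }

    $fresh = $fetch();

    if ($fresh !== null) {
        // LOCK_EX keeps two simultaneous page loads from writing at the same time.
        file_put_contents($file, $fresh, LOCK_EX);
        return $fresh;
    }

    // Fetch failed: fall back to the stale copy if there is one, otherwise '0'.
    return $hasCache ? (string) file_get_contents($file) : '0';
}

// Demo with stand-in fetchers so the example runs without any network access.
// 86400 seconds is the once-a-day refresh suggested earlier in the thread.
$file = sys_get_temp_dir() . '/count_demo_' . getmypid() . '.txt';
echo cached_count($file, 86400, function () { return '1,234'; }), "\n"; // fetches and caches
echo cached_count($file, 86400, function () { return null; }), "\n";    // served from the cache
unlink($file);
```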

  • 0

Now I would like to use the code on multiple pages of my website. Should I change count.txt to countstrike.txt, and then use countsky.txt, countroyalblue.txt, etc. on my other pages?

  • 0

Elaborating on SweetSam's example, you could create two handy-dandy functions to help.

function get_stats($id){
  try{
    $response = new SimpleXMLElement(
      "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2F" . $id . "%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D'",
      null,
      true
    );
  }catch(Exception $e){
    $response = null;
  }

  $count = '0';

  if(null !== $response && 1 === count($response->results)){
    $count = (string)$response->results->strong;
  }

  return $count;
}

function get_cached_stats($id, $lifetime = 3600){
  $file = $id . '_stats_cache.txt';
  if(is_readable($file) && time() - filemtime($file) < (int)$lifetime){
    return file_get_contents($file);
  }
  $data = get_stats($id);
  file_put_contents($file, $data);
  return $data;
}
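
Because `get_cached_stats` derives the file name from `$id`, each add-on gets its own cache file automatically. As a small sketch of that naming scheme, the hypothetical helper below builds the same kind of file name while keeping only characters that are safe in a file name. That filtering is an addition for illustration and is not part of the functions above.

```php
<?php
// Hypothetical helper: derive a per-add-on cache file name in the style used by
// get_cached_stats(), dropping any characters that do not belong in a file name.
function cache_file_for(string $id): string
{
    $safe = preg_replace('/[^A-Za-z0-9_-]/', '', $id);
    return $safe . '_stats_cache.txt';
}

// 'strike-49896' comes from the thread; the second slug is a made-up placeholder.
foreach (['strike-49896', 'another-addon'] as $id) {
    echo $id, ' => ', cache_file_for($id), "\n";
}
```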
