• 0

Can't PHP screen scrape


Question

Hey guys! On my website you all helped me display the text from another website. When the source website changed, you all helped me fix it so it would work again. It's broken once again so I decided I would try to learn it on my own but I can't figure it out. Could someone post the code and then explain what's going on by any chance? :cry:

So this is the site and I would like to display the downloads from that webpage. I would like it to load the downloads everytime someone loads MY webpage so the downloads are always current.

https://addons.mozilla.org/en-US/firefox/addon/strike-49896/

Thanks again Neowinians!! :yes:

Link to comment
https://www.neowin.net/forum/topic/973286-cant-php-screen-scrape/
Share on other sites

16 answers to this question

Recommended Posts

  • 0

It is very easy using php oop.

$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;

  • 0

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0

YQL can make this easier...

Check out %27"]this query.

<?php
try{
  $response = new SimpleXMLElement(
    "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D    null,
    true
  );
}catch(Exception $e){
  $response = null;
}

$count = '0';

if(null !== $response && 1 === count($response->results)){
  $count = (string)$response->results->strong;
}

echo $count;

It's untested, so you may have to tweak.

  • 0

Yeah I tested it after seeing your post and indeed it does not work. file_get_contents does not support https. There is some tweaks to get it to work but at this point it is not worth the effort. Anthony's approach is much better.

  On 04/02/2011 at 03:48, thatguyandrew1992 said:

<?php
$data = file_get_contents('https://addons.mozilla.org/en-US/firefox/addon/strike-49896/');

$html = new DOMDocument();

@$html->loadHTML($data);

foreach($html->getElementsByTagName('strong') as $strong):
        if ($strong->getAttribute('class') === 'downloads'):
                echo $strong->childNodes[0]->nodeValue;
        endif;
endforeach;
?>

I added this but it didn't work :(

http://firefox.thechillroom.com/strike.php

  • 0
  On 04/02/2011 at 11:54, AnthonySterling said:

YQL can make this easier...

Check out %27"]this query.


$count = '0';

It's untested, so you may have to tweak.

I appreciate your help very much. I tried it out. But that line is what defines the number that appears instead of what is from the mozilla site, and I'm not sure how to fix it :blush:

  • 0
  On 04/02/2011 at 22:04, sweetsam said:

I tried what Anthony posted and it works perfectly and the output matches the number on Mozilla's website. You might wanna check the code and make sure there are not line breaks where there shouldn't be any.

You were right! Copying it from neowin caused some line breaks! Thanks for all the help guys!!! :D

  • 0
  On 05/02/2011 at 02:40, sweetsam said:

I would recommend storing the data locally and updating it once a day. Do not load the data remotely every time your page loads because it might get you banned.

I see. I don't know how to do that though. :unsure:

  • 0

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5Dnull, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;

  • 0
  On 05/02/2011 at 17:23, sweetsam said:

Try this...

		$file = 'count.txt';

		if ( file_exists($file) and time() - filemtime($file) < 3600):
			$count = file_get_contents($file);
		else:
			try{
				$response = new SimpleXMLElement("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2Fstrike-49896%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5Dnull, true);
			}catch(Exception $e){
				$response = null;
			}

			$count = '0';

			if(null !== $response && 1 === count($response->results)):
			  $count = (string)$response->results->strong;
			endif;

			file_put_contents($file, $count);
		endif;

		echo $count;

Now I would use the code for multiple pages on my website. Should I change count.txt to countstrike.txt. Then have countsky.txt and countroyalblue.txt etc on my other pages?

  • 0

Elaborating on SweetSam's example, you could create two handy-dandy functions to help.

function get_stats($id){
  try{
    $response = new SimpleXMLElement(
      "http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Faddons.mozilla.org%2Fen-US%2Ffirefox%2Faddon%2F" . $id . "%2F%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fstrong%5B%40class%20%3D%20%22downloads%22%5D'",
      null,
      true
    );
  }catch(Exception $e){
    $response = null;
  }

  $count = '0';

  if(null !== $response && 1 === count($response->results)){
    $count = (string)$response->results->strong;
  }

  return $count;
}

function get_cached_stats($id, $lifetime = 3600){
  $file = $id . '_stats_cache.txt';
  if(is_readable($file) && time() - filemtime($file) < (int)$lifetime){
    return file_get_contents($file);
  }
  $data = get_stats($id);
  file_put_contents($file, $data);
  return $data;
}

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • The Xbox game bar says hi. It's been able to show you CPU, RAM, GPU and FPS for as long as I can remember.
    • WinSCP 6.5.2 by Razvan Serea  WinSCP is an open source free SFTP client, FTP client, WebDAV client and SCP client for Windows. Its main function is file transfer between a local and a remote computer. Beyond this, WinSCP offers scripting and basic file manager functionality. WinSCP features: Graphical user interface Translated into several languages Integration with Windows (drag&drop, URL, shortcut icons) U3 support All common operations with files Support for SFTP and SCP protocols over SSH-1 and SSH-2 and plain old FTP protocol Batch file scripting and command-line interface Directory synchronization in several semi or fully automatic ways Integrated text editor Support for SSH password, keyboard-interactive, public key and Kerberos (GSS) authentication Integrates with Pageant (PuTTY authentication agent) for full support of public key authentication with SSH Explorer and Commander interfaces Optionally stores session information Optionally supports portable operation using a configuration file in place of registry entries, suitable for operation from removable media WinSCP 6.5.2 changelog: Thumbnail view in file panels. Three selectable sizes of toolbar icons, showing slightly larger size by default. Switching to Segoe UI font with slightly larger size. Improvements to Synchronization checklist window, including resolving file moves and pushing synchronization to background queue. Ongoing local delete operation can be moved to a background queue. Optimized working with large local directories. Compatibility with new OneDrive WebDAV interface. Dark theme for session tabs. Improvements to S3 support, including more options to authentication and display and modification of S3 file/object tags. List of all changes. Download: WinSCP 6.5.2 | 11.6 MB (Open Source) Download: WinSCP MSI | 28.7 MB Download: Standalone Executable | 8.4 MB Link: WinSCP Home page | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • QOwnNotes 25.6.2 by Razvan Serea QOwnNotes is a open source (GPL) plain-text file notepad with markdown support and todo list manager for GNU/Linux, Mac OS X and Windows, that (optionally) works together with the notes application of ownCloud (or Nextcloud). So you are able to write down your thoughts with QOwnNotes and edit or search for them later from your mobile device (like with CloudNotes) or the ownCloud web-service. The notes are stored as plain text files and you can sync them with your ownCloud sync client. Of course other software, like Dropbox, Syncthing, Seafile or BitTorrent Sync can be used too. Features: the notes folder can be freely chosen (multiple note folders can be used) sub-string searching of notes is possible and search results are highlighted in the notes application can be operated with customizable keyboard shortcuts external changes of note files are watched (notes or note list are reloaded) older versions of your notes can be restored from your ownCloud server trashed notes can be restored from your ownCloud server differences between current note and externally changed note are showed in a dialog markdown highlighting of notes and a markdown preview mode notes are getting their name from the first line of the note text (just like in the ownCloud notes web-application) and the note text files are automatically renamed, if the the first line changes compatible with the notes web-application of ownCloud and mobile ownCloud notes applications compatible with ownCloud's selective sync feature by supporting an unlimited amount of note folders with the ability to choose the respective folder on your server manage your ownCloud todo lists (ownCloud tasks or Tasks Plus / Calendar Plus) or use an other CalDAV server to sync your tasks to encryption of notes (AES-256 is built in or you can use custom encryption methods like Keybase.io (encryption-keybase.qml) or PGP (encryption-pgp.qml)) dark mode theme support theming support for the markdown syntax highlighting all panels can be placed wherever you want, they can even float or stack (fully dockable) support for freedesktop theme icons, you can use QOwnNotes with your native desktop icons and with your favorite dark desktop theme support for hierarchical note tagging and note subfolders support for sharing notes on your ownCloud server portable mode for carrying QOwnNotes around on USB sticks Evernote import QOwnNotes is available in many different languages like English, German, French, Polish, Chinese, Japanese, Russian, Portuguese, Hungarian, Dutch and Spanish Changes in QOwnNotes 25.6.2: The Find action dialog is now working again (for #3294) Added more French translation (thank you, jd-develop) Download: QOwnNotes 25.6.2 | 37.3 MB (Open Source) Download: QOwnNotes for Other Operating Systems View: QOwnNotes Home Page | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • So it’s fine he scams people to troll others? 🙄 This is a repeat of his watch fiasco.
  • Recent Achievements

    • First Post
      Fuzz_c earned a badge
      First Post
    • First Post
      TIGOSS earned a badge
      First Post
    • Week One Done
      slackerzz earned a badge
      Week One Done
    • Week One Done
      vivetool earned a badge
      Week One Done
    • Reacting Well
      pnajbar earned a badge
      Reacting Well
  • Popular Contributors

    1. 1
      +primortal
      700
    2. 2
      ATLien_0
      279
    3. 3
      Michael Scrip
      209
    4. 4
      +FloatingFatMan
      195
    5. 5
      Steven P.
      130
  • Tell a friend

    Love Neowin? Tell a friend!