
simple_html_dom: simple use-case - to get back data for storing in SQLite db


Question

Hello dear experts and friends on Neowin,

I'm fairly new to simple_html_dom and its methods. I know the parser a little.

I want to gather some information from this site:

https://europa.eu/youth/volunteering/organisations_en#open

Is it possible to get the content of, let's say, the 10 or 20 latest records on that page and subsequently store it in my MySQL database?

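<?php
// Report all PHP errors (see changelog)
error_reporting(E_ALL);

// require_once stops with a fatal error if the library file is missing,
// instead of carrying on after a warning the way include() does.
require_once __DIR__ . '/inc/simple_html_dom.php';

    // base url (the "#open" fragment is handled by the browser and is never sent to the server)
    $base = 'https://europa.eu/youth/volunteering/organisations_en#open';

    // home page HTML; file_get_html() returns false if the download fails or the page is empty/too large
    $html_base = file_get_html( $base );
    if ( $html_base === false ) {
        die( 'Could not load ' . $base );
    }

    // get all category links
    foreach ( $html_base->find('a') as $element ) {
        echo "<pre>";
        print_r( $element->href );
        echo "</pre>";
    }

    $html_base->clear();
    unset($html_base);

?>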

 

I have the above code and I'm trying to get certain elements of the page, but it isn't returning anything.

Is it possible that certain PHP functions are disabled on the server and that this stops it?

The above code works perfectly on other sites.

Is there any workaround?

 

 

By the way: I have created a small snippet as a proof of concept that does this with Python and BeautifulSoup:

 


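import requests
from bs4 import BeautifulSoup

url = 'https://europa.eu/youth/volunteering/organisations_en#open'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
# first organisation card on the page
block = soup.find('div', class_="eyp-card block-is-flex")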

 

And this is what I get:

 

European Youth Portal
>>> block.a
<a href="/youth/volunteering/organisation/48592_en" target="_blank">"Academy for Peace and Development" Union</a>
>>> block.a.text
'"Academy for Peace and Development" Union'
 
>>> block.select_one('div > div > p:nth-child(9)')
<p><strong>PIC:</strong> 948417016</p>
>>> block.select_one('div > div > p:nth-child(9)').text
'PIC: 948417016'

 

What I'm aiming for in the end: I want to gather the first 20 results of the page and put them into an SQL database, or alternatively show the information in a little widget.
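A minimal sketch of the "first 20 results" step, using only Python's standard-library html.parser so it runs without BeautifulSoup. It assumes the card markup seen in the interactive session above: a div whose class list contains eyp-card, the organisation name in the card's first <a>, and a <p> whose text starts with "PIC:". The class name CardParser and its limit parameter are illustrative, and the live page may be structured differently.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Collect title, link and PIC from up to `limit` eyp-card blocks.

    ASSUMPTION: markup as seen in the BeautifulSoup session (first <a> in a
    card = organisation name, a <p> starting with "PIC:" = PIC number).
    """

    def __init__(self, limit=20):
        super().__init__()
        self.limit = limit
        self.cards = []
        self._depth = 0        # div nesting depth inside the current card (0 = outside)
        self._in_a = False     # inside the card's first <a>
        self._in_p = False     # inside a <p> of the current card
        self._p_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif ("eyp-card" in (attrs.get("class") or "").split()
                  and len(self.cards) < self.limit):
                self._depth = 1
                self.cards.append({"title": "", "href": None, "pic": None})
        elif self._depth and tag == "a" and self.cards[-1]["href"] is None:
            self.cards[-1]["href"] = attrs.get("href")
            self._in_a = True
        elif self._depth and tag == "p":
            self._in_p = True
            self._p_text = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
        elif tag == "a":
            self._in_a = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._p_text).strip()
            if text.startswith("PIC:"):
                self.cards[-1]["pic"] = text[len("PIC:"):].strip()

    def handle_data(self, data):
        if self._in_a:
            self.cards[-1]["title"] += data
        if self._in_p:
            self._p_text.append(data)
```

With the requests snippet above, you would create CardParser(limit=20), call its feed() method with response.text and then read the cards attribute: a list of dicts with title, href and pic, ready to be written to a database or shown in a small widget.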



Correct: file_get_html is not a native PHP function. It is defined in your included file, which cannot be found, so you need to fix that first.

You should immediately be able to see whether or not the file exists by checking the inc folder in your project root.


 

 
Hi there, good day.

After the first try I did another one, and I still get the following errors:

 

 

Warning: include(inc/simple_html_dom.php): failed to open stream: No such file or directory in [...][...] on line 5
Warning: include(): Failed opening 'inc/simple_html_dom.php' for inclusion (include_path='.:') in [...][...] on line 5
Fatal error: Uncaught Error: Call to undefined function file_get_html() in [...][...]:11
Stack trace:
#0 {main}
  thrown in [...][...] on line 11

And this one:

FATAL ERROR syntax error, unexpected '<', expecting end of file on line number 1

 

Hmm, I think I have to make some corrections. I have to investigate further to find out what is missing and what I have to correct in my test code.

I will come back later in the day.

Regards


Hi there, good day dear @+virtorio, and good day dear @Rix,

First of all: many thanks for the input. I am very glad to hear from you both.



Agreed: since file_get_html is not a native PHP function, it resides in my included file, which cannot be found, so I need to fix that first.
Checking the inc folder in the project root, I see that the file does not exist. I am going to fix it.


Note: I made the first test run on online systems like the following:
- PHP Sandbox (sandbox.onlinephpfunctions.com)
- PHPTester (phptester.net)
- paiza.io online PHP editor

So yes, this was not able to work: the library file was not there, so it had to fail.

 

Now I am happy that you pointed me in the right direction.

On my machine (Windows 7) I have installed Atom with the PHP IDE.

 

So the question concerns my setup: I run Atom.

This is my project folder:

/project
/project/includes/

Where should I put the above-mentioned file (from the thread start)?

Question: does the above-mentioned file belong in that folder, the includes folder?


Many thanks for any and all help and hints with this.

Love to hear from you.

Regards

 

 

Edited by tarifa

 
It looks like this:

C:\Users\Kasper\Documents\_mk_\_dev_\php\ -> here goes my project file
C:\Users\Kasper\Documents\_mk_\_dev_\php\includes -> and here goes the simplehtmldom parser from https://sourceforge.net/projects/simplehtmldom/

I am now going to test-run the whole thing on my machine using Atom.

I'll come back later in the day.
Love to hear from you.

Regards

 

PS: see the attached screenshot.

Edited by tarifa

 

Hi there Rix, hello dear +virtorio ;)

First of all, many thanks for the reply and the hint. I am very glad to hear from you. Thanks for encouraging me along the way. I appreciate every bit of help.

I now have some first approaches: the "semantic" class is supposed to be "eyp-card".
 

 


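function get_eyp_cards_data() {
  $dom = new DOMDocument();
  $my_cards = array();

  // The page is HTML, not XML, so use loadHTMLFile() instead of load().
  // Real-world HTML triggers many libxml warnings; collect them quietly instead of printing them.
  libxml_use_internal_errors(true);
  $loaded = $dom->loadHTMLFile('https://europa.eu/youth/volunteering/organisations_en'); // true or false https://www.php.net/manual/en/domdocument.loadhtmlfile.php
  libxml_clear_errors();

  if ( $loaded ) {
    $domx = new DOMXPath($dom);
    // The leading // searches the whole document (a bare 'div[...]' only looks at the root's children).
    // Matching " eyp-card " in the space-padded class list avoids also matching e.g. "eyp-card-body".
    $eyp_cards = $domx->query('//div[contains(concat(" ", normalize-space(@class), " "), " eyp-card ")]'); // returns DOMNodeList https://www.php.net/manual/en/class.domnodelist.php

    foreach ( $eyp_cards as $eyp_card ) { // an empty DOMNodeList simply skips the loop
      // Debug: echo '<pre>', var_dump($eyp_card), '</pre>';
      $h5 = $eyp_card->getElementsByTagName('h5')->item(0); // null if the card has no <h5>
      $my_cards[] = array(
        'title'   => $h5 ? trim($h5->textContent) : '',
        'content' => trim($eyp_card->textContent), // text of the whole card, includes title
      );
    }
  }
  return !empty($my_cards) ? $my_cards : false;
}

$my_cards = get_eyp_cards_data();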

 

The reference for the selectors:

 

https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string
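The XPath test contains(@class, "eyp-card") is a plain substring match, so it also matches a class such as eyp-card-body. The sketch below mirrors, in Python, the difference between that test and the token-based idiom contains(concat(" ", normalize-space(@class), " "), " eyp-card ") discussed in the referenced Stack Overflow thread. The function names are illustrative, not part of any library.

```python
def contains_match(class_attr: str, name: str) -> bool:
    """Mirrors XPath contains(@class, name): a plain substring test."""
    return name in class_attr


def token_match(class_attr: str, name: str) -> bool:
    """Mirrors contains(concat(' ', normalize-space(@class), ' '), ' name '):
    the class must appear as a whole, whitespace-separated token."""
    padded = " " + " ".join(class_attr.split()) + " "  # normalize-space + padding
    return (" " + name + " ") in padded


for attr in ["eyp-card block-is-flex", "eyp-card-body", "block-is-flex  eyp-card"]:
    print(repr(attr), contains_match(attr, "eyp-card"), token_match(attr, "eyp-card"))
```

Only the eyp-card-body sample gives different answers: the substring test accepts it, the token test rejects it.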

 

 

Regarding the target page: I want to gather some information from this site:

 

https://europa.eu/youth/volunteering/organisations_en#open

 

Note: there are approx. 200 pages or more.
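With 200 or more pages, a loop that walks the pages in order and stops early is useful. The thread does not say how the pages are addressed, so this sketch assumes a hypothetical ?page=N query parameter (check the real "next page" links in the browser). BASE_URL, page_url and crawl are illustrative names, and fetch/parse are passed in so the loop itself stays independent of any HTTP or HTML library.

```python
import time
from typing import Callable, List
from urllib.parse import urlencode

BASE_URL = "https://europa.eu/youth/volunteering/organisations_en"


def page_url(page: int) -> str:
    # HYPOTHETICAL pagination scheme (?page=0, ?page=1, ...): verify the real
    # "next page" links on the site before relying on this.
    return BASE_URL + "?" + urlencode({"page": page})


def crawl(fetch: Callable[[str], str],
          parse: Callable[[str], List[dict]],
          max_records: int = 20,
          max_pages: int = 200,
          delay_s: float = 1.0) -> List[dict]:
    """Walk the result pages in order until max_records records are collected,
    a page yields no records, or max_pages pages have been requested."""
    records: List[dict] = []
    for page in range(max_pages):
        found = parse(fetch(page_url(page)))
        if not found:
            break  # an empty page is taken to mean we ran past the last page
        records.extend(found)
        if len(records) >= max_records:
            break
        time.sleep(delay_s)  # pause between requests to go easy on the server
    return records[:max_records]
```

With requests, fetch could be `lambda u: requests.get(u, timeout=30).text`, and parse is any function that turns one page of HTML into a list of records. Passing both in as parameters also makes it easy to test the loop without touching the network.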

 

I guess I will rework this and enhance it to get some data stored in an SQL database.
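A minimal sketch of the storage step with Python's built-in sqlite3 module (the thread title mentions SQLite; with MySQL the same idea applies through a MySQL driver, though its placeholder syntax may differ). It assumes each scraped record is a dict with pic, title and href keys, the fields visible in the BeautifulSoup session above; the table name organisations and the function save_cards are illustrative.

```python
import sqlite3
from contextlib import closing
from typing import Iterable, Mapping, Optional


def save_cards(db_path: str, cards: Iterable[Mapping[str, Optional[str]]]) -> int:
    """Store scraped organisation records in SQLite and return the row count.

    ASSUMPTION: each record has 'pic', 'title' and 'href' keys. The PIC is the
    primary key, so re-running the scraper replaces existing rows instead of
    adding duplicates.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                """CREATE TABLE IF NOT EXISTS organisations (
                       pic   TEXT PRIMARY KEY,
                       title TEXT NOT NULL,
                       href  TEXT
                   )"""
            )
            conn.executemany(
                "INSERT OR REPLACE INTO organisations (pic, title, href) "
                "VALUES (:pic, :title, :href)",
                cards,
            )
        return conn.execute("SELECT COUNT(*) FROM organisations").fetchone()[0]
```

For example, `save_cards("organisations.db", [{"pic": "948417016", "title": '"Academy for Peace and Development" Union', "href": "/youth/volunteering/organisation/48592_en"}])` creates the table on first use and returns 1. The named placeholders let sqlite3 quote the scraped text safely, instead of building SQL strings by hand.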

 

Many thanks for all your feedback and your hints.

 

 


To find out more about how to work with DOMDocument, I went ahead like this. I have this HTML code:

 

<html>
    <head>
    ...
    </head>
<body>
    <div>
    <div class="foo" data-type="bar">
        SOMECONTENTWITHMORETAGS
    </div>
    </div>
</body>
</html>

 

Now I'd like to return all HTML tags of a DOMElement, including the element's own tag and its attributes. How can I do that?

 



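// Returns the "inner HTML" of $node: its children serialized one by one,
// without the node's own opening and closing tag.
// Standalone version: "private" is only allowed on methods inside a class.
function get_html_from_node(DOMNode $node) {
  $html = '';
  $children = $node->childNodes;

  foreach ($children as $child) {
    // copy each child into a fresh document and serialize it on its own
    $tmp_doc = new DOMDocument();
    $tmp_doc->appendChild($tmp_doc->importNode($child, true));
    $html .= $tmp_doc->saveHTML();
  }
  return $html;
}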

 

With the function above I can already get the "foo" element, but only its content.

Okay, so far so good.

 

Furthermore, I guess it is well worth taking a closer look at the optional argument to DOMDocument::saveHTML, which says "output this element only":

 

return $node->ownerDocument->saveHTML($node);

 

Note that this argument has been available since PHP 5.3.6, so it works in PHP 7 as well. Before that, you would need to use DOMDocument::saveXML instead. The result can be very helpful. Also, if we already have a reference to the document, we can just do this:

 

 

$doc->saveHTML($node);
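For comparison, here is the same inner versus outer distinction in Python, using the standard-library xml.dom.minidom as a swapped-in stand-in for PHP's DOM extension. minidom is an XML parser, so it only handles well-formed markup; the snippet is a simplified version of the HTML above (a hypothetical <b> tag stands in for "more tags"), and the helper names are illustrative.

```python
from xml.dom import minidom

SNIPPET = """<body>
  <div>
    <div class="foo" data-type="bar">SOME<b>CONTENT</b>WITHMORETAGS</div>
  </div>
</body>"""


def find_div_by_class(doc, cls):
    """Return the first <div> whose class attribute equals cls, or None."""
    for div in doc.getElementsByTagName("div"):
        if div.getAttribute("class") == cls:
            return div
    return None


def inner_html(node):
    # Like the get_html_from_node() loop: serialize each child, skip the node's own tag.
    return "".join(child.toxml() for child in node.childNodes)


def outer_html(node):
    # Like saveHTML($node): serialize the node itself, tag and attributes included.
    return node.toxml()


doc = minidom.parseString(SNIPPET)
foo = find_div_by_class(doc, "foo")
print(inner_html(foo))
print(outer_html(foo))
```

The first line printed is only the content of the "foo" div; the second also contains the div tag with its class and data-type attributes, which is what the saveHTML($node) call adds.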

Okay, and now I will work on the above-mentioned example from the Europa site.

If anybody has some ideas or hints, I appreciate any and all help.

Edited by tarifa
more insights
