• 0

simple_html_dom: simple use-case - to get back data for storing in SQLite db


Question

Hello dear experts and friends on Neowin,

I am fairly new to simple_html_dom and its methods; I only know the parser a little.

I want to gather some information from this site:

https://europa.eu/youth/volunteering/organisations_en#open

Is it possible to get the content of, let us say, the last 10 or 20 records on that page, and subsequently store it in my MySQL database?

<?php
// Report all PHP errors (see changelog)
error_reporting(E_ALL);

include('inc/simple_html_dom.php');

    //base url
    $base = 'https://europa.eu/youth/volunteering/organisations_en#open';

    //home page HTML
    $html_base = file_get_html( $base );

    //get all category links
    foreach($html_base->find('a') as $element) {
        echo "<pre>";
        print_r( $element->href );
        echo "</pre>";
    }

    $html_base->clear(); 
    unset($html_base);

?>

 

I have the above code and I'm trying to get certain elements of the page but it isn't returning anything. 

 

Is it possible that certain PHP functions might be disabled on the server to stop that?

The above code works perfectly on other sites.

 

Is there any workaround?

 

 

By the way: I have created a small snippet as a proof of concept to run this with Python and BeautifulSoup:

 


import requests
from bs4 import BeautifulSoup
 
url = 'https://europa.eu/youth/volunteering/organisations_en#open'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
block = soup.find('div', class_="eyp-card block-is-flex")

 

and this is what I get in the interactive session:

 

European Youth Portal
>>> block.a
<a href="/youth/volunteering/organisation/48592_en" target="_blank">"Academy for Peace and Development" Union</a>
>>> block.a.text
'"Academy for Peace and Development" Union'
 
>>> block.select_one('div > div > p:nth-child(9)')
<p><strong>PIC:</strong> 948417016</p>
>>> block.select_one('div > div > p:nth-child(9)').text
'PIC: 948417016'
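
For readers without BeautifulSoup installed, the same two fields can probably be pulled with Python's standard library alone. This is only a minimal sketch: the `<a>` and PIC `<p>` fragments are copied from the session above, while the surrounding `<div>` nesting, the `CardParser` class and the `parse_card` helper are assumptions made for illustration, and the real page may be structured differently.

```python
from html.parser import HTMLParser

# Sample markup: the <a> and the PIC <p> come from the session above;
# the wrapping <div> layout is an assumption for illustration only.
SAMPLE_CARD = '''
<div class="eyp-card block-is-flex">
  <div>
    <a href="/youth/volunteering/organisation/48592_en" target="_blank">"Academy for Peace and Development" Union</a>
    <p><strong>PIC:</strong> 948417016</p>
  </div>
</div>
'''


class CardParser(HTMLParser):
    """Collects the first link (href + text) and the PIC number of one card."""

    def __init__(self):
        super().__init__()
        self.href = None
        self.title = ""
        self.pic = None
        self._in_link = False
        self._in_p = False
        self._p_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.href is None:
            # Only the first link of the card is treated as the title link.
            self.href = dict(attrs).get("href")
            self._in_link = True
        elif tag == "p":
            self._in_p = True
            self._p_text = ""

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "p":
            self._in_p = False
            text = self._p_text.strip()
            if text.startswith("PIC:") and self.pic is None:
                self.pic = text[len("PIC:"):].strip()

    def handle_data(self, data):
        if self._in_link:
            self.title += data
        if self._in_p:
            self._p_text += data


def parse_card(html):
    """Hypothetical helper: returns title, href and PIC of one card as a dict."""
    parser = CardParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "href": parser.href, "pic": parser.pic}


card = parse_card(SAMPLE_CARD)
print(card)
```

Matching on the "PIC:" label rather than on `p:nth-child(9)` is a design choice: it should keep working if a card has more or fewer paragraphs than the first one.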

 

What I am aiming for in the end: I want to gather the first 20 results of the page and put them into a SQL database, or alternatively show the information in a little widget.
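
Since the thread title mentions SQLite, the storage step could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table name `organisations`, its columns, the `store_records` helper and the single sample record are all assumptions; the record values are taken from the session output above.

```python
import sqlite3

# Hypothetical input; in practice these dicts would come from the scraper.
records = [
    {
        "title": '"Academy for Peace and Development" Union',
        "href": "/youth/volunteering/organisation/48592_en",
        "pic": "948417016",
    },
]


def store_records(conn, rows, limit=20):
    """Insert at most `limit` rows; rows whose PIC is already stored are skipped."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS organisations (
               pic   TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               href  TEXT
           )"""
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO organisations (pic, title, href) "
            "VALUES (:pic, :title, :href)",
            rows[:limit],
        )


conn = sqlite3.connect(":memory:")  # pass a file name instead to persist the data
store_records(conn, records)
store_records(conn, records)  # a second run does not duplicate rows
stored = conn.execute("SELECT pic, title FROM organisations").fetchall()
print(stored)
```

Using the PIC as primary key together with `INSERT OR IGNORE` means the script can be re-run without piling up duplicates, assuming the PIC is unique per organisation.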

9 answers to this question

  • 1

Correct, file_get_html is not a native PHP function. It is defined in the file you include, and that file cannot be found, so you need to fix that first.

You should immediately be able to see whether or not the file exists by checking the inc folder in your project root.

  • 0

 

 
Hi there, good day.

After the first try I did another one; I still get the following errors:

 

 

Warning: include(inc/simple_html_dom.php): failed to open stream: No such file or directory in [...][...] on line 5
Warning: include(): Failed opening 'inc/simple_html_dom.php' for inclusion (include_path='.:') in [...][...] on line 5
Fatal error: Uncaught Error: Call to undefined function file_get_html() in [...][...]:11
Stack trace:
#0 {main}
  thrown in [...][...] on line 11

and this one: 


FATAL ERROR syntax error, unexpected '<', expecting end of file on line number 1

 

Hmm, I think that I have to make some corrections. I have to investigate the target to find out what is missing and what I have to correct in my test code.

I will come back later today.

Regards

  • 0

Hi there, good day dear @+virtorio, and good day dear @Rix,

First of all, many thanks for the input. I am very glad to hear from you both.

Agreed: since file_get_html is not a native PHP function but is defined in my included file, which cannot be found, I need to fix that first.
I checked the inc folder in the project root and the file does not exist there. I am going to fix it.


Note: I made the first test run on online systems like the following:
- PHP Sandbox, test PHP online, PHP tester: sandbox.onlinephpfunctions.com
- PHPTESTER - Test PHP code online: phptester.net
- Online PHP Editor | Online editor and compiler: paiza.io › projects ›

So yes, this could not work there. It had to go wrong and fail.

 

Now I am happy that you pointed me in the right direction.

On my machine (Windows 7) I have installed Atom with the PHP IDE.

So the question is, since I run Atom:

This is my project folder:

/project
/project/includes/

 

Where do I put the above-mentioned file (the one from the start of the thread)?

Question: does the above-mentioned file go in that folder, the includes folder?

Many thanks for any and all help and hints with this.

Love to hear from you.

Regards

 

 

  • 0

 
It looks like this:

C:\Users\Kasper\Documents\_mk_\_dev_\php\ -> my project file goes here
C:\Users\Kasper\Documents\_mk_\_dev_\php\includes -> the "simplehtmldom-parser" from https://sourceforge.net/projects/simplehtmldom/ goes in here

 

I am now going to test-run the whole thing on my machine, using Atom.

I will come back later today.
Love to hear from you.

Regards

 

PS: see the picture: image.thumb.png.b6614b9b55139e797aadbba2bfedcf07.png

  • 0

 

Hi there Rix, hello dear +virtorio ;)

First of all, many thanks for the reply and the hint. I am very glad to hear from you. Thanks for encouraging me to go this way. I appreciate every bit of help.

I now have a first approach: the "semantic" class is supposed to be "eyp-card".
 

 


function get_eyp_cards_data(){
  $dom = new DOMDocument();
  $my_cards = array();

  // Real-world HTML is rarely well-formed XML, so use loadHTMLFile() instead of load()
  // and let libxml collect parse warnings instead of printing them.
  libxml_use_internal_errors(true);
  if ( $dom->loadHTMLFile('https://europa.eu/youth/volunteering/organisations_en') ) { // true or false https://www.php.net/manual/en/domdocument.loadhtmlfile.php
    $domx = new DOMXPath($dom);
    // The leading // searches the whole document; a bare "div[...]" only matches children of the context node.
    $eyp_cards = $domx->query('//div[contains(@class,"eyp-card")]'); // returns DOMNodeList https://www.php.net/manual/en/class.domnodelist.php

    if ( $eyp_cards->length > 0 ) { // length IS a property of DOMNodeList. works but looks a bit JSy
      foreach ( $eyp_cards as $eyp_card ) {
        // Debug: echo '<pre>', var_dump($eyp_card), '</pre>';
        $title = $eyp_card->getElementsByTagName('h5')->item(0); // null if the card has no <h5>
        $my_cards[] = array(
          'title' => $title ? trim($title->textContent) : '',
          'content' => trim($eyp_card->textContent), // includes title
        );
      }
    }
  }
  libxml_clear_errors();
  return !empty($my_cards) ? $my_cards : false;
}

$my_cards = get_eyp_cards_data();

 

The reference for the selectors:

 

https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string
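
Two details of such a selector are easy to get wrong, and a small sketch may help: a path with no descendant step only looks at direct children of the context node, and a plain substring test on the class attribute also matches longer class names. The example below uses Python's `xml.etree.ElementTree`, which does not support XPath `contains()`, so the filtering is done in Python. The sample markup is hypothetical; only the class name "eyp-card" comes from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; only the class name "eyp-card" is from the thread.
SAMPLE = """
<html><body>
  <div class="wrapper">
    <div class="eyp-card block-is-flex"><h5>First</h5></div>
    <div class="eyp-card-footer"><h5>Not a card</h5></div>
    <div class="eyp-card"><h5>Second</h5></div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# A path without a descendant step only looks at direct children of the context node.
direct = root.findall("div")
# ".//div" searches the whole subtree.
all_divs = root.findall(".//div")

# Substring test, like contains(@class, "eyp-card"): also hits "eyp-card-footer".
substring_hits = [d for d in all_divs if "eyp-card" in d.get("class", "")]
# Token test: the class attribute is a space-separated list, so split it first.
token_hits = [d for d in all_divs if "eyp-card" in d.get("class", "").split()]

print(len(direct), len(all_divs))
print([d.findtext("h5") for d in substring_hits])
print([d.findtext("h5") for d in token_hits])
```

In XPath 1.0 the usual way to express the token test is `contains(concat(" ", normalize-space(@class), " "), " eyp-card ")`.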

 

 

Regarding the target page, I want to gather some information from this site:

 

https://europa.eu/youth/volunteering/organisations_en#open

 

Note: there are approx. 200 pages or more.

I guess that I will rework this and enhance it to get some data stored in a SQL database.

Many thanks for all your feedback and your hints.

 

 

  • 0

To find out more about how to work with DOMDocument, I went ahead with a smaller example. I have this HTML code:

 

<html>
<head>
    ...
</head>
<body>
    <div>
        <div class="foo" data-type="bar">
            SOMECONTENTWITHMORETAGS
        </div>
    </div>
</body>
</html>

 

Now I'd like to return the complete HTML of a DOMElement, including its own tag and attributes. How can I achieve that?

 



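// Returns the inner HTML of $node: the markup of its children,
// without the node's own tag and attributes.
private function get_html_from_node($node){
  $html = '';
  $children = $node->childNodes;

  foreach ($children as $child) {
    // Copy each child into a scratch document so it can be serialized on its own.
    $tmp_doc = new DOMDocument();
    $tmp_doc->appendChild($tmp_doc->importNode($child,true));
    $html .= $tmp_doc->saveHTML();
  }
  return $html;
}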

 

I can already get the "foo" element (but only its content) with the function above.

Furthermore, I guess it is well worth taking a closer look at the optional argument to DOMDocument::saveHTML, which says "output this element only".

 

return $node->ownerDocument->saveHTML($node);

 

Note that this argument is available in PHP 7; it has existed since PHP 5.3.6. Before that, you would need to use DOMDocument::saveXML instead. Also, if we already have a reference to the document, we can just do this:

 

 

$doc->saveHTML($node);
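
To make the difference between "the element itself" and "only its content" concrete, here is a rough Python analogue using `xml.etree.ElementTree`. It is a sketch, not a port of the PHP code: the helper names `outer_html` and `inner_html` are made up, the elided `<head>` is dropped so the sample parses as XML, and a `<b>` tag is added inside "foo" purely for illustration.

```python
import xml.etree.ElementTree as ET

# The sample from the post, without the elided <head>; the <b> tag inside
# "foo" is an assumption added to show nested markup.
SAMPLE = """
<html>
<body>
    <div>
    <div class="foo" data-type="bar">SOMECONTENT<b>WITH</b>MORETAGS</div>
    </div>
</body>
</html>
"""

root = ET.fromstring(SAMPLE)
foo = root.find(".//div[@class='foo']")


def outer_html(node):
    """Serialize the node itself, including its tag and attributes."""
    return ET.tostring(node, encoding="unicode", method="html").strip()


def inner_html(node):
    """Serialize only what is inside the node: leading text plus each child."""
    parts = [node.text or ""]
    parts += [ET.tostring(child, encoding="unicode", method="html") for child in node]
    return "".join(parts).strip()


print(outer_html(foo))
print(inner_html(foo))
```

Here `outer_html` plays the role of `saveHTML($node)`, while `inner_html` corresponds to the child-by-child loop in `get_html_from_node`.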

OK, and now I will work on the above-mentioned example (the europa.eu page).

If anybody has some ideas or hints, I appreciate any and all help.
