Question
tarifa
Hello dear all,
I am fairly new to bs4, but I'm trying to scrape a little chunk of information from a site,
but it keeps printing "None", as if the title, or any tag I substitute for it, doesn't exist.
The project consists of two parts:
- the looping part, which seems to be pretty straightforward
- the parser part, where I have some issues (see below)
I'm trying to loop through an array of URLs and scrape the data below from a list of WordPress plugins. See my loop below:
from bs4 import BeautifulSoup
import requests

# array of URLs to loop through, will be larger once I get the loop working correctly
plugins = ['https://wordpress.org/plugins/wp-job-manager', 'https://wordpress.org/plugins/ninja-forms']
The parsing can be done like so:
# page_soup is the BeautifulSoup object for a single plugin page
ttt = page_soup.find("div", {"class": "plugin-meta"})
text_nodes = [node.text.strip() for node in ttt.ul.findChildren('li')[:-1:2]]
The output of text_nodes:
['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 ']
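Here is a minimal sketch of how the loop and the parser might fit together. It assumes the metadata sits in a `div` with class `plugin-meta`, as in the snippet above. The names `extract_meta`, `scrape_plugins` and `sample_html` are made up for illustration, and the sample markup is a hypothetical stand-in, not the real page. The `None` check is there because `find()` returns `None` when nothing matches, which would explain the "None" being printed.

```python
from bs4 import BeautifulSoup
import requests


def extract_meta(html):
    """Return the stripped text of every <li> inside div.plugin-meta."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("div", {"class": "plugin-meta"})
    # find() returns None when nothing matches, so guard before using .ul
    if meta is None or meta.ul is None:
        return []
    return [li.get_text(" ", strip=True) for li in meta.ul.find_all("li")]


def scrape_plugins(urls):
    """Fetch each plugin page and map its URL to the extracted metadata."""
    results = {}
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on HTTP errors
        results[url] = extract_meta(response.text)
    return results


# Hypothetical stand-in for a plugin page, so the parser can be tried offline.
sample_html = """
<div class="plugin-meta">
  <ul>
    <li>Version: 1.9.5.12</li>
    <li>Last updated: 2 days ago</li>
    <li>Active installations: 10,000+</li>
  </ul>
</div>
"""
print(extract_meta(sample_html))
```

Calling `scrape_plugins(plugins)` with the list from my loop would then give one list of text nodes per URL. An empty list means the `plugin-meta` block was not found in the HTML that came back.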
But what if we want to fetch the data of all the WordPress plugins and subsequently sort them to show, let us say, the 50 most recently updated plugins? This would be an interesting task:
- first of all, we need to fetch the URLs
- then we fetch the information and sort out the _newest_
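The sorting step could be sketched roughly as below. This assumes each plugin has already been reduced to a record with a parsed "last updated" date. The record layout, the `newest` helper and the sample slugs and dates are all invented for illustration; how the date is actually obtained from the page is left open.

```python
from datetime import date


def newest(records, limit=50):
    """Return up to `limit` records, most recently updated first."""
    return sorted(records, key=lambda r: r["last_updated"], reverse=True)[:limit]


# Hypothetical records; slugs and dates are invented for illustration.
records = [
    {"slug": "plugin-a", "last_updated": date(2021, 1, 5)},
    {"slug": "plugin-b", "last_updated": date(2021, 2, 1)},
    {"slug": "plugin-c", "last_updated": date(2020, 12, 24)},
]

for record in newest(records, limit=2):
    print(record["slug"], record["last_updated"].isoformat())
```

With `limit=50` this would give the 50 most recently updated plugins, provided the dates can be parsed into something comparable first.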