
Some errors in a simple Python parser script that uses BS4





Hello dear all,


As a first step into Python, I have tried to fetch some data from Yahoo with the following script:


Note: the while loop is meant to fetch the data continuously. I guess that part is working, but I get some errors.



import bs4
import requests
from bs4 import BeautifulSoup

def parsePrice():
    price=soup.find_all('div',{'class':'My(6px) Post(r) smartphone_Mt(6px)'})[0].find('span').text
    return price

while True:
    print('the current price_ '+str (parsePrice()))
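On the loop itself: in the posted script, parsePrice() only reads an existing soup object, and nothing inside the while loop downloads the page again, so even with working parsing it would print the same value over and over, as fast as the CPU allows. For the data to be fetched continuously, the download and the parse have to happen on every pass, ideally with a pause between requests. Below is a minimal sketch of that pattern; poll and fake_fetch_price are hypothetical names made up for this illustration, and in real use the function passed in would call requests.get, build a BeautifulSoup object and pull out the price.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar('T')


def poll(fetch: Callable[[], T], interval: float = 60.0,
         max_polls: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Call fetch() again on every pass and yield each fresh result.

    Sleeps `interval` seconds between calls; runs forever when max_polls is None.
    """
    count = 0
    while max_polls is None or count < max_polls:
        yield fetch()  # a fresh download + parse happens here on every pass
        count += 1
        if max_polls is None or count < max_polls:
            sleep(interval)  # pause so the server is not hit in a tight loop


if __name__ == '__main__':
    # Hypothetical stand-in for "download the page, build soup, read the price"
    prices = iter(['123.45', '123.50', '123.47'])

    def fake_fetch_price():
        return next(prices)

    # sleep is replaced by a no-op here so the demo finishes instantly
    for price in poll(fake_fetch_price, max_polls=3, sleep=lambda seconds: None):
        print('the current price_ ' + price)
```

Passing sleep in as a parameter is only there so the loop can be exercised without waiting; the default is the real time.sleep.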




But something is missing: I keep getting errors in my Atom editor:



Traceback (most recent call last):
  File "C:\Users\Kasper\Documents\_f_s_j\_mk_\_dev_\bs\yahoo_finance.py", line 14, in <module>
    print('the current price_ '+str (parsePrice()))
  File "C:\Users\Kasper\Documents\_f_s_j\_mk_\_dev_\bs\yahoo_finance.py", line 10, in parsePrice
    price=soup.find_all('div',{'class':'My(6px) Post(r) smartphone_Mt(6px)'})[0].find('span').text
IndexError: list index out of range



Any idea what is going wrong here?
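The last line of the traceback pins the failure down: find_all(...) returned an empty list, meaning no div with that exact class string was found in the parsed document, so [0] has nothing to index. A minimal offline sketch of a guarded lookup follows. It uses a hypothetical hard-coded snippet (SAMPLE_HTML) in place of the real Yahoo page, whose markup differs, and the 'html.parser' parser; parse_price is an illustrative name, not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the downloaded page; the real Yahoo markup differs.
SAMPLE_HTML = """
<div class="My(6px) Post(r) smartphone_Mt(6px)">
  <span>123.45</span>
</div>
"""

PRICE_DIV_CLASS = 'My(6px) Post(r) smartphone_Mt(6px)'


def parse_price(html):
    """Return the price text, or None if the expected markup is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns None when nothing matches, whereas
    # find_all(...)[0] raises IndexError on an empty result list.
    div = soup.find('div', {'class': PRICE_DIV_CLASS})
    if div is None:
        return None
    span = div.find('span')
    return span.text.strip() if span is not None else None


if __name__ == '__main__':
    print(parse_price(SAMPLE_HTML))            # a matching div is present
    print(parse_price('<html><body></body></html>'))  # no match, so None instead of a crash
```

Returning None instead of crashing lets the calling loop print a message or skip an iteration when the page does not contain the markup the script expects.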


The problem is that Yahoo builds much of this page with JavaScript, and the HTML that requests downloads is not the markup you see in the browser. Your find_all(...) call therefore matches nothing, and indexing the empty list with [0] raises the IndexError. If you stay with bs4, you should at least set the parser to 'html.parser' instead of 'xml'. But this page is rendered for display, not for parsing: the div classes are very convoluted and may differ depending on your request headers (which you never set). Because of that, I find it easier to cut out the JSON data the page embeds in a script tag using plain string methods rather than bs4. After that, you can load it into a dictionary with json and return whatever you want. Here's a quick example that prints the price quote every minute with a timestamp:


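import json
import time
from datetime import datetime

import requests

def priceParse(stock):
    url = 'https://finance.yahoo.com/quote/' + stock
    # Browser-like User-Agent; without headers Yahoo may serve a different page or refuse the request
    headers = {'User-Agent': 'Mozilla/5.0'}
    html = requests.get(url, headers=headers).text
    # The quote data is a JSON blob assigned to root.App.main inside a <script> tag;
    # this slicing depends on that exact layout and breaks if Yahoo changes the page
    json_str = html.split('root.App.main =')[1].split('(this)')[0].split(';\n}')[0].strip()
    data = json.loads(json_str)['context']['dispatcher']['stores']['QuoteSummaryStore']
    latest_price = data['price']['regularMarketPrice']['fmt']
    market_state = data['price']['marketState']
    timestamp = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
    printed_str = stock + ' - $' + latest_price + ' (market state: ' + market_state + ') - ' + timestamp
    return printed_str

while True:
    print(priceParse('AAPL'))  # 'AAPL' is only an example ticker; use the symbol you need
    time.sleep(60)  # wait one minute between requests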
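The string slicing in priceParse can be checked without touching the network. The sketch below runs the same split chain on a hypothetical, heavily trimmed page (FAKE_PAGE) that only mimics the `root.App.main = {...};` followed by `}(this));` layout the code above relies on; extract_quote_store and format_quote are names made up for this illustration, and the real page is far larger and may no longer be laid out this way.

```python
import json
from datetime import datetime


def extract_quote_store(html):
    """Slice out the JSON assigned to root.App.main and return its QuoteSummaryStore."""
    json_str = html.split('root.App.main =')[1].split('(this)')[0].split(';\n}')[0].strip()
    return json.loads(json_str)['context']['dispatcher']['stores']['QuoteSummaryStore']


def format_quote(stock, store, now):
    """Build the same kind of line priceParse returns: ticker, price, market state, timestamp."""
    latest_price = store['price']['regularMarketPrice']['fmt']
    market_state = store['price']['marketState']
    timestamp = now.strftime("%m/%d/%Y, %H:%M:%S")
    return stock + ' - $' + latest_price + ' (market state: ' + market_state + ') - ' + timestamp


# Hypothetical, heavily trimmed page: only the script layout the slicing depends on
_STORE = {'price': {'regularMarketPrice': {'raw': 123.45, 'fmt': '123.45'},
                    'marketState': 'CLOSED'}}
FAKE_PAGE = (
    '<script>\n(function (root) {\nroot.App.main = '
    + json.dumps({'context': {'dispatcher': {'stores': {'QuoteSummaryStore': _STORE}}}})
    + ';\n}(this));\n</script>'
)

if __name__ == '__main__':
    store = extract_quote_store(FAKE_PAGE)
    print(format_quote('AAPL', store, datetime(2021, 3, 5, 14, 7, 9)))
    # -> AAPL - $123.45 (market state: CLOSED) - 03/05/2021, 14:07:09
```

Keeping the parsing step separate from the download like this makes it easier to tell a changed page layout apart from a network problem: if extract_quote_store fails on a saved copy of the real page, the layout has changed.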

