
Some errors in a simple Python parser script that uses BS4



Hello all,

 

As a first step into Python, I have tried to fetch some data from Yahoo Finance with the following script.

 

Note: the while loop is meant to fetch the data continuously. I guess that part is working, but I get some errors.

 

 

import bs4
import requests
from bs4 import BeautifulSoup



def parsePrice():
    r=requests.get('http://finance.yahoo.com/quote/FB?p=FB')
    soup=bs4.BeautifulSoup(r.text,"xml")
    price=soup.find_all('div',{'class':'My(6px) Post(r) smartphone_Mt(6px)'})[0].find('span').text
    return price



while True:
    print('the current price_ '+str (parsePrice()))




 

 

 

But something is missing. I keep getting this error in my Atom editor:

 

 

Traceback (most recent call last):
  File "C:\Users\Kasper\Documents\_f_s_j\_mk_\_dev_\bs\yahoo_finance.py", line 14, in <module>
    print('the current price_ '+str (parsePrice()))
  File "C:\Users\Kasper\Documents\_f_s_j\_mk_\_dev_\bs\yahoo_finance.py", line 10, in parsePrice
    price=soup.find_all('div',{'class':'My(6px) Post(r) smartphone_Mt(6px)'})[0].find('span').text
IndexError: list index out of range

 

 

Any idea what goes wrong here?


1 answer to this question


The IndexError means that find_all() returned an empty list: no div with that class exists in the document BeautifulSoup built, so indexing the result with [0] fails. There are two reasons for that. First, you are parsing an HTML page with the "xml" parser; set the bs4 parser to 'html.parser' instead. Second, the requested URL uses a lot of JavaScript to render the page, so what you download is a page built for display, not for parsing. The div classes are very convoluted and may differ based on your request headers (which you never set). Because of that, I find it easier to parse the page with plain string methods rather than bs4: the quote data is embedded in the page as JSON, so you can cut out that string, load it into a dictionary with the json module, and return whatever you want from it. Here's a quick example that prints the price quote every minute with a timestamp:

 

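import requests
import json
from datetime import datetime
import time


def priceParse(stock):
    url = 'https://finance.yahoo.com/quote/' + stock
    html = requests.get(url).text
    # The quote data is embedded in the page as a JavaScript assignment
    # (root.App.main = {...};). Cut out the JSON object between that
    # assignment and the end of the surrounding script.
    json_str = html.split('root.App.main =')[1].split('(this)')[0].split(';\n}')[0].strip()
    # Walk down to the store that holds the quote summary.
    data = json.loads(json_str)['context']['dispatcher']['stores']['QuoteSummaryStore']
    latest_price = data['price']['regularMarketPrice']['fmt']  # formatted price string
    market_state = data['price']['marketState']                # market session state reported by Yahoo
    timestamp = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
    printed_str = stock + ' - $' + latest_price + ' (market state: ' + market_state + ') - ' + timestamp
    print(printed_str)


# Print a fresh quote once a minute until interrupted (Ctrl+C).
while True:
    priceParse('FB')
    time.sleep(60)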
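To see what the string splitting in priceParse() does without touching the network, here is a minimal sketch that runs the same split chain on a small, made-up page. The sample markup and the helper name extract_quote_summary are hypothetical: they only mimic the `root.App.main = {...};` layout that the code above assumes, and the real page may look different. The sketch also checks for the marker first, so a changed page layout produces a clear message instead of another IndexError.

```python
import json

# Hypothetical data and page fragment that mimic the layout priceParse()
# assumes: a script assigning a JSON object to root.App.main, followed
# by the closing "}(this));" of the surrounding function.
SAMPLE_DATA = {
    'context': {'dispatcher': {'stores': {'QuoteSummaryStore': {
        'price': {
            'regularMarketPrice': {'raw': 123.45, 'fmt': '123.45'},
            'marketState': 'REGULAR',
        },
    }}}},
}
SAMPLE_HTML = (
    '<html><body><script>\n'
    '(function (root) {\n'
    'root.App.main = ' + json.dumps(SAMPLE_DATA) + ';\n'
    '}(this));\n'
    '</script></body></html>'
)


def extract_quote_summary(html):
    """Return the QuoteSummaryStore dict embedded in the page, using the
    same split-based approach as priceParse()."""
    marker = 'root.App.main ='
    if marker not in html:
        # Without this check, split(marker)[1] would raise IndexError,
        # the same kind of error as in the question.
        raise ValueError('marker %r not found; the page layout may have changed' % marker)
    json_str = html.split(marker)[1].split('(this)')[0].split(';\n}')[0].strip()
    data = json.loads(json_str)
    return data['context']['dispatcher']['stores']['QuoteSummaryStore']


summary = extract_quote_summary(SAMPLE_HTML)
print(summary['price']['regularMarketPrice']['fmt'], summary['price']['marketState'])  # prints 123.45 REGULAR
```

Against a live page you would pass requests.get(url).text instead of SAMPLE_HTML.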

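As for the IndexError itself: find_all() always returns a list, and when nothing matches, that list is empty, so [0] raises "list index out of range". Checking for a match before indexing turns that into a readable message. The sketch below swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it runs without third-party packages; the class FirstSpanInDiv, the function find_price and the sample markup are hypothetical, and class matching is a simplified exact string comparison.

```python
from html.parser import HTMLParser


class FirstSpanInDiv(HTMLParser):
    """Collect the text of the first <span> inside a <div> whose class
    attribute equals target_class exactly. A simplified stand-in for
    soup.find_all('div', {'class': ...})[0].find('span').text."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.div_depth = 0    # > 0 while inside a matching <div>
        self.in_span = False  # True while inside the captured <span>
        self.text = None      # stays None if no matching span is found

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            if self.div_depth:
                self.div_depth += 1  # a nested <div> inside the match
            elif dict(attrs).get('class') == self.target_class:
                self.div_depth = 1   # entered a matching <div>
        elif tag == 'span' and self.div_depth and self.text is None:
            self.in_span = True      # first <span> inside the match
            self.text = ''

    def handle_endtag(self, tag):
        if tag == 'span' and self.in_span:
            self.in_span = False
        elif tag == 'div' and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.in_span:
            self.text += data


def find_price(html, target_class):
    parser = FirstSpanInDiv(target_class)
    parser.feed(html)
    parser.close()
    if parser.text is None:
        # The situation behind the IndexError in the question: the
        # downloaded HTML simply contains no matching <div>/<span>.
        raise LookupError('no <span> inside <div class="%s"> found' % target_class)
    return parser.text.strip()


CLASS = 'My(6px) Post(r) smartphone_Mt(6px)'
page = '<div class="My(6px) Post(r) smartphone_Mt(6px)"><span>123.45</span></div>'
print(find_price(page, CLASS))  # prints 123.45
```

With bs4 the same guard is simply checking whether the list returned by find_all() is empty (or whether find() returned None) before using it.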
 
