
vscode: first script-test after a quick installation - some minor issues


Question

tarifa

hello dear fellows here at neowin, 

 

VSCode: first script test after a quick installation - some minor issues

On MX Linux 19.1 I have installed VSCodium 1.43.2:

Version: 1.43.2
Commit: 0ba0ca52957102ca3527cf479571617f0de6ed50
Date: 2020-03-24T21:03:16.125Z
Electron: 7.1.11
Chrome: 78.0.3904.130
Node.js: 12.8.1
V8: 7.8.279.23-electron.0
OS: Linux x64 4.19.0-6-amd64

 

I have Python installed - unfortunately not in a venv, but globally.

To test the whole system I just ran a little test script.

By the way, I will take care of the venv setup later this weekend. At the moment I only want to test the system.

 

import requests
from bs4 import BeautifulSoup
import pandas as pd
page1 = requests.get('https://en.wikipedia.org/wiki/Peths_in_Pune').text
soup1 = BeautifulSoup(page1, 'lxml')
table = soup1.find('table',{'class':'wikitable sortable'})
#table
table1=""
for tr in table.find_all('tr'):
    row1=""
    for tds in tr.find_all('td'):
        row1=row1+","+tds.text
    table1=table1+row1[1:]
row1

 

see the output

 

 

 

^
SyntaxError: unexpected EOF while parsing
martin@mx:~
$ /usr/bin/python3 /home/martin/dev/python/test.py
  File "/home/martin/dev/python/test.py", line 2
    ^
SyntaxError: unexpected EOF while parsing
martin@mx:~
$ /usr/bin/python3 /home/martin/dev/python/test.py
Traceback (most recent call last):
  File "/home/martin/dev/python/test.py", line 5, in <module>
    soup1 = BeautifulSoup(page1, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
martin@mx:~
$ /usr/bin/python3 /home/martin/dev/python/test.py
Traceback (most recent call last):
  File "/home/martin/dev/python/test.py", line 5, in <module>
    soup1 = BeautifulSoup(page1, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
martin@mx:~
$ python3 -V
Python 3.7.3
martin@mx:~
$ python3 -V
Python 3.7.3
martin@mx:~

 

 

...any ideas or thoughts regarding this error?

 

Edited by tarifa

3 answers to this question


Christopher Andreason

Your Linux installation is missing the relevant libraries for both bs4 and lxml.

pip can install the missing packages, for python or for python3:


pip install lxml
pip install beautifulsoup4


pip3 install lxml
pip3 install beautifulsoup4


Another hint: you can also use the parser that ships with Python, which needs no extra package:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.prettify())
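
A minimal sketch of combining both hints, assuming only the standard library: check whether lxml can be imported and fall back to the built-in html.parser if it cannot. The helper name pick_parser is made up for this illustration.

```python
import importlib.util


def pick_parser():
    """Return 'lxml' if the lxml package is importable, else 'html.parser'."""
    # find_spec returns None when the module cannot be found
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


if __name__ == "__main__":
    print("parser to use:", pick_parser())
    # usage with Beautiful Soup (needs bs4 installed):
    # soup = BeautifulSoup(html_doc, pick_parser())
```

The two parsers can build slightly different trees from broken HTML, so a script that relies on one of them may still be better off with the missing package installed.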

 

Does VSCode use some old packages?

I took a look at VSCode for Windows, just to see what is built into it, and I saw around 300 different projects / programs from GitHub.

That also causes bugs, because so many different sources have to be kept up to date for the whole of VSCode.

You could probably have solved the problems by building VSCode from source, but other problems would surely come up in the meantime.

Edited by Christopher Andreason
add some extra info
tarifa

Hello dear Christopher,

first of all - many, many thanks for the quick reply. This thread is the result of a head start into Python with VSCode, since Atom does not seem to be able to do everything I want to do.

So I am trying to set up VSCode to work with Python.

 

 

Yes, I guess that you are right - 100% - I am missing some packages. And sure thing - my VSCode is not yet set up to work with a venv. That said, I think I have to do a correct and clean setup.

Well, Christopher, I have read a bit and have seen a whole bunch of tutorials on setting up a virtual environment for Python

- in VSCode - best practices for using virtualenv
- and besides for Atom ... - see above.

I am very, very glad that you have posted and given me some important hints:

 

 

Quote


It seems like the environment in which I installed the package and the environment my project is currently using are different.
The package which I installed went into the global environment, "c:\program files\python37\lib\site-packages", and it looks like the project is using a different environment (maybe one related to the project). I try to identify this by looking at the bottom left corner, which shows which environment I am currently using.
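
A small sketch for checking this from inside Python, assuming environments created with the standard venv module: it prints the interpreter that is actually running and whether it belongs to a virtual environment. The function name in_virtualenv is made up for this illustration.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # inside a venv, sys.prefix points at the venv and
    # sys.base_prefix at the Python installation it was created from
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix     :", sys.prefix)
    print("in a venv  :", in_virtualenv())
```

Running this once from the terminal and once through the editor shows whether both use the same interpreter. Installing with "python -m pip install ..." through that same interpreter puts the package into the environment the script runs in.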

 

So I guess that I have to use virtualenv to isolate my projects and then store the pip dependencies in requirements.txt.
As we develop, we install, remove and upgrade packages, so the list of dependencies in the project starts to differ from requirements.txt.

But how can I do it right - how do I set up venv the right way -
a. on Win 10 and
b. on Linux?

I have set up a Python development environment on a Windows 10 machine and on an MX Linux machine.
I guess that I set up the machine / Atom badly - any and all help greatly appreciated.
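
For the Windows 10 / Linux question, a minimal sketch using the standard-library venv module (the same module that "python -m venv" runs). The practical difference between the two systems is where the interpreter ends up: Scripts/python.exe on Windows, bin/python on Linux. The helper names create_venv and venv_python are made up, and the temporary directory only stands in for a real project folder.

```python
import os
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir):
    """Interpreter path inside a venv: Scripts/python.exe on Windows, bin/python elsewhere."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_venv(project_dir, name="venv"):
    """Create a virtual environment called `name` inside project_dir."""
    env_dir = Path(project_dir) / name
    # with_pip=False keeps the sketch fast and independent of ensurepip;
    # `python -m venv venv` on the command line installs pip by default
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:  # stand-in for a project folder
        env = create_venv(project)
        print("created    :", env)
        print("interpreter:", venv_python(env))
```

On Debian-based systems such as MX Linux, creating environments with pip included may additionally need the python3-venv package.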

 

 

To take some steps towards a decent and correct installation of VSCode and its interaction with Python, I have written down some ideas:

Note: the global setup of Python is weird, so weird - at least it seems so if we have a look at this thread.

Regarding the setup and usage of virtual environments in VSCode:

I recently read an article on using virtual environments for Python projects:
https://towardsdatascience.com/python-virtual-environments-made-easy-fe0c603fe601

and this one, "Comparing Python Virtual Environment tools":
https://towardsdatascience.com/comparing-python-virtual-environment-tools-9a6543643a44

I guess that I have to take care how I set up Python on my Linux machine.

Comparing installed pip packages with requirements.txt: we use virtualenv to isolate our projects and store the pip dependencies in requirements.txt. As we develop, we install, remove and upgrade packages, so the packages actually installed in the project drift away from what requirements.txt lists.

Currently installed packages: to list the packages that are actually installed, we can run

$ pip freeze

Compare differences: a plain line-by-line comparison of requirements.txt and the pip freeze output will fail if the packages are listed in a different order.
Sort both first, then compare them.
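
A rough sketch of such a comparison that does not depend on the order at all: both lists are parsed into name/version pairs and compared as sets. It only understands exact pins of the form name==version, the sample data is made up, and the function names parse_pins and compare are hypothetical.

```python
def parse_pins(lines):
    """Map package name (lower-cased) -> version for lines like 'name==1.2.3'."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # this sketch only understands exact pins
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def compare(required_lines, installed_lines):
    """Return (missing, extra, changed) package names, each sorted."""
    required = parse_pins(required_lines)
    installed = parse_pins(installed_lines)
    missing = sorted(set(required) - set(installed))
    extra = sorted(set(installed) - set(required))
    changed = sorted(
        name for name in required.keys() & installed.keys()
        if required[name] != installed[name]
    )
    return missing, extra, changed


if __name__ == "__main__":
    # made-up sample data; in practice read requirements.txt
    # and the output of `pip freeze`
    requirements = ["requests==2.23.0", "beautifulsoup4==4.8.2", "lxml==4.5.0"]
    frozen = ["beautifulsoup4==4.8.2", "requests==2.22.0", "tqdm==4.43.0"]
    missing, extra, changed = compare(requirements, frozen)
    print("missing:", missing)
    print("extra  :", extra)
    print("changed:", changed)
```

If requirements.txt tracks only the top-level dependencies, the "extra" list will also contain their sub-dependencies, so it is a hint rather than an error report.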

To sum up, some of the best practices are the following:

- Always make sure requirements.txt reflects the actual dependencies
- Pin package dependencies - use the exact version
- Preferably track only the top-level dependencies in your requirements.txt
- Update dependencies periodically - yes, but how?
- Consider using pip-compile and pip-sync to manage your dependencies. Use pur to automatically update your top-level dependencies in requirements.txt

 
I am starting to work in VSCode using venv. In my project folder I guess that I have to create a venv folder:

python -m venv venv

(The argument is the directory to create, here a folder named venv inside the project folder.)

But when I run the command "Python: Select Interpreter" in VSCode, my venv folder is not shown.

To make the interpreter of my virtual environment visible in VSCode, I try the following steps:

 

1. Go to File > Preferences > Settings.
2. Click on Workspace settings.
3. Under Files: Associations, click on "Edit in settings.json".
4. Update "python.pythonPath": "my_venv_path/bin/python" under workspace settings.
   (For Windows: update "python.pythonPath": "my_venv_path/Scripts/python.exe" under workspace settings.)
5. Restart VSCode in case it still doesn't show the venv.
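
The steps above can also be scripted. A minimal sketch, assuming the venv folder is called venv and sits directly in the project folder: it writes the workspace file .vscode/settings.json with "python.pythonPath" pointing at the venv's interpreter. The function name write_workspace_settings is made up for this illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def write_workspace_settings(project_dir, env_name="venv"):
    """Write .vscode/settings.json with "python.pythonPath" set to the venv interpreter."""
    project_dir = Path(project_dir)
    if os.name == "nt":
        interpreter = project_dir / env_name / "Scripts" / "python.exe"
    else:
        interpreter = project_dir / env_name / "bin" / "python"
    settings_file = project_dir / ".vscode" / "settings.json"
    settings_file.parent.mkdir(parents=True, exist_ok=True)
    # keep settings that are already there (assumes plain JSON without comments)
    settings = json.loads(settings_file.read_text()) if settings_file.exists() else {}
    settings["python.pythonPath"] = str(interpreter)
    settings_file.write_text(json.dumps(settings, indent=4))
    return settings_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:  # stand-in for a project folder
        path = write_workspace_settings(project)
        print(path.read_text())
```

Later releases of the Python extension replaced "python.pythonPath" with "python.defaultInterpreterPath", so the key may need adjusting depending on the installed extension version.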


Another option to show virtual environments in VSCode:

In a command prompt, go to the parent folder that contains the venv.
Type code . and press Enter. [Works on both Windows and Linux for me.]

That should also show the virtual environments present in that folder.

In one workspace folder named Python I need to add all my other projects.

To spell it out clearly:

- I would have to have only one venv for the whole workspace folder Python.
- I add each subfolder of the Python folder as a workspace project, like Project1, Project2, Project3, Project4, Project5, Project6 etc.

In that project folder I created a venv environment and edited settings.json for the workspace with this: "python.venvPath": "venv".

Now, for every new project I will create a new workspace, and inside that folder goes the venv folder, which will be recognized automatically.

 

+------------------------+
|                        |
|                        |
|     python-workspace   |
|     ....-folder        |
|                        |
+----------+-------------+
           |
           |
           |              +----------------------+
           |              |                      |
           +--------------+     Project1         |
           |              |                      |
           |              +----------------------+
           |
           |              +----------------------+
           |              |                      |
           +--------------+     Project2         |
           |              |                      |
           |              +----------------------+
           |
           |              +----------------------+
           |              |                      |
           +--------------+     Project3         |
           |              |                      |
           |              +----------------------+
           |
           |              +----------------------+
           |              |                      |
           +--------------+     Project4         |
           |              |                      |
           |              +----------------------+
           |
           |              +----------------------+
           |              |                      |
           +--------------+    Project5          |
           |              |                      |
           |              +----------------------+
           |
           |              +----------------------+
           |              |                      |
           +--------------+   Project6           |
                          |                      |
                          +----------------------+

 

Christopher, how do you like this idea!?

 

Additionally, some questions regarding the practical use of VSCode for running and testing scripts:

I want to be able to export the data I have scraped as a CSV file. My question is: how do I write the piece of code which outputs the data to a CSV?

Christopher, the question is: can we run the code below in VSCode and have a closer look at the output? Does VSCode execute the code and store the file somewhere on the machine!?

 

 

import csv
import requests
from bs4 import BeautifulSoup

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res, "lxml")
links = soup.find_all("a")

# the with-block closes career.csv even if an error occurs while writing
with open('career.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["job_link", "job_desc"])

    for link in links:
        href = link.get("href") or ""  # some <a> tags have no href attribute
        if "career" in href and 'COPENHAGEN' in link.text:
            item_link = href.strip()
            item_text = link.text.replace("View Position", "").strip()
            writer.writerow([item_link, item_text])
            print(item_link, item_text)

 

 

We should now be able to run this in VSCode (and yes: I do not think that we need a fully fledged IDE like PyCharm). I guess that we can open the .py file and run it with the shortcut Ctrl+Shift+B (Windows/Linux) or Cmd+Shift+B (macOS); that shortcut triggers "Run Build Task", so it needs a build task that calls Python.

I have done a search on the net - there are quite a few extensions for running Python:

 

 

Official Python extension [1]: this is a must-install.

Code Runner [2]: incredibly useful for all sorts of languages, not just Python. Would highly recommend installing.

AREPL [3]: real-time Python scratchpad that displays your variables in a side window. "I'm the creator of this so obviously I think it's great but I can't give an unbiased opinion."

Wolf [4]: real-time Python scratchpad that displays results inline.

 

 

And if we use the integrated terminal, we can run Python in there and do not have to install any extensions.

 

  [1]: https://marketplace.visualstudio.com/items?itemName=ms-python.python
  [2]: https://marketplace.visualstudio.com/items?itemName=formulahendry.code-runner
  [3]: https://marketplace.visualstudio.com/items?itemName=almenon.arepl
  [4]: https://marketplace.visualstudio.com/items?itemName=traBpUkciP.wolf
  [5]: https://marketplace.visualstudio.com/items?itemName=donjayamanne.jupyter
 

 

dear Christopher - many many thanks for your reply  - and for the idea sharing. 

 

i am very glad to be here - and to be able to share ideas in this thread.

 

have a great day

 

regards 

tarifa

Edited by tarifa
tarifa

 

hello dear Christopher 

 

I reworked things in VSCode and ran the following code (note: with all the necessary plugins and extensions for Python loaded) and got back the following in the terminal - see below:


import requests
from bs4 import BeautifulSoup
import re
import csv
from tqdm import tqdm


first = "https://europa.eu/youth/volunteering/organisations_en?page={}"
second = "https://europa.eu/youth/volunteering/organisation/{}_en"


def catch(url):
    with requests.Session() as req:
        pages = []
        print("Loading All IDS\n")
        for item in tqdm(range(0, 347)):
            r = req.get(url.format(item))
            soup = BeautifulSoup(r.content, 'html.parser')
            numbers = [a.get("href").split("/")[-1].split("_")[0] for a in soup.find_all(
                "a", href=re.compile("^/youth/volunteering/organisation/"), class_="btn btn-default")]
            # collect the IDs of every page, not only those of the last one
            pages.extend(numbers)
        return pages


def parse(url):
    links = catch(first)
    with requests.Session() as req:
        with open("Data.csv", 'w', newline="", encoding="UTF-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Address", "Site", "Phone",
                             "Description", "Scope", "Rec", "Send", "PIC", "OID", "Topic"])
            print("\nParsing Now... \n")
            for link in tqdm(links):
                r = req.get(url.format(link))
                soup = BeautifulSoup(r.content, 'html.parser')
                task = soup.find("section", class_="col-sm-12").contents
                name = task[1].text
                add = task[3].find(
                    "i", class_="fa fa-location-arrow fa-lg").parent.text.strip()
                try:
                    site = task[3].find("a", class_="link-default").get("href")
                except:
                    site = "N/A"
                try:
                    phone = task[3].find(
                        "i", class_="fa fa-phone").next_element.strip()
                except:
                    phone = "N/A"
                desc = task[3].find(
                    "h3", class_="eyp-project-heading underline").find_next("p").text
                scope = task[3].findAll("span", class_="pull-right")[1].text
                rec = task[3].select("tbody td")[1].text
                send = task[3].select("tbody td")[-1].text
                pic = task[3].select(
                    "span.vertical-space")[0].text.split(" ")[1]
                oid = task[3].select(
                    "span.vertical-space")[-1].text.split(" ")[1]
                topic = [item.next_element.strip() for item in task[3].select(
                    "i.fa.fa-check.fa-lg")]
                writer.writerow([name, add, site, phone, desc,
                                 scope, rec, send, pic, oid, "".join(topic)])


parse(second)

 

 

See the output in the terminal - the question is: where are the results stored!?
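
A short sketch that may help to answer this: a relative file name such as "Data.csv" is resolved against the current working directory of the Python process, which is not necessarily the folder the script lives in. When the script is started from the integrated terminal, that is the directory the terminal is in at that moment. The function name output_location is made up for this illustration.

```python
import os
from pathlib import Path


def output_location(filename="Data.csv"):
    """Absolute path that a relative file name resolves to for this process."""
    return Path.cwd() / filename


if __name__ == "__main__":
    print("working directory:", os.getcwd())
    print("Data.csv would be written to:", output_location())
```

Adding these two print calls at the top of the scraper shows where to look for the CSV file. Passing an absolute path to open() removes the dependency on the working directory.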

 

[Screenshot of the terminal output]

 

 

Should I change some more settings in VSCode!? Do I need more plugins?

Looking forward to hearing from you

 

regards 

tarifa
