web_scraper_script

A simple script written in Python 3.8.5 that extracts all HTML tags from a given URL and reports the 5 most used. It uses no external libraries.

Instructions for development:

1. Create virtualenv:

mkvirtualenv web_scraper --python=python3

2. Link project dir to postactivate:

setvirtualenvproject <project_dir>

3. Clone the repository:

git clone git@github.com:ChronoFrank/web_scraper_script.git 

or 

git clone https://github.com/ChronoFrank/web_scraper_script.git

4. Run tests:

python tests.py

5. Run script:

python url_scraper.py

The script will ask you to enter a valid URL to scrape:

enter url to scrape ->

After you enter the URL, the script scrapes the web page and reports the total number of elements (HTML tags) used on the page, along with the 5 most frequently used ones:

enter url to scrape -> https://www.python.org/
url to analise -> https://www.python.org/
total elements in the webpage -> 31
top 5 most frequently used tags -> {'</a>': 209, '</li>': 163, '</span>': 72, '</div>': 49, '</ul>': 28}
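
For readers who want to see how this kind of count can be done with the standard library alone, here is a minimal sketch. It is not the code in url_scraper.py: the names TagCounter, count_tags and scrape are hypothetical, and it counts opening tags through html.parser, whereas the sample output above lists closing tags, so the numbers it produces may differ from the script's.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TagCounter(HTMLParser):
    """Hypothetical helper: counts every opening tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called for <tag ...>; self-closing tags such as <br/> also end up here.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def scrape(url):
    """Download the page and return (total tag count, top 5 tags as a dict)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    counts = count_tags(html)
    return sum(counts.values()), dict(counts.most_common(5))
```

Calling scrape("https://www.python.org/") would return the total and the top 5 tags for that page. Keeping count_tags separate from the download step lets the counting logic be tested on a plain HTML string without network access.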
