This is a project for Udacity's Full Stack Web Developer Nanodegree
- To enhance students' SQL database skills.
- To get practice interacting with a live database both from the command line and from your code.
- To explore a large database with over a million rows.
- To build and refine complex queries and use them to draw business conclusions from data.
- What are the most popular three articles of all time? Which articles have been accessed the most? Present this information as a sorted list with the most popular article at the top.
- Who are the most popular article authors of all time? That is, when you sum up all of the articles each author has written, which authors get the most page views? Present this as a sorted list with the most popular author at the top.
- On which days did more than 1% of requests lead to errors?
The log table includes a column status that indicates the HTTP status code that the news site sent to the user's browser.
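The third question comes down to a per-day ratio of error responses to all responses. Below is a minimal sketch of that arithmetic in plain Python. It assumes each log row has been reduced to a `(day, status)` pair and that `status` is a string beginning with the three-digit code (for example `'404 NOT FOUND'`). The function name `error_days` and the sample rows are hypothetical, and the project itself may well compute this in SQL instead.

```python
from collections import Counter


def error_days(rows, threshold=1.0):
    """Return (day, percent_errors) pairs for days above the threshold.

    rows: iterable of (day, status) pairs; status is assumed to start
    with the three-digit HTTP code, e.g. '200 OK' or '404 NOT FOUND'.
    """
    total = Counter()
    errors = Counter()
    for day, status in rows:
        total[day] += 1
        # Treating 4xx and 5xx codes as errors is an assumption of this sketch.
        if int(str(status)[:3]) >= 400:
            errors[day] += 1
    result = []
    for day in sorted(total):
        percent = 100.0 * errors[day] / total[day]
        if percent > threshold:
            result.append((day, round(percent, 1)))
    return result


# Hypothetical sample: 2 errors out of 50 requests on one day, none on the other.
sample = (
    [("2016-07-17", "200 OK")] * 48
    + [("2016-07-17", "404 NOT FOUND")] * 2
    + [("2016-07-18", "200 OK")] * 50
)
print(error_days(sample))
```

Only days strictly above the threshold are returned, matching the "more than 1%" wording of the question.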
- PostgreSQL database
- Python 3.7.3
- psycopg2
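To show how these three pieces fit together, the sketch below runs a query for the first question through a DB-API connection. The table and column names (`articles.title`, `articles.slug`, `log.path`) and the `/article/<slug>` path format are assumptions about the news schema, which this README does not spell out. `connect_news` and `top_articles` are hypothetical names, not necessarily what `log_analysis.py` uses.

```python
# Assumed schema: articles(title, slug, ...) and log(path, status, ...),
# with log.path of the form '/article/<slug>'.
TOP_ARTICLES_SQL = """
    SELECT articles.title, COUNT(*) AS views
    FROM articles
    JOIN log ON log.path = '/article/' || articles.slug
    GROUP BY articles.title
    ORDER BY views DESC
    LIMIT 3
"""


def connect_news():
    """Open a connection to the 'news' database (requires psycopg2)."""
    import psycopg2  # imported lazily so the rest of the sketch loads without it
    return psycopg2.connect(dbname="news")


def top_articles(conn):
    """Return [(title, views), ...] for the three most-viewed articles."""
    cur = conn.cursor()
    try:
        cur.execute(TOP_ARTICLES_SQL)
        return cur.fetchall()
    finally:
        cur.close()
```

Inside the VM, calling `top_articles(connect_news())` would run the query against the imported data. Because `top_articles` only relies on the standard `cursor()`, `execute()`, and `fetchall()` calls, it can be tried against any DB-API connection.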
This project runs in a virtual machine using Vagrant. To set it up, follow the steps below.
- Install Vagrant
- Install VirtualBox
- Download the Vagrant setup files from Udacity's GitHub. These files configure the virtual machine and install all the tools needed to run this program.
- Download the database file: sql data
- Unzip the data folder to get the newsdata.sql file.
- Move the newsdata.sql file into the vagrant directory
- Download the project: log analysis project
- Unzip it and copy all the files into a folder named log_analysis_project inside the vagrant directory.
- Open a terminal and navigate to the project folders set up above.
- cd into the vagrant directory
- Run the following command to build the VM for the first time:
vagrant up
- Once it is built, run the following command to connect:
vagrant ssh
- cd into the correct project directory:
cd /vagrant/log_analysis_project
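- Import the data using the following command:
psql -d news -f newsdata.sql
- Running this command will connect to your installed database server and execute the SQL commands in the downloaded file, creating tables and populating them with data. If newsdata.sql is in the vagrant directory rather than the project folder, pass its path instead (for example, /vagrant/newsdata.sql).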
- If you aren't in the log_analysis_project directory, cd into the correct project directory:
cd /vagrant/log_analysis_project
- Run the program:
python log_analysis.py
[=========PROCESSING OUTPUT===========]
MOST POPULAR THREE ARTICLES OF ALL TIME:
[1] "Candidate is jerk, alleges rival" — 338647 views
[2] "Bears love berries, alleges bear" — 253801 views
[3] "Bad things gone, say good people" — 170098 views
MOST POPULAR ARTICLE AUTHORS OF ALL TIME:
[1] Ursula La Multa — 507594 views
[2] Rudolf von Treppenwitz — 423457 views
[3] Anonymous Contributor — 170098 views
[4] Markoff Chaney — 84557 views
DAYS WITH MORE THAN 1% OF ERRORS:
July 17, 2016 — 2.2% errors
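The ranked lines in the report follow a simple pattern: a rank, a name, and a view count. Below is a minimal sketch of a formatter that produces lines in that shape, assuming the query results arrive as `(name, views)` tuples. `format_ranked` is a hypothetical helper, not necessarily how `log_analysis.py` prints its results.

```python
def format_ranked(rows, quote=False):
    """Format (name, views) rows as numbered report lines.

    quote=True wraps the name in double quotes, as the article titles are.
    """
    lines = []
    for rank, (name, views) in enumerate(rows, start=1):
        label = '"{}"'.format(name) if quote else name
        # \u2014 is the dash used as a separator in the sample output.
        lines.append("[{}] {} \u2014 {} views".format(rank, label, views))
    return lines


# Rows taken from the sample output above.
authors = [("Ursula La Multa", 507594), ("Rudolf von Treppenwitz", 423457)]
print("\n".join(format_ranked(authors)))
```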