Data Analysis and Visualization of YouTube Content for a Specific Search Query
This project analyzes YouTube video data to gain insights into trends, performance, and engagement metrics. It involves data collection, preprocessing and visualization to understand the factors influencing video popularity.
- Data Collection: Extracts video metadata (title, views, likes, comments, duration, etc.) using the YouTube API or web scraping.
- Data Preprocessing: Cleans and formats the dataset by handling missing values and converting data types.
- Exploratory Data Analysis (EDA): Generates visualizations to analyze trends in video performance.
- Statistical Insights: Identifies correlations between video attributes and engagement metrics.
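The preprocessing and correlation steps listed above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the column names (`views`, `likes`, `duration`), the sample rows, and the helper functions are all assumptions made for illustration, and the real schema of Youtube_Videos.xlsx may differ.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the
# actual schema of Youtube_Videos.xlsx.
raw = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C", "Video D", "Video E"],
    "views": ["1,200", "45,000", None, "3,100", "980"],
    "likes": ["100", "2,500", "40", "210", "75"],
    "duration": ["4:05", "1:02:30", "0:45", "12:00", "3:30"],
})


def to_number(series: pd.Series) -> pd.Series:
    """Strip thousands separators and convert to numbers (NaN if unparseable)."""
    cleaned = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


def duration_to_seconds(text: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Convert text columns to numeric types and drop incomplete rows."""
    out = df.copy()
    out["views"] = to_number(out["views"])
    out["likes"] = to_number(out["likes"])
    out["duration_s"] = out["duration"].map(duration_to_seconds)
    # Drop rows where an engagement metric is missing.
    return out.dropna(subset=["views", "likes"]).reset_index(drop=True)


clean = preprocess(raw)
# Pearson correlation (the pandas default) between the numeric columns.
correlations = clean[["views", "likes", "duration_s"]].corr()
print(clean)
print(correlations)
```

Dropping incomplete rows is only one way to handle missing values; filling them with a median or flagging them may suit the real dataset better.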
- Programming Language: Python
- Libraries: pandas, Matplotlib, Requests, time, NLTK
- Data Sources: Web Scraping (BeautifulSoup, Selenium)
- Version Control: Git, GitHub
YouTube-Data-Analysis/
│── Youtube_Videos.xlsx # Raw dataset
│── Youtube_Video.ipynb # Jupyter notebook for analysis
│── README.md # Project documentation
│── requirements.txt # Dependencies
- Clone the Repository
  git clone https://github.com/tharikashree/Youtube-Data-Analysis.git
  cd Youtube-Data-Analysis
- Create a Virtual Environment (Optional but recommended)
  python -m venv venv
  source venv/bin/activate # On macOS/Linux
  venv\Scripts\activate # On Windows
- Install Dependencies
  pip install -r requirements.txt
- Run the Analysis
  - Open Jupyter Notebook and explore Youtube_Video.ipynb
  - Run Python scripts for analysis
- Views vs. Duration scatter plots
- Most popular categories
- Trends in video engagement
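As an illustration of the first plot in this list, a views vs. duration scatter plot can be drawn with Matplotlib along these lines. The data below is made up, and the `duration_s` and `views` column names are assumptions rather than the notebook's actual variables.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only; not taken from the project dataset.
videos = pd.DataFrame({
    "duration_s": [45, 210, 245, 720, 1500, 3750],
    "views": [15000, 980, 1200, 3100, 22000, 45000],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(videos["duration_s"] / 60, videos["views"], alpha=0.7)
ax.set_xlabel("Duration (minutes)")
ax.set_ylabel("Views")
# View counts often span several orders of magnitude, so a log axis
# keeps small and large videos readable on the same plot.
ax.set_yscale("log")
ax.set_title("Views vs. Duration")
fig.tight_layout()
```

Inside the notebook the figure displays inline; from a script, `fig.savefig("views_vs_duration.png")` writes it to disk.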
- Automate data collection using APIs
- Build a recommendation model
- Deploy a web dashboard for real-time analysis
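For the API-based collection idea, one possible starting point is the `search.list` endpoint of the YouTube Data API v3. The sketch below only builds the request URL and parses a response body, without making any network call. The helper names are hypothetical, `YOUR_API_KEY` is a placeholder, and the sample response is a hand-written stand-in trimmed to the two fields used here.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query: str, api_key: str, max_results: int = 25) -> str:
    """Build a search.list URL for videos matching the query."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_videos(response: dict) -> list[dict]:
    """Pull the video ID and title out of a search.list response body."""
    return [
        {"video_id": item["id"]["videoId"], "title": item["snippet"]["title"]}
        for item in response.get("items", [])
    ]


# Hand-written stand-in for a response body, trimmed to the fields used here.
sample_response = {
    "items": [
        {"id": {"videoId": "abc123"}, "snippet": {"title": "Example video"}},
    ]
}

url = build_search_url("data analysis", api_key="YOUR_API_KEY", max_results=5)
print(url)
print(extract_videos(sample_response))
```

With a real key, the URL could be fetched with Requests (for example `requests.get(url, timeout=10).json()`) and the result passed to `extract_videos`. View and like counts would need a further call to the `videos` endpoint.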
Feel free to fork the repository and submit pull requests for improvements!
This project is licensed under the MIT License.