C++ BBC News Scraper

This beginner project scrapes headlines from the BBC News homepage using libcurl and std::regex.

What it does

Sends an HTTP request to https://bbc.com/news
Saves the website's structure (the raw HTML response) as a txt file
Extracts headline titles using regular expressions
Filters out short or irrelevant titles (e.g. "News", "Sport")
Prints valid headlines to the terminal

Technologies Used

C++
libcurl
Regular Expressions

Why I built this

I built this to get hands-on practice with web scraping.

Problems with the code

Hard to apply to other websites, since many websites block web scrapers
There is an infinite-loop bug after the program runs
Using regex makes the code harder to extend. You should probably switch to a proper HTML parser

Expanding the code

The code can be expanded by adding more regex patterns in the main function. To do this, inspect the website's structure (the saved txt file) and search for keywords such as lastUpdated. To find where those keywords appear, search for a news title.

Forking

Feel free to fork; I am here to learn!

How to run

Make sure libcurl is installed.
Compile the code:

g++ -std=c++11 main.cpp -o scraper -lcurl

Run the scraper:

./scraper

Repository: Lukas22092/cpp-bbc-news-scraper

Folders and files

Latest commit

History

Repository files navigation

C++ BBC News Scraper

What it does

Technologies Used

Why I built this

Problems with the code

Expanding the code

Forking

How to run

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages