This project was done for Bypolar Factory. In it, I collected data from the Hindustan Times news portal. I first collected all the article URLs shown on the site's page, then visited each article and scraped the necessary information: Headline, Day, Date, Time and the full body text of the news story.
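Day, Date and Time usually come from a single timestamp on the article page. The sketch below shows one way to split such a string into the three fields using only the standard library. The timestamp format (for example `Mar 05, 2021 10:30 AM IST`) and the helper name `split_timestamp` are hypothetical; check the real page text and adjust `TIMESTAMP_FORMAT` to match.

```python
from datetime import datetime

# Hypothetical timestamp format; the real article markup may differ.
TIMESTAMP_FORMAT = "%b %d, %Y %I:%M %p"


def split_timestamp(raw: str) -> dict:
    """Split a timestamp like 'Mar 05, 2021 10:30 AM IST' into Day, Date and Time."""
    text = raw.strip()
    # Assumption: a trailing 'IST' time-zone label may be present; drop it before parsing.
    if text.endswith(" IST"):
        text = text[: -len(" IST")]
    parsed = datetime.strptime(text, TIMESTAMP_FORMAT)
    return {
        "Day": parsed.strftime("%A"),         # full weekday name, e.g. 'Friday'
        "Date": parsed.strftime("%Y-%m-%d"),  # ISO-style date
        "Time": parsed.strftime("%H:%M"),     # 24-hour clock
    }


print(split_timestamp("Mar 05, 2021 10:30 AM IST"))
# {'Day': 'Friday', 'Date': '2021-03-05', 'Time': '10:30'}
```

Normalising the date to `YYYY-MM-DD` and the time to a 24-hour clock keeps the CSV columns sortable.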
The tool used for this is Scrapy, as it can scrape, clean and format large volumes of data with little effort.
The final output is stored in the news.csv, news1.csv and news2.csv files, which contain the scraped information.
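Scrapy can write scraped items straight to CSV through its feed exports, either on the command line (for example `scrapy crawl news -o news.csv`, where `news` is a hypothetical spider name) or with the `FEEDS` setting. The fragment below is a sketch of the settings approach; the column names in `fields` are assumptions and should match whatever keys the spider actually yields.

```python
# settings.py (sketch): write every scraped item to a CSV file.
FEEDS = {
    "news.csv": {
        "format": "csv",
        "encoding": "utf8",
        # Hypothetical column order; must match the keys the spider yields.
        "fields": ["Headline", "Timestamp", "Body", "URL"],
    },
}
```

Fixing `fields` explicitly keeps the column order stable across runs, which makes separate output files easier to compare or concatenate.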