Web scraper using Puppeteer

This project scrapes the paginated table of registered marriage celebrants in Australia, filtered to those in NSW. I could not find a reliable exit condition for stopping the pagination loop, so the script runs a hard-coded number of iterations instead. Each iteration scrapes the current table page and then moves to the next one.
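One possible way to replace the hard-coded loop count is to stop as soon as there is no further page to move to, while keeping a maximum page count as a safety net. The sketch below is not the code in scrape.js. The names `scrapeAllPages`, `scrapePage`, `goToNext`, and `maxPages` are hypothetical, and the two callbacks stand in for whatever Puppeteer calls the script actually makes.

```javascript
// Hypothetical helper, not taken from scrape.js.
// scrapePage: async function returning an array of rows for the current page.
// goToNext:   async function that moves to the next page and resolves to
//             true, or resolves to false when there is no next page.
// maxPages:   hard cap, kept as a fallback in case the exit check never fires.
async function scrapeAllPages({ scrapePage, goToNext, maxPages = 1000 }) {
  const rows = [];
  for (let pageIndex = 0; pageIndex < maxPages; pageIndex++) {
    // Collect the rows of the page currently shown.
    rows.push(...(await scrapePage()));

    // Exit condition: stop when pagination reports there is nowhere to go.
    const moved = await goToNext();
    if (!moved) break;
  }
  return rows;
}
```

With Puppeteer, `goToNext` could look for the "next" control with `page.$()` and return false when it is missing or disabled. The selector depends on the site's markup, so it is left out here.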

The website it scrapes is 'https://marriage.ag.gov.au/statecelebrants/state'. The HTML structure of the table is not straightforward to scrape: header rows are repeated inside the table, and some columns contain no data.
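A minimal sketch of how the interleaved header rows and empty columns could be cleaned up once the raw rows have been pulled out of the page. It assumes each raw row has already been extracted into an object of the form `{ isHeader, cells }`. That shape and the function name `cleanTable` are illustrative assumptions, not the structure used in scrape.js.

```javascript
// Hypothetical clean-up step, not taken from scrape.js.
// rawRows: array of { isHeader: boolean, cells: string[] }, one per <tr>.
function cleanTable(rawRows) {
  // Drop header rows and rows without any data cells, then trim whitespace.
  const dataRows = rawRows
    .filter((row) => !row.isHeader && row.cells.length > 0)
    .map((row) => row.cells.map((cell) => cell.trim()));

  // Keep only the columns that hold a value in at least one row.
  const width = Math.max(0, ...dataRows.map((row) => row.length));
  const keep = [];
  for (let col = 0; col < width; col++) {
    if (dataRows.some((row) => (row[col] ?? "") !== "")) keep.push(col);
  }

  // Rebuild every row from the kept columns so the rows stay aligned.
  return dataRows.map((row) => keep.map((col) => row[col] ?? ""));
}
```

Columns are dropped only when they are empty in every row, so a celebrant with one missing field still lines up with the other rows.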

The resulting scrape outputs 7000+ rows of data. :)

Installation

Clone repo

git clone git@github.com:daveanthonyc/Webscraper-Test.git

Install dependencies

npm install

Run the script

node scrape.js

When the script runs, a browser instance opens, switches to the 'NSW' filter, and pages through the results to the end. Afterwards, you should find an output.xlsx file in your project directory.

