Web Scrape

A prototype of a library that aims to help users parse HTML pages easily using annotations, with the help of HtmlUnit.

This is just a prototype and it's probably full of bugs. It's not tested at all; it's just a concept to try out some ideas and see if they could work.

How to use

Create a class annotated with the @UrlScraper annotation and let the library inject the requested elements. There are three main types of injection:

@Auto injects user-defined classes that are annotated with the @Scraper annotation.
@Element injects HtmlUnit elements like HtmlBody.
@TextContent injects Strings that represent the textContent of a DOM node.

Every annotation can manage a List of elements if the type of the annotated field is a List.

@UrlScraper(url = "http://example.com/")
public class PageScraper {

	@Element(xpath = "/html/body")
	private HtmlBody pageBody;

	@PostConstructor
	public void postConstructor() {
		// Called after all fields get injected
	}

	public static void main(String[] args) {
		WebScrape<PageScraper> webScraper = WebScrape.run(PageScraper.class);

		// Instance with injected properties
		PageScraper scraper = webScraper.getResult();
	}
}
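
The example above does not show how the injection happens internally. As a rough illustration only (not the library's actual implementation), the following stand-alone sketch mimics the idea with nothing but the JDK: a locally defined stand-in for @TextContent, the JDK's XPath API in place of HtmlUnit, and reflection to fill a String field and a List field. The names InjectionSketch, DemoPage and inject are hypothetical and exist only in this sketch.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InjectionSketch {

	// Hypothetical stand-in for the library's @TextContent, defined here for illustration only.
	@Retention(RetentionPolicy.RUNTIME)
	@Target(ElementType.FIELD)
	public @interface TextContent {
		String xpath();
	}

	// Example target: one single value and one List of values.
	public static class DemoPage {
		@TextContent(xpath = "/html/body/h1")
		private String title;

		@TextContent(xpath = "/html/body/ul/li")
		private List<String> items;

		public String title() { return title; }
		public List<String> items() { return items; }
	}

	// Creates an instance of the given type and fills every @TextContent field
	// from well-formed markup (a real scraper would use HtmlUnit's parser instead).
	public static <T> T inject(Class<T> type, String xhtml) throws Exception {
		Document doc = DocumentBuilderFactory.newInstance()
				.newDocumentBuilder()
				.parse(new InputSource(new StringReader(xhtml)));
		XPath xpath = XPathFactory.newInstance().newXPath();

		Constructor<T> constructor = type.getDeclaredConstructor();
		constructor.setAccessible(true);
		T instance = constructor.newInstance();

		for (Field field : type.getDeclaredFields()) {
			TextContent annotation = field.getAnnotation(TextContent.class);
			if (annotation == null) {
				continue; // field is not managed by this sketch
			}
			NodeList nodes = (NodeList) xpath.evaluate(
					annotation.xpath(), doc, XPathConstants.NODESET);
			field.setAccessible(true); // allow writing to private fields
			if (List.class.isAssignableFrom(field.getType())) {
				// List-typed field: collect the text of every matching node
				List<String> values = new ArrayList<>();
				for (int i = 0; i < nodes.getLength(); i++) {
					values.add(nodes.item(i).getTextContent());
				}
				field.set(instance, values);
			} else if (nodes.getLength() > 0) {
				// Single-valued field: use the first matching node
				field.set(instance, nodes.item(0).getTextContent());
			}
		}
		return instance;
	}

	public static void main(String[] args) throws Exception {
		String xhtml = "<html><body><h1>Hello</h1>"
				+ "<ul><li>a</li><li>b</li></ul></body></html>";
		DemoPage page = inject(DemoPage.class, xhtml);
		System.out.println(page.title() + " " + page.items());
	}
}
```

The sketch decides between a single value and a List by inspecting the declared field type, which is one plausible way to get the "List of elements" behaviour described above; the real library may do this differently.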
