Overview

Why does skrape{it} provide its own http client implementations?

Skrape{it} offers a unified, intuitive, DSL-controlled way to make parsing websites as comfortable as possible.

An HTTP request is as easy as in the following example. Just call the skrape function wherever you want in your code. It requires you to pass a fetcher and makes further request options available inside the closure.

skrape(HttpFetcher) { // <-- pass any Fetcher, e.g. HttpFetcher, BrowserFetcher, ...
    // ... request options go here, e.g. the most basic one is the url
    url = "https://docs.skrape.it"
    
    expect {}
    extract {}
}

The HTTP request is only executed once either the extract or the expect function has been called. This behaviour also allows you to preconfigure the HTTP client for multiple calls. If you use both expect and extract, only one request is made.
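The deferred, single-request behaviour can be illustrated without the library. The following is a minimal sketch in plain Kotlin and not skrape{it}'s actual implementation: `LazyScraper`, `requestCount` and the fake fetch lambda are hypothetical names introduced only for illustration. It assumes the response can be cached with `by lazy`, so the first call to `expect` or `extract` triggers the fetch and any later call reuses the result.

```kotlin
// Hypothetical sketch: illustrates deferred, cached fetching.
// This is NOT how skrape{it} is implemented internally.
class LazyScraper(private val fetch: (String) -> String) {
    var url: String = ""

    // Counts real fetches so the single-request behaviour can be observed.
    var requestCount = 0
        private set

    // `by lazy` defers the fetch until first access and caches the result.
    private val response: String by lazy {
        requestCount++
        fetch(url)
    }

    // Runs assertions against the (lazily fetched) response body.
    fun expect(assertion: (String) -> Unit) = assertion(response)

    // Maps the (lazily fetched) response body to any result type.
    fun <T> extract(transform: (String) -> T): T = transform(response)
}

fun main() {
    // The lambda stands in for a real HTTP call.
    val scraper = LazyScraper { url -> "<html><title>fetched $url</title></html>" }
    scraper.url = "https://docs.skrape.it"
    println(scraper.requestCount) // 0, nothing has been fetched yet

    scraper.expect { body -> require(body.contains("<title>")) }
    val length = scraper.extract { body -> body.length }
    println("body length: $length")
    println(scraper.requestCount) // 1, expect and extract shared one fetch
}
```

Because the URL is only read when the response is first accessed, all configuration can happen up front, which mirrors the preconfiguration idea described above.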

The Different Fetchers

Skrape{it} provides different types of fetchers (aka HTTP clients) that can be passed to its DSL. All of them execute HTTP requests, but each handles a different use case.

You want to scrape a simple HTML page easily and as fast as possible, but with JavaScript deactivated?

You want to scrape a complex website, maybe an SPA written with a framework like React.js, Angular or Vue.js, or one that at least relies heavily on JavaScript?

You want to scrape multiple HTML pages in parallel from inside a coroutine?
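The idea of passing interchangeable fetchers to one DSL entry point can be sketched in plain Kotlin. This is only an illustration under stated assumptions: `SketchRequest`, `SketchFetcher`, `staticFetcher`, `renderingFetcher` and `scrapeWith` are hypothetical names, and skrape{it}'s real fetcher types may look different. The two stand-in fetchers return canned markup instead of making network calls.

```kotlin
// Hypothetical names throughout; not skrape{it}'s real API.
data class SketchRequest(var url: String = "")

// One small contract that every fetcher implements.
fun interface SketchFetcher {
    fun fetch(request: SketchRequest): String
}

// Stand-in for a plain HTTP client: returns the markup as served.
val staticFetcher = SketchFetcher { req ->
    "<div id=\"app\"></div><!-- ${req.url} -->"
}

// Stand-in for a browser-backed client: returns the markup after scripts ran.
val renderingFetcher = SketchFetcher { req ->
    "<div id=\"app\">rendered</div><!-- ${req.url} -->"
}

// The entry point takes any fetcher plus a closure that configures the request.
fun scrapeWith(fetcher: SketchFetcher, configure: SketchRequest.() -> Unit): String {
    val request = SketchRequest().apply(configure)
    return fetcher.fetch(request)
}

fun main() {
    val plain = scrapeWith(staticFetcher) { url = "https://docs.skrape.it" }
    val rendered = scrapeWith(renderingFetcher) { url = "https://docs.skrape.it" }
    println(plain.contains("rendered"))    // false
    println(rendered.contains("rendered")) // true
}
```

The calling code stays the same for both use cases; only the fetcher argument changes.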
