Overview

Why does skrape{it} provide its own http client implementations?

Skrape{it} offers a unified, intuitive, DSL-controlled way to make parsing websites as comfortable as possible.

An HTTP client DSL that configures requests and their options (headers, cookies, etc.) in a fluent style, without verbosity or ceremony.
Pre-configure a client once, then either reuse it as is or adjust only what differs for a given request. This is especially handy for authentication flows or custom headers.
Handles client-side rendered web pages (e.g. pages built with frameworks like React.js, Angular or Vue.js, or pages manipulated with jQuery or other JavaScript).
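
The pre-configuration idea can be pictured in plain Kotlin. The sketch below is not skrape{it}'s API: RequestOptions and its fields are hypothetical names, chosen only to show how a shared base configuration might be reused while individual requests override what differs.

```kotlin
// Hypothetical model of request options; NOT a skrape{it} type.
data class RequestOptions(
    val url: String = "",
    val headers: Map<String, String> = emptyMap(),
    val cookies: Map<String, String> = emptyMap(),
)

fun main() {
    // Configure the shared settings once, e.g. an auth header (placeholder token).
    val base = RequestOptions(headers = mapOf("Authorization" to "Bearer <token>"))

    // Each request adjusts only what differs from the base configuration.
    val docs = base.copy(url = "https://docs.skrape.it")
    val localized = docs.copy(headers = docs.headers + ("Accept-Language" to "en"))

    println(docs)
    println(localized.headers.keys)
}
```

Because the data class is immutable, the base configuration is never changed by a single request, which is one simple way to keep reused settings predictable.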

Making an HTTP request is as easy as in the following example. Just call the skrape function wherever you need it in your code. It requires you to pass a fetcher and makes further request options available inside the closure.

skrape(HttpFetcher) { // <-- pass any Fetcher, e.g. HttpFetcher, BrowserFetcher, ...
    // ... request options go here, e.g. the most basic one is the url
    url = "https://docs.skrape.it"
    
    expect {}
    extract {}
}

The HTTP request is only executed once either the extract or the expect function has been called. This behaviour also allows you to preconfigure the HTTP client for multiple calls. If you use both expect and extract, only one request is made.
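
To illustrate this deferred behaviour, here is a minimal, self-contained sketch. It is an assumption-laden model rather than skrape{it}'s actual implementation: LazyRequest, requestCount and the fetch lambda are hypothetical names, and the "request" is simulated with a fixed string instead of a network call.

```kotlin
// Hypothetical stand-in for a scraping scope; NOT skrape{it}'s real implementation.
class LazyRequest(private val fetch: () -> String) {
    var requestCount = 0
        private set

    // The body is fetched on first access and cached for later accesses.
    private val body: String by lazy {
        requestCount++
        fetch()
    }

    // Runs an assertion against the (lazily fetched) body.
    fun expect(assertion: (String) -> Unit) = assertion(body)

    // Derives a value from the (lazily fetched) body.
    fun <T> extract(extractor: (String) -> T): T = extractor(body)
}

fun main() {
    // Simulated response; a real fetcher would perform an HTTP call here.
    val request = LazyRequest { "<html><title>docs</title></html>" }
    println(request.requestCount) // nothing has been fetched yet

    request.expect { check(it.contains("<title>")) }
    val length = request.extract { it.length }

    println(length)
    println(request.requestCount) // expect and extract shared a single fetch
}
```

Kotlin's lazy delegate is used here only as a convenient way to show "execute on first use, then reuse the result"; the library may achieve the same effect differently.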

The Different Fetchers

Skrape{it} provides different types of fetchers (aka HTTP clients) that can be passed to its DSL. All of them execute HTTP requests, but each one handles a different use case.

You want to scrape a simple HTML page easily and as fast as possible, with JavaScript deactivated?

HttpFetcher

You want to scrape a complex website, maybe an SPA written with a framework like React.js, Angular or Vue.js, or one that at least relies heavily on JavaScript?

BrowserFetcher

You want to scrape multiple HTML pages in parallel from inside a coroutine?

AsyncFetcher


Last updated 4 years ago
