skrape{it}
1.0.X
Overview

Why does skrape{it} provide its own http client implementations?

Last updated 4 years ago

Skrape{it} offers a unified, intuitive, DSL-controlled way to make parsing websites as comfortable as possible.

Making an HTTP request is as easy as in the following example. Just call the skrape function wherever you need it in your code. It requires you to pass a fetcher and makes further request options available inside the closure (the trailing lambda).

skrape(HttpFetcher) { // <-- pass any Fetcher, e.g. HttpFetcher, BrowserFetcher, ...
    // ... request options go here, e.g. the most basic one is the url
    url = "https://docs.skrape.it"
    
    expect {}
    extract {}
}

The HTTP request is only executed once either the extract or the expect function is called. This behaviour also makes it possible to preconfigure the HTTP client for multiple calls. If you use both expect and extract, only a single request is made.
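The deferred, single-request behaviour described above can be illustrated with a small self-contained sketch. It does not use skrape{it} itself: LazyScrape, its fetch parameter and requestCount are hypothetical names made up for this illustration, and the real library may implement the behaviour quite differently.

```kotlin
// Hypothetical stand-in for a scraping scope; NOT skrape{it}'s actual classes.
class LazyScrape(private val fetch: (String) -> String) {
    var url: String = ""

    // Counts how often the (fake) request was really executed.
    var requestCount = 0
        private set

    // The request runs on first access only; the response is cached afterwards.
    private val response: String by lazy {
        requestCount++
        fetch(url)
    }

    // Both functions read the same cached response.
    fun expect(check: (String) -> Unit) = check(response)

    fun <T> extract(transform: (String) -> T): T = transform(response)
}

fun main() {
    // The lambda plays the role of an HTTP client and returns canned markup.
    val scrape = LazyScrape { requestedUrl -> "<title>stub for $requestedUrl</title>" }
    scrape.url = "https://docs.skrape.it"
    println(scrape.requestCount) // prints 0: configuring alone sends nothing

    scrape.expect { body -> check(body.contains("<title>")) }
    val length = scrape.extract { body -> body.length }
    println("extracted length: $length")
    println(scrape.requestCount) // prints 1: expect and extract shared one request
}
```

Because nothing is sent until expect or extract is called, such a scope can be configured up front and reused, which is the property the preconfiguration feature relies on.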

The Different Fetchers

Skrape{it} provides different types of Fetchers (aka Http-Clients) that can be passed to its DSL. All of them will execute http requests but each of them handles a different use-case.

You want to scrape a simple HTML page as easily and as fast as possible, with JavaScript deactivated? Use HttpFetcher.

You want to scrape a complex website, maybe an SPA written with a framework like React.js, Angular or Vue.js, or one that at least relies heavily on JavaScript? Use BrowserFetcher.

You want to scrape multiple HTML pages in parallel from inside a coroutine? Use AsyncFetcher.
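To make the idea of passing interchangeable fetchers to one DSL entry point more concrete, here is a minimal sketch. All names in it (PageFetcher, staticFetcher, renderingFetcher, ScrapeScope, scrapeWith) are assumptions invented for this example, and the two fetchers return canned strings instead of doing real HTTP or browser work. It only shows the general pattern, not skrape{it}'s actual interfaces.

```kotlin
// Hypothetical types for illustration only; skrape{it}'s real fetcher interface differs.
fun interface PageFetcher {
    fun fetch(url: String): String
}

// Stand-in for a plain HTTP client: markup as served, scripts not executed.
val staticFetcher = PageFetcher { url ->
    "<div id=\"app\"></div><!-- raw markup from $url -->"
}

// Stand-in for a browser-backed client: markup after scripts have run.
val renderingFetcher = PageFetcher { url ->
    "<div id=\"app\"><h1>Rendered</h1></div><!-- rendered markup from $url -->"
}

// Holds the request options that the caller sets inside the lambda.
class ScrapeScope {
    var url: String = ""
}

// DSL-style entry point: the caller must pass a fetcher, then configures the request.
fun scrapeWith(fetcher: PageFetcher, configure: ScrapeScope.() -> Unit): String {
    val scope = ScrapeScope().apply(configure)
    return fetcher.fetch(scope.url)
}

fun main() {
    val raw = scrapeWith(staticFetcher) { url = "https://docs.skrape.it" }
    val rendered = scrapeWith(renderingFetcher) { url = "https://docs.skrape.it" }
    println(raw.contains("<h1>"))      // prints false
    println(rendered.contains("<h1>")) // prints true
}
```

The calling code stays the same in both cases; only the fetcher argument changes, which is why the choice of fetcher can be driven purely by the use case.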