Http Client

Overview

Why does skrape{it} provide its own http client implementations?

Skrape{it} offers a unified, intuitive and DSL-controlled way to make parsing of websites as comfortable as possible.

An HTTP request is made as easily as in the given example. Just call the skrape function wherever you want in your code. It will force you to pass a fetcher and makes further request options available in the closure.

skrape(HttpFetcher) { // <-- pass any Fetcher, e.g. HttpFetcher, BrowserFetcher, ...
    // ... request options go here, e.g. the most basic would be url
    url = "https://docs.skrape.it"
    
    expect {}
    extract {}
}

The HTTP request is only executed after either the expect or the extract function has been called. This behaviour also allows you to preconfigure the http-client for multiple calls. If you use expect as well as extract, it will still only make one request.
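
To see the whole flow in one place, here is a minimal, hedged sketch of a complete call. The wildcard imports (it.skrape.core, it.skrape.fetcher) are an assumption based on the 1.x package layout, and the htmlDocument { titleText } selector is borrowed ahead from the Html Parser section; exact names and return behaviour may differ between versions.

import it.skrape.core.*    // assumed 1.x package layout
import it.skrape.fetcher.* // assumed 1.x package layout

fun main() {
    // nothing is fetched while the request options are being set up;
    // the request fires once extract is invoked, and the value produced
    // inside extract becomes the return value of the skrape call
    val pageTitle = skrape(HttpFetcher) {
        url = "https://docs.skrape.it"
        extract {
            htmlDocument { titleText } // assumed selector, see the Html Parser section
        }
    }
    println(pageTitle)
}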

The Different Fetchers

Skrape{it} provides different types of Fetchers (aka HTTP clients) that can be passed to its DSL. All of them execute HTTP requests, but each of them handles a different use-case.

You want to scrape a simple HTML page as easily and as fast as possible, with Javascript deactivated? Use the HttpFetcher.

You want to scrape a complex website, maybe a SPA written with a framework like React.js, Angular or Vue.js, or one that at least relies on Javascript a lot? Use the BrowserFetcher, which can handle client-side rendered web pages.

You want to scrape multiple HTML pages in parallel from inside a coroutine? Use the AsyncFetcher.
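
Whichever fetcher fits your use-case, only the argument passed to the skrape function changes; the surrounding DSL stays identical. A rough sketch mirroring the snippet above (the extract body is elided, see the Html Parser section):

// same DSL as above, only the fetcher argument differs
skrape(BrowserFetcher) { // renders Javascript before parsing, unlike HttpFetcher
    url = "https://docs.skrape.it"
    extract {
        // client-side rendered markup is available here
    }
}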
