Request Options

skrape{it} allows you to set several options to define your request.

All available options come with reasonable defaults, which aim to make using skrape{it} as easy and intuitive as possible.
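The idea of options with defaults can be sketched in Kotlin, the language skrape{it} is written in. The sketch below is a hypothetical RequestOptions data class, not skrape{it}'s actual API. Its names and default values simply mirror the option table in this section, and the Method and Mode enums are stand-ins defined only for this example. Callers override just the options they need by using named arguments, and the init block enforces the rule that url must use http or https.

```kotlin
// Hypothetical model of the request options described in this section.
// This is NOT skrape{it}'s own class; names and defaults mirror the table.
enum class Method { GET, POST, PUT, PATCH, DELETE, HEAD }

enum class Mode { SOURCE, DOM }

data class RequestOptions(
    val url: String = "http://localhost:8080",
    val method: Method = Method.GET,
    val userAgent: String = "Mozilla/5.0 skrape.it",
    val headers: Map<String, String> = emptyMap(), // no custom headers by default
    val cookies: Map<String, String> = emptyMap(), // no cookies by default
    val timeout: Int = 5000,                       // milliseconds; 0 = no timeout
    val followRedirects: Boolean = true,
    val ignoreContentType: Boolean = true,
    val ignoreHttpErrors: Boolean = true,
    val validateTLSCertificates: Boolean = true,
    val maxBodySize: Int? = null,                  // null = no maximum body size
    val mode: Mode = Mode.SOURCE
) {
    init {
        // The table requires the protocol to be http or https.
        require(url.startsWith("http://") || url.startsWith("https://")) {
            "url must use http or https: $url"
        }
    }
}

// Everything left out keeps its default value.
val defaults = RequestOptions()

// Override only what differs from the defaults, using named arguments.
val custom = RequestOptions(
    url = "https://example.com",
    timeout = 10_000,
    headers = mapOf("Accept-Language" to "en")
)
```

Because every parameter has a default, RequestOptions() on its own yields the configuration described in the table, and the generated copy(...) function can derive variants from an existing instance.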

| Option | Description | Type | Default |
| --- | --- | --- | --- |
| url | The URL that is used to fetch and parse a web page. The protocol must be http or https. | String | http://localhost:8080 |
| method | The HTTP request method, which indicates the desired action to be performed on the given resource (e.g. GET or POST). Although they can also be nouns, request methods are sometimes referred to as HTTP verbs. | Method | GET |
| userAgent | The User-Agent request header: a characteristic string that lets network protocol peers identify the application type, operating system, software vendor or software version of the requesting user agent. | String | Mozilla/5.0 skrape.it |
| headers | Additional request headers that carry more information about the resource to be fetched or about the client itself. | Map<String, String> | none (no custom headers are sent) |
| cookies | Cookies to add to the request. | Map<String, String> | none (no cookies are sent) |
| timeout | The total request timeout in milliseconds. A timeout of zero (0) is treated as an infinite timeout. | Int | 5000 |
| followRedirects | Whether the connection follows server redirects. | Boolean | true |
| ignoreContentType | Ignore the document's Content-Type when parsing the response. If set to false, an unrecognized content type causes an IOException to be thrown. This prevents producing garbage by attempting to parse, for example, a JPEG binary image. | Boolean | true |
| ignoreHttpErrors | Whether to suppress exceptions when an HTTP error (4xx or 5xx, e.g. 404 or 500) occurs. If set to false, an IOException is thrown when such an error is encountered. If set to true, the response is populated with the error body, and the status message reflects the error. | Boolean | true |
| validateTLSCertificates | Enable or disable TLS certificate validation for HTTPS requests. When enabled, all connections over HTTPS perform normal certificate validation and abort the request if the provided certificate does not validate. | Boolean | true |
| maxBodySize | The maximum number of bytes to read from the (uncompressed) connection into the body before the connection is closed and the input is truncated. | Int | no maximum body size |
| mode | For server-side rendered websites and XML responses, use the default mode, because it is more performant (a plain HTTP request). To parse client-side rendered websites (e.g. built with React.js, Vue.js, Angular or jQuery), try the DOM mode. | Mode | SOURCE |
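Three of the rules above (a zero timeout meaning no timeout, maxBodySize truncating the body, and ignoreHttpErrors deciding whether 4xx or 5xx responses throw) can be illustrated with small Kotlin helpers. These functions are hypothetical and not part of skrape{it}. They only mimic the documented behavior, and representing "no limit" as null is an assumption of this sketch.

```kotlin
import java.io.IOException

// Hypothetical helpers illustrating rules from the options table.
// They are not part of skrape{it}; they only mimic the documented behavior.

/** A timeout of 0 means "wait forever"; this sketch represents that as null. */
fun effectiveTimeoutMillis(timeout: Int): Long? {
    require(timeout >= 0) { "timeout must not be negative" }
    return if (timeout == 0) null else timeout.toLong()
}

/** Keeps at most [maxBodySize] bytes of the (uncompressed) body; null = no limit. */
fun truncateBody(body: ByteArray, maxBodySize: Int?): ByteArray =
    if (maxBodySize == null || body.size <= maxBodySize) body
    else body.copyOf(maxBodySize)

/** With ignoreHttpErrors = false, a 4xx or 5xx status raises an IOException. */
fun checkStatus(statusCode: Int, ignoreHttpErrors: Boolean) {
    if (!ignoreHttpErrors && statusCode in 400..599) {
        throw IOException("HTTP error fetching URL, status=$statusCode")
    }
}
```

With the default ignoreHttpErrors = true, checkStatus never throws, which matches the table: the caller receives the error body and status instead of an exception.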