Parsing HTML
How to parse HTML, aka creating the Doc object
skrape{it} creates a Doc object that represents the parsed HTML document whenever the htmlDocument{} function is called.
It can parse HTML from a String, from a File, or from the response body of an HTTP request (from the web).
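A minimal sketch of the String case, assuming the skrape{it} 1.x API, where `htmlDocument` lives in `it.skrape.core` and returns whatever its lambda returns. The helper name `headlineOf` and the sample markup are hypothetical and only serve the illustration.

```kotlin
import it.skrape.core.htmlDocument
import it.skrape.selects.html5.h1

// Parses the given HTML String. Inside the lambda, `this` is the Doc object.
// Assumption: htmlDocument(html) { ... } returns the value of its lambda.
fun headlineOf(html: String): String =
    htmlDocument(html) {
        h1 {
            findFirst { text } // text of the first <h1> in the document
        }
    }

fun main() {
    println(headlineOf("<html><body><h1>welcome</h1></body></html>"))
}
```

The File and HTTP cases should follow the same pattern, with only the source of the markup changing.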
Picking HTML elements from a Doc
Now that we have a Doc object, we reach one of the parts where skrape{it} probably shines the most: it lets you pick HTML elements in a DSL-ish way. To achieve this, skrape{it} provides extension functions on Doc that represent all available HTML tags of the HTML5 standard.
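As a rough illustration of these tag functions, the sketch below collects all paragraph texts and the first link target from a document. It assumes the skrape{it} 1.x API, with the tag functions in `it.skrape.selects.html5` and `findFirst`/`findAll` available inside each tag block. `PageSummary` and `summarize` are hypothetical names.

```kotlin
import it.skrape.core.htmlDocument
import it.skrape.selects.html5.a
import it.skrape.selects.html5.p

// Hypothetical holder for the values picked from the document.
data class PageSummary(val paragraphs: List<String>, val firstLink: String)

fun summarize(html: String): PageSummary =
    htmlDocument(html) {
        // p { } selects <p> elements; findAll exposes every match as a list.
        val paragraphs = p { findAll { map { it.text } } }
        // a { } selects <a> elements; findFirst exposes only the first match.
        val firstLink = a { findFirst { attribute("href") } }
        PageSummary(paragraphs, firstLink)
    }

fun main() {
    val html = """
        <html><body>
            <p>first</p>
            <p>second</p>
            <a href="https://example.com">link</a>
        </body></html>
    """.trimIndent()
    println(summarize(html))
}
```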
Picking Custom HTML tags
Since the HTML5 standard, it is possible to create Custom Elements that define new types of HTML elements. Such elements, like all other elements, can be selected via String invocation, which creates a CSS selector that is used to pick matching elements.
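A minimal sketch of String invocation on a custom tag, assuming the skrape{it} 1.x API, where a String can be invoked with a lambda inside the Doc scope. The tag name `my-widget` and the function `customElementText` are made up for this example.

```kotlin
import it.skrape.core.htmlDocument

// Picks a custom element by invoking its tag name as a String.
// Assumption: inside the Doc scope, "selector" { ... } opens a selector block.
fun customElementText(html: String): String =
    htmlDocument(html) {
        "my-widget" {
            findFirst { text } // text of the first <my-widget> element
        }
    }

fun main() {
    println(customElementText("<html><body><my-widget>hello</my-widget></body></html>"))
}
```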
Building CSS selectors
As shown above, skrape{it} lets you pick elements either by using the corresponding DSL function or via String invocation. Both create a CssSelector scope that allows us to build complex CSS selectors in an idiomatic fashion.
Let's imagine we want to create a selector that matches the following complex HTML element:
<button class="foo bar" fizz="buzz" disabled>click me</button>
We could either achieve this by using a CSS query selector:
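The original snippet is not preserved here. A minimal sketch of what such a raw selector could look like, assuming the skrape{it} 1.x String invocation syntax (`buttonTextViaCssQuery` is a hypothetical name):

```kotlin
import it.skrape.core.htmlDocument

// Matches <button> with classes foo and bar, attribute fizz="buzz",
// and a `disabled` attribute, using one hand-written CSS query.
fun buttonTextViaCssQuery(html: String): String =
    htmlDocument(html) {
        "button.foo.bar[fizz=buzz][disabled]" {
            findFirst { text }
        }
    }

fun main() {
    val html = """<button class="foo bar" fizz="buzz" disabled>click me</button>"""
    println(buttonTextViaCssQuery(html))
}
```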
or we could do it in an even more readable and less error-prone way:
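The original snippet is not preserved here either. The sketch below shows how the same element might be described with the selector-building properties, assuming the skrape{it} 1.x names `withClasses`, `withAttribute` and `withAttributeKey` on the CssSelector scope. Treat these names as assumptions and check them against the version you use. `buttonTextViaDsl` is a hypothetical name.

```kotlin
import it.skrape.core.htmlDocument
import it.skrape.selects.html5.button

// Describes the element property by property instead of writing the
// CSS query by hand. The library assembles the selector from these values.
fun buttonTextViaDsl(html: String): String =
    htmlDocument(html) {
        button {
            withClasses = listOf("foo", "bar")   // class="foo bar"
            withAttribute = "fizz" to "buzz"     // fizz="buzz"
            withAttributeKey = "disabled"        // attribute present, any value
            findFirst { text }
        }
    }

fun main() {
    val html = """<button class="foo bar" fizz="buzz" disabled>click me</button>"""
    println(buttonTextViaDsl(html))
}
```

Compared with a hand-written query string, each constraint is a named property, so a typo shows up as a compile error rather than as a selector that silently matches nothing.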