JS-rendered sites
This section describes how to scrape data from client-side rendered DOM elements.
Most modern web pages add elements or data to the DOM dynamically using JavaScript. Some are single-page applications (SPAs) built with React.js, Vue.js, Angular, etc., which means nearly the whole DOM is rendered by JavaScript that has to be interpreted by a JavaScript engine (usually your browser).
By default, skrape{it} makes simple HTTP requests. The response to such a request contains only the markup the server sent, before any script has run, so it is not possible to scrape JS-driven websites in default mode.
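To illustrate why a plain HTTP request is not enough, here is a minimal sketch that uses only the JDK (11+), not skrape{it}. It is not how skrape{it} performs its requests; it only shows what any simple HTTP client receives. The names `page` and `fetchRawSource` and the throwaway local server are assumptions made for this illustration.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical page: the <div> only exists after a browser has run the script.
val page = """
    <html><body>
    <h1>static headline</h1>
    <script>
    var el = document.createElement("div");
    el.className = "dynamic";
    document.body.appendChild(el);
    </script>
    </body></html>
""".trimIndent()

// Fetches the raw response body; nothing in here executes JavaScript.
fun fetchRawSource(url: String): String {
    val request = HttpRequest.newBuilder(URI.create(url)).GET().build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}

fun main() {
    // Serve the page from a local server on a free port so no external site is needed.
    val server = HttpServer.create(InetSocketAddress("127.0.0.1", 0), 0)
    server.createContext("/") { exchange ->
        val bytes = page.toByteArray()
        exchange.sendResponseHeaders(200, bytes.size.toLong())
        exchange.responseBody.use { it.write(bytes) }
    }
    server.start()
    try {
        val source = fetchRawSource("http://127.0.0.1:${server.address.port}/")
        println("<h1" in source)   // true: static markup is part of the response
        println("<div" in source)  // false: the script was delivered but never executed
    } finally {
        server.stop(0)
    }
}
```

The response still contains the script itself, but the element it would create is missing because no JavaScript engine has interpreted it.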
Example
Let's assume a pretty basic scenario: we want to make a request to a website that renders data via JavaScript. For instance, its markup could look like this, where a script adds an extra div element containing some text.
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>i'm the title</title>
    </head>
    <body>
        i'm the body
        <h1>i'm the headline</h1>
        <p>i'm a paragraph</p>
        <p>i'm a second paragraph</p>
    </body>
    <script>
        var dynamicallyAddedElement = document.createElement("div");
        dynamicallyAddedElement.className = "dynamic";
        var textNode = document.createTextNode("I have been dynamically added via Javascript");
        dynamicallyAddedElement.appendChild(textNode);
        document.body.appendChild(dynamicallyAddedElement);
    </script>
</html>
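To scrape the dynamically added div, set the mode to Mode.DOM:
fun main() {
    val scrapedData = skrape {
        url = "http://some.url"
        mode = Mode.DOM // <--- here's the magic: scrape the JS-rendered DOM
        extract {
            // select the div that the script added and return its text
            div {
                withClass = "dynamic"
                findFirst { text }
            }
        }
    }
    println(scrapedData)
}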
> I have been dynamically added via Javascript