What exactly is a headless browser?
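A headless browser is a web browser that runs without a graphical interface, so a program can load a page, run its JavaScript, and read the resulting HTML. The most commonly used headless browser is PhantomJS, but HtmlUnit is also an option for Java websites.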
How does this help you?
This doesn't sound too hard, but it's only fair to warn you of a few troubles you might run into:
- The internet doesn't work all the time: requests fail and connections time out. You'll need to handle and recover from these errors.
- Just like in your browser, web pages will take a few seconds to load in the headless browser.
- The slowness means you probably don't want to wait until the request comes from Google to render the page. Response time is a factor in their rankings. You'll want to render the pages in advance.
- It also means that if you have a large site, you will need to run several instances of your headless browser in parallel. Headless browsers tend to be resource intensive, so running multiple instances will probably require a few dedicated servers.
- If you cache the results, you will need to figure out when pages change to re-render them.
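To make these troubles concrete, here is a minimal Python sketch of how retrying, rendering in advance, running in parallel, and re-rendering changed pages could fit together. Everything in it is an assumption made for illustration: `render` is a hypothetical function that you would write to drive your headless browser and return the rendered HTML, `render_with_retry` and `SnapshotCache` are invented names, and the "version" of a page stands in for whatever change signal your site has (a last-modified timestamp, for example). The cache lives in memory here only to keep the example short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_with_retry(render, url, attempts=3, base_delay=0.5):
    """Call render(url), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return render(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the error
            time.sleep(base_delay * 2 ** attempt)


class SnapshotCache:
    """In-memory cache of rendered HTML, keyed by URL and a version tag."""

    def __init__(self, render, workers=4, retry_delay=0.5):
        self.render = render            # hypothetical: wraps your headless browser
        self.workers = workers          # how many pages to render at once
        self.retry_delay = retry_delay  # base delay between retries, in seconds
        self.entries = {}               # url -> (version, html)

    def is_stale(self, url, version):
        """True if the page was never rendered or has changed since."""
        cached = self.entries.get(url)
        return cached is None or cached[0] != version

    def prerender(self, pages):
        """Render new or changed pages ahead of time.

        pages maps url -> version. Returns the URLs that still failed
        after all retries, so they can be logged or tried again later.
        """
        todo = [(u, v) for u, v in pages.items() if self.is_stale(u, v)]
        failed = []
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = {
                pool.submit(render_with_retry, self.render, url,
                            base_delay=self.retry_delay): (url, version)
                for url, version in todo
            }
            for future, (url, version) in futures.items():
                try:
                    self.entries[url] = (version, future.result())
                except Exception:
                    failed.append(url)
        return failed

    def get(self, url):
        """Return the cached HTML for url, or None if it was never rendered."""
        cached = self.entries.get(url)
        return cached[1] if cached else None
```

You would call `prerender` on a schedule with the current list of pages and serve crawler requests from `get`, so the slow rendering never happens while a crawler is waiting. Threads are enough in this sketch because each one mostly waits on the browser; the browser instances themselves are what consume the server resources.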