Is a headless browser the answer?

What exactly is a headless browser?

It's just like any other web browser -- Firefox, Chrome, Safari, IE -- except that it has no user interface. Nothing is shown on the screen and there is nothing to click on. Instead, you write scripts that interact with the browser. The script you need for JavaScript SEO is pretty basic: for every web page on your site, open the page, wait for the JavaScript to run, then capture the entire DOM and save it to a file.
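
The loop described above can be sketched roughly as follows. This is a minimal sketch in Python, not a definitive implementation. `render` is a hypothetical callable standing in for whatever headless browser you drive: it is assumed to open the URL, wait for the JavaScript to finish, and return the serialized DOM as a string. The hashed file naming scheme is also an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.parse import urlsplit


def snapshot_path(url: str, out_dir: Path) -> Path:
    """Map a URL to a stable file name inside out_dir (naming scheme is an assumption)."""
    parts = urlsplit(url)
    key = parts.path or "/"
    if parts.query:
        key += "?" + parts.query
    # Hash the path and query so any URL becomes a safe, fixed-length file name.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return out_dir / f"{digest}.html"


def snapshot_site(
    urls: Iterable[str],
    render: Callable[[str], str],  # hypothetical wrapper around your headless browser
    out_dir: Path,
) -> List[Path]:
    """Render every URL and save the resulting DOM as an HTML file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        html = render(url)  # open the page, wait for the JavaScript, return the DOM
        path = snapshot_path(url, out_dir)
        path.write_text(html, encoding="utf-8")
        saved.append(path)
    return saved
```

Keeping the browser behind a single `render` function means the same loop works whichever headless browser you pick.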

The most commonly used headless browser is PhantomJS, but HtmlUnit, which is written in Java, is also an option if your site runs on Java.

How does this help you?

Now you have an HTML file. Unlike the HTML file you serve to users, this one has all the content already in place. This is exactly what Google needs. When requests come to your web server from Googlebot, you can send Google this file instead of the one you send to your users. Your headless browser has already executed all the JavaScript and filled in the content, so Google can read it.
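
One way to make that choice on the server is to look at the User-Agent header of each request. The sketch below is only an illustration under stated assumptions: the function names, the token list, and the idea of falling back to the normal page when no snapshot exists are all hypothetical, not a description of any particular web server's API.

```python
from pathlib import Path
from typing import Optional

# Substrings to look for in the User-Agent header. Googlebot identifies itself
# with "Googlebot"; which crawlers you list here is an assumption.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response_file(user_agent: Optional[str], snapshot: Path, normal_page: Path) -> Path:
    """Pick the pre-rendered snapshot for crawlers, the normal page for everyone else."""
    if is_crawler(user_agent) and snapshot.exists():
        return snapshot
    # Regular users, or crawlers requesting a page that has not been rendered yet.
    return normal_page
```

Falling back to the normal page when the snapshot is missing keeps the site responding even if the pre-rendering job has not reached that page yet.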

Some Gotchas

This doesn't sound too hard, but it's only fair to warn you of a few troubles you might run into.

  • The internet doesn't work all the time. Requests will fail or time out, and you'll need to handle and recover from these errors.
  • Just like in your browser, web pages will take a few seconds to load in the headless browser.
  • The slowness means you probably don't want to wait until the request comes from Google to render the page. Response time is a factor in Google's rankings. You'll want to render the pages in advance.
  • It also means that if you have a large site you will need to run several instances of your headless browser in parallel.
    Headless browsers tend to be resource intensive. If you have to run multiple instances, it will probably require a few dedicated servers.
  • If you cache the results, you will need to figure out when pages change to re-render them.
  • You'll need to find a way to determine when your page is finished loading data and executing JavaScript so it is ready to be saved.
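
A few of these gotchas (recovering from network errors, detecting when a page is ready, and rendering in parallel) can be sketched in a handful of helper functions. This is a rough sketch under assumptions, not a recommended implementation: `render` and `is_ready` are hypothetical callables wrapping your headless browser, the retry counts and timeouts are arbitrary defaults, and threads are used on the assumption that the heavy work happens in separate browser processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_with_retry(url, render, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call render(url), retrying with exponential backoff if it raises."""
    for attempt in range(attempts):
        try:
            return render(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... the base delay


def wait_until_ready(is_ready, timeout=10.0, interval=0.25,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_ready():  # hypothetical check, e.g. a flag your page sets when done
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def prerender_all(urls, render, workers=4, **retry_options):
    """Render URLs in parallel; return ({url: html}, {url: exception})."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            url: pool.submit(render_with_retry, url, render, **retry_options)
            for url in urls
        }
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc  # keep going; retry or report these later
    return results, errors
```

Collecting failures in `errors` instead of stopping at the first one lets a long pre-rendering run finish, with the failed pages re-queued afterwards.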