Adding site search to Netlify sites using Elasticsearch

A guide to implementing static site search using 11ty, Netlify and Elasticsearch

Static site generators (SSGs) such as 11ty allow us to build super-fast websites with great developer experience. Hosting platforms like Netlify allow us to rapidly iterate and release directly from GitHub.

SSGs are perfect for (mostly) static sites such as blogs (like this one!), corporate sites, documentation sites and so on. Search, however, is not static. You can build static-ish site search using solutions such as Pagefind, but these come with limitations in hosting environments and flexibility.

SaaS search solutions like Algolia have their place too, but the official Algolia x Netlify integration simply triggers a web crawl of a site rather than indexing content directly. Again, this limits flexibility and results in a delay between build and updating the search index.

As a fan of Elasticsearch, I thought I would have a go at implementing search on this site using pure 11ty, Netlify and Elastic capabilities:

  1. Create a search index at build time with 11ty
  2. Update a remote Elasticsearch index after build
  3. Use an Elastic Search Application to create a simple search experience
  4. Use Elastic Behavioral Analytics to capture search & click data

Sounds simple enough! Let's walk through the steps:

Create a search index at build time

There are a number of ways to do this, but I took the route that seemed simplest to me: using Nunjucks. I added a new Nunjucks filter and a page template to render blog posts as ndjson, which will be important later:

/* --- eleventy.js --- */
eleventyConfig.addNunjucksFilter("ndjson", (page) => {
  let d = new Date();
  return JSON.stringify({
    'url': 'https://simonhearne.com' + page.url,
    'tags': page.data.tags,
    'published': page.date,
    'title': page.data.title,
    'excerpt': page.data.excerpt,
    'image': page.data.image,
    'indexed': d.toISOString()
  });
});
{# --- esindex.ndjson.njk --- #}
---
permalink: /esindex.ndjson
eleventyExcludeFromCollections: true
---
{%- for post in collections.posts %}
{ "index": {"_id": "{{post.url}}"}}
{{ post | ndjson | safe }}
{%- endfor %}

This creates a new file at build time, esindex.ndjson, at the root of the published site directory. It should look a little like this:

{ "index": {"_id": "/2015/web-performance-optimisation-basics/"}}
{"url":"https://simonhearne.com/2015/web-performance-optimisation-basics/","tags":["posts","WebPerf"],"published":"2015-03-11T19:54:00.000Z","title":"Web Performance Optimisation Basics","excerpt":"Website performance is critical to user experience. We need rules to make it easier.","image":"./images/speed_hero.jpg","indexed":"2024-05-07T11:47:22.116Z"}
...

Note that this ndjson file corresponds to the format required by the Elasticsearch _bulk API. I've set the _id of each document to the path of the page; this is unique in my context, and means that subsequent indexing requests for an existing document will replace the older version in Elasticsearch.
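
Before relying on this file, it's worth a quick sanity check. Here is a minimal Node sketch that verifies each action line is followed by a parseable document line (the filename matches the output above; the script itself is my own hypothetical helper):

/* --- validate-ndjson.mjs (hypothetical helper) --- */
import { readFileSync } from "node:fs";

const lines = readFileSync("_site/esindex.ndjson", "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0);

for (let i = 0; i < lines.length; i += 2) {
  const action = JSON.parse(lines[i]);   // e.g. { "index": { "_id": "/path/" } }
  const doc = JSON.parse(lines[i + 1]);  // the document itself
  if (!action.index?._id || !doc.url) {
    throw new Error(`Unexpected pair at lines ${i + 1}-${i + 2}`);
  }
}
console.log(`Looks good: ${lines.length / 2} documents`);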

Index the data to Elasticsearch

We can validate the bulk data created above by POSTing it to an Elasticsearch instance (assuming you have the relevant environment variables set):

curl -XPOST -H "Content-Type: application/x-ndjson" -H "Authorization: ApiKey $ELASTIC_APIKEY" $ELASTIC_HOST/$ELASTIC_INDEX/_bulk --data-binary @_site/esindex.ndjson

If that works as expected, we can lazily add it to our build command:

# --- netlify.toml --- #
[build]
publish = "_site"
command = "npm run build && curl -XPOST -H \"Content-Type: application/x-ndjson\" -H \"Authorization: ApiKey $EAK\" $ELASTIC_HOST/$ELASTIC_INDEX/_bulk --data-binary @_site/esindex.ndjson"

You will need to add the environment variables to your Netlify configuration, of course!
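
One caveat: curl exits 0 on HTTP error responses unless you pass --fail, and the _bulk API can return 200 even when individual items fail. If you want the build to break when indexing breaks, a small post-build script is sturdier than the inline curl. A sketch, assuming Node 18+ (for the global fetch) and the same environment variables; the filename and wiring are hypothetical:

/* --- index-to-elastic.mjs (hypothetical, e.g. "npm run build && node index-to-elastic.mjs") --- */
import { readFileSync } from "node:fs";

const body = readFileSync("_site/esindex.ndjson", "utf8");
const res = await fetch(`${process.env.ELASTIC_HOST}/${process.env.ELASTIC_INDEX}/_bulk`, {
  method: "POST",
  headers: {
    "Content-Type": "application/x-ndjson",
    "Authorization": `ApiKey ${process.env.ELASTIC_APIKEY}`
  },
  body
});
if (!res.ok) {
  throw new Error(`Bulk request failed: ${res.status}`);
}
const result = await res.json();
if (result.errors) {
  // surface the first failed item so the build log is useful
  const failed = result.items.find((item) => item.index && item.index.error);
  throw new Error(`Bulk indexing errors: ${JSON.stringify(failed)}`);
}
console.log(`Indexed ${result.items.length} documents`);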

Create a search application

Search Applications allow you to create a public API endpoint against an Elasticsearch instance. Our use case is simple, but you can do all sorts with search applications!

I've set my index name $ELASTIC_INDEX to search-simonhearne (original, I know), so creating my search application (called simonhearne) is as simple as a single PUT via Kibana dev tools:

PUT /_application/search_application/simonhearne
{
  "indices": [ "search-simonhearne" ],
  "template": {
    "script": {
      "source": {
        "fields": ["title","tags","url","excerpt","published","image"],
        "query": {
          "bool": {
            "should": [
              {
                "multi_match": {
                  "query": "{{query}}",
                  "fields": ["title^4","tags","excerpt","url"],
                  "fuzziness": "auto"
                }
              },
              {
                "multi_match": {
                  "query": "{{query}}",
                  "fields": ["title^4","tags","excerpt","url"],
                  "type": "phrase_prefix"
                }
              }
            ]
          }
        },
        "from": "{{from}}",
        "size": "{{size}}",
        "highlight": {
          "fields": {
            "title": {},
            "excerpt": {}
          },
          "tags_schema": "styled",
          "fragment_size": 150,
          "number_of_fragments": 1,
          "type": "plain"
        }
      },
      "params": {
        "query": "*",
        "from": 0,
        "size": 10
      }
    }
  }
}

This request is doing quite a lot: it defines the query I want the search application to run (against the post title, tags, excerpt and url) with relevant field weightings, plus a phrase_prefix query to provide a bit of query-completion behaviour. I've also enabled highlighting so that I can show matched fragments in the search results.

We can test this search application with a simple POST against the application endpoint:

POST _application/search_application/simonhearne/_search
{
  "params": {
    "query": "velo"
  }
}
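
For reference, the parts of the response that the client code below relies on look roughly like this (abridged, using the document indexed earlier; the exact highlight markup depends on the highlighter settings above):

/* --- abridged search response shape (values illustrative) --- */
const exampleResponse = {
  hits: {
    total: { value: 1 },
    hits: [
      {
        _index: "search-simonhearne",
        _id: "/2015/web-performance-optimisation-basics/",
        fields: {
          title: ["Web Performance Optimisation Basics"],
          url: ["https://simonhearne.com/2015/web-performance-optimisation-basics/"],
          excerpt: ["Website performance is critical to user experience..."],
          published: ["2015-03-11T19:54:00.000Z"],
          image: ["./images/speed_hero.jpg"]
        },
        highlight: { title: ["...highlighted fragment..."] } // only present on a match
      }
    ]
  }
};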

Implementing the search client

We need to create an API key for the search application to allow public API requests:


POST /_security/api_key
{
  "name": "public api key",
  "role_descriptors": {
    "my-restricted-role-descriptor": {
      "indices": [
        {
          "names": ["simonhearne"],
          "privileges": ["read"]
        }
      ],
      "restriction": {
        "workflows": ["search_application_query"]
      }
    }
  }
}

The response includes an encoded value: the base64-encoded id:api_key pair. With that API key we can now construct a client-side API call:

const searchEndpoint = "<elasticsearch-host>/_application/search_application/simonhearne/_search";
const apiKey = "<base64 apiKey>"; // the `encoded` value from the API key response

// `term` is the user's query, e.g. taken from a search input
const body = {
  "params": {
    "query": term
  }
};
let response = await fetch(searchEndpoint, {
  method: 'POST',
  mode: 'cors',
  headers: {
    'Authorization': `ApiKey ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(body)
});

And render the results (using a template literal function in this case):

const postTemplate = (result, term, position) => `
  <li class="postlist-item" data-query="${term}" data-id="${result._id}" data-index="${result._index}" data-position="${position}">
    <div class="hero-img-container">
      <a href="${result.fields.url[0]}" class="no-outline"><img src="${result.fields.image[0].slice(1)}" width="1200" height="675" onclick="sendClick(this)"></a>
    </div>
    <div class="meta-container">
      <div class="post-title"><a href="${result.fields.url[0]}" class="postlist-link" onclick="sendClick(this)">${(result.hasOwnProperty('highlight') && result.highlight.hasOwnProperty('title')) ? result.highlight.title[0] : result.fields.title}</a></div>
      <time class="postlist-date">${new Date(result.fields.published[0]).toLocaleDateString()}</time>
      <div class="post-description">${(result.hasOwnProperty('highlight') && result.highlight.hasOwnProperty('excerpt')) ? result.highlight.excerpt[0] : result.fields.excerpt}</div>
    </div>
  </li>`;

Note that there is a search application client library which can do this for you; I just prefer hacking away at HTML and JS manually.
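
To tie the pieces together, here is roughly how the fetch, template and analytics calls combine. A sketch: the results element id and the search function name are my own, everything else comes from the snippets above (sendAnalytics is defined below):

/* --- hypothetical glue code --- */
async function search(term) {
  const response = await fetch(searchEndpoint, {
    method: 'POST',
    mode: 'cors',
    headers: {
      'Authorization': `ApiKey ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ params: { query: term } })
  });
  const data = await response.json();
  const hits = data.hits.hits;

  // render each hit with the template literal function above
  document.getElementById('results').innerHTML =
    hits.map((hit, i) => postTemplate(hit, term, i + 1)).join('');

  // report the search to Behavioral Analytics
  sendAnalytics(term, hits, data.hits.total.value);
}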

Enable Analytics

Elastic Behavioral Analytics allows you to capture and view simple analytics of search events. The documentation covers the important stuff so I'll just mention how I've implemented it.

I host the analytics script locally and load it only on the search page, deferred to prevent blocking render:

<script>
  function bootstrapAnalytics() {
    window.elasticAnalytics.createTracker({
      endpoint: "<elasticsearch-host>",
      collectionName: "<collection-name>",
      apiKey: "<apiKey>",
    });
  }
</script>
<script src="../js/behavioral-analytics-browser-tracker.2.0.0.min.js" defer onload="bootstrapAnalytics()"></script>

Tracking search events

On each search execution I run this function to send the relevant analytics event:

function sendAnalytics(query, results, total) {
  if (window.elasticAnalytics) {
    window.elasticAnalytics.trackSearch({
      search: {
        query: query,
        search_application: "simonhearne",
        results: {
          total_results: total,
          items: results.map((res) => {
            return {
              'document': {
                'id': res._id,
                'index': res._index
              },
              'page': {
                'url': res.fields.url[0]
              }
            };
          })
        }
      }
    });
  }
}

Tracking search clicks

Each search result link has an onclick handler to send an analytics beacon, using the data attributes added in the postTemplate documented above:

function sendClick(e) {
  if (window.elasticAnalytics) {
    // walk up from the clicked element to the result <li> carrying the data attributes
    let el = e.closest('[data-index]');
    window.elasticAnalytics.trackSearchClick({
      'document': {
        'id': el.dataset.id,
        'index': el.dataset.index
      },
      'search': {
        'query': el.dataset.query,
        'search_application': 'simonhearne'
      }
    });
  }
}

Results

You should now have a dashboard in Kibana with your search analytics!

[Screenshot: Kibana search analytics dashboard]

Productionising

In my case I just wanted something that worked, and didn't take too much care over production readiness. There are a few further tasks I would complete for a 'proper' deployment:

  1. delete the source ndjson file after indexing (see the sketch after this list)
  2. configure separate indices for lower environments
  3. take language into account
  4. tune the search query based on the analytics results
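
The first item can be tiny: a sketch, assuming a Node step runs after the indexing request in the build command (the path matches the file generated earlier):

/* --- cleanup sketch: remove the bulk file before Netlify publishes _site --- */
import { unlinkSync } from "node:fs";
unlinkSync("_site/esindex.ndjson");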

Overall I spent about three hours building this search solution, with an existing Elasticsearch cluster ready to use.