How to integrate Algolia with your Headless CMS

May 25, 2021

If you want to build a simple search interface for your website, you can use Storyblok's Content Delivery API and its query filters. This API is fast and easy to implement, and it also lets you sort the results by any attribute in the schema of your Stories.
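As a rough illustration of such a simple search, the sketch below builds a request URL for the Content Delivery API in Node.js. The token, the `post` component name, and the `content.title` attribute are placeholders, and the helper name `buildSearchUrl` is made up for this example; check the parameters against the API documentation before relying on them.

```javascript
// Build a Content Delivery API (v2) request URL for a simple search.
// All concrete values passed in below are placeholders for illustration.
function buildSearchUrl({ token, term, component, sortBy }) {
  const url = new URL("https://api.storyblok.com/v2/cdn/stories");
  url.searchParams.set("token", token);
  url.searchParams.set("version", "published");
  // Full-text search across the content of the Stories.
  if (term) url.searchParams.set("search_term", term);
  // Query filter: keep only Stories built with the given component.
  if (component) url.searchParams.set("filter_query[component][in]", component);
  // Sort by an attribute of the schema, for example "content.title:asc".
  if (sortBy) url.searchParams.set("sort_by", sortBy);
  return url.toString();
}

const exampleSearchUrl = buildSearchUrl({
  token: "YOUR_PUBLIC_TOKEN",
  term: "algolia",
  component: "post",
  sortBy: "content.title:asc",
});
console.log(exampleSearchUrl);
```

The resulting URL can be requested with any HTTP client; the matching Stories are returned in the `stories` array of the JSON response.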

If your search interface requires more advanced features, one of the best options is Algolia, an AI-powered search service. Algolia provides a high-performance API, ready-made widgets, and many features for customizing the search experience so that your users get the most relevant results.

In this article, we’ll go through the necessary steps for indexing your content in Algolia and keeping it up-to-date. Since both Storyblok and Algolia allow complete freedom in terms of technology, we won’t go through too much code, so you can use your favorite programming language and adapt the ideas from this tutorial.

LEARN:

You might also be interested in our article on how to index entries from Storyblok with Algolia.

Setting up Algolia

In Algolia, we just have to create a new index and populate it via API.

Figure: Creation of a new index in Algolia and how to populate it via API.

After creating the index, we can customize its settings according to our needs. We can select searchable fields, facets, sorting criteria, and many other parameters we might need.
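The settings can also be applied from code instead of the dashboard. The following is a minimal sketch, assuming an `index` object that exposes `setSettings()`, as the index object of the algoliasearch v4 JavaScript client does. The attribute names (`title`, `excerpt`, `body`, `tags`, `published_at_timestamp`) are hypothetical and depend on the records we decide to store.

```javascript
// Example index settings. Every attribute name here is hypothetical.
const indexSettings = {
  // Attributes are searched in this order of importance.
  searchableAttributes: ["title", "excerpt", "body"],
  // Attributes that can be used as facets or filters.
  attributesForFaceting: ["component", "searchable(tags)"],
  // Tie-breaker applied after textual relevance (needs a numeric attribute).
  customRanking: ["desc(published_at_timestamp)"],
  // Fields returned with each hit, enough to render a result preview.
  attributesToRetrieve: ["title", "excerpt", "image", "full_slug"],
};

// `index` is assumed to expose setSettings(), like an algoliasearch v4 index.
async function applySettings(index, settings = indexSettings) {
  return index.setSettings(settings);
}
```

Keeping the settings in code makes them easy to review and to reapply if the index ever has to be recreated.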

Setting up the content in Storyblok

We don’t need to change the schema in Storyblok to index our content in Algolia.
We just have to make sure we have all the fields needed to build the previews in the results listing. For example, we might need an image and an excerpt for each page. We have to index these fields in Algolia and then retrieve them from the search results.

How to index the content in Algolia

To populate the index in Algolia with the content of our project, we have to set up a serverless function on a cloud services provider of our choice. We will see later how to trigger this function whenever we need to update the index.
There are three main ways to index our website: indexing the Story objects as they are, manipulating them before sending them to Algolia, or crawling the live pages of the website.

Indexing full story objects

We can store the Story objects directly in Algolia, fetching them from the Content Delivery API. Our tutorial on how to index entries from Storyblok with Algolia shows how this can be achieved with a few lines of Node.js code. In this case, the structure of the content in Algolia will be exactly the same as in the CMS.
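A minimal sketch of this first option could look like the following. It is not the code from the tutorial: the function names are made up for this example, `fetchImpl` defaults to the global `fetch` of Node.js 18+, and `index` is assumed to expose `saveObjects()` as the algoliasearch v4 index object does. The only change to each Story is an added `objectID`, which Algolia uses to identify a record.

```javascript
const PER_PAGE = 100; // largest page size the Content Delivery API accepts

// Fetch every published Story of a space, one page at a time.
async function fetchAllStories(token, fetchImpl = fetch) {
  const stories = [];
  for (let page = 1; ; page += 1) {
    const url = new URL("https://api.storyblok.com/v2/cdn/stories");
    url.searchParams.set("token", token);
    url.searchParams.set("version", "published");
    url.searchParams.set("per_page", String(PER_PAGE));
    url.searchParams.set("page", String(page));
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error(`Storyblok responded with status ${response.status}`);
    }
    const body = await response.json();
    stories.push(...body.stories);
    // A page that is not full means there is nothing left to fetch.
    if (body.stories.length < PER_PAGE) return stories;
  }
}

// Store the Story objects as they are, using the Story id as objectID.
async function indexStories(index, stories) {
  const records = stories.map((story) => ({
    ...story,
    objectID: String(story.id),
  }));
  return index.saveObjects(records);
}
```

Inside the serverless function, the two steps could be chained as `indexStories(index, await fetchAllStories(token))`.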

Indexing manipulated story objects

Another option, similar to the previous one, is to fetch and manipulate the Story objects before sending them to Algolia. We can still use the Content Delivery API for this; we just need to add an extra step.
Why should we manipulate the data? Some examples of what we might want to do are:

Removing fields from the schema;
Replacing objects with strings, for example, data from custom type fields;
Converting rich-text and markdown fields to plain text;
Replacing relations with just the name of the linked entry.

If we are using Node.js, the example from the first option still works; we just need to map the array of Stories right before calling the Algolia API.
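The mapping step might look like the sketch below. The field names (`title`, `excerpt`, `image`, `body`, `author`) are hypothetical and depend on the schema of the project; the example assumes that `body` is a rich-text field, `image` is an asset field, and `author` is a relation that has already been resolved to a Story object.

```javascript
// Recursively collect the text nodes of a rich-text document.
function richTextToPlain(node) {
  if (!node || typeof node !== "object") return "";
  if (node.type === "text") return node.text || "";
  const children = Array.isArray(node.content) ? node.content : [];
  // Inline text is concatenated as it is; sibling blocks are separated by a space.
  const separator = children.some((child) => child.type === "text") ? "" : " ";
  return children.map(richTextToPlain).join(separator);
}

// Turn a Story into a flat Algolia record (all content fields are hypothetical).
function toRecord(story) {
  const content = story.content || {};
  return {
    objectID: String(story.id),
    name: story.name,
    full_slug: story.full_slug,
    title: content.title,
    excerpt: content.excerpt,
    // Replace the asset object with just its URL.
    image: content.image ? content.image.filename : undefined,
    // Convert rich text to plain text and normalize whitespace.
    body: richTextToPlain(content.body).replace(/\s+/g, " ").trim(),
    // Replace the resolved relation with just the name of the linked entry.
    author: content.author ? content.author.name : undefined,
  };
}
```

With this in place, the array sent to Algolia becomes `stories.map(toRecord)`, and any field not listed in `toRecord` is left out of the index.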

Indexing content from live pages

We can use a crawler to read the pages of the live site. Two advantages of this approach are:

Indexing content based on the HTML tags. The entries in Algolia could have a property for each one of the HTML tags we want to store and we can customize the ranking of the results giving more weight to some tags;
Reading the actual content of the pages, which might be generated by combining several Stories linked together with relations, or which might get data from other API sources.
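To illustrate the idea of one property per HTML tag, here is a deliberately naive sketch that turns the HTML of a crawled page into a record. It uses a regular expression only to stay free of dependencies; a real crawler should use a proper HTML parser, and the function names and the choice of tags are assumptions made for this example.

```javascript
// Collect the text of every <tag>...</tag> pair found in an HTML string.
// Naive on purpose: it ignores nesting of the same tag and malformed markup.
function textOf(html, tag) {
  const pattern = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  return [...html.matchAll(pattern)].map((match) =>
    match[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim()
  );
}

// Build a record with one property for each tag we want to store.
function pageToRecord(url, html) {
  return {
    objectID: url,
    url,
    h1: textOf(html, "h1"),
    h2: textOf(html, "h2"),
    p: textOf(html, "p"),
  };
}

// In Algolia, attributes listed first weigh more, so headings outrank paragraphs.
const crawlerIndexSettings = { searchableAttributes: ["h1", "h2", "p"] };
```

The order of `searchableAttributes` is what gives more weight to some tags than to others when ranking the results.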

Figure: An entry indexed in Algolia.

The best approach for you

None of these approaches is universally the best, so we need to consider carefully which kind of search we are building. The first two options might be best if our search covers a specific section of the site (for example, a blog or a portfolio), while crawling the live pages might be more suitable for a site-wide full-text search because it will probably produce a better ranking of the results.

How to trigger the indexing script

We can use Storyblok’s webhooks to trigger the serverless function. In the settings of our Space, we can define a URL to which Storyblok will send a POST request once a Story has been published, unpublished, or deleted.

Figure: Webhooks configuration in the Storyblok App.

The webhook sends a payload containing the action performed in Storyblok (a Story being published, unpublished, or deleted), the Space ID, and the Story ID.

Webhook payload:

{
  "action": "published",
  "text": "The user test@domain.com published the Story XYZ (xyz)",
  "story_id": 123,
  "space_id": 123
}

Since the payload gives us a reference to the Story, we can index only the content that changed. Our function needs to read the payload and, if the story_id property is set, fetch just that Story and update it in Algolia. This way we don’t reindex the whole Space every time, so the operation completes much faster and the serverless function runs for a shorter time, which saves money.
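The single-Story update could be sketched as follows. The payload above only shows the `published` action; the `unpublished` and `deleted` values are assumed by analogy and should be checked against the webhooks documentation. The `index` object is assumed to expose `saveObject()` and `deleteObject()` like the algoliasearch v4 index object, and both function names are made up for this example.

```javascript
// Fetch one published Story by its numeric id from the Content Delivery API.
async function fetchStoryById(id, token, fetchImpl = fetch) {
  const url = `https://api.storyblok.com/v2/cdn/stories/${id}?token=${encodeURIComponent(token)}`;
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Storyblok responded with status ${response.status}`);
  }
  return (await response.json()).story;
}

// React to a webhook payload by updating or removing a single record.
// `fetchStory` is injected so that the handler is easy to test.
async function handleStoryEvent(payload, { index, fetchStory }) {
  const { action, story_id: storyId } = payload;
  if (!storyId) return "ignored";
  const objectID = String(storyId);
  // Assumed action names for Stories that should leave the index.
  if (action === "unpublished" || action === "deleted") {
    await index.deleteObject(objectID);
    return "deleted";
  }
  if (action === "published") {
    const story = await fetchStory(storyId);
    await index.saveObject({ ...story, objectID });
    return "saved";
  }
  return "ignored";
}
```

Using the Story id as `objectID` matters here: an unpublished or deleted Story can no longer be fetched, so the id in the payload is the only key available for removing its record.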

Another way to index the website is to create a button in Storyblok that editors can use to reindex the whole website. We have to install an app called Task Manager in our Space, which adds a new section called Tasks to the sidebar. Inside that section, we will add a new task:

Figure: Defining a new Task in the Storyblok App.

In the form, we will add a name for the Task, an explanatory description for the editors, and the URL of the serverless function. We can then leave the dialog field blank.

Figure: Overview of defined tasks in the Storyblok App.

This task takes longer to run than indexing a single Story, but it is useful for the initial indexing of the project, or for triggering the function manually after updating the content of several Stories.
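If the same serverless function serves both the webhook and the Task, it has to decide which kind of indexing to run. The sketch below makes one simple assumption: a request body that carries a `story_id` comes from the webhook, and anything else is treated as a request for a full reindex. The exact body sent by a Task is not covered here, and the names `chooseIndexingMode`, `runIndexing`, `reindexStory`, and `reindexAll` are hypothetical.

```javascript
// Decide what to do with an incoming request body.
function chooseIndexingMode(payload) {
  if (payload && payload.story_id) {
    return { mode: "single", storyId: payload.story_id };
  }
  // No Story reference: assume a manual trigger and reindex everything.
  return { mode: "full" };
}

// Run the matching job. Both jobs are injected by the caller.
async function runIndexing(payload, { reindexStory, reindexAll }) {
  const decision = chooseIndexingMode(payload);
  if (decision.mode === "single") {
    await reindexStory(payload);
  } else {
    await reindexAll();
  }
  return decision.mode;
}
```

Keeping the decision in a small pure function makes it easy to test without calling either API.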

Both Storyblok’s and Algolia’s APIs are very fast, but keeping the execution time of our function as low as possible will still save time and money.

Resource	Link
How to index entries from Storyblok Tutorial	https://www.storyblok.com/tp/index-storyblok-algolia
Storyblok Webhooks	https://www.storyblok.com/docs/guide/in-depth/webhooks
Storyblok CDN API	https://www.storyblok.com/docs/api/content-delivery/v2
Algolia Indexing API	https://www.algolia.com/doc/api-client/methods/indexing/

Author

Christian Zoppi

Christian is a full-stack developer and he's the Head of the Website & Developer Experience department. He's from Viareggio, Italy.