
Build a GEO-ready website with Storyblok and Astro


In the first part of this series, Master SEO with Storyblok and Astro, we covered SEO classics, including meta tags, Open Graph, and structured metadata (JSON-LD). Now it's time to introduce a new, complementary approach, dubbed generative engine optimization (GEO).

The technical implementation of GEO is almost identical to that of SEO. Only the end goal differs: instead of ranking at the top of Google's or Bing's search results, you optimize your content to be surfaced by AI chatbots, such as ChatGPT, Perplexity, or Gemini.

hint:

Check out the first part of the series to learn how to wire your Storyblok CMS and Astro frontend so content editors can quickly generate meaningful structured data for search engines, social media platforms, and LLMs.

When they trawl the web, bots scrape pages and strip away the presentational layer (scripts, styles, navigation, etc.) to extract the raw content. They then break that content into chunks.
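To make that process concrete, here's a deliberately simplified sketch of what a crawler might do with a page. It drops non-content elements, flattens the rest to text, and splits the text into fixed-size chunks. The regexes, the list of dropped tags, and the word-based chunk size are illustrative assumptions. Real crawlers use proper HTML parsers and usually chunk by tokens.

```javascript
// Simplified sketch of a crawler's cleanup step. Real crawlers use full HTML
// parsers and token-based chunking; these regexes are for illustration only.
function stripPresentation(html) {
  return html
    // Drop non-content elements together with everything inside them
    .replace(/<(script|style|nav|header|footer)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Drop the remaining tags but keep their text
    .replace(/<[^>]+>/g, ' ')
    // Collapse runs of whitespace into single spaces
    .replace(/\s+/g, ' ')
    .trim();
}

// Splits text into chunks of at most maxWords words
function chunkWords(text, maxWords) {
  const words = text.split(' ').filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}

const page =
  '<html><head><style>p{color:red}</style></head><body>' +
  '<nav><a href="/">Home</a></nav>' +
  '<main><h1>Title</h1><p>Some useful text here.</p></main>' +
  '<script>track()</script></body></html>';

console.log(chunkWords(stripPresentation(page), 3));
```

The navigation, styles, and script disappear, and only the headline and paragraph text reach the chunks. That's why content a bot can read directly is worth providing.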

Simplifying this process by providing structured metadata (as part of the HTML markup and in a separate file, like llms.txt) helps all parties: your customers, yourself, and the LLM vendors.

That's what we focus on in this part. We review the new spec, learn how it maps to Storyblok's content model, and build a practical demo that fetches content from Storyblok to generate automated llms.txt files in Astro.

hint:

You can find all the code samples and content schema in a dedicated GitHub repository.

What you need to know about GEO

The term GEO may have been coined by a group of researchers who published a paper titled GEO: Generative Engine Optimization in November 2023.

While the authors mention that "Generative Engines utilize generative models conditioned on website content", LLM providers have since incorporated real-time search functionality (sourced from both Google and Bing), which relies on search engines' decades-old algorithms and conventions.

Still, GEO adds one new tool to marketers' and developers' toolboxes: llms.txt. Proposed in September 2024 by data scientist Jeremy Howard, this formatting standard has gained traction among online businesses and IDEs.

"We propose adding a /llms.txt markdown file to websites to provide LLM-friendly content. This file offers brief background information, guidance, and links to detailed markdown files."

This separation of concerns has a few benefits:

  • LLMs can consume and process your content without repeatedly crawling multiple pages. Fewer requests also mean a lighter load on your servers.
  • When generated automatically at build time, llms.txt is a flat-file representation of your content, served in a semantic structure.
  • The format won’t negatively impact your SEO. Instead, it will complement your existing SEO strategy.

Hundreds of organizations have already adopted this unofficial standard, including Astro (llms.txt and llms-full.txt), Next.js, Svelte, Anthropic, Cloudflare, Netlify, Vercel, Cloudinary, Shopify, NVIDIA, Stripe, Tiptap, and many more. And it's not reserved for technology companies or documentation hubs; the standard serves businesses across industries, from tourism to restaurants to law firms.

learn:

In this context, a headless CMS offers a significant advantage over a monolithic CMS: it decouples presentation from data, supporting structured data by design. Furthermore, a composable architecture based on modular components lets you reshape this data however you like. This means you can focus on delivering a human-first experience in the visible content layer while generating JSON-LD and an llms.txt file for bots.

The llms.txt spec

Despite its name and file extension, llms.txt is a markdown file that should follow the proposed spec:

  • An h1 with the name of the project or site (required)
  • A blockquote with a summary of the project and key information about the file
  • A list of links to relevant pages nested under h2 sections
    • Each link can be followed by a colon and a short description
  • An h2 section named Optional, whose links tools can skip when a shorter context is needed
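The spec maps naturally onto a small data structure. The following sketch renders an llms.txt string from a plain object. The helper names (formatLink, buildLlmsTxt) and the object shape (title, summary, details, sections, optional) are hypothetical and not part of any library.

```javascript
// Hypothetical helper, not part of any library: renders an llms.txt string
// from a plain object that mirrors the sections of the spec.
function formatLink({ name, url, description }) {
  // "- [name](url)", optionally followed by ": description"
  return description ? `- [${name}](${url}): ${description}` : `- [${name}](${url})`;
}

function buildLlmsTxt({ title, summary, details = '', sections = {}, optional = [] }) {
  const parts = [`# ${title}`, `> ${summary}`]; // required h1, then the blockquote summary
  if (details) parts.push(details); // free-form notes (no headings)
  for (const [heading, links] of Object.entries(sections)) {
    parts.push(`## ${heading}\n\n${links.map(formatLink).join('\n')}`);
  }
  if (optional.length > 0) {
    // Secondary links that tools may skip when context is tight
    parts.push(`## Optional\n\n${optional.map(formatLink).join('\n')}`);
  }
  return parts.join('\n\n') + '\n';
}

const llmsTxt = buildLlmsTxt({
  title: 'Example Co.',
  summary: 'Example Co. makes example products.',
  sections: {
    Guides: [{ name: 'Getting started', url: 'https://example.com/start', description: 'Setup in five minutes' }],
  },
  optional: [{ name: 'Homepage', url: 'https://example.com' }],
});
console.log(llmsTxt);
```

Keeping Optional as the last section makes it easy for a tool to drop the tail of the file when context runs short.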

The result would look something like this:

llms.txt
# Storyblok

> Storyblok is a headless CMS that enables marketers and developers to create with joy and succeed in the AI-driven content era. It empowers you to deliver structured and consistent content everywhere: websites, apps, and more.

- Marketers get a visual editor with reusable components, in-context preview, and workflows to launch fast.
- Developers can integrate their favorite frameworks through the API-first platform.
- Brands get one source of truth for accurate, flexible, and measurable content.

## Technology guides

- [Use Astro with Storyblok](https://www.storyblok.com/docs/guides/astro/): Detailed instructions on integrating Storyblok with Astro. The guide walks you through setting up the SDK and using the Content Delivery API to manage an Astro website.
- [Use Next.js with Storyblok](https://www.storyblok.com/docs/guides/nextjs/): Detailed instructions on integrating Storyblok with the Next.js App Router. The guide walks you through setting up the SDK and using the Content Delivery API to manage a Next.js website.

## Optional

- [The stories endpoint](https://www.storyblok.com/docs/api/content-delivery/v2/stories/): API reference and examples for fetching single or multiple stories from the Content Delivery API.
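To see the same structure from a consumer's side, here's a hypothetical parser (parseLlmsTxt is not an existing library function). It reads the h1, the first blockquote line, and the link items under each h2. It accepts link destinations with or without angle brackets, since CommonMark allows both. It ignores everything else, including multi-line blockquotes.

```javascript
// Hypothetical parser for the llms.txt shape described above. It reads the h1,
// the first blockquote line, and "- [name](url): notes" items under each h2.
function parseLlmsTxt(text) {
  const result = { title: null, summary: null, sections: {} };
  let current = null; // heading of the h2 section being read

  for (const line of text.split('\n')) {
    if (line.startsWith('# ') && result.title === null) {
      result.title = line.slice(2).trim();
    } else if (line.startsWith('> ') && result.summary === null) {
      result.summary = line.slice(2).trim();
    } else if (line.startsWith('## ')) {
      current = line.slice(3).trim();
      result.sections[current] = [];
    } else if (current !== null) {
      // Link destination may be wrapped in <...>; notes after ":" are optional
      const match = line.match(/^- \[([^\]]+)\]\(<?([^)>\s]+)>?\)(?::\s*(.*))?$/);
      if (match) {
        result.sections[current].push({ name: match[1], url: match[2], notes: match[3] ?? '' });
      }
    }
  }
  return result;
}

const sample = [
  '# Storyblok',
  '',
  '> Storyblok is a headless CMS.',
  '',
  '## Technology guides',
  '',
  '- [Use Astro with Storyblok](https://www.storyblok.com/docs/guides/astro/): Detailed instructions.',
  '',
  '## Optional',
  '',
  '- [The stories endpoint](<https://www.storyblok.com/docs/api/content-delivery/v2/stories/>): API reference.',
].join('\n');

console.log(parseLlmsTxt(sample));
```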

That's about all the theory you need to follow the practical part of this tutorial. Let's start building.

Create the schema in Storyblok

The first step is to prepare a content model that maps Storyblok's blocks and fields to the corresponding sections in the spec.

The example used throughout this tutorial is a modified version of the core blueprint with the following schema:

  • A Content type block named page with two fields:
  • A Nestable block named teaser with one field:
    • A Text field named headline

We also have two folders: articles and posts. Which content (stories or folders) you prioritize depends on your marketing strategy and business goals.

The contents of the headline correspond to the h1 title, the summary to the descriptive information about each link, and the section h2 heading is generated from the folder name.
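As a rough illustration of the folder-to-section mapping, the sketch below groups stories into h2 sections named after the first segment of their full_slug. The story objects are trimmed-down mocks containing only name, full_slug, and parent_id. folderLabel and groupByFolder are hypothetical helpers.

```javascript
// Capitalizes the first path segment: "articles/first-article" -> "Articles"
function folderLabel(fullSlug) {
  const folder = fullSlug.split('/')[0];
  return folder.charAt(0).toUpperCase() + folder.slice(1);
}

// Groups stories inside folders under their section heading, preserving input order.
// Root-level stories (parent_id 0 or null) are skipped.
function groupByFolder(stories) {
  const sections = new Map();
  for (const story of stories) {
    if (!story.parent_id) continue;
    const label = folderLabel(story.full_slug);
    if (!sections.has(label)) sections.set(label, []);
    sections.get(label).push(story.name);
  }
  return sections;
}

// Mock stories with only the fields the helpers read
const stories = [
  { name: 'About', full_slug: 'about', parent_id: 0 },
  { name: 'First article', full_slug: 'articles/first-article', parent_id: 11 },
  { name: 'First post', full_slug: 'posts/first-post', parent_id: 12 },
  { name: 'Second article', full_slug: 'articles/second-article', parent_id: 11 },
];

console.log(groupByFolder(stories));
```

Grouping first ensures each folder produces exactly one h2 heading, even when its stories are not adjacent in the API response.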

Generate the files in Astro

Astro offers a way to create custom endpoints that can serve any type of data; use it to generate two files: the basic llms.txt and a complete archive in llms-full.txt.

Since this project uses static site generation (SSG), these files are generated at build time. The spec expects the file at the site root (/llms.txt), so place the endpoints directly in src/pages/.

Both files fetch content from the stories API endpoint, taking advantage of its native sorting and filtering options to create a structure that adheres to the spec. Alternatively, many organizations opt to recreate a sitemap in markdown. In that scenario, we recommend the links API endpoint, which returns a minimal response.
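If you go the sitemap route, the conversion can stay very small. The sketch below assumes a links API response shaped like { links: { [uuid]: { name, slug, is_folder, ... } } }, where slug holds the full path. Verify the exact fields against the links API reference before relying on them. linksToMarkdown is a hypothetical helper, and the mock object stands in for a real API call.

```javascript
// Sketch: turns a links API-style response into a markdown sitemap.
// Assumed shape: { links: { [uuid]: { name, slug, is_folder } } }, with slug
// holding the full path. Verify against the links API reference before use.
function linksToMarkdown(response, baseUrl) {
  return Object.values(response.links)
    .filter((link) => !link.is_folder) // folders are containers, not pages
    .sort((a, b) => a.slug.localeCompare(b.slug)) // alphabetical by path
    .map((link) => `- [${link.name}](${baseUrl}/${link.slug})`)
    .join('\n');
}

// Mock response standing in for a real API call
const response = {
  links: {
    'uuid-1': { name: 'About', slug: 'about', is_folder: false },
    'uuid-2': { name: 'Articles', slug: 'articles', is_folder: true },
    'uuid-3': { name: 'First article', slug: 'articles/first-article', is_folder: false },
  },
};

console.log(linksToMarkdown(response, 'https://example.com'));
```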

learn:

The code in this tutorial uses some of the new features introduced in Storyblok's latest Astro SDK version (@storyblok/astro v7.3.0). Learn more in the announcement post.

llms.txt

Let's review the code for llms.txt:

src/pages/llms.txt.js
import { storyblokApi } from '@storyblok/astro/client';
import { extractStoryMeta } from '../utils/extractStoryMeta';

// Formats one story as a spec-compliant list item: "- [name](url): description"
const toLink = (story) =>
	`- [${story.headline}](https://example.com/${story.slug}): ${story.summary}`;

export const GET = async () => {
	try {
		const stories = await storyblokApi.getAll('cdn/stories', {
			sort_by: 'name:asc', // You can do all sorts of sorting. Learn more here: https://www.storyblok.com/docs/api/content-delivery/v2/stories/examples/sorting-by-story-object-property
			version: 'draft', // 'draft' requires a preview access token; use 'published' in production
		});

		// Keeps stories not in folders (parent_id is 0 or null), such as Home and About
		const mainStories = stories.filter((story) => !story.parent_id);

		// Keeps stories in folders, such as articles/ or posts/
		const childStories = stories.filter((story) => story.parent_id);

		const mainExtract = mainStories.map((story) => extractStoryMeta(story));
		const childExtract = childStories.map((story) =>
			extractStoryMeta(story, {
				// "articles/first-article" becomes "Articles"
				folder: story.full_slug
					.split('/')[0]
					.replace(/^./, (firstLetter) => firstLetter.toUpperCase()),
			}),
		);

		// Groups child stories by folder so each h2 section is rendered only once
		const sections = childExtract.reduce((groups, story) => {
			(groups[story.folder] ??= []).push(story);
			return groups;
		}, {});

		const body = `# Feeding the bots with Storyblok

> A tutorial for developers interested in generating an \`llms.txt\` file using a combination of Storyblok and Astro.

This file contains a list of links to all relevant sections of the tutorial, serving as a sitemap for LLMs.

***

${mainExtract.map(toLink).join('\n')}
${Object.entries(sections)
	.map(([folder, items]) => `\n## ${folder}\n\n${items.map(toLink).join('\n')}`)
	.join('\n')}

## Optional

- [Homepage](https://example.com)
`;
		return new Response(body, {
			headers: {
				'Content-Type': 'text/plain; charset=utf-8',
			},
		});
	} catch (error) {
		return new Response(`Failed to generate llms.txt\n\n${error}`, {
			status: 500,
		});
	}
};

So what's going on here?

To reconstruct the suggested markup, filter the fetched stories into two sets:

  1. mainStories: stories not inside a folder, selected by keeping only stories whose parent_id is 0 or null.
  2. childStories: stories inside folders, selected with the opposite condition (a non-empty parent_id).
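The filters rely on JavaScript truthiness, so !story.parent_id treats both 0 and null as "no parent". A quick check with mock objects, which contain only name and parent_id, shows the split:

```javascript
// Mock stories: only the fields the filters read
const mockStories = [
  { name: 'Home', parent_id: 0 },
  { name: 'About', parent_id: null },
  { name: 'First article', parent_id: 42 },
];

// Same predicates as in llms.txt.js
const mainStories = mockStories.filter((story) => !story.parent_id);
const childStories = mockStories.filter((story) => story.parent_id);

console.log(mainStories.map((story) => story.name)); // [ 'Home', 'About' ]
console.log(childStories.map((story) => story.name)); // [ 'First article' ]
```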

Then, extract the relevant fields from each set with the extractStoryMeta helper function, which we cover below. For now, know that it extracts the headline, slug, and summary.

This is enough for mainExtract, but childExtract also needs the folder name, which is used to render the section's h2 heading. To get it, take the first segment of the story's full_slug and capitalize its first letter. This transforms something like articles/first-article into Articles.

Why use a utility function?

Before moving on to the final part, constructing the body, let's take a short detour to explain extractStoryMeta.

First, the code:

src/utils/extractStoryMeta.js
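// Picks the fields llms.txt and llms-full.txt need from a story object
export const extractStoryMeta = (story, extra) => ({
	headline: story.name,
	// The home story maps to the site root, so its slug becomes an empty string
	slug: story.slug === 'home' ? '' : story.full_slug,
	// Falls back to a placeholder when the summary field is empty
	summary: story.content?.summary || 'The page description',
	// Merges any additional data, such as the folder name
	...extra,
});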

The function takes two arguments: the story object returned by the API and an optional extra object with any additional data you need. In this case, that's the capitalized folder name.

Since these properties are used in multiple places, keeping them in a separate utility makes them easier to maintain.
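To see what the helper returns, you can call it with trimmed-down mock stories. The snippet repeats the function so it runs on its own. The mock objects include only the fields the helper reads.

```javascript
// Copy of the extractStoryMeta helper, so this snippet runs on its own
const extractStoryMeta = (story, extra) => ({
  headline: story.name,
  slug: story.slug === 'home' ? '' : story.full_slug,
  summary: story.content?.summary || 'The page description',
  ...extra,
});

// Mock stories with only the fields the helper reads
const home = { name: 'Home', slug: 'home', full_slug: 'home', content: {} };
const article = {
  name: 'First article',
  slug: 'first-article',
  full_slug: 'articles/first-article',
  content: { summary: 'An introduction to llms.txt.' },
};

console.log(extractStoryMeta(home));
console.log(extractStoryMeta(article, { folder: 'Articles' }));
```

The home story ends up with an empty slug, so its link points at the site root. It also falls back to the placeholder summary, because its content has no summary field. The article keeps its full path and gains the folder property from the extra object.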

Now, back to llms.txt. The final task is to integrate everything into a response body that follows the spec: the project name, key information about it, and separate lists of links.

Run the build command to review the generated file. If you followed this tutorial and have the same content model in your space, the result should resemble the demo file (dist/llms.txt).
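Besides reading the file yourself, you could run a quick structural check. The following sketch is a hypothetical lint (checkLlmsTxt is not an existing tool). It covers only the basics described above: a leading h1, a blockquote summary, link-only list items under h2 sections, and no duplicate h2 headings.

```javascript
// Hypothetical lint: checks a generated llms.txt against the basic shape of the spec.
// Returns an array of human-readable problems; an empty array means none were found.
function checkLlmsTxt(text) {
  const lines = text.split('\n');
  const problems = [];

  const firstContent = lines.find((line) => line.trim() !== '');
  if (!firstContent || !firstContent.startsWith('# ')) {
    problems.push('The file should start with an h1 (# Title).');
  }
  if (!lines.some((line) => line.startsWith('> '))) {
    problems.push('No blockquote summary (> ...) found.');
  }

  let inSection = false;
  lines.forEach((line, index) => {
    if (line.startsWith('## ')) inSection = true;
    // Inside h2 sections, list items should be markdown links
    if (inSection && line.startsWith('- ') && !/^- \[[^\]]+\]\([^)]+\)/.test(line)) {
      problems.push(`Line ${index + 1}: list item is not a link.`);
    }
  });

  if (lines.filter((line) => line.startsWith('# ')).length > 1) {
    problems.push('Only one h1 is expected.');
  }
  const headings = lines.filter((line) => line.startsWith('## '));
  if (new Set(headings).size !== headings.length) {
    problems.push('Duplicate h2 sections found.');
  }
  return problems;
}

// In Node, after the build, you could check the real file instead:
// import { readFileSync } from 'node:fs';
// console.log(checkLlmsTxt(readFileSync('dist/llms.txt', 'utf8')));
const sample =
  '# Demo\n\n> Summary.\n\n## Articles\n\n- [One](https://example.com/one): First.\n\n' +
  '## Optional\n\n- [Homepage](https://example.com)\n';
console.log(checkLlmsTxt(sample)); // []
```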

llms-full.txt

To recreate a single file archive of your site, you'd likely need to handle rich text. That's where Storyblok's richtext package, with its support for custom resolvers, comes into play.

To use it, create a helper utility that transforms the HTML generated from the rich text field into markdown. Before diving into the code that produces llms-full.txt, let's take a moment to understand what this helper does.

Before you start, install turndown, a library that converts HTML to CommonMark markdown. To handle tables, add a plugin that extends it with GitHub Flavored Markdown (GFM) table support.

npm install turndown @joplin/turndown-plugin-gfm --save-dev

Here's the full code:

src/utils/html2md.js
import { richTextResolver } from '@storyblok/astro';
import TurndownService from 'turndown';
import { tables } from '@joplin/turndown-plugin-gfm';

// Renders the teaser block as its headline only
const teaserBlock = (blok) => {
	return `${blok.headline}`;
};

// Custom resolver: turns blocks embedded in the rich text field into strings
const mdResolver = richTextResolver({
	resolvers: {
		blok: (node) => {
			const componentBody = node.attrs?.body;
			if (!Array.isArray(componentBody)) return '';

			return componentBody
				.map((blok) => {
					if (blok.component === 'teaser') {
						return teaserBlock(blok);
					}

					// Handle any other block allowed inside the rich text field

					return '';
				})
				.join('');
		},
	},
});

// Configured once and reused for every story
const turndownService = new TurndownService({
	bulletListMarker: '-',
	codeBlockStyle: 'fenced',
	emDelimiter: '*',
	fence: '```',
	headingStyle: 'atx',
});
// Adds GitHub Flavored Markdown table support
turndownService.use(tables);

// Rich text field -> HTML (via the resolver) -> markdown (via turndown)
export const convertedMarkdown = (richTextField) => {
	const content = mdResolver.render(richTextField);
	return turndownService.turndown(content);
};

Let's break down what the helper does.

Suppose your content team needs to add the teaser block inside the rich text field. Use Storyblok's richTextResolver to create a custom resolver that handles this block and extracts its headline.

Next, convert the resolver's HTML output to markdown with turndown, configuring the output to match your markdown style. Finally, register the tables plugin.

learn:

You can skip turndown and write a custom resolver to fit your structure and markdown formatting. Find the sample implementation in the GitHub repository.
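For a sense of what skipping the HTML round trip involves, the sketch below walks the rich text JSON directly. It is not the repository's implementation and does not use richTextResolver. It assumes the ProseMirror-style node structure of Storyblok's rich text field (type, content, text, marks, attrs). It covers only a handful of node and mark types; anything else falls back to its children.

```javascript
// Hypothetical converter (not the repository's implementation, not richTextResolver):
// walks Storyblok rich text JSON directly and emits markdown for a subset of nodes.

// Wraps text in markdown syntax for each supported mark, in order
function marksToMd(text, marks) {
  return (marks ?? []).reduce((out, mark) => {
    switch (mark.type) {
      case 'bold':
        return `**${out}**`;
      case 'italic':
        return `*${out}*`;
      case 'code':
        return `\`${out}\``;
      case 'link':
        return `[${out}](${mark.attrs?.href ?? ''})`;
      default:
        return out; // unsupported marks keep the plain text
    }
  }, text);
}

function nodeToMd(node) {
  // Concatenates the markdown of a node's children
  const inner = (n) => (n.content ?? []).map(nodeToMd).join('');

  switch (node.type) {
    case 'doc':
      // Top-level blocks are separated by blank lines; empty results are dropped
      return (node.content ?? []).map(nodeToMd).filter(Boolean).join('\n\n');
    case 'paragraph':
      return inner(node);
    case 'heading':
      return `${'#'.repeat(node.attrs?.level ?? 2)} ${inner(node)}`;
    case 'text':
      return marksToMd(node.text ?? '', node.marks);
    case 'bullet_list':
      return (node.content ?? []).map((item) => `- ${inner(item)}`).join('\n');
    case 'ordered_list':
      return (node.content ?? []).map((item, i) => `${i + 1}. ${inner(item)}`).join('\n');
    case 'blok':
      // Embedded blocks: only the teaser's headline is kept, as in html2md.js
      return (node.attrs?.body ?? [])
        .filter((blok) => blok.component === 'teaser')
        .map((blok) => blok.headline)
        .join('\n\n');
    default:
      return inner(node); // unknown nodes fall back to their children
  }
}

const doc = {
  type: 'doc',
  content: [
    { type: 'heading', attrs: { level: 2 }, content: [{ type: 'text', text: 'Intro' }] },
    { type: 'paragraph', content: [{ type: 'text', text: 'Hello ' }, { type: 'text', text: 'world', marks: [{ type: 'bold' }] }] },
    { type: 'bullet_list', content: [{ type: 'list_item', content: [{ type: 'paragraph', content: [{ type: 'text', text: 'One' }] }] }] },
    { type: 'blok', attrs: { body: [{ component: 'teaser', headline: 'Teaser!' }] } },
  ],
};
console.log(nodeToMd(doc));
```

The trade-off is coverage: turndown already handles tables, code blocks, and nesting edge cases, while a custom walker gives you full control over formatting but only for the nodes you implement.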

With that done, let's move on to the llms-full.txt file and use the helper:

src/pages/llms-full.txt.js
import { storyblokApi } from '@storyblok/astro/client';
import { extractStoryMeta } from '../utils/extractStoryMeta';
import { convertedMarkdown } from '../utils/html2md';

export const GET = async () => {
	try {
		const stories = await storyblokApi.getAll('cdn/stories', {
			sort_by: 'name:asc', // You can do all sorts of sorting. Learn more here: https://www.storyblok.com/docs/api/content-delivery/v2/stories/examples/sorting-by-story-object-property
			version: 'draft', // 'draft' requires a preview access token; use 'published' in production
		});
		const storyExtract = stories.map((story) =>
			extractStoryMeta(story, {
				// Markdown version of the rich text field; stories without one get an empty string
				content: story.content?.content
					? convertedMarkdown(story.content.content)
					: '',
			}),
		);
		const body = `# Feeding the bots with Storyblok

> A tutorial for developers interested in generating an \`llms-full.txt\` file using a combination of Storyblok and Astro.

This file contains the complete content of the tutorial, converted into markdown.

***

${storyExtract
	.map(
		(story) =>
			`## ${story.headline}

${story.content}

URL: [${story.headline}](https://example.com/${story.slug})

***\n`,
	)
	.join('\n')}
## Optional

- [Homepage](https://example.com)
`;
		return new Response(body, {
			headers: {
				'Content-Type': 'text/plain; charset=utf-8',
			},
		});
	} catch (error) {
		return new Response(`Failed to generate llms-full.txt\n\n${error}`, {
			status: 500,
		});
	}
};

Inside storyExtract, map over the fetched stories and call the two helpers discussed earlier. extractStoryMeta extracts the headline, slug, and summary, while the extra object carries a handy addition: this time, the markdown version of the rich text field (content), produced by convertedMarkdown.

Finally, integrate everything into a response body that follows the spec: the project name, key information about it, and the full text of each story.

Run the build command to review the generated file. If you followed this tutorial and have the same content model in your space, the result should resemble the demo file (dist/llms-full.txt).

hint:

You can sort and filter the posts (in the code or in the API call), add other fields, blocks, or metadata, and set up caching to adapt this setup to your needs. While this tutorial fetches stories of the page content type, the format works just as well for products, services, and other content types.
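For instance, filtering in code could look like the sketch below. selectStories and its options are hypothetical. On the API side, the stories endpoint also accepts query parameters such as starts_with and content_type to narrow the results before they reach your code. The mock stories contain only full_slug, parent_id, and published_at.

```javascript
// Hypothetical helper: keeps root-level pages plus stories from the listed folders,
// drops excluded full slugs, and sorts the rest newest first by published_at.
function selectStories(stories, { folders = [], exclude = [] } = {}) {
  return stories
    .filter((story) => !exclude.includes(story.full_slug))
    .filter((story) => !story.parent_id || folders.includes(story.full_slug.split('/')[0]))
    .sort((a, b) => {
      // ISO 8601 timestamps compare correctly as strings; missing ones sort last
      const x = a.published_at ?? '';
      const y = b.published_at ?? '';
      return x < y ? 1 : x > y ? -1 : 0;
    });
}

// Mock stories with only the fields the helper reads
const stories = [
  { full_slug: 'home', parent_id: 0, published_at: '2025-01-01T00:00:00.000Z' },
  { full_slug: 'about', parent_id: 0, published_at: null },
  { full_slug: 'articles/a', parent_id: 1, published_at: '2025-03-01T00:00:00.000Z' },
  { full_slug: 'posts/b', parent_id: 2, published_at: '2025-02-01T00:00:00.000Z' },
  { full_slug: 'articles/draft-notes', parent_id: 1, published_at: null },
];

const picked = selectStories(stories, { folders: ['articles'], exclude: ['articles/draft-notes'] });
console.log(picked.map((story) => story.full_slug));
```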

The last step isn't related to GEO but to how Storyblok renders blocks inside a rich text field. Modify the Page.astro component to display the teaser block on the website:

src/storyblok/Page.astro
---
import { storyblokEditable } from '@storyblok/astro';
import { richTextToHTML } from '@storyblok/astro/client';
import StoryblokComponent from '@storyblok/astro/StoryblokComponent.astro';

const { blok } = Astro.props;

// Renders the rich text field to an HTML string
const renderedRichText = await richTextToHTML(blok.content);
---

<main {...storyblokEditable(blok)}>
	{
		blok.body?.map((nestedBlok) => {
			return <StoryblokComponent blok={nestedBlok} />;
		})
	}
	<Fragment set:html={renderedRichText} />
</main>

Run your development server and open the website to ensure everything renders as expected.

Test with chatbots

Reviewing the files to verify their structure and formatting is a good start, but you should also confirm that chatbots can actually use them.

These are still early days, and authoritative analytics solutions have yet to emerge. The inherent challenges of LLMs also make a conclusive, reliable answer harder to get. Leading models are as much a black box as Google's search algorithm, and the technology is nondeterministic by design.

So, for now, the best you can do is test manually across multiple LLMs. Build the site and copy the contents of each file into a chatbot with a sufficiently large context window. Then ask specific questions whose answers can be found in the generated file.
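To keep those manual tests comparable across chatbots, you could paste the same prompt into each one. The sketch below wraps a file and a list of questions into a single prompt. buildTestPrompt and the prompt wording are hypothetical, and the inline file content is a stand-in for the real build output.

```javascript
// Hypothetical helper: wraps a generated file and a few test questions into a
// single prompt you can paste into different chatbots and compare the answers.
function buildTestPrompt(fileContents, questions) {
  const numbered = questions.map((question, i) => `${i + 1}. ${question}`).join('\n');
  return [
    'Answer the questions using only the document below.',
    'For each answer, name the link or section you relied on.',
    '',
    '<document>',
    fileContents.trim(),
    '</document>',
    '',
    'Questions:',
    numbered,
  ].join('\n');
}

// In Node, after running the build, you could read the real file instead:
// import { readFileSync } from 'node:fs';
// const fileContents = readFileSync('dist/llms.txt', 'utf8');
const prompt = buildTestPrompt('# Demo\n\n> A demo site.\n', [
  'What is the site about?',
  'Which page should I read first?',
]);
console.log(prompt);
```

Asking the model to name its source makes it easier to spot answers that don't come from your file.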

Hallucinations aside, the tool should now use your data to generate answers that match and reference your content.

Takeaways

Now that you're familiar with the technical aspects of GEO and understand how to recreate content from Storyblok in an llms.txt file using Astro's custom endpoints, the only thing left to do is decide which content needs to be served.

With an API-first approach and composable architecture, Storyblok lets you create a version of your content that helps bots access, consume, and process it efficiently.

Investing time in this one-time setup is a great way to support the humans who use chatbots by delivering accurate and fast answers to their prompts.

hint:

You can find all the code samples and content schema in a dedicated GitHub repository.

Technical consultant: Dipankar Maikap, DevRel Engineer II