What is structured data?

Definition

Structured data is semantic markup added to a page's HTML to tell search engines explicitly what type of content the page presents. It uses the Schema.org vocabulary and is most often implemented as JSON-LD. It makes pages eligible for rich results in SERPs and can make content easier for LLMs to cite.

Structured data addresses a fundamental problem: HTML tells machines what is displayed, but not what it means. A heading could be a product name, an article author, or the answer to a question. Schema.org markup adds this semantic layer — machine-readable, invisible to users.

JSON-LD: the recommended format

Structured data can be implemented in three formats (JSON-LD, Microdata, RDFa), but Google explicitly recommends JSON-LD for its ease of maintenance and flexibility. The JSON-LD block is embedded in a <script type="application/ld+json"> tag, typically in the <head> or at the bottom of the <body>. Because the block is separate from the visible markup, it can be updated without touching the rest of the page's HTML.
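As a minimal sketch of the embedding described above, the following Python snippet serializes a dictionary as JSON-LD and wraps it in the script tag. The helper name render_json_ld and the article values (date, author name) are hypothetical and chosen only for illustration.

```python
import json


def render_json_ld(data: dict) -> str:
    """Serialize a dict as a JSON-LD <script> block (illustrative helper)."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value can never close the <script> element early.
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical article data, for illustration only.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What is structured data?",
    "datePublished": "2024-01-15",
    "author": {"@type": "Person", "name": "Jane Doe"},
}

print(render_json_ld(article))
```

The printed block can be placed in the <head> or at the end of the <body>. Generating it from data rather than writing it by hand keeps the markup in sync with the page content.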

The schemas with the most impact

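Impact varies by page type and objective. For classic SEO visibility, the main schemas are FAQPage (expandable question-and-answer rich snippets), HowTo (steps displayed directly in the SERP), Product (reviews, price, availability), and Article with datePublished and author. Note that since 2023 Google shows FAQ rich results only for well-known, authoritative government and health sites and no longer shows HowTo rich results, so for most sites these two types no longer produce a visible SERP feature.

For GEO (generative engine optimization) visibility: DefinedTerm and DefinedTermSet let LLMs identify glossary definitions precisely and cite them, ItemList structures ordered, extractable lists, and SpeakableSpecification flags passages suited to being read aloud by voice assistants.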
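To make the glossary case concrete, here is a rough sketch of a DefinedTermSet containing DefinedTerm entries, built in Python. The function name build_glossary, the example.com URL, and the "#glossary" identifier are assumptions made for this example. The Schema.org properties used (hasDefinedTerm, inDefinedTermSet, name, description) are part of the published vocabulary.

```python
import json


def build_glossary(name: str, url: str, terms: dict[str, str]) -> dict:
    """Build a DefinedTermSet whose entries each point back to the set."""
    term_set_id = url + "#glossary"  # hypothetical identifier scheme
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "@id": term_set_id,
        "name": name,
        "hasDefinedTerm": [
            {
                "@type": "DefinedTerm",
                "name": term,
                "description": definition,
                "inDefinedTermSet": term_set_id,
            }
            for term, definition in terms.items()
        ],
    }


# Hypothetical glossary page; the definitions paraphrase this article.
glossary = build_glossary(
    name="SEO glossary",
    url="https://example.com/glossary",
    terms={
        "Structured data": "Semantic markup that tells search engines "
        "what type of content a page presents.",
        "JSON-LD": "The structured data format recommended by Google, "
        "embedded in a script tag.",
    },
)

print(json.dumps(glossary, indent=2))
```

Each term carries its own name and description, so a consumer can read one definition without parsing the surrounding page.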

Structured data and LLMs

LLMs trained on web corpora have encountered Schema.org markup at scale and appear to have integrated its logic into how they interpret content. Content explicitly marked up as a DefinedTerm is more likely to be cited precisely and correctly than content whose nature must be inferred from context. Structured data reduces ambiguity for machines, which in turn can lower hallucination risk and improve citation fidelity.
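The reduction in ambiguity can be illustrated from the consumer's side. The sketch below uses only Python's standard library to pull JSON-LD blocks out of a page and read the declared type directly instead of inferring it from headings. It does not describe how any particular search engine or LLM pipeline works. The class name JsonLdExtractor and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the page


# Hypothetical page: the <h1> alone does not say what kind of thing it names.
page = """<html><head><title>Glossary</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "DefinedTerm", "name": "Structured data"}
</script>
</head><body><h1>Structured data</h1></body></html>"""

parser = JsonLdExtractor()
parser.feed(page)
parser.close()

for item in parser.items:
    print(item["@type"], "->", item["name"])
```

The sample assumes each block holds a single JSON object. Real pages may also use a top-level array or an @graph container, which this sketch does not unpack.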

Structured data does not improve rankings directly. Google officially states that structured data is not a ranking factor in itself. However, it helps pages earn rich results that improve CTR, and it anchors engines' understanding of the content, which has a measurable indirect impact on perceived relevance and AI citation frequency.

To maximize visibility in LLMs, the most effective schemas are DefinedTerm and DefinedTermSet for glossary and definition pages, FAQPage for Q&A content, HowTo for step-by-step guides, and ItemList for structured lists. These formats help LLMs extract and precisely cite key information without having to infer it from context.
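For Q&A content, a FAQPage block might be assembled along these lines. This is a sketch under stated assumptions: build_faq_page is a hypothetical helper, and the question wording is invented for the example, with the answer text condensed from this article.

```python
import json


def build_faq_page(qa_pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage with one Question/Answer pair per tuple."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


# Hypothetical question; the answer condenses a statement made in this article.
faq = build_faq_page(
    [
        (
            "Is structured data a ranking factor?",
            "Not directly. Google states that structured data is not a "
            "ranking factor in itself, but it helps pages earn rich results.",
        ),
    ]
)

print(json.dumps(faq, indent=2))
```

The answer text in the markup should match the answer shown on the page, since the markup is meant to describe visible content rather than replace it.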