Advancing Web Semantics: The Promise of the Block Protocol

From Documents to Data: The Web's Unfinished Journey

Since the mid-1990s, the World Wide Web has primarily served as a platform for publishing human-readable documents. Web pages are built with HTML, which provides basic formatting directives—like identifying paragraphs or emphasizing words. CSS adds visual flair, such as styling text in tiny gray sans-serif fonts. While this approach works well for human readers, it leaves computers largely in the dark about the actual meaning of the content.

Source: www.joelonsoftware.com

Consider a typical mention of a book on a web page: the title might be bolded, but a program reading the page cannot reliably tell that the bold text refers to a book, let alone extract details like the author, illustrator, publisher, or year. The markup records how the text should look, not what it means.
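To make the problem concrete, here is a minimal sketch using Python's standard html.parser module. The page fragment and the BoldTextCollector class are made up for illustration. The only thing the program can recover is which text was bold.

```python
from html.parser import HTMLParser


class BoldTextCollector(HTMLParser):
    """Collects the text found inside <b> or <strong> tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # how many bold tags we are currently inside
        self.bold_text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.bold_text.append(data)


# Hypothetical page fragment: one bold span is a book title, the other is not.
SAMPLE_HTML = (
    "<p>I just finished <b>Goodnight Moon</b> "
    "and it was <b>wonderful</b>.</p>"
)


def extract_bold(html):
    """Return every bold text run in the given HTML string."""
    collector = BoldTextCollector()
    collector.feed(html)
    collector.close()
    return collector.bold_text


print(extract_bold(SAMPLE_HTML))
```

Both bold runs come back as plain strings. Nothing in the markup says that the first is a book title and the second is just emphasis.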

The Semantic Web Vision

As early as 1999, Tim Berners-Lee articulated a dream for a more intelligent web—one where computers could analyze content, links, and transactions automatically. He wrote: “I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.”

To realize this vision, web authors would need to add structured metadata to their pages. Standards like schema.org provide vocabularies for describing things (books, events, people, etc.), and formats such as RDF (in its RDFa form) and JSON-LD allow embedding that data within HTML. In theory, this would make web content both human- and machine-readable.
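As a rough illustration of what that metadata looks like, the sketch below builds a schema.org Book description and wraps it in the script tag that JSON-LD uses inside HTML. The helper names book_json_ld and as_script_tag are invented for this example, and the book details are sample data.

```python
import json


def book_json_ld(title, author, illustrator, publisher, year):
    """Describe a book using schema.org's Book type as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "illustrator": {"@type": "Person", "name": illustrator},
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": str(year),
    }


def as_script_tag(data):
    """Serialize the dict into a JSON-LD script element for an HTML page."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


book = book_json_ld(
    title="Goodnight Moon",
    author="Margaret Wise Brown",
    illustrator="Clement Hurd",
    publisher="Harper & Brothers",
    year=1947,
)
print(as_script_tag(book))
```

This is the kind of block an author would otherwise have to research and write by hand for every book, event, or person mentioned.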

Why Adoption Stalled

Despite its promise, adding semantic markup has remained a tedious, homework-like task. After crafting a blog post, few authors have the motivation to research schema types and manually insert JSON-LD blocks. Without immediate reward or widespread tooling, most give up. As a result, semantic markup is rare on the web even two decades after the Semantic Web was first proposed.

Enter the Block Protocol

We believe the solution lies in lowering the barrier to entry. The Block Protocol is a new approach that enables content authors to add structured data as easily as they insert an image or a video. It works by defining reusable “blocks”—self-contained components that carry their own semantic meaning. For example, a book block would automatically include all relevant metadata (title, author, ISBN) in a machine-readable format, without requiring the author to write any special code.

How It Works

Blocks are built on existing web standards and can be plugged into any supporting platform (like WordPress, Notion, or custom sites). Each block contains a piece of content (text, media, interactive widget) and an attached “schema” that describes its meaning in a structured way. When a user adds a block, the system automatically handles the structured data behind the scenes.
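The idea can be sketched in a few lines. This is not the Block Protocol's actual API: the BookBlock class, its fields, and the render_page function are hypothetical stand-ins for a block and a host application, written only to show how one piece of typed data can yield both the visible content and the structured data.

```python
import json
from dataclasses import dataclass
from html import escape


@dataclass
class BookBlock:
    """Hypothetical block holding what an author typed into a form."""

    title: str
    author: str
    isbn: str = ""

    def render_html(self) -> str:
        """Human-readable view of the block."""
        return f"<p><cite>{escape(self.title)}</cite> by {escape(self.author)}</p>"

    def structured_data(self) -> dict:
        """Machine-readable view of the same data, using schema.org terms."""
        data = {
            "@context": "https://schema.org",
            "@type": "Book",
            "name": self.title,
            "author": {"@type": "Person", "name": self.author},
        }
        if self.isbn:
            data["isbn"] = self.isbn
        return data


def render_page(blocks) -> str:
    """Hypothetical host: emits each block's HTML plus its JSON-LD."""
    parts = []
    for block in blocks:
        parts.append(block.render_html())
        payload = json.dumps(block.structured_data()).replace("</", "<\\/")
        parts.append(f'<script type="application/ld+json">{payload}</script>')
    return "\n".join(parts)


page = render_page([BookBlock(title="Goodnight Moon", author="Margaret Wise Brown")])
print(page)
```

The author only fills in a title and an author. The markup that machines read is produced by the block, which is the shift in effort the protocol is after.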

Benefits

The Path Forward

By making structured data a byproduct of normal content creation, the Block Protocol aims to finally realize the Semantic Web’s original promise. It shifts the effort from individual web authors to the developers who build these blocks, accelerating adoption. Human progress depends on making information more accessible—not just to people, but to the intelligent programs that can process it at scale. With the Block Protocol, that future is within reach.

To learn more about implementing blocks, see our introduction or the Block Protocol specification.
