Design Systems as AI Infrastructure

#039: What IBM and SAP's newly launched design system MCP servers mean for how you build and structure your system.

Mar 23, 2026

“A token system documented in Confluence and a token system published as a well-structured JSON schema are not two ways of capturing the same information; they’re two different levels of AI-readiness.”

From component library to AI reasoning layer

The framing that keeps coming up in conversations with design systems practitioners, and that showed up quietly but unmistakably in the agenda at Into Design Systems this year, is that a design system is no longer just a shared language for designers and engineers. It’s the actual specification that generative UI tools reason from when they produce output. A spec, in the same way that an API contract or a data schema is a spec: the thing that defines what valid output looks like, and against which everything generated gets implicitly or explicitly evaluated.

And if the conference framing felt abstract, IBM and SAP just made it concrete, arriving at the same conclusion through entirely different architectures. When two organizations of that size converge independently, it gets much harder to dismiss as a passing trend.

Carbon MCP, now in public preview, is an MCP server that connects AI assistants directly to the Carbon Design System knowledge base, letting AI tools query components, tokens, icons, usage guidelines, and code examples in real time instead of relying on whatever the model absorbed during training. It works across Cursor, Claude Code, Claude Desktop, VS Code with Copilot Chat, and IBM’s own tooling. The design system is no longer a reference that a developer might check; it’s the live context that the model reasons from while it generates code.

SAP has taken the same core idea and applied it differently, shipping four MCP servers (CAP, Fiori Elements, SAPUI5, and MDK) that together expose the entire Fiori design system to AI coding assistants. The Fiori MCP server lets an AI model generate and modify full Fiori-compliant applications from natural language prompts, scaffolding list reports, object pages, and flexible column layouts that adhere to SAP’s design patterns by construction rather than by hope. The SAPUI5 server exposes a dedicated tool for retrieving coding standards and guidelines, and the Fiori server’s documentation search queries across Fiori elements, annotations, UI5 resources, and tooling docs. All of SAP’s design guidance, available as queryable context at generation time.

Most teams haven’t made that mental shift yet, and it shows in how their systems are structured: built for human legibility, not for machine consumption. That is a meaningful distinction when the consumer is increasingly a code generation model instead of an engineer reading documentation. What IBM and SAP have both done, through different architectures and for different ecosystems, is treat that distinction as a product problem worth solving at the infrastructure level. The result in both cases is a design system where constraints aren’t just documented; they’re enforced at the point of generation.

How tokens, APIs, and encoded constraints actually bound AI output

The mechanism is more direct than most people realize, and it runs through every layer of the design system.

Tokens are the most immediate constraint. When a generative UI tool produces a component, it gets bounded by the token system, the variables defining color, spacing, and typography. Every naming convention decision, every token hierarchy choice, every ambiguity in how a semantic token maps to a primitive is now also a decision about what range of outputs the AI is capable of producing. A poorly structured token system doesn’t just make things harder for engineers; it produces a wider, less predictable distribution of AI-generated UI, because the model ends up resolving ambiguities on its own instead of being bounded by clear, well-scoped values.
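To make the token point concrete, here is a minimal sketch of what "clear, well-scoped values" can look like in practice. It assumes a flat token dictionary where semantic tokens point at primitives using a "{name}" reference; the token names, the values, and the resolve and resolve_all helpers are all illustrative, not taken from Carbon or Fiori.

```python
import re

# Hypothetical token dictionary: primitives hold raw values,
# semantic tokens point at primitives with "{...}" references.
TOKENS = {
    "color.blue.60": "#0f62fe",
    "color.gray.10": "#f4f4f4",
    "spacing.05": "1rem",
    "button.primary.background": "{color.blue.60}",
    "button.primary.padding": "{spacing.05}",
    "layer.background": "{color.gray.10}",
}

# Matches a value that is entirely one alias reference, e.g. "{spacing.05}".
ALIAS = re.compile(r"^\{(.+)\}$")


def resolve(name, tokens, _seen=()):
    """Follow alias references until a raw value is reached."""
    if name not in tokens:
        raise KeyError(f"unknown token: {name}")
    if name in _seen:
        raise ValueError(f"circular alias: {' -> '.join(_seen + (name,))}")
    match = ALIAS.match(tokens[name])
    if match is None:
        return tokens[name]  # raw value, nothing left to follow
    return resolve(match.group(1), tokens, _seen + (name,))


def resolve_all(tokens):
    """Resolve every token up front so a broken mapping fails at build time."""
    return {name: resolve(name, tokens) for name in tokens}
```

The design choice worth noticing is that a dangling or circular reference raises an error instead of falling back to a default. That is the difference between a token system that bounds the model and one that leaves the model to guess.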

Component APIs are the next layer, and the logic is the same: a well-specified button component with explicit variant definitions and clear prop contracts produces predictable AI output, while an underspecified one gives the model enough room to pattern-match toward whatever it’s seen most often in training data. The output is usually technically functional but wrong in ways that are hard to articulate in a code review, because the wrongness lives in the details of variant selection and interaction pattern, not in the syntax.
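A small sketch of what an explicit prop contract buys you. The contract shape, the prop names, and the validate_props helper below are assumptions made up for illustration; the point is only that a closed set of variants turns "wrong in ways that are hard to articulate" into a concrete, checkable error.

```python
# Hypothetical contract for a button component. Every enumerated prop
# lists its allowed values; anything not named here is rejected.
BUTTON_CONTRACT = {
    "required": {"variant", "label"},
    "allowed": {
        "variant": {"primary", "secondary", "ghost", "danger"},
        "size": {"sm", "md", "lg"},
    },
    "free_form": {"label"},  # props that accept arbitrary strings
}


def validate_props(props, contract):
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for name in sorted(contract["required"] - set(props)):
        errors.append(f"missing required prop: {name}")
    for name, value in props.items():
        if name in contract["allowed"]:
            if value not in contract["allowed"][name]:
                options = ", ".join(sorted(contract["allowed"][name]))
                errors.append(f"{name}={value!r} is not one of: {options}")
        elif name not in contract["free_form"]:
            errors.append(f"unknown prop: {name}")
    return errors
```

Run against model output such as {"variant": "outline", "color": "blue"}, this reports the missing label, the invented variant, and the unknown prop, which is exactly the class of mistake that passes a syntax check.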

Documentation, crucially, is not part of this equation in the way most teams assume. Or at least, it wasn’t until MCP changed what documentation can do. A model generating UI couldn’t read a Notion page and decide to follow it, so any constraint that lived only in written guidance rather than encoded in the component specification itself was a constraint the model didn’t have. What both Carbon MCP and SAP’s Fiori MCP do is collapse that gap from different directions: Carbon makes usage guidelines, accessibility rules, and interaction patterns queryable context that the model retrieves before it writes a line of code, while SAP’s servers embed design guidance directly into the generation workflow. The model doesn’t look up what a Fiori list report should contain; it scaffolds one that already conforms to SAP’s patterns, with the design rules enforced by the tooling rather than left to the model’s judgment. In both cases, the design system’s documentation becomes part of the spec, not a companion to it.

How the teams getting ahead of this are making their systems machine-readable

The teams getting ahead of this are treating their design systems as source material for model context, restructuring not just what the system contains but how it’s formatted, so that generative tools can actually consume it. The practical version of this looks like exporting token dictionaries as structured JSON rather than PDF documentation, writing component API specs in formats that map to the inputs a model needs to make a bounded decision, and encoding usage constraints directly into component definitions instead of maintaining them as a separate layer of guidance that the model will never see.
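As a rough sketch of "encoding usage constraints directly into component definitions", the example below keeps a usage rule in the same structure as the prop definitions and exports the whole thing as JSON. The spec layout, the "max-per-view" rule name, and the helper functions are hypothetical; no particular design system publishes this exact shape.

```python
import json


def build_spec():
    """A hypothetical component spec with usage rules stored next to props."""
    return {
        "component": "Button",
        "props": {
            "variant": {"type": "enum", "values": ["primary", "secondary", "ghost"]},
            "label": {"type": "string", "maxLength": 24},
        },
        "usage": [
            {"rule": "max-per-view", "variant": "primary", "limit": 1},
        ],
    }


def export_spec(spec):
    """Serialize the spec as stable, diff-friendly JSON."""
    return json.dumps(spec, indent=2, sort_keys=True)


def check_view(buttons, spec):
    """Apply the 'max-per-view' rules to a list of button prop dicts."""
    violations = []
    for rule in spec["usage"]:
        if rule["rule"] != "max-per-view":
            continue  # only this rule type is handled in the sketch
        count = sum(1 for b in buttons if b.get("variant") == rule["variant"])
        if count > rule["limit"]:
            violations.append(
                f"{count} {rule['variant']} buttons, limit is {rule['limit']}"
            )
    return violations
```

Because the rule lives in the exported file, the same JSON can feed a model's context and a post-generation check, instead of sitting in a guidance page that neither one reads.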

Carbon MCP and SAP’s Fiori MCP servers are the two furthest-along versions of this approach that are publicly available, and the contrast between them is worth understanding. Carbon exposes the design system as a queryable knowledge base where an AI tool can explore components, retrieve token values, pull code examples, and get answers to documentation questions through a standardized protocol instead of scraping pages or relying on training data. IBM has turned Carbon into a queryable API for design decisions. SAP has taken a more opinionated path: the Fiori MCP server doesn’t just expose information for the model to reason from, it executes structured operations (generating applications, adding pages, modifying controller extensions) where the design system’s constraints are baked into the execution logic itself, so the model never gets the chance to deviate from Fiori patterns in the first place. Both approaches work. And the fact that two of the largest enterprise design systems in the world arrived at MCP as the integration layer, independently and through different architectures, tells you this isn’t a niche bet.
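The contrast between the two styles can be sketched in a few lines. To be clear about what is assumed here: the catalog, describe_page, and add_page are invented for illustration and are not the actual tool names or behavior of Carbon MCP or SAP's servers. The first function hands the model information and lets it write the code; the second performs the operation itself, so the layout is never model-authored.

```python
# Hypothetical pattern catalog shared by both styles.
CATALOG = {
    "list-report": {"sections": ["filter-bar", "table"]},
    "object-page": {"sections": ["header", "content-sections"]},
}


def describe_page(page_type):
    """Query style: return the pattern so the model can reason from it."""
    if page_type not in CATALOG:
        raise KeyError(f"unknown page type: {page_type}")
    return {"sections": list(CATALOG[page_type]["sections"])}


def add_page(app, page_type, entity):
    """Operation style: the tool builds the page from the catalog itself."""
    if page_type not in CATALOG:
        raise ValueError(f"unsupported page type: {page_type}")
    page = {
        "type": page_type,
        "entity": entity,
        "sections": list(CATALOG[page_type]["sections"]),
    }
    # Return a new app dict rather than mutating the caller's copy.
    return {**app, "pages": app.get("pages", []) + [page]}
```

In the query style, conformance depends on the model using what it retrieved. In the operation style, a request for a page type that is not in the catalog simply fails, which is the "by construction rather than by hope" property.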

Some organizations are going further still and using their design systems as the retrieval layer in a RAG workflow, a technique that grounds a model’s output in specific, real-world source material instead of relying solely on what it learned during training. A model queried to build or modify UI first retrieves the actual token values and component specifications from the system before generating output, so the result reflects what the product specifically requires, not what the model has seen most often at training time. The teams doing this well aren’t waiting for tool vendors to figure out how to consume design systems; they’re making their systems legible on their own terms. Carbon MCP and SAP’s Fiori servers are the proof that this has moved past the experimental phase and into shipped infrastructure.
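Here is a deliberately small sketch of the retrieve-then-generate flow described above. It uses plain keyword overlap as a stand-in for the embedding search a real setup would more likely use, and the document snippets, retrieve, and build_prompt are all hypothetical.

```python
import re

# Hypothetical design system snippets standing in for real specs.
DOCS = [
    {"id": "button", "text": "Button variants: primary, secondary, ghost. Use one primary button per view."},
    {"id": "spacing", "text": "Spacing tokens: spacing.03 is 0.5rem, spacing.05 is 1rem."},
    {"id": "modal", "text": "Modal dialogs trap focus and close on Escape."},
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, k=2):
    """Return up to k docs that share at least one word with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(d["text"])), d) for d in docs]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, docs):
    """Put the retrieved specs in front of the task before generation."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in retrieve(query, docs))
    return f"Design system context:\n{context}\n\nTask: {query}"
```

The scoring is crude on purpose. What matters for the argument is the ordering: the model sees the system's actual variants and token values before it writes anything, so structured, well-scoped source material directly improves what gets retrieved.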

Why the format of your design system now determines AI output quality

The implication that doesn’t get discussed clearly enough is that the format and structure of a design system now have direct downstream consequences for AI output quality. A token system documented in Confluence and a token system published as a well-structured JSON schema are not two ways of capturing the same information; they’re two different levels of AI-readiness, and the difference shows up immediately in the range and consistency of what generative tools produce. The teams with the most leverage over generative UI output aren’t necessarily the ones with the most mature or comprehensive design systems; they’re the ones whose systems are structured in ways that AI tools can actually reason from.

That’s a new design criterion, and for most teams, it’s one the current system wasn’t built to satisfy. Carbon MCP and SAP’s Fiori servers are worth paying attention to not because every team should replicate IBM’s or SAP’s exact approach, but because when two of the largest enterprise design systems in the world independently ship MCP servers as part of their developer tooling within the same quarter, it gets hard to call that a coincidence. The question isn’t whether your system needs to be machine-readable. It’s how far behind you are on making it so.

If you have thoughts on this, I’d genuinely like to hear them. I’ve been trying to track which teams are ahead of this and which are discovering it the hard way, so reply and tell me where your system stands.

— Justin
