Skip to main content

3 posts tagged with "mcp"

View All Tags

The Two-Layer Model That Separates AI Teams That Ship from Those That Demo

· 62 min read
Vadim Nicolai
Senior Software Engineer

In February 2024, a Canadian court ruled that Air Canada was liable for a refund policy its chatbot had invented. The policy did not exist in any document. The bot generated it from parametric memory, presented it as fact, a passenger relied on it, and the airline refused to honor it. The tribunal concluded it did not matter whether the policy came from a static page or a chatbot — it was on Air Canada's website and Air Canada was responsible. The chatbot was removed. Total cost: legal proceedings, compensation, reputational damage, and the permanent loss of customer trust in a support channel the company had invested in building.

This was not a model failure. GPT-class models producing plausible-sounding but false information is a known, documented behavior. It was a process failure: the team built a customer-facing system without a grounding policy, without an abstain path, and without any mechanism to verify that the bot's outputs corresponded to real company policy. Every one of those gaps maps directly to a meta approach this article covers.

In 2025, a multi-agent LangChain setup entered a recursive loop and made 47,000 API calls in six hours. Cost: $47,000+. There were no rate limits, no cost alerts, no circuit breakers. The team discovered the problem by checking their billing dashboard.

These are not edge cases. An August 2025 Mount Sinai study (Communications Medicine) found leading AI chatbots hallucinated on 50–82.7% of fictional medical scenarios — GPT-4o's best-case error rate was 53%. Multiple enterprise surveys found a significant share of AI users had made business decisions based on hallucinated content. Gartner estimates only 5% of GenAI pilots achieve rapid revenue acceleration. MIT research puts the fraction of enterprise AI demos that reach production-grade reliability at approximately 5%. The average prototype-to-production gap: eight months of engineering effort that often ends in rollback or permanent demo-mode operation.

The gap between a working demo and a production-grade AI system is not a technical gap. It is a strategic one. Teams that ship adopt a coherent set of meta approaches — architectural postures that define what the system fundamentally guarantees — before they choose frameworks, models, or methods. Teams that demo have the methods without the meta approaches.

This distinction matters more now that vibe coding — coding by prompting without specs, evals, or governance — has become the default entry point for many teams. Vibe coding is pure Layer 2: methods without meta approaches. It works for prototypes and internal tools where failure is cheap. But the moment a system touches customers, handles money, or makes decisions with legal consequences, vibe coding vs structured AI development is the dividing line between a demo and a product. Meta approaches are what get you past the demo.

This article gives you both layers, how they map to each other, the real-world failures that happen when each is ignored, and exactly how to start activating eval-first development and each other approach in your system today.

Industry Context (2025–2026)

McKinsey reports 65–71% of organizations now regularly use generative AI. Databricks found organizations put 11x more models into production year-over-year. Yet S&P Global found 42% of enterprises are now scrapping most AI initiatives — up from 17% a year earlier. IDC found 96% of organizations deploying GenAI reported costs higher than expected, and 88% of AI pilots fail to reach production. Gartner predicts 40% of enterprise applications will feature task-specific AI agents by end of 2026, up from less than 5% in 2025. Enterprise LLM spend reached $8.4 billion in H1 2025, with approximately 40% of enterprises now spending $250,000+ per year.

Pixel-Perfect UI with Playwright and Figma MCP: What Actually Works in 2026

· 14 min read
Vadim Nicolai
Senior Software Engineer

I asked an AI coding assistant to implement a page layout from a Figma design. It got the heading size wrong (28px instead of 24px), inserted a 4px gap where there should have been 8px, and hallucinated a duplicate magnifying glass icon inside the search bar. The overall structure was fine. The details were not.

This is the state of AI-assisted design-to-code in 2026. The tools get you 65-80% of the way there, then leave you in a no-man's land where the remaining pixels matter more than all the ones that came before. Every frontend engineer who has shipped production UI knows: "close enough" is not close enough.

I spent a session trying to close that gap using the toolchain everyone is talking about -- Figma MCP for design context, headless Playwright for runtime measurement, and an AI assistant for the correction loop. Here is what happened, what broke, and what produced results.

Trigger.dev Deep Dive: Background Jobs, Queue Fan-Out, MCP, and Agent Skills

· 14 min read
Vadim Nicolai
Senior Software Engineer

Trigger.dev is a serverless background job platform that lets you run long-running tasks with no timeouts, automatic retries, queue-based concurrency control, and full observability. Unlike traditional job queues (BullMQ, Celery, Sidekiq), Trigger.dev manages the infrastructure — you write TypeScript tasks and deploy them like functions.

This article covers the platform end-to-end: architecture, task authoring, the queue fan-out pattern, MCP server integration for AI assistants, agent skills/rules, and a production case study of a TTS audio pipeline.