I Shipped AI Features on .NET 10 Across 9 Production Apps — Here's the Playbook
After three decades of building on the Microsoft stack, I've developed a reliable instinct for when a platform shift is real versus when it's marketing dressed up as engineering. COM+ was real. WCF was mostly real. "Silverlight everywhere" was not. The current wave of AI tooling on .NET? It's real — and unlike most hype cycles, the pieces actually fit together.
This isn't a survey article. I run a portfolio of 9+ production web properties through Prana Entertainment and WhiteStar Labs, and I've been integrating AI capabilities across them throughout 2025 and into 2026. What follows is a practitioner's guide to the tools, patterns, and hard-won lessons that emerged from that work.
What You'll Walk Away With
- How to evaluate which layer of the .NET AI stack solves which problem
- Why RAG is still the highest-ROI AI pattern for most .NET applications
- When to use Microsoft Agent Framework vs. Semantic Kernel vs. raw IChatClient
- How SQL Server 2025 eliminates the need for a separate vector database in most scenarios
- The MCP server pattern and why it's the REST API of the AI era
- Mistakes I made so you don't have to
Start with the Mental Model: LLMs Are Stateless Functions
If you're a .NET developer encountering generative AI for the first time, here's the framing that clicked for me: an LLM is a very expensive, stateless function. It takes text in, breaks it into subword units called tokens, maps those tokens to high-dimensional vectors (embeddings), finds patterns, and predicts what comes next. There's no memory between calls. There's no session state. Every invocation is independent unless you provide the context.
Three concepts drive everything else. First, tokens — not words, but subword fragments that govern both pricing and context limits. "Unhappiness" might consume three tokens. Second, embeddings — numerical vectors that capture semantic meaning. The phrases "how do I return an item" and "what's your refund policy" are completely different strings but nearly identical vectors, which is why semantic search works. Third, prompts — your programming interface. System prompts define behavior, user prompts provide input, and the discipline of crafting them well is genuine software design applied to natural language.
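To make the embedding idea concrete, here is a toy sketch of cosine similarity, the standard closeness measure for vectors. The three-element vectors and their values are invented for illustration; real models emit hundreds or thousands of dimensions, produced by the model rather than by hand:

```csharp
using System;

// Toy 3-dimensional "embeddings" (values invented for illustration).
// Two semantically similar questions point in nearly the same direction;
// an unrelated query points elsewhere.
float[] refundQuestion = new float[] { 0.81f, 0.52f, 0.10f };
float[] returnQuestion = new float[] { 0.79f, 0.55f, 0.12f };
float[] weatherQuery   = new float[] { 0.05f, 0.20f, 0.97f };

Console.WriteLine(CosineSimilarity(refundQuestion, returnQuestion)); // close to 1
Console.WriteLine(CosineSimilarity(refundQuestion, weatherQuery));   // much lower

// Cosine similarity: 1.0 means same direction, 0 means unrelated.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}
```

This is the entire trick behind semantic search: compare query and document vectors by angle, not by string equality.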
Your job as the .NET developer is everything the model can't do: managing state, choosing the right context, structuring outputs, handling errors, and making sure the whole thing doesn't cost a fortune. That's where the stack comes in.
The Stack at a Glance
Here's how I think about the layers — from the highest-level orchestration down to the runtime foundation. Each layer has a distinct job, and understanding the boundaries prevents you from reaching for the wrong tool.
- Microsoft Agent Framework: multi-agent workflows, sequential/concurrent orchestration
- Microsoft.Extensions.AI: IChatClient · IEmbeddingGenerator · middleware pipeline
- RAG pipeline: Ingest → Chunk → Embed → Store → Retrieve → Ground
- Vector storage: SQL Server 2025 · Extensions.VectorData · Qdrant
- Model providers: Azure AI Foundry · Ollama · Foundry Local · GitHub Models
- Protocol & tooling: MCP servers · .NET Aspire · OpenTelemetry
- .NET 10 / ASP.NET Core / EF Core 10: runtime, DI, SignalR, performance foundation
The Abstraction That Changes Everything: Microsoft.Extensions.AI
If you absorb one thing from this entire article, let it be this: Microsoft.Extensions.AI is to AI what ILogger is to logging and HttpClient is to HTTP. It's a provider-agnostic abstraction layer that lets you write AI code using the dependency injection and middleware patterns you already know from ASP.NET Core.
The package surfaces three core interfaces. IChatClient handles all chat-based model interactions — both complete responses via GetResponseAsync and token-by-token streaming via GetStreamingResponseAsync. It supports multi-modal content including text, images, and audio. IEmbeddingGenerator<TInput, TEmbedding> generates vector embeddings from input — this is the engine behind your RAG retrieval layer. And IImageGenerator provides a consistent API for text-to-image generation.
The practical payoff is enormous: swap between Ollama for local development and Azure OpenAI for production by changing a single DI registration. Your application code, your tests, your middleware pipeline — none of it changes.
```csharp
// The provider swap. Dev: local Ollama. Prod: Azure OpenAI.
IChatClient chatClient = environment.IsDevelopment()
    ? new OllamaApiClient(new Uri("http://localhost:11434/"), "llama3.2")
    : new AzureOpenAIClient(endpoint, credential)
        .GetChatClient("gpt-4.1")
        .AsIChatClient();

// Streaming works identically with any provider
await foreach (var update in chatClient.GetStreamingResponseAsync("Explain DI."))
{
    Console.Write(update.Text);
}
```
The Middleware Pipeline
This is where the design gets elegant. Extensions.AI supports composable middleware — identical in concept to ASP.NET Core's HTTP pipeline — that wraps your AI calls with cross-cutting concerns:
```csharp
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, credential)
        .GetChatClient("gpt-4.1")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .UseFunctionInvocation()
        .UseDistributedCache()
        .UseOpenTelemetry()
        .Build(services));
```
Logging, caching, telemetry, rate limiting, automatic function calling — all composed through the same pattern. Write a custom middleware once and it works regardless of whether you're hitting Azure OpenAI, Ollama, or Anthropic. The ordering is significant (outermost wraps first), just like HTTP middleware.
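As an illustration of what writing one looks like, here is a sketch of a latency-logging middleware. It assumes Microsoft.Extensions.AI's DelegatingChatClient base class; the TimingChatClient name and the exact override signature are my assumptions against the current preview API, not code from the library:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

// Hypothetical middleware: logs wall-clock latency for every chat call,
// regardless of which provider sits underneath.
public sealed class TimingChatClient(IChatClient inner, ILogger logger)
    : DelegatingChatClient(inner)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            // Delegate to the next client in the pipeline.
            return await base.GetResponseAsync(messages, options, cancellationToken);
        }
        finally
        {
            logger.LogInformation("Chat call took {Elapsed} ms", sw.ElapsedMilliseconds);
        }
    }
}

// Registered in the same builder chain shown above, e.g.:
// .AsBuilder().Use(inner => new TimingChatClient(inner, logger)).Build()
```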
UseFunctionInvocation() is how agents move beyond text generation. You define C# methods, the model decides when to call them, and the framework handles the invocation loop. This is the mechanism behind every tool-using agent in the stack.

Choosing Your Orchestration Layer
This is the question I get asked most: "Should I use Semantic Kernel or Microsoft Agent Framework?" The answer depends on what you're building, and understanding the relationship between the two saves you from making the wrong architectural bet.
Microsoft Agent Framework — The Agent Runtime
If you followed the Semantic Kernel and AutoGen projects, you probably felt the friction of two parallel tracks that didn't quite align. Microsoft Agent Framework (MAF) is the reconciliation — it merges AutoGen's clean agent abstractions with Semantic Kernel's enterprise features into a single, unified SDK. It hit Release Candidate in February 2026. The API surface is stable and feature-complete for v1.0.
```csharp
using Microsoft.Agents.AI;

AIAgent agent = new AzureOpenAIClient(
        new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
        new AzureCliCredential())
    .GetChatClient("gpt-4.1")
    .AsAIAgent(instructions: "You are a helpful assistant.");

Console.WriteLine(await agent.RunAsync("Summarize our Q1 revenue data."));
```
Five lines to a working agent. But the real capability is multi-agent orchestration — MAF ships a workflow engine supporting sequential, concurrent, handoff, and group chat patterns with streaming. It gives you session state management, full DI integration, OpenTelemetry tracing, MCP client support, Microsoft Entra security, and responsible AI features including prompt injection protection.
Crucially, MAF deploys like any ASP.NET Core application. No special hosting model. No new infrastructure. If you can ship a web API, you can ship agents.
How to Pick
| Scenario | Use This | Why |
|---|---|---|
| Single LLM call, no orchestration | IChatClient directly | Least overhead, maximum control |
| RAG pipeline with tool use | Semantic Kernel + Extensions.AI | Plugin system handles tools cleanly |
| Multi-agent workflows | Microsoft Agent Framework | Built-in orchestration, handoffs, streaming |
| Existing SK codebase | Stay on SK, adopt MAF incrementally | SK plugins transfer directly to MAF |
The mental model: Extensions.AI is the transport layer. Semantic Kernel is the orchestration layer. Agent Framework is the agent runtime. Each builds on top of the one below it, and SK plugins port directly to MAF — so nothing you build today becomes throwaway.
RAG: Still the Highest-ROI Pattern
Here's an opinion that might be unpopular in agentic-AI circles: the majority of AI features running in production today aren't autonomous agents. They're retrieval-augmented generation — find the right context from your own data, inject it into a prompt, get a grounded answer. And that isn't a limitation. It's pragmatic engineering.
RAG works because it attacks the core enterprise problem with LLMs: they don't know your data. Fine-tuning is expensive, slow to iterate, and stale by deployment day. RAG gives you real-time grounding against live domain data, and every step of the pipeline has a clean .NET implementation in 2026.
The Four-Stage Pipeline
Stage 1 — Ingestion and Chunking. Documents get split into semantically meaningful pieces. Microsoft.Extensions.DataIngestion provides building blocks for reading and preparing documents. SQL Server 2025 also supports native text chunking in T-SQL for simpler cases.
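The chunking step is simple enough to sketch. Below is a deliberately naive fixed-size splitter with overlap; real pipelines add sentence- and section-aware boundaries, but the overlap idea, keeping text that straddles a boundary retrievable from at least one chunk, is the same:

```csharp
using System;
using System.Collections.Generic;

// Naive fixed-size chunker with overlap. A sketch, not a production splitter:
// it cuts mid-word and ignores document structure.
static List<string> ChunkText(string text, int chunkSize = 500, int overlap = 100)
{
    if (overlap >= chunkSize)
        throw new ArgumentException("Overlap must be smaller than chunk size.");

    var chunks = new List<string>();
    int step = chunkSize - overlap; // how far the window advances each iteration
    for (int start = 0; start < text.Length; start += step)
    {
        int length = Math.Min(chunkSize, text.Length - start);
        chunks.Add(text.Substring(start, length));
        if (start + length >= text.Length) break; // final chunk reached the end
    }
    return chunks;
}
```

With the defaults, a 1,200-character document yields three chunks, each sharing 100 characters with its neighbor.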
Stage 2 — Embedding Generation. Each chunk becomes a vector through IEmbeddingGenerator<string, Embedding<float>>. Provider-agnostic — Azure OpenAI's text-embedding-3-large, Ollama running mxbai-embed-large locally, whatever. Same interface, any backend.
Stage 3 — Vector Storage and Retrieval. Microsoft.Extensions.VectorData provides a unified abstraction across vector stores — SQL Server 2025, Qdrant, Azure AI Search, Cosmos DB, MongoDB, Elasticsearch, even SQLite for dev. Define your model with attributes, and the framework handles the rest.
Stage 4 — Prompt Grounding. Top-k results from vector search get injected into the system prompt. The LLM generates a response anchored in your actual data instead of confabulating.
```csharp
// End-to-end RAG flow
var query = "What are our return policy exceptions?";

// Generate embedding for the query
var queryEmbedding = await embeddingGenerator
    .GenerateEmbeddingVectorAsync(query);

// Vector search against SQL Server 2025
var relevantChunks = await db.DocumentChunks
    .OrderBy(c => EF.Functions
        .VectorDistance("cosine", c.Embedding, queryEmbedding))
    .Take(5)
    .ToListAsync();

// Ground the agent
var context = string.Join("\n", relevantChunks.Select(c => c.Content));
var agent = chatClient.AsAIAgent(
    instructions: $"""
        Answer using ONLY the following context:
        {context}
        If the answer isn't in the context, say so.
        """);
var answer = await agent.RunAsync(query);
```
SQL Server 2025: Skip the Separate Vector Database
This is the piece of the puzzle that I think most .NET shops haven't fully internalized yet. SQL Server 2025 shipped with native vector data types, DiskANN indexing, and built-in AI integration. Vectors are a first-class column type — not a bolt-on, not an extension. And if your data already lives in SQL Server, you very likely don't need a dedicated vector database at all.
EF Core 10 supports vector columns natively through SqlVector<float>:
```csharp
public class DocumentChunk
{
    public int Id { get; set; }
    public string Content { get; set; }
    public string SourceDocument { get; set; }

    [Column(TypeName = "vector(1536)")]
    public SqlVector<float> Embedding { get; set; }
}
```
DiskANN indexing is the real differentiator. Microsoft's algorithm was designed for billion-scale approximate nearest neighbor search, and it's integrated directly into the SQL Server query engine. You get vector similarity search with full ACID transactions, your existing security policies, and your existing backup infrastructure.
Hybrid Search: The Quality Multiplier
Pure vector search misses exact keyword matches. Pure keyword search misses semantic intent. Combining both using Reciprocal Rank Fusion is where RAG quality makes a measurable jump:
```sql
-- Hybrid: vector similarity + full-text with RRF
SELECT
    d.Id,
    d.Content,
    (1.0 / (60 + vs.rank)) + (1.0 / (60 + ft.rank)) AS rrf_score
FROM DocumentChunks d
INNER JOIN VECTOR_SEARCH(
    DocumentChunks, Embedding, @queryVector, 'metric=cosine, k=20'
) vs ON d.Id = vs.Id
INNER JOIN FREETEXTTABLE(
    DocumentChunks, Content, @searchTerms
) ft ON d.Id = ft.[KEY]
ORDER BY rrf_score DESC;
```
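If your store doesn't offer hybrid search natively, the same fusion can run in application code. Here is a minimal Reciprocal Rank Fusion sketch over two ranked lists of document IDs; FuseRrf is my name for it, and k = 60 is the smoothing constant conventionally used with RRF:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reciprocal Rank Fusion: merge two ranked result lists into one.
// Each appearance contributes 1 / (k + rank), so a document ranked
// highly in either list floats to the top of the fused ordering.
static List<int> FuseRrf(
    IReadOnlyList<int> vectorIds, IReadOnlyList<int> keywordIds, int k = 60)
{
    var scores = new Dictionary<int, double>();
    void Accumulate(IReadOnlyList<int> ids)
    {
        for (int rank = 0; rank < ids.Count; rank++)
        {
            scores.TryGetValue(ids[rank], out double s);
            scores[ids[rank]] = s + 1.0 / (k + rank + 1); // RRF ranks are 1-based
        }
    }
    Accumulate(vectorIds);
    Accumulate(keywordIds);
    return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
}
```

A document that tops both lists beats one that tops only one, which is exactly the behavior you want from hybrid retrieval.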
MCP Servers: The REST API of the AI Era
Model Context Protocol is the part of the stack I think will have the most lasting architectural impact. MCP defines a standard for AI models to discover and invoke tools — databases, APIs, file systems, business logic — through a consistent JSON-RPC interface. Microsoft partnered with Anthropic on an official C# SDK built on the shared AI abstractions.
I've been converting existing .NET APIs into MCP servers, and the pattern is direct: wrap your service endpoints as MCP tools with rich schema descriptions, and any MCP-compatible agent can discover and call them at runtime. No pre-programming. No bespoke integrations. The agent connects, asks "what can you do?", gets a structured manifest, and proceeds to use your tools intelligently.
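For the server side of that pattern, here is a sketch of a tool exposed through the official C# SDK's attribute model. OrderTools, GetOrderStatus, and the canned response are mine; treat the attribute and hosting names as my reading of the SDK's current surface rather than gospel:

```csharp
using System.ComponentModel;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using ModelContextProtocol.Server;

// Hypothetical tool class. The Description attributes become the schema
// that agents read when they ask the server "what can you do?"
[McpServerToolType]
public static class OrderTools
{
    [McpServerTool, Description("Gets the current status of an order by its id.")]
    public static string GetOrderStatus(int orderId) =>
        orderId > 0
            ? $"Order {orderId}: shipped"   // stand-in for a real lookup
            : "Unknown order";
}

// Host wiring (Program.cs), roughly:
// var builder = Host.CreateApplicationBuilder(args);
// builder.Services.AddMcpServer()
//     .WithStdioServerTransport()
//     .WithToolsFromAssembly();
// await builder.Build().RunAsync();
```

The rich descriptions aren't decoration: they are the contract the agent reasons over, the same way an OpenAPI spec documents a REST endpoint.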
```csharp
var mcpClient = await McpClientFactory.CreateAsync(
    clientTransport, mcpClientOptions, loggerFactory);

var tools = await mcpClient.ListToolsAsync();

var response = await chatClient.GetResponseAsync(
    messages,
    new ChatOptions { Tools = [.. tools] });
```
MAF has native MCP client support, so every agent you build automatically gets access to any MCP server you expose. If you spent the 2010s building REST APIs, you'll spend the late 2020s building MCP servers. The mental model is identical — you're designing contracts for consumers. The consumers just happen to be AI agents now.
Model Providers: The Freedom of Abstraction
Because Microsoft.Extensions.AI makes everything provider-agnostic, you get genuine freedom in how you source your models. Here's the landscape as of March 2026:
Azure AI Foundry (formerly Azure AI Studio) is the enterprise platform — model catalogs spanning OpenAI, Meta, DeepSeek, Cohere, and Mistral, plus fine-tuning and production deployment. Foundry Local brings that experience to your machine with GPU and NPU acceleration for testing, CI/CD, and air-gapped environments. Ollama remains my go-to for local development — the OllamaSharp package implements IChatClient directly, and Microsoft's Phi-4 family runs well on it. GitHub Models provides free experimentation with the GitHub model catalog for prototyping.
The key point: every one of these implements the same interfaces. Application code, middleware, and tests stay identical. Only the DI registration changes. I run Ollama locally, push to Azure OpenAI in staging, and can swap to any new provider without touching business logic.
The Orchestration Backbone: .NET Aspire
.NET Aspire has matured significantly in .NET 10 and serves as the orchestration layer that wires AI applications together. Ollama containers, Qdrant instances, your API services — all declared in code and spinning up together with connection strings injected automatically. No hardcoded URLs. No docker-compose files to babysit.
Where Aspire really delivers for AI is in evaluation. You can spin up ephemeral containers with your entire stack, ingest test data, run queries, and score answers using LLM-as-Judge patterns — groundedness, relevance, coherence — all within integration tests. The .NET AI Evaluation libraries plug directly into this workflow. If you're shipping AI features without systematic quality checks, you're accumulating debt that compounds with every deployment.
What a Real Production System Looks Like
To make this concrete, consider building a customer support agent for an e-commerce platform — a problem I've actually solved. Here's how the layers compose:
At the top, Microsoft Agent Framework orchestrates a workflow: a triage agent classifies the customer's intent, a retrieval agent pulls relevant context, and a response agent generates the answer. The workflow engine manages handoffs and fallback logic.
The RAG layer feeds the retrieval agent. Product documentation, return policies, and order history are chunked, embedded, and stored in SQL Server 2025 — the same database where transactional data lives. Hybrid search (vector + full-text) retrieves the most relevant context.
MCP servers expose the order management system, inventory service, and CRM. Agents look up orders, check stock, and update tickets through standardized tool calls without custom integration code.
Extensions.AI ensures every model call flows through IChatClient with middleware for logging, caching, and telemetry. SignalR streams the agent's response token-by-token to the browser. .NET Aspire wires up all services and provides the monitoring dashboard. And .NET 10 underneath ties it together with DI, OpenTelemetry, and the raw performance to hit sub-100ms response initiation.
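The streaming piece is worth a sketch. SignalR hub methods can return IAsyncEnumerable, which maps naturally onto a streaming chat response; the hub and method names here are illustrative, not from any shipped sample:

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.AI;

// Hypothetical hub: relays model tokens to the browser as they arrive,
// so the user sees text immediately instead of waiting for completion.
public class SupportHub(IChatClient chatClient) : Hub
{
    public async IAsyncEnumerable<string> Ask(
        string question,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        await foreach (var update in chatClient.GetStreamingResponseAsync(
            question, cancellationToken: cancellationToken))
        {
            if (update.Text is { Length: > 0 } text)
                yield return text; // each chunk renders client-side as it lands
        }
    }
}
```

The client consumes this with SignalR's streaming API and appends each chunk to the DOM, which is where the sub-100ms "time to first token" feel comes from.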
Mistakes I Made (So You Don't Have To)
Underestimating chunking strategy. I spent weeks tuning model selection and prompt engineering before realizing my retrieval quality was bottlenecked by how I split documents. Bad chunks produce bad retrievals, and no model is smart enough to overcome irrelevant context. Overlap-aware recursive splitting with semantic boundary detection is where the real ROI lives. Get this right before you optimize anything else.
Using oversized embeddings by default. A 1536-dimension general-purpose embedding model might be overkill for your domain. I tested smaller models and domain-specific embeddings and found measurably better recall in some cases with lower dimensionality. Measure retrieval quality, not vibes.
Ignoring token economics until the bill arrived. A RAG pipeline that works at 100 queries per day gets expensive at 100,000. Cache your embeddings. Cache your retrievals. Consider a reranking step with a smaller, cheaper model to filter candidates before your expensive generation call. Build cost awareness into the architecture from day one.
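One concrete tactic: embeddings are pure functions of their input text, so they cache perfectly, and re-ingesting unchanged documents should cost nothing. A minimal in-memory sketch keyed by content hash (EmbeddingCache is a hypothetical helper; production code would sit behind IDistributedCache or a database table):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

// Hypothetical cache: only invoke the (expensive) embedder for unseen text.
public sealed class EmbeddingCache(Func<string, Task<float[]>> embed)
{
    private readonly Dictionary<string, float[]> _cache = new();
    public int Misses { get; private set; }

    public async Task<float[]> GetAsync(string text)
    {
        // Hash the content so the key is small and change-sensitive.
        string key = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(text)));
        if (_cache.TryGetValue(key, out var cached)) return cached;

        Misses++;                       // the billable model call happens here
        var vector = await embed(text);
        _cache[key] = vector;
        return vector;
    }
}
```

The same hash-keyed idea extends to retrieval results and even full responses for high-frequency queries.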
Treating agent observability as optional. MAF gives you OpenTelemetry hooks, but the tooling for understanding why an agent chose a particular tool or reasoning path is still maturing. Build verbose logging into your middleware pipeline from the start. You'll thank yourself the first time an agent does something unexpected in production.
Falling behind on package versions. The ecosystem is moving fast. MAF went from preview to RC in months. Microsoft.Extensions.VectorData just went GA. Pin your package versions. Read the migration guides. Budget time for upgrades.
Where to Start
If you've read this far and want to stop reading and start building, here's my recommendation: start with RAG. It's the highest-value, lowest-risk entry point into the .NET AI stack. Get your domain data into SQL Server 2025 vectors, wire up a simple retrieval pipeline using IEmbeddingGenerator and Extensions.VectorData, ground a single agent with real context, and ship it. You'll learn more from one production deployment than from a year of reading whitepapers.
The tools are stable. The patterns are proven. The abstractions actually abstract. For the first time in a long time on the Microsoft stack, the AI story doesn't require you to bet your architecture on preview packages and hope for the best.
Build something. Ship it. Then tell me what you learned.
Gal Ratner is an award-winning CTO with 30+ years in software architecture, AI, and scalable platform development. He operates 9+ production web properties through Prana Entertainment and WhiteStar Labs in Las Vegas.
Available for consulting and interim CTO roles — galratner.com · whitestarlabs.com