I Built an MCP Server in Go to Fix a Problem Every AI Developer Has

The Problem: When Your AI Coding Assistant Can’t Find the Docs

I’ve been using Claude Code to assist in writing agentic systems with Eino, ByteDance’s Go framework for LLM applications. The framework is SOLID – it powers Doubao and TikTok’s AI features at massive scale – but I kept hitting the same wall.

Claude couldn’t search the documentation effectively.

Eino’s docs are AMAZING. They are well-written, comprehensive, and even include a cookbook of production examples, but they’re spread across 100+ pages on cloudwego.io/docs/eino/. Great for human browsing, terrible for AI semantic search. Every time I needed something specific about graph orchestration or flow integration, I found myself converting the relevant page to markdown and pasting entire documentation pages into the context window just to make sure Claude was working from the best available guidance.

The friction added up: bloated context windows full of irrelevant information, or worse, proposed solutions missing key integration features, and constant interruptions to my flow. And it’s not just an Eino problem – any framework with extensive documentation faces this challenge when you’re building with AI coding assistants.

I needed Eino docs to be reliably searched by my AI tools without my manual intervention. So I built an MCP server to make it happen.

The Solution: MCP + Vector Search + Go

The project is straightforward: fetch Eino documentation from GitHub, chunk it intelligently, summarize it, create semantic embeddings, and expose searchable tools via the Model Context Protocol.

MCP is becoming the standard protocol for connecting AI assistants to data sources. With 97M+ monthly downloads and backing from Anthropic, OpenAI, Google, and Microsoft, it’s transitioning from an interesting experiment to production infrastructure. I believe 2026 is shaping up to be the year every enterprise considers deploying custom MCP servers to ease agentic access to critical, custom information.

Implementing vector search enables semantic similarity matching – far more effective than keyword search for AI retrieval tasks. Instead of giving Claude a base URL, hoping it crawls the right page, and then hoping that page is still in context when I need it, Claude can search the docs semantically every time.

I chose Go for production requirements: single binary deployment, low memory footprint, and type safety that catches bugs at compile time. These aren’t theoretical benefits – they matter when something needs to actually work.

The system is deployed on Fly.io at https://eino-docs-mcp.fly.dev/. You can connect Claude Code to it right now with a single command.

Full transparency: I’m learning Go publicly. I built this MCP server with Claude’s help. But I understand every line of the architecture, and that’s what matters for production systems.

While coding assistants like Claude or Copilot can greatly speed up developer productivity, the trade-off is that you are constantly pruning the AI slop they create and have to digest every output before you commit. Otherwise you will quickly code yourself into a corner.

I have previously written a vector search RAG system in Python with LangChain by hand, so I’m already familiar with the subject matter.

The Architecture: What Makes This Production-Grade

Here’s the pipeline:

GitHub -> Markdown Fetcher -> Header-Aware Chunker -> GPT-4o Metadata (summary) -> OpenAI Embeddings -> Qdrant Vector DB -> MCP Server -> Claude Code

Each piece has a specific purpose. Let me break down the decisions that make this production-grade rather than a toy demo.

1. Header-Aware Chunking

Instead of naively splitting documents at fixed character counts, the chunker respects markdown structure. It splits at H1 and H2 boundaries while preserving hierarchy context in each chunk.

Here’s the core logic from internal/markdown/chunker.go:

// Extract chunks with header context
func (c *Chunker) extractChunks(doc ast.Node, source []byte, items toc.Items, ancestors []string, chunks *[]Chunk) {
    for i, item := range items {
        // Build the header path for this item. Copy ancestors before
        // appending so loop iterations don't share a backing array.
        currentPath := append(append([]string{}, ancestors...), string(item.Title))
        headerPath := formatHeaderPath(currentPath)

        // … (boundary detection and content extraction, using i, omitted for brevity)

        // Create chunk with prepended header path
        chunk := Chunk{
            Index:      len(*chunks),
            HeaderPath: headerPath, // "# Installation > ## Prerequisites"
            RawContent: content,
            Content:    fmt.Sprintf("%s\n\n%s", headerPath, content),
        }
        *chunks = append(*chunks, chunk)
    }
}

Why does this matter? Semantic coherence. A chunk that says “Set model to ‘gpt-4’ and temperature to 0.7” without context is useless. A chunk that says “# ReactAgent > ## LLM Setup > Set model to ‘gpt-4’ and temperature to 0.7” gives Claude exactly the ReactAgent context it needs.

2. Search Deduplication Strategy

The search handler implements a three-step filter to prevent overwhelming users with multiple chunks from the same document:

// From internal/mcp/handlers.go
// Request 3x to ensure enough unique documents after dedup
chunks, err := store.SearchChunksWithScores(ctx, queryEmbedding, maxResults*3, repository)

// Deduplicate by parent document, keeping the highest score per doc
docScores := make(map[string]float64)
var docIDs []string
for _, chunk := range chunks {
    if chunk.Score < minScore {
        continue // Below threshold
    }
    if existing, seen := docScores[chunk.ParentDocID]; !seen || chunk.Score > existing {
        if !seen {
            docIDs = append(docIDs, chunk.ParentDocID)
        }
        docScores[chunk.ParentDocID] = chunk.Score
    }
}

Flow: Request 3x desired results -> filter by score threshold -> dedupe by parent document -> limit to max results. This achieves 95% reduction in duplicate results while maintaining diversity.

  Input (15 chunks from vector search):
  1.  react_agent.md  chunk#1  0.92
  2.  react_agent.md  chunk#2  0.89
  3.  react_agent.md  chunk#3  0.85
  4.  chat_model.md   chunk#1  0.82
  5.  react_agent.md  chunk#4  0.78
  6.  tools.md        chunk#1  0.75
  7.  graph.md        chunk#1  0.71
  8.  callbacks.md    chunk#1  0.68
  9.  tools.md        chunk#2  0.65
  10-15. (more chunks…)

Basically, it gives me the top 3 actual .md files, not just 3 chunks from the same .md file.

3. Dual-Mode Deployment

The server supports both stdio (for local Claude Code) and HTTP (for remote clients) in the same binary:

// From cmd/mcp-server/main.go
serverMode := getEnv("SERVER_MODE", "false") == "true"

if serverMode {
    // HTTP mode: serve MCP over HTTP for remote clients
    addr := "0.0.0.0:" + port
    log.Printf("Starting HTTP server on %s", addr)
    log.Fatal(http.ListenAndServe(addr, mux))
} else {
    // Stdio mode: run MCP server over stdin/stdout
    server.Run(ctx)
}

I originally used stdio as the MCP transport, but quickly realized I’d be running the server on three different machines and didn’t want the overhead of deploying it locally on each one. Then I realized others might want to plug into it without spinning up their own Docker containers.

4. Exponential Backoff Everywhere

Every external service call (OpenAI, GitHub, Qdrant) includes retry logic with exponential backoff:

// From internal/storage/qdrant.go
func (s *QdrantStorage) healthCheckWithRetry(ctx context.Context) error {
    exponentialBackoff := backoff.NewExponentialBackOff()
    exponentialBackoff.InitialInterval = 500 * time.Millisecond
    exponentialBackoff.MaxInterval = 10 * time.Second
    exponentialBackoff.MaxElapsedTime = 30 * time.Second

    operation := func() error {
        return s.Health(ctx)
    }

    return backoff.Retry(operation, exponentialBackoff)
}

Production reliability requires graceful handling of transient failures. Zero production failures from transient API errors since deployment. The one exception: the time I forgot to pay my OpenAI API bill (no backoff helps with that ;).

The Stack

• Go 1.24 with modelcontextprotocol/go-sdk v1.2.0

• Qdrant vector DB (gRPC, 1536-dimension vectors, cosine similarity)

• GPT-4o for metadata extraction, text-embedding-3-small for vectors

• Fly.io deployment: 512MB RAM, 1 shared CPU, 1GB persistent volume

Four MCP Tools

1. search_docs – Semantic vector search with smart deduplication

2. fetch_doc – Retrieve full document by path

3. list_docs – Browse all indexed documents

4. get_index_status – Check freshness, detect staleness (commits behind GitHub HEAD)

What I Learned: Go, MCP, and Building in Public

On Go for AI Infrastructure

Single binary deployment is underrated. No Python virtual environments, no dependency hell, no “works on my machine” problems. Just copy the binary and run. On Fly.io, the entire deployment is a single container with Qdrant and the MCP server supervised together. Simple.

Go’s type system caught bugs at compile time that would have been production failures in Python. Pass a []float64 where []float32 is expected? Compiler error. Forget to handle an error return? Compiler error. Pass an object that doesn’t implement the required interface? Compiler error. In Python, all three fail silently until runtime.

Learning curve? Steep initially. But Claude Code + the Go SDK made it manageable. I asked Claude to explain Go idioms (error wrapping with %w and context propagation) as I built. This is the meta part: I used AI to help build AI infrastructure. I’d argue having AI explain code is about as valuable as having it write code.

Trade-off: Go’s AI/ML library ecosystem is much smaller than Python’s. For model training or deep learning research and even learning the basics, Python wins. But for infrastructure and orchestration tasks – building MCP servers, resilient back-end systems, agent platforms – Go really starts to shine.

On MCP

MCP is becoming what REST was for web APIs – a standard protocol everyone will adopt. Building MCP servers is surprisingly straightforward with the MCP Go SDK. The tool abstraction is elegant: you define input/output types, implement a handler function, register it, and the SDK handles all protocol details.

The ecosystem is maturing fast. In November 2024, MCP launched with reference implementations for GitHub, Slack, and Postgres. By late 2025, it reached 97 million monthly SDK downloads. OpenAI adopted it across their Agents SDK and ChatGPT desktop in March 2025. Google launched managed MCP servers for BigQuery, Maps, GCE, and GKE (now in preview), with Gemini SDK and CLI providing native MCP client support. In December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation for vendor-neutral governance.

This is the inflection point. 2026 will be the year MCP goes from “interesting” to “essential” for AI infrastructure.

On Building in Public

Being honest about using AI to help build AI tools feels meta, but authentic. People appreciate transparency.

Here’s the reality: I didn’t write every line of this Go MCP code from scratch. I described what I needed, Claude generated implementations, I reviewed and refined them until they met production standards. I understand the architecture I built, even if I had help writing the code.

This is how modern software development is changing. Fighting it is pointless.

What I’d do differently: Start with a smaller doc corpus for faster iteration (re-generating summaries and embeddings for 120+ documents after each failure added up quickly). Maybe I over-engineered the initial chunking strategy – simpler would have worked fine.

A deployed project beats an opinionated blog post. People want to see real systems, not theoretical tutorials.

Why This Matters: Go, MCP, and the Future of AI Infrastructure

Three trends are converging:

Go’s Strengths for AI Infrastructure

Go has proven itself as the backbone of cloud infrastructure – Kubernetes, Docker, and Fly.io’s flyctl are all written in Go. Now it’s establishing itself as a serious contender for AI infrastructure too. AI agent systems need production-grade reliability, and that’s Go’s sweet spot.

True parallelism (not an issue for this project, but I suspect it will become one as specialized agentic deployments increase in large organizations, especially high-frequency ones). Type safety that prevents runtime failures. A low memory footprint that matters at scale.

Single binary deployment means no dependency management in production. My Dockerfile just copies two compiled binaries into a minimal container – no pip install, no node_modules, no runtime to maintain.

MCP Becoming Essential

97M+ monthly SDK downloads. 10,000+ MCP servers available. MCP replaces fragmented integrations with a standard protocol.

Every AI assistant will need MCP servers for production deployments. Building custom MCP servers is becoming the new “building REST APIs” – a table stakes skill.

Eino’s Production Pedigree

Eino has ~9.5K GitHub stars. It’s battle-tested at ByteDance scale (hundreds of services, millions of users).

This combination – Go’s performance, MCP’s standardization, Eino’s production patterns – represents a compelling alternative to Python and LangChain for where AI infrastructure is heading.

Wrapping Up

I’m not claiming mastery. I’m sharing authentic building experience. Real deployed, truly useful systems beat theoretical tutorials every time.

Try It Yourself

Connect to the deployed server:

claude mcp add --transport http eino-user-manual https://eino-docs-mcp.fly.dev/mcp

Or run locally:

git clone https://github.com/mike-a-ellis/eino-docs-mcp
cd eino-docs-mcp
docker-compose up -d
go build -o eino-sync ./cmd/sync && ./eino-sync sync
go build -o mcp-server ./cmd/mcp-server && ./mcp-server

What’s Next

I’m writing about Go + AI infrastructure and MCP architecture patterns. This is the first piece in what I hope will be a series on building production AI systems with Go.

If you’re building MCP infrastructure for your Go stack, let’s talk architecture. DM me if you’re tackling similar challenges with AI tool integration.

What framework docs do you wish your AI assistant could search? Drop a comment – I’m curious what documentation gaps are slowing you down.


*Michael Ellis is building AI infrastructure with Go. Find the project at https://github.com/mike-a-ellis/eino-docs-mcp*