
From ChatGPT to Production - Operationalizing AI


When ChatGPT landed, every part of our company reacted differently. Product saw endless possibilities, marketing wanted to capitalize, CXOs smelled revenue, and engineering said “damn, how do we make this work?” As VP of Engineering at Showpad, I lived through this firsthand and learned there are five fundamental building blocks: the prompt, the model, the data, APIs and agents, and the orchestration framework tying it all together.

Prompts are deceptively complex. A single question-and-answer exchange seems simple, but production use means chaining multiple prompts, validating structured output against schemas, handling retries when the model returns garbage JSON, and dealing with the fact that adding a comma can change behavior completely. Moving from GPT-3 to GPT-4 breaks everything again. The model landscape is equally volatile – open-source models caught up faster than anyone expected, and the choice between running your own model and using a hosted service has real implications for cost, privacy, and SLA guarantees.
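As a minimal sketch, that validate-and-retry loop can look like the following (the model name, schema, and prompt are illustrative assumptions, not the actual Showpad stack):

```python
# Sketch: validate structured LLM output against a schema, retry on garbage JSON.
import json

from jsonschema import ValidationError, validate
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUIZ_SCHEMA = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "choices": {"type": "array", "items": {"type": "string"}, "minItems": 4},
        "answer": {"type": "integer"},
    },
    "required": ["question", "choices", "answer"],
}

def generate_quiz_item(topic: str, max_retries: int = 3) -> dict:
    prompt = f"Return ONLY a JSON object with keys question, choices, answer for a quiz on {topic}."
    for _ in range(max_retries):
        raw = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any chat-completion model works here
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        try:
            item = json.loads(raw or "")
            validate(item, QUIZ_SCHEMA)  # reject structurally wrong output
            return item
        except (json.JSONDecodeError, ValidationError):
            continue  # the model returned garbage JSON; try again
    raise RuntimeError(f"No schema-valid output after {max_retries} attempts")
```

The point is that the schema check, not the prompt, is what makes the output trustworthy enough to chain into the next step.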

RAG (Retrieval-Augmented Generation) became our primary pattern for bringing company data into the mix without exposing everything to the LLM. The chunking strategy matters enormously – split in the middle of a sentence and your similarity search returns garbage. We also learned that customers trained on keyword search resist switching to natural language queries, so showing both result types side by side was the pragmatic solution.
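A boundary-aware splitter is only a few lines; here is a rough sketch (the regex and chunk size are assumptions, and a real pipeline would use a proper sentence tokenizer):

```python
# Sketch: chunk documents on sentence boundaries before embedding,
# so similarity search never sees half a sentence.
import re

def chunk_on_sentences(text: str, max_chars: int = 800) -> list[str]:
    # Naive sentence split; good enough to illustrate the idea.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```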

The delivery pipeline looks familiar but behaves differently. Prompts, code, and orchestration deploy together as containers, but integration testing is where things get weird. How do you write rules for what makes a good quiz question? We ended up with three approaches: NLP-based relevance checks, specialized small models for specific tasks like toxicity detection, and using LLMs to evaluate other LLMs – scary inception, but possibly the only way to scale.
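The LLM-as-judge approach boils down to a graded rubric inside an integration test. A sketch, with the rubric, model choice, and passing threshold all as assumptions:

```python
# Sketch: use one LLM to grade another's quiz questions in CI.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score this quiz question 1-5 for relevance to the source text, "
    "clarity, and having exactly one defensible answer. Reply with the digit only."
)

def judge_quiz_question(source_text: str, question: str) -> int:
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumption: a stronger model judges a cheaper one
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Source:\n{source_text}\n\nQuestion:\n{question}"},
        ],
    ).choices[0].message.content
    return int(reply.strip()[0])  # brittle on purpose; this is a sketch

# In an integration test: assert judge_quiz_question(doc, question) >= 4
```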

Operations requires continuous data quality monitoring in production, not just the traditional latency and throughput metrics. Run your test set as a health check against the live system. Monitor for PII leakage. Think about LLM firewalls. Capture user feedback through copy buttons, regenerate actions, and inline editors. The biggest organizational shift: data engineers need to move toward production. We went through several team structures before landing on embedding AI capabilities into the platform team. Budget 20% of your compute spend for AI, factor in an innovation tax for the constant re-architecting, and start now – your competitor already has.
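Concretely, a scheduled job can replay a fixed test set against the live endpoint and scan responses for PII. Everything below (endpoint URL, test cases, regexes) is an illustrative assumption:

```python
# Sketch: run a fixed test set against the live system as a health check,
# and flag potential PII leakage with crude regexes.
import re

import requests

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

TEST_SET = [
    {"prompt": "Summarize our refund policy.", "must_contain": "refund"},
]

def health_check(endpoint: str = "https://llm.internal/generate") -> bool:
    failures = 0
    for case in TEST_SET:
        answer = requests.post(
            endpoint, json={"prompt": case["prompt"]}, timeout=30
        ).json()["text"]
        if case["must_contain"] not in answer.lower():
            failures += 1  # data-quality regression, not a latency problem
        if any(p.search(answer) for p in PII_PATTERNS):
            failures += 1  # potential PII leak; page someone
    return failures == 0
```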

Watch on YouTube — available on the jedi4ever channel

This summary was generated using AI based on the auto-generated transcript.
