Building on moving ground: Life as an MLOps engineer at Kensho

Apr 28

Kensho's Alec Alameddine shares what two years in MLOps taught him about engineering in a field where the tools, patterns, and ground beneath you are always changing.

There’s a particular kind of engineering problem that doesn’t fit neatly into how we usually think about designing systems: the kind where the foundation you’re building on is still being built itself. At Kensho, this is the norm. Engineers move fast and start building before the path ahead is fully clear. Technologies shift, customer needs evolve, and there’s no obvious “right” way to proceed.

At first, earliness can feel costly: setup takes longer, and you worry about decisions that might not age well. But over time, you realize that building on the frontier trains you to focus on what actually matters.

What are you actually trying to solve?

Focusing on what matters starts with asking the right questions.

When there’s no precedent, the usual shortcuts disappear. You’re forced into first-principles thinking: before asking “What’s the right way to do this?” you ask, “What problem am I actually solving?” From there, you start identifying assumptions, figuring out what’s likely to change, and deciding what deserves the most focus. Only then does a tradeoff analysis become meaningful.

During my time at Kensho, I’ve come to see MLOps through a new lens: it’s really about building solutions for customers. Whether those customers are technical folks at Kensho, financial teams at S&P Global, or external users, the starting point is the same: you must truly understand a problem before trying to create a solution. That’s why we use tight feedback loops like weekly office hours, tech talks, and regular pain-point interviews to stay grounded in real user needs.

When we built agents to deliver S&P Global’s financial data to external customers and perform workflows for them, our first question wasn’t technical. Instead, it was “What data do they want and why?” To answer that, we spoke directly to potential customers to learn about their use cases. Once we truly understood their workflows, it became clear that regardless of the data each customer wanted, the most impactful way to deliver value would be to integrate our agents into the LLM platforms they already use, like Claude, Gemini and ChatGPT.

But understanding the problem is only half the story.

Challenges of building with new technology

Building on the frontier means figuring things out as you go along.

Even after deciding on an approach, implementation is rarely straightforward. Nascent tech often comes with incomplete documentation, competing approaches, and answers buried in source code or git issues.

To integrate financial agents into LLM clients, we created a new platform built around Model Context Protocol (MCP), one of the leading standards for agentic communication. We evaluated which dependencies could support the functionality we needed, though the answer changed over time as our needs, the protocol itself, and its downstream libraries evolved. Even patch updates could introduce silent bugs, which encouraged us to put more time and effort into establishing a minimal set of dependencies (a great practice in its own right).
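One way to make that kind of drift visible is to compare the versions actually installed against an explicit list of pins before the service starts. The snippet below is only a sketch of that idea, not the tooling described above: the function names and the example package names are hypothetical, and only the standard library `importlib.metadata` module is used.

```python
from importlib import metadata


def installed_versions(names):
    """Look up the installed version of each package, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_drift(pinned, installed):
    """Return {name: (expected, actual)} for every pin that does not match."""
    drift = {}
    for name, expected in pinned.items():
        actual = installed.get(name)
        if actual != expected:
            drift[name] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical pins; a real project would read these from its lock file.
    pinned = {"example-protocol-sdk": "1.2.3", "example-http-client": "0.27.0"}
    for name, (expected, actual) in find_drift(pinned, installed_versions(pinned)).items():
        print(f"{name}: expected {expected}, found {actual}")
```

A check like this does not prevent a patch release from changing behavior, but it turns a silent version change into a visible one, and a short pin list keeps the output readable.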

As with most emerging standards, using MCP with third-party systems often exposes mismatches in how key behaviors are defined and implemented. This creates issues when components of the same stack conflict.

We encountered this when we implemented Dynamic Client Registration (DCR) according to the MCP auth specification. We wanted authenticated users to be able to register new clients at runtime, but we discovered that our backend identity provider, Okta, does not support this because it defines DCR differently from MCP. With Okta’s model, administrators must manually approve every app a user wants to connect to our agents. To bridge the discrepancy, we built our own layer.
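To make the idea of runtime registration concrete, here is a minimal in-memory sketch of a registration handler that accepts client metadata shaped like an OAuth 2.0 Dynamic Client Registration request (RFC 7591) and issues a `client_id`. It is an illustration under stated assumptions, not the layer described above: the class and helper names are hypothetical, the redirect policy is invented for the example, and how such a layer would talk to an identity provider is not shown.

```python
import secrets
import time
from urllib.parse import urlparse


class RegistrationError(ValueError):
    """Raised with an RFC 7591 style error code as its message."""


def is_allowed_redirect(uri):
    """Hypothetical policy: HTTPS anywhere, plain HTTP only on loopback hosts."""
    parsed = urlparse(uri)
    if parsed.scheme == "https" and parsed.hostname:
        return True
    return parsed.scheme == "http" and parsed.hostname in ("localhost", "127.0.0.1")


class ClientRegistry:
    """In-memory store of registered clients (a real layer would persist them)."""

    def __init__(self):
        self._clients = {}

    def register(self, request):
        redirect_uris = request.get("redirect_uris")
        if not isinstance(redirect_uris, list) or not redirect_uris:
            raise RegistrationError("invalid_redirect_uri")
        if not all(isinstance(u, str) and is_allowed_redirect(u) for u in redirect_uris):
            raise RegistrationError("invalid_redirect_uri")
        client = {
            "client_id": secrets.token_urlsafe(16),
            "client_id_issued_at": int(time.time()),
            "client_name": request.get("client_name", ""),
            "redirect_uris": list(redirect_uris),
            "grant_types": request.get("grant_types", ["authorization_code"]),
            # Sketch choice: treat clients as public unless they say otherwise.
            # RFC 7591 itself defaults this field to "client_secret_basic".
            "token_endpoint_auth_method": request.get("token_endpoint_auth_method", "none"),
        }
        self._clients[client["client_id"]] = client
        return client

    def get(self, client_id):
        return self._clients.get(client_id)
```

A caller would pass the parsed JSON body of a registration request to `register` and return the resulting dictionary as the response; rejected requests surface as a `RegistrationError` carrying the error code.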

Sometimes inconsistencies are more cryptic. In one case, our agents failed to connect to one of the most widely used LLMs without any visible errors. After meticulously combing through our logs and codebase, we realized the bug came from just one line of source code hidden within a nested dependency. In a stroke of bad luck, this dependency had recently become incompatible with a single field of a single endpoint of the LLM platform’s newly updated MCP flow. Digging in at this level of granularity is invaluable for truly understanding a system.

But why does that matter if everything is going to change anyway?

Build for the shape of the problem, not just surface details

Underlying constraints matter more than the tools used to solve them.

Even though implementations change often, the constraints that shape problems tend to persist and recur.

Authentication will always involve tradeoffs between security and usability. APIs will always involve tradeoffs between flexibility and simplicity. These tradeoffs take on new forms as tools evolve, but they don’t go away.

Working with new technology forces you to become intimately familiar with the underlying problem shapes. Every time you trace a problem to its root, your instincts sharpen. You improve at seeing what actually matters now, and what will still matter when everything else changes. That’s the true advantage of building on moving ground.

Afterword: Reflecting on Team MLOps and my Kensho experience

Team culture means everything when building on the frontier.

In a broad sense, we are all a product of our environment. I’ve spoken about my experience working in MLOps, but there’s an important aspect I haven’t expanded on yet: the team culture that underpins it.

MLOps work is, in many ways, defined by building on moving ground. Our primary customers are engineers working on novel problems, where both the requirements and the tools are constantly evolving. To operate effectively in this environment, the team is built around continuous learning, close collaboration, flexibility, and an emphasis on deep understanding across both ML and infrastructure.

Within the team, we meet weekly to discuss what each of us is working on, ask questions, and offer feedback. We also share what we’re learning with others through discussion series, tech talks, and ML syncs. Additionally, we set aside a day each month to learn a new topic of our choosing.

We regularly conduct pain-point interviews and join other teams’ standups to understand how their workflows are evolving and where friction arises. Addressing that friction requires depth across both ML and infrastructure. Sometimes this means refining existing systems, like when we improved ML pipelines for audio ingestion and transcription. Other times, it means building new ones, such as a custom inference server to support new R&D models with exotic architectures. In both cases, it requires an understanding of both the modeling side and the systems needed to serve ML reliably.

Our culture doesn’t just shape how we build systems; it also shapes how we grow.

On a personal level, I’m extremely grateful for the opportunities this role has afforded me. I’ve had the freedom to take on disparate projects across the stack, work with very different types of customers, and explore new ideas with uncertain outcomes. When ideas didn’t work out, they were treated not as failures but as new insights into what holds up in practice and what doesn’t; many of my most important lessons came from these experiences.

Working on open-ended research efforts with no clear timelines taught me how to stay persistent through ambiguity, examine problems from every possible angle, and push toward optimal solutions by carefully refining the smallest details. In contrast, working on products for external customers with strict, time-sensitive deliverables taught me how to manage execution risk and prioritize “must-haves” over “nice-to-haves.” Navigating this wide range of experiences has allowed me to grow immensely, both as an engineer and as a problem solver more broadly.

Together, these tenets make it possible to build effectively in an environment where the ground is always moving.

Alec Alameddine