CoDA: Coding Agents on Databricks Apps

Mon, 01 Jun 2026 00:00:00 +0000

What is CoDA?

CoDA (Coding Agents on Databricks Apps) is a browser-based terminal that runs as a Databricks App, with five coding agents installed and ready to use: Claude Code, Codex, Gemini CLI, Hermes, and OpenCode. You open a tab and the agents are already wired to your workspace. Because it’s a Databricks App, the code, the credentials, and the context never leave Databricks. Every model call goes through the Databricks AI Gateway and uses Foundation Model APIs, so there is no third-party egress, and actions respect your Unity Catalog permissions.

It ships with curated skills and MCP servers so you can go from idea to shipped feature fast, with the same governance you already trust for every other Databricks workload.


What’s Inside

Claude Code — Anthropic’s agent with 39 Databricks skills and 2 MCP servers
Codex — OpenAI’s coding agent, pre-configured for Databricks
Gemini CLI — Google’s coding agent with shared skills
Hermes Agent — NousResearch’s multi-provider AI CLI with tool-calling
OpenCode — Open-source multi-provider agent

Every agent installs at boot. On your first terminal session you paste a short-lived PAT and all CLIs are configured automatically. Tokens auto-rotate every 10 minutes.

Why it lives on Databricks

First and most importantly, it gives you access to coding agents without running them directly on your desktop. You get a sandbox to go crazy in without worrying about accidentally deleting critical files. Secondary but still neat, it integrates with several pieces of Databricks, including:

Unity Catalog governs all data access — agents only touch what your identity allows.
AI Gateway routes every LLM call through a single control plane — swap models, set rate limits, manage keys centrally.
MLflow Tracing captures every session for review and evaluation.
App Logs to Delta for long-term retention and dashboarding.

Try it

Watch a recording of me setting it up in 5 minutes:

The repo lives at databrickslabs/coding-agents-databricks-apps. Click “Use this template” on the repo to spin up your own copy and deploy it as a Databricks App in your workspace.

Random Learnings

I’m a contributor and maintainer for CoDA. Working with git and writing / reviewing code is nothing new for me. One thing that has been new for me is promoting it, using a neat framework called Remotion. It renders scenes using React, so it pairs well with coding agents. That’s how I made the promo video. Another shout-out goes to Mureka, which I used to make the music. Specifically, I typed ‘chill, ambient, happy’ and clicked ‘Create’. Technology is unbelievable.

I did the same thing for my mom to help her fundraise for an ALS bike ride earlier this week. She was quite pleased. Good son points. Here’s her donate link:


Answer all of your RFPs really fast with the RFP Monster

Fri, 01 Sep 2023 00:00:00 +0000

Update

I found out my project was featured in Wired magazine in October! Check out the article here.

Introduction

In our last company hackathon, my team won second place. The reward was a bunch of pats on the back and a little bit of prize money. This was not the first hackathon I’ve done, nor was it the first where my team placed in the top three, but I’m proud enough of this one to write about it. It was the first time I led our team on everything from software development to the business presentation. Our project was called the RFP Monster and our vision was this:

“Win more deals by enabling our sales force to answer lots of questions better and faster. That’s it.”

What is an RFP?

RFP stands for ‘Request for Proposal’. RFPs are questionnaires that companies send out to prospective vendors when they’re considering buying software or a service. At a company that sells fairly sophisticated Machine Learning software, answering these is not a trivial task. I’ve seen documents containing upwards of 200 questions (seriously), and answering them thoroughly requires knowledge of pretty much every part of our company. Besides the nitty-gritty details of how our software handles machine learning problems, there are often questions about our architecture, security, downtime, and pricing, among other topics. They are annoying hoops to jump through, and answering them well involves people from a bunch of different departments. I can’t imagine who on the customer side would actually read through all of the responses. Yet they are a necessary evil. Many prospects won’t talk to you unless you fill them out.

The Solution

The obvious answer to this problem for anyone familiar with Large Language Models (LLMs) and vector databases is to use RAG, which stands for Retrieval Augmented Generation. The idea is that you take all of your relevant documentation, split it into chunks, and shove the chunks into a special type of database that can encode text data. Then for a given question, you encode it, retrieve the chunks of text from the database that are most similar to it, and shove those chunks into your prompt as context to help something like ChatGPT generate an answer. With packages like langchain and local open source vector databases like FAISS, this was pretty easy to build. To make our answers a little more specialized, we loaded about 600 pages of our platform documentation, our security policies, and a few RFPs we had answered before.
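As a rough illustration of the chunk, encode, retrieve, and prompt flow described above, here is a minimal sketch in plain Python. It is not the hackathon code, which used langchain and FAISS. The word-count "embedding" is a toy stand-in for a real encoder, the three documents are made-up placeholder text, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=20):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(question, index, k=2):
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks in the prompt as context for the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Placeholder documents, invented for this example only.
docs = [
    "All customer data is encrypted at rest with AES-256 and in transit with TLS.",
    "The platform supports automated model training and deployment for tabular data.",
    "Pricing is based on the number of deployed models and prediction volume.",
]

# The "vector database" here is just a list of (chunk, vector) pairs.
index = [(c, embed(c)) for d in docs for c in chunk(d)]

question = "How is customer data encrypted?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt string is what you would send to the LLM. Swapping the toy pieces for a real encoder and a real vector store changes the quality of retrieval, not the shape of the flow.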

This alone would not have placed us in the hackathon though. By this time, RAG was a familiar method in the company. To take it a step further, I wanted to figure out a way to interlink RAG answers with traditional machine learning. To do this, our team took an RFP with about 150 questions, generated 150 answers, and then hand-labeled each one as Correct or as one of several classes of Incorrect, such as “Insufficient Knowledge Base” or “Hallucination”. We then trained a classifier (we called this an audit model) on the labeled responses and used it to assign a confidence score to any answer generated by the RFP Monster. On top of that, we set up a configuration in our web app that would re-query the LLM in different ways if the confidence score was low enough. The diagram I made for this work is below:

We had a couple of other bells and whistles in there too, like a way to submit new question-answer pairs to our vector database and a system to give feedback and retrain our audit model.
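The audit-and-re-query idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual implementation: the strategy names, the 0.7 threshold, and the stubbed generate and audit functions are all hypothetical. In the real project the generator was an LLM call and the audit score came from the trained classifier.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    strategy: str
    answer: str
    confidence: float


def answer_with_audit(question, strategies, generate, audit, threshold=0.7):
    """Try each prompting strategy in order and stop once the audit score
    clears the threshold. Returns the highest-scoring attempt seen."""
    best = None
    for strategy in strategies:
        answer = generate(question, strategy)
        attempt = Attempt(strategy, answer, audit(question, answer))
        if best is None or attempt.confidence > best.confidence:
            best = attempt
        if attempt.confidence >= threshold:
            break  # confident enough, no need to re-query
    return best


# Stub standing in for the LLM call (hypothetical).
def fake_generate(question, strategy):
    return f"[{strategy}] answer to: {question}"


# Stub standing in for the audit model (hypothetical fixed scores).
def fake_audit(question, answer):
    scores = {"default": 0.4, "more_context": 0.85, "rephrased": 0.6}
    strategy = answer[1:answer.index("]")]
    return scores[strategy]


result = answer_with_audit(
    "Do you support SSO?",
    ["default", "more_context", "rephrased"],
    fake_generate,
    fake_audit,
)
print(result.strategy, result.confidence)
```

Returning the best attempt rather than the last one means a low-confidence answer can still be shown (with its score) when no strategy clears the threshold.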

Post Hackathon

My company has marketed our project pretty heavily since we presented it. They even made me take away the Cookie Monster stuff and make a more professional demo video of it, which is on the DataRobot YouTube channel. You can watch both and tell me which one you like better.

RFP Monster (My Hackathon Video)

GenAI and Predictive AI Demo (The professional looking one)

Credits

Cynthia did a lot of the hand labeling for our audit model and Kyle implemented a neat re-querying strategy. A special shout-out to John, who asked to join my team as an intern and did a bunch of backend work with our vector database and alternative LLMs. Thanks teammates!
