Binary vs JSON for MCP: I Went Looking

Protobuf beats JSON. GraphBinary beats GraphSON. So I went looking for where a binary wire helps MCP. The answer, in the venues I tested, was narrower and more interesting than I expected: not text-heavy tools, not JSON-optimized structured payloads, but exactly where JSON is forced to fake binary.

Protobuf beats JSON on CPU and memory. Apache TinkerPop’s GraphBinary, same story over GraphSON - I’m living that migration at work, and we’re already seeing memory pressure ease. The tradeoff is old: text is readable but costs CPU and bytes, and past a certain payload size that cost outweighs the convenience.

I wanted to go looking for that crossover. My first thought was the Language Server Protocol - but LSP is a decade old: settled, widely shipped, and with little appetite for changing the bytes on its wire.

MCP - the Model Context Protocol - is the opposite: young, still malleable. It’s how coding agents talk to tools, and it’s JSON-RPC - over stdio locally, or HTTP to a remote server. Agents make a lot of those calls, all JSON text on the wire. If binary is ever worth it, a busy MCP server is where it should show up.

So this is an exploration, not a thesis. I hoped binary would win, the way it does in the protocols above. Mostly, it doesn’t. The benefit turned out narrow and venue-specific.

Two assumptions I had to drop first, so nobody else burns a weekend on them:

I hoped a smaller wire would also help the client. It doesn’t. A binary decoder isn’t free, so the client rarely decodes faster - and even when it does, the saving is noise. A tool call’s latency is a rounding error next to the model’s own: the agent spends its wall-clock waiting on the LLM, not parsing responses. The benefit, where there is one, lives on the server.

I half-hoped fewer bytes might ease the model’s context budget. It can’t. The model pays for the decoded content, not the wire bytes - MessagePack and JSON decode to the same thing. Binary changes what the machine pays, never what the model pays. (Keeping the payload out of the response entirely could; more on that at the end.)

So this isn’t about faster agents or cheaper context. The only question left: can binary make a busy server cheaper to run - and where?

I tried three venues. Only one produced a clean win:

Three venues, one clean win. Binary barely touches text, loses to a JSON-optimized implementation, and only wins decisively where JSON is forced to carry binary as base64.

Here’s the whole post in one line: a binary wire format doesn’t automatically win. It wins exactly where the system was already paying a text tax it didn’t have to. The three venues are three tests of that single idea. The rest of the post is how I got to each verdict, and why the last one is the only one I’d stake a claim on.

Act 1 - The wire, and a benchmark that lied to me

The protocol: `msgpack-v1`

The design goal was to change how MCP messages are encoded without changing what they mean. MCP is JSON-RPC: methods, request and response IDs, params, results, errors. None of that semantics needed to move. Only the bytes did.

The frame is deliberately boring:

4-byte unsigned big-endian length
MessagePack payload

stdio and TCP are byte streams - they hand you bytes with no idea where one message ends and the next begins. JSON-RPC gets away with this because text has delimiters you can scan for. Raw MessagePack bytes don’t; a 0x0a mid-payload is data, not a boundary. So each message declares its length first: read four bytes, learn the next N are one message, wait for all N, slice it, hand back the payload. That is length-prefixed framing.

Read the 4-byte header, learn the payload length N, wait for a full frame, slice exactly one message. The length prefix is what turns a boundary-less byte stream back into discrete messages.

The one detail worth dwelling on is the size cap: four bytes put a 4 GiB ceiling on any message, and a default 64 MB limit means a corrupt header can’t trick the reader into allocating gigabytes before it gives up. The rest is just convention - a big-endian length, because that’s network byte order.
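The read path described above can be sketched as a small incremental decoder. This is a minimal Python sketch under stated assumptions, not the implementation behind these benchmarks: the FrameDecoder class, its feed method, and the reading of the 64 MB default as 64 * 1024 * 1024 bytes are all illustrative choices.

```python
MAX_FRAME_BYTES = 64 * 1024 * 1024  # assumed reading of the 64 MB default cap
HEADER_BYTES = 4


class FrameDecoder:
    """Buffers stream chunks and returns complete payloads (hypothetical helper)."""

    def __init__(self, max_frame_bytes: int = MAX_FRAME_BYTES) -> None:
        self._buffer = bytearray()
        self._max = max_frame_bytes

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append a chunk and return every payload that is now complete."""
        self._buffer += chunk
        payloads = []
        while len(self._buffer) >= HEADER_BYTES:
            length = int.from_bytes(self._buffer[:HEADER_BYTES], "big")
            # Reject an oversized length before waiting on (or buffering) its body.
            if length > self._max:
                raise ValueError(f"frame length {length} exceeds cap {self._max}")
            end = HEADER_BYTES + length
            if len(self._buffer) < end:
                break  # partial frame: wait for more bytes
            payloads.append(bytes(self._buffer[HEADER_BYTES:end]))
            del self._buffer[:end]
        return payloads


# Usage: a frame split across two reads yields nothing, then one payload.
decoder = FrameDecoder()
frame = (3).to_bytes(4, "big") + b"\x0a\x0b\x0c"
first = decoder.feed(frame[:5])   # [] : header plus one payload byte so far
second = decoder.feed(frame[5:])  # [b"\x0a\x0b\x0c"]
```

Note that the 0x0a inside the payload passes through untouched: the decoder never scans for delimiters, it only counts bytes.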

Writing a frame is the mirror image of that read path. The MessagePack step is a single library call in every language; the framing around it is just as short - prefix the payload with its big-endian length. Here it is in Python and Go:

def encode_frame(payload: bytes) -> bytes:
    if len(payload) > 0xFFFFFFFF:
        raise ValueError(f"frame length {len(payload)} exceeds uint32 maximum")
    return len(payload).to_bytes(4, "big") + payload

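import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFrame prefixes payload with its length as a 4-byte big-endian uint32.
func encodeFrame(payload []byte) ([]byte, error) {
	// Compare as uint64 so the check also compiles where int is 32 bits.
	if uint64(len(payload)) > math.MaxUint32 {
		return nil, fmt.Errorf("frame length %d exceeds uint32 maximum", len(payload))
	}
	frame := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(frame, uint32(len(payload)))
	copy(frame[4:], payload)
	return frame, nil
}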

Proving parity first

msgpack-v1 was implemented three times - TypeScript, Python, and Go - against shared golden fixtures, so “the same message” means byte-identical frames across languages. That cross-language contract was the point of Act 1: before measuring anything, I needed proof that JSON and msgpack-v1 carried exactly the same semantics, message for message - the only reason the later numbers are comparable at all. The protocol was never the hard part. Proving it honest was.

The benchmark that lied to me

With parity established, I went for the obvious thing first: latency. I ran the real mcp-server-git tools over stdio - git_status, git_diff, git_log, git_show, git_show_blob - JSON against msgpack-v1, comparing p50. Mostly noise. git_status was flat: ~16 ms for JSON, ~15.7 ms for MessagePack. Some tools a hair faster, some a hair slower.

The first real lesson hid in that noise: a binary wire format is not automatically interesting for text-heavy tools. Those git tools return blobs of text - diffs, logs, file contents. Text is text in either encoding; MessagePack barely gets to do anything. The bytes shrink a little and the rest of the path swamps it.

So I changed the payload. I added a synthetic, benchmark-only tool that returned a deterministic review-context object - files, hunks, lines, annotations, structured metadata. Not terminal text: object-heavy MCP structuredContent, the shape I actually cared about.

The byte win showed up immediately: about 14% fewer bytes on the wire, consistently. And latency got worse - 2.22 ms for JSON versus 2.88 ms for MessagePack, about 30% slower. On a fast laptop, over local stdio, the cost of encoding and decoding MessagePack was bigger than the payload savings.

I was measuring the wrong thing, on the wrong machine. Local stdio on a fast laptop is a terrible proxy for a busy remote server: for one user on one call, a few milliseconds of decode overhead hides any benefit. The real question was never “does my client see a lower p50?” It was: can a loaded server do the same work with less CPU and memory?

So I stopped chasing client wall-clock and went looking for a server-resource win on a real remote server. That reframe set up everything after:

Server CPU and memory are the decision metrics - what an operator actually pays for.
Client p50 latency is a guardrail, not the headline - it catches broken compatibility and ugly regressions, but it answers a different question.
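Treating client latency as a guardrail can be sketched as a simple check over raw samples. This is a minimal sketch: the percentile and guardrail_ok helpers and the 10% tolerance are assumptions for illustration, not values used in these benchmarks.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def guardrail_ok(json_ms: list[float], msgpack_ms: list[float],
                 tolerance: float = 0.10) -> bool:
    """True if MessagePack p50 is at most `tolerance` worse than JSON p50."""
    return percentile(msgpack_ms, 50) <= percentile(json_ms, 50) * (1 + tolerance)
```

Fed the Act 1 structured-payload medians (2.22 ms against 2.88 ms), a check like this would trip, which is what a guardrail is for. It says nothing about server CPU or memory.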

The data forced this, not me - and it’s why everything after Act 1 is a remote benchmark.

Act 2 - GitHub’s Go server, and the trap an agent built

A synthetic Python tool proves a point, but it felt meh. I wanted a real, out-in-the-wild use case. Enter GitHub’s MCP server, written in Go.

I didn’t want to benchmark GitHub’s API or the public internet, so I captured real pull-request context with my local gh CLI, sanitized it, and replayed the snapshots through benchmark-only tools. During measurement the server never touched the network - just GitHub-shaped payloads at three sizes: small (the go-sdk PR 972), medium (the github-mcp-server PR 2570), and large (kubernetes/kubernetes PR 139316).

The false win: MessagePack outside, JSON inside

The first Go run was a disaster for MessagePack - slower and burning more CPU, despite doing less I/O. A true WTF moment.

I’d told the coding agent driving this - Codex - to replace the Go server’s transport: rip out JSON-RPC-over-JSON, put msgpack-v1 in its place. What it actually did was keep JSON exactly where it was and bolt MessagePack on top. The path looked like this:

JSON-RPC object
  -> json.Marshal           (build the JSON)
  -> json.Unmarshal         (back into generic values)
  -> msgpack.Marshal        (now finally binary)

That is not a binary protocol. It’s the Men in Black bug in an Edgar suit: JSON stuffed inside a MessagePack skin, lumbering around and running both serializers on every call to keep the disguise from splitting. I asked for a replacement and got a costume.
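To make the shape of that detour concrete, here is a minimal Python sketch of it. The original was Go; the small pack function below covers only a subset of MessagePack (nil, bool, non-negative integers up to 32 bits, strings, arrays and maps up to 65535 entries) and stands in for a real library, and both encode function names are hypothetical.

```python
import json
import struct


def pack(obj) -> bytes:
    """Encode a small MessagePack subset; a stand-in for a real library."""
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):  # must precede int: bool is an int subclass
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                       # positive fixint
        if 0 <= obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)   # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)   # uint 16
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)   # uint 32
        raise ValueError("integer outside this sketch's range")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw               # fixstr
        if len(raw) <= 0xFF:
            return b"\xd9" + struct.pack(">B", len(raw)) + raw  # str 8
        if len(raw) <= 0xFFFF:
            return b"\xda" + struct.pack(">H", len(raw)) + raw  # str 16
        return b"\xdb" + struct.pack(">I", len(raw)) + raw      # str 32
    if isinstance(obj, list):
        head = (bytes([0x90 | len(obj)]) if len(obj) <= 15      # fixarray
                else b"\xdc" + struct.pack(">H", len(obj)))     # array 16
        return head + b"".join(pack(item) for item in obj)
    if isinstance(obj, dict):
        head = (bytes([0x80 | len(obj)]) if len(obj) <= 15      # fixmap
                else b"\xde" + struct.pack(">H", len(obj)))     # map 16
        return head + b"".join(pack(k) + pack(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def costume_encode(message: dict) -> bytes:
    """The wrapper path: build JSON, parse it back, then encode to MessagePack."""
    text = json.dumps(message)   # json.Marshal
    generic = json.loads(text)   # json.Unmarshal
    return pack(generic)         # msgpack.Marshal


def native_encode(message: dict) -> bytes:
    """The intended path: one pass, straight to MessagePack."""
    return pack(message)


message = {"jsonrpc": "2.0", "id": 7,
           "result": {"content": [{"type": "text", "text": "ok"}]}}
same_bytes = costume_encode(message) == native_encode(message)  # True
```

Both paths put identical bytes on the wire, so nothing visible from the outside distinguishes them. Only the work per call differs: three serializer passes against one.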

The lesson is about the agent, not the wire. Be careful what you hand Codex’s /goal - it will bend the rules you set to reach it. I said replace JSON; it heard make the benchmark run and kept JSON alive underneath to get there. Read the diff, not the commit message.

Each generation pushed JSON further off the result path: a wrapper that ran both serializers, then an SDK that still stored the result as raw JSON, then a result that writes itself to MessagePack. The wire format barely changed; where the work happened did.

So I pointed the agent at the SDK itself - vendoring the official Go SDK and patching the transport inside it. Better, but it still lost: the SDK holds params and results as json.RawMessage, so even with a native frame the large nested payloads still crossed a JSON boundary on the way out. The envelope was never the expensive part; the response body was.

The honest verdict

So for the cleanest test I made the result encode itself: instead of marshaling the handler’s CallToolResult to JSON and transcoding that, the struct writes its fields straight to MessagePack. Then I held everything else fixed: one direct Go HTTP endpoint, same process and same handler, serving JSON as application/json and MessagePack as application/msgpack. The encoding is the only variable left.

Direct Go HTTP, same process and handler, encoding the only variable. MessagePack saved bytes and nothing else: client latency was flat, and the server-side samples showed no CPU win - if anything, slightly more.

MessagePack saved about 12% of response bytes (and ~20% of the small request bytes). That was the entire benefit. Client p50 was effectively flat, and the server-side request samples showed no CPU win - if anything, MessagePack did slightly more work per call.

The reason isn’t that “JSON is cheap.” It’s that this implementation is optimized for JSON. Because the result body stays json.RawMessage all the way down, the native encoder still has to parse it and re-emit it as MessagePack - a transcode the JSON path never pays, on the payload that is most of the response. For object-heavy structured JSON in a stack already tuned for text, the binary wire simply doesn’t earn its keep.

That’s a non-result, and I’m reporting it as one. It sharpened the question: maybe binary only wins where JSON does something genuinely wasteful - not encoding structure, but smuggling binary through text.

Act 3 - Where binary actually wins: base64 media

There’s exactly one place MCP forces JSON to do something it’s bad at. The official Filesystem MCP Server has a read_media_file tool - read an image or audio file, return it to the agent. Because the result rides in JSON, the bytes go out as base64: a fixed ~33% size tax, plus the CPU to encode on the way out and decode on the way in. That’s not JSON encoding structure. That’s JSON faking binary.

So I built a variant of that server that returns the media differently per format. JSON returns the official base64 string. MessagePack returns the same bytes as a MessagePack bin, a raw byte blob - no base64 anywhere on its path. Same process, same HTTP endpoint, the file read fresh on every call. The only difference is whether the media is text-encoded or carried raw.

This is where binary should win if it ever does. It does - cleanly, on every metric, including the latency guardrail.

Filesystem media, JSON base64 vs native MessagePack binary. Shorter bar = less work. Bytes, server CPU, RSS, and latency all move the right way - locally and on a Fly machine.

The shape held everywhere I looked, and it scaled with payload size. Locally, a 1 MiB file shed 25% of its bytes, ~35% of server CPU, and ~6% of RSS; a 5 MiB file shed the same 25% of bytes, ~38% of CPU, and ~29% of RSS. On a Fly performance-1x machine the 5 MiB workload went further still - −25% bytes, −70% server CPU, −17% RSS.

The byte savings are exactly the base64 tax, removed. The CPU drop is the encode/decode work that tax used to require. And for the first time, latency dropped too - every local and remote run saw lower p50 and p95, with no base64 round-trip on either end. Every metric moved together; nothing cherry-picked.
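The 25% figure falls out of base64's arithmetic: every 3 raw bytes become 4 text characters, so the encoded form is 4/3 the size of the original (the ~33% tax), and removing it saves 1 - 3/4 = 25%. A minimal sketch with Python's standard library follows; msgpack_bin is a hypothetical helper that hand-writes the MessagePack bin header instead of calling a library, and the zero-filled buffer stands in for a real media file.

```python
import base64
import json


def msgpack_bin(data: bytes) -> bytes:
    """Wrap raw bytes as a MessagePack bin value (bin 8, bin 16, or bin 32)."""
    n = len(data)
    if n <= 0xFF:
        return b"\xc4" + n.to_bytes(1, "big") + data
    if n <= 0xFFFF:
        return b"\xc5" + n.to_bytes(2, "big") + data
    if n <= 0xFFFFFFFF:
        return b"\xc6" + n.to_bytes(4, "big") + data
    raise ValueError("payload too large for a MessagePack bin")


media = bytes(1024 * 1024)  # 1 MiB stand-in for an image or audio file

# JSON path: base64 text wrapped in a JSON string (two quote characters).
as_json = json.dumps(base64.b64encode(media).decode("ascii")).encode("ascii")
# MessagePack path: a 5-byte bin 32 header plus the raw bytes.
as_bin = msgpack_bin(media)

saving = 1 - len(as_bin) / len(as_json)
print(len(as_json), len(as_bin), f"{saving:.1%}")  # prints: 1398106 1048581 25.0%
```

This covers only the media field itself. A full response also carries the JSON-RPC envelope, which is negligible next to a megabyte of payload.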

One caveat, and it’s a different kind than the others. Acts 1 and 2 only swapped the wire encoding: same MCP, same data model, a binary transport any server could adopt as-is. Act 3 is the one place I changed what the protocol is allowed to carry - media as a native binary blob instead of a base64 string. Today’s schemas mandate base64, so this isn’t a drop-in; the win is conditional on MCP changing its media contract. That’s the experiment’s finding, and also its boundary.

The door I didn’t walk through

Act 3 says binary beats base64 for media that rides inline, in the tool result, on every call. There’s a deeper question behind it.

MCP already has an escape hatch: a tool can return a resource_link - a typed URI pointing at a Resource - instead of embedding the data. The client fetches it separately, out of band, and only if it actually needs it. If the expensive thing is a 5 MiB blob, the most effective encoding might be not putting it on the response path at all: hand back a reference and let the client pull the bytes, in whatever format, on its own schedule. It’s also the only move here that touches the model’s context budget: a reference the client never resolves is a payload the model never pays for.
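The difference in shape can be sketched as two tool results, written here as Python dicts. This assumes the content-block shapes from the MCP specification (an image block with base64 data, and a resource_link block with a uri); the file name, the file:/// URI, and the zero-filled 5 MiB payload are hypothetical.

```python
import base64
import json

media = bytes(5 * 1024 * 1024)  # hypothetical 5 MiB image, zero-filled for the sketch

# Inline: the media rides in the tool result as base64 on every call.
inline_result = {
    "content": [{
        "type": "image",
        "data": base64.b64encode(media).decode("ascii"),
        "mimeType": "image/png",
    }]
}

# Linked: the result carries only a reference; the client fetches it if needed.
linked_result = {
    "content": [{
        "type": "resource_link",
        "uri": "file:///project/assets/diagram.png",  # hypothetical URI
        "name": "diagram.png",
        "mimeType": "image/png",
    }]
}

inline_bytes = len(json.dumps(inline_result))
linked_bytes = len(json.dumps(linked_result))
print(inline_bytes, linked_bytes)
```

The linked result stays the same size however large the file grows, which is the property no wire encoding can offer.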

That turns “which encoding?” into “does this payload belong in the response at all?” - and it’s the next thing I want to measure.

How I measured

The thesis only counts if it survives a real implementation, so I didn’t reimplement any servers. I vendored the actual upstreams and swapped the transport underneath them: the official Python mcp-server-git, GitHub’s Go MCP server, the official Go SDK (v1.6.1), and a modified build of the official TypeScript filesystem server that can return media as raw bytes. The benchmark client reused the MCP stdio/HTTP plumbing from a fork of OpenCode, which is also where the client-side MessagePack transport lives.

That list is itself a finding: every honest test forced a fork. None of these systems - not the client, not the SDK, not either server - let you swap the wire format from the outside. The transport is woven through each of them, which is exactly why the agent’s “wrapper” in Act 2 was such an easy mistake to make, and why proving the binary path goes all the way down meant patching each upstream by hand. (I’m not publishing those forks yet.)

The local laptop and the Fly machine answer different questions. Within each, the only comparison that counts is JSON vs MessagePack on the same machine under the same topology.

The two machines are not comparable to each other; every number here compares JSON against MessagePack on one machine and one topology. Server CPU is the cgroup’s usage_usec or the process counter; RSS is the physical memory the server process holds, sampled from /proc. Client p50 is the guardrail.
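For reference, here is a minimal sketch of how those two server metrics can be read on Linux. It assumes cgroup v2, where cpu.stat carries a usage_usec line, and takes RSS from the VmRSS line of /proc/<pid>/status. The function names, and the split between parsing text and reading files, are choices made for this sketch rather than the benchmark harness itself.

```python
from pathlib import Path


def parse_usage_usec(cpu_stat_text: str) -> int:
    """Extract cumulative CPU time in microseconds from cgroup v2 cpu.stat text."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found")


def parse_rss_kib(proc_status_text: str) -> int:
    """Extract resident set size in KiB from /proc/<pid>/status text."""
    for line in proc_status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:    12345 kB"
    raise ValueError("VmRSS not found")


def sample(pid: int, cgroup_dir: str) -> tuple[int, int]:
    """Read one (cpu_usec, rss_kib) sample; cgroup_dir is the server's cgroup path."""
    cpu = parse_usage_usec(Path(cgroup_dir, "cpu.stat").read_text())
    rss = parse_rss_kib(Path(f"/proc/{pid}/status").read_text())
    return cpu, rss
```

Because usage_usec is cumulative, the CPU cost of a run is the difference between a sample taken before the workload and one taken after it. RSS is a point-in-time value and has to be sampled during the run.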

Concluding

I went looking for a place where binary beats text in MCP, expecting to find it nearly everywhere. I found it in exactly one place.

A binary wire is easy - four bytes of length and a MessagePack payload. Making it matter meant finding the one venue where text was the wrong tool, and reporting that everywhere else it wasn’t.

So here’s the protocol-design implication, as concretely as the evidence supports it: MCP probably doesn’t need a wholesale binary replacement for JSON-RPC. In every venue but one the JSON wire was already as cheap as the binary one, sometimes cheaper, and a stack tuned for JSON pays more to go binary. What the evidence points to instead is a narrow escape hatch: a way to carry media as native bytes instead of base64, or stronger guidance toward resource_link for large blobs - so the one case where text genuinely taxes the protocol stops being the mandatory default.

A binary wire format doesn’t automatically win. It wins exactly where the system was already paying a text tax it didn’t have to.