July 5, 2026 · 8 min read

Show the model a picture and cut your token bill by up to 70 percent

A local proxy turns Claude Code's bulky context into PNGs before it leaves your machine, cutting real bills by 59 to 70 percent. It forgets exact strings, and the project says so upfront.

48,000 characters of system prompt cost 25,000 text tokens, or 2,700 as one dense PNG.

pxpipe is a local proxy that sits between Claude Code and Anthropic's servers. It finds the bulky text in your requests (the system prompt, tool docs, and old conversation turns) and rewrites it as PNG images before the request leaves your machine. The model reads the images with its vision encoder instead of billing every character as a text token. On real Claude Code traffic that lands as a 59 to 70 percent lower end-to-end bill, and the project says upfront where it can quietly get an answer wrong.

Why an image can be cheaper than the text inside it

A model tokenizes an image by slicing it into fixed patches and encoding each patch as one vision token, no matter how many characters those pixels happen to spell out. Text works the opposite way: it is billed per token, and dense content such as code, JSON, and log output runs close to 1 character per token. Measured across 391 production rows of real Claude Code traffic, the average was 1.91 characters per text token. The same content rendered as an image packs about 3.1 characters per image token. That roughly 3x gap is the entire trick.

Format	Tokens	Cost driver
As text	25,000	characters, dense content near 1 char/token
As one PNG	2,700	fixed by pixel dimensions, not character count

Same 48,000 characters of system prompt and tool docs, read at 100/100 on a clean eval either way.

How the pipeline actually works

The proxy intercepts every outgoing POST /v1/messages call, wraps eligible text into 1928px-wide columns, and packs roughly 92,000 characters per page before turning it into a PNG.

tool_result string -> wrap at 1928px-wide columns -> pack ~92,000 chars/page -> PNG[]

It splices the images back into the request in a cache-friendly way, so the static prefix stays intact and Anthropic's prompt caching still works. Every request logs to ~/.pxpipe/events.jsonl. Setup is genuinely thirty seconds:

npx pxpipe-proxy                                  # proxy on 127.0.0.1:47821
ANTHROPIC_BASE_URL=http://127.0.0.1:47821 claude  # point Claude Code at it

A dashboard at http://127.0.0.1:47821/ shows tokens saved, every text-to-image conversion side by side, and a kill switch. Responses stream normally, because the proxy only touches the request, never the model's reply.
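The wrap-and-pack step described above can be sketched in a few lines. This is a minimal illustration, not pxpipe's code: the 92,000-character page budget comes from the article, while COLUMN_CHARS (how many characters fit on one wrapped line) and the function name paginate are assumptions made for the example. Rendering each page to a PNG is left out.

```python
import textwrap

CHARS_PER_PAGE = 92_000  # page budget quoted in the article
COLUMN_CHARS = 240       # hypothetical: characters that fit on one wrapped line


def paginate(text: str,
             chars_per_page: int = CHARS_PER_PAGE,
             column_chars: int = COLUMN_CHARS) -> list[str]:
    """Wrap long lines, then greedily pack the wrapped lines into pages."""
    wrapped: list[str] = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep blanks explicitly.
        pieces = textwrap.wrap(line, width=column_chars,
                               replace_whitespace=False,
                               drop_whitespace=False)
        wrapped.extend(pieces or [""])

    pages: list[str] = []
    current: list[str] = []
    used = 0
    for line in wrapped:
        cost = len(line) + 1  # +1 accounts for the newline
        if current and used + cost > chars_per_page:
            pages.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        pages.append("\n".join(current))
    return pages


if __name__ == "__main__":
    demo = paginate("x" * 1000, chars_per_page=100, column_chars=10)
    print(len(demo), "pages")
```

Each returned page would then go to a renderer. Greedy packing keeps the sketch simple and never splits a wrapped line across two pages.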

Not every block gets imaged

A 1928x1928 image costs about 4,761 vision tokens and holds about 92,000 characters, so plain text only wins once content runs above roughly 19 characters per token. Claude Code traffic runs around 1.91 chars/token, comfortably under that line, so imaging pays off. A runtime estimator checks the math block by block and only images the blocks that would actually save money. Three kinds of block are eligible:

Large tool_result bodies above about 6k characters of dense content: file reads, command output, logs
Older conversation turns behind the live tail (recent turns always stay text)
The static system prompt and tool documentation slab

Sparse prose, your live messages, and the model's own output all pass through byte-identical. Models outside the allowlist skip imaging entirely.
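The break-even arithmetic is easy to reproduce. The sketch below uses the figures quoted above (4,761 vision tokens and about 92,000 characters per 1928x1928 page, a floor of about 6k characters) and makes one simplifying assumption: every page is billed at the full-page cost, which overstates the price of a partly filled page. The function names are ours, not pxpipe's, and the real estimator may weigh more factors.

```python
import math

PAGE_TOKENS = 4_761  # vision tokens for one 1928x1928 page (from the article)
PAGE_CHARS = 92_000  # characters one page holds (from the article)
MIN_CHARS = 6_000    # "about 6k characters" floor for imaging a block


def image_tokens(n_chars: int) -> int:
    """Vision tokens if the block is rendered, assuming full-page billing."""
    return math.ceil(n_chars / PAGE_CHARS) * PAGE_TOKENS


def should_image(n_chars: int, chars_per_text_token: float) -> bool:
    """Image a block only when it is large enough and cheaper than text."""
    if n_chars < MIN_CHARS:
        return False
    text_tokens = n_chars / chars_per_text_token
    return image_tokens(n_chars) < text_tokens


# Density at which a full page costs the same either way.
break_even = PAGE_CHARS / PAGE_TOKENS

if __name__ == "__main__":
    print(f"break-even: {break_even:.1f} chars per text token")
    print("48k-char slab at 1.91 chars/token:", should_image(48_000, 1.91))
```

Under these assumptions, content at 1.91 characters per token clears the bar easily once a block is large. Text would have to pack more than about 19 characters into each token before a full page stopped paying for itself.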

The honest part: it forgets exact strings

This is the tradeoff the project leads with instead of burying. Ask an imaged page to recall an exact 12-character hex string and Claude Opus got 0 out of 15 right. Claude Fable 5 got 13 out of 15. The failure mode is the part worth remembering: misses are not errors, they are confident wrong answers. A blind read of exact identifiers off a rendered page tops out around 63 percent, and every miss lines up with a glyph confusability matrix, meaning the model is misreading similar-looking characters, not hedging.

The repo documents a real miss outside the benchmarks. Over weeks of daily use, the model once recalled a person's name from imaged chat history and got it confidently wrong. No error, just a plausible wrong name. Coding sessions tolerate this because the agent rereads files before it edits them. Plain chat recall has no such safety net.

The rule of thumb the project ships with: image the context the model only needs to understand, keep as text anything it must reproduce word for word. IDs, hashes, secrets, and exact quotes stay as text. There is an escape hatch too: subagents on models outside the allowlist pass through as text entirely, so byte-exact work can be routed to a subagent with CLAUDE_CODE_SUBAGENT_MODEL=claude-sonnet-4-6.
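If you are building your own guard around that rule of thumb, a crude pre-filter might look like the following. It is a hypothetical heuristic of ours, not something pxpipe is known to ship: the patterns and the name must_stay_text are assumptions, and a real deployment would need patterns for its own identifiers, such as order numbers and API keys.

```python
import re

# Hypothetical patterns for content that must survive byte-exact.
EXACT_PATTERNS = [
    # Long hex runs: hashes, commit IDs, 12-character hex strings.
    re.compile(r"\b[0-9a-fA-F]{12,}\b"),
    # UUIDs in the usual 8-4-4-4-12 layout.
    re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"),
]


def must_stay_text(block: str) -> bool:
    """Return True when a block holds identifiers that should not be imaged."""
    return any(pattern.search(block) for pattern in EXACT_PATTERNS)


if __name__ == "__main__":
    print(must_stay_text("commit 9f2c4e1ab37d landed on main"))
    print(must_stay_text("The build failed because the cache was stale."))
```

A filter like this only catches identifiers it has a pattern for, so names, quotes, and free-form secrets still need an explicit decision.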

What the benchmarks actually show

Test	Text	Imaged	Tokens
Novel arithmetic, Fable 5 (n=100)	100%	100%	-38%
Novel arithmetic, Opus 4.8 (n=100)	100%	93%	-38%
Gist recall A/B, Fable 5 (n=98/arm)	98/98	98/98	-
Verbatim hex recall, Opus (n=15)	15/15	0/15	-
Verbatim hex recall, Fable 5 (n=15)	-	13/15	-

Every test uses novel random numbers the model cannot have memorized.

On full coding tasks, a 10-instance SWE-bench Lite pilot resolved 10 out of 10 on both the compressed and uncompressed arm, at 65 percent smaller requests. A harder 19-pair SWE-bench Pro run resolved 14 out of 19 with imaging on versus 15 out of 19 off, at 60 percent smaller requests per call. Verdicts agreed on 18 of the 19 pairs, and the one split pair re-resolved 3 out of 3 on a repeat run, which points at normal run-to-run agentic variance rather than a real regression. The author is upfront that these sample sizes are small.

How the dollar figure is actually measured

The savings number avoids the usual trick of only counting the tokens it touched. For every request, the proxy fires a free count_tokens probe on the original, uncompressed body in parallel with the real, compressed one, then reads Anthropic's actual billed usage off the response. Both land in the same row of the event log, so there is no way to quietly cherry-pick which requests count.

The headline number is end-to-end: every small request left untouched, every cache write and read, and all output tokens, which the proxy never compresses, count in the denominator. On a 13,709-request snapshot that came out to 59 percent, so a $100 bill becomes about $41. A later 8,904-request trace measured about 70 percent. The compressed-only slice runs higher, about 72 to 74 percent, and the author quotes that separately rather than as the headline.
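The difference between the headline figure and the compressed-only slice is a question of which rows go in the denominator. The sketch below shows that distinction on made-up rows. The Row fields are hypothetical stand-ins (the article does not describe the event log schema), costs are collapsed into one dollar figure per request, and "compressed" is approximated as any row that was billed less than its baseline.

```python
from dataclasses import dataclass


@dataclass
class Row:
    baseline_cost: float  # hypothetical: cost of the uncompressed request
    billed_cost: float    # hypothetical: cost actually billed


def end_to_end_savings(rows: list[Row]) -> float:
    """Savings across every request, including ones left untouched."""
    baseline = sum(r.baseline_cost for r in rows)
    billed = sum(r.billed_cost for r in rows)
    return 1 - billed / baseline


def compressed_only_savings(rows: list[Row]) -> float:
    """Savings across only the requests that ended up cheaper."""
    touched = [r for r in rows if r.billed_cost < r.baseline_cost]
    return end_to_end_savings(touched)


def projected_bill(current_bill: float, savings: float) -> float:
    """What a bill becomes at a given end-to-end savings rate."""
    return current_bill * (1 - savings)


if __name__ == "__main__":
    # Illustrative rows only, not measured data.
    rows = [Row(1.00, 1.00), Row(10.00, 3.00), Row(5.00, 2.00)]
    print(f"end to end:      {end_to_end_savings(rows):.1%}")
    print(f"compressed only: {compressed_only_savings(rows):.1%}")
    print(f"$100 at 59%:     ${projected_bill(100, 0.59):.2f}")
```

Keeping the untouched row in the denominator pulls the number down, which is why the end-to-end figure is the more conservative one to quote.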

Efficiency gain, or pricing loophole

The unresolved argument splits into two camps. One side points to DeepSeek's OCR paper (arXiv 2510.18234), which found large input token cuts at close to full accuracy, and argues image tokens legitimately cover more characters because both text and image tokens embed down to the same size vector. The other side argues the model still has to resolve the image back to a text-equivalent representation internally, so the compute cost is comparable, and the whole saving exists only because Anthropic prices image tokens generously. If that price moves, the arbitrage closes with it.

Some of that skepticism showed up directly in the Hacker News thread. A few commenters called the README rough going, one calling it painfully written, a fair complaint for a project whose own commits are mostly authored by Opus and Fable agent sessions rather than a person writing prose by hand. It does not change the numbers, but it is worth knowing before you go digging in the docs.

pxpipe itself picks a side by measuring rather than arguing: it reports the token cut, which does not move with pricing, separately from the dollar figure, which does. It only turns imaging on for models that have actually been tested to read renders reliably. The project has grown fast since the discussion went up: from about 765 stars and 38 forks at that point to 1,943 stars and 118 forks as of this week, still MIT licensed, still around 96 percent TypeScript, now at release v0.8.0.

Why a build studio cares

We run agents that make a lot of model calls: long research sessions, coding agents, tool-heavy workflows. Every one of those pays for a growing pile of old conversation history and repeated tool documentation on every single turn. A 59 to 70 percent cut to that bill is the difference between an agent you run all day and one you ration.

The catch is the same one we would tell a client building on any lossy system: know what you are allowed to lose. That is fine for a coding agent that rereads a file before it edits it. It is risky for a support bot that has to repeat an order number back correctly. The fix is not to avoid compression but to know which parts of your context can never be approximate, and to keep only those as text.

Next step: run npx pxpipe-proxy against a real Claude Code session and watch the dashboard at 127.0.0.1:47821 tally what plain text would have cost. Read the source at github.com/teamchong/pxpipe, MIT licensed, or the original discussion on Hacker News. If your agents are burning tokens on repeated context and you want a hand tuning the setup, write to us at hello@gattyworks.com.
