Program-as-Weights: compiling fuzzy functions into local LoRAs

Tags: paper-review, LLM, LoRA

Found an exciting paper this week and wanted to get a few quick notes down while it's fresh.

Program-as-Weights (PAW) is a new way to build what the authors call fuzzy functions: tasks that resist clean, rule-based implementation, e.g. flagging important log lines, repairing malformed JSON, or ranking search results by intent. Today we outsource these to an LLM API call per input. PAW instead compiles the function definition once into a tiny local model, and reuses it for every call after that.

A nice way of looking at it: the paper turns the foundation model into a tool builder, not a per-input solver. You call it once per function definition, then run a compact local neural artifact repeatedly.

The pipeline

There are two distinct phases: a one-time compilation step per function definition, and a cheap runtime step for every call.

Figure: diagram of the PAW pipeline. A spec compiles through a pseudo compiler and a LoRA compiler into a LoRA adapter, which an interpreter then loads at runtime.

Compile once per function, then run forever on a 0.6B interpreter.
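To make the call pattern concrete, here is a toy Python sketch of "compile once, run many times". Everything in it is hypothetical: `CompiledFunction`, `compile_spec` and `run` are names I made up, and the keyword check is a placeholder for the real interpreter, not how PAW decides anything. The only point is that the expensive step is cached per spec.

```python
from dataclasses import dataclass
from functools import lru_cache

COMPILE_CALLS = 0  # counts how often the expensive step actually runs


@dataclass(frozen=True)
class CompiledFunction:
    """Hypothetical stand-in for a compiled artifact (pseudo program + adapter)."""
    spec: str

    def __call__(self, text: str) -> bool:
        # Placeholder logic only: a real artifact would run the local interpreter.
        return "outage" in text.lower()


@lru_cache(maxsize=None)
def compile_spec(spec: str) -> CompiledFunction:
    """Expensive one-time step, cached per function definition."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1
    return CompiledFunction(spec)


def run(spec: str, text: str) -> bool:
    """Cheap per-call step: reuse the cached artifact for this spec."""
    return compile_spec(spec)(text)


SPEC = "alert on log lines describing an outage"
LOGS = ["db-1 OUTAGE: primary unreachable", "GET /health 200", "cache warm-up done"]
RESULTS = [run(SPEC, line) for line in LOGS]
```

Three calls, one compilation: `COMPILE_CALLS` stays at 1 however many log lines go through `run`.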

Compilation (once per function):

You write a natural language spec for the function, e.g. "alert on log lines describing an outage".
A frozen, bog-standard 4B model turns the spec into a pseudo program: a cleaned-up restatement of the task plus a handful of input/output examples.
The spec and pseudo program are fed into a second, trained 4B "LoRA compiler". It reads both in a single forward pass and emits a LoRA adapter: about 38.5M parameters at rank 64, roughly 23MB once quantised to Q4_0.
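As a sanity check on the adapter size, here is some back-of-the-envelope arithmetic in Python. It assumes the standard GGML Q4_0 layout (blocks of 32 weights, each stored as 16 bytes of packed 4-bit values plus a 2-byte fp16 scale) and that every adapter parameter is quantised that way. The 1024-wide layer in the last line is a made-up example, not a dimension taken from the paper.

```python
# Q4_0 block layout: 32 weights -> 16 bytes of 4-bit values + one fp16 scale.
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 16 + 2


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def q4_0_bytes(n_params: int) -> int:
    """Bytes to store n_params weights in Q4_0, rounded up to whole blocks."""
    blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return blocks * Q4_0_BLOCK_BYTES


ADAPTER_PARAMS = 38_500_000  # parameter count quoted in the post
bits_per_weight = 8 * Q4_0_BLOCK_BYTES / Q4_0_BLOCK_WEIGHTS
size_mb = q4_0_bytes(ADAPTER_PARAMS) / 1e6
print(f"{bits_per_weight} bits/weight, about {size_mb:.1f} MB")

# One rank-64 pair on a hypothetical 1024 x 1024 projection.
example_pair = lora_params(d_in=1024, d_out=1024, rank=64)
```

That works out to 4.5 bits per weight and a little under 22MB, which is in the same ballpark as the ~23MB figure. My guess is that the difference comes from file metadata or tensors kept at higher precision, but I haven't verified that.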

Interpretation (every call):

A frozen 0.6B interpreter (Qwen3) loads the pseudo program as context, hot-swaps in the LoRA adapter, and generates the output autoregressively. No network call, no Anthropic bill, just a tiny local model with a small file bolted on.
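For anyone who hasn't played with LoRA, here is a minimal NumPy sketch of what hot-swapping an adapter means for a single linear layer. It's the generic LoRA update (frozen weight plus a scaled low-rank product), not the paper's code. The `LoRALinear` class, the `alpha` scaling convention and the tiny dimensions are all my own illustrative choices.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer with an optional, swappable low-rank update."""

    def __init__(self, weight: np.ndarray):
        self.weight = weight   # frozen base weight, shape (d_out, d_in)
        self.adapter = None    # (A, B, scale) once an adapter is loaded

    def load_adapter(self, A: np.ndarray, B: np.ndarray, alpha: float) -> None:
        rank = A.shape[0]
        assert A.shape == (rank, self.weight.shape[1])
        assert B.shape == (self.weight.shape[0], rank)
        self.adapter = (A, B, alpha / rank)

    def unload_adapter(self) -> None:
        self.adapter = None

    def __call__(self, x: np.ndarray) -> np.ndarray:
        y = x @ self.weight.T
        if self.adapter is not None:
            A, B, scale = self.adapter
            # Low-rank path: project down to `rank`, then back up to d_out.
            y = y + scale * ((x @ A.T) @ B.T)
        return y


rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 8, 4, 8.0
layer = LoRALinear(rng.normal(size=(d_out, d_in)))
x = rng.normal(size=(2, d_in))

base_out = layer(x)
A = rng.normal(size=(rank, d_in))
B = rng.normal(size=(d_out, rank))
layer.load_adapter(A, B, alpha)
adapted_out = layer(x)

# Equivalent to materialising the merged weight W + (alpha / rank) * B @ A.
merged = layer.weight + (alpha / rank) * (B @ A)
```

The base weight is never modified, so switching between compiled functions only means replacing `A` and `B`. That is why one interpreter can serve many adapters.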

Training

The pseudo compiler is never trained; it's just prompted. Only the LoRA compiler and a small mapper, which projects the compiler's hidden states into LoRA mixing coefficients, are trained end to end, with gradients flowing back through the frozen interpreter. To train them, the authors built FuzzyBench, a 10M-example dataset spanning 800+ categories of fuzzy text tasks (classification, format conversion, parsing, fuzzy matching, tool use, and more).
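I haven't dug into the exact parameterisation, so treat the following as one possible reading of "mixing coefficients" rather than the paper's method. It assumes a learned bank of basis LoRA factors and a single linear mapper that turns a hidden state into one coefficient per basis. The names, shapes and the linear mapper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, n_basis = 32, 5       # hypothetical sizes
d_in, d_out, rank = 16, 8, 4

# Hypothetical learned bank of basis LoRA factors.
basis_A = rng.normal(size=(n_basis, rank, d_in))
basis_B = rng.normal(size=(n_basis, d_out, rank))

# Hypothetical mapper: one linear projection from hidden state to coefficients.
mapper_W = rng.normal(size=(n_basis, hidden_dim))


def emit_adapter(hidden_state: np.ndarray):
    """Mix the basis factors using coefficients derived from a hidden state."""
    coeffs = mapper_W @ hidden_state             # shape (n_basis,)
    A = np.tensordot(coeffs, basis_A, axes=1)    # shape (rank, d_in)
    B = np.tensordot(coeffs, basis_B, axes=1)    # shape (d_out, rank)
    return coeffs, A, B


hidden_state = rng.normal(size=hidden_dim)       # stand-in for a compiler state
coeffs, A, B = emit_adapter(hidden_state)
```

Under this reading the compiler only has to predict a handful of coefficients per adapter, which would fit with producing the whole adapter in a single forward pass.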

Note: one thing that is not yet clear is how well this method generalises to more diverse tasks.

Why I find this exciting

A 0.6B interpreter running a compiled PAW program matches Qwen3-32B direct prompting on these tasks while using roughly 1/50th of the inference memory, and it runs at ~30 tokens/second on a MacBook M3.

For anyone reaching for an LLM for "simple", repetitive tasks in production, this matters:

No more paying frontier model prices for a call that's really a fuzzy if statement.
No network latency or upstream outages.
You can run and test these functions locally and in CI, instead of mocking LLM calls.

Those are the immediate benefits. The more interesting question is how far we can scale this up. Is there a way to go from programs as weights to workflows as weights? Hopefully I can find some time (and tokens) to give it a try!

References

Wentao Zhang, Liliana Hotski, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng. Program-as-Weights: A Programming Paradigm for Fuzzy Functions. arXiv:2607.02512, 2026.
Paper page and discussion on Hugging Face.