This is a thought experiment, not a claim about how LLMs already work. I am not saying we should retrain every model in Lisp tomorrow.
I am just asking a small question:
what if a model used something Lisp-like as a compact inner language for reasoning?
Natural language is flexible, but it is also messy. The same idea can be said in many ways. Code is better, but most code still carries a lot of surface detail. Lisp is interesting because it is small, regular, and tree-shaped by default. Everything looks like an expression. That means less syntax variety, less ambiguity, and maybe a shorter path between “idea” and “representation.”
The core hunch
If a model could turn a messy sentence into a compact symbolic form first, it might spend fewer tokens on structure and more on meaning.
Something like this:
```mermaid
flowchart TD
  A["User request in natural language"] --> B["Parse intent"]
  B --> C["Small Lisp-like plan"]
  C --> D["Reason over the plan"]
  D --> E["Write answer in natural language"]
```
The thought is not “Lisp is magical.” (It definitely is, but I will let you hold on to that copium for the rest of the page, since I do want you to keep reading.)
The thought is: regular structure might be cheaper than free-form text.
Why this could matter
Transformers pay heavily for long sequences.
In the simplest view, attention cost grows roughly like:
\[C \propto n^2\]
where $n$ is the sequence length.
So if a structured intermediate language reduced the working sequence from $n$ tokens to $kn$ tokens, where $0 < k < 1$, then the rough attention cost becomes:
\[C' \propto (kn)^2 = k^2 n^2\]
That means even a modest shortening can matter.
For example, if $k = 0.7$, then:
\[\frac{C'}{C} = k^2 = 0.7^2 = 0.49\]
That is not a proof. It is just a reminder that shorter internal representations can have outsized effects.
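To make that scaling concrete, here is a minimal sketch (plain Python, nothing model-specific) of how the quadratic cost ratio behaves for a few compression factors:

```python
def attention_cost_ratio(k: float) -> float:
    """Ratio C'/C when the sequence length shrinks from n to k*n.

    Under the simple C ∝ n^2 model, the ratio is k^2,
    independent of n.
    """
    return k ** 2

for k in (0.9, 0.7, 0.5):
    print(f"k = {k}: cost ratio = {attention_cost_ratio(k):.2f}")
```

Even a 10% shortening ($k = 0.9$) already drops the quadratic cost to 81% of the original, and halving the sequence drops it to 25%.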
Why Lisp?
Because Lisp is already close to a tree, and John McCarthy created Lisp specifically to give artificial intelligence research a language to work in, so why the fuck not Lisp?
This matters because reasoning often has tree structure:
```mermaid
graph TD
  R["Goal: answer the question"] --> A["Find the main task"]
  R --> B["Find the needed facts"]
  R --> C["Choose output form"]
  A --> A1["classify"]
  B --> B1["retrieve"]
  B --> B2["compare"]
  C --> C1["write"]
```
In Lisp-like form, that kind of structure can be written very directly:
```lisp
(answer
  (classify task)
  (retrieve facts)
  (compare facts)
  (write result))
```
That is the whole attraction.
The form is plain. The nesting is visible. The grammar is small.
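As a toy illustration of how little machinery that grammar needs, here is a minimal s-expression reader in Python. It is a sketch, not a real Lisp reader: symbols only, no strings, no quoting, one expression at a time.

```python
def parse_sexpr(text: str):
    """Parse a single s-expression into nested Python lists.

    The only tokens are parentheses and whitespace-separated
    symbols; that is the entire grammar this sketch handles.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int):
        token = tokens[pos]
        if token == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1  # skip the closing paren
        return token, pos + 1  # a plain symbol

    tree, _ = read(0)
    return tree

plan = parse_sexpr("(answer (classify task) (retrieve facts))")
# plan is now ['answer', ['classify', 'task'], ['retrieve', 'facts']]
```

Roughly twenty lines gets you from text to a tree you can walk. Try writing the equivalent for even a small fragment of English and the asymmetry is obvious.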
So what’s the catch?
Current LLM tokenizers were not built for this idea.
A Lisp expression is only compact if the model, tokenizer, and training setup are all aligned around it. If not, you may just replace one kind of overhead with another.
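One crude way to see that trap, using a toy tokenizer that splits on whitespace and parens (an assumption for illustration; real BPE tokenizers behave very differently):

```python
def naive_token_count(text: str) -> int:
    """Count tokens under a toy tokenizer that splits on
    whitespace and treats each paren as its own token.
    Real subword tokenizers behave very differently.
    """
    return len(text.replace("(", " ( ").replace(")", " ) ").split())

english = "classify the task, retrieve the facts, then write the result"
lispish = "(answer (classify task) (retrieve facts) (write result))"

print(naive_token_count(english))  # -> 10
print(naive_token_count(lispish))  # -> 15
```

Under this toy scheme the parentheses alone make the Lisp form *longer* than the English one. That is exactly the “one overhead for another” problem: compactness is a property of the whole stack, not of the notation by itself.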
Also, not all reasoning wants a rigid symbolic form: some tasks are fuzzy, some are stylistic, and some need exploration before they need structure.
So this is not “Lisp beats language.” It is more like: maybe some parts of reasoning want a cleaner internal scaffold.
And I did find some research
A few papers make this thought experiment feel less silly:
- ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models shows that tokenization is not sacred. Models can work directly on bytes, which suggests the input representation is a design choice, not a law of nature.
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers shows that token-free modeling can stay competitive when the architecture is designed for long raw sequences.
- Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks shows that small tokenization details can measurably affect model quality. That makes representation choices feel more important, not less.
None of these papers say “use Lisp inside LLMs.” But together they say something along the lines of: how we represent text and structure really matters.
Here is a simple version of my idea
Maybe future models do not think in plain English all the way through. Maybe they move between layers like this:
natural language -> compact symbolic form -> natural language
And if that compact symbolic form were Lisp-like, it might be easier to parse, easier to compress, and easier to reuse.
Not because Lisp is old. Not because it is beautiful.
Just because simple trees are cheap. I do not think “LLMs should use Lisp” is a serious product roadmap. I do think it is something worth considering.
It forces us to ask whether today’s token usage is paying for meaning, or just paying rent on messy representation.
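To close, here is the three-layer loop as Python stubs. Every function body is a hypothetical placeholder for model behaviour (`parse_intent`, `reason_over`, and `render` are invented names); only the shape of the pipeline is the point.

```python
def parse_intent(request: str) -> list:
    """Hypothetical stub: natural language -> Lisp-like plan."""
    return ["answer", ["classify", "task"],
            ["retrieve", "facts"], ["write", "result"]]

def reason_over(plan: list) -> list:
    """Hypothetical stub: walk the plan tree, collecting steps in order."""
    return [node[0] if isinstance(node, list) else node
            for node in plan[1:]]

def render(steps: list) -> str:
    """Hypothetical stub: ordered steps -> natural-language answer."""
    return "I will " + ", then ".join(steps) + "."

print(render(reason_over(parse_intent("summarize this document"))))
# -> I will classify, then retrieve, then write.
```

A real model would do actual work at each stage; the sketch only shows where the compact symbolic form would sit between the two natural-language ends.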