(2 mins read)
LLM Structured output and real-time parsers
The standardization of JSON mode has been a fundamentally limiting thing, psychologically, for developers when it comes to generating structured outputs from LLMs. Here are a couple of interesting things I've discovered around that.
I made this library a year ago to parse JSON streams and achieve things like this:
By JSON streams, I mean taking something like this: {"title": "This is an unfinishe, and
parsing it into a partial object: {"title": "This is an unfinishe"}, while accounting for
the fact that the rest of the object is yet to arrive.
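As a minimal sketch of the core idea (not the library's actual implementation), a partial parser can scan the truncated stream, track any unterminated string and unclosed brackets, and close them before handing the repaired text to a regular JSON parser:

```python
import json

def parse_partial(chunk: str):
    """Best-effort parse of a truncated JSON stream.

    Closes an unterminated string and any open objects/arrays,
    then delegates to the standard parser. A sketch: it handles
    truncation mid-string or mid-container, not every possible
    cut point (e.g. a dangling key with no value).
    """
    stack = []          # closing brackets we still owe, innermost last
    in_string = False
    escaped = False
    for ch in chunk:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            stack.pop()
    repaired = chunk
    if in_string:
        repaired += '"'          # close the unfinished string
    repaired += "".join(reversed(stack))  # close containers inside-out
    return json.loads(repaired)
```

With this, the example above round-trips as expected: `parse_partial('{"title": "This is an unfinishe')` yields `{"title": "This is an unfinishe"}`. A production parser also has to deal with truncation inside numbers, literals like `tru`, and dangling keys, which is where most of the real complexity lives.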
The neat part about this is how the graph structure here goes recursively deep. Most
data is linear and much simpler to work with, but if you were to generate that graph
one node per LLM request to get the same streaming effect, not only would it cost a lot
more, but the quality of the output would also drop significantly because of the
recursive nature.
Another neat thing you can do here: start generating the cover image and/or
introductory content for the course after only the first couple of nodes are
done generating, and let the rest of the graph be generated in parallel with that.
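The pattern is easy to sketch with asyncio. Everything here is a hypothetical stand-in (the node stream, the intro generator); the point is just the shape: fire off the second request as soon as enough of the graph has streamed in, without waiting for the rest.

```python
import asyncio

async def stream_nodes():
    # Stand-in for graph nodes arriving from a streaming LLM response.
    for i in range(6):
        await asyncio.sleep(0.01)
        yield {"id": i, "title": f"Node {i}"}

async def generate_intro(first_nodes):
    # Stand-in for a second request (cover image, intro text)
    # seeded by only the earliest nodes.
    await asyncio.sleep(0.05)
    return f"Intro based on {len(first_nodes)} nodes"

async def main():
    nodes, intro_task = [], None
    async for node in stream_nodes():
        nodes.append(node)
        # Kick off intro generation once two nodes exist,
        # while the rest of the graph keeps streaming in parallel.
        if intro_task is None and len(nodes) >= 2:
            intro_task = asyncio.create_task(generate_intro(nodes[:2]))
    return nodes, await intro_task

nodes, intro = asyncio.run(main())
```

The intro request overlaps with the remaining ~four nodes' worth of streaming, so the total wall-clock time is close to whichever of the two finishes last, not their sum.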
Here's a more complex example from something I've been working on more recently:
It uses an entirely custom format inspired by QML; the parsing logic here is quite
a bit more complex than the previous one.
There's simply no way to achieve something like this using JSON/YAML/whatever while
keeping the same cost and speed (low token count) and output quality (the model is
more familiar with generating UI in a UI-like format than as key-value pairs).
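To make the token-count point concrete with a made-up example (the post doesn't show the actual format, so both snippets below are hypothetical), compare a QML-like declaration with the same tree expressed as JSON:

```python
# A hypothetical QML-like UI snippet: nesting and properties are
# expressed with braces and bare identifiers rather than quoted keys.
ui_like = """\
Column {
  Text { text: "Welcome" }
  Button { label: "Start"; onClick: start }
}"""

# The same tree as JSON: every type and key becomes a quoted,
# punctuated key-value pair.
json_like = (
    '{"type": "Column", "children": ['
    '{"type": "Text", "text": "Welcome"}, '
    '{"type": "Button", "label": "Start", "onClick": "start"}]}'
)

# The UI-like form is shorter, which roughly tracks token count.
assert len(ui_like) < len(json_like)
```

Character count is only a proxy for tokens, but the quoting and structural keys ("type", "children") that JSON forces on every node add up quickly on deep trees.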
Having a custom format does come with its costs, though. While it's better than JSON, it's
nowhere near as good as generating React + Tailwind. The further you move away from standard
formats and/or natural language, the worse the inference gets (that's why the format
here *had to* be inspired by QML). But find the right sweet spot and it's magical.