(2 mins read)


        LLM Structured output and real-time parsers


        The standardization of JSON mode has been a fundamentally limiting thing,
        psychologically, for developers when it comes to generating structured outputs
        from LLMs. Here are a couple of interesting things I've discovered around that.
        

JSON

        I made this library a year ago to parse JSON streams and achieve things like this:

        

        By JSON streams, I mean taking something like this: {"title": "This is an unfinishe, and
        parsing it into a partial object: {"title": "This is an unfinishe"}, while accounting for
        the fact that the rest of the object is yet to arrive.
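        To make the idea concrete, here's a minimal sketch of that kind of repair-and-parse
        step (this is not the library's actual implementation, just one way to get a partial
        object out of an incomplete stream):

```python
import json

def parse_partial_json(chunk: str):
    """Best-effort parse of an incomplete JSON stream into a partial object.

    A minimal sketch: track unclosed strings/objects/arrays while scanning,
    append the missing closers, then hand the repaired text to json.loads.
    """
    stack = []                 # unclosed '{' / '[' openers, in order
    in_string = escaped = False
    for ch in chunk:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]":
            stack.pop()
    # close a trailing partial string, then tidy dangling separators
    repaired = chunk + ('"' if in_string else "")
    repaired = repaired.rstrip()
    if repaired.endswith(","):
        repaired = repaired[:-1]       # drop a dangling comma
    elif repaired.endswith(":"):
        repaired += " null"            # the value hasn't arrived yet
    for opener in reversed(stack):
        repaired += "}" if opener == "{" else "]"
    return json.loads(repaired)

print(parse_partial_json('{"title": "This is an unfinishe'))
# → {'title': 'This is an unfinishe'}
```

        Run this on every chunk as it streams in and you get a fresh partial object
        each time, which is what makes the live-updating UI possible.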

        The neat part is how the graph structure here goes recursively deep. Most
        data is linear and much simpler to work with, but if you were to generate that graph
        one node per LLM request to get the same streaming effect, not only would it cost a lot
        more, but the quality of the output would also drop significantly because of the
        recursive nature.

        Another neat thing you can do here: start generating the cover image
        and/or introductory content for the course after only the first couple of nodes are
        done generating, and let the rest of the graph be generated in parallel with that.
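        That overlap is easy to sketch with asyncio. The two helpers below are
        hypothetical stand-ins (the real node stream and image generation aren't shown
        here); the point is kicking off the cover task while nodes keep streaming:

```python
import asyncio

async def stream_graph_nodes():
    """Hypothetical stand-in for nodes arriving from the streaming parser."""
    for i in range(5):
        await asyncio.sleep(0)          # simulate chunks arriving over the wire
        yield {"id": i, "title": f"Node {i}"}

async def generate_cover(first_nodes):
    """Hypothetical cover-image/intro generation from early context only."""
    return "cover based on: " + ", ".join(n["title"] for n in first_nodes)

async def main():
    nodes, cover_task = [], None
    async for node in stream_graph_nodes():
        nodes.append(node)
        # start the cover as soon as two nodes exist,
        # while the rest of the graph keeps streaming in
        if cover_task is None and len(nodes) >= 2:
            cover_task = asyncio.create_task(generate_cover(nodes[:2]))
    return nodes, await cover_task

nodes, cover = asyncio.run(main())
print(len(nodes), "nodes;", cover)
```

        The cover request only needs the first couple of nodes as context, so there's
        no reason to wait for the whole graph before starting it.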
        

CUSTOM FORMATS

        Here's a more complex example from something I've been working on more recently:

        

        It uses an entirely custom format inspired by QML, and the parsing logic here is
        quite a bit more complex than the previous one.

        There's simply no way to achieve something like this using JSON/YAML/whatever and
        have the same kind of cost and speed (low token count) and output quality (the model
        is more familiar with generating UI in a UI-like format than as key-value pairs).
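        The actual format isn't reproduced here, but as a purely hypothetical illustration
        of the idea, here's the same button written both ways:

```
// JSON-style: every property costs quotes, colons, commas, and braces
{"type": "Button", "text": "Save", "onClick": "submit", "variant": "primary"}

// QML-inspired: fewer tokens, and it reads like UI code the model has seen a lot of
Button { text: "Save"; onClick: submit; variant: primary }
```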

        Having a custom format does come with its costs, though. While it's better than JSON,
        it's nowhere near as good as generating React + Tailwind. The further you move away
        from standard formats and/or natural language, the worse the inference gets (that's
        why the format here *had to* be inspired by QML). But find the right sweet spot and
        it's magical.