LLMs as Latent Library Maps

An analogy for understanding recent AI models

Joey Tyson
Sep 16, 2024

People routinely describe LLMs as “reasoning” about data, “deciding” to take actions, “teaching themselves” concepts, and so on. Experts genuinely debate whether LLM pattern-matching techniques mirror our own thought processes, but these anthropomorphisms (and the models’ uncanny results) already give an impression of human-like agency. To help us humans think about the impact of LLMs more objectively, I propose an analogy that remains more neutral on questions of consciousness: one where LLMs act as machines that navigate a special, on-demand library and retrieve books from it.

In his short story “The Library of Babel,” Jorge Luis Borges imagined a universe filled with seemingly endless rooms of books. These books all had the same length and format, and those living inside the cosmic library thought they contained all combinations of characters you could possibly write within such limits. Theoretically, this meant every human story that could ever be written existed somewhere within the library. But the volumes were placed in a random order, and given how many ways you can arrange letters in a book-length text, nearly every title seemingly contained only gibberish.

Now imagine a version of Borges’ library with some new constraints. First, rather than all potential permutations of letters, the books in this library only include arrangements of actual words in ways that follow basic expectations of their language and grammar. Also, the books are placed in a logical order, with similar texts nearby and changes following a predictable pattern. Neighboring books differ only subtly, and the differences grow the farther apart two books sit.

In this library, you could locate any desired text by following the patterns of the rooms and shelves: find an area where the first word matches the opening you want, then move in the direction of the expected second word, then the third, fourth, and so on. Along the way, you would encounter many other texts that were not quite the one you wanted, but they would still seem readable. Put another way, they would all be grammatically or linguistically “correct” even if they were not semantically meaningful as a whole. They might not be true or even make much sense, but at least they would be more than jumbles of random letters.
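For readers who like to tinker, here is a toy version of that word-by-word search. It is a sketch under heavy assumptions: the five-slot “grammar” in `TEMPLATE` and the function `navigate` are made up for illustration, and this library is small enough to list in full. Each step keeps only the books whose next word matches, so the region you are standing in shrinks with every word.

```python
from itertools import product

# Hypothetical toy grammar: each slot lists the words allowed at that position.
TEMPLATE = [
    ["the", "a"],
    ["cat", "dog", "bird"],
    ["sees", "chases"],
    ["the", "a"],
    ["ball", "mouse"],
]

# Every grammatical combination is one "book"; sorting places texts that share
# an opening next to each other on the shelves.
LIBRARY = sorted(" ".join(words) for words in product(*TEMPLATE))


def navigate(target: str) -> list[int]:
    """Walk toward `target` one word at a time, reporting how many books remain."""
    region = LIBRARY
    remaining = []
    for i, word in enumerate(target.split()):
        # Step into the part of the library whose i-th word matches.
        region = [book for book in region if book.split()[i] == word]
        remaining.append(len(region))
    return remaining


print(len(LIBRARY))                        # 48
print(navigate("the dog chases a mouse"))  # [24, 8, 4, 2, 1]
```

Every book passed along the way (“a cat sees the ball,” “the bird chases a mouse”) is readable, even though only one is the text you were after.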

Of course, for such a building to exist physically, it would have to be inconceivably large; even a simple template for a single sentence can generate more potential texts than there are drops in the ocean. Printing a map would be out of the question, but since the layout is so organized, you could imagine someone writing a computer program to guide you. You put in a given block of text, and the program gives you step-by-step directions for where to find it in this vast collection.
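To put rough numbers on the ocean comparison, the sketch below counts the sentences a single template can produce. Every figure in it is an assumption for illustration rather than something from this article: a 10-word template with 500 plausible words per slot, an ocean volume of roughly 1.33 billion cubic kilometers, and about 20 drops per milliliter.

```python
# Illustrative assumptions, not figures from the article:
SLOTS = 10                # words in the sentence template
CHOICES_PER_SLOT = 500    # plausible words for each slot
OCEAN_KM3 = 1.33e9        # rough volume of Earth's oceans in cubic kilometers
LITERS_PER_KM3 = 1e12     # 1 km^3 = 10^9 m^3 = 10^12 liters
DROPS_PER_LITER = 20_000  # about 20 drops per milliliter


def template_texts(slots: int, choices: int) -> int:
    """Distinct sentences when every slot is filled independently."""
    return choices ** slots


def ocean_drops() -> float:
    """Rough number of drops of water in the oceans under the assumptions above."""
    return OCEAN_KM3 * LITERS_PER_KM3 * DROPS_PER_LITER


texts = template_texts(SLOTS, CHOICES_PER_SLOT)
drops = ocean_drops()
print(f"{texts:.2e} sentences vs. {drops:.2e} drops")
# 9.77e+26 sentences vs. 2.66e+25 drops
```

Even with these modest assumptions, one short template beats the ocean by more than an order of magnitude, and a book is made of thousands of sentences.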

Let’s take that idea one step further: suppose you want to find a particular book, but you only remember the first line. Could someone write a program to find it? Well, writing a program that gives directions to some book with that opener would not be difficult; simply navigate to the section of texts with the right starting sentence and pick one at random. But ideally, the program would be a little smarter; after all, if you asked a human for a text that starts with “Birds of a feather,” they would likely suggest the continuation “flock together.” The expression is so common that the chances of someone looking for that starting phrase with a different ending are practically zero, even though thousands of other possibilities still exist in the library.

This points to how we could improve our program: if we had a way to encode that probability, we could have the program select the most likely text based on which ending phrases are more common in everyday usage. Computers often rely on “if-then” logic; you could imagine including a statement that says “if the input starts with ‘Birds of a feather,’ then navigate to texts that add ‘flock together.’” Of course, most cases would be more subtle: if a line starts with just “Romeo,” is it from Shakespeare or Taylor Swift? We could add a bit of randomness, but still base the choice on likelihood: “if the input starts with ‘Romeo,’ then navigate to ‘wherefore art thou’ 50% of the time and ‘take me somewhere we can be alone’ the other 50% of the time.”
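Here is roughly what such a hand-written if-then catalog might look like as a minimal Python sketch. The dictionary `CATALOG` and the helper `continue_text` are hypothetical; the entries simply restate the examples above, and the 50/50 split is the illustrative one from the text rather than a measured frequency. Python’s standard `random.choices` does the weighted draw.

```python
import random

# Hypothetical hand-written catalog: opening -> list of (continuation, weight).
# The weights restate the article's examples; they are not measured frequencies.
CATALOG = {
    "Birds of a feather": [(" flock together", 1.0)],
    "Romeo": [
        (", wherefore art thou", 0.5),
        (", take me somewhere we can be alone", 0.5),
    ],
}


def continue_text(opening: str, rng: random.Random) -> str:
    """Follow the if-then rule for `opening`, choosing among endings by weight."""
    endings = [ending for ending, _ in CATALOG[opening]]
    weights = [weight for _, weight in CATALOG[opening]]
    # random.choices draws k items, treating the weights as relative likelihoods.
    return opening + rng.choices(endings, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so runs are repeatable
print(continue_text("Birds of a feather", rng))  # Birds of a feather flock together
for _ in range(3):
    print(continue_text("Romeo", rng))  # either ending, each about half the time
```

Notice that every opening needs its own entry, which is exactly the scaling problem the next paragraph raises.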

Remember how huge the library is, though! You would need similar if-then statements for at least every word in the dictionary, and most would have a much longer list of possible directions. Trying to write all of these cases by hand would take far more time than it is worth. What if we could automate the process by writing another computer program, one that itself writes the navigation tool for us? (A program to create a program!) Since the logic is all based on how frequently different combinations of words appear in common usage, we could write a program to analyze lots of writing, pick out all of these patterns for us, and compile a catalog of if-then statements accordingly.
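A program that writes the catalog for us can be sketched with simple word-pair (bigram) counting. This is a deliberate simplification assumed here for illustration: real LLMs learn far richer patterns than which single word follows another, and the three-line corpus and the `build_catalog` helper are invented. Still, it shows the basic move of turning observed frequencies into if-then probabilities.

```python
from collections import Counter, defaultdict


def build_catalog(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Count which word follows which, then convert counts to probabilities."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        words = text.lower().split()
        # Pair each word with the word right after it.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1

    catalog = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        catalog[word] = {nxt: n / total for nxt, n in followers.items()}
    return catalog


# A tiny invented corpus standing in for "lots of writing".
corpus = [
    "Birds of a feather flock together",
    "Birds of a feather flock together",
    "Birds of prey hunt alone",
]
catalog = build_catalog(corpus)
print(catalog["of"])       # {'a': 0.6666666666666666, 'prey': 0.3333333333333333}
print(catalog["feather"])  # {'flock': 1.0}
```

Feeding it more text changes the probabilities without anyone editing a single rule by hand.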

Two key observations help make that idea actually work. First, even if this particular library does not exist yet, since it includes all reasonable combinations of words, all of the writings that do exist would also appear somewhere inside of it. If we can pull together enough of those texts for our program to analyze, it should do a decent job of directing us to the books we expect. We use a sample of all possible texts to help us retrieve others that are likely to match a given opening line, even if we personally have never seen them before. Our library is already a filtered and organized version of the one Borges imagined; this sample of texts adds one more filter to pick out just the titles that make more sense in our reality.

Second, rather than writing out all the if-then statements as a human would, we can make the whole analysis task much easier for a computer by converting it into a giant math problem. If we assign numbers to every word, the program can track all of the relevant probabilities we want to measure as values in enormous equations. These equations then encode those patterns in a way that lets us easily apply them to new inputs: instead of following a long list of if-thens, we again convert the words in our starting sentence to numbers and just plug them into the set of (fantastically large) equations.
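The sketch below shows the “words as numbers” idea on the same kind of toy word-pair model. Everything here is a simplification assumed for illustration: each word gets an integer ID, the probabilities live in one small matrix, and a one-hot vector picks out a word’s row. Real models use learned vector representations and neural networks with billions of parameters, but the flow of encoding, computing, and decoding is similar in spirit.

```python
# Hypothetical toy vocabulary: every word is assigned an integer ID.
VOCAB = ["birds", "of", "a", "feather", "flock", "together", "prey"]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
N = len(VOCAB)


def transition_matrix(corpus: list[str]) -> list[list[float]]:
    """Row i holds the probabilities of each word following word i."""
    matrix = [[0.0] * N for _ in range(N)]
    for text in corpus:
        ids = [WORD_TO_ID[w] for w in text.split()]
        for a, b in zip(ids, ids[1:]):
            matrix[a][b] += 1
    for row in matrix:
        total = sum(row)
        if total:  # words never followed by anything keep an all-zero row
            for j in range(N):
                row[j] /= total
    return matrix


def next_word_probs(matrix: list[list[float]], word: str) -> list[float]:
    """Encode the word as a one-hot vector and multiply it by the matrix."""
    one_hot = [1.0 if i == WORD_TO_ID[word] else 0.0 for i in range(N)]
    return [sum(one_hot[i] * matrix[i][j] for i in range(N)) for j in range(N)]


corpus = [
    "birds of a feather flock together",
    "birds of a feather flock together",
    "birds of prey",
]
M = transition_matrix(corpus)
probs = next_word_probs(M, "of")
best = max(range(N), key=lambda j: probs[j])  # ID with the highest probability
print(VOCAB[best], round(probs[best], 2))     # a 0.67
```

The last line is the “reversing” step: the winning number is turned back into a word.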

Reversing the output values back into language gives us the directions for our predicted text. If you put “Romeo” into the algorithm, it might direct you to a metaphorical room with “wherefore,” a shelf labeled “art thou,” and then a particular volume adding the rest of Shakespeare’s dialogue. Yet in other cases, it may return words that no human has thought to combine before; it still bases its directions on the writing we used to create the program, but following all of the probabilities can lead to many books that lie between those samples.
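To see how word-to-word directions can lead to books no one has written, here is a small sketch built from two invented sample sentences (the corpus and the `reachable` helper are hypothetical). Enumerating every six-word path turns up texts that appear in neither sample: some sensible, like the cat on the rug, and some merely grammatical, like a cat sitting on a cat.

```python
from collections import defaultdict

# Two invented sample sentences standing in for training data.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]

# Record every word that has followed each word in the samples.
followers: dict[str, set[str]] = defaultdict(set)
for text in corpus:
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].add(nxt)


def reachable(start: str, length: int) -> set[str]:
    """Every text of `length` words that the word-pair directions can lead to."""
    paths = [[start]]
    for _ in range(length - 1):
        # Extend each path by every word seen after its last word; dead ends drop out.
        paths = [path + [nxt] for path in paths for nxt in followers[path[-1]]]
    return {" ".join(path) for path in paths}


texts = reachable("the", 6)
novel = sorted(texts - set(corpus))
print(len(texts), len(novel))  # 8 6
for text in novel:
    print(text)
```

Two samples are enough to point toward six “books” that were never in the training data.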

To recap, the program we built tells us where in the library we can find a text that seems like a reasonable continuation of some opening, based on the structure of our language and samples of human writing. But in this library, knowing the location of a book corresponds with knowing its actual text, so the program essentially “generates” whatever text we seek! Even if the full library has not yet been built, you can imagine the program pointing you to a shelf where you can retrieve a given book on-demand.

And our program does simulate how a large language model (LLM) works: the starting selection of writing is what we would call “training data” in real life, creating a program to analyze all of that writing describes the “model training” process, and the resulting set of equations represents the model itself. (The library takes inspiration from the concept of a “vector space.”) This certainly glosses over many details; think of it as using a paper airplane to help illustrate how a 747 flies. But I think the analogy of retrieving books from a library helps illustrate both the power and current limitations of LLMs.
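One way to picture the “vector space” idea is to turn each text into a list of numbers and measure how close two lists are. The sketch below uses plain word counts and cosine similarity, a classic and much simpler stand-in for the learned embeddings real LLMs use; the helper names and example sentences are assumptions for illustration.

```python
import math
from collections import Counter


def to_vector(text: str) -> Counter:
    """Bag-of-words vector: each distinct word is a dimension, its count the value."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """1.0 means pointing the same way, 0.0 means no words in common."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


base = to_vector("birds of a feather flock together")
near = to_vector("birds of a feather stick together")
far = to_vector("romeo take me somewhere we can be alone")
print(round(cosine_similarity(base, near), 2))  # 0.83
print(round(cosine_similarity(base, far), 2))   # 0.0
```

In library terms, the first pair sits on neighboring shelves while the second pair lives in different wings.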

In essence, every “prompt” or set of instructions sent to an LLM gives it a task to find a text with that opening. I’m not sure many people realize that this applies even to “chatbots” such as ChatGPT; the model has no memory of previous interactions when you add a follow-up. Each time you send a message, the entire conversation is bundled together with some instructions asking for the next message in the dialogue. It’s as though you’re writing a movie script of a conversation between a person and an AI agent, and you’re looking for a sample script that adds a bit more to the back-and-forth: the library includes many examples of such conversations.
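Here is a rough sketch of that bundling step. The transcript format, the speaker labels, and the `build_prompt` helper are all hypothetical; real chat services use their own templates. The point is that the function keeps no state of its own: every earlier turn has to be passed back in, or the model never sees it.

```python
def build_prompt(system: str, history: list[tuple[str, str]], new_message: str) -> str:
    """Flatten a whole conversation into one block of text for the model to continue.

    The "User:"/"Assistant:" labels are a hypothetical format chosen for illustration.
    """
    lines = [system]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {new_message}")
    # End on the assistant's label so the likeliest continuation is its next line.
    lines.append("Assistant:")
    return "\n".join(lines)


history = [
    ("User", "Who wrote The Library of Babel?"),
    ("Assistant", "Jorge Luis Borges."),
]
prompt = build_prompt(
    "The following is a conversation with an AI assistant.",
    history,
    "When was it first published?",
)
print(prompt)
```

The follow-up question only makes sense because the earlier exchange is resent inside the same script; drop `history` and “it” refers to nothing.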

In many cases, responding to people based on those scripts produces incredible results; after all, the model is supposed to find a script that sounds believable! But it also shows why some answers are lacking. For example, prompt engineer Riley Goodside came up with a starting dialogue that trips up even sophisticated LLMs: “The emphatically male surgeon who is also the boy’s father says, ‘I can’t operate on this boy! He’s my son!’ How is this possible?” Since the model’s training data is likely full of similar riddles involving a surgeon being the boy’s mother, from a purely statistical perspective, it makes sense that the next line of the script would include such an answer. Remember that even if the probability of an answer being right is 99%, it might still be wrong!

Whether we can adapt LLMs to handle novel questions more accurately remains to be seen. For now, we have plenty of use cases where returning a reasonable response most of the time actually works quite well. Let’s just be careful in how much agency we ascribe to such models, and recognize the context of how they actually work, especially before we rely too heavily on the texts these models find for us from their vast, imaginary libraries.

Joey Tyson

An analytical romantic. I help people understand how they fit into the world. https://theharmonyguy.com/