With LLMs, hallucination is the point
29 Dec 2023


Article Summary

Thank you to Jeremy Kirshbaum for sharing his article in our knowledge base. You can view this on Medium as well.

Generative AI models produce outputs that do not always align with factual truth, a property often criticized as “hallucination.” Yet that same imaginative capacity is fundamental to the creative potential of these models. Strictly limiting models to factual regurgitation would degrade their functionality, even in the rare case where what constitutes “fact” isn’t up for debate. Instead, we should view the unpredictability of generative models as a feature that enables new ideas, not a bug to be eliminated.

Hallucination is Imagination

Let me be more direct: with generative models, hallucination is the point. We often aim to create things that don’t exist, especially in fiction or entertainment. There is a vast territory between factual truth that no one could argue with and things that are clearly wrong. Even factual information involves interpretation and analysis, which are forms of opinion and argument, and there is no definitive way to decide whether an argument is right or wrong. The truth is very blurry. We don’t want models that never hallucinate; we want models that hallucinate how and when we want them to.

No Universal Standard of Truth

There is no universal standard for determining the truth or accuracy of statements. Truth is an active, contextual, multilayered process, and the domain a statement lives in defines most of the metrics for deciding when it counts as right or wrong. If models only produced previously seen statements, we would lose the ability to expand knowledge and imagination. The ability to produce novel ideas beyond their training set is what makes LLMs unpredictable, yes, but it is also what makes them uniquely useful. People creating fiction and entertainment require this, but so do those generating new scientific hypotheses, producing creative analysis, and inventing new products and features. Is a Van Gogh painting a hallucination? Is James Joyce’s ‘Finnegans Wake’ a hallucination? Einstein’s thought experiments? Edison’s original, failed lightbulbs? We’d call them hallucinations if they came out of a generative model today, and we’d be trying to get rid of them in future versions of our models.

Hallucination Enables Creativity

Attempts to make models more factual and deterministic degrade their creative potential. Engineers often over-optimize for testability at the expense of interesting outputs: a common method is dropping a model’s temperature to zero, which improves reproducibility but makes the writing worse. There’s the old Einstein quote, ‘Imagination is more important than knowledge.’ Truly deterministic systems already exist; every other kind of software system for the last hundred years has been entirely deterministic. If you want that, great! Just skip the generative AI. But if we want the benefits of generative AI, like randomness and unpredictability, we cannot eliminate the hallucination.
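
To make the temperature trade-off concrete, here is a minimal sketch of temperature-scaled sampling over next-token logits. The `sample_token` helper and its toy logits are illustrative assumptions, not any particular model’s API:

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Pick a next-token index from raw logits, scaled by temperature."""
    rng = rng or np.random.default_rng()
    if temperature == 0:
        # Temperature zero collapses to greedy decoding: the same
        # prompt always yields the same token. Deterministic, but flat.
        return int(np.argmax(logits))
    # Dividing logits by a higher temperature flattens the softmax,
    # giving unlikely (more "hallucinatory") tokens a real chance.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy logits over a 5-token vocabulary.
logits = [2.0, 1.5, 0.3, 0.1, -1.0]
print(sample_token(logits, temperature=0))    # always token 0
print(sample_token(logits, temperature=1.5))  # varies run to run
```

At temperature zero the output is reproducible and testable; the variability this essay defends only exists above it.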

Qualitative disciplines like design thinking, strategic foresight, management consulting, design, product development, and many other valuable business functions rely on creative ideation and brainstorming. Many of these methods exist at all because creating genuinely novel ideas is very difficult, and being able to do this in an automated way, at scale, is hugely valuable. Not to mention that I am sitting in California, where we know very well that not all hallucinations, even in the traditional sense, are a bad thing. We hear stories of people like Steve Jobs taking acid and being guided toward insights that led to technological revolutions. Much thinking involves more than regurgitating facts. For the first time, we have systems that can do more, yet we spend effort trying to restrict them to correctness.

This is not to say that we should simply let generative models run wild. A variety of methods for controlling models already exist, including prompt engineering, retrieval augmentation, finetuning, and even knowledge graphs. These are all well and good. But in the process of grounding models, we should avoid constraining them to the point that we lose what is potentially one of their most powerful aspects, and certainly a capability that no digital tool has ever had before. We should strive to create nuanced levels of control over models so that we can switch between Allen Ginsberg-style free association, partially grounded historical fiction, truly creative new ideas for designs and products, and buttoned-down customer service bots. With GPT-4 and Claude, you can definitely accomplish the last one, but the first three will come out trite, awkward, and over-structured pretty much no matter how much prompting you use to force them out of their normal patterns. With the original GPT-3 Davinci, which OpenAI released in 2020 and is removing access to next week, you truly could accomplish the first three, although it was a messy, weird process that was difficult to duplicate. In making the models more performant, we’re losing a bit of their magic.
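
As a sketch of what those nuanced levels of control could look like, the snippet below maps the registers described above to sampling presets and passes them to a chat completion call. The preset values, mode names, and model string are assumptions for illustration, not a recommended configuration:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative presets, one per register described in the text.
PRESETS = {
    "free_association":   {"temperature": 1.4, "top_p": 1.0},
    "historical_fiction": {"temperature": 1.0, "top_p": 0.95},
    "product_ideation":   {"temperature": 0.9, "top_p": 0.9},
    "customer_service":   {"temperature": 0.2, "top_p": 0.5},
}

def generate(prompt: str, mode: str = "customer_service") -> str:
    """Generate text using the sampling dial for the requested register."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; substitute whatever model you use
        messages=[{"role": "user", "content": prompt}],
        **PRESETS[mode],
    )
    return response.choices[0].message.content
```

The point of the dial is not any particular set of numbers; it is that hallucination becomes something you tune per task rather than something you eliminate globally.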

I’m excited about the new open-source models, which are getting good enough to really experiment with. We’re exploring some of their potential at Handshake and Library of Babel (more details on that another time), but maybe this article is a call to action for us and others not to just copy-paste the same finetuning methods that others are using. What could exist beyond instruction tuning? What datasets could we compile to easily embed creativity, beauty, and serendipity into our models without sacrificing controllability and coherence?

The hallucinatory imagination enabled by generative models is precisely what makes them uniquely valuable. Rather than view unpredictability as a problem to solve, we should embrace it as a feature that allows for creativity. The beauty of computers producing something simultaneously new and coherent, whether strictly “true” or not, is worth celebrating and expanding.

