Note: I say “AI” where I need to, because it seems almost the entire world has forgotten that AI is the field while LLM is the technology. But really: this article is about LLMs.
A short background
I’ve been working with chatbots for a long time: my MSc was a framework for an emotional virtual assistant, which got converted into a patent (let’s ignore for a second that it was stolen and published not under my name) that has been cited over 200 times by the likes of Apple, Google, Microsoft, Samsung, Intel, etc. I also have an interest in psychology and intelligence, and my current working theory is aligned with embodied cognition.
While I haven’t done ‘building’ work in the current wave of LLMs, I still try to stay as up-to-date as I can on the field, reading sometimes complex papers, or explanations of them written by many good people, with a product and design angle. This is because I believe that to design, it’s important to know the material we work with. And LLMs are a new material.
So what are LLMs?
This question can have many, many answers, but for the goal of using it as a material I think one good definition is: LLMs are language black boxes that output a prediction of what comes after everything that was written before.
This is why I prefer this phrasing:
- Language — they work because they were trained on a lot of text data of people talking with each other (i.e. the web and chats)
- Black Box — people who work deeply with LLMs know how they work (even if the understanding of how everything contributes to the output isn’t quite there, i.e. explainability), but for any practical product use case — for people who build “on” and not “in” — it’s better to think of it as a black box.
- Prediction — at its core an LLM is predictive: the next statement is a best guess at what is expected to come (expected given the training data) after what has been written. Think of it this way: generative models are all predictive; what changes is only what they predict (i.e. what comes after your text, what image emerges from the noise, etc.).
- Written Before — the reason LLMs took the shape of chatbots is almost incidental, due to the training data being a lot of people talking with each other. But it’s in many ways a “trick”: you can imagine a single conversation as a single flow of text that occasionally stops being predicted so you can add another piece of text, before the prediction continues.
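The “prediction” point above can be made concrete with a toy model. The sketch below is a deliberately tiny bigram predictor built from word counts (real LLMs use neural networks over subword tokens, nothing like this), but the autoregressive loop it shows (predict a likely next token, append it, repeat) is the same basic shape. The corpus and names are purely illustrative.

```python
from collections import Counter, defaultdict

# Toy "training data": the text the model learns its predictions from.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count which word follows which; this table stands in for a neural net.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in training."""
    return following[word].most_common(1)[0][0]

def generate(prompt, steps=4):
    """Autoregressively extend the prompt, one predicted word at a time."""
    words = prompt.split()
    for _ in range(steps):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(generate("the cat"))
```

Note how the model never “answers” anything: it only continues text in the statistically expected way, which is exactly the behavior the chat framing dresses up.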
A term that I find very effective is also stochastic parrot. This is because LLMs don’t really know what they are doing. They are just predicting, repeating a blurb of words from the training data. There’s no intelligence or reasoning in it (at least in the 2024 generation of LLMs; maybe in the future they will find a way, as this is the next problem the field is trying to solve).
Another way to think about it is to realize that LLMs don’t give answers (they can’t reason); they give examples of what an answer to that question would look like. It’s like someone creating an impressive prop for a theatre play. It might be close enough to the real thing to be used in its place, but occasionally it becomes obvious it’s just a prop.
Designing with LLMs
Before starting
The question any designer or product person should ask before adding LLMs into a product is: does it add real value to our customers?
This might be challenging, as in some cases the feature has been pushed to the forefront by market forces, expectations, and marketing, but in general we should always try to connect it to the real value it can add.
The core LLM design principle
The first and core principle for designing with this technology is: LLMs hallucinate.
(Well, lots of things based on neural networks do, but let’s stop here for the scope of this article.)
This means that any solution that leverages this tool needs to account for the possibility that the output is not accurate, which usually means two possible things:
- Make sure to hand the output back to the user to review before doing any action.
- Make sure that errors in the outcomes are as inconsequential as possible, can be undone cleanly and easily, and won’t cause distress to the user.
If we understand this, we realize that we need to set the right expectations with our users, as well as be accountable for any error that might occur.
When we say that they hallucinate, we mean that the core technology (LLMs) hallucinates. Then there are, of course, the approaches different products take to create guardrails. For example, search engines verify the output statements with their own search queries to show sources; generative outputs use the framework of the software itself to limit their range (i.e. dashboard outputs, design systems, theming modules, etc.); automation integrations show a draft to review before activating; and so on.
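As a minimal illustration of the “limit its range” style of guardrail: instead of trusting free-form model output, the product validates it against the options the software actually supports and falls back safely. Here `call_llm` is a hypothetical stand-in for any real model call, stubbed so the sketch runs; the theme names are made up for the example.

```python
# The only values the product actually supports (e.g. a theming module).
ALLOWED_THEMES = {"light", "dark", "high-contrast"}

def call_llm(prompt):
    # Hypothetical model call; stubbed with a fixed reply so this runs.
    return "Dark mode, probably!"

def pick_theme(user_request):
    """Ask the model for a theme, but only accept values the product supports."""
    raw = call_llm(f"Pick a theme for: {user_request}").lower()
    for theme in ALLOWED_THEMES:
        if theme in raw:
            return theme  # constrained to a known-good value
    return "light"        # safe default when the model hallucinates

print(pick_theme("something easy on the eyes at night"))
```

The design point is that the hallucination risk never reaches the user: whatever the model says, only a valid theme (or the safe default) can come out.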
The other LLM design principles
- Quick Undo — make it safe to try things and roll back (i.e. a clear undo action, or a drafting space, or an unpublished item).
- Enhance Features — instead of thinking of the LLM as the product on its own, think of how it could augment existing features (i.e. have the LLM write a filter for you in the filter feature, not the LLM as a chat interface for your existing product).
- Show Workings — sure, LLMs are almost like “magic” when they work, but trust is low, so it can be useful to show how the output was reached when possible (i.e. show source links for an answer).
- Quick Feedback — have an effective feedback loop where people can quickly mark outputs that aren’t effective and have a pipeline to review them internally (i.e. dislike button on an answer, then review monthly to improve the answers).
- Output/Refine Loop — for more complex problems, don’t provide one-and-done actions but create iterative loops where the output can be refined with multiple back and forth (i.e. generate an item, then ask to change and tweak that item until it’s good enough).
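The Output/Refine Loop principle above can be sketched as an interaction shape: generate a first draft, apply each user tweak as a new model call that sees the current draft, and keep every intermediate version so the user can roll back (which also serves the Quick Undo principle). `call_llm` is again a hypothetical stand-in, stubbed with a trivial transformation purely so the loop is runnable; no real API is implied.

```python
def call_llm(prompt):
    # Hypothetical model call; stubbed with a trivial text transformation
    # so the shape of the loop can run without a real model.
    return prompt.split("->")[-1].strip().upper()

def refine_loop(initial_request, tweaks):
    """Generate a first draft, then iteratively refine it with user feedback."""
    draft = call_llm(f"draft -> {initial_request}")
    history = [draft]  # keep every version so the user can roll back
    for tweak in tweaks:
        # Each refinement sees the current draft plus the new instruction.
        draft = call_llm(f"current: {draft}; change -> {tweak}")
        history.append(draft)
    return draft, history

final, versions = refine_loop("a welcome email", ["make it shorter", "friendlier tone"])
print(versions)
```

The point is the structure, not the stub: the user stays in the loop, judging each output and deciding when it is good enough, rather than receiving one unreviewable result.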
One LLM, many LLMs
There are different LLMs: they can be trained on different data, with different performance, and work differently. They can be augmented, fine-tuned, pre-prompted, etc. This is the step after: once you have designed the experience you want, the kind of interaction and use, you are likely to work with a specialist who can create the black box best suited for that design — and of course, this can be an iterative process between design and engineering (or R&D). This is also why it’s advisable to try out different LLMs and see how they respond — again, understand the material.
Why do we add AI?
In general, try to remember that beyond the hype, people want solutions to their problems, to save time, and to get results. LLMs, and anything else that comes out of the AI field in the future, are tools to get there, not the end of the story. Yes, in the short term it might be valuable to highlight “AI features”, but leave that to marketing.
And as always, work closely with good engineers who have a grasp of this material, and prototype in short loops.
Further reading
- Google (2024) “People + AI Guidebook”
- G. Mauro (2024) “My bold predictions on the future of the AI industry”
- C. Fraser (2023) “Who are we talking to when we talk to these bots?”
- T. B. Lee, S. Trott (2023) “A jargon-free explanation of how AI large language models work”
- M. Wooldridge (2023) “What’s the future for generative AI?”
Thanks to Erlend Davidson for reviewing the AI background of this article.