Are users of your GenAI application aware of its intended uses and limitations?

If not, they will get angry quickly.

Feb 24, 2024

There is an old-but-gold XKCD comic pointing out the challenge of explaining the difference between “easy” and “hard”:

While checking whether a photo contains a bird is simple these days, a very similar situation comes up when you build GenAI-powered solutions and start collecting user feedback.

Users of these applications are absolutely amazed by their initial capabilities. They keep asking more and more detailed or nuanced questions. Their expectations keep rising, because the tool is just so good.

...and then, they hit a wall. The previously-bright application starts spitting out nonsense. It skips over important facts or fails miserably at basic reasoning. 🤯

For example, you ask an LLM to calculate something and it can't do basic math (but it did so well with other tasks!). The same shows up in RAG for any question that, to be answered, would need analysis of thousands of documents or hours of additional calculations on top of them (but it did so well with well-defined questions that needed just a couple of relevant chunks!).

Why that happens

We, the builders, know: it is either a limitation of the technology or the inherent design of the particular GenAI logic within the app. We also know how some of these issues can be overcome (use additional tools, apply additional RAG tricks, build custom GenAI "workflows" for a particular question), but none of this is obvious to the users.
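
To make the "use additional tools" option concrete, here is a minimal sketch of handing arithmetic to a deterministic calculator instead of the model. Everything in it is an assumption for illustration: `calculate`, `answer`, and the `ask_llm` callback are hypothetical names, and a real app would more likely let the model decide when to call the tool rather than trying to parse the raw question.

```python
import ast
import operator

# Operators the hypothetical calculator tool is allowed to evaluate.
_ALLOWED = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        # Plain numbers only (bool is excluded even though it subclasses int).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED:
            return _ALLOWED[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED:
            return _ALLOWED[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression.strip(), mode="eval").body)


def answer(question: str, ask_llm) -> str:
    """Send pure arithmetic to the calculator; everything else goes to the model."""
    try:
        return str(calculate(question))
    except (ValueError, SyntaxError, ZeroDivisionError):
        # Not a plain expression (or not computable): fall back to the LLM.
        return ask_llm(question)
```

Note that this only catches inputs that are already bare expressions like `12 * (3 + 4)`. A question phrased as "What is 12 times 7?" still goes to the model, which is exactly the kind of limit users never see unless you tell them.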

The line between "easily solvable" and "maybe if we spent 3 months prototyping it, it could work for that particular question" is really, really thin.

I’ve gone through hundreds of pieces of feedback on multiple GenAI apps, and very often it’s clear that the user expects more than the app can do given its underlying GenAI “logic”.

What you can do about it

Make sure to educate users about what is roughly happening underneath and what the current limitations are. This way they will use the tool for what it was designed to solve. Otherwise they'll get angry and drop your app.
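
One lightweight way to do this is to generate the "what this tool can and cannot do" text from the app's real settings, so it never drifts from what the app actually does. This is only a sketch under assumptions: `AppLimits`, its fields, and the wording of the notice are all hypothetical and would need to match your own app.

```python
from dataclasses import dataclass


@dataclass
class AppLimits:
    purpose: str      # what the app was designed to answer
    max_chunks: int   # how many retrieved passages feed a single answer
    does_math: bool   # whether a calculator tool is wired in


def limitations_notice(limits: AppLimits) -> str:
    """Build a short, user-facing description of the app's scope and limits."""
    lines = [
        f"This assistant is built to {limits.purpose}.",
        f"It answers from at most {limits.max_chunks} retrieved passages per "
        "question, so it cannot analyze a whole document collection at once.",
    ]
    if not limits.does_math:
        lines.append("It does not calculate reliably; double-check any numbers.")
    return "\n".join(lines)


print(limitations_notice(AppLimits("answer questions about HR policies", 5, False)))
```

Showing a notice like this on the first screen, or next to an answer that falls outside the app's scope, is one possible placement; where it works best is something to test with your own users.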

Of course, it’s easier said than done.

But if you actually spend some time educating your users (or even lowering their expectations?), they go from being angry to being actually productive.

…and that’s what you’re aiming for in the end, right?

BTW, the name of my Substack has changed to better reflect what I’m writing about. Hopefully that’s not a drastic change for you. The content tended to be way broader anyway.

Data & AI on AWS and how you tame it
