Last week, we subtly hinted at this week’s topic. Today, we're tackling a crucial question many software developers and product managers are grappling with: Does your software product really need an LLM?
Or at least, that’s the question they should be asking. Too many managers simply want an LLM in their application when none is actually needed.
Indeed, in recent months, we've observed a trend: LLMs are often proposed for product features that don't actually require such advanced AI. We’ve even seen someone who wanted to use an LLM to schedule recurring appointments (e.g., a dentist reminder every six or twelve months)! Such a feature only requires a dropdown list and a few lines of code, as the short sketch below shows.
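To make that concrete, here is a minimal sketch of how such a recurring-reminder feature could be implemented with no AI at all (the interval labels and the `next_reminder` helper are purely illustrative, and the example assumes the `python-dateutil` package):

```python
from datetime import date
from dateutil.relativedelta import relativedelta  # third-party: python-dateutil

# The options a user would pick from a simple dropdown; no AI involved.
INTERVALS = {
    "every 6 months": relativedelta(months=6),
    "every year": relativedelta(years=1),
}

def next_reminder(last_visit: date, interval_label: str) -> date:
    """Compute the next reminder date from the user's dropdown selection."""
    return last_visit + INTERVALS[interval_label]

print(next_reminder(date(2024, 3, 1), "every 6 months"))  # -> 2024-09-01
```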
Getting caught up in the hype is easy, but it's essential to critically evaluate your product's needs before jumping on the LLM bandwagon. Don’t let the desire to build “a cool app” push you toward new technology your product doesn’t need.
Remember that an LLM is designed to generate text by processing numbers associated with tokens. It does not take the time to understand and reflect, and it was certainly not built to think and decide for you. Even though you cannot blindly trust an LLM for that reason, it remains a powerful tool for analyzing text and generating the next statistically relevant words. This means you should be leveraging it for tasks like summarization, translation, and most tasks that involve processing large amounts of text.
So, should YOU be using an LLM in your product? Here’s how to decide…
Some Key Considerations
Before integrating an LLM into your software product, consider these points:
Feature Complexity: Is the feature truly complex enough to warrant an LLM? Many seemingly AI-worthy features can often be solved with simpler, traditional methods.
Privacy Concerns: LLMs often involve sending data to third-party APIs. Can your product afford the associated privacy risks?
Existing Solutions: Can non-AI tools or algorithms handle the feature effectively?
Hallucination Risks: Can your product tolerate the potential for false or misleading outputs from an LLM?
Value Addition: Is the LLM feature essential to significantly improve your product, or is it just a "nice-to-have"?
Real-World Examples
Let’s look at some scenarios where companies use (or considered using) LLMs in their products:
Customer Support Chatbot: A SaaS product wanted to implement an LLM-powered chatbot. After analysis, they found that 80% of user queries were about specific features, pricing, and account issues. These were handled efficiently with a decision-tree system (essentially a “fake” chatbot with clickable options) connected to their product database; a minimal sketch of this kind of flow follows these examples.
Khan Academy: The learning platform Khan Academy partnered with OpenAI to build a custom tutor that answers students' questions while steering them toward the right approach, so they learn to find solutions on their own. They used GPT-4 to achieve that, combining prompting techniques and RAG with user preferences, memory, and more.
Duolingo: Duolingo's language-learning app has a new chatbot that offers a more pleasant user experience when learning a new language. The bot simulates real conversations and provides feedback on why your answer was correct or incorrect, something that was only possible before LLMs by paying tutors or relying on crowdsourcing.
Elicit: Elicit is an AI research assistant that uses language models (like GPT-4) to automate research workflows. It can find papers you’re looking for, answer your research questions, and summarize key points from a paper. This is a perfect use case for LLMs!
Stripe: Just by analyzing the syntax of requests, GPT-4 has been flagging accounts where Stripe's fraud team should follow up to ensure it isn't, in fact, a fraudster playing nice. GPT-4 can help scan inbound communications, identifying coordinated activity from malicious actors.
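To illustrate the customer-support example above, here is a minimal sketch, with an invented `FAQ_TREE` and made-up menu contents, of a decision-tree “chatbot” that routes users through clickable options instead of calling an LLM:

```python
# Invented example tree: each node is either a dict of clickable options or a canned answer.
FAQ_TREE = {
    "What can we help you with?": {
        "Pricing": "Our plans start at $19/month; see the pricing page for details.",
        "Account issues": {
            "Reset my password": "Use the 'Forgot password' link on the login page.",
            "Delete my account": "Go to Settings > Account > Delete account.",
        },
        "Feature questions": "Check the docs search, or open a ticket and we'll get back to you.",
    }
}

def run_chatbot(tree: dict) -> None:
    """Walk the tree by letting the user pick one of the listed options at each step."""
    question, node = next(iter(tree.items()))
    while isinstance(node, dict):
        print(question)
        options = list(node)
        for i, option in enumerate(options, 1):
            print(f"  {i}. {option}")
        choice = options[int(input("Pick an option: ")) - 1]
        question, node = choice, node[choice]
    print(node)  # reached a leaf: the canned answer

run_chatbot(FAQ_TREE)
```

In a real product, the leaf answers would be pulled from the product database rather than hard-coded, but the routing logic stays this simple.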
Our own tests with a “Social Copilot”:
We’ve spent several months building what we termed our “social copilot” to help us generate useful Twitter threads from our blog posts. The goal was simple (or so we thought): share the key insights of a blog post. Yet, even after lots of work, refinement, templates, and sweat, we couldn’t get good results from a GPT-4-based pipeline.
The current state of LLMs does not allow us to generate good, fun-to-read, creative social posts. They are boring, sound robotic, and struggle to separate what’s truly useful from what is merely interesting, something a person can do easily.
LLMs also overuse certain words that humans rarely use, which makes the text easily recognizable as LLM-generated. For example, the infamous “delve” is heavily overused by ChatGPT. Here is a fun graph showing how often this word appears in scientific articles.
p.s. We abandoned this project and kept doing it ourselves to provide the most value possible.
When LLMs Make Sense in Products
While many features don't require LLMs, there are situations where they can provide significant value:
Complex Text Generation: If your product needs to generate human-like text based on varied inputs (e.g., automatic report generation, personalized email drafting), an LLM might be appropriate.
Advanced Language Understanding: LLMs can offer superior performance for products dealing with open-ended user queries or complex document analysis.
Creative Assistance: LLMs can serve as powerful ideation and drafting tools in products aimed at content creators or marketers.
Sophisticated Data Analysis: LLMs can be valuable if your product needs to draw insights from large volumes of unstructured text data (see the sketch after this list).
Translation: The transformer architecture behind LLMs was originally designed to translate text from one language to another, so this is a natural fit.
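As one concrete example of the text-analysis cases above, here is a minimal sketch, assuming the OpenAI Python SDK and an illustrative model name, of how a product might summarize a batch of unstructured support tickets with an LLM:

```python
from openai import OpenAI  # third-party: openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical unstructured text your product might need to analyze.
tickets = [
    "The export button crashes the app on large projects.",
    "I was billed twice this month, please advise.",
    "Exporting to CSV hangs when the project has many rows.",
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice, not a recommendation
    messages=[
        {"role": "system", "content": "You summarize customer feedback into a few key themes."},
        {"role": "user", "content": "Summarize the main issues in these tickets:\n" + "\n".join(tickets)},
    ],
)

print(response.choices[0].message.content)
```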
The Bottom Line
Before investing in LLM integration, ask yourself: "Can this feature be implemented effectively without an LLM?" Often, the answer is yes. If you cannot answer this question, ask experts!
By choosing the right tool for the job, you can save development time, reduce costs, and avoid unnecessary complexity (and problems) while still delivering an excellent product.
That said, LLMs can offer a significant competitive advantage for truly complex language-related tasks where traditional algorithms fall short; just keep in mind that integrating them well is itself complex and, in most cases, requires more time and investment than expected.
Here's a handy tl;dr for future reference:
In short, take the time to research and think about what’s best for your product and your users rather than for the hype.