AI-Washing
We’re at the top of the AI/LLM/chatbot hype cycle.
Every biz publication is deliriously covering this. Stories range from “20 ways you can use ChatGPT to make your work easier” to “here’s how AI will end civilization as we know it.” A quarter of the stories in last Tuesday’s New York Times business section were AI-related. Every customer conference and AMA starts with questions about AI. Eye-popping valuations for AI firms are the bright exception to a down venture market.
That’s driving tremendous urgency to put AI products or features into our roadmaps, whether they are of high value to our actual end users or not. For the last 6 months, many of my discussions with product leaders have been about how to apply good product fundamentals to AI – when so much of the broader conversation is shallow or technically suspect or reduces everything to generative apps.
For perspective, AI has been incubating for a long time – which is typical for currently-hot tech stories. In 1980, I had a CompSci assignment to build an ELIZA-style chatbot that would lightly imitate a psychiatrist, riffing on the 1960s ELIZA that some users decided was intelligent. And we’ve seen other hype cycles come and go: blockchain, 3D TV, viable business models for co-working spaces, Pokemon-Go-style mobile gaming, MoviePass, the sharing economy, autonomous cars that occasionally kill pedestrians walking bicycles. Each time, enthusiasts and naysayers piled on. Over time we sorted out which had enduring value and which were just magical thinking or investor pitches. After all, any sufficiently advanced technology is indistinguishable from magic.
So starting from first principles, I’d suggest we need a few different approaches depending on what we are actually trying to accomplish by adding AI. Being explicit about our goal helps us define success and guesstimate effort.
1. “AI-Washing”
AI-washing (see greenwashing) is quickly announcing and shipping something (anything!) that can be labeled AI or machine learning or LLM-ish or generative. Releasing something shows that we’re not asleep; gives our execs something AI-ish to talk about with customers; and satisfies less astute investors worried that we’ll miss out on sky-high valuations. The hope is for positive press coverage but few serious end users.
One danger: we run smack into the MVP problem: no matter how clearly we [product/engineering] explain that this is a featherweight, feature-light, non-revenue, barely tested, quick-and-dirty, not-to-be-taken-too-seriously generic feature that is intended for marketing messages rather than sensitive use cases… our audiences will treat it as full production as soon as it ships. All limitations and caveats will be forgotten. (“Look how much industry attention we’ve gotten! 8000 downloads, 2000 trial users! Let’s feature this in every prospect communication and at the top of our website! Can we charge for it?”)
Then, when a large group of actual users excitedly tries our “AI assistant to help write better manufacturing RFPs” and discovers that it’s only 2 mm deep, we’re deluged with complaints and enhancement requests. Our AI-powered customer support chatbot is faster at recommending the wrong solution than our non-AI version. And technical/privacy-minded audiences demand excruciatingly detailed explanations of exactly how it works and what’s in the corpus. (GDPR? PII? Where did your training data come from?) This can quickly shift from a PR win to a very public product failure.
2. Applying general (generic) AI tools for internal cost savings
Our imaginations run wild with ways that AI might simplify our jobs or reduce tedium. Employees inundate us with suggestions for automated staff augmentation or AI self-configuration or machine-driven insights about customer churn. (Unfortunately, our intuition about AI-suitable projects is based on Westworld and Wall-E and Her. I'll be back!)
Excitement about AI will eventually wane, so these projects will have to prove themselves economically with real end users. Does Tool X actually reduce work, speed up deliveries, or boost operational metrics enough to justify the ongoing cost of maintenance/enhancement/data cleansing/data scientists/CPU? Since LLMs are built statistically and will always deliver some wrong answers, are we factoring in the human effort to watch for hallucinations and review every recommendation? At some point, actual ROI will determine ongoing funding. (Copilot, for one, seems to pass the value test for senior developers.)
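As a back-of-envelope illustration of that ROI question, here is a minimal sketch in Python. Every number and variable name below is a hypothetical placeholder, not data from any real project; the point is the shape of the math, including the review overhead that LLM output requires.

```python
# Hypothetical ROI check for an internal AI tool.
# All inputs are invented placeholders -- substitute real measurements.

hours_saved_per_user_per_week = 1.5   # measured time savings, not wishful thinking
hourly_cost = 75                      # fully loaded cost per employee-hour ($)
users = 40
weeks_per_year = 48

gross_savings = hours_saved_per_user_per_week * hourly_cost * users * weeks_per_year

# LLMs are statistical and will deliver some wrong answers, so budget
# human time to review outputs and catch hallucinations.
review_hours_per_user_per_week = 0.5
review_overhead = review_hours_per_user_per_week * hourly_cost * users * weeks_per_year

annual_tool_cost = 60_000             # licenses, compute, data cleansing, upkeep

net_value = gross_savings - review_overhead - annual_tool_cost
print(f"Gross savings:    ${gross_savings:,.0f}")
print(f"Review overhead:  ${review_overhead:,.0f}")
print(f"Tool cost:        ${annual_tool_cost:,.0f}")
print(f"Net annual value: ${net_value:,.0f}")  # negative => instructive, not breakeven
```

With these made-up inputs the tool clears the bar; halve the time savings and it doesn’t – which is exactly the sensitivity analysis most internal pilots skip.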
It's important to differentiate here between trying some internal AI projects to learn the tools and tune our expectations, and "the Board has been promised that we’ll deliver 20%+ cost reductions in Logistics via 5x better AI shipment route selection by 2Q24, so we have to make that work."
IMHO, most internal projects will be instructive but won’t break even.
3. Adding low-risk AI features to our commercial products for actual end-user value
Once the sizzle is gone, our paying users will continue to use features that matter to them. So if we want to add AI-based product value (versus feature-level AI-washing), we need to validate that the new features will solve actual customer problems better than traditional code. And that they are technically feasible. And that the investment is appropriate. Intuition isn't enough.
This suggests starting with user problems and metrics:
- “How much would it be worth to you if we could process 70% of payable transactions automatically, and flag the rest for manual processing? 90%? 99.95%?”
- “What portion of consumer chat interactions get users to the right result today? In how many steps? What is the cost in brand reputation/user frustration for bad answers or recommendations? How do you spot mistakes, and how do you improve the system? How much better would our product need to be to justify a $70k annual investment?”
Notice that’s different from hopeful hypotheses or imaginings. We need honest, in-depth discovery and robust data science. Let's assume the hype cycle collapses, and we need to explain our investment to folks still smarting over FTX.
(Poor alternative: “ChatGPT, please tell me what my users really want and how easily generative AI can solve those problems.”)
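To make the second discovery question above concrete, here is a minimal break-even sketch. All figures are invented for illustration; honest discovery plus data science would need to supply (and defend) each one.

```python
# Hypothetical break-even check for the "$70k annual investment" question.
# Every input below is invented; discovery must replace them with evidence.

annual_investment = 70_000

chats_per_year = 500_000
baseline_resolution_rate = 0.62   # chats reaching the right result today
ai_resolution_rate = 0.70         # claimed lift from the AI feature -- validate it!
cost_per_failed_chat = 4.00       # escalation cost + brand/frustration estimate ($)

failures_avoided = chats_per_year * (ai_resolution_rate - baseline_resolution_rate)
annual_benefit = failures_avoided * cost_per_failed_chat

print(f"Failures avoided: {failures_avoided:,.0f}")
print(f"Annual benefit:   ${annual_benefit:,.0f} vs. ${annual_investment:,} invested")
# 500,000 chats * 8-point lift * $4 = $160,000/year -- worthwhile *if* the lift is real.
```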
4. Creating major strategic advantage for our products or company through unique models, proprietary data sets, and deep AI science
We “know” that AI will let us leapfrog the competition, and we “know” that our dataset is big enough/clean enough/legitimately acquired/consistent enough over time that we can use it to predict the future. But our hunches are probably wrong. A few qualifying questions:
- Do we have a huge proprietary data set specific to this problem (e.g. M’s of mortgage applications for a mortgage approval accelerator; 100M’s of security scans for an intrusion detector; B’s of faces for a facial recognition app)? Otherwise, our competitors can use the same public dataset to undercut our advantage. (Or is there a first-implementer advantage with public data, before it becomes table stakes for competitors?)
- Do we own the data or have permission to use it? “Found it on Google” is no longer enough.
- Have we run statistical predictability tests suggesting that a model could be more accurate than humans or conventionally coded apps? Have we kept half our dataset away from our model for validation and testing? What’s the out-of-sample R²? (See the sketch after this list.)
- Is our data set biased or skewed? Racial biases in facial recognition, mortgage data incorporating decades of illegal redlining, etc. suggest we often blindly assume data is clean and ethically neutral.
- What’s the downside or liability if (when) our app hallucinates, makes wrong inferences, injures people, or puts them in legal trouble? How will we know? Will we have enough ongoing human inspection to spot problems?
- As data streams/content change, will we notice? How often will we re-build and re-validate our models? What's likely to happen over time, once we're in production?
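For the predictability-test bullet above, a minimal holdout sketch might look like the following. It uses synthetic scikit-learn data in place of a real proprietary dataset, and the model choice is an arbitrary stand-in.

```python
# Minimal sketch of the holdout test suggested above: keep half the data
# away from the model, fit on one half, score R^2 on the other.
# Synthetic data stands in for a real proprietary dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2_000, n_features=20, noise=10.0, random_state=0)

# Hold back 50% for validation and testing, exactly as the question suggests.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

model = GradientBoostingRegressor().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"Out-of-sample R^2: {r2:.3f}")

# Whatever the score, compare it to the human or conventionally coded
# baseline before claiming a model-driven strategic advantage.
```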
Hint: this is probably a big, expensive, never-ending undertaking. If your company isn’t planning to spend $M’s to keep feeding and testing a strategic AI app in perpetuity, then it will wither and fail. One-and-done project funding makes even less sense for AI than for conventional application development.
Sound Byte
We can’t ignore AI, or write it off as purely hype. But as with all tech, we must keep our objectives and outcomes in mind before deciding that any specific tool is the answer. AI-washing is OK if we’re 150% clear that we’re doing it (and why).
* I spent two hours and some pocket money trying to get various online AI image generators to put the words “AI washing” onto a picture of a window. Good learning, lots of varied prompts, but no output worthy of posting. E.g. DALL-E seems unable to spell words correctly even when supplied with the exact words. (“Washhig” and “wash wish” and “aiiwshg” didn’t match my prompts.) Definitely demanded human curation. So I created the image at top myself – using native intelligence and Unsplash – in under 2 minutes. Original image by Victor.