Blogs

Why AI for CX Fails (Part 2): Not Enough Data, Unsupervised Hallucinations

12 September 2024

Imagine your business rolling out an AI-powered chatbot to provide instant updates about your latest smartphone. The goal: Reduce wait times, generate buzz, and enhance customer satisfaction. But when a customer tries to track their order, the chatbot instead suggests troubleshooting a smartwatch they never purchased. The result? The customer is not only left without the critical information they were seeking but also frustrated by irrelevant advice.

In the real world, this scenario plays out far too often: 40% of surveyed customers describe their chatbot interactions as negative. The fallout is significant: 30% of customers will abandon their purchase entirely.

This failure stems not just from the quality of the data used to train the chatbot, but also from its quantity and veracity. In their eagerness to implement an AI tool for CX, many companies overlook two critical factors: having enough of the right data to train the model properly, and having oversight of its outputs.

In the first part of our series, we examined the complexities of training AI models and how inaccurate, outdated, and biased data can undermine the effectiveness of AI in customer experience. These challenges are exacerbated by other obstacles: data scarcity and hallucinations. Consider that same AI chatbot. If it's trained primarily on data related to product troubleshooting but lacks sufficient data on order tracking and other customer service inquiries, it might excel in one area while completely failing in another.

In this second part of our series, we'll explain how data scarcity and hallucinations from poor-quality data undermine businesses' adoption of AI for customer experience.

Poor AI model performance from data scarcity

While AI’s power lies in its ability to process and learn from vast amounts of data, this strength can quickly become a weakness when the necessary data is in short supply. Data scarcity is a significant challenge, particularly in niche markets or emerging CX trends where the volume of relevant data might be limited or difficult to obtain. For example, GenAI startups, particularly those focused on industries like finance and healthcare, are reportedly struggling to source the training data their models require. The US Treasury also reported that some financial institutions lack the data to train AI models for detecting and mitigating fraud.

In niche markets, where customer interactions might be less frequent or highly specialized, the lack of sufficient data can severely hinder the development of an effective AI-powered CX solution. For instance, a company selling a new, specialized high-tech product might struggle to gather enough data to train a chatbot or recommendation engine that accurately reflects the needs and behaviors of its target audience. Without a large and diverse dataset, the AI model is likely to generalize poorly, leading to subpar performance and unreliable outputs.  

The challenge is even more pronounced when dealing with emerging CX trends. Consider a company attempting to use generative AI for CX on a new digital platform that has only recently gained popularity. The limited data available might not capture the full spectrum of user behavior or account for the rapid changes that characterize emerging trends. The AI model might struggle to predict customer needs or respond effectively to inquiries.  

Addressing data scarcity often requires creative solutions. Companies could supplement their datasets with synthetic data — such as those created by generative adversarial networks (GANs) — which mimics the characteristics of real-world data. Some use open-source data, such as those from the government and cloud providers. Additionally, techniques like data augmentation could be used to artificially increase the diversity of the training data. However, these methods come with their own set of challenges and limitations, and they are not foolproof.
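As a toy illustration of one such technique, the sketch below shows text data augmentation by synonym substitution. The synonym table, example sentence, and function names are invented for illustration; production pipelines typically rely on thesauri, embedding-based replacement, or back-translation instead.

```python
import random

# Hypothetical synonym table; a real system would draw replacements
# from a thesaurus, word embeddings, or back-translation.
SYNONYMS = {
    "order": ["purchase", "shipment"],
    "track": ["locate", "follow"],
    "broken": ["faulty", "damaged"],
}

def augment(sentence, n_variants=2, seed=0):
    """Generate paraphrase-like variants of a training sentence by
    swapping known words for synonyms, artificially increasing the
    diversity of a small dataset."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        words = [
            rng.choice(SYNONYMS[w]) if w in SYNONYMS else w
            for w in sentence.lower().split()
        ]
        variants.append(" ".join(words))
    return variants

print(augment("How do I track my order"))
```

Each variant preserves the intent of the original query while varying its surface form, which is the point of augmentation: the model sees more phrasings without any new data collection.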

When developing AI solutions, TDCX, for instance, addresses these issues with techniques such as data resampling — either by oversampling the underrepresented classes or undersampling the overrepresented ones. Another approach that we use is to apply algorithms specifically designed to ensure that the model doesn't become overly biased toward any particular type of data. However, we understand that these solutions are not without their complexities. We work with our partners to implement them carefully and avoid introducing new biases or inaccuracies.
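The oversampling side of resampling can be sketched in a few lines. This is a minimal, generic illustration with invented intent labels, not TDCX's actual implementation: minority-class examples are randomly duplicated until every class matches the largest one.

```python
import random
from collections import Counter

def oversample(records, label_key="intent", seed=0):
    """Randomly duplicate examples from minority classes until every
    class matches the size of the largest one (random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label_key], []).append(r)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for label, items in by_class.items():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced

# Imbalanced toy dataset: plenty of troubleshooting examples,
# very few order-tracking ones.
data = (
    [{"text": f"troubleshoot case {i}", "intent": "troubleshooting"} for i in range(8)]
    + [{"text": f"where is order {i}", "intent": "order_tracking"} for i in range(2)]
)
balanced = oversample(data)
print(Counter(r["intent"] for r in balanced))  # both classes now have 8 examples
```

Undersampling works the same way in reverse, discarding majority-class examples; the trade-off is duplicated (possibly overfit) data versus discarded data.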

Poor data quality that leads to hallucinations

Hallucinations occur when an AI model generates outputs disconnected from the actual input data or the context it's supposed to respond to. In CX, this could mean a chatbot confidently misguiding customers with fabricated or misleading information, or an AI sentiment analysis tool that misjudges sarcasm as positive.

In one real-world case, a North American airline’s AI-powered chatbot hallucinated about its fare and refund policies, leading to legal action that forced the company to honor the erroneous promise. In another incident, an airline’s chatbot, misinterpreting a customer’s praise for a flight attendant’s care of her “plant cutting,” mistakenly referred the customer to a crisis hotline.

Hallucinations like these have various underlying dynamics. AI models generate outputs based on the data they’ve been trained on, which could be a mix of accurate and inaccurate information that could also be riddled with social and cultural biases. These models play entirely by algorithmic rules, mimicking patterns in their training data without discerning truth from falsehood. Any biases or errors in the data can also be replicated in the AI system’s outputs, so while they might sound plausible, convincing, and coherent, they can be factually incorrect and superficial. Even when trained solely on accurate data, experts have cautioned that they can still combine patterns in unexpected or unexplainable ways, which can result in potentially misleading or inaccurate content.

Fine-tuning the model’s architecture and postprocessing techniques can help, but current AI technology can’t guarantee that hallucinations will never occur. This is where the human touch makes a difference.

At TDCX, ensuring that AI models deliver precise, on-brand customer interactions starts well before the first line of code is written. The process includes a meticulous pretraining phase where datasets are not just assembled but carefully curated. This involves selecting, organizing, and refining the data to guarantee its quality, relevance, and accuracy. Any errors are identified and rectified, missing data is handled with care, and the dataset is thoroughly prepared to maximize the effectiveness of the machine learning models that will follow.

TDCX’s AI-powered CX solutions are also enhanced by retrieval-augmented generation (RAG) and reasoning techniques to establish ground truths. This is critical for mitigating hallucinations and ensuring accurate, reliable, and on-brand responses. TDCX’s AI projects also incorporate reinforcement learning from human feedback (RLHF), an approach that iteratively refines the AI models based on continuous input from human evaluators. RLHF plays a crucial role in tackling hallucinations by steering the AI model’s outputs toward human-aligned preferences and accurate information, iteratively refining it to deliver more reliable results. Human oversight, backed by precise data, helps ensure that our AI solutions not only curb the spread of misinformation but also bolster trust and safety in customer interactions.
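The retrieval step at the heart of RAG can be illustrated with a toy example. The function below scores knowledge-base snippets by word overlap with the query and grounds the prompt in the best match; this is a deliberately simplified stand-in (real RAG systems use vector search over an indexed, vetted document store), and the knowledge-base snippets are hypothetical.

```python
def retrieve(query, docs, k=1):
    """Score documents by word overlap with the query and return the
    top k -- a toy stand-in for the vector search used in RAG."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# Hypothetical knowledge-base snippets; a production system retrieves
# from a curated, regularly updated document store.
kb = [
    "Refunds are issued within 14 days of an approved return.",
    "Orders can be tracked from the my orders page after shipping.",
    "The smartwatch pairs over Bluetooth with the companion app.",
]

context = retrieve("tracked my orders", kb)[0]
prompt = (
    f"Answer using ONLY this context:\n{context}\n\n"
    "Question: how do I track my order?"
)
print(context)
```

Because the generator is instructed to answer only from retrieved, vetted text, its responses stay anchored to an established ground truth rather than whatever patterns the base model happens to produce.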

Humans are the ties that bind our AI projects. This human-in-the-loop approach is what enables TDCX to transform AI into a powerful extension of the brand’s customer experience.

With 72% of enterprises already adopting AI, the technology is redefining how businesses interact with their customers. However, recent research on global enterprises emphasized that enthusiasm for AI in customer experience alone won't cut it: AI models built on bad data are draining 6% of annual revenue (US$406 million on average) from these companies. As businesses continue to use conversational AI and generative AI for customer experience, the cost of poor data quality can't be ignored.

TDCX Talks: Creating Powerful CX in the Age of AI, happening on October 10 at Marina Bay Sands, Singapore, will explore how data can both unlock the promises and expose the pitfalls of AI in CX. Industry leaders and TDCX experts will share practical insights and success stories that will help businesses transform AI and GenAI into valuable tools for delivering exceptional customer experiences. 

Speak with our experts