
    Shaping the Future of Contact Center QA With ChatGPT


    What is Scorebuddy Labs?

    Scorebuddy Labs is our innovation center, a team that experiments with new technologies to build a better platform. It’s a great place to be, discovering new possibilities, validating them with key clients and partners, and establishing these innovations as part of our product roadmap.


    While the implementation of generative AI for contact centers is still in its infancy, the potential of tools like ChatGPT is clear to see. At Scorebuddy Labs we are focused on experimentation, innovation, and, importantly, validation. We want to deliver more than promises to our customers—we want to deliver tried and tested solutions.

    Emmanuel Doubinsky, Head of Innovation, Scorebuddy Labs

    GPT-3 vs GPT-4

    We experimented with ChatGPT for a while and were genuinely impressed with the early GPT-3 model, although we could see some shortcomings in its understanding of our prompts, and in the accuracy and relevance of its answers. As a side note, if you’re interested in learning more about AI prompting, we recommend this free, open-source course.

    Then GPT-4 arrived, and things got better—much better. The AI understood more complex prompts and the quality and accuracy of answers improved.

    Trying Out ChatGPT for Contact Center Quality Assurance

    We tested some common use cases relevant to the QA programs of our large contact center customers:

    • Checking grammatical errors
    • Summarizing conversations
    • Detecting customer distress and agent empathy
    • Verifying correct business practice and regulatory compliance
    • Recommending coaching opportunities


    What Worked: The Benefits of ChatGPT for Contact Center QA

    The first three appeared to work well at a sample level. ChatGPT does a great job of catching spelling and grammatical mistakes, it provides good summaries of complex conversations, and it sometimes performs even better than human evaluators when it comes to detecting customer anger and appropriate agent empathy responses.
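    To give a flavor of what these checks look like in practice, here is a minimal sketch of how one of them might be framed as a prompt. The rubric wording, the `build_qa_messages` helper, and the message structure are our own illustrative assumptions for this article, not a Scorebuddy feature or an official recipe.

```python
# Illustrative sketch: framing a QA check (summary + grammar +
# empathy rating) as a single chat-completion request. The rubric
# below is an assumption for this example, not a standard.

def build_qa_messages(transcript: str) -> list[dict]:
    """Build the message list for one combined QA check."""
    system = (
        "You are a contact center QA evaluator. For the transcript "
        "provided: (1) summarize the conversation in two sentences, "
        "(2) flag any spelling or grammatical errors by the agent, and "
        "(3) rate the agent's empathy from 1-5 with a one-line reason."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

transcript = (
    "Customer: My order never arrived and I'm furious.\n"
    "Agent: I'm realy sorry about that, let me look into it right away."
)
messages = build_qa_messages(transcript)

# With the official OpenAI Python client (assumed installed and
# configured with an API key), the request would then be sent as:
#
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(model="gpt-4",
#                                          messages=messages)
```

    The appeal, as noted above, is that no model training is needed: the rubric lives entirely in the prompt and can be adjusted per scorecard.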

    At Scorebuddy Labs, we’re undertaking extensive testing with customers and partners in order to determine, on a larger scale, whether or not ChatGPT is truly a revolutionary tool for contact center quality assurance.

    Out-of-the-Box Interaction Analysis

    Perhaps the most impressive feature was the quasi “ready to use” nature of the AI. ChatGPT processed all of these agent-customer interactions without any prior training.

    This training requirement has killed many AI projects in the past: it takes weeks or months of configuration and testing before initial deployment, and even then further issues can arise, requiring additional training.

    This “ready to go” capability of ChatGPT is a total game changer in AI.


    What Didn’t Work: The Flaws of ChatGPT for Contact Center QA

    ChatGPT is not a cure-all for contact center quality assurance. Some other use cases didn’t work as well, and it’s worth understanding what ChatGPT can and cannot do so that adopting it actually pays off.

    Monitoring correct business practices and regulatory compliance is not simple—for a human or a robot. Sometimes you have to check if an agent took the correct action on another platform, or if the agent mentioned the correct disclaimer for the specific context of the conversation in question.

    Here, we hit some roadblocks. GPT-3 would answer all of these questions without hesitation, even though it sometimes had absolutely no context or background information to give such answers. This would be a major “no-no” for compliance checks, so we began to dismiss the technology for these use cases, relying more on our in-house AI and NLP capabilities.

    GPT-4 improved things somewhat. It began to realize when it didn’t have the necessary context, which in itself is really impressive, and the new-ish model-tuning capabilities might just be able to bridge that gap.
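    One way to work with this behavior, rather than against it, is to tell the model explicitly how to signal missing context, and then route those cases to a human reviewer instead of auto-scoring them. The sentinel convention and helper names below are our own assumptions for illustration, not an OpenAI or Scorebuddy feature:

```python
# Illustrative sketch: guard a compliance check by instructing the
# model to return a fixed sentinel when it lacks context, then route
# sentinel answers to human review rather than trusting them.

INSUFFICIENT = "INSUFFICIENT_CONTEXT"

def build_compliance_messages(transcript: str, policy: str) -> list[dict]:
    """Build a compliance-check request with an explicit escape hatch."""
    system = (
        "You are checking a contact center transcript for compliance "
        f"with this policy: {policy}\n"
        "Answer PASS or FAIL with a short reason. If the transcript "
        "does not contain enough information to decide, answer exactly "
        f"{INSUFFICIENT} and nothing else."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def route(model_answer: str) -> str:
    """Send uncertain answers to a human instead of auto-scoring them."""
    if model_answer.strip() == INSUFFICIENT:
        return "human_review"
    return "auto_score"
```

    This doesn’t make the model reliable on its own, but it turns “the model doesn’t know what it doesn’t know” into a case the QA workflow can at least catch some of the time.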

    The question is: how much tuning is needed, and for how long? This will be the focus of our next experiment alongside our customers and partners, and we’re hopeful that we can come to a positive conclusion.

    Sometimes ChatGPT Just Makes Stuff Up

    One last challenge—and it’s a big one—is that ChatGPT is prone to “hallucinations”. What do we mean by this? Well, sometimes ChatGPT invents facts. Furthermore, it won’t tell you that these facts are invented, nor is it even aware that it made them up. This problem was a significant factor in the failure of our compliance use case, as this area requires total accuracy in its answers.

    It also led to the failure of our final use case, wherein we asked ChatGPT to give us coaching recommendations that would help a particular agent improve on a specific weakness.

    Coaching is a key focus of contact center QA. It supports the idea that QA is not here to micromanage agents, but instead to help them become better at what they already do well. Unfortunately, this message doesn’t quite ring true when the coaching recommendations given by ChatGPT sound auto-generated and unrelated to the issue you want to address.

    It’s difficult to blame ChatGPT for this, but it’s concerning that the AI is totally unaware when it spits out fabricated information. This being said, it is possible that fine-tuning a model will help close this gap. For now, however, the risk remains that ChatGPT does not know when it’s having one of these “hallucinations”.

    It is capable of giving sincere, well-formulated answers that appear as credible as those given by a human expert. Unfortunately, these answers are sometimes total gibberish.

    Test Drive: Asking ChatGPT to Change a Car Battery

    I asked ChatGPT the following question about my old car (which I have since upgraded to a newer model, thanks to my excellent work and contribution to Scorebuddy):

    “How do I change the battery of my Mercedes E Class 2004?”

    I was served with a complete answer detailing each step involved in this operation. Feel free to skip to the end (unless you drive a 2004 Mercedes E Class of course), but I want to share the deep level of credible details ChatGPT generated for me:

    “Here are the general steps to change the battery on a Mercedes E Class 2004:

      1. Turn off the engine and remove the key from the ignition.
      2. Open the hood and locate the battery. The battery is usually located in the engine compartment, on the right-hand side.
      3. Use a 10mm wrench to loosen and remove the negative cable from the battery. The negative cable is usually black and has a "-" symbol.
      4. Use a 10mm wrench to loosen and remove the positive cable from the battery. The positive cable is usually red and has a "+" symbol.
      5. Use a battery strap or a battery carrier to lift the battery out of the compartment. Be careful not to drop the battery, as it is heavy and can be damaged easily.
      6. Clean the battery tray and the cable ends with a wire brush and a solution of baking soda and water.
      7. Place the new battery in the battery tray, making sure that it is secure and level.
      8. Connect the positive cable to the positive terminal of the battery and tighten the nut with a 10mm wrench.
      9. Connect the negative cable to the negative terminal of the battery and tighten the nut with a 10mm wrench.
      10. Close the hood and start the engine to make sure everything is working properly.

    Please note that these steps are general and may vary slightly depending on the specific model and year of your Mercedes E Class. It is also important to take safety precautions, such as wearing gloves and safety glasses, when handling a car battery.”

    I was impressed by the level of detail. “Use a 10 mm wrench”—so precise. “Clean the battery tray with a wire brush”—I never thought of doing that, this ChatGPT guy must work in a great garage. And I really loved the safety precaution at the end, it made the answer appear genuine and trustworthy.

    But wait: if I open the hood of my beloved 2004 Mercedes E Class, there’s no sign of a battery. As it turns out, step 2 was wrong. The battery is not under the hood; it’s in the trunk (or the “boot” for us Europeans).

    Asking the same question on Google instantly returned a video describing the correct steps, beginning with “open the trunk”.

    Where Do We Go From Here?

    I asked ChatGPT that very question and its long, honest answer contained this interesting section:

    “It's important to always verify information that you receive, whether it's from an AI language model like myself or from any other source. Double-checking the facts with reliable sources and seeking out multiple perspectives can help you make informed decisions and avoid spreading misinformation.”

    This is nice to say, but while my Google search gave me a verifiable source for the information, ChatGPT does not give you any sources—because it doesn’t know what its sources are.

    What’s Next for Scorebuddy Labs?

    Fine-tuning, testing, more fine-tuning, and more testing is probably the answer. Not everything in ChatGPT is ready to go right out of the box, but there’s more than enough there to set it apart as a potential game changer for contact center quality assurance.

    Scorebuddy Labs will continue experimenting with this promising technology and let you know how our real-world deployments go. Unless we’re all replaced by robots—in which case, I’m sure they’ll keep you posted.
