The business press agrees that chatbots have numerous potential benefits, and significant funding has poured into chatbot companies to realize this potential. For that potential to materialize, chatbots need to be successful. However, numerous public chatbot failures could be limiting the growth of bots. Effective testing can reduce chatbot failures. We compared 7 chatbot testing frameworks, which include comprehensive chatbot testing approaches, chatbot testing software and chatbot testing services.
What are important chatbot testing concepts?
Most testing approaches lack standardization as it is hard to quantify the frequency of conversations that test cases cover, especially before a bot is launched. The aim should be to cover the most likely scenarios thoroughly. For example, Chatbottest is an open-source project that provides a database of 120 questions to test the chatbot and user experience.
The concept they developed follows a Gaussian nature: test scenarios fall broadly into three categories, namely expected scenarios, possible scenarios, and almost impossible scenarios. This scenario testing structure can be mapped to sigma distances from the mean of a normal distribution.
Empirically, once the almost impossible scenarios have been tested, which can be considered the 3-sigma distance, the chatbot's performance will have been observed across roughly 99.7% of the conversations it is likely to face, assuming usage is approximately normally distributed. Testing further would be costly since there are infinitely many ways humans can use language.
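The sigma mapping can be made concrete with the standard normal distribution. The sketch below is only an illustration: it assumes conversations really are normally distributed around the expected scenarios, and the assignment of each scenario tier to 1, 2 and 3 sigma (the `TIERS` dictionary) is our own hypothetical reading, not something Chatbottest specifies.

```python
import math


def coverage_within(k_sigma: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k_sigma / math.sqrt(2))


# Hypothetical mapping of the three scenario tiers to sigma distances.
TIERS = {"expected": 1, "possible": 2, "almost impossible": 3}

for name, k in TIERS.items():
    print(f"{name:>17}: within {k} sigma covers {coverage_within(k):.2%}")
```

Under these assumptions the three tiers cover about 68%, 95% and 99.7% of conversations, which shows why each additional tier buys less coverage than the one before it.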
Areas for testing
Chatbottest provides 7 broad categories for testing, to which response time is added below as an eighth area:
- Personality: Does the chatbot have a clear voice and tone that fits with the users and with the ongoing conversation?
- Onboarding: Do users understand what the chatbot is about and how to interact with it from the very beginning?
- Understanding: Requests, Small talk, idioms, emojis… What is the chatbot able to understand?
- Answering: What elements does the chatbot send and how well does it do so? Are they relevant to the moment and context?
- Navigation: How easy is it to go through the chatbot conversation? Do you feel lost sometimes while speaking with the chatbot?
- Error management: How well does the chatbot deal with the errors that are going to happen? Is it able to recover from them?
- Intelligence: Does the chatbot have any intelligence? Is it able to remember things? Does it use and manage context as a person would?
- Response time: Customers want fast responses, and faster responses can keep customers more engaged with bots.
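Some of these areas lend themselves to automated checks. The following is a minimal sketch, not a Chatbottest tool: `demo_bot` is a hypothetical stand-in for a real chatbot endpoint, the three checks are made-up examples for the understanding and error management categories, and `MAX_SECONDS` is an assumed latency budget for the response time category.

```python
import time
from typing import Callable, List, Tuple

FALLBACK = "Sorry, I did not understand that. Could you rephrase?"
MAX_SECONDS = 2.0  # assumed latency budget, tune to your own requirements


def demo_bot(message: str) -> str:
    """Hypothetical stand-in for a call to a real chatbot."""
    if "hello" in message.lower() or "👋" in message:
        return "Hello! How can I help you today?"
    return FALLBACK


def timed_reply(bot: Callable[[str], str], message: str) -> Tuple[str, float]:
    """Return the bot's reply and how long it took to produce, in seconds."""
    start = time.perf_counter()
    reply = bot(message)
    return reply, time.perf_counter() - start


# (category, user message, predicate the reply must satisfy)
CHECKS = [
    ("Understanding", "Hello there", lambda r: r.startswith("Hello")),
    ("Understanding", "👋", lambda r: r.startswith("Hello")),
    ("Error management", "asdf qwerty", lambda r: r == FALLBACK),
]


def run_checks(bot: Callable[[str], str]) -> List[Tuple[str, str, bool, bool]]:
    """Run every check and record whether the reply was correct and fast enough."""
    results = []
    for category, message, is_ok in CHECKS:
        reply, seconds = timed_reply(bot, message)
        results.append((category, message, is_ok(reply), seconds <= MAX_SECONDS))
    return results


for category, message, correct, fast in run_checks(demo_bot):
    print(f"{category:<17} {message!r:<15} correct={correct} fast={fast}")
```

Areas such as personality or onboarding are harder to reduce to a predicate and usually still need human review.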
For more information about chatbot testing, feel free to visit our chatbot testing guide, which focuses on A/B testing research.
What are chatbot testing frameworks to put these concepts to practice?
| Framework/software | Source code | Contributors on GitHub | Last commit on GitHub | Notes |
|---|---|---|---|---|
| Botium.at | Open | 10 | 8/Dec/2019 | Chatbot test automation |
| chatbottest.com | Open | 3 | 8/Oct/2018 | Set of questions to standardize chatbot testing |
| dimon.co | Proprietary | | | Chatbot test automation. Dimon has integration with major platforms such as Slack, Facebook Messenger, Telegram, and WeChat |
| qbox.ai | Proprietary | | | NLP training data optimization |
| Zypnos.com | Proprietary | | | Regression testing for chatbots |
What are the limitations of chatbot testing?
Continuous effort is required to ensure that tests remain up-to-date
While standardized tests are crucial, they need to remain dynamic in line with the development of the bot. For example, if we create a test for a specific expression ("talk to an operator") to handle customers who want to talk to customer service agents, we need to prepare similar tests in other languages when our bot is launched internationally.
This is a common phenomenon. Goodhart's law states that once a social or economic measure is turned into a target for policy, it will lose any information content that had qualified it to play such a role in the first place. A fixed test suite that the bot is tuned to pass is such a target. Therefore, keeping the testing process as dynamic as possible will make it more meaningful and will reduce the fragility of the chatbot.
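One way to keep such a test from going stale is to drive it from a phrase table and to flag launch locales that have no phrases yet. This is a sketch under stated assumptions: `detect_intent` is a toy keyword matcher standing in for the bot's real NLU, and the phrases and locale codes in `HANDOFF_PHRASES` are illustrative examples.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative phrases per locale that should all trigger a handoff to an agent.
HANDOFF_PHRASES: Dict[str, List[str]] = {
    "en": ["talk to an operator", "I want to speak to an agent"],
    "de": ["mit einem Mitarbeiter sprechen"],
    "es": ["hablar con un operador"],
}


def detect_intent(text: str) -> str:
    """Toy keyword classifier; a real test would call the bot's NLU instead."""
    keywords = ("operator", "agent", "mitarbeiter", "operador")
    return "handoff" if any(k in text.lower() for k in keywords) else "unknown"


def failing_phrases(phrases: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (locale, phrase) pairs that are not recognized as a handoff request."""
    return [
        (locale, phrase)
        for locale, items in phrases.items()
        for phrase in items
        if detect_intent(phrase) != "handoff"
    ]


def untested_locales(launched: Iterable[str], phrases: Dict[str, List[str]]) -> List[str]:
    """Return launch locales that have no handoff test phrases at all."""
    covered = {locale for locale, items in phrases.items() if items}
    return sorted(set(launched) - covered)


print(failing_phrases(HANDOFF_PHRASES))
print(untested_locales(["en", "de", "es", "fr"], HANDOFF_PHRASES))
```

The second check is the one that catches the international launch case: adding a locale to the launch list without adding phrases for it makes the gap visible.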
Testing can create a false sense of security
As explained above, static tests lose their relevance over time, but a large number of tests, whether up-to-date or not, creates a sense of security. However, as tech leaders know quite well, only the paranoid survive.
What are the challenges for chatbot testing?
Multi/cross-channel user experience
Users should be able to communicate with bots through different modes, such as text or voice, and on various platforms, such as desktop, mobile or social media, while switching seamlessly between channels. However, testing different platforms can be problematic. Chatbots should be able to respond consistently without losing the context and data collected through each channel.
There may be scenarios in which a chatbot user needs help on different topics. For example, a traveler may want to learn the weather conditions and make a reservation for a meal. In this case, a single bot has to switch between domains. Although handing over to an additional agent bot is one way to address the problem, it can negatively impact user experience. Therefore, domain switching, which can have numerous combinations, needs to be tested.
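A cross-channel test can check that data collected on one channel is still available on another. The sketch below assumes a design in which conversation state is keyed by user rather than by channel; `SessionStore`, its methods and the slot names are hypothetical and do not correspond to any particular chatbot platform.

```python
from typing import Any, Dict, List


class SessionStore:
    """Hypothetical in-memory store that keeps one session per user across channels."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Dict[str, Any]] = {}

    def update(self, user_id: str, channel: str, **slots: Any) -> None:
        """Merge newly collected slots into the user's session and note the channel."""
        session = self._by_user.setdefault(user_id, {"slots": {}, "channels": []})
        session["slots"].update(slots)
        if channel not in session["channels"]:
            session["channels"].append(channel)

    def slots(self, user_id: str) -> Dict[str, Any]:
        """Return a copy of everything collected for this user so far."""
        return dict(self._by_user.get(user_id, {}).get("slots", {}))

    def channels(self, user_id: str) -> List[str]:
        """Return the channels this user has used, in order of first use."""
        return list(self._by_user.get(user_id, {}).get("channels", []))


store = SessionStore()
store.update("user-1", "web", city="Rome")        # user starts on the website
store.update("user-1", "voice", date="tomorrow")  # and continues by voice

# The voice channel should still see the city collected on the web channel.
print(store.slots("user-1"))
print(store.channels("user-1"))
```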
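To get a sense of how quickly domain-switching cases multiply, ordered sequences of domains can be enumerated. This is an illustrative sketch: the domains and sample utterances in `DOMAINS` are made up, and a real suite would also need expected replies for each step.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical domains, each with one sample utterance.
DOMAINS: Dict[str, str] = {
    "weather": "What is the weather like in Rome tomorrow?",
    "restaurant": "Book a table for two at 8 pm.",
    "flights": "Is my flight on time?",
}


def switch_cases(domains: Dict[str, str], depth: int = 2) -> List[List[Tuple[str, str]]]:
    """Build every ordered sequence of `depth` distinct domains as a test conversation."""
    return [
        [(name, domains[name]) for name in order]
        for order in permutations(domains, depth)
    ]


cases = switch_cases(DOMAINS)
print(len(cases), "two-step conversations")
for step_domain, utterance in cases[0]:
    print(f"  [{step_domain}] {utterance}")
```

With n domains and sequences of length k there are n!/(n-k)! such conversations, so in practice the list usually has to be sampled or prioritized rather than tested exhaustively.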
Uncertainty of user conversation
Language is ambiguous, so testing needs to be context-dependent. For example, the answer to "how will the weather be tomorrow?" will depend on where the user is or whether the user has planned any travel for tomorrow. Therefore, chatbot testing cannot be completed just by checking each question against a single fixed response. Testing for the understanding of context is hard but needs to be a part of chatbot testing for truly intelligent chatbots.
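In test terms, this means the same utterance appears in several test cases, each with a different context and a different expected outcome. The sketch below is a simplified illustration: the `Context` fields and the rule in `resolve_weather_location` (a planned trip takes precedence over the home city) are assumptions, not a description of how any specific bot resolves location.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Context:
    """Hypothetical slice of what the bot knows about the user."""
    home_city: str
    trip_city: Optional[str] = None  # destination of a trip planned for tomorrow


def resolve_weather_location(ctx: Context) -> str:
    """Assumed rule: a planned trip takes precedence over the home city."""
    return ctx.trip_city or ctx.home_city


UTTERANCE = "How will the weather be tomorrow?"

# Same utterance, different contexts, different expected locations.
CASES: List[Tuple[Context, str]] = [
    (Context(home_city="Berlin"), "Berlin"),
    (Context(home_city="Berlin", trip_city="Lisbon"), "Lisbon"),
]

for ctx, expected in CASES:
    actual = resolve_weather_location(ctx)
    status = "ok" if actual == expected else "FAIL"
    print(f"{status}: {UTTERANCE!r} with {ctx} -> {actual}")
```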
Check out our previous articles for more on chatbots:
- General guide about chatbots
- Objective metrics for measuring the performance of a chatbot so you can measure results of testing
- Chatbot success stories. We recommend reading them since success stories are rare and can be studied to learn the drivers of chatbot success
- Chatbot testing guide focusing on a/b testing