Top 5 Sentiment Analysis Challenges in 2024

Updated on Jan 3

Table of contents

1. Context-dependent errors
2. Negation Detection
3. Multilingual Data
4. Emojis
5. Potential Biases in Model Training

Words are the most powerful tools to express our thoughts, opinions, intentions, desires, or preferences. However, they do not have the same meaning in all instances. Instead, the meaning conveyed is mainly shaped by the context. This complexity of human languages constitutes a challenge for AI methods that work with natural languages, such as sentiment analysis.

Consider the following example:

Figure 1. Consumer feedback on a product

The consumer states in his review that he is content with the product, and his words can be classified as positive (e.g., “love,” “amazing,” and “long battery life”). However, in the fifth sentence, he says that his wife does not have similar thoughts. Instead, her sentiment regarding the product is negative (e.g., “too heavy”). So, how would the algorithm classify this review? As positive, negative, or neutral?

Here are the top five challenges of conducting sentiment analysis and how to solve them:

1. Context-dependent errors

Sarcasm

People often use sarcasm to express negative sentiment with words that look positive on the surface (e.g., “I am so glad that the product arrived in one piece!”). In such cases, sentiment analysis tools can classify the feedback as positive when it is actually negative.

Solution: Determine the boundaries of sarcasm in the training dataset. For instance, researchers used a neural network architecture based on multi-head self-attention to identify sarcastic terms. The model highlights the parts of a text that carry a sarcastic tone, then relates these parts to each other to produce an overall score.

Polarity

Although the emotional tone of some sentences is obvious and strong (e.g., “It was a terrible experience.”), others are not easily classified as positive, negative, or neutral (e.g., “The service quality is not mentionable.”). As a result, algorithms cannot always infer the polarity of a statement.

Solution: Give polarity scores to the words in the training dataset so that the algorithm can classify the difference between statements such as “very good” and “slightly good.”
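
As a rough illustration of word-level polarity scores, the sketch below multiplies each sentiment word's score by the intensifier that precedes it. The lexicon, the intensifier weights, and the function name polarity_score are hypothetical values chosen for this example, not taken from any real sentiment lexicon.

```python
# Hypothetical polarity lexicon: positive values are positive sentiment.
POLARITY = {"good": 1.0, "amazing": 2.0, "terrible": -2.0}

# Hypothetical intensity modifiers that scale the next sentiment word.
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def polarity_score(text: str) -> float:
    """Sum word polarities, scaling each by any intensifiers just before it."""
    total = 0.0
    multiplier = 1.0
    for raw in text.lower().split():
        token = raw.strip(".,!?")
        if token in INTENSIFIERS:
            # Remember the modifier until the next sentiment word.
            multiplier *= INTENSIFIERS[token]
        elif token in POLARITY:
            total += POLARITY[token] * multiplier
            multiplier = 1.0
        else:
            # A neutral word ends the modifier's reach.
            multiplier = 1.0
    return total


print(polarity_score("very good"))      # 1.5
print(polarity_score("slightly good"))  # 0.5
```

With graded scores like these, “very good” and “slightly good” no longer collapse into the same positive label.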

Polysemy

When a word has more than one meaning (e.g., “the head of the sales team” vs. “wearing an earbud hurts the head”), it becomes harder for the algorithm to determine which meaning is intended. If the word is not evaluated in its context, the results of the analysis can be inaccurate.

Solution: Incorporate domain knowledge during text annotation and model training phases. It can help your sentiment analysis algorithms to differentiate between words that have different meanings in different contexts.

2. Negation Detection

Just because a sentence contains negation (e.g., no, not, non-, -less, dis-), it does not mean that the overall sentiment of the statement is negative. Simple negation detection methods are often not sufficient to classify the sentiment correctly. For instance, “It was not unpleasant” contains a negation and can be classified by the algorithm as negative, but it conveys a positive meaning.

Solution: Train your algorithm on large datasets that cover as many negation words as possible. Combining term-counting methods that account for contextual valence shifters with machine learning methods has been found effective in identifying negation signals more accurately.
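
The term-counting half of that combination can be sketched as follows: a negation word acts as a valence shifter that flips the polarity of sentiment words within a short window after it. The lexicon, the negator list, the window size, and the function name are assumptions made for this illustration; a real system would pair this with a trained model.

```python
# Hypothetical lexicon and negator list for illustration only.
LEXICON = {"pleasant": 1.0, "unpleasant": -1.0, "good": 1.0, "bad": -1.0}
NEGATORS = {"no", "not", "never"}


def negation_aware_score(text: str, window: int = 3) -> float:
    """Term counting where a negator flips the next `window` tokens' polarity."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    scope_left = 0  # tokens still inside the current negation scope
    for token in tokens:
        if token in NEGATORS:
            scope_left = window
            continue
        value = LEXICON.get(token, 0.0)
        if scope_left > 0:
            value = -value  # valence shift: invert polarity under negation
            scope_left -= 1
        total += value
    return total


print(negation_aware_score("It was not unpleasant"))  # 1.0
print(negation_aware_score("It was not good"))        # -1.0
```

A naive count would score “It was not unpleasant” as negative; here the negator inverts the negative word, so the sentence comes out positive.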

3. Multilingual Data

Although English is widely used around the world, companies engage with customers globally as they grow, and those customers give feedback in many different languages. However, sentiment analysis tools are primarily trained to categorize words in a single language, and some sentiments may get lost in translation. This causes a significant problem, especially when conducting sentiment analysis on non-English reviews or feedback.

Solution: Design systems that can learn from multilingual content and make predictions regardless of the language. For instance, you can use a code-switching approach that includes parallel encoders at the word level and is implemented with models such as deep neural networks. You can also check our article on multilingual sentiment analysis for a comprehensive account.

4. Emojis

Figure 2. The valence and arousal rates for the most used emojis

Emojis have become a part of daily life and can often express a person's sentiment more effectively than words. However, because sentiment analysis tools depend on written text, emojis cannot be classified accurately and are therefore removed from many analyses. The result is an incomplete analysis.

Solution: Define tags for emojis and incorporate them into your sentiment analysis algorithm to improve the accuracy of your analysis.
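
One possible way to do this is sketched below: each known emoji is replaced by a text tag so that it survives text-only preprocessing, and each carries its own sentiment score. The tag names, the scores, and both function names are hypothetical; they are not taken from any published emoji sentiment ranking.

```python
# Hypothetical emoji tags and scores, written as Unicode escapes.
EMOJI_TAGS = {
    "\U0001F60D": ("EMOJI_LOVE", 1.0),    # smiling face with heart-eyes
    "\U0001F621": ("EMOJI_ANGRY", -1.0),  # pouting (angry) face
    "\U0001F602": ("EMOJI_JOY", 0.5),     # face with tears of joy
}


def tag_emojis(text: str) -> str:
    """Replace each known emoji with a text tag and normalize whitespace."""
    for emoji, (tag, _score) in EMOJI_TAGS.items():
        text = text.replace(emoji, f" {tag} ")
    return " ".join(text.split())


def emoji_score(text: str) -> float:
    """Sum the scores of all known emojis that occur in the text."""
    return sum(score * text.count(emoji)
               for emoji, (_tag, score) in EMOJI_TAGS.items())


review = "Great phone\U0001F60D"
print(tag_emojis(review))   # Great phone EMOJI_LOVE
print(emoji_score(review))  # 1.0
```

The tagged text can then be passed to a text-based model, while the emoji score can be combined with the word-level score.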

5. Potential Biases in Model Training

Although AI algorithms are powerful tools for making accurate predictions, they are trained by humans. This means that their results inevitably reflect human biases present in the training dataset. For instance, if the algorithm is trained to label the sentence “I am a sensitive person” as negative and the sentence “I can be very ambitious” as positive, the results can be biased against people with emotional tendencies and may favor overly ambitious people.

Solution: Minimize bias in AI systems by applying debiasing methods. For instance, you can detect the words in your dataset that might carry human bias and build a dictionary of these words. You can then tag them and compare the overall sentiment of the text with and without the tagged words.
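
The with-and-without comparison could look roughly like this. The bias dictionary, the toy lexicon, and the toy_score function are all hypothetical stand-ins; in practice score_fn would be your own sentiment model.

```python
# Hypothetical dictionary of words suspected of carrying human bias.
BIAS_TERMS = {"sensitive", "ambitious"}

# Toy lexicon standing in for a trained model's learned associations.
TOY_LEXICON = {"sensitive": -1.0, "ambitious": 1.0, "helpful": 1.0}


def toy_score(tokens):
    """Stand-in scorer: sums toy lexicon values for the given tokens."""
    return sum(TOY_LEXICON.get(t, 0.0) for t in tokens)


def bias_gap(text, score_fn=toy_score):
    """Return (score with tagged words, score without them, difference)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    masked = [t for t in tokens if t not in BIAS_TERMS]
    with_terms = score_fn(tokens)
    without_terms = score_fn(masked)
    return with_terms, without_terms, with_terms - without_terms


print(bias_gap("I am a sensitive person"))  # (-1.0, 0.0, -1.0)
```

A large gap suggests that the tagged word, rather than the rest of the sentence, is driving the predicted sentiment.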

Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 60% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
