Your Sentiment Analysis Is A Random Number Generator

By Michal Mazurek

Almost every social listening tool eventually adds sentiment analysis. It is too tempting not to.

Take a pile of mentions, split them into positive, negative and neutral, draw a nice chart, and suddenly the product looks smarter. The customer gets a number. The dashboard gets a color. Everyone feels like something has been measured.

The problem is that the number is usually worthless. Not "slightly imprecise". Worthless.

It is good enough to make pretty charts. It is not good enough to make decisions.

The uncomfortable truth: most sentiment analysis in monitoring tools is a random number generator with a professional UI.

The old way

Traditional sentiment analysis is a natural language processing task. That sounds fancy, but the simple version is not magic.

The system breaks text into words, normalizes them, removes some boring words, and then tries to score what remains. Words like "great", "love", "excellent" and "fantastic" push the score up. Words like "bad", "hate", "terrible" and "catastrophic" push the score down. Slightly better systems handle negation, punctuation, and word order, or use old-school machine learning trained on labeled examples.
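
To make that concrete, here is a minimal sketch of the lexicon approach in Python. The word lists are toy placeholders, not a production lexicon:

    # A minimal sketch of lexicon-based sentiment scoring.
    # Toy word lists, not a real lexicon.
    import re

    POSITIVE = {"great", "love", "excellent", "fantastic", "best"}
    NEGATIVE = {"bad", "hate", "terrible", "catastrophic", "worst"}

    def lexicon_score(text: str) -> int:
        """Positive words minus negative words. No negation handling,
        no word order, and no idea what the sentiment is about."""
        words = re.findall(r"[a-z']+", text.lower())
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    print(lexicon_score("I love this, it is excellent"))  # 2 -> "positive"
    print(lexicon_score("I do not love this at all"))     # 1 -> "positive", negation ignored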

This is cheap. You can run it over a huge number of mentions for almost nothing. That is why it became popular.

It is also why the cheap version is broken. It does not really understand what the sentiment is about. It mostly sees sentiment words.

Consider this sentence:

My car is great, and all others are fucking terrible and catastrophic.

Is that positive or negative? About the person’s car, it is positive. About all other cars, it is negative. As a single global sentiment score, it is nonsense.
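
Run the naive scorer over that sentence and you can watch the collapse happen:

    # The naive scorer on the car sentence (same toy lexicon as above).
    POS = {"great", "love", "excellent", "fantastic"}
    NEG = {"bad", "hate", "terrible", "catastrophic"}
    words = "my car is great and all others are fucking terrible and catastrophic".split()
    print(sum(w in POS for w in words) - sum(w in NEG for w in words))
    # -1 -> "negative", for a sentence in which the author loves their car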

An example that a single score cannot get right

Here is an even clearer example:

My experience at Starbucks sucked, it’s the worst cafe ever, so I drove off in my fantastic Audi, it’s the best car ever.

A tool that gives this comment a single sentiment score is already wrong. The comment is negative about Starbucks and positive about Audi. The same sentence contains both. Without target-aware analysis, there is no correct single answer.

This is not an edge case. This is how people write on forums. They compare products. They complain about one thing while praising another. They quote somebody else. They use sarcasm. They swear for emphasis. They mention three brands in one paragraph.

The important question is not "is this text positive or negative?" The important question is "positive or negative about what?"

AI fixes the shape of the problem

Modern AI models can solve the structural problem. You can ask them to classify sentiment toward a specific brand, product, feature, or competitor.

That distinction matters. For the Starbucks and Audi example, the model can return:

  • Starbucks: negative.
  • Audi: positive.

This is also how we do it in Syften for select customers as a custom integration. We do not ask the model for generic sentiment. We specify the brand the sentiment should be about. We define the labels, the output format, and the cases where the model should say that there is no relevant sentiment.
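
For illustration, here is roughly what that shape looks like. This is a sketch, not Syften's production integration; it assumes the OpenAI Python client (openai>=1.0), and the model name is a stand-in:

    # A sketch of target-aware sentiment classification, not Syften's
    # actual integration code.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You classify sentiment toward one specific target brand. "
        'Respond with a JSON object: {"sentiment": '
        '"positive" | "negative" | "neutral" | "none"}. '
        'Use "none" when the text expresses no sentiment about the target.'
    )

    def brand_sentiment(comment: str, brand: str) -> str:
        response = client.chat.completions.create(
            model="gpt-5",  # stand-in name; substitute whatever model you use
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Target brand: {brand}\n\nComment: {comment}"},
            ],
        )
        return json.loads(response.choices[0].message.content)["sentiment"]

    text = ("My experience at Starbucks sucked, it's the worst cafe ever, so I "
            "drove off in my fantastic Audi, it's the best car ever.")
    print(brand_sentiment(text, "Starbucks"))  # expected: negative
    print(brand_sentiment(text, "Audi"))       # expected: positive

The key move is in the prompt: the target brand is an explicit input, and "none" is a legal answer.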

That works much better. It is still not free.

The cost problem

We tested this on tens of thousands of Reddit comments. Our average cost per analyzed Reddit comment was:

  • GPT-5 mini: $0.00148 per comment.
  • GPT-5: $0.01033 per comment.

At first glance GPT-5 mini looks affordable. At Syften Pro volume, 500 matches per day is about 15,000 matches per month. That would cost $22.20 per month just for sentiment analysis.

The problem is accuracy. GPT-5 mini is terribly inaccurate for this job. In our tests, compared with the highest-IQ models, it misclassifies as much as 15% of comments. That is not a rounding error. That is a dashboard lying to you with confidence.

GPT-5 is good enough for many cases, but still makes mistakes. It costs $0.01033 per Reddit comment in our data. At 500 matches per day, that becomes $154.95 per month.

Syften Pro costs about $100 per month. So GPT-5 sentiment analysis alone would cost more than the entire plan revenue.

What about GPT-5.5?

GPT-5.5 is very good. For this kind of work it is close to the point where I would no longer expect many obvious classification mistakes.

It is also more expensive. As of May 2026, OpenAI’s API pricing lists GPT-5.5 at $5 per million input tokens and $30 per million output tokens. The GPT-5 price point we measured against was $1.25 per million input tokens and $10 per million output tokens.

Because sentiment classification is mostly input tokens, GPT-5.5 would likely cost about 3x to 4x as much as GPT-5 for the same workload. Using our GPT-5 cost as the baseline, that puts GPT-5.5 around $0.031 to $0.041 per analyzed Reddit comment.

At 15,000 comments per month, that is roughly $465 to $620 per month for sentiment analysis. On a $100 plan.
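
If you want to check the arithmetic, here it is in one place. The GPT-5 mini and GPT-5 costs are our measured averages; the GPT-5.5 figures are estimates derived from the published token-price ratios:

    # Reproducing the cost arithmetic in this post.
    comments_per_month = 500 * 30  # Syften Pro volume: 500 matches/day

    measured = {"GPT-5 mini": 0.00148, "GPT-5": 0.01033}  # $ per comment
    for model, per_comment in measured.items():
        print(f"{model}: ${per_comment * comments_per_month:,.2f}/month")
    # GPT-5 mini: $22.20/month
    # GPT-5: $154.95/month

    # GPT-5.5 vs GPT-5 price ratios: input $5 / $1.25 = 4x, output $30 / $10 = 3x.
    # Classification is mostly input tokens, so the blended ratio lands at 3x-4x.
    for ratio in (3, 4):
        per_comment = measured["GPT-5"] * ratio
        print(f"GPT-5.5 (~{ratio}x): ${per_comment:.3f}/comment, "
              f"${per_comment * comments_per_month:,.0f}/month")
    # ~3x: $0.031/comment, $465/month
    # ~4x: $0.041/comment, $620/month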

That is the real reason proper sentiment analysis is not a checkbox feature. The cheap version is wrong. The good version is expensive. The excellent version is absurd as a default feature on normal SaaS pricing.

Why I do not expect this to change quickly

Old NLP sentiment analysis is already cheap. Its cost can stay low forever and it will not fix the core problem, because the core problem is not compute. The core problem is that it does not understand the target of the sentiment.

AI does understand the target, but now you are paying for real reasoning over messy human text. Model prices will move over time, but the gap is too large to hand-wave away. Even a large price drop would still make "analyze everything by default" hard to justify for a $100 plan.

More importantly, the moment sentiment quality becomes good enough to trust, people will want more than positive, neutral and negative. They will want purchase intent, competitor sentiment, urgency, topic, feature area, author quality, and whether a reply would help. That pushes you back into expensive custom analysis.
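
To make that concrete, a richer per-mention result might look something like this sketch. Every field name here is invented for illustration, not a Syften schema:

    # A hypothetical richer per-mention result. All field names are
    # invented for illustration; each one is another classification
    # the model must get right, and another thing you pay for.
    from dataclasses import dataclass

    @dataclass
    class MentionAnalysis:
        brand_sentiment: str                  # positive / negative / neutral / none
        competitor_sentiment: dict[str, str]  # e.g. {"Audi": "positive"}
        purchase_intent: bool                 # is the author shopping?
        urgency: str                          # low / medium / high
        feature_area: str | None              # which part of the product
        reply_worth_it: bool                  # would a reply from us help?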

The practical answer

That is why sentiment analysis makes sense as a custom integration, not as a decorative metric. When a customer has a real use case and enough value behind the result, we can build it properly. We specify what brand the sentiment is about. We test it against real examples. We choose a model that is accurate enough for the decision it supports.

That is a very different product from "here is a sentiment chart because charts look nice in screenshots."

Summary

The next time a monitoring tool promises sentiment analysis, ask what it is actually classifying. The whole post? The sentence? The brand? Which brand? What happens when the comment compares two products? What model is doing it? How much does that model cost per mention?

Without good answers, the feature is not measuring sentiment. It is generating random values for pretty charts.

Article by Michal Mazurek

Michal Mazurek is the Founder of Syften. Michal has 7 years of experience helping companies set up social listening profiles that find useful conversations instead of noise. He's also a passionate engineer with 26 years of experience as a low-level programmer, web developer, security analyst, embedded developer, and sysadmin, including work with supercomputers.
