I have made this point in several earlier posts, but the mainstream dominates the field so thoroughly that the message does not seem to carry. So let me put it as simply as I can:
Sentiment classification based on the bag-of-words (BOW) model, so far the dominant mainstream approach to sentiment analysis, simply breaks down on social media. The main reason is simple: social media posts are mostly short messages, and they lack the “keyword density” a classifier needs to make a sound sentiment decision. Larger training sets cannot fix this fundamental defect of the methodology. In real-life social media, the precision ceiling for this line of work has been found to be 60%, far below the widely acknowledged minimum of 80% precision for a usable extraction system. Trusting such a machine learning classifier to perform social media sentiment mining is not much better than flipping a coin.
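To make the keyword-density problem concrete, here is a minimal sketch in Python. It is only an illustration, not any production classifier: the tiny `POSITIVE` and `NEGATIVE` word lists and the example messages are made up for this post, and a trained BOW classifier would learn weights rather than use a hand-written list. The point it shows is that a bag of words throws away word order, so two short messages with opposite meanings can look identical to the model, and many short messages contain no sentiment keywords at all.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, invented for this illustration only.
POSITIVE = {"love", "great", "good", "beats"}
NEGATIVE = {"hate", "bad", "awful"}


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.

    Word order is discarded, which is the defining trait of BOW.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def keyword_hits(text):
    """Count how many tokens in the message are in the toy lexicon."""
    bag = bag_of_words(text)
    return sum(n for word, n in bag.items() if word in POSITIVE | NEGATIVE)


# Two short posts with opposite opinions about each brand.
a = "iPhone beats Galaxy"
b = "Galaxy beats iPhone"

# The bags are identical, so no BOW classifier can tell which brand is praised.
same_bag = bag_of_words(a) == bag_of_words(b)

# A typical short post may carry no sentiment keyword at all.
hits_short = keyword_hits("Finally switched to the new phone")
```

Here `same_bag` comes out true and `hits_short` is zero. No amount of extra training data changes either fact, because the information needed to decide is not in the representation to begin with.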
So let us get this straight. From now on, any claim to use machine learning for mining public opinions and sentiments from social media is likely to be a trap (unless the system is verified to involve parsing of linguistic structures or patterns, which so far is unheard of in practical systems based on machine learning). Fancy visualizations may make the results of the mainstream approach look real and attractive, but they simply cannot be trusted.
Why deep parsing rules instead of deep learning model for sentiment analysis?
Pros and Cons of Two Approaches: Machine Learning and Grammar Engineering
Coarse-grained vs. fine-grained sentiment analysis