The mainstream sentiment approach simply breaks down on social media

I have made this point in several previous posts, but the field is so dominated by the mainstream that it does not seem to carry. So let me put it as simply as I can:

The sentiment classification approach based on the bag-of-words (BOW) model, so far the dominant mainstream approach to sentiment analysis, simply breaks down on social media. The main reason is simple: social media posts are mostly short messages that lack the "keyword density" a classifier needs to make a proper sentiment decision. Larger training sets cannot fix this fundamental defect of the methodology. The precision ceiling for this line of work on real-life social media is found to be 60%, far below the widely acknowledged minimum of 80% precision for a usable extraction system. Trusting a machine learning classifier to perform social media sentiment mining is not much better than flipping a coin.
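To make the "keyword density" problem concrete, here is a minimal Python sketch. It is not any production system: the tiny lexicon and the function names (`bag_of_words`, `keyword_density`, `lexicon_score`) are hypothetical, and a real classifier would learn weights from training data rather than use a hand-written word list. It only illustrates two things a BOW representation does: it gives a short post with no sentiment keywords nothing to work with, and it discards word order, so two opposite comparisons look identical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "better", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "worse", "awful"}


def bag_of_words(text):
    """Lowercase the text and count its tokens; word order is thrown away."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def keyword_density(text):
    """Fraction of tokens that are sentiment keywords in the toy lexicon."""
    bag = bag_of_words(text)
    total = sum(bag.values())
    hits = sum(n for word, n in bag.items() if word in POSITIVE | NEGATIVE)
    return hits / total if total else 0.0


def lexicon_score(text):
    """Positive keyword count minus negative keyword count."""
    bag = bag_of_words(text)
    pos = sum(n for word, n in bag.items() if word in POSITIVE)
    neg = sum(n for word, n in bag.items() if word in NEGATIVE)
    return pos - neg


# A short post that is clearly negative to a human reader,
# yet contains no keyword the bag-of-words view can use.
short_post = "this phone died after two days"
print(keyword_density(short_post), lexicon_score(short_post))

# Two posts with opposite meanings but exactly the same bag of words.
a = "iPhone is better than Galaxy"
b = "Galaxy is better than iPhone"
print(bag_of_words(a) == bag_of_words(b))
```

In this toy setting the short post gets a density and a score of zero, and the two comparisons are indistinguishable. Telling them apart needs some notion of structure (who is better than whom), which counting words alone does not provide.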

So let us get this straight. From now on, any claim of using machine learning for social media mining of public opinions and sentiments is likely to be a trap (unless the system is verified to involve parsing of linguistic structures or patterns, which so far has never been heard of in practical systems based on machine learning). Fancy visualizations may make the results of the mainstream approach look real and attractive, but they simply cannot be trusted.

Why deep parsing rules instead of a deep learning model for sentiment analysis?
Pros and Cons of Two Approaches: Machine Learning and Grammar Engineering
Coarse-grained vs. fine-grained sentiment analysis
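Every system that claims to use machine learning for social media sentiment mining deserves suspicion
[Liwei's popular science: keyword-based sentiment classification systems face challenges]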

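Posted by

Liwei (立委)

Dr. Liwei is a senior consultant on multimodal large-model applications. He was formerly VP of Engineering of the large-model team at Mobvoi (出门问问), focusing on large models and their AIGC applications. Before that he was Chief Scientist at Netbase for 10 years, where he directed the development of language understanding and application systems for 18 languages that were robust, ran in linear time, and scaled up to social media big data, with the semantics grounded in sentiment mining products, making the company a front-runner in industrial NLP deployment in the United States. Earlier he was VP of R&D at Cymfony for eight years, where he won first place in the first question answering evaluation (TREC-8 QA Track) and won 17 Small Business Innovation Research information extraction projects (PI for 17 SBIRs).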
