The mainstream sentiment approach simply breaks in front of social media

I have articulated this point in various previous posts or blogs before, but the world is so dominated by the mainstream that it does not seem to carry.  So let me make it simple to be understood:

The sentiment classification approach based on bag of words (BOW) model, so far the dominant approach in the mainstream for sentiment analysis, simply breaks in front of social media.  The major reason is simple: the social media posts are full of short messages which do not have the “keyword density” required by a classifier to make the proper sentiment decision.   Larger training sets cannot help this fundamental defect of the methodology.  The precision ceiling for this line of work in real-life social media is found to be 60%, far behind the widely acknowledged precision minimum 80% for a usable extraction system.  Trusting a machine learning classifier to perform social media sentiment mining is not much better than flipping a coin.

So let us get straight.  From now on, whoever claims the use of machine learning for social media mining of public opinions and sentiments is likely to be a trap (unless it is verified to have involved parsing of linguistic structures or patterns, which so far has never been heard of in practical systems based on machine learning).  Fancy visualizations may make the results of the mainstream approach look real and attractive but they are just not trustable at all.

Related Posts:

Why deep parsing rules instead of deep learning model for sentiment analysis?
Pros and Cons of Two Approaches: Machine Learning and Grammar Engineering
Coarse-grained vs. fine-grained sentiment analysis
一切声称用机器学习做社会媒体舆情挖掘的系统,都值得怀疑
【立委科普:基于关键词的舆情分类系统面临挑战】

发布者

liweinlp

立委博士,自然语言处理(NLP)资深架构师,Principle Scientist, jd-valley, Netbase前首席科学家,期间指挥团队研发了18种语言的理解和应用系统。特别是汉语和英语,具有世界一流的分析(parsing)精度,并且做到鲁棒、线速,scale up to 大数据,语义落地到数据挖掘和问答产品。Cymfony前研发副总,曾荣获第一届问答系统第一名(TREC-8 QA Track),并赢得17个美国国防部的信息抽取项目(PI for 17 SBIRs)。立委NLP工作的应用方向包括大数据舆情挖掘、客户情报、信息抽取、知识图谱、问答系统、智能助理、语义搜索等等。

发表评论