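Dr. Li Wei (立委), Vice President at 问问, focusing on large models and their applications. For 10 years he was Chief Scientist at Netbase, where he led R&D of language understanding and application systems for 18 languages: robust, running at linear speed, scaling up to social media big data, with semantics deployed in sentiment-mining products, making Netbase a front-runner in industrial NLP in the US. Before that he spent eight years as VP of R&D at Cymfony, where he won first place in the first question-answering evaluation (TREC-8 QA Track) and won 17 Small Business Innovation Research projects in information extraction (PI for 17 SBIRs).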
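Bai: S+/N N S/2N: these are the same. However, the dative (惠格) is an obligatory argument, so it can trade one free quota to let the "carrot" 中国 be reused. The indirect object is a regular employee who gets paid; the prepositional object is a guest temporary worker who does not get paid. In 在中国 ("in China") it is not a dative, so it gets no such treatment; there the prepositional object is doing the regular job. The difference lies mainly in the middle N that stands for 中国: whether it is reused, and whether it has a free quota. This is determined by the argument structure of 服务 ("serve").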
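The chart below, called Brand Passion Index Trend, packs concise yet rich sentiment-mining information. It shows how likes and dislikes toward dog meat have shifted over the past year and a half of public debate over loving or hating (eating) dog meat, with three bubbles giving the like/dislike (BPI) measure for each half-year. A bubble's shade shows how recent its data is: the darkest bubble, at lower left, is the measure mined from the most recent six months; the middle bubble is from a year ago; and the lightest bubble, at lower right, is from a year and a half ago. Bubble size shows buzz: for example, there was more discussion a year ago than in the last six months. A bubble's position encodes two sentiment dimensions: like/dislike (the further left, the stronger the dislike) and emotional intensity (the higher, the more emotional). The chart shows that emotional intensity peaked a year ago, while aversion to (eating) dog meat has grown steadily over time (the bubbles keep moving left). What does this tell us? It suggests that the influence of animal-protection advocacy has grown in recent years and that opposition to eating dog meat is getting louder.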
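An incidental system test revealed that 百度 (Baidu) is closely shadowed by 哪里有小姐 ("where to find call girls"). The finding immediately caused an uproar among friends: some applauded (way to go, u r onto sth), some joked (saying that Baidu's name comes from the line 众里寻她千百度, "searching for her a thousand times in the crowd", after all), and some were skeptical (the results are not faked?). A conspiracy theorist emailed me, accusing this word cloud of insulting Baidu.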
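(2) Still, over the past year there were more than ten brief bright spots when his reputation rose above zero (praise outweighing criticism), though none lasted long. According to the chart, the longest stretch of positive reputation ran from early July to early September last year (with only a brief dip below zero in August). I don't know whether this reflected some brilliant political performance or effective PR by his team; interested readers can look into it. In mid-May, when President Ma was sworn in, the net sentiment index was still hovering around -30; how did it warm up so quickly to above zero by July, staying there for about two months until the +35 peak on September 2? I am not familiar with Taiwanese politics and have no time to dig into the data and the chain of evidence (even though our tool provides several drill-down functions), but this period does seem to be the best stretch of public approval since President Ma's re-election. After that he never recovered, with only brief rebounds in October, November, and this January. The year's low was -44 on March 4, and December 16 was also grim, dropping to -42, bone-chilling. In short, since his election early last year, Ma Yingjiu has not had an easy time, and public disappointment and complaints pervade the online forums.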
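Blogger's reply (2013-12-25 07:09): You mention "grabbing keywords" and suspect that the system cannot handle negation ("maybe someone said there is no need to fear anymore"). That is because you don't know my background, even though I have described the system's capabilities from many angles in more than 100 popular-science blog posts. In short, our sentiment mining is not the usual keyword technology; it is information extraction and mining built on far more advanced deep parsing. It handles not only negation but also more complex linguistic phenomena such as double negation.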
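To see why plain keyword matching stumbles on negation, here is a toy sketch in Python. It is not the deep-parsing system described in the reply; it assumes a tiny hypothetical lexicon and a crude rule that each negator flips the polarity of the next sentiment word, which is just enough to show single and double negation changing the result.

```python
# Toy contrast between keyword polarity and negation-aware polarity.
# NOT the deep-parsing approach described above: the lexicon, the negator
# list, and the "each negator flips the next sentiment word" rule are
# hypothetical simplifications for illustration only.

POLARITY = {"fear": -1, "afraid": -1, "good": 1, "safe": 1}  # hypothetical lexicon
NEGATORS = {"no", "not", "never", "don't", "cannot"}          # hypothetical negators


def keyword_polarity(text: str) -> int:
    """Sum lexicon polarities, ignoring negation entirely."""
    return sum(POLARITY.get(tok, 0) for tok in text.lower().split())


def negation_aware_polarity(text: str) -> int:
    """Flip each sentiment word once per negator seen since the previous sentiment word."""
    score = 0
    flips = 0
    for tok in text.lower().split():
        if tok in NEGATORS:
            flips += 1
        elif tok in POLARITY:
            score += POLARITY[tok] * (-1) ** flips
            flips = 0  # negation scope ends at the sentiment word it modifies
    return score


print(keyword_polarity("there is no need to fear"))
print(negation_aware_polarity("there is no need to fear"))
print(negation_aware_polarity("you cannot say it is not safe"))
```

A single negator turns "fear" into a mildly positive reading, while two negators cancel out. Counting tokens like this is fragile; scoping negation by syntactic structure, as the reply describes, is what makes the harder cases tractable.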
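I don't really take a side, and I have no background in biology; GMOs were never something I followed (I picked the topic as a guinea pig for a sentiment-mining experiment because it was hot, not because it interested me). From friends' debates and overviews, I find the fights between the extremists on both sides ugly; both sides mislead and stir up fear. (By the way, I think GMO supporters made a fatal mistake back then: they should not have translated GM as 转基因 ("transgenic"). Had they called it something like "the latest improved biotech food", they would have met far less resistance and suspicion. If the name is not right, speech does not flow; if speech does not flow, nothing gets done. Nowadays many ordinary people react to 转基因 as if they had heard "cancer". Tell me that this translation isn't deadly. The later name 金大米 (Golden Rice) was well chosen, but, dragged down by the GMO label, it was still rejected by many.)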
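Personally I don't mind eating GM food, because I have never felt any danger from it. My favorite item at KFC is their grilled corn, and I never ask where it comes from. But things have reached the point where GMOs are no longer just a scientific issue. If they are to reach ordinary people's dinner tables, people's feelings cannot be ignored. As a transitional measure, I think China needs to label GM foods (or label non-GM foods, one way or the other) and give people the right to choose. There is no need to follow the American precedent of not labeling, because conditions differ: Chinese consumers have been troubled by food safety for too long, and "once bitten, twice shy" is a natural reaction. GMOs should ultimately win on their own merits, such as lower prices and an increasingly demonstrated safety record. Once foods are labeled, scientists and indifferent (and fearless) people like me will naturally become consumers, and eventually people in the middle will be won over. As for the die-hard anti-GMO crowd, let them spend extra money on "all-natural" food for the rest of their lives; that is fine too.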
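【Whether the Bt protein expressed by an inserted Bt gene causes allergies is something the FDA/EPA must test for】 This alone shows that the concern exists.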
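There is no need here to cite "actual cases and mechanisms of GM proteins causing gluten allergies"; it is enough to believe Murphy's law (http://zh.wikipedia.org/zh-cn/摩菲定理): "Anything that can go wrong will go wrong." It can be extended to: "If a defect has many possible outcomes, it will inevitably develop in the worst, most terrifying direction."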
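Reply: Of course, public opinion is produced by people; it is not a divine or natural phenomenon. Without production there is nothing to mine. Media studies strictly distinguish opinion deliberately manufactured by companies or governments from sentiment expressed spontaneously by individuals, as push media and pull media. Companies have PR departments and lobbyists serving their interests. Governments (especially red or white regimes) set up huge propaganda departments whose purpose is to manufacture and steer public opinion, for stability or brainwashing. The two kinds of opinion sometimes permeate each other, but in essence they represent entirely different agendas. Now that the Internet is everywhere, everyone is showing off their tricks: zombie accounts and paid posters (the "water army") have appeared, also to manufacture opinion or muddy the waters. All of these should be distinguished and handled separately in mining and measurement. The road is long, but dawn is just ahead.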
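Some ask whether this wave of enthusiasm will be another huge bubble like that of 2000. My observation is: yes and no. Indeed, while the big data market is still immature and its development and profit models remain unclear, the rush to start companies, invest, and take risks does recall the dot-com bubble at the turn of the century. However, this wave is not simply a bubble; it holds real substance and potential value, which we discuss below. Whether that potential value matches the market's capacity to absorb it is, of course, still a huge question. One can foresee the scene three to five years from now: phoenixes reborn from the ashes and earlier waves dead on the beach together composing the first movement of the big data symphony.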
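What is now called big data refers mostly to the data that came with the rise of social media, data already linked to the background of the people who produced it, rather than the mixed collections that search engines gather from the open Internet. Without social media and its users' social networks as background, and judged purely by volume, "big data" has existed for a long time: it gave birth to the search industry. For search engines, big data is hardly a new concept; facing the vast ocean of the Internet, the search giants have used keyword indexing to serve hundreds of millions of users for many years. Every netizen benefits, and it is hard to imagine an Internet without search. But that is not today's buzzword; today's big data is inseparable from social media. Of course, the data mining field has long combined user information with data on consumption habits, with many results and applications. Natural-language big data can be seen as a continuation of that work; in terms of terminology, text mining (from social media big data) is a natural extension of data mining. For language technology, an NLP system must analyze the structure of language and understand its semantics, intelligent work that is vastly more complex than building a keyword index, which is why big data has long been a bottleneck for natural language technology.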
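The keyword indexing that search engines rely on can be pictured as an inverted index: a map from each token to the set of documents that contain it. Below is a minimal Python sketch, assuming whitespace tokenization and boolean AND queries only; real engines add ranking, normalization, compression, and much more, so treat this as an illustration of the idea rather than how any particular engine works.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase token to the IDs of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy so the index is not mutated
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {"d1": "big data mining", "d2": "data mining on social media", "d3": "search engine"}
idx = build_index(docs)
print(sorted(search(idx, "data mining")))
```

Note what the index does not know: who said what, about which brand, and whether it was praise or complaint. That structural and semantic analysis is the part the paragraph above calls vastly harder at big data scale.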
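Once the problem of processing massive data is solved, precision and recall become relatively less important. In other words, even a system that is not the best, with mediocre precision (say 70%: of 100 items extracted, only 70 are correct) and mediocre recall (say 30%: roughly one in three is caught), can still yield an excellent practical system, as long as it can run on big data. There are two underlying reasons: first, the redundancy of information in the big data era; second, the limited capacity of humans to digest information. Insufficient recall can be compensated for by increasing the amount of data processed, which is fairly easy to understand. Since valuable, statistically significant information cannot be a "single copy" but is repeated by many people in many different wordings, there is no doubt that a low-recall system will eventually catch it. From the information consumer's perspective, there is no essential difference between a piece of information being caught 1,000 times and 900 times: the information is the same, as long as it is accurate. The real question is how a system with less-than-ideal precision can earn users' trust. With a 70% system, 30 of every 100 extracted items are wrong; isn't that a jumble of good and bad that users cannot sort out, and what value could such a system have? Following this line of thinking, not only a 70% system but even a 90% system would have errors everywhere and be unusable. This view overlooks the sampling and fusion stages of a real mining system, and thus exaggerates the negative impact of individual errors on the final results. In reality, the typical scenario is that, facing a sea of sources, almost any request from an information seeker has countless potential answers. Because information consumers are human, not gods, even if a flawless ideal system delivered every result to them, big and small, they could not take it all in (so-called information overload). A practical system must therefore filter and fuse, presenting the statistically most meaningful results. This filtering and fusion is part of mining, and it can ensure that the quality of the final results is far higher than the system's per-case quality. In short, size matters: more is different. Big data has changed the conditions and ecology of applying technology, and it is far more forgiving of imperfect engines.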
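The redundancy argument can be made concrete with a back-of-the-envelope model. The Python sketch below assumes, purely for illustration, that each mention of a fact is caught independently with probability equal to recall, and that each extracted mention is labeled correctly with probability equal to precision, with independent errors and fusion done by a simple majority vote. Real extraction errors are correlated, so the numbers convey intuition, not a guarantee about any actual system.

```python
from math import comb


def p_caught_at_least_once(recall: float, mentions: int) -> float:
    """Probability that at least one of `mentions` independent mentions is extracted."""
    return 1 - (1 - recall) ** mentions


def p_majority_correct(precision: float, extractions: int) -> float:
    """Probability that a strict majority of independent extractions is correct.

    Assumes a binary label (right/wrong) and independent errors.
    """
    need = extractions // 2 + 1
    return sum(
        comb(extractions, k) * precision**k * (1 - precision) ** (extractions - k)
        for k in range(need, extractions + 1)
    )


# Odd counts avoid ties in the majority vote.
for n in (1, 11, 101):
    print(n, round(p_caught_at_least_once(0.3, n), 4), round(p_majority_correct(0.7, n), 4))
```

Under these assumptions, a fact mentioned 10 times is caught at least once with probability 1 - 0.7^10, about 97%, even at 30% recall, and voting over three independent 70%-precision extractions already lifts the chance of a correct majority to 0.784.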
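3 Big data is not the only basis for decisions, only one of them. Sound decisions must combine many sources of information. Setting big matters aside, just look at how I used big data, word of mouth from friends, in-store inspection, and various other considerations when buying a washing machine. Thinking that big data solves everything is unrealistic. Notably, even the same data results, accepted as a true reflection, can be given different interpretations, and people approach the truth by debating those interpretations. A good big data system must make it easy for users to drill down to verify or refute an interpretation, and to explore the truth by applying and comparing different constraints.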
【3】On Big Data NLP. Li Wei, 2013-7-27 20:43. Admittedly, it is not easy to develop an NLP (Natural Language Processing) system with both high precision and high recall (i.e., a high F-score) due to the ambiguity and complexity of natural language phenomena. Social media is even more challenging, full of misspellings, irregularities, and ...
【9】【Liwei's Popular Science: So-called Big Data (BIG DATA)】 Li Wei, 2013-3-21 04:58. Big data is not just data that are big. In the sense of data load, big data has been around on the Internet for quite a while, and the entire search industry was built and developed on it. The current buzzword big data is different: it is innately associated with users' background and social ...
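【10】Announcement: ScienceNet "Double Hundred" blogger Liwei to speak on big data mining in Beijing on April 1. Li Wei, 2013-3-20 19:57. UPDATE: The time and place of Liwei's April Fools' Day talk in Beijing have been confirmed. Thanks to Professor Sun of the Chinese Information Processing Society for the invitation and arrangements, and to senior Professor 董振东 (Dong Zhendong) for the suggestion and recommendation: The location is: Room 334, 3rd floor, Building 5, Institute of Software, Chinese Academy of Sciences, No. Zhongguancun South 4th Street 10:00~12:00 It' ...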
【11】Coarse-grained vs. fine-grained sentiment extraction. Li Wei, 2013-3-12 06:51. As for sentiment extraction itself, there are different layers: 1. sentiment classification: thumbs-up and down (or plus neutral); 2. sentiment association: associating a sentiment with a topic or brand; 3. fine-grained sentiment extraction: for example, who made the sentiment comment? about w ...
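The three layers can be pictured as progressively richer output records. The Python sketch below uses hypothetical field names; since the excerpt is cut off after "about w ...", it models only what the listed layers mention (polarity, the associated topic or brand, and who made the comment) and assumes each layer builds on the one before.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentimentResult:
    """Hypothetical output record; deeper layers fill in more fields."""
    polarity: str                 # layer 1: "positive" / "negative" / "neutral"
    target: Optional[str] = None  # layer 2: topic or brand the sentiment is about
    holder: Optional[str] = None  # layer 3: who expressed the sentiment


def layer(result: SentimentResult) -> int:
    """Return the deepest layer reached, assuming each layer builds on the previous one."""
    if result.target is None:
        return 1
    return 3 if result.holder is not None else 2


print(layer(SentimentResult("negative", target="BrandX", holder="@someone")))
```

A document-level classifier stops at the first field; associating the polarity with a brand, and then with the person voicing it, is what makes the record useful for brand and opinion analysis.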
Five challenges to keyword-based sentiment classification: (1) domain portability; (2) micro-blogs: sentence/tweet classification is a lot tougher than document classification; (3) when big data become small: big data load when sliced and diced based ...
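【17】【Research notes: big data NLP, how big is big?】 Li Wei, 2012-10-31 19:03. Big data, like cloud computing, has become a fashionable IT term (buzzword / fashion word). With social media taking root and the mobile Internet spreading, a phone in every hand, ordinary people send messages anytime and anywhere; grassroots information is flourishing on Weibo, WeChat, and all kinds of forums, and big data is growing explosively. For information recipients (individuals, businesses, governments, etc.), information overload (information overlo ...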
Automatic survey complements and/or replaces manual survey. That is the increasingly apparent direction and trend as social media get more popular every day. Automatic survey (or machine survey) refers to using computers to automatically extract and mine public opinion on a specific topic from language data; its technology ...
【23】Compared with English, is Chinese more emotionally expressive or more intense? Li Wei, 2012-4-28 04:29. Chinese is a more sentiment-intensive language than English?? FW: Counts of sentiment words in Chinese and English. Interesting finding: Chinese has more than double the negative words and more than triple the positive words of the English vocabulary. This is based on the 5 ...
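【26】《Popular-science essay: Machine gossip》 Li Wei, 2011-10-14 17:09. Machine gossip: Text Mining and Intelligence Discovery (13219) Posted by: liwei999 Date: June 10, 2006 10:07PM 犀角 suggested simply using machine mining. I don't want to scare anyone, but in theory, unless you never post, the more you say the more you give away: machine gossip may reveal more of your characteristics than manual digging. Fortunately the technology is not yet mature. Text mining is my ...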
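【27】The more you say, the more you reveal: text mining can uncover background information. Li Wei, 2011-7-11 01:03. The more you say, the more you reveal: mining commercially valuable background information. In text mining, the task of Demographic Profile Extraction is to classify netizens automatically and reveal their background information (age, gender, occupation, ethnicity, life stage, family background, etc.). Some simple rules have high precision but only moderate recall, for example: I am X -- X (student, t ...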
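The "I am X" rule can be sketched as one regular expression over a small lexicon of profile words. The lexicon and attribute names below are hypothetical (the post's own list is truncated after "student, t ..."). The sketch shows why such a rule has high precision but only moderate recall: an explicit self-description rarely misfires, but most users never write one.

```python
import re
from typing import Optional

# Hypothetical lexicon mapping a self-description word to (attribute, value).
PROFILE_WORDS = {
    "student": ("life_stage", "student"),
    "mom": ("family", "parent"),
    "dad": ("family", "parent"),
    "retiree": ("life_stage", "retired"),
}

# Matches "I am X", "I'm X", "I am a/an X", with X restricted to the lexicon.
PATTERN = re.compile(
    r"\bI(?:\s+am|'m)\s+(?:an?\s+)?(" + "|".join(PROFILE_WORDS) + r")\b",
    re.IGNORECASE,
)


def extract_profile(post: str) -> Optional[tuple[str, str]]:
    """Return (attribute, value) if the post contains an 'I am X' self-description."""
    match = PATTERN.search(post)
    if match is None:
        return None
    return PROFILE_WORDS[match.group(1).lower()]


print(extract_profile("Honestly I am a student at UCSD"))
print(extract_profile("My dad is a student"))
```

The second post mentions "student" but is not a self-description, so the rule abstains; that abstention is exactly the trade of recall for precision the excerpt describes.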
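【43】Data, not people: is the IRT clamor swaying American public opinion? Li Wei, 2013-12-30 06:27. To borrow the red song recently popular in North Korea praising the "red sun" Kim Jong-un: data, data, "We trust no one but it!" And, of course, God: In God We Trust. In everyone else we need data. This is all the more true in the big data era: trust data, not people. The reason is simple: in an age of information explosion, any individual's energy, ability, and experience are limited, and what anyone sees and hears is only the tip of the iceberg. So it is with 小崔 (Cui Yongyuan), and with other big-V influencers too ...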
【48】Social media mining: Teens and Issues. Li Wei, 2013-9-9 21:36. As is well known, the teenage years are a special and important period of growth for children, or young adults, to be more precise. It is a time of growing pains mixed with joy. It is often a rebellious phase when parents and teens find it difficult to communicate with each other. Thi ...
【49】【Weibo automatic survey: Bo Xilai, Charles Xue (薛蛮子), and Li Tianyi】 Li Wei, 2013-8-30 09:33. Automatic Survey from the last month of Sina Weibo (Chinese Twitter, the most influential social media microblog site) on three major characters: the former Chinese politician Bo Xilai in his ongoing trial, the very famous social media figure Charles Xue who is said to have millions of fans and w ...
【54】【Automatic survey: popularity ranking of US brand-name universities】 Li Wei, 2013-8-12 16:46. For the first time, an automatic survey of a one-year social media archive on some US brand-name universities shows the following rankings, which are quite different from the official rankings (Harvard and Caltech were accidentally not included): 1. UCSD; 2. Chicago; 3. UPenn; 4. Carnegie Mellon ...
【57】Is sentiment mining reliable for predicting stock and property markets? Li Wei, 2013-4-18 21:24. Can social media sentiment mining be used to predict the stock/property market? I tried our Chinese system for that and it proved to be right. Is that pure luck, or is there some value in using public opinions and sentiments to assist market prediction? As a technology demo, I once used Chinese social media opinion ...
Maytag Maxima 4.3 cu. ft. High-Efficiency Front Load Washer with Steam in Granite, ENERGY STAR Model # MHW7000XG: $989.10/EA-Each (was $1,399.00). LG Electronics 4.0 cu. ft. High-Efficiency Front Load Washer in Graphite Steel, ENE ...
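【63】《Shopping strategy in the big data era: Washer hunting (2)》 Li Wei, 2013-2-25 22:41. Choosing a washer: top loading or front loading? Author: Liwei Date: 02/24/2013 23:35:39 We had originally planned to give up on front-loading washers (Mr. 镜 notes that in China they are called 滚筒式, "drum type") and choose the easier-to-clean top-loading kind (called 波轮式, "impeller type", in China). But now that we have big data, the boss still wants to weigh the pros and cons of each and hear how users choose. So I mined ...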
【64】《Shopping strategy in the big data era: Washer hunting (1)》 Li Wei, 2013-2-25 21:07. ABSTRACT Brand Passion Index (BPI) is used to help us make an informed decision in our ongoing purchase of a new washer. Using our own product, we generated two BPIs, one to compare the major washer brands in the US market and the other to compare front loading vs. top loading. With ...
【Brand Passion Index 3: international fast food brands in China market face challenges】 Chinese Social Media Mining: Brand Passion Index for international fast food brands McDonald's, Pizza Hut, KFC and Yoshinoya in China. Fairly negative. The golden time when McDonald's ...
The Chinese mobile phone market is found to be still in the stage of multiple vendors competing with each other, with no single one clearly ahead of the others. Even the Apple iPhone is on a par, in terms of net sentiment and consumer passion, with HTC, Samsung, Nokia and Chinese brand Xiaomi d ...
RE: What do these tell us more than we've known already? Very good question: however, if it is known info, it confirms its validity. Date: 01/01/2013 11:11:49 It builds the users' (and developers') confidence in the automatic summarization of the computer processing of t ...
Let us look back at the past year, 2012, which people associate more with a hard year than with a good or great one.
Almost every time a hot topic comes to mind these days, I check our social media system to see how social media reflects it. Word clouds are intriguing vehicles for presenting the common social image. Most word clouds generated by other systems are based on statistics of keywords mentioned ...
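A word cloud built from keyword statistics, the approach this excerpt attributes to other systems, essentially counts which terms co-occur with the target across posts. A minimal Python sketch under that assumption, with whitespace tokenization and a hypothetical stopword list:

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "of", "to"}  # hypothetical stopword list


def cooccurrence_cloud(posts: list[str], target: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count terms in posts that mention `target`, counting each term once per post."""
    target = target.lower()
    counts: Counter = Counter()
    for post in posts:
        tokens = set(post.lower().split())
        if target in tokens:
            counts.update(tokens - STOPWORDS - {target})
    return counts.most_common(top_n)


posts = ["baidu search ads", "baidu search engine", "baidu search", "google search"]
print(cooccurrence_cloud(posts, "baidu"))
```

Counts like these only reflect what appears near a name; they cannot tell praise from complaint or who is speaking. How the author's own clouds differ is cut off in the excerpt.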
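An incidental system test revealed that 百度 (Baidu) is closely shadowed by 哪里有小姐 ("where to find call girls"). The finding immediately caused an uproar among friends: some applauded (way to go, u r onto sth), some joked (saying that Baidu's name comes from the line 众里寻她千百度, "searching for her a thousand times in the crowd", after all), and some were skeptical (the results are not faked?). A conspiracy theorist emailed me, accusing this word cloud of insulting Baidu. I told an old friend: I have no conclusion. There ...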
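Testing the TagCloud for the well-known brand 百度 (Baidu) today turned up a startling finding. Date: 12/12/2012 18:51:14 In the simplified-Chinese world, the term most closely associated with Baidu is 哪里有小姐 ("where to find call girls"); in the traditional-Chinese world, the most associated term is 美元 (US dollars). Somehow this reminds me of the charge leveled at Google before it was driven out of China: that Google was too pornographic. Could it be more pornographic than Baidu? A follow-up post a ...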
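Obama won the debate, see our evidence. Automated polling: technology lets you monitor public opinion automatically. Automatic monitoring of the social media site Twitter shows that Obama clearly won last night's second debate. The popularity curve shows him leading Romney on almost every issue. Two issues were truly challenging for Obama: first, his economic record during his first term (6:55pm); second, criticism that he has not been tough enough on China ...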
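【99】Automatic social media sentiment analysis: Ma Yingjiu vs Chen Shuibian. Li Wei, 2012-9-29 16:51. Different social images and social media sentiments for Ma Yingjiu, Taiwan President, and Chen Shuibian, Taiwan former president. Different social media evaluations, completely different public images: Taiwan's current president Ma Yingjiu vs former president Chen Shuibian. Preliminary results of automatic social media analysis highlight the contrasting images and styles of the two. (1) Comparison of word-frequency analysis of high-frequency emotional terms ...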
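【101】Automatic sentiment analysis shows Google's social rating is twice Baidu's. Li Wei, 2012-9-8 20:32. After such a long delay, preliminary experiments with the Chinese system have finally begun. Date: 09/06/2012 21:04:35 Developing the core system was supposed to be the hardest and most time-consuming part, yet in real life purely technical and operational steps such as engineering architecture, storage, and securing content often become time bottlenecks too, which is hardly surprising. This experiment used only half a year of data from overseas Twitter and from sources such as Baidu Tieba and the Tianya forum, but the analysis is quite interesting. I did a ...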
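Chinese love to say the opposite of what they mean: behind the praise lurks a sneer, especially in social media. Author: Liwei (*) Date: 09/07/2012 15:42:32 Mainland politicians are sensitive words, so they are not discussed here. Take Taiwanese politicians as an example: calling Chen Shuibian "China's cleanest president" is obviously ironic. It is interesting to find that many positive comments about A Bian are sarcastic. In thi ...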
1. The existing data are not very large (400k mentions a year), but the results make sense with decent data quality
2. From geo stats, we know most data on Walmart come from China (dark color) instead of overseas sources
3. From domains stats, the data actually include data from Sina Weibo (weibo.com) and Tencent Weibo (t.qq.com) although the data flow from these two important Microblog sources is not stable at this point. Also the domains stats show that the major domains are all from China. I know that Walmart is a very influential brand in China and has many stores in cities of China.
4. The net sentiment of 48% is fairly high, which is reflected in the emotions stats (data quality very good): big green-font emotional terms include 放心 (peace of mind), 喜欢 (like), 乐 (happy), 支持/推 (support), 很好 (very good), 不错 (not bad), 成功 (success), etc. The negative emotional words (in small red font) are few, including 差劲 (bad), 抱怨 (complain), 不喜欢 (dislike), 垃圾 (garbage), 很一般 (very so-so: meaning not as good as expected).
5. In the pros/cons word cloud, the likes include money-saving (省钱/便宜) and first-class service (服务一流); more interesting insights come from the dislikes, including (1) fake beef (using fox meat: 狐狸肉事件, the fox meat incident); (2) recall (召回 of some product?); (3) cheating (欺诈); (4) scandal (丑闻), etc.
6. To drill down into what negative incidents led to the above dislikes, the Walmart_con_sample shows some related sound bites that look like negative news on some incidents: the 1st sound bite reports CCTV news on Walmart's fake alcohol and fake meat (using fox meat) incidents; the 2nd reports using fox meat to fake beef and donkey meat, and using chicken to fake beef in burgers sold at its Sam's Club; the 3rd reports three Walmart incidents at different times and its apologies, including passing off cheap frozen meat as organic green food, passing off cheap fox meat as beef, and lax quality control in importing low-quality products for sale, having issued 200 permits within 7 years for disqualified products to be put on the shelves.
7. Note that the above sound bites were selectively collected to show that our system can indeed capture detailed negative incidents about the brand in the media. When I drill down, there are quite a few duplicates among our sound bites (one piece of bad news gets re-posted everywhere); another point is that the negative comments come mainly not from social media users but from news (state-run news that also gets posted in social media).
8. Unlike the overwhelmingly positive terms in the emotions word cloud and the summary, the behavior word cloud shows more or bigger negative behavior terms than positive ones. This is understandable given the heavily reported incidents shown in the sample sound bites above. Eye-catching negative behavior terms include "revealed" (被曝), "take to court"/"being sued" (告上法庭), "closed" (关闭), "have to take off shelf" (下架), etc.
9. From the above negative behavior terms, I drilled down to see more details in the sample sound bites below, which are similar to the sample discussed in 6. These two sound bites both come from negative news about Walmart that originated in traditional news and spread all over the Internet.
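Item 4 reports a net sentiment of 48%, and item 7 notes that reposted news inflates counts. The Python sketch below illustrates both, assuming net sentiment is defined as (positive - negative) / (positive + negative) over sentiment-bearing mentions (a common convention; the post does not state its formula) and treating duplicates as exact matches after lowercasing and collapsing whitespace. The counts and sound bites in the usage lines are hypothetical.

```python
def net_sentiment(positive: int, negative: int) -> float:
    """Net sentiment as a percentage, under the assumed (pos - neg) / (pos + neg) definition."""
    total = positive + negative
    if total == 0:
        return 0.0
    return 100.0 * (positive - negative) / total


def dedupe(sound_bites: list[str]) -> list[str]:
    """Drop exact reposts, comparing lowercased text with whitespace collapsed."""
    seen: set[str] = set()
    unique = []
    for bite in sound_bites:
        key = " ".join(bite.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(bite)
    return unique


print(net_sentiment(74, 26))  # hypothetical counts
print(dedupe(["Fox meat sold as beef", "fox  meat sold as BEEF", "Great service"]))
```

With 74 positive and 26 negative mentions the assumed formula gives 48%. Deduplicating before counting keeps one widely reposted news item from dominating the tally, though near-duplicates with edited wording would need fuzzier matching than this.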
Chinese TV star Bi Fujian was caught on tape privately insulting Mao, which triggered a huge political debate in social media between leftists and rightists. China is presently stuck between a post-Mao era that is entering modern society, with limited freedom of speech (at least on private occasions), and a totalitarian government that inherits Mao's legacy; hence the regulatory pressure that led to the star's four-day suspension from his job. In Mao's time, Bi's speech would have gotten him a death sentence or life in prison.
The pop queen Teresa Teng passed away 20 years ago and her songs remain popular in the Chinese communities all over the world. Social media from Taiwan where she was born, from Mainland China, from Hong Kong, from Singapore, from Japan, from US and other parts of the world are full of all kinds of commemoration of her life and songs in Mandarin Chinese, Cantonese, Japanese and English. See the results of our multilingual text mining for how dearly she has been loved and remembered across generations of Chinese in Asia and around the world.