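Some ask whether this wave of enthusiasm is another giant bubble like the one in 2000. My observation: yes and no. It is true that while the big data market is still immature and its growth and profit models remain unclear, everyone is rushing in to found startups, invest, and take risks, and this overheated behavior does recall the dot-com bubble at the turn of the century. But this wave is not simply a bubble; it carries real substance and potential value, which we discuss in detail below. Whether that potential value matches the market's capacity to absorb it remains a big question. One can foresee the scene three to five years from now: phoenixes reborn from the ashes and earlier waves left dead on the beach will together have composed the first movement of the big data symphony.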
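What is now called big data refers more specifically to data that emerged after social media took off: data already tied to the background of the people who produced it, not the mixed collection that search engines crawl from the open Internet. Without social media and its users' social networks as context, and judged purely by volume, "big data" has existed for a long time, and it gave birth to the search industry. For search engines, big data is hardly a new concept: facing the ocean of the Internet, the search giants have for many years served hundreds of millions of users through keyword indexing. Every one of us on the net benefits, and it is hard to imagine an Internet without search. But that is not today's buzzword; today's big data is inseparable from social media. Of course, the data mining field has already produced many results and applications by combining user information with data on consumption habits. Big data in natural language can be seen as a continuation of that work; in terminology, text mining (from social media big data) is a natural extension of data mining. For language technology, an NLP system has to analyze the structure of language and understand its semantics. Such intelligent work is vastly more complex than building a keyword index, which is why big data has long been a bottleneck for natural language technology.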
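Once the problem of processing massive data is solved, precision and recall become relatively less important. In other words, even a system that is not the best, with only mediocre precision (say 70%: of 100 items extracted, only 70 are right) and mediocre recall (say 30%: it catches roughly one in three), can still be built into an excellent practical system as long as it can run on big data. The root cause lies in two factors: the redundancy of information in the big data era, and the limits of human information consumption. That a recall shortfall can be made up by processing more data is fairly easy to see. Information that is valuable and statistically significant cannot be a one-of-a-kind copy; it is bound to be repeated by many people in many different wordings, so there is little doubt that a system with modest recall will catch it sooner or later. From the information consumer's point of view, there is no essential difference between a piece of information being caught 1,000 times and being caught 900 times: the information is the same, and all that matters is that it is accurate. The real question is how a system with less-than-ideal precision can earn users' trust. With a 70% system, 30 of every 100 extracted items are wrong. Isn't that a jumble of good and bad that nobody can tell apart, and what is such a system worth? Along this line of thinking, never mind 70%: even a 90% system still shows errors everywhere and is unfit for use. This view overlooks the sampling and fusion steps in a real mining system, and therefore exaggerates the negative effect of individual errors on the final result. In practice, the typical scenario is that for almost any request against a massive information source there are countless candidate answers. Because the information consumer is human, not a god, even a perfect, error-free system that delivered every result, large and small, would overwhelm him (so-called information overload). A practical system must therefore filter and integrate, presenting the statistically most significant results. This filtering and integration is part of mining, and it can make the quality of the final result far higher than the system's per-instance quality. In short, size matters: more is different. Big data has changed the conditions and ecology of technology applications, and big data is more forgiving of an imperfect engine.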
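The redundancy argument can be made concrete with a little arithmetic. The sketch below is only an illustration under simplifying assumptions that are not in the original text: each restatement of a fact is extracted independently, each extraction is independently correct with probability equal to the precision, and "fusion" is modeled as a plain majority vote. The function names are hypothetical.

```python
from math import comb


def chance_captured(recall: float, mentions: int) -> float:
    """Probability that at least one of `mentions` independent restatements
    of the same fact is extracted, given per-mention recall."""
    return 1.0 - (1.0 - recall) ** mentions


def majority_correct(precision: float, extractions: int) -> float:
    """Probability that more than half of `extractions` independent
    extractions are correct (majority vote as a stand-in for fusion)."""
    needed = extractions // 2 + 1
    return sum(
        comb(extractions, k)
        * precision ** k
        * (1.0 - precision) ** (extractions - k)
        for k in range(needed, extractions + 1)
    )


if __name__ == "__main__":
    # Mediocre recall (30%) against growing redundancy.
    for n in (1, 5, 20):
        print(f"recall 0.30, {n:>2} mentions -> captured {chance_captured(0.30, n):.4f}")
    # Mediocre precision (70%) against growing evidence per aggregated result.
    for n in (1, 11, 101):
        print(f"precision 0.70, {n:>3} extractions -> majority right {majority_correct(0.70, n):.6f}")
```

Under these assumptions a single mention is caught only 30% of the time, but a fact repeated 20 times is almost never missed, and a majority over about a hundred 70%-precision extractions is almost always right. Real data violates the independence assumption (reposts and systematic errors are correlated), so the numbers indicate the direction of the effect rather than measured performance.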
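3. Big data is not the sole basis for decision making, only one of several. Sound decisions must combine multiple sources of information. Leaving the big matters aside, consider how the author drew on big data, friends' word of mouth, in-store inspection, and various other considerations when buying a washing machine. It is unrealistic to think that having big data settles everything. It is worth noting that even the same set of results, accepted as a faithful reflection of reality, can be interpreted quite differently, and it is through debate over such interpretations that people approach the truth. A good big data system must make it easy for users to drill down to confirm or reject an interpretation, and to probe the truth by applying different constraints and comparing the results.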
分享【3】On Big Data NLP热度 1 李维2013-7-27 20:43Admittedly, it is not easy to develop an NLP ( Natural Language Processing ) system with both high precision and high recall (i.e. high F-score) due to the ambiguity and complexity of natural language phenomena. Social media is even more challenging, full of misspellings, irregularities, and …个人分类: 立委科普|766 次阅读|2 个评论
【9】【立委科普：所谓大数据（BIG DATA）】热度 3 李维2013-3-21 04:58Big data is not just data that are big. In the sense of data load, big data has been there for quite a while in Internet, on which the entire search industry was based and developed. The current buzz word big data is different, it is innately associated with users’ background and social …个人分类: 立委科普|1175 次阅读|3 个评论
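【10】Announcement: ScienceNet "double hundred" blogger Liwei to speak on big data mining in Beijing on April 1 热度 11 李维2013-3-20 19:57 UPDATE: The time and place of Liwei's April Fools' Day talk in Beijing have been confirmed. Thanks to Professor Sun of the Chinese Information Processing Society for the invitation and arrangements, and to senior Professor Dong Zhendong for the suggestion and recommendation: The location is: Room 334, 3rd floor, building 5 Institute of Software, Chinese Academy of Sciences, No. Zhongguancun South 4th Street 10:00~12:00 It’ …个人分类: 立委科普|1283 次阅读|13 个评论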
分享【11】Coarse-grained vs. fine-grained sentiment extraction李维2013-3-12 06:51As for sentiment extraction itself, there are different layers: 1. sentiment classification: thumbs-up and down (or plus neutral) 2. sentiment association: to associate a sentiment with a topic or brand 3. fine-grained sentiment extraction: for example, who made the sentiment comment? about w …个人分类: 立委科普|671 次阅读|没有评论
Five challenges to keyword-based sentiment classification: (1) domain portability; (2) micro-blogs: sentence/tweet classification is a lot tougher than document classification; (3) when big data become small: big data load when sliced and diced based …个人分类: 立委科普|1372 次阅读|1 个评论
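【17】【Research notes: big data NLP, how big is big?】热度 1 李维2012-10-31 19:03 Like cloud computing, big data has become a buzzword (fashion word) in today's IT. As social media takes hold and the mobile Internet spreads, with a device in every hand, ordinary people are posting messages anytime and anywhere; grassroots information is blossoming across microblogs, WeChat, and all kinds of forums, and big data is growing explosively. For the recipients of information (individuals, businesses, governments, etc.), information overload (information overlo …个人分类: 立委科普|967 次阅读|1 个评论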
Automatic survey complements and/or replaces manual survey. That is the increasingly apparent direction and trend as social media are getting more popular everyday. 自动民调（or 机器民调: Automatic Survey / Machine Survey）指的是利用电脑从语言数据中自动抽取挖掘有关特定话题的民间舆论，其技术 …个人分类: 立委科普|1530 次阅读|3 个评论
分享【23】比起英语，汉语感情更外露还是更炽烈？李维2012-4-28 04:29Chinese is a more sentiment-intensive language than English?? FW: Counts of sentiment words in Chinese and English Interesting finding: that Chinese more than doubles the negative words and more than triples the positive words in comparison with the English vocabulary. This is based on the 5 …个人分类: 立委科普|1158 次阅读|没有评论
【26】《科普随笔：机器八卦》李维2011-10-14 17:09机器八卦：Text Mining and Intelligence Discovery (13219) Posted by: liwei999 Date: June 10, 2006 10:07PM 犀角提议，干脆用机器挖掘吧。我不想吓唬大家，但是，理论上说，除非你不冒泡，言多必失，机器八卦，比人工挖掘，可能揭示出你的更多特征。好在该技术还不成熟。 Text mining 是我这 …个人分类: 立委科普|863 次阅读|没有评论
【27】言多必露，文本挖掘可以揭示背景信息热度 1 李维2011-7-11 01:03言多必露，挖掘有商用价值的背景信息 文本挖掘（text mining）中，Demographic Profile Extraction 的任务是要给网虫自动分类，揭示其背景信息（年龄，性别，身份，族裔，人生阶段，家庭背景等）。 一些简单的规则，查准率高（high precision），查全率并不高(moderate recall)，譬如： I am X — X (student, t …个人分类: 立委科普|939 次阅读
分享【43】只认数据不认人：IRT 的鼓噪左右美国民情了么？热度 3 李维2013-12-30 06:27套用北韩最近流行的歌颂红太阳金正恩的红歌，数据，数据，《除了它我们谁也不认！》 当然，还有上帝： In God We Trust. In everyone else we need data. 大数据时代更是如此，只认数据不认人。道理很简单，在信息爆炸的时代，任何个人的精力、能力和阅历都是有限的，所看到听到的都是冰山一角。小崔如此，其他大V也 …个人分类: 社媒挖掘|918 次阅读|10 个评论
分享【48】Social media mining: Teens and Issues李维2013-9-9 21:36As is well known, the teenager years are a special and important period of growth for children, or young adults, to be more precise. It is growing pain, mixed with joy. It is often a rebellious phase when both parents and teens find it difficult to communicate with each other. Thi …个人分类: 社媒挖掘|542 次阅读|没有评论
分享【49】【微博自动民调：薄熙来、薛蛮子和李天一】热度 2 李维2013-8-30 09:33Automatic Survey from the last month of Sina Weibo (Chinese twitter, the most influential social media Microblog site) on three major characters: the former Chinese politician Bo Xilai in his on-going trial, the very famous social media figure Charles Xue who is said to have millions of fans and w …个人分类: 社媒挖掘|898 次阅读|2 个评论
分享【54】【自动民调：美国名牌大学人气排名】热度 1 李维2013-8-12 16:46For the first time, the automatic survey of social media 1-year archive on some US brand name universities shows the rankings as follows, which are quite different from official ranking (Harvard and Caltech accidentally not included): 1. UCSD; 2.Chicago; 3. UPenn; 4. Carnegie Mellon …个人分类: 社媒挖掘|794 次阅读|1 个评论
分享【57】舆情挖掘用于股市房市预测靠谱么？热度 1 李维2013-4-18 21:24Can social media sentiment mining be used for predicting stock/property market? I tried our Chinese system for that and it proved to be right. Is that pure luck or there is some value in using public opinions and sentiments to assist prediction of markets? 作为技术展示，曾经用中文社交媒体的舆 …个人分类: 社媒挖掘|605 次阅读|1 个评论
Maytag Maxima 4.3 cu. ft. High-Efficiency Front Load Washer with Steam in Granite, ENERGY STAR Model # MHW7000XG 989.10/EA-Each, WAS 1,399.00 LG Electronics 4.0 cu.ft. High-Efficiency Front Load Washer in Graphite Steel, ENE …个人分类: 社媒挖掘|943 次阅读|2 个评论
分享【63】《大数据时代的购物策略：洗衣机寻购记（2）》热度 3 李维2013-2-25 22:41洗衣机的选择：top loading 抑或 front loading？ 作者: 立委 日期: 02/24/2013 23:35:39 本来我们是要放弃 front loading （镜先生考证，国内叫滚筒式）洗衣机，去选更容易清洁的 top loading （国内称作 波轮式 ）的。可是如今大数据了，领导还是要看看二者的优劣，听听用户都怎么选择的。 于是挖掘 …个人分类: 社媒挖掘|1067 次阅读|4 个评论
分享【64】《大数据时代的购物策略：洗衣机寻购记（1）》热度 8 李维2013-2-25 21:07ABSTRACT Brand Passion Index (BPI) is used to help us make an informed decision in our on-going purchase of a new washer. Using our own product, we generated two BPIs, one to compare the major washer brands in the US market and the other to compare front loading vs. top loading. With …个人分类: 社媒挖掘|1996 次阅读|10 个评论
【Brand Passion Index 3: international fast food brands in China market face challenges】 Chinese Social Media Mining: Brand Passion Index for international fast food brands McDonald’s, Pizza Hut, KFC and Yoshinoya in China. Fairly negative. The golden time when McDonald’s …个人分类: 社媒挖掘|1858 次阅读|9 个评论
Chinese mobile phone market is found to be still in the stage of multiple vendors competing with each other with no single one clearly ahead of others. Even Apple iPhone is on a par, in terms of net sentiments and consumer passion, with HTC, Samsung, Nokia and Chinese brand Xiaomi d …个人分类: 社媒挖掘|810 次阅读|1 个评论
RE: What do these tell us more than we’ve known already? very good question: however, if it is known info, it confirms its validity 日期: 01/01/2013 11:11:49 it builds the users’ (and developers’) confidence in the automatic summarization of the computer processing of t …个人分类: 社媒挖掘|644 次阅读|没有评论
Most every hot topic coming to my mind these days, I will check our social media system to see how social media reflects it. Word clouds are intriguing vehicles to present the common social image. Most word clouds generated by other systems are based on statistics of keywords mentioned …个人分类: 社媒挖掘|804 次阅读|1 个评论
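Obama won the debate, see our evidence. Automated polling: let technology detect public sentiment for you. Automatic monitoring of Twitter shows that Obama clearly won last night's second debate. The popularity curves show him leading Romney on almost every issue. Two issues were truly challenging for Obama: first, his economic record during his first term (6:55pm); second, criticism that he is not tough enough on China …个人分类: 社媒挖掘|1209 次阅读|1 个评论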
分享【99】社会媒体舆情自动分析：马英九 vs 陈水扁李维2012-9-29 16:51Different social images and social media sentiments for Ma Yingjiu, Taiwan President, and Chen Shuibian, Taiwan former president. 不同的社会媒体评价，截然不同的民间形象，台湾现总统马英九 vs 台湾前总统陈水扁，社会媒体自动分析的初步结果凸显二者的不同形象和风格。 (1) 高频情绪性词的词频分析的对 …个人分类: 社媒挖掘|830 次阅读|没有评论
分享【101】舆情自动分析表明，谷歌的社会评价度高出百度一倍李维2012-9-8 20:32拖了这么久，中文系统的初步试验终于开始 日期: 09/06/2012 21:04:35 本来核心系统的开发最难，最耗时间 ，结果在真实生活中，工程架构、存贮和搞定content这些纯技术性操作性环节往往也会成为时间瓶颈，怪也不怪。 这次试验只有海外twitter和百度贴吧天涯论坛等来源的半年数据，但做出的分析也蛮有意思。 I did a …个人分类: 社媒挖掘|987 次阅读|没有评论
国人爱说反话：夸奖的背后藏着冷笑，社会媒体尤其如此 作者: 立委 (*) 日期: 09/07/2012 15:42:32 大陆政客属于敏感词，这里不表。以台湾政客为例， 譬如说陈水扁是“中国最清廉的总统”，就明显是反话。 It is interesting to find that many positive comments about A Bian are sarcastic. In thi …个人分类: 社媒挖掘|892 次阅读|1 个评论
1. The existing data are not very large (400k mentions a year), but the results make sense with decent data quality
2. From geos stats, we know most data on Walmart come from China (dark color) instead of overseas sources
3. From domains stats, the data actually include data from Sina Weibo (weibo.com) and Tencent Weibo (t.qq.com) although the data flow from these two important Microblog sources is not stable at this point. Also the domains stats show that the major domains are all from China. I know that Walmart is a very influential brand in China and has many stores in cities of China.
4. The net sentiment of 48% is fairly high, which is reflected in the emotions stats (data quality very good): emotional terms in big green font include 放心 (peace of mind)，喜欢 (like)，乐 (happy)，支持/推 (support)，很好 (very good), 不错 (not bad)，成功 (success) etc. The negative emotional words (in small red font) are not many, including 差劲 (bad)，抱怨 (complain)，不喜欢 (dislike)，垃圾 (garbage)，很一般 (very so-so: meaning not as good as expected).
5. In the pros/cons word cloud, the likes include money-saving (省钱/便宜) and first-class service (服务一流); more interesting insights come from the dislikes, including (1) fake beef (using fox meat 狐狸肉事件); (2) recall (召回 some product?); (3) cheating (欺诈); (4) scandal (丑闻) etc.
6. In order to drill down to see what negative incidents led to the above dislikes, the Walmart_con_sample shows some related sound bites which look like negative news on some incidents: 1st sound bite reports CCTV news on Walmart’s fake alcohol and fake meat (using fox meat) incidents; 2nd sound bite reports using fox meat to fake beef and donkey meat and using chicken to fake beef in the sold burgers at its Sam’s Club; the third sound bite reports three incidents of Walmart at different times and its apologies, including using cheap frozen meat to fake organic green food; using cheap fox meat to fake beef; and its lack of quality control in importing low quality products for sale, having issued 200 permits within 7 years for disqualified products to be on shelf.
7. Note that the above sound bites are selectively collected to show that our system can indeed capture detailed negative incidents of the brand in the media. When I drill down, there are quite a few duplicates in our sound bites (one piece of bad news gets re-posted everywhere); another thing is that the negative comments come mainly not from social media users but from news (state-run news which gets posted in social media too).
8. Unlike the overwhelmingly positive terms in the emotions word cloud and the summary, the behavior word cloud shows more or bigger negative behavior terms than positive ones. This is understandable because of the heavily reported incidents shown above in the sample sound bites. Eye-catching negative behavior terms include “revealed” (被曝), “take to court”/“being sued” (告上法庭), “closed” (关闭), “have to take off shelf” (下架) etc.
9. From the above negative behavior terms, I drilled down to see more details in the sample sound bites below, which is similar to the sample discussed in 6. These two sound bites both come from negative news of Walmart, which originated from traditional news and got spread all over Internet.
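Item 4 above quotes a net sentiment of 48% without spelling out how the figure is computed. One common convention, assumed here and not confirmed by the source, is (positive - negative) / (positive + negative) over the mentions classified as positive or negative. The function name and the counts below are hypothetical, chosen only so that the arithmetic lands on 48%.

```python
def net_sentiment(positive: int, negative: int) -> float:
    """Net sentiment under the assumed convention:
    (positive - negative) / (positive + negative).
    Returns 0.0 when there are no classified mentions."""
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total


if __name__ == "__main__":
    # Hypothetical counts: 74 positive and 26 negative mentions out of 100.
    print(f"net sentiment: {net_sentiment(74, 26):.0%}")
```

Under this convention a score of 48% means positive mentions outnumber negative ones roughly three to one, which fits the mostly green emotions cloud described in item 4. It also shows why a few heavily re-posted negative news stories, as in items 6 to 9, can dominate the behavior cloud without pulling the overall score below zero.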
Chinese TV star Bi Fujian was caught on tape privately insulting Mao, which triggered a huge political debate on social media between leftists and rightists. China is presently stuck between a post-Mao era entering modern society, with limited freedom of speech (at least on private occasions), and a totalitarian government inheriting Mao’s legacy, hence the regulatory pressure that led to the star’s suspension from his job for 4 days. In Mao’s time, Bi’s remarks would have had him sentenced to death or life in prison.
The pop queen Teresa Teng passed away 20 years ago and her songs remain popular in the Chinese communities all over the world. Social media from Taiwan where she was born, from Mainland China, from Hong Kong, from Singapore, from Japan, from US and other parts of the world are full of all kinds of commemoration of her life and songs in Mandarin Chinese, Cantonese, Japanese and English. See the results of our multilingual text mining for how dearly she has been loved and remembered across generations of Chinese in Asia and around the world.
I showed the First Lady’s news pictures to my daughter. Tanya was so intrigued, “Dad, Mom told me that you used to teach First Lady many years ago, is that true?” “It is true, but that was only a short time, one or two semesters, and it was not her major subject. As a part-time lecturer, I was teaching Advanced English to graduate students in the music conservatory and she happened to be one in my class. She was already famous then as a new star for folk songs.” Tanya got excited, “Well, you never know, maybe her English training in graduate school helps her in state visits today. My Dad is cool.” She continued, “Dad, Mom also told me that you were interpreter for foreign minister when she dated you, is that true?” “Well, that was largely an accident, only happened once when I substituted for some professor to act as interpreter for the former foreign minister and former Chinese congress vice-chairman Mr. Huang Hua. Your Mom agreed to date me partially because of her seeing a picture of me interpreting for the VIP Mr. Huang. So I guess I benefited from that ‘accident’.” Tanya was amused and felt very proud, “I have the coolest Dad in the world. He was so successful even when he was young, teaching future first lady and interpreting for the then foreign minister. Wow.”
The personal story aside, Chinese social media have never been short of coverage and fans of Chinese First Lady Mrs. Peng Liyuan in the last few years. For too long China watched the western media covering first ladies in the US and other countries without being able to brag about its own. Since Mrs. Peng stepped into the spotlight and accompanied Chinese President Xi Jinping on world trips, Chinese netizens have been overjoyed to follow her all the way with compliments and amazement at her gracefulness. Mrs. Peng has been a star in the Chinese music industry for decades and knows how to present herself in public. A more recent story came from APEC last year, when the Russian president Putin was seen to stand up and gracefully place a blanket around the shoulders of the Chinese First Lady, so gentlemanly an act that it triggered waves of online comments.
Using our own text mining tool, we collected one year of Chinese social media data to see what the First Lady’s public image looks like. The result is overwhelming praise and admiration for her grace, intelligence and personality, with almost no negative comments. The only eye-catching criticism that was uncovered involves early days of Peng Liyuan “wearing fat army trousers (穿肥大的军裤)”, which seems not to agree with the first lady’s image in people’s minds. (It turned out that this was a story about the First Lady dating the president long ago, when she dressed down on purpose to test whether he was attracted only to her appearance. The story spread all over the net.) But look at the Photo News today: the First Lady is now leading China’s fashion trends.
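Recently, though, Clinton's campaign has been marking time, with no clear progress. Comparing Clinton's three circles: the lightest circle is the first 10 days of the past 30, clearly behind Trump; the latter two circles cover the most recent 20 days and sit in roughly the same place, only larger, which shows the campaign's investment and intensity have increased but without an obvious payoff. Looking at the trend in Trump's three circles, the old man's overall trend is actually downward: over the past thirty days, sentiment improved in the middle ten days but slid back in the most recent ten, although the buzz grew. (Damn, I can't take this analysis much further; the deeper I dig, the more nerve-racking it gets and the harder it is to stay calm, yet I am supposed to be a data scientist. A friend said, "the whole point is to dig up something nerve-racking", truly wishing for chaos.) Take a look at Trump's 30-day social media pros-and-cons word cloud (Word Cloud for pros and cons) and emotion word cloud (Word Cloud for emotions):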
“But the entrepreneur admitted that there were limitations to the data in that sentiment around social media posts is difficult for the system to analyze. Just because somebody engages with a Trump tweet, it doesn’t mean that they support him. Also there are currently more people on social media than there were in the three previous presidential elections.“
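Haha, rivals in the same trade are natural enemies. Can his AI match my I, which is backed by deep parsing of natural language? From the article, he focuses on engagement, which in essence is just topicality, or buzz. I said long ago that Trump is the king of topics and leads by far in buzz. (Just like Bingbing: the queen of topics ended up losing in public sentiment to Yuanyuan, the one sentiment favored, didn't she?) It is not a case of coders looking down on each other; what he is doing is largely attention-grabbing: everyone says Trump will lose, so I insist he is sure to win. Even if he turns out wrong in two weeks, his name will already be out there. The Trump team will also spare no effort to help publicize and repost it.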
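My dilemma now: should I publish 【Big data tells us Hillary's campaign is in trouble】 or not? For the sake of party interest and my anti-Trump stance, I should not: it would boost Trump's morale and deflate our side's. For the sake of a data scientist's professionalism, I should: starting from data and facts in everything is the foundation of the information age. A compromise is to first publish a rebuttal of that widely circulated claim that an Indian AI company predicts Trump will win. The survey window of that piece is about the same as the one I used earlier, which was Hillary's best month, yet they boldly predicted a Trump victory based on engagement alone; there is no spirit of deep data in that, just a gamble. Perhaps after I have debunked the fake AI and promoted real NLU, I will publish 【Big data tells us Hillary's campaign is in trouble】.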
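The FBI director said the reopened investigation will take a long time to sort out. For now there are only new leads that call for reopening it, which says nothing about whether Hillary is guilty. Stirring up a storm before any conclusion objectively adds uncertainty to the race. Although everyone should be presumed innocent until proven guilty, once rumors are in the air people cannot help being influenced. That is why the timing is what matters most. If there is a black box behind this reopening, it is all the more nerve-racking. If there is no hidden hand behind it and the email scandal's eruption at this moment is purely a coincidence with the discovery of new leads, then Hillary is simply out of luck and not destined for the throne. A lifetime of strong will, enduring hardship and swallowing humiliation, only to fall short at the last step and come back empty-handed, perhaps even facing prison. One can predict that losing the election would mark the start of her rapid aging.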
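Einstein unified time and space into the four-dimensional SPACE TIME model, showing that space and time are inseparable, that they interact, and even that spacetime curves, yet in practice he still gave time a special position. (Some other physicists do want to abolish that special position, holding that time, like the spatial dimensions, can run backward.) The later string theory (String Theory) proposes a 10+1-dimensional model of the universe, which likewise sets time apart. The 7 added dimensions are always worked out as extensions of space, visualized as extra dimensions in a microscopic world. Since the microscopic cannot be seen anyway, dimensions can be added at will in the imagination, as long as they add up to a grand unified theory (theory of everything).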
Chang uses the video to clarify the difference between TACC and Autopilot. TACC controlled the vehicle’s speed solely and did not have any warning apart from an icon on the central infotainment screen. Activating it demanded just a tap down on the right lever. Autopilot controls the steering and the speed and demands two taps on the same lever.
The Chinese government demanded that Tesla fix the operation of TACC because it could have a higher speed than the one at which the car was going. If the driver tapped down the right lever by accident—such as in a sharp turn—the vehicle could accelerate and lead to a “misjudgment of vehicle control” and crashes.
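A conceivable solution is to let users make targeted, location-based settings, for example remembering via GPS that a certain spot needs handling different from the model's. Mainly, Musk pushes so hard that there is often no time for such things right now; it is not that they cannot be done. For example, the air suspension on high-end Teslas can use GPS memory to apply a user-specified adjustable ride height and improve comfort, but the Model 3 and Y have no air suspension, so the ride is stiff and bumpy.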
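Speaking of the weak points of the Tesla Model 3 and Y, the bumpy ride definitely counts as one. There is a story here. Our family Tesla is named Big White (大白). We first bought a base Tesla Model 3 and named it Xiaobai (小白, Little White); after two days it felt bumpy, and since exchanges were unconditional within three days, we swapped it for a dual-motor long-range one. It looks exactly the same, with the same dimensions, but cost a few thousand more, hence Big White. In fact it is still bumpy, with a stiff ride. On good roads you feel nothing, but on bad ones (California roads are often in disrepair, and the pavement quality is nothing to praise) it feels just like when I learned to drive a walking tractor years ago. There is no cure: only Tesla's premium models S and X have air suspension and ride comfortably. But people are odd, and psychology plays a part: having paid a few thousand more, it somehow seemed less bumpy. Musk himself knows this perfectly well and suggested that customers let a little air out of the tires, saying that would soften the ride. What lousy advice: underinflated tires not only use more power and cut range, but also bring many other problems and even danger. Musk once said he would add air suspension to the Model 3, but after doing the math he found the cost pressure too high and went back on his word.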
A Taxonomy of AV Myths
Myth: AV software is a singular “AI” that simply learns to drive over time.
Myth: Creating an AV is just a matter of collecting a large amount of data and putting it into a neural net.
Myth: The company with the most data is necessarily in the lead.
Myth: It’s possible to enumerate every situation an AV will ever encounter. Creating an AV is then just a matter of experiencing each possible situation one time and adding it to the data set.
Myth: An AV can only use its sensors, or its map. It cannot use both at the same time. If the sensor input disagrees with the map, it is an irreconcilable problem and the AV can’t work anymore.
Myth: Mapping is extremely expensive (billions of dollars) and/or time-consuming (years).
Myth: If an AV uses a map, then it cannot handle construction zones or other changes.
Myth: If an AV uses a map, then it cannot ever operate in unmapped areas.
Myth: If an AV uses a map, then it’s just a tram running on virtual rails.
Myth: Roads are designed for vision, so other sensor modalities like lidar and radar are useless.
Myth: Lidar uses a huge amount of power.
Myth: Lidar is so power-hungry that it can’t be used on a battery electric vehicle.
Myth: Computer vision is as good as lidar, so lidar is useless.
Myth: Humans drive just fine with two eyeballs, so other sensor modalities (radar and lidar) are useless.
Myth: Once your AV sort of works for a few miles at a time, it’s an easy process to improve it to superhuman reliability. It’s just the March of Nines, which requires nothing but time, or more data.
Myth: If an AV completes a trip without an intervention or disengagement, then it was L4 (or L5) for that trip.
Myth: It is possible for individuals to observe an AV system over the course of their typical personal driving needs and declare the system universally safe.
Myth: The trolley problem is of fundamental importance to the design of AVs.
Myth: AVs can’t work unless we put sensors or beacons in all the roads, and/or make all the cars talk to each other wirelessly.
Myth: Simulations are useless.
Myth: It’s all just a matter of finishing the software.
Myth: Regulations are the only thing holding back AVs.
Myth: Humans have to be sacrificed today in order for an AV system to improve, so it will eventually save lives.
Myth: This is a taxonomy, and not just a litany.