【李白董冯吕64:NLPers 谈 NLP 渊源及其落地】

董:
冯老师,姜博士,李维,白硕,宋柔老师,这个系统正式上线前,想先请各位看看,横挑鼻子竖挑眼。这个系统是去年6月开始开发的。时间短。最近几个月更是忙得厉害

李:
刚发朋友圈了。“中国nlp老前辈董老师的知网支持的平台 值得关注 推荐。世界上自然语言理解的深度 董老师是最深的了。逻辑语义的开创者。三十多年的智慧和知识积累 董振东老师是 让我辈高山仰止的语义巨人(见 科学网《语义三巨人》)。【语知科技】多语种NLP平台正式上线。demo.keenage.com”

冯:
董老师,语义理解,还是要依靠规则。深度学习不行!

李:
Manning 教授昨天座谈时说 最近三年是他一辈子做nlp感觉进步最大的三年 他主要指的是深度学习。曼宁是一位一直强调语言结构和理解的老教授 NLP最知名的权威了。他的感受应该是真切的 不过来不及细问他 这种感受多大程度上是基于深度学习在语音处理以及mt方面的突破性进展,文本方面其实目前很难说深度学习引发了革命。不过 word embedding 还有什么 adversarial 学习方面 开始在词汇语义级发力 有些结果令人印象深刻。parsing 要等到深度学习能把 parsing 落地为应用 才值得侧目以待 目前不行。曼宁教授还是很学究 甚至有些腼腆的气质 现在火得不得了 也是时势使然。ai 一热 nlp 就热。病急乱投医,nlp各路 也跟着提升了在ai中的地位,I guess。

董:
@李,你说的让我脸红了。我在研究上是个工匠,做学问认死理。我研究语义,是叫当年的机器翻译研究逼出来的。什么是理解?什么是常识?什么是知识库?人的知识是如何建构的?我后来的感觉是:人是用“少”,而非用“多”来计算语义的。你常说起的Cyc,是“多”的典型。语义的关键是“关系”,而分类只是语义关系的一种。近年我们开发了基于知网的翻译,最近一年开发了中文分析,从学术的观点看,是为了考验知网,是为了给自己30年前的设想做个交代。这个交代既是给自己的,也是给别人的。告诉人们哪些努力是值得的,哪些努力是仍然无法圆满成功的。

李:
董老师退而不休 能够完成心愿 给我们留下的这笔知识财富 我们需要时间咀嚼 消化 但我坚信这种影响是深远的。潮起潮落 有些东西是不变的。语义及语义研究的一些方法 具有相对恒定的价值 好比金子 总会发光 这个没有疑问。ai 中真正懂 nlp 特别是 nlu 的人不多,好多声称nlp的专家 只是做过自然语言的某个端对端的应用 专家做 text NLP 与他做 visual 做 audio 做生物 DNA 是一样的路数和算法,不过是数据不同而已。没有啥语言学。

白:
佩服董老师的执着和务实。如果说哪里还差点什么,就是对于“多个爹”的刻画机制问题。

董:
白硕说得对。“多个爹”是我们想做的新探讨。我们遇到两个问题:一是是不是个“爹”,二是如何在文句中正确地确定那个“爹”。

冯:
有的做自然语言理解的人不关心语言学。词向量效果不错,理论机制不清楚。

李:
戏不够 词来凑。语言的分析理解主要有两个支柱,一个是词汇 吕先生称为珍珠;一个是结构 称为串子。传统说法是没有串子 做不了语言理解的项链。

白:
@冯志伟 词向量是保运算的降维,数学上是清楚的,但是跟理解搭不上钩。
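
白老师说的“保运算”,可以用一个人为构造的玩具例子来示意(向量与词表均为假设,真实词向量是在大语料上训练出来的高维向量):

```python
# 玩具示意(假设):词向量空间近似保持语义关系的线性运算,
# 即 v(国王) - v(男) + v(女) ≈ v(王后)。这里的三维向量为人为构造。
vec = {
    "国王": (1.0, 1.0, 0.0),
    "男":   (0.0, 1.0, 0.0),
    "女":   (0.0, 0.0, 1.0),
    "王后": (1.0, 0.0, 1.0),
}
q = tuple(a - b + c for a, b, c in zip(vec["国王"], vec["男"], vec["女"]))
# 真实系统是在全词表中找与 q 余弦距离最近的词;这里构造使然,q 恰好等于 v(王后)
```

这也正好对应白老师的判断:运算本身是清楚的,但运算结果与“理解”之间并无必然联系。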

李:
所以我们这些擅长 deep parsing 的人就开始看不起只看到珍珠的人,但是二者的 overlapping,从完成nlp任务的角度 作为两种证据源 其实是相当地大。因此善于把词模型 词向量模型用到极致的人,有时候的确产生了让我们意想不到的结果。

冯:
珍珠和串子是缺一不可的!

白:
如果每颗珍珠都有插销插座,确实可以不用串子。或者说,寓串子于珍珠之中。

李:
我的意思是 我们多少有些老古董了。虽然可以继续执着下去 也的确可看到一些我们擅长的nlu工作 词模型似乎根本就没有可能赶上来。但还是应该保持一种 open 心态。

白:
开开脑洞还是必要的。
词负载结构是好东西。

冯:
我们要关注词向量的成果,更要问一个为什么?

李:
对 应该探究这种表达的背后。

白:
设想回到老乔刚冒泡的年代,那时的语言学家怎么看老乔?

李:
其实我最早读乔姆斯基的转换生成语法介绍的时候,非常看不起,觉得就是儿戏。那还是 1982 年备考语言研究所刘老师的机器翻译研究生的时候,不得不临时抱佛脚,了解一些计算语言学的基本概念。此前我做过多年英语教学(包括插队做民办教师),从做中学生的时代起就教别人英语句法分析,主动语态被动语态等转换烂熟于心。觉得老乔演示的那些转换生成案例,太低级 太常识 太机械 乏味,太不具有神秘感。直到后来学形式语言理论 才生出崇拜感。

白:
说实话,如果知识结构跟不上老乔,恐怕根本没有跟老乔PK过招的可能。

李:
人岁数大以后的一个好处是,可以直抒己见,没有啥顾忌。我其实在读博士做汉语形式研究的时候,就对乔老爷的句法独立句法自足论极为不满,就在 seminar 指出乔老爷错了,离开语义,汉语句法形式分析搞不定。结果被我的导师劈头盖脸一顿轻蔑,大意是:你不知道天高地厚,一边去。被噎得哑口无言。很多年过去,我还是发扬光大了当年的批判。

冯:
我在57年读到乔的三个模型就开始崇拜他了!

李:
是啊,一个理论把自然语言与电脑语言形式上统一起来,使得电脑语言编译越来越像语言学,这种抽象让人震撼。震撼过后的很多年 就是对乔老爷的不断扬弃和批判,批判的主要点还是源于他的抽象:是赞也抽象,批也抽象。的确 他把电脑语言带进了语言学,居功至伟。同时他也把自然语言带进了坑里面,误导了太多的人 整整一代人。(详见:《立委:乔姆斯基批判》)

白:
@wei 你那不算是语言学的批判,只是工程视角的批判。他是语言学家,但从未承诺过NLP什么。NLP掉坑里也是自作多情。

李:
我们下意识还是 认定他应该引领nlp 和 cl,结果是他越走越远 越走越邪门,nlp 已然与他无关了。

白:
他不管工程,不管技术,只管数学。

李:
结构分析中的叠床架屋 使得过程中夹杂了太多的 assumptions,看上去高度抽象 追求共性 实际上是越来越像空中楼阁。当然 我肯定戴了有色眼镜,做了一辈子nlp 对纯语言学很难批评得中肯 只是一种感觉而已。老乔的语言学 对于绝大多数NLP践行者包括在下,都是供在菩萨庙里面的 只膜拜 不 follow。

白:
如果从工程角度批判,估计人家看都不看:关我什么事。

李:
老乔的思维高度自然不看 也不用看 这种批判。但是老乔下面的语言学家我认识很多,我就是这堆人里面混出来的,从他们身上我能感觉到他误导的后果。这些人很多时候就是在老乔的框架里面 自己跟自己玩游戏 没有理论创新 只好在语言数据上玩游戏,而且是一点都不感觉高明的游戏。说的是一批 或一大批语言学家。(也有一些绝顶聪明的纯语言学家让我叹服的,为数极少。)

洪:
做计算机编译的,没人认为老乔误导。Knuth和老乔貌似惺惺相惜。

吕:
赞@wei , 很多看法深有同感

李:
编译的理论基础 编译的祖师爷,电脑界理应崇拜 给10个图灵奖也不过分。当然 乔老爷哪里在意什么图灵奖。@吕正东 有机会咱俩坐下来谈。你最近的大作(见 独家|专访深度好奇创始人吕正东:通向理解之路)中我最不满意的就是一句话:说什么 符号逻辑规则路线没有成功的(大意,查原文是:“这三点都导致至今没有成功的规则系统”)。我得让你见识一下符号系统:目前没有任何一家深度学习可以做到这个NLU的水平,无论深度 广度 速度 鲁棒 迁移度 可行性 还是其他指标(It is untrue that Google SyntaxNet is the “world’s most accurate parser”)。

吕:
@wei 惭愧,改日一定当面请教。

白:
说这些其实是在以史为镜。今天语言学界看DL、看词向量的心态,跟当初老语言学界看老乔的心态,有没有几分相似?

吕:
我的意思是说规则系统很难做到我所期望的NLU,不是说现在最好的规则系统弱于DL的系统。当然我对规则系统确实了解不够(现在正在补课),不免贻笑方家。

李:
不知道你的期望是什么。如果期望是现实的,很可能已经接近你的期望,如果期望是科幻,不谈。开玩笑了。王婆卖瓜而已。

吕:
我那篇访谈其实更多的是反对generic DL system 搞定一切的天真想法…

董:
@吕正东 你所期望的NLU,能否举个例子。

白:
里面有些模块可以是神经的,这有啥。

李:
所以我说我其实只有一点不满。你的访谈很好。

吕:
当然是现实的… 我们有现实的语义理解的项目

李:
@白硕 前乔姆斯基时代的老语言学界,陷入了田野工作的泥坑,是老乔把他们带出来的,革命了这个领域。纯粹的田野工作的确也是没大意思 比码农好不到哪里去。

吕:
@董振东 董老师,比如从一个偏口语的对事件的描述中得到对该事件的(“法律相关”)事实的表示…. , 当然这个定义是不那么严谨的

白:
其中一些方法,包括《降临》主角跟外星人沟通并试图破译其语言的一些方法,其实和主动机器学习很像了。

李:
一辈子也常陷入事务主义 没完没了地田野作业 自得其乐,但好在自我感觉好像心里还有某种哲学的俯视。有如神授:在田野工作的间歇 在某个高远的所在 有某种哲学指引着道路。我是 语言工程师 knowledge engineer 的一员 毫无疑问。而且90%的时间都是。但是一辈子感觉这些田野作业的乐趣的本源却不在田野,而是在于架构。所以自我定义为架构师是最感觉自豪和 job satisfaction 的所在,否则与一头驴有啥区别。

白:
就是说,理想的田野工作一定是遵循某种算法的。而且算法不仅包括学习,还包括主动采样。

李:
所以在自我兜售的时候,强调 hands on 的田野作业,只是不想让人觉得飘在上面。但实际上卖的还是哲学。

Nick:
@wei 你就是自作多情

李:
我就自作多情 怎么着,你一边去 给冰冰多情去。@Nick  还想垄断哲学,搞什么哲学评书,不许我们搞哲学。王老五的桌子里面还有哲学呢,何况我辈语言学家。

董:
《福州晚报》7月15日报道,针对日前在日本横滨被证实遭杀害的福建姐妹一事,记者了解到,两姐妹均为福清江镜镇文房村人。
该报记者采访了该对姐妹花的父亲陈先生,陈先生回顾了得知姐妹被杀害的过程,并称女儿对父亲说的最后一句话是“谢谢爸爸”。

这一段新闻,nlu 是什么呢?

吕:
@董振东 好难… , 实际上我们关注的是更加“冷冰冰”的事实,但即使这样也很难

白:
这里最大的问题,就是产品经理。

李:
同意,应用场景和应用角度 做技术的人很难看准。

白:
nlu是一层,但不构成核心服务。核心服务是另外的东西,让你贴近客户的东西。
相对称呼对身份一致性形成干扰,但相对称呼的谜一解开,倒也不是很难。姐妹花、姐妹,语境里的意思都是互为姐妹。

董:
我一直困惑:什么叫“我懂了”,“我明白了”。我觉得是高度抽象的关系。

李:
董老师30年前的论文(董振东:逻辑语义及其在机译中的应用)不是一再强调,所谓我懂了这句,核心就是懂了这句的逻辑语义吗?董老师的这个“理解”的教导,是我一辈子遵循的指针。

白:
这得举例子吧……几何题的证明思路也可以“我懂了”“我明白了”,确定那也跟nlu相关?

董:
“姐妹花”,作为一个词语,可能合适。因为它就是“姐妹”,而且不见得能产,如“母女花”。

白:
“母女花”输入法里都有

李:
婆媳花 可能不在,但可以想见。

白:
我刚才意思是说,懂,明白,具有比nlu更宽泛的外延,nlu里面说的懂、明白,要窄、狭义。

董:
这样就可以依靠大数据了。对吧?
比“懂”、“明白”,要窄、狭义,那是什么呢?我如何在系统里体现呢?

白:
我们先说逻辑语义包括什么。我的观点:一包括symbol grounding,二包括role assignment。这两个搞定了,就是nlu的u。最狭窄了。茅塞顿开什么的,那种“懂”,跟nlu毛关系没有。可以说不在讨论范围内。在系统里体现,如果是role assignment,其实很好办,就是知网啊。如果是symbol grounding,那就要看系统的对接能力了。对接电话本、位置、天气、颜色、声音、实体知识库,都属于symbol grounding。
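
白老师说的这两件事,可以用一个极简的数据结构示意(谓词、角色与实体 ID 均为假设的例子,仅为说明概念):

```python
# 极简示意(内容均为假设):NLU 之 “u” 的两个组成部分
understanding = {
    # role assignment:谓词-论元的逻辑语义角色
    "role_assignment": {
        "predicate": "查询",
        "roles": {"agent": "用户", "theme": "天气", "location": "北京"},
    },
    # symbol grounding:语言符号对接到实体知识库或系统能力的 ID
    "symbol_grounding": {
        "北京": "city:beijing",   # 假设的实体库 ID
        "天气": "api:weather",    # 假设的对接能力
    },
}
```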

吕:
怒赞白老师1024次

白:
对接网页,往好里说属于兜底,往坏里说属于耍赖。一看见对接网页,我基本上可以判断系统黔驴技穷了。这都是在系统里能体现的,不知道入不入董老师法眼。

董:
是的,说得明白。我们的中文分析归根结底追求的就是你说的这两件东西。用逻辑语义和深层逻辑语义(多个爹),表示你说的role assignment,用ID No来落实实体知识库的symbol,即概念。所以判别歧义是不得不做的事情。

白:
我现在的方法处理“多个爹”已经成体系了。直接在句法分析阶段就能拿到“多个爹”的结构。

李:
Node to concept,Arc to logic semantics。很多时候 词到概念可以不做,wsd 绕过去,到了应用场景 再定 哪些词需要落地 其实多数根本就不用落地。

白:
这是parser提供商的思路。但是这思路在商业上有问题。不现实。比如,阿里的parser,为啥给京东用来落地?

李:
商业上就是 落地也做 当成 Professional services,量身定制,利用 parsing 的结构优势。 parser 不卖,内部消化。

白:
内部消化的本质还是深耕行业,而不是什么通用性。当你定位为深耕行业者,恭喜你做对了,但是牺牲通用性是板上钉钉的。

李:
卖components或平台基本没有做大的,还不如卖服务 做承包商。

白:
卖服务我判断也是不成立的。场景不长在你手里,实体数据库落不下来,图啥?跟通用系统如董老师的系统,根本不存在可比性。

李:
其实 目前为止 卖工具 卖服务 都没戏。实际上nlp还是寄生在产品应用。技术人的命运掌握在产品老总手里。赶巧遇到好产品 就一起飞。其次不死不活 这算好的。更多是陪葬。

白:
这不叫寄生,应该叫赋能 enabling

李:
一个牛的技术 可以降低陪葬 增加不死不活的可能  譬如我过去的二十年,但是无法让产品飞起来。

白:
除了产品经理,还有nlp之外的其他技术,也不是吃素的。到底谁贡献最关键,有得扯

李:
就是啊 使不上力 只好拼运气 看跟谁搭档了。

白:
深耕行业定位下的nlper要时刻警觉的三点:1、你已有局限,不要和学术界比通用;2、你只是一个方面的enabler,服从、配合产品designer是天职;3、从产品全面看,其他方面的enabler或许贡献更大,nlp不见得一定是这个场景下最具杀手性质的技术成分,心理该平衡还是要平衡。

李:
让不让人活啊。
白老师所说极是。

吕:
谨记白老师教诲

刘:
赞 @白硕 @wei @吕正东 白老师关于nlp应用要语义落地的说法我深以为然。现在我的一些工作的motivation就是把基于NN的一些NLP的工作跟落地的语义结合起来,我希望能在这方面做一些通用性的工作,而不是仅仅局限于具体的应用。这应该是一条很长的路,有很多事情可做。

梁:
@wei nlp 不是“寄生”于产品,是“add value”, 正面去说。在最终产品的增值链上,nlp 只是其中一环。Me too. 谨记白老师教诲。

李:
寄生是负面说法 赋能是正面激励。

张:
主动学习、主动适应、主动采样~~~白硕老师的“主动学说”给当年在MT创业的我极大启示@白硕

董:
关于研究与产品、通用与专用、寄生与赋能,我的想法和做法是:这些是对立且统一的。研究要通用些,要深,而应用要专门,要浅。研究可以关起门,应用就必须是开门的。这就是“深研究,浅应用”。就像一个老师,备课不怕深、广,但讲课要深入浅出。我经历过多次处理研究与应用间的关系的机会。

白:
@董振东 董强这次展示的系统,感觉还是学术性质的。@董振东 “备课”工作的一部分,确实可以用大数据、机器学习来做,全都由人来做,周期太长,对灵魂人物的要求太高。

董:
@白硕 欢迎多多指教。我们希望有人帮助我们走向非学术性的。

白:
伟哥说大树吃小树,这话对的,不过涉及到时空的错乱。是吃了小树的树自然而然地成为了大树。

洪:
应该这样“全国猪人工智能受精关键技术研讨会”

吕:
猪AI会议可以撸出小猪来,不少国内的人AI会议,只能撸出更多的傻叉和骗子…..

冯:
近年来计算语言学的发展迅速,越来越工程化,文科背景的师生有的难以适应,海涛和他团队明察秋毫,迅速由计算语言学转入计量语言学,用计量方法来研究语言本体,这是聪明的做法。希望他们在这个研究方向上作出更多的成绩。海涛是我的博士生,浙大外国语学院教授。他是院士,世界语研究院院士。我仍然坚守在计算语言学的阵地上,没有转入计量语言学。我老了,不中用了!

李:
白老师说的大树吃小树 背景在这里:《科研笔记:NLP “毛毛虫” 笔记,从一维到二维》

冯:
学习了。毛毛虫有道理。

李:
@冯志伟 洗脚池转文总是漏掉后面的 相关文章 现补上白老师这篇奇文:【白硕 – 穿越乔家大院寻找“毛毛虫”】

 

【相关】

董振东:逻辑语义及其在机译中的应用

立委:乔姆斯基批判

独家|专访深度好奇创始人吕正东:通向理解之路

科学网《语义三巨人》

尼克:哲学评书

科研笔记:NLP “毛毛虫” 笔记,从一维到二维

白硕 – 穿越乔家大院寻找“毛毛虫”

It is untrue that Google SyntaxNet is the “world’s most accurate parser”

【语义计算:李白对话录系列】

中文处理

Parsing

【置顶:立委NLP博文一览】

《朝华午拾》总目录

【从 sparse data 再论parsing对信息抽取的核心作用】

parsing关键是它节省了语用层面的开发。没有parsing,抽取是在表层进行,存在的困境是 sparse data 和长尾问题。表层的东西学不胜学,而有了 deep parsing  的支持,抽取规则可以以一当百,至少从规则量上看,这绝不是夸张。这是其一。

其二,deep parsing 使得领域移植性增强太多。

没有 parsing 抽取任务变了,一切须推到重来。

对于规则体系,有了 deep parsing,抽取任务随领域变了就不需要那么大的返工。parsing 吃掉了约 90% 的重复劳动(语言知识和结构本质上是跨领域的),返工的部分不到 10%。

parsing 意义之重大 正在于此。

对于机器学习,NLP应用的知识瓶颈在 (1)sparse data;(2) 任务变,训练库必须重新标注:前一个任务的标注对后续任务基本没有可重复使用的可能,因为标注是在语用层进行的。

如果有 parsing support,理论上讲,机器学习可以更好地克服 sparse data,但实践上,到目前为止,结合 structure features 和 keywords 在机器学习中一锅煮,目前还处于探索研究阶段,没有多少成熟的案例。我们以前尝试过这种探索,似乎parsing的参与有推进系统质量的潜力,但是还是很难缠,模型复杂了,features 混杂了,协调好不是一件容易的事儿。

事实上,规则体系做抽取,没有 parsing 差不多有寸步难行的感觉。因为人的大脑要在语言表层写规则,数量太大,写不过来。只有机器学习,才可以绕开parsing去学那数量巨大的抽取规则或模型,但前提是有海量标注的训练集。否则面对的是 sparse data 的严重困扰。

sparse data 远远不是单指表层的出现频率低的 ngrams (习惯用法、成语等)的累积,那种 sparse data 相对单纯,可以当做专家词典一样一条一条编写,愚公或可移山。如果训练数据量巨大,譬如机器翻译,那么这类 sparse data 对于机器学习也有迹可循。当然大多数场景,标注的训练集始终大不起来,这个知识瓶颈 is killing ML。

更重要的 sparse data 是由于缺乏结构造成的,这种 sparse data 没有parsing就几乎无计可施。表层的千变万化,一般遵循一个正态分布,长尾问题在结构化之前是没有办法有效捕捉的。而表层的变化被 parsing 规整了以后,表层的 sparse 现象就不再 sparse,在结构层面,sparse patterns 被 normalize 了。这是 parsing 之所以可以称为NLP应用之核武器的根本。
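
“表层 sparse、结构不 sparse”这一点,可以用一个极简的对比来示意(例句到三元组的映射在这里是手工假设的,真实系统由 deep parser 产出):

```python
from collections import Counter

# 假设 deep parser 把下列表层变体都规整为同一个 SVO 逻辑三元组
parsed = {
    "微软收购了Powerset":   ("微软", "收购", "Powerset"),
    "Powerset被微软收购":   ("微软", "收购", "Powerset"),
    "微软对Powerset的收购": ("微软", "收购", "Powerset"),
}
surface_counts = Counter(parsed.keys())      # 每个表层串各只出现一次:sparse
structure_counts = Counter(parsed.values())  # 同一结构 pattern 累计三次:不再 sparse
```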

没有 parsing,结构性 sparse data 就玩不转。

乔姆斯基纵有一万个不是,一千个误导,但他老人家提出的表层结构和深层结构的思想是不朽的。parsing 就是吃掉各种表层结构,生成一个逻辑化的深层结构。在这种深层结构上做抽取或其他语义语用方面的应用型工作,事半功倍。

Deep parsing consumes variations of surface patterns, that’s why it is as powerful as nuclear bombs in NLP.

别说自然语言的语句的表层多样化,咱们就是看一些简单的语言子任务,譬如 data entity 的自动标注任务,就可以知道表层的 sparse data 会多么麻烦:如 “时间”的表达法,再如“邮件地址”的表达法,等等。这些可以用正则表达式 parse 涵盖的现象,如果在表层去用 ngram 学习,那个长尾问题就是灾难。
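
以“时间”和“邮件地址”为例,一条正则 pattern 就能覆盖表层的大量变体(下面的 pattern 是简化示意,真实系统的覆盖面要宽得多):

```python
import re

TIME_RE  = re.compile(r'\b\d{1,2}:\d{2}(?::\d{2})?\b')   # 如 9:30、21:05:10
EMAIL_RE = re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.]+\b')

text = "Meet at 9:30 or 21:05:10, email foo.bar+nlp@example.com"
times  = TIME_RE.findall(text)    # ['9:30', '21:05:10']
emails = EMAIL_RE.findall(text)   # ['foo.bar+nlp@example.com']
```

换成表层 ngram 去学,上面两条 pattern 所涵盖的变体就得一个个积累,长尾没有尽头。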

自然语言文句之需要 parsing,与标注 data entity,正则表达式优于 ngram 学习, 其道理是相通的。

原载:《泥沙龙笔记:从 sparse data 再论parsing乃是NLP应用的核武器》

《立委科普:关键词革命》 

《李白毛铿锵行: 漫谈中文NLP和数据流》

【自然语言parsers是揭示语言奥秘的LIGO式探测仪】 

《创新,失败,再创新,再失败,直至看上去没失败》

科学网—乔姆斯基批判

【置顶:立委NLP博文一览】

《朝华午拾》总目录

 

【泥沙龙铿锵行:再论NLP与搜索】

李:上次提过,先搜后parse,是可行的。

早在十几年前,AskJeeves 被华尔街追捧。这里面有很多IT掌故,我专门写过博文(【问答系统的前生今世】【金点子起家的 AskJeeves 】)。当时NLP (Natural Language Processing) 红透半边天,下一代 Google 呼之欲出的架势,尽管AskJeeves其实NLP含量很低。他们不过利用了一点浅层NLP来对付问题的分析。这才有我们后来做真正基于NLP的问答系统的空间。

就在AskJeeves上市的当天,我与另一位NLP老革命 Dr. Guo,一边注视着股市,一边在网上谈先search后parse的可行性。此后不久我的团队就证实了其可行,并做出了问答系统的prototype,可以通过无线连接,做掌式demo给投资人现场测试。当年还没有 smart phone 呢,这个demo有wow的效果,可以想见投资人的想象被激发,因此我们顺顺当当拿到了第一轮一千万的华尔街风投(这个故事写在《朝华午拾:创业之路》)。

问答系统有两类。一类是针对可以预料的问题,事先做信息抽取,然后index到库里去支持问答。这类 recall 好,精度也高,但是没有 real time search 的灵活性和以不变应万变。

洪:文本信息抽取和理解,全靠nlp

李:另一类问答系统就是对通用搜索的直接延伸。利用关键词索引先过滤,把搜罗来的相关网页,在线parse,on the fly, 深度分析后找到答案。这个路子技术上是可行的。应对所谓factoid 问题:何时、何地、谁这样的问题是有效的。(但是复杂问题如 how、why,还是要走第一类的路线。)为什么可行?因为我们的深度 parsing 是 linear 的效率,在线 parsing 在现代的硬件条件下根本不是问题,瓶颈不在 parsing,无论多 deep,比起相关接口之间的延误,parsing 其实是小头。 总之,技术上可以做到立等可取。

对于常见的问题,互联网在线问答系统的 recall 较差根本就不是问题,这是因为网上的冗余信息太多。无论多不堪的 recall,也不是问题。比如,问2014年诺贝尔物理奖得主是谁。这类问题,网上有上百万个答案在。如果关键词过滤了一个子集,里面有几十万答案,少了一个量级,也没问题。假设在线 nlp 只捞到了其中的十分之一,又少了一个量级,那还有几万个instances,这足以满足统计的要求,来坐实NLP得来的答案,可以弥补精度上可能的偏差(假设精度有十个百分点的误差)。
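
冗余弥补精度偏差的逻辑,可以用一个最简的多数表决骨架来示意(候选答案列表是假设的,真实系统里它们来自在线 parsing 的抽取结果):

```python
from collections import Counter

def vote(candidates, min_support=2):
    """对抽取出的候选答案做多数表决:冗余坐实答案,稀释个别抽取误差。"""
    if not candidates:
        return None
    answer, support = Counter(candidates).most_common(1)[0]
    return answer if support >= min_support else None

# 假设从若干相关网页在线抽取的候选答案(含少量误差)
candidates = ["答案A", "答案A", "答案A", "答案B", "答案A", "答案C"]
vote(candidates)  # 多数票胜出:"答案A"
```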

IBM 花生机器在智力竞赛上 beat 人, 听上去很神奇, 里面会有很多细节的因应之道,但从宏观上看,一点也不神奇。因为那些个竞赛问题,大多属于 factoid 问题,人受到记忆力有限的挑战,肯定玩不过机器。

雷:@wei 为什么说事先对材料进行deep parsing的搜索不灵活?

李:事先(pre-parsing)更好。我是主张建立一个超级句法树库的,资源耗费大。但急于成事的工程师觉得也没必要。在线做的好处是,内容源可以动态决定。

雷:假设一下,我们把谷歌拥有的材料通通进行了deep parsing,那么这个搜索会是什么样的? 再辅佐以人工的高级加工

李:nlp parsing 比关键词索引还是 costs 太大。

雷:是,但是现在硬件的条件下,还是可行的吧?那就是把信息转化为了fact的知识

李:是的,哪怕只是把 Google 网页里面的百分之一 parse 一遍那也有不得了的威力。那是核武器。就是 Powerset 的 Ron 他们当年绘制的图景。可是这种大规模运用NLP不是我们可决定的,成本是一个大因素,还有就是观念和眼光,那是 Norvig 这样的人,或其上司才能拍板的。

雷: 暂时局限在一个领域呢?

Nick:可以先小规模吗,如wiki等?

破坏google的力量是semantic web. 如果每个网站使用的是semantic web,who needs google, 但是现在的问题是把一个web2.0的site转化为web3.0的成本

李:Wiki已经可行,Powerset 当年就是拿它展示的。但市场切入点呢? Wiki其实是小菜,比起我们目前应对的 social media, 无论是量,还是语言的难度。

Nick:但wiki有结构

李:做wiki技术上没有任何问题。问题在产品和business model。

Nick:做一个wiki的语法树,再叠加wiki的结构,已经很有用了。

wiki 到 dbpedia 还是只有很低的percentage吧?

李:Ron 当年游说你们和微软,不就是wiki么,其实他们的demo,纯粹从技术的角度完全可以通过 due diligence。

大家都知道知识挖掘,在大数据时代有巨大潜力,这是宏观上的认识,永远正确。微观层面,还是要有人在知识基础上做出可挣钱的产品来。微软买Powerset的时候,肯定也是基于这种宏观认识。但没有后续的产品化,买来的技术就是个负担。

RW:Google 是靠se抓流量,然后ads赚钱,Se技术本身不变现

Nick:@wei powerset我看过,not impressive at all

李:那是因为,你的角度不同。他们没有把那种结构的威力,用通俗的方式,做成投资人容易看懂的形式。我也玩过 Powerset,它的核心能力,其实是有展现的。不过要绕几道弯,才能发现和体会。方向上他们没错。

当然我不是为 Ron 唱赞歌,他再牛,再有名气,他的parser比我的还是差远了。这个世界上 yours truly 是第三 — 如果上帝是第一,在下的下一个系统是第二的话。

当然吹这种牛是得罪人的,不妨当笑话看。

呵呵,不用上税,无妨的

Nick: 你的不好意思不得罪人

李:Jobs不是说过,只有疯狂到以为自己可以改变世界的,才能在雪地里撒尿,并留下一些味道或痕迹。我们是毛时代生人,自小有一种精英意识。天将降大任于斯人也,自己吃不饱,也要胸怀世界,解放全人类。老子天下第一的心态就是那种legacy。

Chris Manning前两年就跟database/information retrieval的辩论说,别啥啥fact db和information extraction,直接deep parsing齐活。

:@洪 我农民,东西放哪里啊

李:Parsing real time 的应用场景,东西放内存就可以了,用完就扔,用时再来,现炒现卖。当然那个做不了真正意义上的text mining,只见树木,难见森林。但可以应对搜索引擎对付不了的简单问题。

哇哈,不得了。改不改变世界且不说,我的作息时间先被改变了。

我以为做机器学习的人在在豪气冲天,原来@wei也是!

@雷 一个爱在雪地……

@雷 Chris Manning的意思是,all information is in deep parsed text

facts不就是来源于deep parsed text吗

facts are usually triples extracted from text with consensus。

: under a set of ontologies, these facts form a network, that is, linguistic factors are removed。

db & ir people don't really believe nlp is a must path for retrieval tasks

you are right. This is why wei made such big efforts here to point out the problems of those guys.

linguistic info is transparent to native human speaker, but I don’t think it’s transparent to computer. So, I believe in communicating with machine, or communicating with people through computer, simpler language in query or logic form should be better. Why do we want to make computer understand human language? It doesn’t make sense at all.

李:洪爷说的是哪国话? 本来就不存在机器理解语言, 那个 NLU 只是一个比喻。其实也不存在人工智能,那也是个比喻。

现在大多数人可不把ai/nlu当比喻

李:所谓机器理解语言不过是我们模拟分解了分析理解的过程达到某种表达 representations,这种表达是达到最终任务的一个方便的桥梁,如此而已。

按你的说法,机器人过不了turing test 这一关

李:我是回应你为什么要让机器理解语言。回答是,从来就不需要它去理解。而是因为在人类从形式到内容的映射过程中,我们找到一些路径,沿着这个路径我们对人类的理解,似乎有一个说得过去的解释。

当然,那位IR仁兄说的其实是一个具体环节, 指的是搜索框,他说好好的搜索框,给几个关键词就可以查询,既快又好又简单,为什么要把搜索框变成一个自然语言接口,像以前的AskJeeves那样,让人用自然语言提问,然后逼迫机器去理解?从他的角度,这完全不make sense,这种感觉不无道理。明明不用自然语言,多数搜索任务都可以完成得很好,没有道理硬要与机器说“人话”,增加overhead, 还有机器理解过程中的误差。关键词蛮好。互联网搜索这么多年,我们用户其实也被培养出来了,也都习惯了用尽可能少的关键词,以及怎样用关键词的不同组合,容易找到较理想的结果。自然语言接口似乎没有出场的必要。

可是,这只是问题的一个方面。问题是关键词搜索也许可以解决80% 乃至 90% 的基本信息需求(只是基本,因为心中问题的答案还是需要人在搜索结果中去用人脑parse来确定,这个过程不总是容易轻松的)。但还有相当一部分问题,我们或者难以用关键词找到线索,或者找出来的所谓相关网页需要太多的人肉阅读还不能搞定。这时候,我们可能就会想,要是有个懂人话的机器,自动解答我们的信息问题多好啊。自然语言接口终究会以某种形式重回台面,增强而不是取代关键词的接口。

:理解就是 1.能在人与人之间当二传手;2.能根据自己存储的知识和具备的行动能力做出人所认可的反应

李:说白了,就是从线性的言语形式到语法树的映射。这是人类迄今最伟大的发现或发明,这个理论属于最高天机。人类还没有更好的理论来解释这个理解过程。这个建树的过程,赶巧可以程序化来模拟,于是诞生了 NLU。

:在图灵测试中,我们是把机器看成黑盒子。但是要让机器通过图灵测试,它就得理解人的语言才能作出反应。 两位大侠,能否推荐几本书看看?最好是科普类的,看着不吃力。

李:洪爷,不能因为在某些语言任务上,没有语言分析,也做到了,就来否定语言分析的核武器性质。LSA根本就没有语言分析,但它用到给中学生自动评判作文方面,效果也不错。

最近重读了几本认知方面的旧书,我倾向于认为人的内部表征是一种imaginary的多维图式表征,linguistic system只是个人际交流的接口。把多维信息压到线性。让计算机理解小说诗歌,估计永远做不到,因为计算机没有人那么强大的imaginary内部表征。@毛 wei和我一起来推荐几本nlp方面的书,就像PDP一样经典

:@wei 句子的语意理解后的表征方式是什么?还是tree吗?

李:逻辑语义,这是董老师的表述。外面叫 logical form,这是从乔老爷那里借来的术语。具体表现细节没必要相同。

那么我们把句子给理解后,tree与logical form并存在记忆中?

李:二者等价。细分可以有:句法树;语义树;语用树。所谓信息抽取,就是建语用树。句法树到语义树,就是乔老爷的表层结构到深层结构的逆向转换。

Chomsky之所以不谈语义啥的,因为实在没啥科学证据。现在我们所讲的语义都不是native的,都是人类的数学逻辑发明,在计算机上热起来的。出口转内销

: 是不是与那时的行为主义为主流有关,因为语意很难有操作定义?

李:这个讨论越来越高大上,也越来越形而上。

:是啊,再往上一点,就到哲学、认识论的层面了。另,跟PDP一样经典的是什么书?

李:乔老爷57年小册子。

: 什么书名?我以前只是从编译的角度了解他在形式语言方面的理论(现在也忘了),却不知道他在自然语言方面的贡献。以前我对自然语言毫不关心,也就是这一阵听你们高论才觉得这东西挺有意思。

: 有关语言学和认知科学的科普书,Steven Pinker写的系列都不错

The Language Instinct (1994) ISBN 978-0-06-097651-4
How the Mind Works (1997) ISBN 978-0-393-31848-7
Words and Rules: The Ingredients of Language (1999) ISBN 978-0-465-07269-9
The Blank Slate: The Modern Denial of Human Nature (2002) ISBN 978-0-670-03151-1
The Stuff of Thought: Language as a Window into Human Nature (2007) ISBN 978-0-670-06327-7

有关NLP:
Dan Jurafsky and James Martin’s Speech and Language Processing.

有关基于统计方法的NLP:
Chris Manning and Hinrich Schütze’s Foundations of Statistical Natural Language Processing

好像这两本书国内都有影印本

白:总结一下:wei的中心意思,nlp技术在他手里已经很过关了,只是苦于木有好的商业模式,再加上微软谷歌等传统势力的封杀,商业上还不能成大气候。有人建议说回国发展。deep nlp,性能不是问题,可以保证线性online parse,最坏情形回退到搜索。瓶颈在别处。

:元芳你怎么看

李:元芳呢?

谢谢白老师的总结,实际上就是这么回事。决定成败的不是技术,而是产品方向。技术差,可以砸了产品;技术好,不能保证产品在市场的成功。技术增加的是产品的门槛。

: 好的商业模式有两个特点,一个是技术壁垒,一个是侵略性。nlp前者不是问题,问题在后者。需要一张极富侵略性的皮。讯飞也有马失前蹄啊。

: 多讨论,应该能够找到好的方向。讯飞很多年都做得苦逼死了,熬到这两年才爽。现在做一个新的搜索引擎公司不现实。问答类概念已经被用滥了。出门问问也是因为问答不好做,改作智能手表,反而卖的不错。智能家居的语音交互界面,本质上是一个问答系统。

李:对于关键词,语法树就是颠覆。

: 信息服务三个阶段:门户网站,域名成为商品;搜索引擎,关键词成为商品;社交网络,粉丝成为商品。下一个成为商品的是啥?问答只是表象,关键是要回答什么成为商品。分析树也不直接是商品。

李:白老师说的极是。关键是什么是商品,可以来钱,这个确定了,作为后台的技术产品才有门槛,核武器才能发挥威力。

我们还是想想,高精准度的deep nlp服务,把什么作为标的商品,才能具有侵略性。

Philip: 给@wei 的高大上技术找个商业模式

我个人算是比较擅长于设计商业模式的,但是对于NLP的直接应用,还是觉得太偏后端,很难找出一个前端产品,对于用户是可感知的刚需。

不在多而在狠,uber就够狠。

 

原载:《泥沙龙笔记:parsing 是引擎的核武器,再论NLP与搜索》

 

【相关】

从 sparse data 再论parsing乃是NLP应用的核武器

《parsing 可以颠覆关键词吗?》

《立委科普:关键词革命》 

《李白毛铿锵行: 漫谈中文NLP和数据流》

【自然语言parsers是揭示语言奥秘的LIGO式探测仪】 

泥沙龙笔记:创新,失败,再创新,再失败,直至看上去没失败 

【置顶:立委NLP博文一览】

《朝华午拾》总目录

【立委随笔:听喜马拉雅老罗侃人工智能】

关于AI的几点小感想:

1. AI这一波狂热,甚至连文科生罗胖都开始信服了。那天堵车听喜马拉雅,收听罗胖的跨年演讲,谈产品 高科技等,觉得这小子铁嘴钢牙,不全是嘴皮子功夫,有真货。譬如,论产品和服务,从空间移向时间,就很有高度。唯独他谈AI的时候,让我莫名。有几分道理,更多是谬误,掺杂一起,言之凿凿。这才相信AI的营销已经多么有成效,把一个完全的外行人也整成导师了。

罗胖说,很快(言之凿凿说的是五到十年)一多半人就要失业了,为人工智能取代。这不是新鲜的观点,开复老师等大咖也一直在说。老罗说人类历史都是人奴役、压榨、剥削人的历史,终于历史走到了尽头:大多数人连被奴役被剥削的价值也没有了,因为人工智能不怕被奴役和剥削,不抱怨、不造反、不暴动。干嘛要剥削人呢?

怎么办?也许就是给这些完全没有价值的多余人一个VR的眼镜,让他们一辈子在游戏中度过。如果不想落入这个群体,为数不多的出路是做设计家和创新家。因为对于现存的一切,人工智能无所不能;只有对不存在的东西,对于新设计和新创新,人类还有一点点赢面。

这些把人类与人工智能割裂成两个平等、独立或对立的竞争对手的说法,已经流传甚广,似是而非,但也很难证伪。先放在一边。

奇葩的是,他居然搞懂了深度神经。简单说,深度神经就是一个怪兽,胃口特别大,只要给它喂大数据,它就无所不能。罗胖说,深度神经可以看历史上所有的医书,然后分析你所有的生命数据,然后给你建议吃什么药。你必须听它的,因为你穷尽一辈子也不可能理解人工智能的智能。但是它肯定比你和你所能找到的任何人类专家高明。

总之人工智能行的是上帝的逻辑,我们人的逻辑无法理解 也无须理解。顺之者昌逆之者亡。这个逻辑的算法基础就是深度神经。

罗胖这些说法其实早已在现下媒体,以及早先的科幻中,为无数记者作家描述过。他不过是利用他所特长的语言艺术表达出来。

2. 看老友在朋友圈提ai+酒,就让我想起历史上魏晋的药+酒,都是性感、时髦、流行而且催情的。很浪漫,也很颓靡。

3. 老友接着谈他看好机器人情感,觉得前景无限。老友说的机器人的 AI 情感,这里的情感是说的什么呢?

(1) AI 对于人类表述的情感的捕捉:譬如舆情挖掘
(2) AI 机器人(譬如微软小冰)自己所表现出来的情感表述

(1)是已经和正在实现的事儿,毋庸置疑,但与常人所说的机器人情感大概不是一回事儿,虽然不少人有意无意混淆二者。

(2)是典型的“逢场作戏”:微软小冰细语款款地说爱我很多次了,我要不是做 NLU 的,可能早就被她迷惑了。

说的是(2)这种情感大有前途么?

可以想到的前途是: 虚拟恋人(安慰失恋的人);老人陪伴(宽慰孤独的老人)。不怀疑这种东西最终可以以假乱真。将来市场化时候唯一要着力做的是,消除心理障碍,要给客户洗脑,这个机器人,不是机器,而是人。

(绝不能泄露天机:这与人类情感,没有一毛钱的关系。她爱死你了,你也爱死她了,你们结婚,也绝不会有爱的结晶。)

 

【相关】

【泥沙龙笔记:强人工智能的伟哥测试】

强弱人工智能之辩

【置顶:立委NLP博文一览】

《朝华午拾》总目录

立委译白硕:“入口载体”之争(中英对照)

【立委按】端口(portals),兵家必争。bots,热门中的热门。白老师说,背后的ai才是战略布局的重中之重。又说,平台和服务,非巨头不能。问题是哪家巨头明白战略布局的精要所在。对于中文深度理解,水很深很深。大浪淘沙,且看明日之ai,竟是谁家之天下。不是特别有insights和分量的,我是不会翻译的(尽管有了神经翻译助力,也搭不起那个时间)。白老师绝妙好文,值得咀嚼。(By the way, 最后一段的想象力,秒杀所有科幻作家。)

“入口载体”之争

最近,亚马逊旗下的智能音箱产品 Echo 和出没于 Echo 中的语音助手 Alexa 掀起了一股旋风。不仅智能家居业在关注、人工智能创业公司在关注,IT巨头们也在关注。那么,Alexa 到底有什么独到之处呢?

Recently, Amazon’s AI product Echo and its voice assistant Alexa set off a whirlwind in the industry.  It has drawn attention from not only the smart home industry but also the AI start-ups as well as the IT giants.  So, what exactly is unique about Alexa?

有人说,Alexa 在“远场”语音识别方面有绝活,解决了“鸡尾酒会”难题:设想在一个人声嘈杂的鸡尾酒会上,一个人对你说话,声音虽不很大,但你可以很精准地捕捉对方的话语,而忽略周边其他人的话语。这手绝活,据说其他语音厂商没有,中国连语音处理最拿手的科大讯飞也没有。

Some people say that Alexa has solved the challenging “cocktail party” problem in speech recognition: imagine a noisy cocktail party, where a person is chatting with you, the voice is not loud, but you can accurately capture the speech with no problem while ignoring the surrounding big noise. Alexa models this amazing human capability well, which is said to be missing from other leading speech players, including the global speech leader USTC iFLYTEK Co.

有人说,Alexa 背后的“技能”极其丰富,你既可以点播很多节目,也可以购买很多商品和服务;既可以操控家里的各款家电设备,也可以打听各类消息。总而言之,这是一个背靠着强大服务资源(有些在端,更多在云)的语音助手,绝非可与苹果的 Siri 或者微软的小冰同日而语。

Others say that behind Alexa are very rich cross-domain know-hows: one can ask Alexa for on-demand programs and can also buy goods and services through it; it can be instructed to control the various appliances of our home, or inquire about all kinds of news.  All in all, this is a voice assistant backed by a strong service (with some resources local, and more in the cloud).  Apple’s Siri or Microsoft’s XiaoIce are believed to be by no means a match for Alexa in terms of these comprehensive capabilities.

端方面的出色性能,加上端+云方面的庞大资源,构成了 Alexa 预期中的超强粘性,形成了传说中巨大的入口价值。这也似乎是Alexa在美国市场取得不俗业绩的一个说得通的解释。有相当一部分人意识到,这可能是一个巨大的商机,是一个现在不动手说不定将来会追悔莫及的局。尽管在美国以外的其他市场上,Alexa的业绩并不像在美国市场那样抢眼,但是这股Alexa旋风,还是刮遍了全球,引起了同业人士的高度紧张和一轮智能音箱模仿秀。

The excellent performance by the end device, coupled with the huge cloud resources in support of the end, constitute Alexa’s expected success in customers’ stickiness, leading to its legendary value as an information portal for a family.  That seems to be a good reason for Alexa’s impressive market performance in the US.  A considerable number of people seem to realize that this may represent a huge business opportunity, one that simply cannot be missed without regret.  Although in other markets beyond the United States, Alexa’s performance is not as eye-catching as in the US market, this Alexa whirlwind has still been sweeping the world, leading to the industry’s greatest buzz and triggering a long list of smart speaker simulation shows.

Alexa 动了谁的奶酪?抢了谁的饭碗?怎样评价 Alexa 的入口价值?怎样看待入口之争的昨天、今天、明天?

Hence the questions: What are the effects of this invention of Alexa? Who will be affected or even replaced?  How to evaluate Alexa’s portal value? Where is it going as we look into the yesterday, today and tomorrow of this trend?

我们不妨来回顾一下“入口”的今昔变迁。所谓“入口”,就是网络大数据汇聚的必经之地。从模式上看,我们曾经经历过“门户网站”模式、“搜索引擎”模式和“社交网络”模式,目前新一代的入口正在朝着“人工智能”模式迁移。从载体上看,“门户网站”和“搜索引擎”模式的载体基本上是PC,“社交网络”模式的载体基本上是以智能手机为主的端设备。“人工智能”模式有可能改变载体吗?换句话说,Echo-Alexa 软硬合体,能够以人工智能的旗号,从智能手机的头上抢来“入口载体”的桂冠吗?

We may wish to reflect a bit on the development of portals in the IT industry history.  The so-called “portal” is an entry point or interface for an information network of large data flow, connecting consumers and services.  From the model perspective, we have experienced the “web portal” model, the “search engine” model and more recently, the “social network” model, with the on-going trend pointing to a portal moving in the “artificial intelligence” mode. From the carrier perspective, the carrier for the”web portal” and “search engine” models is basically a PC while the “social network” model carrier is mainly a smart phone-based end equipment. Does the “artificial intelligence” model have the potential to change the carrier? In other words, is it possible for the Echo-Alexa hardware-software combination, under the banner of artificial intelligence, to win the portal from the smart phone as the select point of human-machine interface?

本人认为,这是不可能的。原因有三。

I don’t think it is possible.  There are three reasons.

第一,场景不对。哪怕你抗噪本事再强大,特定人跟踪的本事再大,只要安放地点固定,就是对今天已经如此发达的移动场景的一种巨大的倒退。试想,家庭场景的最大特点就是人多,人一多,就形成了个小社会,就有结构。谁有权发出语音指令?谁有权否定和撤销别人已经发出的语音指令?最有权的人不在家或者长期沉默,听谁的?一个家庭成员如果就是要发出一个不想让其他家庭成员知道的私密语音指令怎么办?个人感觉,语音指令说到底还是个体行为大于家庭行为,私密需求大于开放需求。因此,家庭语音入口很可能是个伪命题。能解析的语音指令越多,以家庭场景作为必要条件的语音指令所占比重就越少。

First, the scene is wrong. Even if Alexa is powerful with unique anti-noise ability and the skills of tracking specific people’s speech, since its location is fixed, it is a huge regression from today’s well-developed mobile scenes.  Just think about it, the biggest feature of a family scene is two or more individuals involved in it.  A family is a small society with an innate structure.  Who has the right to issue voice commands? Who has the authority to deny or revoke the voice commands that others have already issued? What happens if the authoritative person is not at home or keeps silent? What if a family member intends to send a private voice instruction? To my mind, voice instruction as a human-machine interaction vehicle by nature involves behaviors of an individual, rather than of a family, with privacy as a basic need in this setting.  Therefore, the family voice portal scene, where Alexa is now set, is likely to be a contradiction. The more voice commands that are parsed and understood, the less will be the proportion of the voice commands that take the home scenes as a necessary condition.

第二,“连横”面临“合纵”的阻力。退一步说,就算承认“智能家居中控”是个必争的入口,智能音箱也面临其他端设备的挑战。我们把聚集不同厂家家居设备数据流向的倾向称为“连横”,把聚集同一厂家家居设备数据流向的倾向称为“合纵”。可以看出,“连横”的努力是对“合纵”的生死挑战,比如海尔这样在家庭里可能有多台智能家居设备的厂商,如非迫不得已,自家的数据为什么要通过他人的设备流走呢?

Second, the “horizontal” mode of portal faces the “vertical” resistance.  Even if we agree that the “smart home central control” is a portal of access to end users that cannot be missed by any players, smart speakers like Alexa are also facing challenges from other types of end equipment.  There are two types of data flow in the smart home environment.  The horizontal mode involves the data flow from different manufacturers of home equipment.  The vertical mode portal gathers data from the same manufacturer’s home equipment.  It can be seen that the “horizontal” effort is bound to face the “vertical” resistance in a life and death struggle.  For example, a manufacturer like Haier, which may have multiple smart home devices in one family, has no reason to let its valuable data flow away through other vendors’ equipment unless it absolutely has to.

第三,同是“连横”的其他端设备的竞争。可以列举的有:家用机器人、家庭网关/智能路由器、电视机、智能挂件等。这些设备中,家用机器人的优势是地点无需固定,家庭网关的优势是永远开机,电视机的优势是大屏、智能挂件(如画框、雕塑、钟表、体重计等)的优势是不占地方。个人感觉,智能音箱面对这些“连横”的竞争者并没有什么胜算。

Third, the same struggle also comes from other competitions for the “horizontal” line of equipment, including house robots, home gateway / intelligent routers, smart TVs, intelligent pendants and so on.  The advantage of the house robots is that their locations need not be fixed in one place, the advantage of the home gateway is that  it always stays on, the TVs’ advantage lies in their big screens, and intelligent pendants (such as picture frames, sculptures, watches, scales, etc.) have their respective advantage in being small.  In my opinion, smart speakers face all these “horizontal” competitions and there does not seem to be much of a chance in winning this competition.

综上所述,Echo-Alexa 的成功,具有很强的叠加特点。它本质上是亚马逊商业体系的成功,而不是智能家居设备或者语音助手技术的成功。忽略商业体系的作用,高估家庭入口的价值,单纯东施效颦地仿制或者跟随智能音箱,是没有出路的。个人觉得,智能手机作为移动互联时代的入口载体,其地位仍然是不可撼动的。

In summary, the Echo-Alexa’s success comes with a strong superposition characteristic. It is essentially a success of the Amazon business system, rather than the success of smart home appliances or the voice assistant technology. Ignoring the role of its supporting business system, we are likely to overestimate the value of the family information portal, and by simply mimicking or following the smart speaker technology, there is no way out.  Personally, I feel that the smart phone as the carrier of an entry point of information in the mobile Internet era still cannot be replaced.

语音交互时代真的到来了吗?

Is the era of voice interaction really coming?

IT巨头们关注 Alexa 还有一个重要的理由,就是由 Alexa 所代表的语音交互,或许开启了人机交互的一种新型范式的兴起。当年,无论是点击模式的兴起还是触摸模式的兴起,都引发了人机交互范式的革命性变化,直接决定了IT巨头的兴亡。点击模式决定了 wintel 的崛起,触摸模式决定了 wintel 被苹果的颠覆,这些我们都以亲身经历见证过了。如果语音交互真的代表了下一代人机交互范式,那么 Alexa 就有了人机交互范式的代际转换方面的象征意义,不由得巨头们不重视。

One important reason for the IT giants to look up to Alexa is that the voice interaction represented by Alexa perhaps opens a new paradigm of human-computer interaction.  Looking back in history, the rise of the click-mode and the rise of the touch-mode have both triggered a revolutionary paradigm shift for human-computer interaction, directly determining the rise and fall of the IT giants. The click-mode led to the rise of Wintel, the touch mode enabled Apple to subvert Wintel: we have witnessed all these changes with our own eyes.  So if the voice interaction really represents the next generation paradigm for human-computer interaction, then Alexa has a special meaning as the precursor of the human-computer interaction paradigm shift.  The giants simply cannot overlook such a shift and its potential revolutionary impact.

然而个人认为,单纯的语音交互还构不成“代际转换”的分量。理由有三:

However, personally, I do not think that the speech interaction alone carries the weight for an “intergenerational revolution” for human-machine interaction.   There are three reasons to support this.

第一,语音本身并不构成完整的人机交互场景。人的信息摄入,百分之八十以上是视觉信息,在说话的时候,经常要以视觉信息为基本语境,通过使用指示代词来完成。比如指着屏幕上一堆书当中的一本说“我要买这本”。就是说,语音所需要的语境,有相当部分来自视觉的呈现,来自针对和配套可视化对象的手势、触摸或眼动操作。这至少说明,我们需要multi-modal人机交互,而不是用语音来取代其他人机交互手段。

First, the speech itself does not constitute a complete human-computer interaction scene.  More than 80% of people’s information intake is visual.  When speaking, we often take some visual information as the basic context, using a demonstrative pronoun to refer to it.  For example, pointing to a book on the screen, one may say, “I want to buy this.” In other words, a considerable part of the context in which the speech is delivered comes from the visual presentation, ranging from gestures, touches or eye movements that target some visual objects. This at least shows that we need multi-modal human-computer interaction, rather than using voice alone to replace other human-computer interaction vehicles.

第二,目前语音输入还过不了方言关。中国是一个方言大国,不仅方言众多,而且方言区的人学说普通话也都带有方言区的痕迹。“胡建人”被黑只是这种现象的一个夸张的缩影。要想惠及占全国总人口一半以上的方言区,语音技术还需要经历进一步的发展和成熟阶段。

Second, the current speech recognition still cannot handle dialects well.  China is a big country with a variety of dialects.  Not only are there many dialects, but the people in dialect areas also speak Mandarin with a strong accent. To benefit more than half of the total population in the dialect areas, the speech technology still needs to go through a stage of further development and maturity.

第三,目前语音输入还很难解决“转义”问题。所谓转义问题就是当语音指令的对象是语音输入本身的时候,系统如何做出区分的问题。人在发现前一句说的有问题需要纠正的时候,有可能需要用后一句话纠正前一句话,这后一句话不是正式的语音输入的一部分;但也有可能后一句话并不是转义,而是与前一句话并列的一句话,这时它就是语音输入的一部分。这种“转义”语音内容的识别,需要比较高级的语义分析技术,目前还不那么成熟。

Third, the current speech recognition still has difficulty in solving the “escape” problem. The so-called escape problem involves the identification of scenarios when the speech refers to itself.  When people find there is an error in the first utterance and there is a need to correct it, they may choose to use the next sentence to correct the previous sentence, then this new sentence is not part of the naturally continuous speech commands, hence the need for “being escaped”.  But it is also possible that the latter sentence should not be escaped, and it is a sentence conjoined with the previous sentence, then it is part of the normal speech stream.  This “escape” identification to distinguish different levels of speech referents calls for more advanced semantic analysis technology, which is not yet mature.

所以,以语音输入目前的水平,谈论语音输入的“代际转换”或许还为时尚早。甚至,语音可能只是一个叠加因素,而并不是颠覆因素。说未来会进入multi-modal输入的时代,说不定更加靠谱一点。

So, considering the current level of speech technology, it seems too early to talk about the “intergenerational revolution”.  Furthermore, speech may well be just one factor, and not necessarily a disruptive one.  It seems more reasonable to state that the future of human-computer interaction may enter an era of multi-modal input, rather than speech alone.

语义落地是粘性之本

The semantic grounding is the key to the stickiness of users.

语义这个字眼,似乎被某些人玩得很滥,好像会分词了就摸到语义了,其实不然。语义的水很深。

Semantics as a term seems abused in all kinds of interpretations.  Some even think that once words are identified, semantics is there, which is far from true. The semantics of natural languages is very deep and involves a lot.  I mean a lot!

从学术上说,语义分成两个部分,一个叫“符号根基”,讲的是语言符号(能指)与现实世界(也包括概念世界)中的对象(所指)的指称关系;另一个叫“角色指派”,讲的是语言符号所指的现实或概念对象之间的结构性关系。符号根基的英文是“symbol grounding”,其中的 grounding 就有落地的意思。所以,我们说的语义落地,无论学术上还是直观上,都是一致的。Siri 在通信录、位置、天气等领域首开了在移动互联设备上实现语义落地的先河,这几年语义落地的范围越来越广。

From the academic point of view, semantics is divided into two parts.  One called “symbol grounding”, which is about the relationship of the language symbol (signifier) and its referent to the real world entity (including the conceptual world).  The second is called “role assignment”, which is about the relationship between the referents of the language symbols in the reality.  Siri is the pioneer in the mobile semantic grounding realized in the domain apps such as Address, Map and Weather.  The past few years have seen the scope of semantic grounding grow wider and wider.

前面说了,“端方面的出色性能,加上端+云方面的庞大资源,构成了 Alexa 预期中的超强粘性”。我们在这一节里面要进一步探讨:“端的性能”和“端+云的资源”这两者中,谁是产生 Alexa 粘性的更根本原因?笔者无意玩什么“都重要,谁也离不开谁”之类的辩证平衡术,那是便宜好人,说起来冠冕堂皇,做起来毫无方向。坦率地说,如果归因错误,那么就会产生投入方向的错误。而投入方向的错误,将使模仿者东施效颦,输得体无完肤。

Let me review what I said before: “the excellent performance by the end equipment, coupled with the huge cloud resources in support of the end, constitute the Alexa’s expected success in users’ stickiness”.  We can further explore along this line in this section.  Between “the performance by the end equipment” and “the cloud resources in support of the end”, which is the root cause for Alexa’s stickiness with the customers?  I do not intend to play the trick of dialectical balance by saying something like both are important and neither can do the job without the other.  That is always true but cheap; it sounds fair yet gives no actionable insights.  Frankly, a wrong attribution leads to a wrong direction of investment, and a wrong direction of investment will leave the copycat a blind follower doomed to a complete failure in the market.

作者认为,“端的性能”是硬件对场景的适应性。这充其量是“好的现场体验”。但没有实质内容的“好的现场体验”会很快沦为玩具,而且是不那么高档的玩具。没有“有实质意义的服务”就不可能产生持久的粘性,而没有持久的粘性就充当不了持久的数据汇集入口。然而,“有实质意义的服务”,一定源自语义落地,即语音指令与实际服务资源的对接,也就是 Alexa 的所谓“技能”。底下所说的语义落地,都是指的语音指令与无限可能的实际服务资源对接这种落地。

The author argues that “the performance of the end device” amounts to the adaptability of the hardware to the scene.  This is at best a “good on-site experience”.  A product with a good experience but no real substance will soon degrade into a toy, and not even a high-end one.  Without a “meaningful service” behind it, there can be no sustainable user stickiness, and without sustainable stickiness the device cannot serve as a durable entry point for data collection.  And any “meaningful service” must come from semantic grounding, that is, the connection of a speech command to an actual service resource, which is what Alexa calls its “skills”.  Semantic grounding as mentioned hereafter always refers to this connection between speech commands and a potentially unlimited range of actual service resources.
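Operationally, grounding a command in a service can be pictured as dispatch to a handler, roughly what a “skill” does. The sketch below is deliberately naive and entirely hypothetical: the intent names and handlers are invented and have nothing to do with Amazon’s actual skills API.

```python
# Hypothetical sketch: connecting a parsed voice command (intent + slots)
# to the concrete service that realizes it.

def play_music(slots):
    return f"playing {slots['title']}"

def set_thermostat(slots):
    return f"thermostat set to {slots['temp']}"

# Registry of available "skills" (invented names, for illustration only).
SKILLS = {
    "PlayMusic": play_music,
    "SetTemp": set_thermostat,
}

def ground_command(intent, slots):
    """Dispatch a parsed intent to its service handler, if one is connected."""
    handler = SKILLS.get(intent)
    if handler is None:
        return "sorry, no service is connected for that request"
    return handler(slots)

print(ground_command("PlayMusic", {"title": "Yesterday"}))  # playing Yesterday
```

The hard part, as argued below, is not the dispatch itself but producing the correct intent and slots from open-domain language in the first place.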

语义落地需要一个强大的、开放领域的NLP引擎。服务资源千千万万,不可能局限在一个或少数领域。一个只能面对封闭领域的NLP引擎,无法胜任这样的任务。能够对接开放领域,说明这个引擎一定在语义分析上有非同寻常的功力,一定在语义知识的表示和处理方面走在了正确的道路上。在这方面,英语做得好,不一定汉语做得好。还不了解汉语在开放领域的NLP引擎是一个什么样难度的人,不可能做出规模化的语义落地效果。这方面的技术壁垒可以在做同一个事情的公司间拉开有如天壤之别的巨大差距。

Comprehensive semantic grounding requires a strong open-domain NLP engine. Service resources come in tens of thousands of varieties and cannot be confined to one or a few narrow domains.  An NLP engine that functions only in a closed domain cannot do this job.  To work in the open domain, an engine must have extraordinary capacity for semantic analysis, and it must be on the right path in semantic knowledge representation and processing.  In this regard, doing well for English does not mean doing well for Chinese.  Those who do not yet appreciate the difficulty of an open-domain Chinese NLP engine can hardly be expected to achieve semantic grounding at scale.  This technology barrier can open a gap as wide as heaven and earth between companies attempting the very same product.

语义落地需要对服务资源端的接口做出工程化的适配。这同样是一个非常艰巨的任务,而且是拼资源、拼效率、拼管理的任务。小微规模的初创公司不可能有这样的资源整合能力和工程组织能力,这一定是大公司的强项。有人说,我由小到大行不行?我说,不行,时间不等人。在语义落地领域,如果不能在短时间内爆发,等着你的就是灭亡。

Semantic grounding also requires engineering adaptation of the interfaces to the service resources.  This is an equally arduous task, and it is a contest of resources, efficiency and management.  A small start-up can hardly muster such resource-integration capacity and engineering organization; these are the strengths of large companies.  Some may ask: can I start small and grow big?  My answer is no, for time waits for no one.  In the field of semantic grounding, a product that cannot break out within a short time is a product waiting to die.

语义落地还需要对人机对话场景本身的掌控能力。这涉及语境感知、话题切换、情感分析、语言风格选择、个性塑造等多项技术,不一而足。语音助理不见得都是越“贫”越“萌”越好,比如适度的渊博、犀利甚至粗鲁,也都可以是卖点。

Semantic grounding also calls for the ability to manage the human-machine dialogue scene itself. This involves a variety of technologies, including context awareness, topic switching, sentiment analysis, language-style selection, personality shaping and more.  A voice assistant need not always be glib or cute to be likable: moderate erudition, sharpness and even an occasional touch of rudeness can all be selling points.

所以,我们强调语义落地对 Alexa 用户粘性的决定性作用,强调庞大服务资源对于 Alexa 成功故事的决定性贡献。在中国,没有与亚马逊规模相当、服务资源体量相当的超大型互联网企业出手,没有对面向汉语的开放领域NLP引擎开发重量级团队的出手,单凭语音技术是不可能产生这样的用户粘性的。

Therefore, we emphasize the decisive role of semantic grounding in Alexa’s user stickiness, and the decisive contribution of massive service resources to Alexa’s success story.  In China, unless a super-large Internet company with service resources on Amazon’s scale steps in, together with a heavyweight team developing an open-domain Chinese NLP engine, speech technology alone cannot generate the kind of user stickiness we see in Alexa.

谁会胜出?

这年头,一切不以获取用户数据为目的的端设备都是耍流氓。智能手机独领风骚多年了,各类智能家居连横合纵也斗了有几年了。Alexa 的横空出世,给了业界很多刺激和启示,但地盘属谁,并没有盖棺论定。大家还有机会。但是就端云结合、入口和入口载体结合形成数据闭环这件事,方向性、趋势性的东西不可不查,否则机会就不是你的。

Who will win then?

In essence, these days any end device that does not aim at gathering user data is missing the point.  Smartphones have dominated the industry for years, and the various smart-home alliances have been battling for several years as well.  Alexa’s arrival has stirred the industry with excitement and revelations, but who will own this territory is far from settled; everyone still has a chance.  On one matter, however, the direction and the trend must be examined carefully: how the end combines with the cloud, and how the entry point combines with its carrier, to form a closed data loop.  Misread this, and the opportunity will not be yours.

什么是方向性、趋势性的东西呢?听我道来。

第一,人工智能一定是下一代的入口模式。也就是说,各种对服务的需求,必将最终通过人工智能的多通道输入分析能力和人机互动优势,从端汇集到云;各种服务资源,必将最终借助人工智能的知识处理与认知决策能力,从云对接到端。你不布局人工智能,未来入口肯定不是你的。

So what is the direction and what are the trends? Let me give an analysis.

First, artificial intelligence is bound to be the next-generation portal model. In other words, all kinds of service demands will eventually flow from the end to the cloud through AI’s multi-channel input analysis and its advantages in human-computer interaction, and all kinds of service resources will eventually be delivered from the cloud to the end through AI’s knowledge processing and cognitive decision-making capabilities.  If you do not position yourself in artificial intelligence, the future portal will definitely not be yours.

第二,智能手机在相当长一段时间内,仍然是入口载体事实上的“盟主”,地位不可撼动。人走到哪里,通信节点和数字身份就跟到哪里,对现场的感知能力和作为服务代言者的app就跟到哪里。在入口载体所需要的个人性、私密性和泛在性这几个最关键的维度上,还没有哪一个其他端设备能够与智能手机相匹敌。

Second, for a long time to come the smartphone will remain the de facto chief carrier of the portal, and its position is hard to shake.  Wherever a person goes, the communication node and the digital identity follow, and so do the perception of the scene on site and the apps acting as service agents.  No other end device matches the smartphone on the most critical dimensions a portal carrier needs: individuality, privacy and ubiquity.

第三,端设备的通信功能和服务对接功能将逐步分离。随着可对接的服务越来越多样化,用一个端设备“包打天下”已不可能,但每个端设备均自带通信功能亦不可取。Apple watch 和 iPhone 之间的关系是耐人寻味的:iPhone 作为通信枢纽和客户端信息处理枢纽,Apple watch 作为专项信息采集和有限信息展示的附属设备,二者之间通过近场通信联系起来。当然,二者都是苹果自家人,数据流处在统一掌控之下。一家掌控,分离总是有限的、紧耦合的。但是,做得初一,就做得十五,今后各种分离将层出不穷,混战也将随之高潮迭起。今天是 Alexa 刮旋风,明天兴许就是谁下暴雨。如果手机厂商格局再大一点,在区块链的帮助下,在数据的采集方面对各种附属端设备的贡献进行客观的记录,据此在数据和收益的分享方面做出与各自贡献对等的合理安排,说不定某种松耦合形式的分离就会生米做成熟饭,端的生态到那时定会别样红火。可以设想,在一个陌生的地方,你从怀里掏出一张软软的薄薄的可折叠的电子地图,展开以后像一张真的地图那么大,却又像手机地图一样方便地触摸操作甚至可以结合语音操作,把它关联到你的手机上。当然,这张图也可以没有实物只有投影。而你的手机只管通信,所有的操控和展现都在这张图上完成,根本不需要掏出手机。这样的手机也许从头至尾就根本无需拿在“手”里,甚至可以穿在脚上,逐渐演化成为“脚机”……

Third, the communication function and the service-interfacing function of end devices will gradually separate.  As the services to be connected grow ever more diverse, it becomes impossible for one end device to handle everything, yet it is also undesirable for every end device to carry its own communication function.  The relationship between the Apple Watch and the iPhone is intriguing in this regard: the iPhone serves as the communication hub and the client-side information-processing hub, while the Apple Watch acts as an auxiliary device for specialized data collection and limited display, the two linked by near-field communication.  Of course, both are Apple’s own products, so the data flow is under unified control; under a single owner, the separation is always limited and tightly coupled.  But what is done once will be done again: all kinds of separations will keep emerging, and the melee will rise to one climax after another.  Today the whirlwind is Alexa’s; tomorrow the rainstorm may be someone else’s.  If phone manufacturers take a broader view, they could, with the help of blockchain technology, keep an objective record of the contributions the various auxiliary end devices make to data collection, and arrange the sharing of data and proceeds in proportion to those contributions.  Some loosely coupled form of separation might then become a fait accompli, and the ecology of end devices would flourish as never before.  Imagine arriving in an unfamiliar place and pulling from your pocket a soft, thin, foldable electronic map.  Unfolded, it is as large as a real paper map, yet as convenient to operate by touch as a phone map app, perhaps even by voice, and it is linked to your phone.  Of course, the map could also be a mere projection with no physical object at all.  Your phone then handles nothing but communication; all control and display happen on the map, and the phone never needs to be taken out.  Such a phone may never need to be held in the hand at all; it might even be worn on the foot, gradually evolving into a “foot phone”…
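The “objective record of contributions” imagined here can be sketched as a minimal hash-chained ledger. This is a toy, not a real blockchain (no consensus, no signatures), and the device names and fields are invented:

```python
import hashlib
import json

# Toy hash-chained ledger recording each auxiliary device's data contribution.
# Tampering with any record breaks the chain of digests.

GENESIS = "0" * 64

def _digest(body):
    # Canonical serialization so the digest is deterministic.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def add_record(chain, device, contribution):
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"device": device, "contribution": contribution, "prev": prev}
    chain.append({**body, "hash": _digest(body)})
    return chain

def verify(chain):
    for i, rec in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else GENESIS
        body = {k: rec[k] for k in ("device", "contribution", "prev")}
        if rec["prev"] != prev or rec["hash"] != _digest(body):
            return False
    return True

chain = []
add_record(chain, "watch", {"heart_rate_samples": 1200})
add_record(chain, "e-map", {"location_queries": 7})
print(verify(chain))  # True
```

Because each record commits to the previous record’s digest, retroactively inflating one device’s contribution invalidates the whole chain, which is the property a fair data-and-proceeds-sharing arrangement would need.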

Alexa旋风带给你的机会和启发是什么,想好了吗?

Are you ready for the opportunity and inspirations brought by the Alexa whirlwind?

Translated by: Dr. Wei Li based on GNMT
This article is reprinted and translated with the kind permission of its author, Prof. Bai Shuo, to whom we extend our thanks. Original link: 《“入口载体”之争》

 

【Related】

S. Bai: Natural Language Caterpillar Breaks through Chomsky’s Castle

S. Bai: Fight for New Portals


S. Bai: Fight for New Portals

Author: Bai Shuo

Recently, Amazon’s AI product Echo and its voice assistant Alexa set off a whirlwind in the industry.  It has drawn attention from not only the smart home industry but also the AI start-ups as well as the IT giants.  So, what exactly is unique about Alexa?


Some people say that Alexa has solved the challenging “cocktail party” problem in speech recognition: imagine a noisy cocktail party, where a person is chatting with you, the voice is not loud, but you can accurately capture the speech with no problem while ignoring the surrounding big noise. Alexa models this amazing human capability well, which is said to be missing from other leading speech players, including the global speech leader USTC iFLYTEK Co.

Others say that behind Alexa are very rich cross-domain capabilities: one can ask Alexa for on-demand programs, buy goods and services through it, instruct it to control the various appliances at home, or inquire about all kinds of news.  All in all, this is a voice assistant backed by strong services (some resources local, more in the cloud).  Apple’s Siri and Microsoft’s Xiaoice are believed to be no match for Alexa in these comprehensive capabilities.

The excellent performance of the end device, coupled with the huge cloud resources supporting it, constitutes Alexa’s expected success in customer stickiness, leading to its legendary value as a family information portal.  That seems a good explanation of Alexa’s impressive market performance in the US.  A considerable number of people seem to sense that this may represent a huge business opportunity, one that simply cannot be missed without regret.  Although in markets beyond the United States Alexa’s performance has not been as eye-catching, the Alexa whirlwind has still been sweeping across the world, generating the industry’s greatest buzz and triggering a long line of smart-speaker imitations.

Hence the questions: What are the effects of this invention of Alexa? Who will be affected or even replaced?  How to evaluate Alexa’s portal value? Where is it going as we look into the yesterday, today and tomorrow of this trend?

We may wish to reflect a bit on the development of portals in IT industry history.  A “portal” is an entry point or interface for an information network with large data flow, connecting consumers and services.  From the model perspective, we have experienced the “web portal” model, the “search engine” model and, more recently, the “social network” model, with the on-going trend pointing to a portal in the “artificial intelligence” mode. From the carrier perspective, the carrier of the “web portal” and “search engine” models was basically the PC, while the carrier of the “social network” model is mainly the smartphone. Does the “artificial intelligence” model have the potential to change the carrier? In other words, is it possible for the Echo-Alexa hardware-software combination, under the banner of artificial intelligence, to win the role of the select human-machine interface away from the smartphone?

I don’t think it is possible.  There are three reasons.

First, the scene is wrong. Even if Alexa is powerful with unique anti-noise ability and the skills of tracking specific people’s speech, since its location is fixed, it is a huge regression from today’s well-developed mobile scenes.  Just think about it, the biggest feature of a family scene is two or more individuals involved in it.  A family is a small society with an innate structure.  Who has the right to issue voice commands? Who has the authority to deny or revoke the voice commands that others have already issued? What happens if the authoritative person is not at home or keeps silent? What if a family member intends to send a private voice instruction? To my mind, voice instruction as a human-machine interaction vehicle by nature involves behaviors of an individual, rather than of a family, with privacy as a basic need in this setting.  Therefore, the family voice portal scene, where Alexa is now set, is likely to be a contradiction. The more voice commands that are parsed and understood, the less will be the proportion of the voice commands that take the home scenes as a necessary condition.

Second, the “horizontal” portal model faces “vertical” resistance.  Even if we agree that “smart-home central control” is an entry point to end users that no player can afford to miss, smart speakers like Alexa still face challenges from other types of end equipment.  There are two modes of data flow in the smart-home environment: the horizontal mode gathers data across home equipment from different manufacturers, while the vertical mode gathers data from one manufacturer’s own equipment.  The “horizontal” effort is therefore bound to meet “vertical” resistance in a life-and-death struggle.  Haier, for example, has no reason to let the valuable data from its smart refrigerators and other smart home appliances flow away to smart-speaker manufacturers.

Third, the same struggle also comes from competitors within the “horizontal” line of equipment, including home robots, home gateways / intelligent routers, smart TVs, intelligent pendants and so on.  The advantage of home robots is that their location need not be fixed; the advantage of the home gateway is that it always stays on; the TV’s advantage lies in its big screen; and intelligent pendants (picture frames, sculptures, watches, scales, etc.) have the advantage of being small.  In my opinion, smart speakers face all these “horizontal” competitors and do not seem to stand much of a chance of winning.

In summary, the Echo-Alexa’s success comes with a strong superposition characteristic. It is essentially a success of the Amazon business system, rather than the success of smart home appliances or the voice assistant technology. Ignoring the role of its supporting business system, we are likely to overestimate the value of the family information portal, and by simply mimicking or following the smart speaker technology, there is no way out.  Personally, I feel that the smart phone as the carrier of an entry point of information in the mobile Internet era still cannot be replaced.

Is the era of voice interaction really coming?

One important reason for the IT giants to look up to Alexa is that the voice interaction represented by Alexa perhaps opens a new paradigm of human-computer interaction.  Looking back in history, the rise of the click-mode and the rise of the touch-mode have both triggered a revolutionary paradigm shift for human-computer interaction, directly determining the rise and fall of the IT giants. The click-mode led to the rise of Wintel, the touch mode enabled Apple to subvert Wintel: we have witnessed all these changes with our own eyes.  So if the voice interaction really represents the next generation paradigm for human-computer interaction, then Alexa has a special meaning as the precursor of the human-computer interaction paradigm shift.  The giants simply cannot overlook such a shift and its potential revolutionary impact.

However, personally, I do not think that the speech interaction alone carries the weight for an “intergenerational revolution” for human-machine interaction.   There are three reasons to support this.

First, speech by itself does not constitute a complete human-computer interaction scene.  More than 80% of the information people take in is visual.  When speaking, we often treat some piece of visual information as shared context and refer to it with a pronoun: pointing to a book on the screen, one may say, “I want to buy this.” In other words, a considerable part of the context in which speech is delivered comes from visual presentation, through gestures, touches or eye movements that target visual objects. This at least shows that we need multi-modal human-computer interaction, rather than voice alone replacing every other interaction vehicle.

Second, current speech recognition still cannot handle dialects well.  China is a big country with a great variety of dialects.  Moreover, even when speaking Mandarin, people from dialect areas often carry a strong accent. To benefit the more than half of the population living in dialect areas, speech technology still needs a further stage of development and maturation.

Third, current speech recognition still has difficulty with the “escape” problem, that is, recognizing when speech refers to speech itself. When people find an error in their first utterance and need to correct it, they may use the next sentence to correct the previous one; that new sentence is then not part of the naturally continuous stream of speech commands and needs to be “escaped”.  But the latter sentence may equally well be a normal continuation conjoined with the previous one, in which case it should not be escaped.  Telling these levels of speech reference apart calls for more advanced semantic analysis technology, which is not yet mature.
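To see why this is hard, consider the crudest conceivable approach, a surface cue-word check; the cues below are invented for illustration. As argued above, such surface matching cannot really solve the problem: genuine escape detection requires semantic analysis.

```python
# Toy heuristic for the "escape" problem: guess whether an utterance is a
# meta-level correction of the previous command rather than a new command.
# Cue words are illustrative only; real systems need semantic analysis.

CORRECTION_CUES = ("no,", "i mean", "scratch that", "not that")

def is_escaped(utterance):
    """True if the utterance *looks like* it refers to (corrects) prior speech."""
    u = utterance.lower().strip()
    return u.startswith(CORRECTION_CUES) or any(cue in u for cue in CORRECTION_CUES)

print(is_escaped("No, I mean the airport, not the train station"))  # True
print(is_escaped("play some jazz"))                                 # False
```

A heuristic like this fails exactly where the text says the problem lies: a conjoined continuation such as “not that loud, please” would be misclassified, because distinguishing the two readings requires understanding what the words refer to, not matching their surface forms.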

So, considering the current level of speech technology, it seems too early to talk about the “intergenerational revolution”.  Furthermore, speech may well be just one factor, and not necessarily a disruptive one.  It seems more reasonable to state that the future of human-computer interaction may enter an era of multi-modal input, rather than speech alone.

Semantic grounding is the key to user stickiness.

Semantics as a term seems abused in all kinds of interpretations.  Some even think that once words are identified, semantics is there, which is far from true. The semantics of natural languages is very deep and involves a lot.  I mean a lot!

From the academic point of view, semantics is divided into two parts.  One is called “symbol grounding”, which concerns the referential relationship between a language symbol (the signifier) and its referent, an object in the real world (including the conceptual world).  The other is called “role assignment”, which concerns the structural relationships among the real or conceptual objects that the language symbols refer to.  The word “grounding” itself carries the sense of landing, so our notion of semantic grounding is the same whether taken academically or intuitively.  Siri pioneered semantic grounding on mobile devices in domain apps such as Contacts, Maps and Weather.  The past few years have seen the scope of semantic grounding grow wider and wider.

Let me review what I said earlier: “the excellent performance of the end device, coupled with the huge cloud resources supporting it, constitutes Alexa’s expected super-stickiness with users”.  In this section we explore one step further: between “the performance of the end device” and “the cloud resources behind the end”, which is the more fundamental cause of Alexa’s stickiness with customers?  I do not intend to play the trick of dialectical balance by saying that both are important and neither can do without the other.  That is the cheap answer of a fence-sitter: it sounds grand but offers no direction for action.  Frankly, a wrong attribution produces a wrong direction of investment, and a wrong direction of investment will leave imitators aping the surface and losing completely in the market.

The author argues that “the performance of the end device” amounts to the adaptability of the hardware to the scene.  This is at best a “good on-site experience”.  A product with a good experience but no real substance will soon degrade into a toy, and not even a high-end one.  Without a “meaningful service” behind it, there can be no sustainable user stickiness, and without sustainable stickiness the device cannot serve as a durable entry point for data collection.  And any “meaningful service” must come from semantic grounding, that is, the connection of a speech command to an actual service resource, which is what Alexa calls its “skills”.  Semantic grounding as mentioned hereafter always refers to this connection between speech commands and a potentially unlimited range of actual service resources.

Comprehensive semantic grounding requires a strong open-domain NLP engine. Service resources come in tens of thousands of varieties and cannot be confined to one or a few narrow domains.  An NLP engine that functions only in a closed domain cannot do this job.  To work in the open domain, an engine must have extraordinary capacity for semantic analysis, and it must be on the right path in semantic knowledge representation and processing.  In this regard, doing well for English does not mean doing well for Chinese.  Those who do not yet appreciate the difficulty of an open-domain Chinese NLP engine can hardly be expected to achieve semantic grounding at scale.  This technology barrier can open a gap as wide as heaven and earth between companies attempting the very same product.

Semantic grounding also requires engineering adaptation of the interfaces to the service resources.  This is an equally arduous task, and it is a contest of resources, efficiency and management.  A small start-up can hardly muster such resource-integration capacity and engineering organization; these are the strengths of large companies.  Some may ask: can I start small and grow big?  My answer is no, for time waits for no one.  In the field of semantic grounding, a product that cannot break out within a short time is a product waiting to die.

Semantic grounding also calls for the ability to manage the human-machine dialogue scene itself. This involves a variety of technologies, including context awareness, topic switching, sentiment analysis, language-style selection, personality shaping and more.  A voice assistant need not always be glib or cute to be likable: moderate erudition, sharpness and even an occasional touch of rudeness can all be selling points.

Therefore, we emphasize the decisive role of semantic grounding in Alexa’s user stickiness, and the decisive contribution of massive service resources to Alexa’s success story.  In China, unless a super-large Internet company with service resources on Amazon’s scale steps in, together with a heavyweight team developing an open-domain Chinese NLP engine, speech technology alone cannot generate the kind of user stickiness we see in Alexa.

Who will win then?

In essence, these days any end device that does not aim at gathering user data is missing the point.  Smartphones have dominated the industry for years, and the various smart-home alliances have been battling for several years as well.  Alexa’s arrival has stirred the industry with excitement and revelations, but who will own this territory is far from settled; everyone still has a chance.  On one matter, however, the direction and the trend must be examined carefully: how the end combines with the cloud, and how the entry point combines with its carrier, to form a closed data loop.  Misread this, and the opportunity will not be yours.

So what is the direction and what are the trends? Let me give an analysis.

First, artificial intelligence is bound to be the next-generation portal model. In other words, all kinds of service demands will eventually flow from the end to the cloud through AI’s multi-channel input analysis and its advantages in human-computer interaction, and all kinds of service resources will eventually be delivered from the cloud to the end through AI’s knowledge processing and cognitive decision-making capabilities.  If you do not position yourself in artificial intelligence, the future portal will definitely not be yours.

Second, for a long time to come the smartphone will remain the de facto chief carrier of the portal, and its position is hard to shake.  Wherever a person goes, the communication node and the digital identity follow, and so do the perception of the scene on site and the apps acting as service agents.  No other end device matches the smartphone on the most critical dimensions a portal carrier needs: individuality, privacy and ubiquity.

Third, the communication function and the service-interfacing function of end devices will gradually separate.  As the services to be connected grow ever more diverse, it becomes impossible for one end device to handle everything, yet it is also undesirable for every end device to carry its own communication function.  The relationship between the Apple Watch and the iPhone is intriguing in this regard: the iPhone serves as the communication hub and the client-side information-processing hub, while the Apple Watch acts as an auxiliary device for specialized data collection and limited display, the two linked by near-field communication.  Of course, both are Apple’s own products, so the data flow is under unified control; under a single owner, the separation is always limited and tightly coupled.  But what is done once will be done again: all kinds of separations will keep emerging, and the melee will rise to one climax after another.  Today the whirlwind is Alexa’s; tomorrow the rainstorm may be someone else’s.  If phone manufacturers take a broader view, they could, with the help of blockchain technology, keep an objective record of the contributions the various auxiliary end devices make to data collection, and arrange the sharing of data and proceeds in proportion to those contributions.  Some loosely coupled form of separation might then become a fait accompli, and the ecology of end devices would flourish as never before.  Imagine arriving in an unfamiliar place and pulling from your pocket a soft, thin, foldable electronic map.  Unfolded, it is as large as a real paper map, yet as convenient to operate by touch as a phone map app, perhaps even by voice, and it is linked to your phone.  Of course, the map could also be a mere projection with no physical object at all.  Your phone then handles nothing but communication; all control and display happen on the map, and the phone never needs to be taken out.  Such a phone may never need to be held in the hand at all; it might even be worn on the foot, gradually evolving into a “foot phone”…

Are you ready for the opportunity and inspirations brought by the Alexa whirlwind?

Translated by: Dr. Wei Li based on GNMT

【Related】

S. Bai: Natural Language Caterpillar Breaks through Chomsky’s Castle

Dr Wei Li’s English blogs

立委译白硕:“入口载体”之争(中英对照)


 

Neural MT: Trump proclaims, the American people stand up today

Trump proclaims: the people are the masters, and the American people stand up today!

Trump was enthroned as president today and delivered his inaugural address. Below is Google’s neural machine translation of the speech into Chinese. As an old machine-translation hand, if I were to grade this output I would give it 85 for fidelity, 90 for fluency and 95 for intelligibility; in my view it already surpasses the average level of live human interpretation. Of course, speeches are among the easier genres for translation: for effect, speechwriters favor short sentences, plain language and repetition. (The Chinese MT output below is reproduced verbatim, errors and all.)

TRUMP:首席大法官罗伯茨,卡特总统,克林顿总统,布什总统,奥巴马总统,美国人和世界人民,谢谢。

我们,美国公民,现在加入了伟大的国家努力,重建我们的国家,恢复其对我们所有人民的承诺。
在一起,我们将决定美国和世界的路线许多,未来几年。我们将面临挑战,我们将面临艰难,但我们将完成这项工作。

每四年,我们将采取这些步骤,进行有秩序和和平的权力转移,我们感谢奥巴马总统和第一夫人米歇尔奥巴马在这一过渡期间的恩典援助。他们是壮观的。谢谢。

然而,今天的仪式具有非常特殊的意义,因为今天我们不仅仅是将权力从一个政府转移到另一个政府,或从一个政党转移到另一个政府,而是我们从华盛顿转移权力,并将其交还给你,人民。

长期以来,我们国家首都的一个小团体获得了政府的奖励,而人民承担了成本。华盛顿蓬勃发展,但人民没有分享其财富。政治家兴旺,但工作离开,工厂关闭。企业保护自己,但不是我们国家的公民。他们的胜利不是你的胜利。他们的胜利不是你的胜利。虽然他们在我们国家的首都庆祝,但没有什么可以庆祝在我们的土地上奋斗的家庭。

所有的变化从这里开始,现在,因为这一刻是你的时刻,它属于你。

它属于今天聚集在这里的每个人,每个人都在整个美国。这是你的一天。这是你的庆祝。而这个,美利坚合众国,是你的国家。

真正重要的不是哪个党控制我们的政府,而是我们的政府是否由人民控制。

2017年1月20日将被记住为人民成为这个国家的统治者的那一天。

我们国家被遗忘的男人和女人将不再被忘记。

每个人都在听你的。你来自成千上万的人成为历史运动的一部分,世界从未见过的那些喜欢。

在这个运动的中心是一个关键的信念,一个国家存在为其公民服务。美国人想要他们的孩子的伟大的学校,他们的家庭的安全的邻里,并为自己好的工作。这些是对义人和公义的公正和合理的要求。

但对于我们太多的公民,存在一个不同的现实:母亲和儿童陷入我们内部城市的贫困;生锈的工厂散落像墓碑横跨我们国家的景观;教育制度与现金齐齐,但使我们年轻美丽的学生失去了所有的知识;和犯罪,帮派和毒品偷走了太多的生命,抢夺了我们国家这么多未实现的潜力。

这美国大屠杀停在这里,现在停止。

我们是一个国家,他们的痛苦是我们的痛苦。他们的梦想是我们的梦想。他们的成功将是我们的成功。我们分享一颗心,一个家,一个光荣的命运。我今天所做的宣誓就是对所有美国人的忠诚宣誓。

几十年来,我们以牺牲美国工业为代价丰富了外国产业;补贴了其他国家的军队,同时允许我们的军队非常悲伤的消耗。我们捍卫了其他国家的边界 ,拒绝为自己辩护。

在海外花费了数万亿美元,美国的基础设施已经失修和腐烂。我们已经使其他国家富有,而我们国家的财富,实力和信心已经消失了地平线。

一个接一个地,工厂关闭了,离开了我们的岸边,甚至没有想到数百万和数百万留在美国工人。我们的中产阶级的财富已经从他们的家里被剥夺,然后再分配到世界各地。

但这是过去。现在,我们只看到未来。

我们今天聚集在这里,正在发布一项新法令,在每个城市,每个外国首都和每一个权力大厅上听到。从今天起,我们的土地将有一个新的愿景。从这一天开始,它将只有美国第一,美国第一。

每一项关于贸易,税收,移民,外交事务的决定都将使美国工人和美国家庭受益。我们必须保护我们的边界免受其他国家的蹂躏,使我们的产品,偷窃我们的公司和破坏我们的工作。

保护将导致巨大的繁荣和力量。我会为我的身体每一口气,为你而战,我永远不会让你失望。

美国将再次赢得胜利,赢得前所未有的胜利。

我们将带回我们的工作。

我们将带回我们的边界。

我们将会

 

Google Translated from:

TRUMP: Chief Justice Roberts, President Carter, President Clinton, President Bush, President Obama, fellow Americans and people of the world, thank you.

We, the citizens of America, are now joined in a great national effort to rebuild our country and restore its promise for all of our people.
Together, we will determine the course of America and the world for many, many years to come. We will face challenges, we will confront hardships, but we will get the job done.

Every four years, we gather on these steps to carry out the orderly and peaceful transfer of power, and we are grateful to President Obama and First Lady Michelle Obama for their gracious aid throughout this transition. They have been magnificent. Thank you.

Today’s ceremony, however, has very special meaning because today, we are not merely transferring power from one administration to another or from one party to another, but we are transferring power from Washington, D.C. and giving it back to you, the people.

For too long, a small group in our nation’s capital has reaped the rewards of government while the people have borne the cost. Washington flourished, but the people did not share in its wealth. Politicians prospered, but the jobs left and the factories closed. The establishment protected itself, but not the citizens of our country. Their victories have not been your victories. Their triumphs have not been your triumphs. And while they celebrated in our nation’s capital, there was little to celebrate for struggling families all across our land.

That all changes starting right here and right now because this moment is your moment, it belongs to you.

It belongs to everyone gathered here today and everyone watching all across America. This is your day. This is your celebration. And this, the United States of America, is your country.

What truly matters is not which party controls our government, but whether our government is controlled by the people.

January 20th, 2017 will be remembered as the day the people became the rulers of this nation again.

The forgotten men and women of our country will be forgotten no longer.

Everyone is listening to you now. You came by the tens of millions to become part of a historic movement, the likes of which the world has never seen before.

At the center of this movement is a crucial conviction, that a nation exists to serve its citizens. Americans want great schools for their children, safe neighborhoods for their families, and good jobs for themselves. These are just and reasonable demands of righteous people and a righteous public.

But for too many of our citizens, a different reality exists: mothers and children trapped in poverty in our inner cities; rusted out factories scattered like tombstones across the landscape of our nation; an education system flush with cash, but which leaves our young and beautiful students deprived of all knowledge; and the crime and the gangs and the drugs that have stolen too many lives and robbed our country of so much unrealized potential.

This American carnage stops right here and stops right now.

We are one nation and their pain is our pain. Their dreams are our dreams. And their success will be our success. We share one heart, one home, and one glorious destiny. The oath of office I take today is an oath of allegiance to all Americans.

For many decades, we’ve enriched foreign industry at the expense of American industry; subsidized the armies of other countries, while allowing for the very sad depletion of our military. We’ve defended other nations’ borders while refusing to defend our own.

And spent trillions and trillions of dollars overseas while America’s infrastructure has fallen into disrepair and decay. We’ve made other countries rich, while the wealth, strength and confidence of our country has dissipated over the horizon.

One by one, the factories shuttered and left our shores, with not even a thought about the millions and millions of American workers that were left behind. The wealth of our middle class has been ripped from their homes and then redistributed all across the world.

But that is the past. And now, we are looking only to the future.

We assembled here today are issuing a new decree to be heard in every city, in every foreign capital, and in every hall of power. From this day forward, a new vision will govern our land. From this day forward, it’s going to be only America first, America first.

Every decision on trade, on taxes, on immigration, on foreign affairs will be made to benefit American workers and American families. We must protect our borders from the ravages of other countries making our products, stealing our companies and destroying our jobs.

Protection will lead to great prosperity and strength. I will fight for you with every breath in my body, and I will never ever let you down.

America will start winning again, winning like never before.

We will bring back our jobs.

We will bring back our borders.

We will ……

 

【相关】

Newest GNMT: time to witness the miracle of Google Translate

【谷歌NMT,见证奇迹的时刻】 

关于机器翻译

《朝华午拾》总目录

【置顶:立委NLP博文一览】

立委NLP频道

 

【杞人忧天:可怕的信息极乐世界】

今天想信息过载的问题,有点感触。

我们生在大数据信息过载的时代。以前一直觉得作为NLPer,自己的天职就是帮助解决这个过载的问题。就好像马云的宏愿是天下没有难做的生意,我们玩大数据的愿景应该就是,天下没有不能 access 的信息。于是谷歌出现了,用粗糙的关键词和数不厌大的气概,解决了信息长尾问题。于是我们开始批判谷歌,信息长尾解决的代价是数据质量太差。于是人智(AI)派来了,借力深度作业(deep processing, whether deep learning or deep parsing),企图既要解决大数据的长尾,也要大幅提升数据质量,让全世界对于信息感兴趣的心灵,都有一个源源不断的信息流。这是从我们从业者的角度。
今天换了个角度想这个问题,从信息受众的角度。作为消费者,作为白领,我们从人类的信息过载的战役不断优化的过程中得到了什么?我们得到的是,越来越高质量的、投我所好的信息流。以前是在过载的海洋、信息垃圾里淹死,如今是在精致的虚假的满足里噎死。感受不同了,但反正都是死。哪怕做鬼亦风流,死鬼却从不放过我们。于是我们花费在朋友圈、新闻apps、娱乐apps的时间越来越多。无数天才(很多是我的同行高人)绞尽脑汁研究我们的喜好,研究如何黏住我们,研究什么诡计让我们拼死吃河豚。
一个人敌不过一个世界,这是铁律。七情六欲血肉之躯的消费者个体敌不过无数盯着消费者喜好的商家及其帮凶(包括在下)。于是我们沉沦了,成为了信息的奴隶。我们同时也不甘心,在努力寻求自救,不要在糖罐里甜腻死,虽然这甜越来越幽香、巧妙,充满诱惑。我们就这么一路挣扎着。但随着信息技术的提升,中招的越来越多,能自救的越来越少。
世界有n十亿人,m千万个组织,在每时每刻产生信息。假如我们把自我信息满足的门槛,用各种 filters 无限拔高,拔高到千万分之一,我们面对的仍然是 n百人和m个组织的产出。当技术提升到我们可以 access 这个高纯度但仍然能淹死人的信息的时候,我们一定相见恨晚,乐不思蜀,有朝闻道夕死可矣的感觉。这是一个可怕的极乐世界。
我们作为消费者在打一个注定失败的自虐之仗,试图抵制抵制不了的诱惑。说一点个人的应对体会,结束这个杞人早忧天的议论。这个体会也从朋友中得到印证过。
体会就是,有时候我们可以学林彪副统帅,不读书不看报,突然就掐了信息源和apps,专心做自己的事儿。一个月甚至半年过去,回头看,自己其实没有损失什么,而且完成了拖得很久的工作(其中包括如何去用语言技术提高信息质量诱惑别人的工作,不好意思,这颇滑稽,但无奈它是在下借以安身立命的天职)。
同行刘老师有同感,他是做事儿的人。我问他要不要加入群,咱们大伙儿聊聊NLP啥的。刘老师说,我这人经不起诱惑,曾经加入了n多群,一看话题有趣,就忍不住要看、要回应、要投入。结果是做不完手头的事儿。后来一横心,退了所有的群,就差把手机扔了。刘老师的做法也是一种自救。
其实我们最后还是要回到信息流中,再坚强的灵魂也不可能苦行僧一样长时期拒绝高品质信息以及消遣式信息享受。一味拒绝也自有其后果。意志力强的是在这两种状态中切换。更多的人意志力不够,就一步步淹没。退休了被淹没,也可算是福气。年轻人被淹没,这就是罪过,而恰恰是后者才是最 vulnerable 的群体。“忽视信息视而不见”乃是白领劳动者的生存技巧,但对于涉世未深的年轻人很难很难。据观察,在信息轰炸中淹没(info-addiction),其问题的严重性已经不亚于吸毒和酗酒,感觉与游戏的泛滥有一拼,虽然我没有统计数据。
因此,我想,人智可以缓行,我们没必要那么急把全世界的人生和时间都吞没,可以积点德或少点孽。同时,希望有越来越多的人研究如何帮助人抵制信息诱惑,抵抗沉沦。理想的世界是,我们既有召之即来的高质量信息,又有挥之即去的抵制工具在(类似戒毒program)。虽然后者的商业利益少,但却是拯救世界和人类的善举。
最可怕的是在下一代,可以看到他们的挣扎和无助。games、social media 和 internet 吞噬了无数青春。而世界基本是束手无策,任其沉沦。家长呢,只有干着急。我们自己都不能抵制诱惑,怎么能指望年青一代呢。充满 curiosity 和躁动的心灵,注定受到信息过载的奴役最深。其社会成本和代价似乎还没有得到应有的深入研究。
今天就扯到这儿,希望不是信息垃圾。
【相关】

Trap of Information Overdose

【置顶:立委NLP博文一览】

《朝华午拾》总目录

 

Trap of Information Overdose

Today, my topic relates to the issue of information overload.

We are born in the era of big data and information overload. As an NLPer (Natural Language Processor), for years I have been stuck in the belief that my sole mission is to help solve this problem of information overload. Just like Alibaba’s Jack Ma’s vision that there should be no barriers for any business in this e-commerce world, my colleagues and I seem to share the vision in the community that there should be no barriers to instant access to any information amid the big data. So Google appeared, with crude keywords as the basis and an insatiable appetite to cover as much of the big data as possible, and solved the problem of the information long tail. Today, whatever your query, and however rare your information need is, you google it and you get some relevant info back. We don’t want to stop there, so we begin to criticize Google because its solution to the information long tail comes at the cost of poor data quality. Hence AI (Artificial Intelligence) is proposed and being practiced to enhance the deep processing of data (whether via deep learning or deep parsing), in an attempt both to handle big data for its long tail and to drastically raise the data quality through natural language understanding (NLU). The aim is to satisfy any soul with information needs, whether explicitly requested or implicitly carried in the mind, with a steady flow of quality information. This is the perspective from us practitioners’ point of view, currently mixed with lots of excitement and optimism.

Let us change our perspective to ask ourselves, as a consumer, what have we benefited from this exciting AI battle on information overload? Indeed, what we now get is more and more data — to the point, high-quality, with constant and instant feeds, which we have never before been able to reach. Previously we were drowned in the overload of the information ocean, mostly garbage occasionally with a few pearls, and nowadays we end up being choked to death by over-satisfaction of quality information thanks to the incredible progress of information high-tech via AI. So the feelings are dramatically different, but the ending remains the same, both are an inescapable path to death, drowned or choked. So each day we spend more and more time in the social media among our circles of friends, on all types of news apps, or entertainment apps, with less and less time for real-life work, family and serious thinking. Numerous geniuses out there (many are my talented peers) racked their brains to study our preferences, study how to make us stick to their apps, and what tricks they can apply to drive us crazy and addicted to their products.

It is the iron law that a person is no match for a calculated and dedicated world. Made of flesh and blood, each individual consumer is no match for an invisible legion of tech gurus (including myself) from businesses and their accomplices in the information industry, looking closely into our behavior and desires. So we are bound to sink to the bottom, and eventually become a slave of information. Some of us begin to see through this trap of information overdose, struggling hard to fight the addiction, and seeking self-salvation against the trend. Nevertheless, with the rapid progress of artificial intelligence and natural language technology, we see the trend clear, unstoppable and horrifying: more and more are trapped in the info, and those who can save themselves with a strong will are a definite minority.

The world has n billion people and m million organizations, each producing information non-stop every moment, now recorded one way or another (e.g. in social media). Even if we raise the bar for our information needs, for work and for pleasure, higher and higher, filtering down to an incredibly selective ratio of something like one ten-millionth using a variety of technology filters, we are still faced with info feeds from n-hundred people and m organizations. There is simply no way in our lifetime to exhaust it all or catch up with its feeds. We end up feeling over-satisfied with information, most of which we feel we simply cannot and should not miss. We are living in a terrible bliss of an over-satisfying world. As consumers we are doomed in this battle to fight the addiction against our own nature, trying to resist a temptation that by nature cannot be resisted.

Having pointed out the problem, I have no effective remedy to offer. What I myself do is that, at times, I simply shut down the channels to stay in info-diet or hungry mode, focusing on family and the accumulated to-do list of work. This seems to work, and I often get my work done without feeling I have missed that much during the “diet” period, but it is not a sustainable technique (with the exception perhaps of a very few super guys I know, whom I admire, though I really cannot tell whether that lifestyle is for the better, as shutting the info channels for too long has its own side effects and consequences, to my mind). In the end, most of us fall back to being willing slaves of information. The smarter minds among us have learned to shift between these two modes: shutting the channels down for some time, then going back to the “normal” modern way of information life.

For people who want and need to kill time, for example the retired in lonely senior homes, the info age is God-sent: their quality of killing time has never been better. But how about the younger generation, who are most vulnerable to info overdose, as much as to the addiction to the crazily popular games today? The “shutting the channels” technique is a survival skill of the middle-aged generation, who need to dedicate sufficient time to their daily work and life, making a living, supporting the family and keeping it running. But this technique is almost impossible for the young generation to practice, given that they are born in this info age, with social media and the like part of their basic lifestyle. Nevertheless, there is no shortage of struggles and helplessness, as we observe when they are being drowned in the sea of games, social media and the Internet, in the face of academic pressure and career training competition. The external world is not in the least prepared and is basically helpless before them. So are we parents. Many times we ourselves cannot resist the temptation of being enslaved in the information trap; how can we expect our next generation to learn the balancing skill easily, considering they are at the age of exploration, with tremendous curiosity and confusion?

Sometimes I tell myself: why should we work so hard on info technology if we know it has both positive effects as well as huge negative impact which we have no clues how to fix. After all, we do not need to rush the entire world of life and time to be engulfed by info no matter how high quality we can make it to be. Meanwhile, I really hope to see more and more study to get invested in addressing how to help people resist the temptation of the information trap. The ideal world in my understanding should be that we stay equipped with both intelligent tools to help access quality information as nutrients to enrich our lives, as well as tools to help resist the temptation from info over-satisfaction.

Translated and recompiled from the original post in my Chinese blog: 【杞人忧天:可怕的信息极乐世界

 

[Related]

杞人忧天:可怕的信息极乐世界

Dr Li’s NLP Blog in English

 

【泥沙龙笔记:从三星购买Siri之父的二次创业技术谈起】

最近新闻:【三星收购 VIV 超级智能平台,与 Siri 和 Google 展开智能助理三国杀】

我:
人要是精明,真是没治。一个 Siri,可以卖两次,而且都是天价,都是巨头,并且买家还是对头,也是奇了。最奇的是,Siri 迄今还是做玩具多于实用,满足好奇心多于满足市场的刚性需求。最最奇的是,Siri 里面的奥妙并不艰深,有类似水平和技术的也不是就他一家。
世界上有些事儿是让人惊叹的,譬如当 iPhone 问世的时候。但有些事儿动静很大,也在历史上留下了很深的足迹,但却没有叹服的感受。譬如 IBM 花生的问答系统,NND,都进入计算机历史展览馆了,作为AI里程碑。再如 Siri,第一个把人机对话送到千家万户的手掌心,功不可没。但这两样,都不让人惊叹,因为感觉上都是可以“看穿”的东西。不似火箭技术那种,让人有膜拜的冲动。IBM 那套我一直认为是工程的里程碑,是大数据计算和operations的成就,并非算法的突破。

查:
@wei 呵呵 估计搞火箭的也看不上SpaceX

我: 那倒也是,内行相轻,自古而然,因为彼此都多少知底。

陈:
最近对Watson很感冒

我:
花生是在大数据架构热起来之前做成的。从这方面看,IBM 的确开风气之先,有能力把一个感觉上平平的核心引擎,大规模部署到海量数据和平行计算上。总之,这两样都不如最近测试谷歌MT给我的震撼大。谷歌的“神经”翻译,神经得出乎意表,把我这个30年前就学MT的老江湖也弄晕糊了,云里雾里,不得不给他们吹一次喇叭

陈: 咋讲

我:
还讲啥,我是亲手测试的。两天里面测试翻译了我自己的两篇博文:

【Question answering of the past and present】

Introduction to NLP Architecture

洪:
伟爷被自己的影子吓坏了。

陈:
效果奇好?

我:
是的。前神经时代我也测试过,心里是有比较的。天壤之别。
如果你撞上了他们的枪口,数据与他们训练的接近,谷歌MT可以节省你至少 80% 的翻译人工。80% 的时候几乎可以不加编辑,就很顺畅了。谁在乎 20% 以内的错误或其他呢,反正我是省力一多半了。最重要的是,以前用 MT,根本就不堪卒读,无论你多好的脾气。现在一神经,就顺溜多了。当然,我的 NLP 博文,也正好撞上了他们的枪口。

陈:
以后也可以parsing。试一些医学的

我:
据说,他们擅长 news,IT,technology,好像 法律文体 据说也不错。其他领域、口语、文学作品等,那就太难为它了。

陈:
有双语语料

我:
就是,它是在千万个专业翻译的智慧结晶上。人的小小的脑袋怎么跟它比拼时间和效率呢,拼得了初一,也熬不过15。

陈:
谷歌的重大贡献是发掘人类已经存在的知识。包括搜索,锚文本是核心.

马:
我挺佩服IBM的华生的,如果是我,绝不敢在2007年觉得能做出这么一个东西出来

我:
可是从算法上看,真的不需要什么高超之处。那个智力竞赛是唬人的,挑战的是人的记忆极限,这对机器特别有利。绝大多数智力竞赛问答题都是所谓 factoid questions,主要用到的是早已成熟的 Named Entity 技术,加上对 question 的有限 parsing,背后的支撑也就是 IR。恰好智力竞赛的知识性问题又是典型的大数据里面具有相当 redundancy 的信息。这种种都给 IBM 创造了成功的条件。

1999 年开始 open domain QA 正式诞生,不久上面的技术从核心引擎角度就已经被验证。剩下的就是工程的运作和针对这个竞赛的打磨了。
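上面说的 factoid QA 路数,可以用几行玩具代码示意:从疑问词猜答案类型,检索相关段落,再靠 redundancy 给候选答案投票。纯属示意,小语料、用正则冒充的“NER”都是杜撰的,绝非 IBM 的真实流水线:

```python
import re
from collections import Counter

# Toy passage collection standing in for an IR result set.
PASSAGES = [
    "Mount Everest was first summited in 1953 by Hillary and Norgay.",
    "In 1953, Edmund Hillary reached the top of Mount Everest.",
    "Everest, climbed in 1953, remains the world's highest peak.",
    "K2 was first climbed in 1954 by an Italian expedition.",
]

def expected_answer_type(question):
    """Very limited 'question parsing': map the wh-word to an answer type."""
    q = question.lower()
    if q.startswith("when") or "what year" in q:
        return "YEAR"
    if q.startswith("who"):
        return "PERSON"
    return "OTHER"

def extract_candidates(text, ans_type):
    """Stand-in for mature Named Entity tagging (a regex toy, not real NER)."""
    if ans_type == "YEAR":
        return re.findall(r"\b(1[89]\d\d|20\d\d)\b", text)
    if ans_type == "PERSON":
        return re.findall(r"\b[A-Z][a-z]+\b", text)
    return []

def answer(question):
    """Retrieve passages sharing keywords, then rank candidates by redundancy."""
    keywords = {w for w in re.findall(r"\w+", question.lower()) if len(w) > 3}
    votes = Counter()
    for p in PASSAGES:
        if keywords & set(re.findall(r"\w+", p.lower())):
            votes.update(extract_candidates(p, expected_answer_type(question)))
    return votes.most_common(1)[0][0] if votes else None

print(answer("When was Mount Everest first climbed?"))  # redundancy favors 1953
```

大数据的 redundancy 在这里就体现为三个段落都投 1953 的票,一个噪音段落(K2/1954)被淹没,这正是知识性竞赛问题对机器特别有利的原因。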

 

【相关】

【问答系统的前生今世】

【Question answering of the past and present】

谷歌NMT,见证奇迹的时刻

Newest GNMT: time to witness the miracle of Google Translate

《新智元笔记:知识图谱和问答系统:开题(1)》 

《新智元笔记:知识图谱和问答系统:how-question QA(2)》 

【置顶:立委NLP博文】

 

Who we are. Not an ad, but a snapshot.

NetBase

WHO WE ARE


EMPOWERING GLOBAL BUSINESSES WITH SOCIAL INSIGHTS

We are uniquely positioned to help global businesses create real business value from the unprecedented level of growth opportunities presented each day by social media. We have the industry’s fastest and most accurate social analytics platform, strong partnerships with companies like Twitter, DataSift, and Tumblr, and award-winning patented language technology.

We empower brands and agencies to make the smartest business decisions grounded on the deepest and most reliable consumer insights from social. We’ve grown 300 percent year-over-year and are excited to see revenue grow by 4,000% since the second quarter of 2012.

RECENT ACCOLADES

We were recently named a top rated social media management platform by software users on TrustRadius and a market leader by G2 Crowd.


“NetBase is one of the strongest global social listening and analytics tools in the market. Their new interface makes customized dashboard creation a breeze.”

– Omri Duek, Coca-Cola

“Data reporting is both broad and detailed, with the ability to drill down from annual data to hourly data. NetBase allows us to have a pulse on the marketplace in just a few minutes.”

– Susie Thomas, VP, Palisades Media Group

“We started with a gen one solution, but then found that we needed to move to a tool with a better accuracy that could support digital strategy and insights research. NetBase satisfied all our needs.”

– Jared Degnan, Director of Digital Strategy

“As one of the first brands to test NetBase Audience 3D for our Mobile App launch, we’ve found that we could engage with our consumers on a deeper, more human level that further drives them to be brand champions.”

– Mihir Minawala, Manager of Social, Industry & Competitive Intelligence, Taco Bell

OUR CUSTOMERS

We work with executives from forward-looking agencies and leading brands across all verticals in over 99 countries. Our customers use NetBase for real-time consumer insights across the organization, from brand and digital marketing, public relations, product management to customer care.

KEY MILESTONES

  • March 2003
    Founded by Michael Osofsky at MIT. Later joined by Wei Li, Chief NetBase Scientist
  • July 2009
    P&G, Coca-Cola and Kraft signed as first customers of NetBase
  • January 2014
    Named Best-in-Class By Consumer Goods Technology
  • April 2014
    Launched Brand Live Pulse, the first real-time view of brands’ social movements
  • May 2014
    Celebrated 10 years with 500% customer growth in 3 years
  • January 2015
    AdAge Names 5 NetBase Customers to the Agency A-List
  • March 2015
    Introduced Audience 3D, the first ever 3D view of audiences
  • April 2015
    Raised $33 MM in Series E Round
  • November 2015
    Named Market Leader by G2 Crowd. Earned Top Ratings by Trust Radius


What inspired you to join NetBase?

It was exciting to build the technology that could quickly surface meaningful customer insights at scale. For example, a simple analysis that used to take a day to run now takes just a second. Our platform now analyzes data in “Google time”, yet the depth and breadth of our analysis are exponentially greater than what you’ll ever get from a Google search.

What are you most proud of at NetBase?

I’m especially proud that we have the industry’s most accurate, deepest, fastest, and most granular text analysis technology. This enables us to give our customers very actionable insights, unlike other platforms that offer broad sentiment analysis and general trending topics. Plus, NetBase reads 42 languages. Other platforms don’t even come close. We are customer-centric. Our platform truly helps customers quickly identify their priorities and next steps. This is what sets us apart.

What is the next frontier for NetBase?

With the exploding growth of social and mobile data and new social networks emerging, we’ll be working on connecting all these data points to help our customers get even more out of social data. As Chief Scientist, I’m more excited than ever to develop a “recipe” that can work with the world’s languages and further expand our language offerings.

WE’RE GLOBAL: 42 LANGUAGES, 99+ COUNTRIES, 8 OFFICES

NetBase Solutions, Inc  © 2016

Overview of Natural Language Processing

Dr. Wei Li’s English Blog on NLP

【创业笔记:安娜离职记】

安娜是个很可爱的俄罗斯上进女青年,从小弹钢琴跳芭蕾,小学没毕业即随父母移民美国。她身材高挑,曲线优美,性情温和,举止得体,善解人意,给人一种古典但不古板,现代却不俗艳,阳光而浪漫的印象。大家知道,虽然俄罗斯大嫂大多偏胖粗线条,但俄罗斯姑娘却多有迷人的风采,老帮菜耳熟能详念念不忘的就有《钢铁是怎样炼成的》里面的资产阶级小姐冬妮亚,芭蕾舞天后乌兰诺娃,风华绝代的花样滑冰艺术家 Ekaterina Gordeeva。安娜也是这样一位俄罗斯女郎,每天就在身边,给满屋大多是 boys 的办公室带来了温馨柔和的气息。自然地,大家都喜欢她。

然而,安娜辞职了,很快就要离开,大家都舍不得。我心里也不是滋味,想到午餐时不再有她的说说笑笑,餐后也不能邀她打乒乓球了,失落落的。我问她一定要离开么,你不是说很喜欢这个环境么?You know this office is already too crowded with boys, and we are trying to change this situation, trying to find some girls with affirmative action, and you are leaving?

她回说,我喜欢这个环境,是因为在这里我接触的都是你这样的世界上最聪明的人,因为你们太聪明了,结果我的发展道路堵死了,只好痛下决心离开了,我还是去 consulting company 做我擅长的分析工作去吧。两年来,我亲眼目睹我的20小时的人工怎样被你的20秒的全自动搜索所替代,而且结果往往比人工更好更全更有一致性。

她说的不假。确实是技术的转移抢走了她的饭碗,但公司不想辞她,决定让她转型做在线客户服务,可她思前想后,觉得年轻轻不能放弃自己的专长,只好决定离开了。

作为技术带头人,她的离开与我直接相关。这是一个活生生的机器取代人工的例子。

两年前我加入公司的时候,公司基本上是一个 professional service 类型的公司,虽然也开发了一个内部使用的系统,但系统的输出只是缩小了人工范围,必须有长时间的后编辑,手动增删修补,分析归纳,才能提供给客户。编辑人员我们称为信息分析员,要求语言能力强,阅读理解一目十行,并具有分析综合的技能。安娜就是信息分析员中的佼佼者。经她过手的分析报告,客户特别满意。

可是公司需要成本核算。核算的结果是,肉工可以,要适度,否则入不敷出,是亏本买卖。当时平均每个搜索分析的订单需要肉工22小时方能完工,这22小时叫做 pain time (既是分析员的pain, 更是公司的pain)。要想赚钱,理想的 pain time 支出需要控制在两个小时之内,在当时有点天方夜谭。老板找我谈的时候,就把它定为主要目标,但并没有设置时间限度,因为没有人知道其可行性以及达成这样的目标需要多少资源。我自己也不明白,只是感觉到了这个重担。我以前做过的工作,都是先研究,后做原型引擎,然后寻找应用领域,最后开发产品。而这家公司与多数技术创新公司截然相反,它是先有客户,后有粗糙的引擎,最后才引进人才和技术,把希望寄托在技术的快速转移身上。这条路子让我觉得新鲜和刺激,觉得可以试一下,我的技术转移技能能不能如鱼得水,发挥出来。先有客户和应用领域的好处是显而易见的,就像搞共产主义有了遵义会议的明灯一样,省却了在黑暗中的漫长摸索。道路是光明的,就看路怎样走才能赚钱了。

长话短说。我上马以后,三个月把系统的核心部分替换了,半年下来结果明显改善,到一周年的时候,肉工的痛苦时间已经缩短到两小时以下,老板喜不自禁。

人心不足蛇吞象,老板告诉我,Wei,你知道,你的技术给我们的业务带来了革命性变化。我们的立足已经不成问题,只要我们愿意,维持一个机器加人工的服务,发展成年入几千万的企业指日可待。但是,只要有人工,就不能 scale up, 赚钱就有限,盘子就做不大。我知道你是有雄心的人(我心里说,子非鱼),肯定不满足小打小闹。不管多大风险,我们还是决定放弃这条道路,而走全自动的路子,让系统可以服务所有的分析客户,而不是只供我们内部人工(安娜这样的)或者需要专门训练的 power users 使用。我们的目标是让世界上每个分析员都离不开我们,就如大家离不开Google一样。为此,我们必须做到 pain time  为零,这是着险棋,但是前景不可限量。

好家伙,这个口气,就梦想称霸全世界了。美国是个很有意思的地方,这方水土盛产百折不挠,心比天高的企业梦想家。但美国并非梦想家的乐园,95%的梦想家牺牲了,不到5%得以生存,其中不过1%最终做大,真正是一将功成万骨枯。虽然如此,美国造企业梦想家仍然前赴后继,生生不息。我其实很喜欢这些梦想家,他们的坚韧豪情很感染人。

一年又过去了。我们实现了在一个主要分析领域完全铲除痛苦时间的目标(pain time 0),把搜索分析从两年前的22小时人工,发展成为如今的20秒钟全自动立等可取,无需任何人工编辑。

得之桑榆,失之东隅, 两年的奋战取得了超出所有人预料的成就,但同时也失去了一位可爱的俄罗斯女郎。

【二次创业笔记】 记于2008年四月

【后记】关于安娜,还有一个小插曲。大家知道,创业公司的人都爱做梦数小鸡,股票期权则是催梦剂。

有一天,公司哥们跟往常一样数小鸡玩儿,安娜跟我说:Wei, come here, I got something to show you. 我走近一看,是一辆轿车。她跟我一字一板地说:

I like this car. I just love it. It is my dream car. I want to buy it.
Guys, work hard so I can own this car.

及至仔细一看价码,吓了一个筋斗,百万以上,她可真敢想啊,乖乖隆的东,here it is:

http://abcnews.go.com/GMA/Moms/story?id=1406161

相关篇什:

【离皇冠上的明珠只有一步之遥的感觉】


parsing 是最好的游戏,而且实用。

据说好玩的游戏都没用,有实用价值的东西做不成游戏。但是,对于AI人员,parsing 却是这么一个最好玩但也最有用的游戏。纵情于此,乐得其所,死得其所也。

禹:
李老师parser有没有觉得太烧脑呢?
做parser少了个做字。感觉上先是一个比较优雅的规则集,然后发现规则之外又有那么多例外,然后开始调规则,解决冲突,然后整理规则的事情还得亲力亲为,做好几年感觉会不会很烦?

我:
不烦 特别好玩。能玩AI公认的世界级人类难题且登顶在望,何烦之有?
烦的是好做的语言 做着做着 没啥可做了 那才叫烦。英语就有点做烦了。做中文不烦 还有不少土地没有归顺 夺取一个城池或山头 就如将军打仗赢了一个战役似的 特别有满足感。

梁:
收复领地?

我:

【打过长江去,解放全中国!】

parsing 是最好的游戏。先撒一个default的网,尽量搂。其实不能算“优雅的规则集”,土八路的战略,谈不上优雅。倒有点像原始积累期的跑马,搂到越多越好。然后才开始 lexicalist 的精度攻坚,这才是愚公移山。在 default 与 lexicalist 的策略之间,建立动态通信管道,一盘棋就下活了。
譬如说吧,汉语离合词,就是一大战役。量词搭配,是中小战役。ABAB、AABB等重叠式是阵地战。定语从句界限不好缠,算是大战役。远距离填坑,反而不算大战役。因为远距离填坑在句法基本到位之后,已经不再是远距离了,而且填的逻辑SVO的坑,大多要语义相谐,变得很琐碎,但其实难度不大。(这就是白老师说的,要让大数据训练自动代替人工的语义中间件的琐碎工作。而且这个大数据是不需要标注的。白老师的RNN宏图不知道啥时开工,或已经开工?)
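这个“先撒网、后攻坚”的打法,可以用一段玩具代码勾勒:默认规则只看词类模式粗线条搂进来,词例化规则按具体词先行拦截、覆盖默认。纯属示意,规则集、习语条目(比如 kicked the bucket)都是杜撰的,与真实引擎的多层 cascade 和丰富特征不可同日而语:

```python
# A toy two-pass cascade: lexicalized rules pre-empt broad default patterns.
DEFAULT_NP = [(("Det", "Adj", "N"), "NP"), (("Det", "N"), "NP"), (("Pron",), "NP")]
DEFAULT_VP = [(("V", "NP"), "VP"), (("V",), "VP")]
LEXICAL = [(("kicked", "the", "bucket"), "VP")]  # idiom chunked before defaults

def rewrite(seq, rules, key):
    """One cascade pass: greedy left-to-right longest-match rewriting.
    `key` picks what to match a pattern against (the word or the tag)."""
    rules = sorted(rules, key=lambda r: -len(r[0]))
    out, i = [], 0
    while i < len(seq):
        for pattern, label in rules:
            n = len(pattern)
            if tuple(key(t) for t in seq[i:i+n]) == pattern:
                out.append((" ".join(t[0] for t in seq[i:i+n]), label))
                i += n
                break
        else:
            out.append(seq[i])
            i += 1
    return out

def parse(tokens):
    seq = rewrite(tokens, LEXICAL, key=lambda t: t[0])   # lexicalist pass: by word
    seq = rewrite(seq, DEFAULT_NP, key=lambda t: t[1])   # default net: chunk NPs
    return rewrite(seq, DEFAULT_VP, key=lambda t: t[1])  # default net: chunk VPs

print(parse([("he", "Pron"), ("kicked", "V"), ("the", "Det"), ("bucket", "N")]))
# → [('he', 'NP'), ('kicked the bucket', 'VP')]
```

要点在 parse 的次序:词例化规则先吃掉习语,默认网随后才对剩下的序列跑马圈地;真实系统里所谓“动态通信管道”,就是让这两层在多遍 cascade 中互通信息,这里只示意了最简单的单向优先。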

parsing 是最好的游戏。一方面它其实不是愚公面对的似乎永无尽头的大山,虽然这个 monster 看上去还是挺吓人的。但大面上看,结构是可以见底的,细节可以永远纠缠下去。另一方面,它又是公认的世界级人类难题。不少人说,自然语言理解(NLU)是人工智能(AI)的终极难题,而 deep parsing 是公认的通向NLU的必由之路,其重要性可比陈景润为攀登哥德巴赫猜想之巅所证明的“1+2”。我们这代人不会忘记30多年前迎来“科学的春天”时徐迟先生的如花妙笔:“自然科学的皇后是数学。数学的皇冠是数论。哥德巴赫猜想,则是皇冠上的明珠。…… 现在,离开皇冠上的明珠,只有一步之遥了。”(作为毛时代最后的知青,笔者是坐着拖拉机在颠簸的山路回县城的路上读到徐迟的长篇报告文学作品【哥德巴赫猜想】的,一口气读完,头晕眼花却兴奋不已。)

不世出的林彪都会悲观主义,问红旗到底要打到多久。但做 deep parsing,现在就可以明确地说,红旗登顶在望,短则一年,长则三五年而已。登顶可以定义为 open domain 正规文体达到 95% 左右的精度广度(f-score, near-human performance)。换句话说,就是结构分析的水平已经超过一般人,仅稍逊色于语言学家。譬如,英语我们五六年前就登顶了。

最有意义的还是因为 parsing 的确有用,说它是自然语言应用核武器毫不为过。有它没它,做起事来就大不一样。shallow parsing 可以以一当十,到了 deep parsing,就是以一当百+了。换句话说,这是一个已经成熟(90+精度可以认为是成熟了)、潜力几乎无限的技术。

刘:
@wei 对parsing的执着令人钦佩

我:
多谢鼓励。parsing 最终落地,不在技术的三五个百分点的差距,而在有没有一个好的产品经理,既懂市场和客户,也欣赏和理解技术的潜力。

刘:
任何技术都是这样的

我:
量变引起质变。90以后,四五个百分点的差别,也许对产品和客户没有太大的影响。但是10多个百分点就大不一样了。譬如,社会媒体 open domain 舆情分析的精度,我们利用 deep parsing support 比对手利用机器学习去做,要高出近20个百分点。结果就天差地别。虽然做出来的报表可以一样花哨,但是真要试图利用舆情做具体分析并支持决策,这样的差距是糊弄不过去的。大数据的统计性过滤可以容忍一定的错误,但不能容忍才六七十精度的系统。

当然也有客户本来就是做报表赶时髦,而不是利用 insights 帮助调整 marketing 的策略或作为决策的依据,对这类客户,精度和质量不如产品好用、fancy、便宜更能打动他们。而且这类客户目前还不在少数。这时候单单有过硬的技术,也还是使不上劲儿。这实际上也是市场还不够成熟的一个表现。拥抱大数据成为潮流后,市场的消化、识别和运用能力还没跟上来。从这个角度看,北美市场比起东土,明显成熟多了。

 

【相关】

泥沙龙笔记:parsing 是引擎的核武器,再论NLP与搜索

泥沙龙笔记:从 sparse data 再论parsing乃是NLP应用的核武器

It is untrue that Google SyntaxNet is the “world’s most accurate parser”

【立委科普:NLP核武器的奥秘】

徐迟:【哥德巴赫猜想】

《朝华点滴:插队的日子(一)》

关于 parsing

【关于中文NLP】

【置顶:立委NLP博文一览】

《朝华午拾》总目录

On Hand-crafted Myth of Knowledge Bottleneck

In my article “Pride and Prejudice of Main Stream”, the first of the top 10 misconceptions in NLP listed is as follows:

[Hand-crafted Myth]  Rule-based system faces a knowledge bottleneck of hand-crafted development while a machine learning system involves automatic training (implying no knowledge bottleneck).

While there are numerous misconceptions about the old school of rule systems, this hand-crafted myth can be regarded as the source of them all. Just review NLP papers: no matter what language phenomena are being discussed, it is almost a cliché to cite a couple of old-school works to demonstrate the superiority of machine learning algorithms, and the attack only needs one sentence, to the effect that hand-crafted rules lead to a system “difficult to develop” (or “difficult to scale up”, “with low efficiency”, “lacking robustness”, etc.), or a simple rejection like this: “literature [1], [2] and [3] have tried to handle the problem in different aspects, but these systems are all hand-crafted”. Once labeled as hand-crafted, a system need not even be discussed for its effect and quality. Hand-crafting becomes the rule system’s “original sin”; the linguists crafting rules therefore become the community’s second-class citizens bearing that sin.

So what is wrong with hand-crafting or coding linguistic rules for computer processing of languages?  NLP development is software engineering.  From software engineering perspective, hand-crafting is programming while machine learning belongs to automatic programming.  Unless we assume that natural language is a special object whose processing can all be handled by systems automatically programmed or learned by machine learning algorithms, it does not make sense to reject or belittle the practice of coding linguistic rules for developing an NLP system.

For consumer products and arts, hand-crafted is definitely a positive word: it represents quality, uniqueness and high value, a legitimate reason for a good price. Why has it become a derogatory term in NLP? The root cause is that in the field of NLP, almost as if some collective hypnosis had hit the community, people are intentionally or unintentionally led to believe that machine learning is the only correct choice. In other words, behind the criticizing, rejecting or disregarding of hand-crafted rule systems lies the underlying assumption that machine learning is a panacea, universal and effective, always preferable to the other school.

The fact of life is that, in the face of the complexity of natural language, machine learning from data so far only surfaces the tip of the iceberg of the language monster (called low-hanging fruit by Church in K. Church: A Pendulum Swung Too Far), far from reaching the goal of a complete solution to language understanding and applications. There is no basis for the claim that machine learning alone can solve all language problems, nor is there any evidence that machine learning necessarily leads to better quality than rules coded by domain specialists (e.g. computational grammarians). Depending on the nature and depth of the NLP task, hand-crafted systems actually have a better chance of outperforming machine learning, at least for non-trivial, deep-level NLP tasks such as parsing, sentiment analysis and information extraction (we have tried and compared both approaches). In fact, the only major reason why they are still there, having survived all the rejections from the mainstream and still playing a role in practical industrial applications, is their superior data quality, for otherwise they could not have been justified for industrial investment at all.

The “forgotten” school: why is it still there? What does it have to offer? The key is the excellent data quality as the advantage of a hand-crafted system, not only for precision; high recall is achievable as well.
(quoted from On Recall of Grammar Engineering Systems)

In the real world, NLP is applied research which must eventually land in the engineering of language applications, where results and quality are evaluated. As an industry, software engineering has attracted many ingenious coding masters, each and every one of whom gets recognized for their coding skills, including algorithm design and implementation expertise, which are hand-crafting by nature. Have we ever heard of a star engineer getting criticized for his (manual) programming? With NLP applications also being part of software engineering, why should computational linguists coding linguistic rules receive so much criticism while engineers coding other applications get recognized for their hard work? Is it because NLP applications are simpler than other applications? On the contrary, many natural language applications are more complex and difficult than other types of applications (e.g. graphics software or word processing apps). The likely explanation for the different treatment of a general-purpose programmer and a linguist knowledge engineer is that the big environment of software engineering does not involve as much prejudice, while the small environment of the NLP domain is deeply biased, with the belief that the automatic programming of an NLP system by machine learning can replace and outperform manual coding for all language projects. For software engineering in general, (manual) programming is the norm, and no one believes that programmers’ jobs can be replaced by automatic programming in any foreseeable future. Automatic programming, a concept not rare in science fiction with visions like machines making machines, is currently only a research area, for very restricted low-level functions.
Rather than placing hope on automatic programming, software engineering as an industry has seen significant progress in development infrastructures, such as development environments and rich libraries of functions to support efficient coding and debugging. Maybe one day in the future, applications can use more and more automated code for simple modules, but the full automation of constructing any complex software project is nowhere in sight. By any standard, natural language parsing and understanding (beyond shallow-level tasks such as classification, clustering or tagging) is a type of complex task. Therefore, it is hard to expect machine learning, as a manifestation of automatic programming, to miraculously replace the manual code for all language applications. The application value of hand-crafting a rule system will continue to exist and evolve for a long time, disregarded or not.

“Automatic” is a fancy word. What a beautiful world it would be if all artificial intelligence and natural language tasks could be accomplished by automatic machine learning from data. There is, naturally, a high expectation of and regard for a machine learning breakthrough to help realize this dream of mankind. All this should encourage machine learning experts to continue to innovate and demonstrate its potential, but it should not be a reason for pride and prejudice against a competing school or other approaches.

Before we embark on further discussions of the so-called knowledge bottleneck defect of rule systems, it is worth mentioning that the word “automatic” refers to the system development, not to be confused with running the system. At the application level, whether it is a machine-learned system or a manual system coded by domain programmers (linguists), the system always runs fully automatically, with no human interference. Although this is an obvious fact for both types of systems, I have seen people so confused as to equate a hand-crafted NLP system with a manual or semi-automatic application.

Is hand-crafting rules a knowledge bottleneck in development?  Yes, there is no denying that, nor any need to deny it.  The bottleneck shows up in the system development cycle.  But keep in mind that this “bottleneck” is common to all large software engineering projects: it is a resource cost, not something introduced by NLP alone.  From this perspective, the knowledge bottleneck argument against hand-crafted systems cannot really stand, unless it can be proved that machine learning does all of NLP equally well, free of any knowledge bottleneck.  That may be close to true for some special low-level tasks, e.g. document classification and word clustering, but it is definitely misleading or incorrect for NLP in general, a point to be discussed in detail shortly.

Here are ballpark estimates based on our decades of NLP practice and experience.  For shallow NLP tasks (such as Named Entity tagging or Chinese segmentation), a rule approach needs at least three months of one linguist coding and debugging the rules, supported by at least half an engineer for tools support and platform maintenance, to come up with a decent system for an initial release.  For deep NLP tasks (such as deep parsing, or deep sentiment beyond thumbs-up and thumbs-down classification), one should not expect a working engine without resources that involve at least one computational linguist coding rules for a year, coupled with half an engineer for platform and tools support and half an engineer for independent QA (quality assurance).  Of course, the labor requirements vary with the quality of the developers (especially the linguistic expertise of the knowledge engineers) and with how well the infrastructure and development environment support linguistic development.  The above estimates also exclude general costs common to all software applications, e.g. GUI development at the app level and operations for running the developed engines.

Let us sketch the scene of modern-day rule-based system development.  A hand-crafted NLP rule system is based on compiled computational grammars, nowadays often architected as an integrated pipeline of modules from shallow processing up to deep processing.  A grammar is a set of linguistic rules encoded in some formalism; it is the core of a module intended to achieve a defined function in language processing, e.g. a shallow parsing module may target noun phrases (NP) as its object for identification and chunking.  What happens in grammar engineering is not much different from other software engineering projects.  As knowledge engineer, a computational linguist codes a rule in an NLP-specific language, based on a development corpus.  The development is data-driven: each line of rule code goes through rigorous unit tests and then regression tests before it is submitted as part of the updated system for independent QA to test and give feedback.  Development is an iterative cycle in which incremental enhancements driven by bug reports from QA and/or from the field (customers) serve as necessary input and steps toward better data quality over time.
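As a minimal sketch of this unit-test/regression-test discipline (all data and function names here are hypothetical, not the actual platform's API): every rule change is re-run against a frozen baseline of previously verified outputs before it is submitted.

```python
# Hypothetical baseline: sentence -> previously verified chunk output.
baseline = {
    "the red table": ["NP[the red table]"],
    "a dog barked": ["NP[a dog]", "VP[barked]"],
}

def run_regression(parse_fn):
    """Re-run the current engine on every baseline sentence; report diffs."""
    failures = []
    for sent, expected in baseline.items():
        got = parse_fn(sent)
        if got != expected:
            failures.append((sent, expected, got))
    return failures

def parse_stub(sent):
    # Stands in for the real engine; a "perfect" system, for illustration.
    return dict(baseline)[sent]

assert run_regression(parse_stub) == []          # clean regression run
```

A rule update that breaks any baseline sentence would surface here as a non-empty failure list, blocking submission until the regression is resolved.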

Depending on the architect's design, all types of information are available for the linguist developer to use in crafting a rule's conditions.  A rule can check any element of a pattern by enforcing conditions on (i) the word or stem itself (i.e. the string literal, e.g. for capturing idiomatic expressions), and/or (ii) POS (part-of-speech, such as noun, adjective, verb, preposition), and/or (iii) orthography features (e.g. initial upper case, mixed case, tokens with digits and dots), and/or (iv) morphology features (e.g. tense, aspect, person, number, case, etc., decoded by a preceding morphology module), and/or (v) syntactic features (e.g. verb subcategorization features such as intransitive, transitive, ditransitive), and/or (vi) lexical semantic features (e.g. human, animal, furniture, food, school, time, location, color, emotion).  There are almost infinite combinations of such conditions that can be enforced in rule patterns.  A linguist's job is to code conditions that maximize the capture of the target language phenomena, a balancing art in engineering refined through trial and error.
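The idea of a rule pattern whose nodes constrain feature bundles can be sketched as follows (a toy illustration, not the actual formalism; the token features and the NP rule are hypothetical examples):

```python
# A token is a bundle of features; a rule pattern is a sequence of
# condition dicts that consecutive tokens must satisfy.

def token(word, **feats):
    """A token carries its literal string plus arbitrary features."""
    return {"word": word, **feats}

def matches(tok, cond):
    """Every key/value constraint in cond must hold on the token."""
    return all(tok.get(k) == v for k, v in cond.items())

def match_pattern(tokens, pattern):
    """Return the first token span satisfying the pattern, or None."""
    n, m = len(tokens), len(pattern)
    for i in range(n - m + 1):
        if all(matches(tokens[i + j], pattern[j]) for j in range(m)):
            return tokens[i:i + m]
    return None

# A toy NP-chunking rule mixing POS and lexical-semantic conditions:
sent = [token("the", pos="det"),
        token("red", pos="adj", sem="color"),
        token("table", pos="noun", sem="furniture")]
np_rule = [{"pos": "det"}, {"pos": "adj"}, {"pos": "noun"}]
span = match_pattern(sent, np_rule)
```

Tightening a rule means adding more feature constraints to a node; loosening it means dropping them, which is the trial-and-error balancing described above.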

Macroscopically speaking, hand-crafting rules is in essence the same as programmers coding an application, except that linguists usually use a different, very high-level NLP-specific language, in a formalism chosen or designed for modeling natural language, within a framework on a platform geared toward facilitating NLP work.  Hard-coding NLP in a general-purpose language like Java is not impossible for a prototype or toy system.  But natural language is known to be a complex monster, and its processing calls for a special formalism (some form or extension of Chomsky’s formal language types) and an NLP-oriented language to implement any non-toy system that scales.  So linguists are trained on the job to be knowledge programmers hand-crafting linguistic rules.  In terms of the level of language used for coding, it is, to an extent, like the contrast between the programmers of old and the modern software engineers who code in high-level languages like Java or C: decades ago, programmers had to use assembly or machine language to code a function.  The process and workflow of hand-crafting linguistic rules are just like any software engineer’s daily coding practice, except that the language designed for linguists is so high-level that linguistic developers can concentrate on linguistic challenges without worrying about low-level technical details such as memory allocation, garbage collection, or pure code optimization for efficiency, which are taken care of by the NLP platform itself.  Everything else follows software development norms to ensure the development stays on track, including unit testing, baseline construction and monitoring, regression testing, independent QA, code reviews of rule quality, etc.  Every level of language has its own star engineers who master its coding skills.
It sounds ridiculous to respect software engineers while belittling linguistic engineers only because the latter hand-craft linguistic code as knowledge resources.

The chief architect in this context plays the key role in building a real-life, robust NLP system that scales.  To deep-parse or otherwise process natural language, he/she needs to define and design the formalism and language with the necessary extensions, the related data structures, and the system architecture with the interaction of different levels of linguistic modules in mind (e.g. the morpho-syntactic interface); the workflow that integrates all components for internal coordination (including patching, and handling interdependency and error propagation); and the external coordination with other modules or sub-systems, including machine learning or off-the-shelf tools when needed or deemed beneficial.  He/she also needs to ensure an efficient development environment and to train new linguists into effective linguistic “coders” with engineering sense who follow software development norms (schools today do not train knowledge engineers).  Unlike mainstream machine learning systems, which are by nature robust and scalable, a hand-crafted system’s robustness and scalability depend largely on the design and deep skills of the architect.  The architect defines the NLP platform, with specs for its core engine compiler and runner, plus the debugger in a friendly development environment.  He/she must also work with product managers to turn their requirements into operational specs for linguistic development, in a process we call semantic grounding of linguistic processing to applications.  The success of a large NLP system based on hand-crafted rules is never a simple accumulation of linguistic resources such as computational lexicons and grammars in a fixed formalism (e.g. CFG) with a fixed algorithm (e.g. chart parsing).  It calls for seasoned language engineering masters as architects of the system design.

Given the scene of NLP development practice described above, it should be clear that the negative sentiment attached to “hand-crafting” is unjustifiable and inappropriate.  The only remaining argument against coding rules by hand comes down to the hard work and costs of the hand-crafted approach, the so-called knowledge bottleneck of rule-based systems.  If things can be learned by a machine at no cost, why bother with costly linguistic labor?  It sounds like a reasonable argument until we examine it closely.  First, for this argument to stand, we need proof that machine learning indeed incurs no costs and has no or very little knowledge bottleneck.  Second, for it to withstand scrutiny, we should be convinced that machine learning can reach the same or better quality than the hand-crafted rule approach.  Unfortunately, neither necessarily holds true.  Let us examine them one by one.

As is known to all, any non-trivial NLP task is by nature based on linguistic knowledge, irrespective of the form in which that knowledge is learned or encoded.  Knowledge must be formalized in some form to support NLP, and machine learning is by no means immune to this knowledge resources requirement.  In rule-based systems, the knowledge is hand-coded directly by linguists; in (supervised) machine learning, the knowledge resources take the form of labeled data for the learning algorithm to learn from.  (There is also so-called unsupervised learning, which needs no labeled data and is supposed to learn from raw data, but it remains research-oriented and hardly practical for any non-trivial NLP task, so we leave it aside for now.)  Although the learning process itself is automatic, the feature design, the implementation of the learning algorithm, and the debugging and fine-tuning are all manual, in addition to the requirement of manually labeling a large training corpus in advance (unless an existing labeled corpus is available, which is rare; machine translation is a nice exception, as existing human translations serve as labeled aligned corpora for training).  Labeling data is very tedious manual work.  Note that the sparse data challenge means machine learning needs a very large labeled corpus.  So it is clear that the knowledge bottleneck takes different forms, but it applies equally to both approaches.  No machine can learn knowledge without cost, and it is incorrect to regard the knowledge bottleneck as a defect of rule-based systems alone.

One may argue that rules require skilled expert labor, while labeling data requires only high school kids or college students with minimal training.  For a fair comparison of the associated costs, we perhaps need to turn to Karl Marx, whose Das Kapital offers a formula for converting simple labor into complex labor for the exchange of equal value: for a given task at the same level of performance quality (assuming machine learning can reach the quality of professional expertise, which is not necessarily true), how much cheap labor is needed to label the required amount of training corpus before the approach becomes economically advantageous?  Something like that.  This varies from task to task and even from location to location (e.g. different minimum wage laws), of course.  But the key point is that the knowledge bottleneck challenges both approaches; it is not the case, as many believe, that machine learning produces a system automatically at little or no cost.  In fact, things are far more complicated than a simple yes or no, as costs must also be calculated in the larger context of how many tasks need to be handled and how much underlying knowledge can be shared as reusable resources.  We leave for a separate article the elaboration of this point: in the context of developing multiple NLP applications, the rule-based approach, which shares the core parsing engine, demonstrates significant savings in knowledge costs over machine learning.

Let us step back and, for argument’s sake, accept that coding rules is indeed more costly than machine learning.  So what?  As with other commodities, hand-crafted products may indeed cost more, but they also carry better quality and value than products of mass production; otherwise a commodity society would leave no room for craftsmen and their products to survive.  This is common sense, and it applies to NLP too.  If not for better quality, no investor would fund a team that machine learning could replace.  What is surprising is how many people, NLP experts included, believe that machine learning necessarily outperforms hand-crafted systems not only in costs saved but also in quality achieved.  There are low-level NLP tasks, such as speech processing and document classification, that are not the expert’s forte, since we humans have much more restricted memory than computers; but deep NLP involves far more linguistic expertise and design than the simple notion of learning from corpora would suggest, and superior data quality cannot be taken for granted.

In summary, the hand-crafted rule “defect” is largely a misconception circulating widely in NLP and reinforced by the mainstream, owing to incomplete induction or ignorance of the modern scene of rule development.  It rests on the incorrect assumption that machine learning necessarily handles all NLP tasks with the same or better quality and with less or no knowledge bottleneck, in comparison with systems based on hand-crafted rules.

 

 

Note: This is the author’s own translation, with adaptation, of part of our paper that originally appeared in Chinese in Communications of the China Computer Federation (CCCF), Issue 8, 2013.

 

[Related]

Domain portability myth in natural language processing

Pride and Prejudice of NLP Main Stream

K. Church: A Pendulum Swung Too Far, Linguistics issues in Language Technology, 2011; 6(5)

Wintner 2009. What Science Underlies Natural Language Engineering? Computational Linguistics, Volume 35, Number 4

Pros and Cons of Two Approaches: Machine Learning vs Grammar Engineering

Overview of Natural Language Processing

Dr. Wei Li’s English Blog on NLP

【Semantic Computing Group: Proceeding with Ambiguity and Vagueness, Like Living with a Chronic Condition】

As is well known, the greatest difference between natural language and computer languages as symbolic systems, and the greatest challenge, lies in ambiguity.  It comes in two kinds: structural ambiguity and polysemy (the corresponding disambiguation task is called WSD, word sense disambiguation).  Without these ubiquitous ambiguities, the automatic analysis of natural language would be as precise and error-free as the compilation of a computer language.  Hence the common view that the core task of natural language parsing and NLU (natural language understanding) is disambiguation, at least in theory.

Interestingly, although polysemy is extremely common in natural language and structural ambiguity is also frequent, humans communicate in language quite fluently, often without even sensing the ambiguity.  The problem only becomes prominent when we implement a parser on a computer.  The dialogue below with Prof. Song shows how computational linguists, in modeling structural analysis, constantly run into ambiguity.

Song:
“张三对李四的批评咬牙切齿” (Zhang San gnashes his teeth at Li Si's criticism): this is two-ways ambiguous.
“張三对李四的批评不置一词” (Zhang San says not a word about Li Si's criticism): here a third reading is possible.
“張三对李四的批评保持中立” (Zhang San stays neutral on Li Si's criticism): another two-way ambiguity.
“張三对李四的批评态度温和” (Zhang San's attitude toward Li Si's criticism is mild): this one has three readings.

Me:
Prof. Song, I am dizzy already.  It takes a computational linguist's sensitivity to see these; the vast majority of native speakers never perceive the structural ambiguities among these sentences, or how they differ.

[parse tree screenshot: t0708o]

In the current parse, the subject (S) of “保持中立” (stay neutral) is “批评” (the criticism).  That reading is not impossible (the criticism's stance staying neutral can indirectly refer to “张三”, who issued the criticism), but it is strained; most people's reading is that “张三” stays neutral: “张三” is not the subject of “批评” — “李四” is — and, what is more, the implicit object of “批评” refers back to “张三”.  The parse of the second sentence actually looks more reasonable: regarding this “批评” (Topic), the (implied someone's) “attitude” is “mild”, the someone being “张三”, and it is precisely “张三” who criticizes “李四”.

Song:
“张三对李四的批评” plus a predicate: as to who criticizes and who is criticized, there are three ways to fill the slots:
(1) the critic is Zhang San and the criticized is Li Si; (2) the critic is Li Si and the criticized is Zhang San; (3) the critic is Li Si and the criticized is some third party.
“置若罔闻” (turn a deaf ear) differs from “不置一词” (say not a word).  For the agent A of this verb, there must exist some comment; “置若罔闻” implies the comment is directed at A and is negative, while “不置一词” carries neither restriction.

Me:
Two logical predicates (the sentence-final predicate and the preceding “批评”) compete for the same PP (“对”), so computation will always run into scope entanglement.  Add one more “对” (or “对于”) and the ambiguity vanishes: “张三【‘对于’【‘对’李四的批评】 保持中立】.”  But two “对”s in a row sound awkward; few people talk that way.

Structural ambiguity is actually less fearsome than we imagine.  If the goal is semantic grounding, what needs adjusting is not to eliminate all ambiguity before grounding, but to think the other way around: make semantic grounding tolerate retained ambiguity, or dormant ambiguity, or any one valid path.  Human understanding and response do not proceed on an ambiguity-free premise either.  Modern medicine has the notion of living with one's disease; language understanding should likewise have a notion of grounding with ambiguity, tolerating a moderate amount of it as the normal state.
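The idea of grounding with retained ambiguity can be sketched as follows (a toy illustration with hypothetical structures, not an actual parser's output format): the parser keeps both attachments of “张三对李四的批评保持中立” instead of forcing a choice, and the grounding step succeeds as long as at least one surviving reading serves the task.

```python
# Two retained readings for "张三对李四的批评保持中立":
readings = [
    {"S": "张三", "Pred": "保持中立"},   # Zhang San stays neutral (preferred)
    {"S": "批评", "Pred": "保持中立"},   # the criticism stays neutral (strained)
]

def ground(readings, predicate):
    """Return the subjects any surviving reading assigns to the predicate.
    Downstream grounding can take any valid path instead of demanding
    that disambiguation be completed first."""
    return [r["S"] for r in readings if r["Pred"] == predicate]

subjects = ground(readings, "保持中立")   # both candidates survive
```

The ambiguity stays dormant until an application actually needs to resolve it, and many applications never do.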

That is structural ambiguity; for WSD it is even more so.  The vast majority of semantic grounding applications can tolerate or bypass WSD inaction (see 【NLP 迷思之四:词义消歧(WSD)是NLP应用的瓶颈】).  MT may be the grounding application most sensitive to WSD.  Even so, one need not solve WSD first in order to do MT well (there it is called "lexical transfer").  Between related language pairs there is ample room to keep ambiguity untouched, needless to say; even across unrelated families, e.g. in English-Chinese MT, practice shows that full-coverage WSD is unnecessary, and fine-grained WSD all the more so.  "Fine-grained" refers to the sense entries in dictionaries, or the synsets in WordNet, many of whose subtle distinctions between core and extended senses need not be drawn at all.

And what about the hidden logical-semantic relations — should they be dug out?  So far we have done part of this work in the post-syntactic semantic middleware, but have never pushed to complete it, even though, with the syntactic tree already providing good conditions, the work is not especially difficult.

Today's reflection concludes: many hidden links need not be recovered at all.  If a hidden link is itself vague or ambiguous, all the more reason to leave it alone.  Natural language carries a fair degree of vagueness; language itself is not built to nail down every detail, and human communication does not need it to.  If a detail is important enough yet is hidden, elided, or vague in the expression, the speakers will spell it out in a following sentence with clear, unambiguous syntax.

Practice in semantic grounding also shows that most hidden links are unnecessary.  The underlying reason: incompleteness is the normal state of information flow, and it plays the important role of lightening the memory burden and highlighting the informational core.

In theory, every mentioned predicate has its own arg structure, with potential slots awaiting information fillers.  But the syntax of a language distinguishes the statuses of predicates, deciding whether the fillers are expressed overtly or suppressed.  Typically, the suppressed or elided fillers are either unimportant or indeterminate — details neither party to the communication cares much about.  For instance, once a verb is nominalized, its args are usually suppressed (English gerunds; the Chinese NP construction with “的”).  This natural suppression already signals that the detail is not the focus; why strain to chase it?

Of course, the above is a principle, and every principle has exceptions: some elided details, left unresolved, block semantic grounding for a given product.  One such "exception" that comes to mind: many hidden links, though pragmatically unimportant in themselves, can at least in an MT product supply structural conditions that help select a better target word.  E.g. in "this mistake is easy to make": without recovering the hidden V-O link between "make" and "mistake", it is hard to settle on the proper Chinese rendering of "make" as “犯(错误)”.
Since what is hidden or elided is mostly unimportant, NLU is usually fine not decoding it.  An extreme example illustrates the point:

Giving to the poor is a virtue
Giving is a virtue

"give" is a 3-arg predicate: who gives what to whom.  Yet under syntactic nominalization, the first sentence overtly keeps only one filler ("to the poor"), and the second keeps none.
Should we fill the remaining slots from context or by defaults?
No.
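This "leave unfilled slots unfilled" stance can be sketched as follows (a toy illustration; the frame representation is hypothetical): the arg structure of "give" keeps its three slots, and a nominalized use simply leaves the unexpressed ones empty rather than forcing a guess.

```python
def arg_frame(pred, **filled):
    """Build a predicate frame for 'give'-like 3-arg predicates;
    unexpressed slots stay None instead of being filled by defaults."""
    slots = {"agent": None, "theme": None, "recipient": None}
    slots.update(filled)
    return {"pred": pred, **slots}

f1 = arg_frame("give", recipient="the poor")  # "Giving to the poor is a virtue"
f2 = arg_frame("give")                        # "Giving is a virtue"
```

A downstream consumer checks only the slots it needs; the None slots cost nothing and mislead no one.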

Bai:
When a predicate is "demoted" from a statement use to a referring use, the right attitude toward its slots is eight characters: “来者不拒、过时不候” (take whatever comes; don't wait past the deadline).  E.g. “这本书,出版比不出版好” (This book: publishing it beats not publishing it).
We have no need to care who publishes, but since “这本书” has been fronted, filling the slot is merely a trivial effort.

Me:
Fully agreed.  That is, for slots with no filler nearby, we should feel neither guilt nor unease: who cares.

 

[Related]

NLP 迷思之四:词义消歧(WSD)是NLP应用的瓶颈

【置顶:立委NLP博文一览】

《朝华午拾》总目录

 

【Parsing of the Day: NLP Applications Can Be Tolerant of Parsing】

Bai: “西方人类比用得少,是因为西方的逻辑学产生的早。” (Westerners use analogy less because logic arose early in the West.)
[parse tree screenshot: t0614a]
What is dumb luck?  My definition:
You meet a fault-finding customer, see his trap-laden "natural language" sentence, feel a bit unsure, test your own system — and it passes on the first try.
Today is a good day: dumb luck struck and no debugging is needed, because this example has no bug.
Of course, even a failure needing debugging would be no big deal; no system is a one-shot sale.  What matters is that such bugs fall inside the framework you designed, with a smooth, targeted path to a fix, rather than endless system upheaval for one bug.  Strictly speaking, a blemish can still be found: the ideal parse would hedge on “西方人”, labeling it Topic rather than S, but that Topic hedge is not necessarily better than the current parsing — six of one, half a dozen of the other.  The current parse treats “西方人类比” as a subject clause: S marks the subject, Subj the subject clause.
For syntactic paths that are six of one and half a dozen of the other, how do we judge right from wrong?
A tolerant system accepts both, because the difference between them has become so subtle that even humans are often unsure.  By a tolerant system I mean one in which, at the pragmatic level where a product grounds the semantics, the two different paths the parser gives for such phenomena do not affect the grounding.  For a system that integrates syntax with pragmatics, this is not a problem; the integration makes such robustness easy to achieve.  For the common Chinese pattern NP1+NP2+Pred, most of the following analyses can be tolerated:
(1) Topic + S + Pred
(2) [S + Pred] + Pred
when the second element can be Pred (V, A, or deverbal N)
(3) [Mod + S] + Pred
What is tolerated is predictable; being predictable, it can be handled, hence robustness.
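The tolerance idea above can be sketched as follows (a toy illustration with hypothetical labels, not the actual engine): all the predictable labelings of NP1+NP2+Pred normalize to the same grounded result, so the downstream application is insensitive to which path the parser happened to take.

```python
def ground(analysis):
    """Whatever the syntactic labels (Topic+S+Pred, [S+Pred]+Pred,
    [Mod+S]+Pred), recover one canonical (NP1, NP2, Pred) tuple."""
    return (analysis["NP1"], analysis["NP2"], analysis["Pred"])

# Two competing parses of "西方人类比用得少":
a1 = {"type": "Topic+S+Pred",  "NP1": "西方人", "NP2": "类比", "Pred": "用得少"}
a2 = {"type": "[S+Pred]+Pred", "NP1": "西方人", "NP2": "类比", "Pred": "用得少"}

assert ground(a1) == ground(a2)   # grounding is path-insensitive
```

Because the grounding layer abstracts away the label difference, neither parse counts as a bug at the application level.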
A quick ad by the way: with Gao Bo's kind help, the Liwei NLP (liweinlp) channel reopens for business:
liweinlp.com

[Related]

关于 parsing

【关于中文NLP】

《朝华午拾》总目录

【关于NLP应用】

立委科普:问答系统的前生今世

《新智元笔记:知识图谱和问答系统:开题(1)》

《新智元笔记:知识图谱和问答系统:how-question QA(2)》

【立委科普:NLP应用的平台之叹】

【Bots 的愿景】

【泥沙龙笔记:NLP 市场落地,主餐还是副食?】

《泥沙龙笔记:怎样满足用户的信息需求》

《新智元笔记:微软小冰,人工智能聊天伙伴(1)》

《新智元笔记:微软小冰,可能的商业模式(2)》

《新智元笔记:微软小冰,两分钟定律(3)》

新智元笔记:微软小冰,QA 和AI,历史与展望(4)

泥沙龙笔记:把酒话桑麻,聊聊 NLP 工业研发的掌故

泥沙龙笔记:创新,失败,再创新,再失败,直至看上去没失败

泥沙龙笔记:parsing 是引擎的核武器,再论NLP与搜索

【立委科普:从产业角度说说NLP这个行当】

社会媒体(围脖啦)火了,信息泛滥成灾,技术跟上了么?

2011 信息产业的两大关键词:社交媒体和云计算

再说苹果爱疯的贴身小蜜 死日(Siri)

从新版iPhone发布,看苹果和微软技术转化能力的天壤之别

非常折服苹果的技术转化能力,但就自然语言技术本身来说 …

科研笔记:big data NLP, how big is big?

与机器人对话

【立委科普:机器翻译】

立委硕士论文【附录一:EChA 试验结果】

《机器翻译词义辨识对策》

【立委随笔:机器翻译万岁】

【河东河西,谁敢说SMT最终一定打得过规则MT?】

 

【The Vision for Bots】

So-called bots are really just a client-side entry point — the human-machine interface of the world of the near future.

In the long run, a bot is not merely a trigger device connecting speakers or other apps, nor will it settle for being a chat toy.  Add a knowledge graph and it becomes knowledge QA.  IBM Watson's question answering surpassing humans, hailed as an AI milestone, rests on no more than this principle.  Watson is more an engineering achievement than a research breakthrough — the fruit of big data, big architecture, and big computation.  As a system it does not go beyond the basic principles and algorithms of the QA systems we built back then.  In the first TREC-8 QA evaluation, the QA system I built at Cymfony won first place with 66 points, more than 20 points ahead of the IBM system (Watson's predecessor).  IBM's later success came because it had the strength to carry the work through, whereas after the NASDAQ collapse of 2001 the whole industry abandoned QA application development — investors withdrew or froze any attempt in that direction.  We ourselves pivoted to enterprise intelligence mining.

Broadly construed, graphs include parse trees and can cope with the unforeseeable long tail of semantic search.  Narrowly construed, a graph means knowledge mining predefined for a domain and application, able to answer foreseeable questions precisely.  Thanks to the informational redundancy of big data, even imperfect NLP technology shines in QA applications and beats humans.  IBM's underlying NLP and IE kernel is, as far as I know, not first-rate, but that did not stop it from stunning the world under an engineering operation of big data, big computation, big storage, and big architecture.

Graphs are dynamic — a point with several application-facing angles:

First, the knowledge sources of a graph are dynamic, so the graph needs regular, continual refreshing.
In our social media mining, the sentiment graph is refreshed about once a quarter, faster when specially needed.  On our parallel cloud architecture, each refresh takes about three weeks on nearly 200 servers.

Second, the relations and events inside the graph are dynamically linked.
There are countless possibilities for further combination, and potential for mining implicit relations or trends — potential that needs a trigger mechanism to mobilize it, according to the application's needs and interface.

A few scenarios for using dynamic graphs:

1. Semantic search, including SVO search

This is a direct extension of keyword search.  It keeps keyword search's capacity for the long tail, serving questions and information needs that cannot be foreseen, while sharply raising the precision of search by leveraging the structure of the (broad-sense) graph or parse tree.
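The SVO search idea can be sketched as follows (a toy illustration over hypothetical triples, not the actual engine's index): a structured query constrains any combination of subject, verb, and object, with unspecified slots acting as wildcards.

```python
# Hypothetical SVO triples extracted by a parser from a corpus.
triples = [
    ("Apple", "acquire", "Beats"),
    ("Google", "release", "TensorFlow"),
]

def svo_search(subj=None, verb=None, obj=None):
    """Return triples matching every specified slot (None = wildcard)."""
    return [t for t in triples
            if (subj is None or t[0] == subj)
            and (verb is None or t[1] == verb)
            and (obj is None or t[2] == obj)]

hits = svo_search(verb="acquire")   # "who acquired what?"
```

In a production setting, keyword search would serve as the backoff when a structured query finds nothing, preserving long-tail coverage.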

2. Question answering

For questions that can be foreseen, or a domain's FAQ, this is the knowledge graph's forte.  The targets of graph extraction and mining are set by the questions to be answered — focused and well-prepared; how could it fail?

3. Intelligent browsing

Also the graph's forte, since a graph is itself a huge network of interconnected entities, relations, and events.  Given any starting point, follow-the-vine dynamic browsing can be designed at will, letting information shift in real time with the user's focus — serving the information needs of people with no specific goal, or only a vague one; an especially powerful tool for researchers.

 

[Related]

【立委科普:实体关系到知识图谱,从“同学”谈起】

【泥沙龙笔记:知识图谱是烧钱但靠谱的战略项目】

《朝华午拾:信息抽取笔记》

泥沙龙笔记:搜索和知识图谱的话题

置顶:立委NLP博文一览(定期更新版)】

《朝华午拾》总目录

立委NLP频道

【Semantic Computing Salon: With Deep Parsing Done, Can Extraction Be Far Behind?】

mei:
@wei I have suggested this before: could your parser be offered as an API?  NLP as a service, like diffbot.com — good revenue, single founder, large impact, recently raised a $10 million Series A from Tencent.
Also, how many languages can you do?
Diffbot's NLP/IE is better than FB's.  Small companies out-building big ones is common, which is why big companies fall back on acquisition.
Me:
It already is an API; internally everything is an API call.  NLP offered as a service or component technology has rarely held up as a business, but for an individual founder a short-term "success" is possible — times are different now.
I have led my team through 18 languages, covering all the major European and Asian ones.  I personally built English, Chinese, and Esperanto, and directly supervised French, Russian, and Turkish.
Me:
You mean FB's DeepText?
mei:
FB's several NLP efforts have all been mediocre.
Times are indeed different now.
Diffbot is very successful.  Every big company wants to buy them; they won't sell.
Me:
A couple of years ago I chatted with people inside FB; they were only beginning to think about NL, had little concept of it, and were doing shallow work.
What is Diffbot's background?
Ma:
I wouldn't count them as an NLP company; they mainly do crawling and main-text extraction.
Me:
Extraction can skip parsing, or use only a bit of shallow parsing, but extraction does sit under the big NLP umbrella.  Then again, once you have deep parsing, extraction is child's play.
Ma:
Their extraction is not IE; it pulls the text out of the HTML without any analysis of the text's structure.
Me:
That was the WhizBang! approach back in the day.
They did it well; when the bubble burst the investors pulled out, and sacks of source code went for a song at auction.  Inxight bought it but never digested it, and was itself later sold off cheap.
Ma:
Maybe history will cycle; maybe "This Time Is Different".
mei:
Diffbot does IE without parsing.
Crawling and pulling the text are just the first step.
Bai:
Even with the deep layer done, extraction still faces many problems.  A few hard cases: 1. Multiple negation: 我不是没注意到这件事他没生气 (it's not that I failed to notice that he wasn't angry about this).  2. Multiple modality: 我知道他相信你不否认某某的领导能力 (I know he believes you don't deny so-and-so's leadership ability).  3. Higher-order statements: 对油价快速上涨的预期减弱 (expectations of a rapid oil-price rise are weakening).
Reducing all of these to sentiment triples ⟨object, attribute, polarity⟩ looks problematic.
You may let a case pass, provided you recognize it correctly; if you fail to let it pass and then misidentify it, you are in real trouble.
mei:
IE and parsing are only parts of NLU.  Real story understanding still lies ahead.  In grad school we studied story understanding and knowledge representation; today no one achieves it.  NLU/AI has a long road ahead!
Me:
I have seen everything Prof. Bai mentions.  We have done sentiment for four or five years, extraction and mining for 18; whatever can be imagined, we have encountered.  It is just that "no nesting beyond three levels" is a basic principle, and sometimes we deliberately choose not to handle a case — not that we cannot, but that we need not.
Prof. Bai's triple representation is even less of a problem, because representation is a game you play with yourself: once something is recognized, you can surely find a way to represent it; there is always a workaround.
Prof. Bai allows that part of the long tail may be skipped, but warns that skipping merely loses the tail.  For big data, losing it is not the problem; the problem is not losing it and then grabbing it with the wrong polarity.  That is indeed a challenge for inexperienced developers; we got past it long ago.
Bai:
You can play with yourself because homogeneous stuff put together has computational advantages; when something heterogeneous comes along, it gets left out of the common pot.  Wei has the skill to cook it a side dish; others may not.
Me:
Because choosing not to handle something and recognizing it accurately are not on the same order of difficulty.  The problems Prof. Bai lists challenge learning systems far more than rule systems.
Bai:
Missed recognition is tolerable; the key is no false recognition.
Me:
For learning it is not even a matter of choosing to handle or not: it basically never reaches that choice.  Lacking structure, it can only trust to luck.  A structure nested a few layers deep is essentially noise to a structure-less system; when labeling, manually excluding such cases actually helps the learner more.  Do not expect to solve them — the best one can do is keep them from confusing the system.
We never did deep parsing for its own sake: from day one it was seamlessly connected to extraction and mining, built to serve the pragmatic layer — unlike SyntaxNet, which is still miles away from applications.

The Secret of the NLP Nuclear Weapon

I keep saying that deep parsing is the nuclear weapon of NLP applications.  Some think that an exaggeration; today let me explain the logic.

An NLP application has two main parts: the analysis or "understanding" of the text input, and the output reflecting that analysis (commonly called semantic grounding: if the output is another language, we have MT; if a response, a dialogue system; if an answer to the input question, a QA system; and so on).  An NLP application is the system connecting input to output.  The first part is the key, and its core is parsing, which can be implemented as pattern matching on the condition side; the second part is often no more than the corresponding mapping or side effects on the conclusion side.

In this abstract picture, parsing's role in text processing can be seen as follows.  The input samples are the dev corpus for our parsing, and sentences of the same or similar meaning are the objects we must recognize.  In the vast majority of cases, to recognize is to "understand": the system then knows the most fitting response.

The difficulty of natural language is that the sentences expressing these inputs vary without end, so enumerating them with ngrams is unrealistic.  The realistic way is to parse input sentences of the same or similar meaning into structural trees, then find common patterns over those trees — call it the "greatest common tree" (intuitively "least" would sound better, since it is the common core of the trees; if the tree grows too big, recall is lost; Prof. Bai suggests "maximal common subtree").  If none can be found, divide and conquer until a few such subtree patterns emerge, written up as disjunctive pattern rules.

Tune the patterns' looseness just right, and a finite set of rules copes with infinite expression.  Looseness amounts to adjusting the conditions on structural arcs or on nodes; deep parsing, at bottom, is the machine that creates these structural conditions.

Meeting endless change with the unchanging — capturing infinite linguistic variation with finite patterns — is how the natural language nuclear weapon shows its power.
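The "subtree pattern" idea above can be sketched as follows (a toy illustration with hypothetical tree structures, not the actual formalism): a parse tree is matched against a pattern tree whose nodes may abstract over labels, so one pattern covers many surface variants.

```python
def subtree_match(tree, pattern):
    """tree, pattern: (label, children) tuples.  '_' in the pattern
    matches any label; pattern children must match a prefix of the
    tree's children.  One abstract pattern covers many concrete trees."""
    label, children = tree
    p_label, p_children = pattern
    if p_label != "_" and p_label != label:
        return False
    if len(p_children) > len(children):
        return False
    return all(subtree_match(c, p) for c, p in zip(children, p_children))

# Two surface variants sharing one structural core:
t1 = ("S", [("NP", [("det", []), ("noun", [])]), ("VP", [("verb", [])])])
t2 = ("S", [("NP", [("noun", [])]), ("VP", [("verb", []), ("NP", [])])])
core = ("S", [("NP", []), ("VP", [])])   # the common subtree pattern
```

Loosening a pattern here means dropping node conditions or children; tightening means adding them — exactly the arc/node condition tuning described above.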

[Related]

泥沙龙笔记:parsing 是引擎的核武器,再论NLP与搜索

泥沙龙笔记:从 sparse data 再论parsing乃是NLP应用的核武器

【置顶:立委科学网博客NLP博文一览(定期更新版)】

《朝华午拾》总目录

【The River Runs East, the River Runs West: Who Dares Say SMT Will Surely Beat Rule-Based MT?】

Xi:
@wei, comment on Prof. Li Ming's machine translation.  I am puzzled: why are so many people crossing over these days to compete for your rice bowl?

Me:
Comment on what?  I have gone completely numb about MT.
These days my heart is with knowledge graphs and my eyes on the wide world.  I turned the MT page long ago.
That said, for a student of natural language, entering the field through rule-based machine translation is a blessing from heaven.  The new generation has no such people, which is why many who have been in the field for years still see the language world as a strip of sky from the bottom of a well.
If you are forced to do machine translation with no platform support, you are blessed.  You must build from scratch the lexicon, tokenization, POS, phrases, and SVO syntax; you must also do bilingual structure transfer and WSD (word sense disambiguation); and finally target-language generation, including morphology generation, reordering, and some final rhetorical touches.
In short, you must cover every aspect.  Doing it with no platform and no special-purpose language — in a general-purpose language (COBOL, ALGOL, BASIC, even assembly), the way we did our master's theses — is refinement in the Taoist eight-trigram furnace: you cannot but emerge with fire-gold eyes, and thereafter you see through any aspect or application of NLP.
Today's CL masters and PhDs just download a toolkit and zero in on one subtask — segmentation, say, or sentiment, or WSD; even doing MT, they need not touch so many levels and modules.
My old piece 【立委科普:机器翻译】 has not entirely expired; see also 【立委随笔:机器翻译万岁】.
SMT need not touch so many levels because SMT to date has basically circled in the shallows and never reached depth; in depth and structure it falls far short of the rule-based MT we did 30 years ago.
Ma:
But it is more practical than the rule systems.
Me:
The river runs east, the river runs west.
By now Prof. Dong's system and others have been polished for years, and it is hard to say which is more practical.  In precision the latter is absolutely stronger — ahead by a mile.
SMT's pioneer should be IBM, starting from the English-French bilingual corpus of the Canadian parliament.

Guo:
Where does translation memory fit in?

Me:
I have an anecdote about that very concept, recorded before — by now practically MT folklore; see 《朝华午拾:欧洲之行》.  Victor called it a translation unit (TU).  At their so-called Chinese Week, Prof. Dong went, and Prof. Liu and I went too.  After sister Fu Aiping sent someone to show us around the red-light district, she did not accompany us to the event itself.  The event owed its existence largely to the "Chinese dependency grammar" work I had done for them.
QUOTE: Another core member of the research group was the financial director of the Universal Esperanto Association, the well-known British Esperantist Dr. Victor Sadler, whom I had met at the 71st World Esperanto Congress.  As a senior researcher, he had just completed a study using the statistics of a parsed, bilingually aligned corpus (BKB, or Bilingual Knowledge Base) to match translation units of varying sizes for automatic translation — original research 5-10 years ahead of the similar work that later became popular.  Naturally, everyone regarded this new advance highly and presented it to us as a highlight.  The central theme of the whole visit remained answering their thorny questions about Chinese syntax.  They were then negotiating with potential investors in Europe and Japan for the next stage of large-scale commercial development; Chinese, as a major language of a different family, made its feasibility study important for the search for investment.
To carry nostalgia through to the end, 《朝华午拾:一夜成为万元户》 tells the story of the Chinese Dependency Grammar I built for this DLT project.  The original version of this formal grammar of Chinese can be downloaded: Li, W. 1989. "A Dependency Syntax of Contemporary Chinese", BSO/DLT Research Report, the Netherlands.  It should be the earliest and most complete Chinese work on dependency grammar done in China.  The so-called 【美梦成真】 (dream come true) is exactly this: across nearly 30 years, a paper-only syntax finally became a real deep parser.
Just now, over dinner, I mulled over this MT side story and found a few points worth summarizing.  Notes below; corrections welcome.
(1) This Dutch multilingual MT project started as a rule system, with Esperanto as the interlingua, in a dependency grammar framework implemented with ATNs (Augmented Transition Networks); the technical lead was the German linguist Schubert.
(2) Along the way, however, the Cambridge-trained Dr. Victor conceived the statistical route: on top of syntactic analysis he defined a Translation Unit of flexible size, decided by statistics and memory (somewhat like our notion of the "sentence element", 句素), ran experiments validating this innovative route, and flipped the whole project around in its closing phase.  At that time (1989), other MT research included the statistical MT begun at IBM and elsewhere, but none of it reached this depth.
(3) In fact, to this day, looking back at that innovation: deriving Translation Units statistically from the parallel comparison of parsed bilingual data still stands a cut above most of the later SMT, which lacked structure and was essentially ngram memorization.
(4) Not everyone doing SMT has the means to bring in parsing; DLT happened to have spent four or five years building its parser first, and had that foundation.  The direction now and ahead, seen macroscopically, is for SMT to revisit BKB-style parsed bilingual parallel corpora and take the road of incorporating structure — its only hope of overcoming the now obvious structural bottleneck, e.g. the errors in translating relative clauses.

mei:
Linguists doing MT emphasize linguistic structure, deep or shallow.  I come from AI and emphasize "knowledge"; the two inter-translate, with different emphases.
Guo:
Whenever statistics versus rules comes up, one cannot help recalling Kuhn's Structure of Scientific Revolutions.  Fundamentally, statistics and rules define what NLP is in entirely different ways.  From the statistical standpoint, deciphering ancient Egyptian, identifying authors and forgeries, detecting and correcting misspellings, classifying texts by readability — these and many more like them are success stories with a long history; long, because they predate Chomsky by many years.  But from the rules standpoint, none of these probably count as NLP at all.

Me:
Rules need not be syntactic rules; any patterns, ngrams included, can be rules.  The learning school uses the distributional statistics of ngrams; the rule school can hardly quantify those ngram statistics, so it redefines the "gram" as a dynamic unit lifted from the linear sequence to a syntactic unit, compensating for the statistical deficit with structural depth.

Guo:
Actually for MT, the statistical school looks at the problem more from machine-aided translation, even machine-aided reading.  Whatever the big names boast, the statistical school has never aimed at understanding or imitating humans; it is thoroughly engineering-minded and pragmatic.

Me:
Once the gram is defined as the "sentence element" (句素) my adviser Prof. Liu Zhuo expounded, two leaps follow:
First, distance goes from linear to structural, so that even long-distance phenomena can be caught by this kind of "ngram" — I have shown many such examples before.  Second, the gram itself is lifted from a literal to a linguistic unit summing features at different levels of abstraction; ontology can be brought in as well.  These two leaps make the rules for natural language's tangled complexity practically workable.
In the mature mainstream SMT systems we have seen so far — e.g. the heavily funded Baidu and Google MT — the drawback of lacking structure and parsing support is glaring, with structural bottlenecks everywhere.  Look conversely at the traditional rules-plus-knowledge system Prof. Dong has shown in this group, and the structural advantage speaks for itself.
Perhaps in scaling up, and in coping with the motley idiomatic ngrams, a system like Prof. Dong's cannot yet match Baidu or Google SMT.  But if such a system were made the core and given equal resources and maintenance, I doubt Baidu's system could beat rule-based MT.  Of course, the best course is some combination of the two, each covering the other's weaknesses.  My point is: in a head-on fight on equal investment, who dares thump his chest and declare that mainstream SMT will surely beat rule-based MT?
What we have now is an unequal comparison, not an apple-to-apple contest at all.  History pushed rule-based MT off the mainstream stage, but however proud the SMT camp may be, it should still see its own short board and rule MT's bright spots.
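The "two leaps" of the generalized gram — from linear distance to structural distance, and from literal word to feature bundle — can be sketched as follows (a toy illustration with a hypothetical lexicon, not the actual formalism):

```python
# Hypothetical lexicon mapping words to feature bundles.
lexicon = {
    "chair": {"pos": "noun", "sem": "furniture"},
    "sofa":  {"pos": "noun", "sem": "furniture"},
    "buy":   {"pos": "verb"},
}

def gram_match(word, gram):
    """A gram may constrain the literal string, the POS, or any
    more abstract feature; one abstract gram replaces an open-ended
    list of literal ngram entries."""
    feats = {"word": word, **lexicon.get(word, {})}
    return all(feats.get(k) == v for k, v in gram.items())

assert gram_match("chair", {"sem": "furniture"})   # abstract gram
assert gram_match("sofa", {"sem": "furniture"})    # same gram, new word
assert gram_match("buy", {"word": "buy"})          # literal gram still works
```

Pairing such feature-bundle grams with structural (rather than linear) adjacency is what lets a finite rule set generalize where literal ngram memorization cannot.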

Guo:
Many in the statistical school have in fact tried to bring in structure, but few managed to reduce perplexity effectively.  The core dispute is where the problem really lies.  One view: structure does not carry much additional information.  Another: we simply have not found a better, more effective mathematical model.  That is why some people act as if on adrenaline over deep neural nets.

Me:
Evidence overlapping after heterogeneous features are brought in, perplexity, and the like are research topics; but saying that structure carries little additional information is equivalent to saying the linear ngram model needs no change — a model pushed to its limit over twenty-odd years, with little juice left.  As Prof. Bai puts it, if the model is wrong — if the framework leaves no room for what language actually looks like — then however big the data and however deep the learning, a bottleneck is inevitable.
True, in some coarse-grained tasks such as document classification, the bag-of-words ngram model already satisfies the application's needs; precision is high enough, and adding structure would improve little: that is not what we need to discuss.  Our concern is the tasks plainly stuck at a bottleneck — including MT, IE, and sentiment analysis — tasks the statistical route clearly cannot push deeper without structural help.
So far, even the attempts that do bring in structure may well have been superficial; it is too early for conclusions.
Deep neural nets are a training algorithm, with no necessary connection to the structural depth of language.  In fact, to date, among attempts at deep neural nets for text NLP, apart from research devoted to the parsing middleware itself, such as SyntaxNet, work on NLP application tasks has basically stayed at the shallow linguistic level.  How many groups are actually doing structure-infused deep neural nets for text NLP?  If none, or none with results, then the so-called Deep Text is misleading, intentionally or not (see 【遭遇脸书的 Deep Text】).

Yang:
My understanding: deep learning may bring change mainly in the domain of semantic understanding.

Me:
For example?
Which semantic-understanding tasks are deep neural nets good at and grammar engineering weak at?
For fine-grained, analytical tasks, I cannot think of one that deep learning can do and grammar engineering cannot, however the semantics are grounded.

Yang:
Say, text-to-image mapping for search?  I don't really know; just a wild guess.  Of course, that area is far from mature — only a conjecture.

Me:
That one I had genuinely not thought of, because one end is text (captions?) while the other is images.  For learning — neural or not, deep or shallow — as long as there is plenty of data (image sets with captions), it is a natural learning task.  For rules, this kind of formalized grounding (mapping onto images) offers no obvious natural interface for handling the image side and integrating it into a rule system.

Yang:
Still, the image side is immature; doing this is a long way off.

Me:
Good.  A short board is nothing to fear, so long as we know where it is.  The "classic" short boards of rules have long been known:
【手工规则系统的软肋在文章分类】.
QUOTE: The human brain (rules) may not track so many subtle pieces of evidence and their weights, but it can often seize a few main threads for the situation at hand and steer among them to solve the problem.  In deep parsing, the key/core area of NLP, the advantage of rule systems is all the more pronounced.
Then there is search: the robustness of keyword retrieval and its ability to handle long-tail queries are hard for rule systems to match.
But with keyword search as the backoff, structure-enhanced precise intelligent search (what we call SVO search) follows naturally.

 

[Related]

立委科普:机器翻译
立委随笔:机器翻译万岁

朝华午拾:欧洲之行
朝华午拾:一夜成为万元户
美梦成真
手工规则系统的软肋在文章分类
遭遇脸书的 Deep Text

Li, W. 1989. “A Dependency Syntax of Contemporary Chinese”, BSO/DLT Research Report, the Netherlands.

【置顶:立委科学网博客NLP博文一览(定期更新版)】

《朝华午拾》总目录