AIGC“尖峰系列”丨李维博士:ChatGPT海啸狂飙,谁将被取代?谁将借力跃升?

2023/03/10

在上一篇文章

《AIGC“尖峰系列”丨李维博士:人类语言“通天塔”建成,ChatGPT的辉煌与挑战》

中,我们分享了李维博士关于ChatGPT对于人类和机器交互的意义,以及其背后LLM(Large Language Model)语言大模型是如何炼成的等话题。

本篇文章我们将分享李维博士对于“ChatGPT让AI生态重新洗牌”“语言大模型(LLM)的生态和应用”等话题的独到观点,以下为演讲内容精选。

01

经历ChatGPT海啸之后

AI生态面临洗礼和洗牌

ChatGPT海啸引发的直接影响就是,NLP生态面临全面洗礼或洗牌,每一个现存的NLP产品服务或赛道都要在LLM参照系下重新审视。

一开始我们内部AI老友热议ChatGPT的时候,大家首先思考的是,ChatGPT怎么和搜索技术结合起来,它能颠覆搜索吗?

搜索是可追踪的,返回的每条结果都有记录,谈不上信息融合。ChatGPT是不可追踪的,长于信息融合:ChatGPT本质上不存在抄袭的可能性,它给你吐出来的每句话,都是自己消化之后的语言。可见,传统搜索与ChatGPT是两个完全不同的处理方式,各有优缺点。

搜索是信息服务之王,无处不在,有自己的巨头(谷歌,还有中国的百度)和非常稳定的商业模式。自从Web1.0时代搜索崛起以来,搜索的形态和模式基本没有改变,二十多年了。其实,多年来一直不断有新技术和创业者企图挑战搜索,风险投资界也一直关注可能成为“next Google”的潜在的搜索颠覆者,但搜索的地位一直固若金汤。但这一次有所不同,微软携ChatGPT的独家代码授权,大刀阔斧高调发布所谓“new Bing”。躺着赚钱的谷歌不得不紧急动员,正面迎战。一场搜索+LLM的大戏正在上演,像个活话剧,告诉我们虽然融合两项技术还有很多困难需要克服,但大势所趋,重塑搜索新生态势在必行。

除了搜索,那些被打磨得很圆的定向信息产品和服务,现在全部面临被重新审视和洗礼的宿命,包括聊天、功能对话、文法纠错、机器翻译、文摘、知识问答等等,这些方面的代表作品(Siri、小冰、Grammarly等)以前曾有的技术护栏,一下子被降低了,真好像大水冲了龙王庙。

NLP龙王庙中,不少产品由于多年的打磨以及用户的惰性,还不至于面临灭顶之灾,有的还可能存续很长时间,但毕竟都在走下坡路。这是通用AI对于传统AI的划时代胜利,是我们以前不敢相信的:曾经那么地怀疑通用路线,就等着看鼓吹AGI的人的笑话,谁想到人家不笑则已,一笑倾城,甚至“倾国倾球”,所向披靡。

看看13年前苹果就发布的Siri。13年比深度学习革命黄金十年的历史还要长,但搞到现在Siri才刚刚推出两轮或者三轮的对话能力。现在来了个ChatGPT降维打击,苹果怎么办?肯定只有拥抱LLM。亚马逊的风云产品Alexa也是一样,也打磨了好几年了,积累了那么多的用户数据。虽然它在边边角磨得很圆,不可能马上被取代,但依然会面临技术上的调整。

再者是大家常见的电商客服,众所周知,无论是阿里、还是京东的在线售后客服都打磨得很圆了。因为售后服务的问题相对比较集中,问题集不大,在积累了足够数据以后,用户体验就慢慢好起来了。但客服不仅仅局限于售后的问题答复,当客户提出的问题超越了预期的问题集的时候,目前客服常常显得“人工智障”,无论理解和回应都捉襟见肘。面对ChatGPT的问答超能力和多轮对话的丝滑性,怎么办?除了拥抱它,没有别的出路。

在ChatGPT之前,小冰算是把多轮聊天推到了极致,据报道有人痴迷于与她聊天,聊一个晚上还意犹未尽。它打造具有人格化的形象,可以与人做情感上的交流。在前ChatGPT时代,小冰是聊天的绝对天花板,多轮交互的能力把对手远远抛在后面。谁料想半路杀出个程咬金,ChatGPT出来后,小冰的地位就显得非常尴尬。ChatGPT可不是为了闲聊设计的,chat只是它为了达到多任务的一个桥梁,本质上是人机接口,聊天只是它的副产品,即便如此,通用大模型还是实现了降维横扫定向产品的效果。在ChatGPT的丝滑度和通用性面前,一个人格化的聊天机器人跟它不在一个层次上。除了去拥抱它,仍别无他法。

在国外,作文拼写、语法纠错这一块,用户体验做得最好、唯一活下来并站稳市场的只有Grammarly,已经有上亿用户了。现在它的地位也极为尴尬,因为同样是辅助写作,ChatGPT也是拿手好戏。长远一点来看,Grammarly终将面临同样的选择:要么去拥抱ChatGPT,要么就走向末路。

谷歌MT是机器翻译领域的代表,国内的有道、搜狗和百度也是用的神经机器翻译,但同属神经路线的ChatGPT出来以后,仍然是一种降维打击。用ChatGPT去做机器翻译,译文更加地道而且多样化。生成大模型的随机本性使得每次翻译出来的结果都有所不同,你可以拿同一个文本不断地试它,然后挑其中一个你最满意的。专项机器翻译系统显然面临如何拥抱LLM的问题。

最后谈谈教育。ChatGPT大模型降维碾压所有的教育产品,是很显然的。在教育赛道,搞生态产品应用的人,都需要在大模型的框架下重新审视一遍怎样拥抱这个LLM新时代。教育本身是跟语言打交道的,无论文理。虽然现在的大模型理工科能力不怎么强,但这个知识短板应该很快就会得到不同程度的弥补。ChatGPT必然对教育带来颠覆,同时也为教育现代化提供了一个最大的机遇。语言学习与电脑编程教育就不用说了,ChatGPT本身就是一个语言大模型。虽然目前它的编程还不到专业工程师的水平,但是常用的代码形式已经学得很好了,至少它能辅助你的编程,实际上,GPT赋能的Co-pilot已经成为越来越多码农的辅助工具了。

往后退一步,我们同时也面临着一个巨大的风险,比如说假新闻。如果你希望吹捧一家公司,你可以让ChatGPT生成出五花八门的软文来,讲得头头是道。那些大众点评将来也会被真假莫辨的评论所掩盖,因为制造假新闻的成本趋近于零。如果没有很好的防范措施,这一切就会把人类置于真假莫辨的世界之中。我们现在一直在讲它的好处,LLM怎样赋能新生态,相信在新生态下,今后的五到十年一定会出现新的阿里、百度等等,这是从发展的角度看技术生态的大转变。但我们面对的LLM滥用的危险同样巨大,人类准备好了吗?显然还没有。当然,这是另一个话题,我们这里就点到为止。

02

大模型:万众创业的浪潮正在到来

以ChatGPT为巅峰的LLM好比核弹,有了它,还有更多的产品形态和赛道等待创业者开拓和落地。

关于这个话题,我们需要特别强调ChatGPT带来的前所未有的创业条件:ChatGPT本身已经成为一个产品的试验场,它就是一个门槛无限低、人人可玩的playground(游乐园)。门槛低是因为前面提到的人机接口的范式改变(paradigm shift)。AI历史上第一次,机器开始迁就人,而不是人迁就机器。是人类语言,而不是计算机代码成为人机交互的工具,这个改变对于NLP新生态大爆发的意义,怎么强调也不过分。实际上,这是为“万众创业”提供了条件。
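为了把这种“机器迁就人”的范式转变说得更具体,下面给出一个极简的示意(其中 call_llm 是假想的封装函数,代表任意大模型接口,并非某个产品的真实 API):产品设想不再需要先写代码,而是直接写成一段自然语言提示词,产品、技术、用户都可以拿同一段提示词去试跑。

```python
# 一个"产品设想"直接写成自然语言提示词,人人都能在 playground 里验证
# call_llm 为假想的封装函数,仅作示意,实际使用时替换为具体大模型服务的调用

def call_llm(prompt: str) -> str:
    """示意用的桩函数:这里只返回占位文本。"""
    return "(此处为大模型的回复)"

def draft_customer_reply(product: str, complaint: str) -> str:
    # 产品经理用自然语言描述服务逻辑,相当于把"需求文档"直接交给模型执行
    prompt = (
        f"你是{product}的在线客服。请用礼貌、简洁的中文回复下面这条用户投诉,"
        "要求:先道歉,再给出可操作的解决步骤,最后询问是否还需要帮助。\n"
        f"用户投诉:{complaint}"
    )
    return call_llm(prompt)

print(draft_customer_reply("某电商平台", "我的包裹显示已签收,但我没有收到。"))
```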

做过AI创业的人应该都有这个体会。创业团队要有成功的机会,最基本的一条是产品老总与技术老总能够密切配合和沟通。产品老总凭着自己的市场直觉和对于客户需求的了解,努力寻找技术转化为服务的最佳市场切入角度,形成产品的设计方案。这个设计方案的可行性需要技术老总来验证和背书。然而,很多时候,由于不同的专业背景和知识结构,产品老总与技术老总鸡同鸭讲的情况也不罕见,一旦出现这种情况,这个创业公司基本上注定没戏。

ChatGPT根本上杜绝了鸡同鸭讲的问题。以前只有技术老总和码农可以验证一个方案的可行性,现在好了,产品老总/CXO、工程技术人员、数据分析员、用户这些不同背景和专长的人,都有一个统一的平台ChatGPT,可以交流产品的创意。大家可以在上面去做模拟的服务。不仅是人机之间,而且人类之间的交流障碍都被克服了。这个东西的发生,就是一个产品大爆发和万众创业的先机条件。

在美国,已经有好几百家初创公司跟着大模型去做:上游的大模型还没有完全理顺,他们在下游做的东西已经是进行时了。还有无数的普通大众,不断地在网上现身说法,告诉大家如何两三个小时就能利用ChatGPT赚到5000块钱,这一类分享越来越多,这意味着草根群众的创业积极性被调动起来了。每个人似乎都可以利用这个机会去找到一个创业视角。归纳总结这些草根的创意,也可能找到可以流程化、规模化地满足市场需求的信息服务新赛道。

ChatGPT这样的大模型最终是一种操作系统级别的存在,每一个与AI有关的,特别是语言和知识有关的信息产品和服务,都离不开它。当年英特尔主导的时候,那个著名的logo是Intel Inside,将来就是Chat-Inside,这还不够准确,应该叫Chat-In&Out。怎么讲?ChatGPT这样的大模型赋能产品的时候,它既是服务员也是大厨,服务员可以跟你接单子,对话交互,了解需求,同时,它还自己去做工,满足你的信息需求,交付也还是它。既有表又有里,既用到它的语言天才,也要用它的知识技能。

这就是我说的未来五年可能发展最大的一个新生态形式,叫做LLM专家坐台,它或许会打开无数的创业大门。基本的服务形态就是各行各业的在线信息服务,不管是在线教育、在线律师、在线顾问、在线金融、在线旅游,都是奔着大幅度提高服务的效率去的。有了ChatGPT之后,你只需要雇一个专家,来代替以前需要10个专家甚至100个专家才能应对的任务,最后迎来的就是生产力大爆发。

至此,应用生态很清晰了,也靠谱。原则就是对结果要专家最后把关(human judge as final filter)。这是最基本的设置,当然也不排除专家对于输入做提示词的调配,以便激发LLM回应更好的结果。

对几乎每一个场景应用,都有一个打造专家工作台(“坐台”)的任务。下游创业沿着这个思路去,有无数切入市场的机会,包括补充现有产品或服务的不足,例如在线教育的每一个细分场景,还有在线医生,在线律师,在线金融咨询,等等,也包括去开拓以前不敢想或没想到的业务场景。这是看得见的即将发生的生态形态的大变革,或重新洗牌,提供的是高效专家建议(expert-in-loop services)。

说到坐台,国内电商大厂都曾经打造过有相当规模的客服坐台,那是在用户需求和满意度无法用全自动方案满足,更无法用全人工应对的压力下出台的。现在有了LLM,继承这种形态推广到所有在线服务领域的条件已经成熟。这件事所能带来的生产率大爆发,超出想象。

“Human as judge”的设计理念在近几年的低代码平台(例如RPA平台、parser-enabled信息抽取平台等)已经验证了其有效性和效率。我的最新几个专利就专门讲的这个过程(human as judge to replace human as coder),但这是说的低代码快速开发环境,这个human虽然不必手工写代码,但还是要熟悉软件开发的流程,例如单元测试、回归测试和debug等等,不是仅仅就做个judge。这里说的是全新的形态,human只需要做judge即可完成服务。现在完全有可能打造针对各种细分赛道或场景的在线信息服务“坐台”。具体说,专家的作用只是在最终go or no-go的当口,以他的知识和经验做出判定。做裁判比做运动员,效率要高太多了。

值得强调的是,这次ChatGPT横空出世带来的新鲜事儿是,ChatGPT既是后台也坐前台。这就好比找对象娶媳妇,通常都是漂亮的见识短,能干的不漂亮。突然来了一位既“万能”又漂亮的,这无法不激发无数追求者的想象极限。我们信息产业的创业者就是ChatGPT的追求者。上得厅堂下得厨房,说的就是ChatGPT,这是因为chat只是ChatGPT的表,本质是人机接口,而能够完成NLP各种任务才是它的里子。有表有里,就可以围绕它建立下游生态的产品或服务。英特尔时代,电脑产品的品牌广告记得是Intel inside,今后的新生态应该叫chat in&out,指的就是LLM赋能的新生态,不仅赋能人机交互的表面,同等重要的,或者更重要的(看具体落地服务的性质了),是也赋能产品服务的内涵,只是要让专家最后把个关。在这种形态下,专家也还是隐身在台后的。就是说,活交给它做,出面交付也还是它,只是后面安插一个专家督导和裁决而已。再打个比方,LLM既是服务员也是大厨,只是出餐前需要一个经理过一下目,为服务质量把关,也承担责任(例如在线医生、在线律师、在线咨询师等)。

在这样的生态下,今后5年会是在线服务的大爆发时期。赶巧的是,三年疫情也极大推动了在线服务的草根意识(awareness),帮助养成了用户的在线习惯,培育了市场。例如我个人疫情前从来不用外卖的apps,也不用在线医生,可是现在二者都用了,比以前自己到餐馆点外卖,为个伤风感冒自己去预约诊所,不知道方便了多少,再也不想回到以前的低效率线下服务了。天时地利,借着这个东风,新生态不可能没有机会。

怎样建造坐台?既然已经LLM in&out了,听上去好像这个坐台谁都可以建,每个坐台配上专家,明天就可以在线开业服务了,那还有创业者什么事儿?当然不是这么简单。这是因为ChatGPT这样的LLM作为工作引擎(work horse),显示出各种专业知识的潜力,但这种潜力却是漏洞百出、有内伤的。这些内伤前面论过,按照现在的路线是不可根治的。就是说,表面光,里子并不扎实,结果不可靠,甚至会要人命的。坐台的建设就是要试图解决这个问题:如何加强内功,使得其在线服务,仅仅需要expert的流程化介入,而不是专家的生产性投入(例如RPA)。要的是坐台部署以后output一端的结果审核(go/no-go及校订post-editing),这都是在线的介入,而不是离线的调教(fine tune)。离线调教是坐台建设者的任务,这就开始有点渐入新生态的深水区了,其中有些路线图是蛮清晰的,有些是可以预见不久会解决的,还有较少的一些点,目前不够清晰,还需要探索和进一步验证。

细看一下这里的主要问题在哪里,有哪些可能的突破点和解决方案呢?首先,论专业知识的广度,LLM很厉害。没办法,人家记忆力强,肚子大,消化的材料多,这些都超出了专家,你可以用一个领域的专有术语去试试就知道了,LLM对任何一个很偏很狭窄的主题都会有自己的消化总结,成套成套的,可能细处有错漏,但在面面俱到方面碾压专家。为什么这一点也很重要,因为LLM弥补了人类包括专家的缺陷,用软件的话说,人类precision(精度)有余,recall(召回)不足;而LLM正相反,precision不足,recall有余。LLM可以把可能遗漏的东西,从大数据的黑洞翻上来,随时提到人类认知的雷达上。因此,坐台建设的重中之重就是要克服LLM的precision瓶颈。

我们并不企图彻底解决这个问题:话说回来,如果彻底解决了,就没有人类什么事儿了,前景很诡异,不论。我们是要把精度提高到这样的程度,其结果不至于严重影响坐台的在线专家的工作效率。LLM如果用一堆垃圾轰炸专家肯定是不行的。只要LLM输出的结果有1/4可以达到手工专家自己调研所能达到的水平,这个坐台的效率就得到了保障,这个在线服务就可能站得住。因为专家要做的不过是对4个候选结果给出go/no-go的裁决而已,由于这4个结果中最优解出现的位置是随机的,就专家的实际工作体验而言,大约每看两三个结果,就可以放行一个,GO!这不是负担,也不会降低在线服务的效率和竞争性。1/4是一个容错性很大的预期,现在的求精方案达到这个门槛,总体是具有可行性的。正因为有了这个总体具有可行性的基本判断,才可以结论说:LLM新生态下的创业大门的确是打开了。
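这笔效率账可以用一个极简的模拟核对一下(纯属示意:假设每次查询生成4个候选,其中恰有1个达标,且达标结果出现的位置随机):

```python
import random

def expected_reviews(n_candidates: int = 4, trials: int = 100_000) -> float:
    """模拟:专家按顺序逐个审核候选,遇到第一个达标结果即放行(GO),
    统计平均需要审核多少个候选。假设 n 个候选中恰有 1 个达标。"""
    total = 0
    for _ in range(trials):
        good_position = random.randint(1, n_candidates)  # 达标结果出现的位置(随机)
        total += good_position  # 专家审到这个位置即可放行
    return total / trials

print(f"平均每放行一个结果需审核约 {expected_reviews():.1f} 个候选")
# 理论期望为 (1+2+3+4)/4 = 2.5,与正文"每看两三个结果放行一个"的估计一致
```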

https://new.qq.com/rain/a/20230310A01FH200

 

见鬼,才发现,我早期发在科学网的发表记录居然也被屏蔽了。复制如下,并加上后来的一些:

立委发表记录


专著

商务印书馆:李维 郭进《自然语言处理答问》(2020)

机械工业出版社:李维 等《知识图谱:演进、技术与实践》(2023, 已定稿,最后编辑校对中)

人民邮电出版社:李维《巴别塔影:符号自然语言处理之旅》(2023,已定稿,最后编辑校对中)

电子工业出版社:《大模型风暴:LLM与ChatGPT专家辨析》(2023, 初稿完成,与编辑在编选调整中)

学术杂志和国际会议论文选要

《规则系统的移植性太差吗?》W Li, T Tang
【计算机学会通讯】2014年第8期(总第102期),2014

Mining public opinions from Chinese social media, Wei Li, Lei Li, Tian Tang
《海外学人》杂志【大数据专刊】,2013

《主流的傲慢与偏见:规则系统与机器学习》 W Li, T Tang
【计算机学会通讯】2013年第8期(总第90期),2013

长篇译文:《Church:钟摆摆得太远》 W Li, T Tang
【计算机学会通讯】2013年第12期(总第94期),2013

Publications

Srihari, R, W. Li and X. Li, 2006.
Question Answering Supported by Multiple Levels of Information Extraction, a book chapter in T. Strzalkowski & S. Harabagiu (eds.), Advances in Open- Domain Question Answering. Springer, 2006, ISBN:1-4020-4744-4.
online info

Srihari, R., W. Li, C. Niu and T. Cornell. 2006.
InfoXtract: A Customizable Intermediate Level Information Extraction Engine. Journal of Natural Language Engineering, 12(4), 1-37, 2006.
online info

Niu,C., W. Li, R. Srihari, and H. Li. 2005.
Word Independent Context Pair Classification Model For Word Sense Disambiguation. Proceedings of Ninth Conference on Computational Natural Language Learning (CoNLL-2005).

Srihari, R., W. Li, L. Crist and C. Niu. 2005.
Intelligence Discovery Portal based on Corpus Level Information Extraction. Proceedings of 2005 International Conference on Intelligence Analysis Methods and Tools.

Niu, C., W. Li and R. Srihari. 2004.
Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction. In Proceedings of ACL 2004.

Niu, C., W. Li, R. Srihari, H. Li and L. Christ. 2004.
Context Clustering for Word Sense Disambiguation Based on Modeling Pairwise Context Similarities. In Proceedings of Senseval-3 Workshop.

Niu, C., W. Li, J. Ding, and R.K. Srihari. 2004.
Orthographic Case Restoration Using Supervised Learning Without Manual Annotation. International Journal of Artificial Intelligence Tools, Vol. 13, No. 1, 2004.

Niu, C., W. Li and R. Srihari 2004.
A Bootstrapping Approach to Information Extraction Domain Porting. AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM), California.

Srihari, R., W. Li and C. Niu. 2004.
Corpus-level Information Extraction. In Proceedings of International Conference on Natural Language Processing (ICON 2004), Hyderabad, India. [PDF(to be added)]

Li, W., X. Zhang, C. Niu, Y. Jiang, and R. Srihari. 2003.
An Expert Lexicon Approach to Identifying English Phrasal Verbs. In Proceedings of ACL 2003. Sapporo, Japan. pp. 513-520.

Niu, C., W. Li, J. Ding, and R. Srihari 2003.
A Bootstrapping Approach to Named Entity Classification using Successive Learners. In Proceedings of ACL 2003. Sapporo, Japan. pp. 335-342.

Li, W., R. Srihari, C. Niu, and X. Li. 2003.
Question Answering on a Case Insensitive Corpus. In Proceedings of Workshop on Multilingual Summarization and Question Answering - Machine Learning and Beyond (ACL-2003 Workshop). Sapporo, Japan. pp. 84-93.

Niu, C., W. Li, J. Ding, and R.K. Srihari. 2003.
Bootstrapping for Named Entity Tagging using Concept-based Seeds. In Proceedings of HLT/NAACL 2003. Companion Volume, pp. 73-75, Edmonton, Canada.

Srihari, R., W. Li, C. Niu and T. Cornell. 2003.
InfoXtract: A Customizable Intermediate Level Information Extraction Engine. In Proceedings of HLT/NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS). pp. 52-59, Edmonton, Canada.

Li, H., R. Srihari, C. Niu, and W. Li. 2003.
InfoXtract Location Normalization: A Hybrid Approach to Geographic References in Information Extraction. In Proceedings of HLT/NAACL 2003 Workshop on Analysis of Geographic References. Edmonton, Canada.

Li, W., R. Srihari, C. Niu, and X. Li 2003.
Entity Profile Extraction from Large Corpora. In Proceedings of Pacific Association for Computational Linguistics 2003 (PACLING03). Halifax, Nova Scotia, Canada.

Niu, C., W. Li, R. Srihari, and L. Crist 2003.
Bootstrapping a Hidden Markov Model for Relationship Extraction Using Multi-level Contexts. In Proceedings of Pacific Association for Computational Linguistics 2003 (PACLING03). Halifax, Nova Scotia, Canada.

Niu, C., Z. Zheng, R. Srihari, H. Li, and W. Li 2003.
Unsupervised Learning for Verb Sense Disambiguation Using Both Trigger Words and Parsing Relations. In Proceedings of Pacific Association for Computational Linguistics 2003 (PACLING03). Halifax, Nova Scotia, Canada.

Niu, C., W. Li, J. Ding, and R.K. Srihari 2003.
Orthographic Case Restoration Using Supervised Learning Without Manual Annotation. In Proceedings of the Sixteenth International FLAIRS Conference, St. Augustine, FL, May 2003, pp. 402-406.

Srihari, R. and W. Li 2003.
Rapid Domain Porting of an Intermediate Level Information Extraction Engine. In Proceedings of International Conference on Natural Language Processing 2003.

Srihari, R., C. Niu, W. Li, and J. Ding. 2003.
A Case Restoration Approach to Named Entity Tagging in Degraded Documents. In Proceedings of International Conference on Document Analysis and Recognition (ICDAR), Edinburgh, Scotland, Aug. 2003. [PDF(to be added)]

Li, H., R. Srihari, C. Niu and W. Li 2002.
Location Normalization for Information Extraction. In Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002). Taipei, Taiwan.

Li, W., R. Srihari, X. Li, M. Srikanth, X. Zhang and C. Niu 2002.
Extracting Exact Answers to Questions Based on Structural Links. In Proceedings of Multilingual Summarization and Question Answering (COLING-2002 Workshop). Taipei, Taiwan.

Srihari, R. and W. Li. 2000.
A Question Answering System Supported by Information Extraction. In Proceedings of ANLP 2000. Seattle.

Srihari, R., C. Niu and W. Li. 2000.
A Hybrid Approach for Named Entity and Sub-Type Tagging. In Proceedings of ANLP 2000. Seattle.

Li. W. 2000.
On Chinese parsing without using a separate word segmenter. In Communication of COLIPS 10 (1). pp. 19-68. Singapore. [PDF(to be added)]

Srihari, R. and W. Li. 1999.
Information Extraction Supported Question Answering. In Proceedings of TREC-8. Washington

Srihari, R., M. Srikanth, C. Niu, and W. Li 1999.
Use of Maximum Entropy in Back-off Modeling for a Named Entity Tagger, Proceedings of HKK Conference, Waterloo, Canada

Li. W. 1997.
Chart Parsing Chinese Character Strings. In Proceedings of the Ninth North American Conference on Chinese Linguistics (NACCL-9). Victoria, Canada.

Li. W. 1996.
Interaction of Syntax and Semantics in Parsing Chinese Transitive Patterns. In Proceedings of International Chinese Computing Conference (ICCC’96). Singapore

Li, W. and P. McFetridge 1995.
Handling Chinese NP Predicate in HPSG, Proceedings of PACLING-II, Brisbane, Australia.

Liu, Z., A. Fu, and W. Li. 1992.
Machine Translation System Based on Expert Lexicon Techniques. Zhaoxiong Chen (eds.) Progress in Machine Translation Research, pp. 231-242. Dianzi Gongye Publishing House, Beijing.
(刘倬,傅爱平,李维 (1992). 基于词专家技术的机器翻译系统,”机器翻译研究新进展”,陈肇雄编辑,电子工业出版社,第 231-242 页,北京)

Li, Uej (Wei) 1991.
Lingvistikaj trajtoj de la lingvo internacia Esperanto. In Serta gratulatoria in honorem Juan Régulo, Vol. IV. pp. 707-723. La Laguna: Universidad de La Laguna. http://blog.sciencenet.cn/blog-362400-285729.html

Li, W. and Z. Liu. 1990. Approach to Lexical Ambiguities in Machine Translation. In Journal of Chinese Information Processing. Vol. 4, No. 1. pp. 1-13. Beijing.
(李维,刘倬 (1990). 机器翻译词义辨识对策,《中文信息学报》,1990年第一期,第 1-13 页,北京)

Liu, Z., A. Fu, and W. Li. 1989. Outline of JFY-IV Machine Translation System. In Journal of Chinese Information Processing. Vol. 3, No. 4. pp. 1-10. Beijing 
刘倬,傅爱平,李维 (1989), JFY-IV 机器翻译系统概要,《中文信息学报》,1989年第四期,第 1-10 页,北京

[JPG1][JPG2][JPG3][JPG4][JPG5][JPG6][JPG7][JPG8][JPG9][JPG10] 
(Its abstract published in Computer World 1989/7/26 [JPG])

Liu, Z., A. Fu, and W. Li. 1989. JFY-IV Machine Translation System. In Proceedings of Machine Translation SUMMIT II. pp. 88-93, Munich.

Li, W. 1988. E-Ch/A Machine Translation System and Its Synthesis in the Target Languages Chinese and Esperanto. In Journal of Chinese Information Processing. Vol. 2, No. 1. pp. 56-60. Beijing 
(李维 (1988). E-Ch/A 机器翻译系统及其对目标语汉语和英语的综合,《中文信息学报》,1988年第一期,第 56-60 页,北京)

Li, W. 1988. Lingvistikaj Trajtoj de Esperanto kaj Ghia Mashin-traktado. El Popola Chinio. 1988. Beijing [JPG1][JPG2][JPG3]

Li, W. 1988. An Experiment of Automatic Translation from Esperanto into Chinese and English, World Science and Technology 1988, No. 1, STEA sub Academia Sinica. 17-20, Beijing. [JPG1][JPG2][JPG3][JPG4]

Liu, Y. and W. Li 1987. Babelo Estos Nepre Konstruita. El Popola Chinio. 1987. Beijing (also presented in First Conference of Esperanto in China, 1985, Kunming) [JPG1][JPG2][JPG3]

Li, W. 1986. Automatika Tradukado el la Internacia Lingvo en la Chinan kaj Anglan Lingvojn, grkg/Humankybernetik, Band 27, Heft 4. 147-152, Germany.
[JPG1][JPG2][JPG3][JPG4][JPG5]

Other Publications

Chinese Dependency Syntax

SBIR Grants (17 Final Reports published internally)

Ph.D. Thesis: THE MORPHO-SYNTACTIC INTERFACE IN A CHINESE PHRASE STRUCTURE GRAMMAR

M.A. Thesis in Chinese: 世界语到汉语和英语的自动翻译试验 
–EChA机器翻译系统概述

《立委科普:Machine Translation》 (encoded in Chinese GB)

Li, W. 1997. Outline of an HPSG-style Chinese Reversible Grammar, Vancouver, Canada.

Li, W. 1995. Esperanto Inflection and Its Interface in HPSG, Proceedings of 11th North West Linguistics Conference (NWLC), Victoria, Canada. [PDF(to be added)]

Li, W. 1994. Survey of Esperanto Inflection System, Proceedings of 10th North West Linguistics Conference (NWLC), Burnaby, Canada. [PDF(to be added)]

《ChatGPT:人类语言的“通天塔”》

【立委按:ChatGPT 横空出世,标志着人类语言通天塔的建成,对于做了一辈子NLP的老司机,岂止是美梦成真。古人云,朝闻道夕死可矣。亲眼看到通天塔的建成对于我超过了朝闻道,感觉后去每一天就是赚着了,可以见证ChatGPT引发的信息产业的新生态大爆发。】

ChatGPT导读:

自然语言处理(Natural Language Processing,NLP),是AI皇冠上的明珠。AI主要分为感知智能和认知智能,从感知智能到认知智能的飞跃,主要的标志就体现在NLP任务的完成能力上。人类语言是人类知识的载体,把语言搞定,是进入人类认知智能的一扇大门。千百年来,消除语言障碍一直是人类的梦想。《圣经》中的巴别塔指的就是人类语言的通天塔,但这被认为是一种空想,注定无法建成。我们NLP从业人员也一直在追求这个梦想,感觉真比登天还难。



但是,2022年11月30日,请记住这个日子,以美国人工智能企业OpenAI正式发布ChatGPT模型为标志,通天塔正式落成!它不仅成功消除了人类语言的障碍,还把人类和机器交互的桥梁也建立了起来。这个历史性时刻在国内当时没有引起大的反响,国内同胞不幸正处于疫情高峰期。两个月后等我们从疫情中走出来后,才发现人世间发生了如此巨变,一场ChatGPT海啸开始席卷海内外。

为什么说ChatGPT就是人类语言的通天塔呢?因为它的语言能力其实比Native还要Native:native speakers难免有口误和表达不规范,而大模型做底的ChatGPT的生成却总是那么地道、合乎语言习惯。从输入端来看,它能听,就没有它听不懂的语言,理解能力特别好。从输出端来看,它能说,常常口若悬河。最让人震撼的是,从它的语言表现我们看到了背后的“思维链”和一定的逻辑推理能力,给人的印象是条理清晰。输入输出的背后是大家称为LLM(Large Language Model)的语言大模型,我们用户看它就是个深不见底的黑洞,里面有很多层的神经网络,内部表示是多维向量,俗称“劳碌命”,是它在那里劳碌,分析理解,组词成句。这个“劳碌命”的工作以ChatGPT的形式表现出来,就完美实现了人机的自然语言接口。

我们看看ChatGPT背后的LLM怎么炼成的。这方面的技术性介绍已经汗牛充栋了,我们简述一下背后的原理。它的背后是GPT3,准确地说是以被称作达芬奇的GPT3.5最新版作为基础。这个模型首先是规模超大,大力出奇迹。千亿级的tokens作为训练数据,形成千亿级参数的模型。研究表明,通用大模型到了一定规模以后会出现一种技能“涌现”现象,这些涌现技能稍加提示就可以在各种多任务中表现出色。以前笼统地归结为量变引起质变,基本上是把奇迹的发生当成一个谜,就好像是说一切都归功于上帝的垂顾,像现代版的愚公移山的故事:现代愚公大力不止,感动了上帝。现在看来并没有那么神秘:多任务能力的涌现必须以超大数据LLM为基础,这是因为没有LLM,就没有根据人类偏好来调教模型的空间。

从语言序列学到的生成大模型,最大的特点就是能产性,给一个前文提示,后续有很多种“接龙”的可能性,但这些可能性中只有很小的一个比例是我们希望看到也感觉得益的,还有很多生成是肤浅的、空洞的,甚至有毒的。ChatGPT的突破就是在这最后一步的调教中,以强化学习为核心,找到了一条与人类偏好对齐的有效的方法。这就好比有一个巨大的沙盆,里面装着1000颗大大小小的钻石藏在沙中,现在想把沙子倒掉,有没有一个好的办法倒完沙子又不倒掉钻石呢?试了很多次,几乎不可能。但可以粗线条操控,结果沙子是倒掉了,但也倒掉了900颗钻石。人们知道的是它有效地留下了一批合格的宝贝。能够这么做的前提是,盘子要大。能这么做,敢这么做,只有超大数据的模型。举个例子,正常的语料中,直接与翻译、问答技能相关的数据有多大比例?是个零头吧,数据规模不大的时候,从序列学习的模式中很难学到这些技能。但超大数据就不同了,小的比例乘以一个大数,就有了学习的条件和土壤,这时候如果模型足够大,这些技能就会被潜在地学到。在一个有几乎无限生成可能性的基础模型中,如果不做足后来的功夫,大概率生成的还是水货。于是“符合人类预期”就成为后期调教(fine tune)的最大目标。这个过程中,很多宝贝也给倒掉了,文献中称为 alignment tax(指的是打造自然语言接口模型为与人类对齐必须缴的“税”)。不怕,因为人们看不见被倒掉的宝贝,只要看见的是钻石就行。大模型有足够的冗余,不怕层层过滤和剪枝。其实,不是大模型本身出奇迹,而是大模型为奇迹的出现准备了温床。

ChatGPT和以前的大模型不同的地方是它精心筹划了一个人类反馈的强化学习。对于一个通用的开放系统,人类其实也讲不清楚好坏,但是至少可以说你这一轮跟我的对话回答得好还是不好。拿这种反馈去强化训练和微调大模型,ChatGPT突然就显得善解人意了。人机交互从以前的人迁就机器,不得不编写代码,变成了机器迁就人,听懂人话了。这是一个巨大的转变。

强化学习在诸多学习算法中是很不好伺候的一种,因为链条长,而且对于最终目标的定义不是显式和直接的,而是间接以效果论英雄。调教说的是把原基础模型的大概率水货压下去,让隐藏在原模型中的小概率宝贝浮上来:这些宝贝才是符合人类预期的强化目标,但并不是以某个特定的宝贝作为优化目标。反正这个世界没有唯一的答案形式,生成通常没有黄金标准,我们有的就是模模糊糊的人类基于偏好而给的反馈:这个回答好,那个是胡扯;这个对路,那个是歧视。能够较好利用这种终局反馈的典型方法正是强化学习。这个反馈回路一旦建立起来,模型可以不断强化和迭代,表现自然越来越好。于是,强化到了公元2022年11月30号,帷幕揭开,这是人类见证奇迹的时刻。

如实说,我一辈子从事NLP,从没想过在有生之年能够看到这样的奇迹。老祖宗说过,朝闻道夕死可矣。亲眼看到通天塔的建成对于我超过了朝闻道,感觉后去每一天就是赚着了。ChatGPT到现在已经过去3个月了,还是感觉像在做梦一样。有时看着ChatGPT的图标出神,反问自己,这难道就是通向新生态星辰大海的语言之门吗?不得不说,所有的迹象都表明,ChatGPT的背后有着无限的可能性。

 

The ChatGPT Tsunami and Its Impact on IT Landscape and New Ecosystem

This is my recent invited talk given to young entrepreneurs on the LLM and ChatGPT ecosystem.  

1. ChatGPT:  "Tower of Babel" for Human Languages

Natural Language Processing (NLP) is the crown jewel of AI. AI is mainly divided into perceptual intelligence and cognitive intelligence, and the leap from perceptual intelligence to cognitive intelligence is mainly reflected in the ability to complete NLP tasks. Human language is the carrier of human knowledge, and mastering language is a gateway to entering human cognitive intelligence. For thousands of years, eliminating language barriers has always been a dream of mankind. Babel in the Bible refers to the tower that mankind wished to build to overcome barriers of human languages, but it was considered to be impossible to build. We NLP practitioners have also been pursuing this dream, hoping to get closer to the final goal of overcoming the language barrier.



However, on November 30, 2022 (remember this day), with the official launch of the ChatGPT model by the American artificial intelligence company OpenAI, the Tower of Babel was officially completed! It not only successfully eliminated the language barriers for mankind but also established a bridge between humans and machines. In no time, we all realized that a ChatGPT tsunami had swept across the world.

Why is ChatGPT judged to be the Tower of Babel? Because its language performance is actually more "native" than native speakers: native speakers inevitably have slips of the tongue from time to time, but a large generative language model like ChatGPT hardly ever makes such mistakes and seems always in line with idiomatic language habits. From the input side, it can understand any human language. From the output side, it can speak fluently. What is most shocking is that from its language performance, we can observe what is called the "Chain of Thought" (CoT) behind its responses, with certain logical reasoning abilities, giving people the impression of being clear and organized. Behind the input and output is the so-called LLM (large language model, GPT in particular), which is like a bottomless black hole to users. Inside are actually many layers of neural networks, represented internally as multidimensional vectors, which house a ton of knowledge.

Let's take a look at how the LLM behind ChatGPT is developed. There are already tons of technical introductions on this topic, and we will briefly describe the underlying principles. Its basis is GPT-3, or more precisely, the latest version called text-davinci-003. This model is first of all extremely large in scale, and its size is believed to have made miracles happen. With hundreds of billions of tokens as training data, it forms a model with more than a hundred billion parameters. Research has shown that generic large models will exhibit an "emergence" of certain skills once they reach a certain scale, and these emerging skills can perform well in various multi-task scenarios with minimal prompting. Previously, this phenomenon was generally attributed to the "transformation of quantity into quality", and it was basically treated as a mystery in philosophical terms. It is like saying that everything is attributed to God's favor.

In my understanding, it is not that mysterious, but a reasonably natural result, as the emergence of multi-task skills has to be based on, and can only be observed in, a super-large model trained on super-large data. This is because otherwise, there is no sufficient space for the model to tune itself based on human preferences. Large language models are learned from text sequences, and their greatest feature is their ability to over-generate, giving many possibilities for subsequent sequences like "chain reactions", but only a small percentage of these possibilities are desirable and beneficial. Many generations may be shallow, empty, or even toxic. ChatGPT's breakthrough lies in the meticulous final fine-tuning process: with reinforcement learning at its core, it found an effective method to stay aligned with human preferences. This is like having a huge basin with numerous children bathing inside, and now you want to pour out the bathwater without pouring out the children. It is almost impossible. But if you can afford to lose some, the result is that the water is poured out, with some good children still inside the basin to help the case. The premise of doing this is that the basin must be large. Only super-large data models can achieve this with sufficient abilities left for numerous tasks. For example, what proportion of parallel translated text or of question-and-answer pairs is there in a normal raw language corpus? It's a tiny tiny fraction, and when the data size is small, it is hard to learn the translation or question-answering skills from sequence-based learning. Only with super-large data and models can the small proportion, multiplied by a large number of tokens, create the necessary conditions and soil for implicit learning of such skills. In a basic model with almost infinite generation possibilities, if enough work is not done in a later stage, the probability of generating useless responses is high. Therefore, "aligning with human preferences" becomes the ultimate goal of fine-tuning. In this process, many children were also poured out, which is called the "alignment tax" in the literature. But it doesn't really matter: people can't see the lost treasures, and as long as they see the good results, it's fine. Large models have enough redundancy and can survive filtering and pruning at all levels. In fact, it is not the large model itself that creates miracles, but the large model prepares a warm bed for miracles to happen.
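As a toy illustration of this "pour out the bathwater, keep the children" selection pressure, here is a minimal best-of-n sketch (an illustrative simplification, not OpenAI's actual pipeline; both functions below are invented stand-ins): an over-generating base model proposes many candidates, and a placeholder scorer, playing the role of a reward model trained on human preferences, keeps only the best one.

```python
import random

def base_model_generate(prompt: str, n: int = 8) -> list:
    """Stand-in for an over-generating base LM: returns n candidate continuations."""
    fillers = ["...", "lol idk", "As a language model I cannot say.",
               "Here is a clear, step-by-step answer:", "Sure! First, open the settings page."]
    return [random.choice(fillers) for _ in range(n)]

def reward_score(prompt: str, candidate: str) -> float:
    """Placeholder preference scorer; a real reward model is trained on human rankings."""
    score = 0.0
    if "step-by-step" in candidate or candidate.startswith("Sure"):
        score += 1.0          # reward helpful-sounding structure
    if candidate in ("...", "lol idk"):
        score -= 1.0          # penalize empty or flippant continuations
    return score

def best_of_n(prompt: str, n: int = 8) -> str:
    """Keep only the highest-scoring candidate: the 'children' survive, the rest is discarded."""
    candidates = base_model_generate(prompt, n)
    return max(candidates, key=lambda c: reward_score(prompt, c))

print(best_of_n("How do I reset my password?"))
```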

What makes ChatGPT different from previous large models is that it has carefully planned for reinforcement learning from human feedback. For a generic open system, humans cannot really pinpoint where it is right or wrong, but at least they can say whether the response is good/useful or bad/no-value. Using this type of feedback to reinforce the learning and to fine-tune the large model, ChatGPT suddenly becomes very human-like. Human-machine interaction has changed from humans accommodating machines and having to write code, to machines accommodating humans and understanding human language. This is a huge transformation.

Reinforcement learning is a relatively difficult type of learning algorithm compared with supervised learning approaches because it involves a long chain, and the definition of the ultimate goal is not explicit and direct but indirect, judged by the final outcomes. The idea behind this training is to suppress the high-probability mediocre output of the original model and bring out the low-probability gems hidden in it: these gems, the responses that conform to human expectations, are the reinforcement target, though no specific response serves as a fixed optimization target. In any case, there is no unique answer format in this world, and there is usually no gold standard for generation. What we have is the fuzzy feedback given by humans based on preferences: this answer is good, that one is nonsense; this one is appropriate, that one is discriminatory. A typical method that can make good use of this terminal feedback is reinforcement learning. Once this feedback loop is established, the model can be continuously strengthened and iterated, and its performance will naturally improve. So, after some meticulous learning from human feedback, on November 30, 2022, the curtain was lifted, and this was the moment when humans witnessed the miracle.
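To make the "fuzzy human feedback" concrete: in RLHF-style reward modeling as described in the literature, pairwise preferences ("this answer is better than that one") are commonly turned into a ranking loss of the form -log sigmoid(r(chosen) - r(rejected)). The sketch below uses invented scores and plain numpy; it shows the shape of that objective, not ChatGPT's actual training code.

```python
import numpy as np

def preference_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise (Bradley-Terry style) objective: -log sigmoid(r_chosen - r_rejected),
    which pushes the reward of the human-preferred response above the rejected one."""
    return float(np.mean(np.log1p(np.exp(-(r_chosen - r_rejected)))))

# Made-up reward scores for three preference pairs (chosen vs. rejected responses)
r_chosen = np.array([1.2, 0.3, 2.0])
r_rejected = np.array([0.1, 0.8, -0.5])

print(f"preference loss: {preference_loss(r_chosen, r_rejected):.3f}")
# A well-trained reward model drives this loss down; its scalar scores then serve
# as the delayed, indirect feedback that reinforcement learning optimizes against.
```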

To be honest, I have been engaged in NLP for my whole life, and I never thought I would see such a miracle in my lifetime. It has been three months since ChatGPT was created, and it still feels like a dream. Sometimes I stare at the ChatGPT icon and ask myself, is this the language gateway to the new ecological universe? I have to say that all the signs indicate that ChatGPT has unlimited potential for NLP.

Let's take a step back and review the contemporary history of the golden decade of artificial intelligence.

Ten years ago, in the ImageNet competition, deep learning overwhelmingly crushed all other machine learning performances in the image field, triggering a landmark neural network revolution. Deep neural networks rely on supervised learning of big data. Since then, we have known that as long as the data is large enough and labeled, deep learning can handle it. After sweeping through image, speech, and machine translation, it encountered the stumbling block of NLP because many NLP tasks do not have large-scale language data with labels.

Five years ago, the NLP field saw the emergence of large language models (LLMs) represented by BERT and GPT. LLM can directly "eat" language without the need for annotations, which is called self-supervised learning in academia. LLM marks the arrival of the second revolution, which pushed NLP to the center of AI and became the core engine of cognitive intelligence. AI finally overcame the dependence on labeled data which had been the knowledge bottleneck for NLP, leaping from perception to cognition.

Three months ago, ChatGPT was born, creating an almost perfect human-machine natural language interface. From then on, machines began to accommodate humans, using natural language to interact, rather than humans accommodating machines, using computer language. This is a groundbreaking change.

From the emergence of LLM to the advent of ChatGPT, it truly externalized both its linguistic talent and its knowledge potential, allowing ordinary people to experience it. Looking back, human-machine interaction and its related applications have been explored for many years, but before ChatGPT came out, the problem had never really been solved. When the GPT-3 model was launched two years ago, skilled players among us already knew how capable it was. As long as you give it a few examples, it can follow the examples to accomplish various NLP tasks, so-called few-shot learning. It does not require major modifications to the large model or large-scale labeled data. With just a few examples, GPT-3's potential can be unleashed to accomplish various NLP tasks, which is already amazing as it overcomes the knowledge bottleneck of supervised learning. However, these amazing performances of LLM, and their basic limitations, were mostly known only within a small circle of players, and a language bridge was needed for a true breakthrough. ChatGPT has come forward with its biggest feature, zero-shot learning, which means that not a single labeled sample is needed, and you can directly tell it what to do. After five years of supervised learning and five years of self-supervised learning of the deep neural network revolution, the final result has been delivered, and the ChatGPT Babel tower has been fully constructed, marking the pinnacle of the golden decade of AI. ChatGPT has since been like a tsunami, stirring up the world and causing a sensation all over.
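To make the few-shot versus zero-shot contrast concrete, here are two prompt skeletons (the sentences are invented for illustration, not drawn from any benchmark):

```python
# Few-shot (GPT-3 era): show a handful of worked examples, then the new input.
few_shot_prompt = """Translate English to French.
English: Where is the train station?
French: Où est la gare ?
English: I would like a coffee, please.
French: Je voudrais un café, s'il vous plaît.
English: The meeting starts at nine.
French:"""

# Zero-shot (ChatGPT era): no examples at all, just tell it what you want.
zero_shot_prompt = "Translate into French: The meeting starts at nine."

print(few_shot_prompt)
print(zero_shot_prompt)
```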



Looking at the history of AI from a broader perspective, 30 years ago, the main approach to NLP tasks was through symbolic logic. Symbolic routes and machine learning are the two paths that have alternated in dominance in AI history every 20-30 years, like a pendulum. But in the past 30 years, machine learning has been on the rise as the mainstream, with the deep learning revolution in the last 10 years. The pendulum shows no sign of swinging back. We practitioners have been on a long journey of the symbolic rule system. It is not in the mainstream, rarely even mentioned by anyone, but it has not been lacking in its own innovation with its own differentiated advantages. It is worth noting that the symbolic parser has eventually embraced data-driven empiricism and relies on a pipeline of multiple modules to ultimately deal with the hierarchy of language structures. We call this deep parsing. Similar to LLM, deep parsing consists of many levels (around 50-100 levels) of bottom-up processing. It also digests the language first, but parses incoming sentence sequences into internal symbolic graph structures rather than into LLM-style vector representations. Although deep parsing and deep learning take different representation schemes, both empower downstream NLP tasks, the former with structures and the latter with vectors, and both greatly improve the efficiency of those downstream tasks. Of course, LLM is still the stronger player because it not only masters syntax structures but also performs exceptionally well in discourse and computational style, the former involving long-distance discourse relationships and the latter capturing subtle differences in language expressions. Discourse and computational style pose a significant challenge to parsers that primarily focus on sentence structures.
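For readers unfamiliar with cascaded symbolic parsing, here is a deliberately tiny sketch of the bottom-up, multi-module idea (a toy fragment, nothing like a production grammar with 50-100 levels): each level consumes the previous level's output and adds one more layer of structure.

```python
import re

LEXICON = {"the": "DET", "old": "ADJ", "engineer": "N", "fixed": "V", "parser": "N"}

def level1_tag(tokens):
    """Level 1: dictionary lookup assigns part-of-speech tags."""
    return [(tok, LEXICON.get(tok, "N")) for tok in tokens]

def level2_chunk_np(tagged):
    """Level 2: a simple pattern groups DET (ADJ)* N sequences into NP chunks.
    (A real system would run many such pattern modules, level after level.)"""
    tags = " ".join(t for _, t in tagged)
    return re.sub(r"DET( ADJ)* N", "NP", tags)

def level3_clause(chunked):
    """Level 3: NP V NP is recognized as a subject-verb-object clause skeleton."""
    return chunked.replace("NP V NP", "S(NP, V, NP)")

tokens = "the old engineer fixed the parser".split()
print(level3_clause(level2_chunk_np(level1_tag(tokens))))   # -> S(NP, V, NP)
```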

There have always been two main lines in AI. In addition to machine learning, there is traditional symbolic logic, which rises to the philosophical height of rationalism versus empiricism. These two paths have waxed and waned over the past 30 years, with machine learning on the rise and symbolic logic disappearing from the mainstream stage, although the industry has never given up on its use. The transparency and interpretability of symbolic logic translate directly into the convenience of engineering fixed-point error correction, which contrasts with LLM's black-box-like internal vectors. LLM can use retraining to macroscopically improve, or use fine-tuning or few shots to induce. LLM cannot do pinpoint correction or debugging like in surgery. LLM's lack of interpretability also often causes user concerns and confusion in practical applications. Perhaps one day in the future, the two paths will converge at a point where a new AI revolution will occur.

From the perspective of AGI, we see that almost all models before LLM were specialized, and the narrower the task, the better the performance. One exception is the parser, which is in essence the "symbolic foundation model" in the pre-LLM era, empowering downstream NLP tasks with structures, just like LLM does with vectors. From a more general perspective, the emergence of LLM represents a breakthrough in the development of artificial intelligence towards achieving AGI, or Artificial General Intelligence. AGI has long been a controversial goal, and many scholars, including myself, have doubted or even mocked its feasibility. However, with the advent of LLM five years ago, AGI became more scientifically viable, rather than just a Utopia. OpenAI, which champions AGI, has become the shining star in this field, having delivered a long list of influential LLM general models that include the GPT series for NLP, Codex for code writing and debugging (eventually used for Microsoft's Co-pilot service), and DALL-E for image generation.

With ChatGPT as the pinnacle, large models have taken over all NLP tasks simply by using natural language as instructions, not only those defined by the NLP community but also many user-defined tasks. Its NLP tasks are completely open. Tasks related to language and knowledge can be attempted in any language, and often the results are immediate and magical at the same time. Someone has listed 49 task scenarios that it can handle, but it can actually do much more than that.  In addition, new scenarios are being discovered all the time. This is an unprecedented phenomenon in the history of AI, which the industry calls "skill emergence".

We can examine why it is so capable and knowledgeable. Overall, human systematic knowledge is largely expressed in language. Human knowledge is mainly carried in the form of text (written language), and mathematical formulas can be seen as an extension of written language. From a linguistic perspective, human knowledge can be divided into linguistic knowledge and knowledge beyond linguistics. Linguistic knowledge includes lexicon knowledge, syntax, morphology, discourse, style, etc. Knowledge beyond linguistics is a much broader circle with a much wider boundary. Large language models have not yet mastered human knowledge as a whole, and it seems that they have managed to capture some knowledge floating on top of the sea of human knowledge. As for ChatGPT, it can be said that it has mastered almost all of the linguistic knowledge, but only about 20% of human knowledge in general, including common sense, basic logic, and encyclopedic knowledge. It calls for more serious research to quantify it properly, but in the ballpark, it feels like about 20% of the knowledge has been learned, and the remaining 80% is still not within reach. However, the Pareto principle, the 80-20 rule, applies here, which means that mastering the 20% of knowledge floating on top in effect covers 80% of the scenarios. Still, since there is an 80% knowledge gap, it pretends to know things it doesn't from time to time. Given that, LLM can still reshape the ecosystem and the world if we learn to use its strengths and to handle its weaknesses wisely.

How do we judge whether it has learned and how well it has performed a task? In any NLP task, there is a quality assurance (QA) protocol to follow, which requires at minimum a test set of annotated samples. Currently, ChatGPT uses zero-shot learning (i.e. zero samples), where a random task is assigned to it and once it is done, it moves to a new task, so there is no chance for building a persistent test set.  So its performance on result quality cannot be quantified directly. In such cases when the internal testing protocol is missing or no longer applicable, external methods must be used to evaluate the data quality indirectly, such as customer surveys or using my previous company Netbase's social listening service to collect customer feedback online. All the external signs indicate that customer satisfaction seems to be over 80%, and in most task attempts, customer needs are met fairly well, at times with nice surprises and miracle-like performance. Another relatively objective external indicator is user stickiness and growth of user accounts.  ChatGPT has set unprecedented records in this regard, with tens of millions of users in just a few weeks. ChatGPT's customer growth rate exceeds everyone's imagination.
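For contrast, the conventional internal protocol mentioned above looks roughly like this (schematic code with invented gold labels, not a real benchmark):

```python
def evaluate(outputs, gold):
    """Standard QA protocol: compare system outputs against an annotated test set."""
    correct = sum(outputs[q] == gold[q] for q in gold)
    return correct / len(gold)

gold = {"q1": "positive", "q2": "negative", "q3": "neutral"}      # annotated gold labels (invented)
outputs = {"q1": "positive", "q2": "negative", "q3": "negative"}  # system predictions (invented)

print(f"accuracy = {evaluate(outputs, gold):.2f}")  # 0.67 on this toy set
# With zero-shot, ad hoc tasks there is no persistent gold set like this,
# hence the reliance on indirect signals such as user surveys and account growth.
```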

In conclusion, ChatGPT represents a major breakthrough in the field of natural language processing and artificial intelligence. As a large language model, it has revolutionized the way we approach NLP tasks and has demonstrated remarkable versatility and capability. However, it is important to keep in mind that ChatGPT is not perfect and there is still much work to be done in terms of improving its performance and addressing its limitations.

Despite these challenges, ChatGPT has already had a profound impact on the field of AI and is poised to continue shaping the future of technology in significant ways. As AI continues to evolve and advance, it is likely that we will see more breakthroughs of LLMs that push the boundaries of what is possible and help us achieve even greater levels of understanding and innovation.



Over the last three months, there has been no end of online forums, discussions, and talks about ChatGPT, and there is still no sign of the excitement wearing off. Recently, the former head of Y Combinator China, Dr. Lu Qi, came to Silicon Valley to give a passionate speech, which added fuel to the fire. He compared the ChatGPT revolution to the dawn of Web 1.0. As we all know, the iconic brand that represented the first Internet boom was the Netscape browser. Although Netscape did not grow into a large company, it was the internet revolution it started that created giants like Yahoo, Google, and Amazon. A similar revolution occurred in China, giving rise to world-class companies such as Baidu, Tencent, and Alibaba. Lu Qi believes that we are right now in such an era. He said that the roadmap is so clear, and the trend is so obvious, that he has absolutely no doubt in his mind. Overall, I largely agree with his view of technological trends and the landscape.

ChatGPT marks the emergence of a new era. Some people say that this is the "iPhone moment" or "Android moment" in the history of contemporary information technology and will lead to a brand-new ecosystem. I feel that Lu Qi's comparison is more comprehensive, as ChatGPT is like the "Netscape browser" that initiated the first Internet revolution. Regardless of the comparison, it is a game-changer.

However, it is essential to note that ChatGPT also has its shortcomings and challenges. One issue that everyone has noticed is so-called hallucination: fabricating details and distorting facts. Although ChatGPT has conquered any form of human language, it has only scratched the tip of the iceberg of cognitive intelligence. Is it possible for LLM to solve this problem completely? In my opinion, the LLM route alone will not solve cognitive intelligence. As mentioned earlier, ChatGPT has only covered about 20% of human knowledge. Even if LLM continues to expand several orders of magnitude in sequence-based learning, in my estimate it can at best reach 40%-50%. The remaining 50% is a deep sea that can hardly be fathomed. The long tail of knowledge is an absolute explosion of combinations, way beyond the reach of sequence-based language learning. The annoying behavior is that for any knowledge beyond its ken, LLM will not hesitate to fabricate it with fake details that appear genuine. This is a severe problem. The accuracy defect of such long-tail knowledge is an inevitable problem for application services based on LLM.

Moreover, there are many other issues that need to be overcome. For example, when a large model empowers downstream scenarios, how can customer privacy and security be protected during the process of calling the large model? This problem has not yet been solved, but it is believed that better solutions will develop in time. The supplier of large models will surely pay special attention to this issue and provide solutions for their ecosystem's development.

Another issue is the complex reasoning ability. From the conversations of ChatGPT, we observe that it already has basic reasoning ability. The source of this ability is very interesting. It mainly benefits from self-supervised learning of the massive computer code base. The GPT3.5 on which ChatGPT is based has been trained not only on human natural language but also on massive available open source code written in various computer languages on GitHub, and most of the code has corresponding natural language explanations (comments) too. Since computer code is by nature more logical than natural language, this has helped ChatGPT to organize its response and speak more coherently. This was said to be a nice surprise that the developers themselves had not anticipated. However, it currently still has shortcomings in complex reasoning logic. Fortunately, complex reasoning ability is different from the boundless knowledge network. It is a relatively closed logical set, and it is believed that it can be solved in not too far a future (perhaps GPT4 might already be able to handle it?).

Lastly, let's talk about the progress of multimodal learning. LLM, as the basic model, has been validated in NLP multi-tasking and has performed exceptionally well. After the breakthrough in NLP, the framework for empowering downstream tasks with a basic model began to radiate toward other modalities. This direction of research is very active in the academic field of multimodal learning. Everything is still ongoing. Currently, the level of multimodal learning in practice is still in the stage of prompt engineering. What is lacking is a natural language interface. People who play with prompts in large models for image and music generation already know the huge potential and effectiveness of the basic model. It is very similar to the situation when we played with few-shot prompts in the GPT-3 playground before ChatGPT was born. It can be foreseen that in near future, a smooth natural language interface will emerge, and users will be able to describe the art they desire, whether it is a painting or a song. The work of aligning with human taste is also ongoing. It is predicted that a natural language to image (NL2img) model like "ChatDalle", similar to ChatGPT, will implement the desired natural language interface. The same trend is bound to happen in natural language to music (NL2music). We are in an exciting new era of AIGC (AI-generated content) for art creation.

Another predictable picture is that based on the trend of multimodal LLM, there will eventually be a unified large model that integrates various modalities and their associated knowledge. The breakthrough of this model barrier will provide critical support for entrepreneurs to utilize LLMs to empower downstream applications in various scenarios. As we all know, whether it is finance, law, or medicine, each major vertical has its accumulated long-standing structured symbolic knowledge base, including the domain ontology and other databases. How to connect to the domain's symbolic resources involves breaking the domain barrier. It is expected that this barrier will be largely solved in the next two to three years.

2. LLM Ecosystem Facing Reshuffling

The direct impact of the ChatGPT tsunami is that the NLP ecosystem is facing a reshuffle, and every existing information product or service must be re-examined in the context of LLM.

When we first discussed ChatGPT’s impact on IT services, the first thing that came to our mind was how to combine ChatGPT with search technology, and whether it could re-invent search.

Search is traceable, and every returned result is recorded, so it involves no information fusion. ChatGPT is untraceable and excels at information fusion: ChatGPT has no possibility of plagiarism in essence. Every sentence it spits out is a novel sequence based on its digested information sources. Clearly, traditional search and ChatGPT take completely different approaches, each with its own advantages and disadvantages. Search is the king of information services, ubiquitous, with a very stable business model. Since the rise of search in the Web 1.0 era, the form and mode of search have basically not changed for more than 20 years. In fact, new technologies and entrepreneurs have been trying to challenge search continuously over the years, and the venture capital industry has also been paying attention to potential search subverters that may become the "next Google", but the status of search has always been unshakable, at least until now. But this time is different. Microsoft has exclusive code authorization for ChatGPT and has boldly launched the so-called "new Bing". Google, which has dominated the space for so long, has had to mobilize urgently and confront it head-on. A search+LLM drama is unfolding like a live play, telling us that although there are still many difficulties to overcome in integrating these two technologies, the trend is unstoppable, and reshaping a new ecology of search is imperative.

In addition to search, those finely polished directional information products and services now face the fate of being re-examined and reformed, including chat, virtual assistants, grammar correction, machine translation, summarization, knowledge Q&A, etc. The representative services in these areas (Siri, Grammarly, etc.) used to have high technological barriers, which have suddenly been lowered. Thanks to years of polishing and user inertia, many products are not facing an immediate catastrophic crisis, and some may survive for a long time; still, they are all on a downhill path. This is a revolutionary victory of general AI over traditional AI, something we would not have believed feasible before. We used to be so skeptical of the general-purpose approach, waiting to laugh at those who advocated AGI, and yet it was one of them, OpenAI, that went on to launch a series of impressive LLMs (the GPT series, Codex, DALL-E) including ChatGPT.

Look at Siri, which was released by Apple 13 years ago. 13 years is longer than the entire golden decade of the deep learning revolution, but Siri has only recently managed to offer 2-round or 3-round conversations. Amazon's popular product, Alexa, is the same. It has been polished for several years and accumulated so much user data. Now, with the advent of ChatGPT, what will Apple and Amazon do? They must embrace LLMs.

Next is the commonly seen e-commerce customer service. As we all know, Alibaba and JD.com's online after-sales customer service has been polished to perfection. Because after-sales service issues are relatively concentrated, the problem set is not large while the data are large, accumulated over the years. However, customer service is not only limited to post-sales.  In order to handle customer service smoothly, LLM cannot be ignored.

Moving on to education, it's clear that the ChatGPT model has the potential to revolutionize all education products and services. Anyone developing educational applications will need to reconsider how to embrace LLMs within the framework of the large model. Education itself deals with language, regardless of whether it is related to arts or science. Although the current large model is not particularly strong in science and engineering (yet), this knowledge gap will be filled to varying degrees soon. ChatGPT is sure to disrupt education, while also providing the largest opportunity for modernizing education. Language learning and computer programming education are obvious areas for ChatGPT to shine, as the model itself is a language model. Although its programming abilities are not yet at the level of professional engineers, it is proficient enough in common code formats to assist with programming and with the learning of programming. In fact, Co-pilot, which has been empowered by the GPT codex, has already become an auxiliary tool for more and more programmers.

Stepping back, we are also facing a huge risk, such as fake news. If one wants to promote a company or product, one can now use ChatGPT to generate all kinds of promotional posts that sound convincing. In the future, those online reviews and comments will also be obscured by fake news, as the cost of creating fake news approaches zero. Without proper precautions, all of this could place humanity in a world where truth and falsehood are indistinguishable. All along, we have been talking about the benefits of LLM and how it can empower new ecosystems for productivity explosion. We expect that in the next five to ten years, new international IT giants like a new Google or New Alibaba will emerge under this new ecosystem, leading to a major transformation in the technology ecosystem. But the danger of LLM misuse is equally great. Is mankind ready for it? Clearly not. Of course, this is another topic, and we will leave it there for now.

3. Wave of Mass Entrepreneurship Coming

With LLM (ChatGPT in particular), there are more product forms and services waiting for entrepreneurs to explore.

Regarding this topic, we need to emphasize the unprecedented entrepreneurial conditions brought by ChatGPT. ChatGPT itself has become a testing ground for products. It is a playground with an infinitely low bar that everyone can play in. The low bar is due to the paradigm shift in human-machine interfaces mentioned earlier. For the first time in AI history, machines began to cater to humans, rather than humans catering to machines. Human language, rather than computer code, became the tool for human-machine interaction. The significance of this change for the new ecology of NLP is difficult to overemphasize. In fact, this provides conditions for "mass entrepreneurship".

Those who have started AI businesses should all have this experience. The most basic condition for a startup team to have a chance of success is that the product manager and the technical leader can work closely together and communicate effectively. The product leader, relying on their market intuition and understanding of customer needs, strives to find the best market entry angle for technology to be transformed into a service and form a product design plan. The feasibility of this design plan needs to be endorsed and then developed by the technical leader. However, often due to different professional backgrounds and knowledge structures, the situation where the product manager and the technical leader talk past each other is not uncommon. Once this situation arises, the startup company is basically doomed to fail.

ChatGPT fundamentally eliminates the problem of talking past each other. Previously, only the technical leader and programmers could verify the feasibility of a plan, but now, the product leader/CXO, engineers, data analysts, and users with different backgrounds and expertise all have a unified platform, ChatGPT, on which they can illustrate product ideas. Everyone can simulate services on it. Not only has the communication barrier between humans and machines been overcome, but also the communication barrier between different teams. The emergence of this thing is a precondition for a product explosion and mass entrepreneurship.

In the United States, hundreds of startups are now exploring ideas of downstream products and services following ChatGPT or the backend LLMs. While the upstream big models are still rapidly progressing, what they are doing downstream is already in active development. There are countless ordinary people sharing their stories online, showing how they can earn 5,000 dollars using ChatGPT in just two or three hours. This kind of sharing means that the entrepreneurial enthusiasm of grassroots people has been mobilized. It seems that everyone can use this opportunity to find an entrepreneurial perspective. Summarizing these grassroots ideas may also lead to new tracks that can be standardized and scaled to meet market demands.

A big model like ChatGPT is ultimately an operating system-level existence. Every AI-related information product and service, especially those related to language and knowledge, cannot do without it. When Intel dominated the market, the famous logo was "Intel Inside". In the future, it will be "Chat-Inside", or more accurately, "Chat-In&Out". Why in and out? When a big model like ChatGPT empowers products, it is both like a waiter and a chef. The waiter can take your order, interact with you, and understand your needs while also doing the cooking and delivering the service. It requires both language talent and knowledge skills. This is what we call the LLM expert workbench, which may be the biggest new ecological form in the next five years and may open countless doors for entrepreneurship. The basic service form is online information services in various industries, whether it is online education, online lawyers, online consultants, online finance, or online tourism. All are aimed at significantly improving service efficiency. With ChatGPT, you only need to hire one expert to replace the 10 experts that were previously needed to handle tasks. The end result is a productivity explosion.
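A rough sketch of what "Chat-In&Out" means in engineering terms (a hypothetical structure; call_llm stands in for any LLM interface): the same model first plays the waiter, chatting to pin down the request, and then the chef, drafting the deliverable.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any LLM service call."""
    return "(model output)"

def chat_in(user_message: str, history: list) -> str:
    """Front of house: the LLM converses with the user to clarify what is needed."""
    prompt = ("Continue this service dialog and ask for any missing details:\n"
              + "\n".join(history + [f"User: {user_message}", "Assistant:"]))
    return call_llm(prompt)

def chat_out(requirements: str) -> str:
    """Back of house: the same LLM drafts the deliverable from the gathered requirements."""
    return call_llm(f"Draft the requested document or answer based on these requirements:\n{requirements}")

history = []
reply = chat_in("I need a refund policy page for my small online store.", history)
draft = chat_out("Refund policy page; 30-day returns; store credit only for sale items.")
print(reply, draft, sep="\n")
```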

In conclusion, the wave of mass entrepreneurship is coming, and ChatGPT has brought unprecedented entrepreneurial conditions. It has become a testing ground for products with an infinitely low bar that everyone can play in. The emergence of this technology has eliminated communication barriers between humans and machines and between teams, leading to new tracks that can be standardized and scaled to meet unmet market needs. The future of ChatGPT as an operating-system-like existence may be the biggest new ecological form in the next five years, called the LLM expert workbench, which opens doors for entrepreneurship and will lead to a productivity explosion.

At this point, the application ecosystem seems very clear. The principle is that experts must be the final filter before delivering the results (human judge as final filter). This is the basic setup, but experts may also provide input prompts to inspire LLM to produce better results.

For almost every application scenario, there is a task to create an expert workbench, including supplementing existing products or services, such as every segment of online education, as well as online doctors, lawyers, financial consultants, etc., and exploring previously unthought-of business scenarios. This is a visible transformation or reshuffling of the ecosystem, providing efficient expert advice (expert-in-loop services).

Speaking of workbenches, e-commerce giants have built relatively large customer service workbenches, which were introduced when user needs and satisfaction could not be met with fully automated solutions or with fully manual solutions. Now with LLM, this form can be extended to all online service sectors. The productivity explosion that this can bring about is beyond imagination.

The design concept of "Human as Judge" has been validated for several years in low-code platforms (such as RPA platforms, parser-enabled information extraction platforms, etc.) for its effectiveness and efficiency. Here, we are talking about a completely new form, where humans only need to act as judges to complete the service. It is now entirely possible to create online information service workbenches tailored to various segments or scenarios, with experts sitting in the background. Specifically, the expert's role is only to make the decision based on their knowledge and experience, especially at the final "go or no-go" moment. Being a judge is much more efficient than being an athlete.
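A minimal sketch of such a go/no-go workbench loop (a Python toy under obvious assumptions: draft_with_llm is a hypothetical stand-in for the model call, and the expert works from a console prompt rather than a real UI):

```python
from typing import List, Optional

def draft_with_llm(request: str, n: int = 3) -> List[str]:
    """Stand-in for the LLM back end: returns several candidate answers for review."""
    return [f"[draft {i + 1}] response to: {request}" for i in range(n)]

def expert_review(candidates: List[str]) -> Optional[str]:
    """The expert only judges: 'go' releases a draft, 'edit' post-edits then releases,
    anything else skips to the next candidate."""
    for cand in candidates:
        print("\n--- candidate ---\n" + cand)
        verdict = input("go / edit / no: ").strip().lower()
        if verdict == "go":
            return cand
        if verdict == "edit":
            return input("post-edited answer: ")
    return None  # nothing releasable

def serve(request: str) -> str:
    answer = expert_review(draft_with_llm(request))
    return answer if answer is not None else "Escalated to a human specialist."

if __name__ == "__main__":
    print(serve("My landlord kept my deposit; what are my options?"))
```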



It is worth emphasizing that ChatGPT brings something new as enabling information technology, as it serves both at a backend and a frontend. It can perform well in high-level and low-level tasks, which is why chat is just the surface of ChatGPT, and its essence is a human-machine interface. Its ability to complete various NLP tasks is at its core. With both surface and essence, downstream products or services can be built around it. In the Intel era, computer product brand advertisements were remembered as "Intel inside," and in the future, the new ecology should be called "chat in&out," which refers to the new ecology empowered by LLM, not only empowering the human-machine interaction but also empowering the professional services, with only experts providing the final check. In this form, the experts are behind the scenes. To put it another way, LLM is both a waiter and a chef, but an expert needs to review the food and take responsibility before it is served to ensure service quality (such as online doctors, lawyers, consultants, etc.).

In such an ecosystem, the next five years will be a period of explosive growth for online services. Fortunately, the three-year pandemic has greatly promoted the grassroots awareness of online services, helping to cultivate user online habits and develop the market.

While LLM is powerful in terms of breadth of knowledge, it also has its limitations in terms of precision. The key challenge in building an expert-in-loop service is to overcome the precision bottleneck of LLM. The goal is to raise the precision to a level where it does not significantly impact the efficiency of the expert's work. If at least 1/4 of the results generated by LLM can match the level of a manual expert's research, then the efficiency of the expert-in-loop service can be ensured. This is a feasible expectation, and the current solutions are not far from meeting this threshold. With this in mind, we conclude that the door to entrepreneurship in the new ecology of LLM has indeed been opened.