Zhaohua Wushi, Ch.15: Following My Mentors into the Field

Just after the Mid-Autumn Festival, our eldest sister relayed the sad news: Master Liu Zhuo, my foundational mentor and a guiding light of his generation, had passed away at the age of 89.

🎬 In Memoriam: Mentor Liu Zhuo, 2022 (Video)

Forty years ago, I wrote “Machine Translation” (MT) on my graduate school application to the Chinese Academy of Social Sciences, my heart full of awe and a sense of mystery. When I first entered the field, I worked on foreign-to-Chinese machine translation. I never dared to touch Chinese-to-foreign translation: Chinese grammar was so resistant to formalization that it felt impossibly difficult.

Machine translation is the oldest application area in natural language processing (NLP), dating back to the early 1950s. It has carried the youth and dreams of a couple of generations, Chinese and foreign alike, including my own younger years. Today, the dream has become reality. Embedded machine translation is everywhere: a smartphone app that ordinary people summon at will and dismiss just as easily, quietly serving hundreds of millions of users every moment of every day. My daughter uses it to study Chinese, to study Spanish, and to browse Japanese anime websites. She uses it so much that she takes it completely for granted, treating machine translation as an entitlement. Only when a translation goes hilariously wrong does she notice its existence, occasionally mocking it: “So dumb.” But machine translation never complains and never tires. For my daughter’s generation, machine translation has become a natural part of life. Although I am full of its history and lore, I don’t really know how to tell her these stories. From snippets of my scattered remarks, she seems to dimly sense that machine translation holds a special meaning in her father’s life, but I still can’t recount those tales with the fluency and fondness I would bring to a peer, or convey the weight and sanctity that machine translation carries in my heart. This is more than an ordinary generation gap; it is the gulf created by the leapfrog advance of technology, which has given two generations fundamentally different perspectives. There’s comfort in that, and also a pang of wistfulness.

My personal observation is that two kinds of people especially appreciate machine translation and show patience toward it. One is the older generation of web users who don’t know foreign languages: suddenly the whole world’s internet has opened up to them, and they feel the joy of the blind regaining sight. The other is fossil-level veterans like me, who have built machine translation systems themselves, know firsthand how hard it is, and therefore feel compelled to cheer every breakthrough. In recent years, the neural machine translation revolution has elevated MT to genuinely expert-level quality: translations are now fluent and natural, free of the stiffness and “machine flavor” of the pre-neural era. One could say that humanity’s thousand-year dream of breaking down language barriers for free communication is being realized before our very eyes. (With the breakthrough in large language models (LLMs), the linguistic Tower of Babel for AI has been built.)

Looking back, I consider it a gift from heaven when a student of natural language processing begins their career with symbolic machine translation. This kind of practitioner no longer exists in the new generation, who have all kinds of low-code platforms and resources at their disposal. If you were forced to build rule-based machine translation without any platform support, you suffered, but you were also blessed. You had to build everything from scratch: dictionaries, sentence and word segmentation (tokenization), part-of-speech tagging, phrasal chunking, SVO syntax, and logical semantics. You also had to handle bilingual structural transfer, word sense disambiguation (WSD), and finally target-language generation, including morphological generation, re-ordering, and rhetorical smoothing. In short, from linguistic analysis to bilingual transfer to language generation, you had to cover every aspect. Without a platform or a domain-specific language, using only general-purpose programming languages as we did for our master’s theses (COBOL, ALGOL, BASIC, even assembly), it was like being tempered in Laozi’s Eight Trigrams Furnace: you couldn’t help but emerge with piercing, flame-tested eyes. Today’s computational linguistics graduate students, by contrast, casually download a software package and zero in on a single subtask, such as word segmentation, sentiment classification, or word sense disambiguation, and even when they do full MT, they never need to touch that many layers and low-level details. (Note: this was written in pre-LLM days. As we know, since LLMs burst onto the scene, they have swallowed not only MT but the whole of NLP.) By a fortunate twist of fate, I received the baptism of early machine translation, studying under the Two Lius, the founding fathers of Chinese machine translation. This has been a lifelong treasure, and it has also steeled my resolve to “carry forward the lost learning of past sages,” passing the torch of symbolic NLP to the next generation.

A photograph with my mentor Liu Zhuo and Sister Aiping, taken before I went abroad in 1991.

I cannot forget those years after earning my master’s degree, when I stayed at the Institute of Linguistics and worked alongside Sister Aiping under Prof. Liu Zhuo’s leadership, developing every step of the JFY machine translation system. Prof. Liu personally designed a domain-specific NLP language to implement JFY, along with the entire system architecture and pipeline: from the interpreter to the controller, from expert dictionaries (idiosyncratic rules) to sentence-pattern transfer (general rules). This project condensed decades of design principles and algorithms from MT research and exploration. Prof. Liu had a remarkable skill: without any debugging tools, he could locate system problems purely through intense mental concentration. Often, when the system had a bug, Prof. Liu couldn’t sleep at night; the program would turn around and around in his head. Many times he would catch the bug in the middle of the night and rush to the computer the next morning to test his fix, and the problem would usually be solved. Sometimes the bug was too deeply hidden to untangle mentally, so Sister Aiping and I would help him with “manual stepping” (literally tracing the program step by step by hand), which could take several days of meticulous work before we discovered the logical flaw. The joy we felt then was like winning a battle. Software engineers today would probably find it hard to imagine coding a system in a development environment with no debugging tools at all, but that is precisely how Prof. Liu led us, grinding it out day by day.

I found Prof. Liu Zhuo’s papers quite a headache to read back then. But they were packed with substance, like the dried bean curd of Ma’anshan: tough but rewarding to chew on. I’ve lost count of how many times I read them. The old gentleman commanded deep respect in the field, and with good reason: he had the hard currency of real achievements. Yet more than half of the people who admired him couldn’t understand a word he was saying. It was a curious phenomenon, partly because Prof. Liu was not particularly good at popular science communication, and partly because he genuinely had little time for, and no interest in, trivialities.

In 2013, I was invited to give a talk on big data NLP at the Chinese Academy of Sciences. Prof. Liu Zhuo was hospitalized at the time for surgery, and Sister Aiping took me to visit him. I was relieved to see him in good spirits and recovering well after the operation. We chatted about the past and present of NLP. The fine-grained, robust pattern-matching analysis methods that Prof. Liu had pioneered years ago remain effective tools in many scenarios where annotated data is scarce, and they still have a role to play in the big data era. Times have changed: today’s computing hardware and software have been upgraded from shotguns to cannons, making it possible to run NLP across hundreds of millions of documents. At scale, everything changes. Miracles emerge from quantitative change, and we are both creating and witnessing such miracles. And none of this would have been possible without the nurturing and teaching of my mentor all those years ago.

I told Prof. Liu that in the age of big data, we can leverage cloud computing: renting hundreds of virtual machines to process vast amounts of data in parallel, performing deep syntactic parsing on hundreds of millions of documents, and extracting and mining public opinion. Such a scale was unimaginable in the old days. Precisely because of the natural information redundancy in big data and the processing capacity that comes with scale, the quality of the intelligence we extract, in terms of both precision and coverage, has dramatically improved from the user’s perspective. Problems that once seemed intractable, such as capturing the motivations behind public opinion and answering why and how questions, have now seen practical breakthroughs.

Like Prometheus bringing fire from the gods, my other mentor, Prof. Liu Yongquan, returned from the Soviet Academy of Sciences with the sacred knowledge of machine translation. He was a generous and kind elder, and his wife was warm and gracious. On holidays, they would invite students to their home for dinner. Prof. Liu would call and say, “Come on over. Do you know how to slaughter a chicken? I’ve got live ones at home.” But his lecturing style was another matter: his delivery was extraordinarily slow and drawn out, with an old Beijing accent. Often he would utter the first half of a sentence, then pause for a long while before continuing. In the afternoon, when drowsiness was setting in, I’d pinch my thigh countless times just to stay awake through one class. There were only two of us students, sitting around the small table in his home, with sunlight slanting in through the window. Nodding off would have been utterly unacceptable.

In machine translation class, Prof. Liu Yongquan trained us to do manual annotation of structural transfer, labeling sentences with a tag system they had invented themselves, called “intermediary components.” It was a bit like drawing syntax trees in grammar class: take some complex English sentences and annotate them in such a way that a machine could simulate the process. Once the annotation was done, the translation was essentially there. For example, the concise and practical intermediary component tag [Pre-Prep-Attr-B], a four-element label, indicated that this prepositional phrase should be fronted in translation, that it was an attributive modifier, and that it was at the B layer. This simple, even crude labeling was the intermediate representation for foreign-to-Chinese MT transfer back then. We tested it on many complex sentences, and remarkably, it handled most of them; the resulting English-to-Chinese translations were quite readable. This was the most innovative invention of my two advisors. The intermediary component analysis method was a source-to-target correlation analysis designed specifically for machine translation, and it essentially solved the internal representation problem for foreign-to-Chinese translation. This system, distilled from extensive hands-on practice, was something of a magical invention: it looked like nothing more than a limited set of four-element label combinations, yet it could handle the vast majority of sentence translation needs. The four elements boiled down to traditional constituent analysis (subject, predicate, object, attributive, adverbial, complement), plus identification of the important prepositional phrase (PP) chunk and two kinds of word-order adjustment strategy labels (A and B).

Transfer is the link between analysis and generation, and the four-element tags captured exactly the information that transfer required. As a result, a relatively simple interpret-and-execute algorithm could handle the generation of many complex sentences. In terms of output readability, it was quite effective, at least for foreign-to-Chinese translation. Of course, this system represented only the most critical transfer information, and as an MT intermediate representation it was still overly concise. That it worked well in many foreign-to-Chinese scenarios actually had much to do with the inherent flexibility of Chinese grammar; applied to generation in other languages, it might fall short. For instance, word-order adjustment information was reduced to just two alternating layers (A and B). This greatly simplified the complexity of linguistic structure, but some translations came out less than smooth. With this in mind, in my master’s thesis I made some critical revisions to the tag set, adding more dimensions of information. I had worried that my advisors might take offense at my freely modifying the gold standard they had painstakingly developed, but after I submitted the draft of my thesis, both advisors expressed approval. Prof. Liu Zhuo even specifically said that he agreed with my criticism that the intermediary component system was overly simplistic.
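For readers who think in code, here is a toy sketch of what interpreting such a four-element tag might look like. It is not the original system: the field names, the `Chunk` structure, the `head` index, and the rule that an attributive takes the marker 的 are all assumptions made for illustration, and the A/B layer is carried along but deliberately left uninterpreted, since its exact reordering semantics are not spelled out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """Four-element label such as [Pre-Prep-Attr-B] (field names are assumptions)."""
    position: str  # "Pre": front the chunk before its head
    chunk: str     # "Prep": prepositional phrase
    role: str      # "Attr": attributive modifier
    layer: str     # "A" or "B"; carried along but not interpreted in this sketch


@dataclass
class Chunk:
    text: str                    # target-language rendering of the chunk
    tag: Optional[Tag] = None    # None for chunks that need no transfer action
    head: Optional[int] = None   # index of the chunk this one modifies


def parse_tag(label: str) -> Tag:
    """Split '[Pre-Prep-Attr-B]' into its four elements."""
    position, chunk, role, layer = label.strip("[]").split("-")
    return Tag(position, chunk, role, layer)


def transfer(chunks: List[Chunk]) -> List[str]:
    """Reorder chunks according to their tags and emit target-language pieces."""
    order = list(range(len(chunks)))
    for i, c in enumerate(chunks):
        if c.tag and c.tag.position == "Pre" and c.head is not None:
            order.remove(i)
            order.insert(order.index(c.head), i)  # place just before the head
    out: List[str] = []
    for i in order:
        c = chunks[i]
        out.append(c.text)
        if c.tag and c.tag.role == "Attr":
            out.append("的")  # Chinese attributive marker
    return out


# "the book on the table": the PP modifies chunk 0 and must be fronted
chunks = [
    Chunk("书"),
    Chunk("桌子上", parse_tag("[Pre-Prep-Attr-B]"), head=0),
]
result = "".join(transfer(chunks))  # "桌子上的书"
```

The point of the sketch is only that once the annotation carries the transfer decision, the generation step can stay very small.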

Among Prof. Liu Yongquan’s many teachings, one left the deepest impression on me. In the annotation training, most of the time I could handle it with ease. Those of us who came from an English-major background had always excelled at drawing structural trees, and this kind of annotation was child’s play. But occasionally I’d get stuck and turn to Prof. Liu for help. He never gave me the answer directly. He would only say: “If you can’t deconstruct it, how does a person understand and translate it?” Strictly speaking, there are certainly cases where human cognitive processes are hard to turn into algorithms within a transparent symbolic logic framework. That a person can translate doesn’t mean they can articulate clearly how a machine should do it. But I still feel that this one piece of guidance has benefited me for life. It was a philosophical awakening about cognition, urging us to think about how to formalize human cognitive processes. Even when full formalization proves impossible, it helps us see where the bottleneck lies: common sense, domain knowledge, or something else entirely? This worldview has stayed with me all my life. It is, in essence, a philosophy of symbolic transparency that opposes the mystification of intelligence.

Under Prof. Liu Yongquan I studied design philosophy, machine translation principles, linguistic fieldwork, and the fine-grained analysis and command of language phenomena; my entire master’s program felt like being bathed in a spring breeze. Prof. Liu Zhuo’s guidance was more concrete. In every innovative exploration of early machine translation, he was always at the forefront: separating data from program logic, advancing from fixed-table rule processing to a domain-specific rule language that enabled free-form rule writing, introducing ontology knowledge bases that encoded implicit common sense, and developing techniques for the separation and interaction of idiosyncratic and general rules. Across this entire series of key technological innovations along the symbolic rule-based path, Prof. Liu Zhuo was both pioneer and implementer. After earning my master’s degree, I stayed for five years in Prof. Liu Zhuo’s research lab, learning algorithms from him and coding a new-generation translation system based on expert dictionaries. I also followed Prof. Liu in collaborating with Zhongguancun to productize our carefully designed laboratory system. This process of closely shadowing a master while transforming a research prototype all the way into a practical product was the golden experience that benefited me most in my entire life.
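As a rough illustration of two of these ideas, keeping rule data apart from program logic and letting word-specific (idiosyncratic) entries interact with general ones, here is a minimal sketch. Everything in it is hypothetical: the names `EXPERT_DICTIONARY`, `GENERAL_LEXICON`, and `translate`, the sample entries, and the assumption that idiosyncratic entries are tried first. The general fallback is a plain word lookup standing in for real sentence-pattern rules.

```python
from typing import Dict, List, Tuple

# Rule data, kept apart from the engine so it can be edited without
# touching program logic. All entries are illustrative.

# Idiosyncratic rules: keyed by trigger word, each with the context words
# that must follow and the translation for the whole match.
EXPERT_DICTIONARY: Dict[str, List[Tuple[Tuple[str, ...], str]]] = {
    "kick": [(("the", "bucket"), "去世")],
}

# General fallback: a plain word lookup standing in for general rules.
GENERAL_LEXICON: Dict[str, str] = {
    "kick": "踢",
    "the": "",      # no Chinese counterpart; dropped
    "bucket": "桶",
    "ball": "球",
}


def translate(tokens: List[str]) -> List[str]:
    """Try word-specific entries first; fall back to the general lookup."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        for context, target in EXPERT_DICTIONARY.get(word, []):
            end = i + 1 + len(context)
            if tuple(tokens[i + 1:end]) == context:
                out.append(target)
                i = end          # consume the trigger word and its context
                break
        else:
            # No idiosyncratic entry matched: apply the general lookup.
            target = GENERAL_LEXICON.get(word, word)
            if target:
                out.append(target)
            i += 1
    return out


idiom = translate(["kick", "the", "bucket"])   # ["去世"]
literal = translate(["kick", "the", "ball"])   # ["踢", "球"]
```

Because the engine never mentions any particular word, new idiosyncratic entries can be added to the dictionary without changing the program, which is the separation the paragraph above describes.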

Eight years of research and development under the guidance of the Two Lius became a lifelong treasure. When heaven does not change, the Way does not change; when heaven changes, the Way still does not change. I went abroad, gained a veneer of overseas polish, broadened my horizons, and came to better understand the strengths and weaknesses of different approaches. But certain core ideas in language processing transcend time and space. I am proud to be a torchbearer of the Two Lius’ legacy.

🎬 Memorial Video | Memorial Video 2

Rest in peace, great master. May the old gentleman’s journey onward be a smooth one!

Finalized on September 12, 2022


From 《朝华午拾》. Original Chinese: 《朝华之十四:随恩师入行》.

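Posted by

Liwei (立委)

Dr. Liwei is a consultant on multimodal large-model applications. He is a former VP of Engineering for the large-model team at 出门问问 (Mobvoi), where he focused on large models and their AIGC applications. Before that he was Chief Scientist at Netbase for 10 years, directing the development of understanding and application systems for 18 languages that were robust, linear-speed, and scaled up to social media big data, grounding semantics in public-opinion mining products and becoming a front-runner in industrial NLP deployment in the United States. Earlier he was VP of R&D at Cymfony for eight years, where he won first place in the first question-answering evaluation (TREC-8 QA Track) and won 17 Small Business Innovation Research information-extraction projects (PI for 17 SBIRs).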
