We were born into an era of big data and information overload. I have always felt that, as NLPers, our calling is to help solve this overload problem. Just as Jack Ma's grand wish was a world with no business too hard to do, our vision as big-data people should be a world with no information that cannot be accessed. So Google appeared and, with crude keywords and an appetite for data at any scale, solved the long-tail problem of information. Then we started criticizing Google: the price of solving the long tail was poor data quality. Then the AI camp arrived, leveraging deep processing (whether deep learning or deep parsing), aiming to handle the long tail of big data while also raising data quality substantially, so that every mind in the world with an interest in information can have a steady stream of it. That is the view from our side as practitioners.
What is most frightening is the next generation; you can see their struggle and helplessness. Games, social media and the internet have swallowed countless youths, and the world is basically at a loss, letting them sink. Parents can only fret. If we ourselves cannot resist the temptation, how can we expect the younger generation to? Minds full of curiosity and restlessness are bound to suffer the enslavement of information overload the most. The social cost of this does not yet seem to have received the serious study it deserves.
subcat: the original sense of subcat is a subcategory of predicates, each corresponding to a specific sentence pattern of the word (e.g., the double-object pattern, the object-plus-complement pattern, etc.). The subcat Prof. Bai speaks of extends this to subcategories that do not necessarily correspond to a sentence pattern. For example, behind 碗 (bowl) are the subcats "container" and "tableware"; behind 汤 (soup) are "liquid" and "food". This is really the hierarchical structure of ontology semantics, e.g., an ISA taxonomy chain: 碗 (bowl) ISA tableware; tableware ISA tool; tool ISA commodity; commodity ISA artifact; artifact ISA object; object ISA entity (a logical noun, the TOP node of this chain).
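A minimal sketch of such an ISA taxonomy chain as a data structure; the entries and function names here are hypothetical illustrations, not from any actual lexicon discussed in the text.

# A toy ISA taxonomy: each node points to its parent (hypothetical entries).
ISA = {
    "bowl": "tableware",
    "tableware": "tool",
    "tool": "commodity",
    "commodity": "artifact",
    "artifact": "object",
    "object": "entity",   # "entity" is the TOP node
}

def isa_chain(word):
    """Return the chain from a word up to the TOP node."""
    chain = [word]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def has_subcat(word, subcat):
    """Check whether a word falls under a given subcat anywhere up the chain."""
    return subcat in isa_chain(word)

print(isa_chain("bowl"))            # ['bowl', 'tableware', ..., 'entity']
print(has_subcat("bowl", "tool"))   # True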
"Playing the hooligan" (耍流氓): this refers to cases where a binary dependency between two words cannot be typed, yet some relation can still be asserted. In Chinese syntax, before a sentence-initial noun phrase is determined to be subject, object, attributive or adverbial, it is often simply given a Topic label and attached to the predicate that follows; Prof. Bai calls this playing the hooligan. Likewise, when the relation between two content words can basically be confirmed but not typed, we often have the parser connect them with a Next label according to their surface order, as a patch that raises the parser's robustness and recall. This too counts as playing the hooligan for the time being, because in theory a semantic module still has to come back later and determine what the relation actually is before the deep analysis is complete. If two Chinese verbs occur one after the other and the system assigns Next, the default relation is [serial], i.e., the "serial-verb" construction of Chinese grammar books.
Topic: in Chinese parsing, when a sentence-initial noun phrase does not directly serve as subject, object, etc., many analyses simply give it a Topic label. One prominent sentence pattern of Chinese grammar is the so-called double-subject sentence (often analyzed as a Topic, or major subject, plus a minor subject: e.g., 他身体特别好 "He, health is especially good"; 这家公司业绩直线上升 "This company, performance is rising straight up"). Since the logical-semantic nature of this relation is unclear, and something is better than nothing, establishing such a binary relation is also called "playing the hooligan".
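A minimal sketch of the fallback labeling described above, assuming a hypothetical arc representation of my own: when no typed relation can be established, the parser falls back to Topic (sentence-initial NP attached to the predicate) or Next (adjacent content words linked in surface order).

# Hypothetical fallback labeling; the names and trivial logic are illustrative only,
# not the actual parser discussed in the text.
def label_arc(head, dependent, typed_relation=None, dependent_is_initial_np=False):
    """Return a dependency arc, falling back to Topic/Next when untyped."""
    if typed_relation:                      # a real relation (S, O, ...) was found
        return (dependent, typed_relation, head)
    if dependent_is_initial_np:             # sentence-initial NP, role unknown
        return (dependent, "Topic", head)   # "playing the hooligan"
    return (dependent, "Next", head)        # untyped link in surface order

# 他 身体 特别 好 -> 他 gets Topic on the predicate 好
print(label_arc("好", "他", dependent_is_initial_np=True))   # ('他', 'Topic', '好')
# two adjacent verbs with no typed relation -> Next (serial-verb default)
print(label_arc("吃", "去"))                                  # ('去', 'Next', '吃')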
TRUMP: Chief Justice Roberts, President Carter, President Clinton, President Bush, President Obama, fellow Americans and people of the world, thank you.
We, the citizens of America, are now joined in a great national effort to rebuild our country and restore its promise for all of our people.
Together, we will determine the course of America and the world for many, many years to come. We will face challenges, we will confront hardships, but we will get the job done.
Every four years, we gather on these steps to carry out the orderly and peaceful transfer of power, and we are grateful to President Obama and First Lady Michelle Obama for their gracious aid throughout this transition. They have been magnificent. Thank you.
Today's ceremony, however, has very special meaning because today, we are not merely transferring power from one administration to another or from one party to another, but we are transferring power from Washington, D.C. and giving it back to you, the people.
For too long, a small group in our nation's capital has reaped the rewards of government while the people have borne the cost. Washington flourished, but the people did not share in its wealth. Politicians prospered, but the jobs left and the factories closed. The establishment protected itself, but not the citizens of our country. Their victories have not been your victories. Their triumphs have not been your triumphs. And while they celebrated in our nation's capital, there was little to celebrate for struggling families all across our land.
That all changes starting right here and right now because this moment is your moment, it belongs to you.
It belongs to everyone gathered here today and everyone watching all across America. This is your day. This is your celebration. And this, the United States of America, is your country.
What truly matters is not which party controls our government, but whether our government is controlled by the people.
January 20th, 2017 will be remembered as the day the people became the rulers of this nation again.
The forgotten men and women of our country will be forgotten no longer.
Everyone is listening to you now. You came by the tens of millions to become part of a historic movement, the likes of which the world has never seen before.
At the center of this movement is a crucial conviction, that a nation exists to serve its citizens. Americans want great schools for their children, safe neighborhoods for their families, and good jobs for themselves. These are just and reasonable demands of righteous people and a righteous public.
But for too many of our citizens, a different reality exists: mothers and children trapped in poverty in our inner cities; rusted out factories scattered like tombstones across the landscape of our nation; an education system flush with cash, but which leaves our young and beautiful students deprived of all knowledge; and the crime and the gangs and the drugs that have stolen too many lives and robbed our country of so much unrealized potential.
This American carnage stops right here and stops right now.
We are one nation and their pain is our pain. Their dreams are our dreams. And their success will be our success. We share one heart, one home, and one glorious destiny. The oath of office I take today is an oath of allegiance to all Americans.
For many decades, we've enriched foreign industry at the expense of American industry; subsidized the armies of other countries, while allowing for the very sad depletion of our military. We've defended other nations' borders while refusing to defend our own.
And spent trillions and trillions of dollars overseas while America's infrastructure has fallen into disrepair and decay. We've made other countries rich, while the wealth, strength and confidence of our country has dissipated over the horizon.
One by one, the factories shuttered and left our shores, with not even a thought about the millions and millions of American workers that were left behind. The wealth of our middle class has been ripped from their homes and then redistributed all across the world.
But that is the past. And now, we are looking only to the future.
We assembled here today are issuing a new decree to be heard in every city, in every foreign capital, and in every hall of power. From this day forward, a new vision will govern our land. From this day forward, it's going to be only America first, America first.
Every decision on trade, on taxes, on immigration, on foreign affairs will be made to benefit American workers and American families. We must protect our borders from the ravages of other countries making our products, stealing our companies and destroying our jobs.
Protection will lead to great prosperity and strength. I will fight for you with every breath in my body, and I will never ever let you down.
America will start winning again, winning like never before.
Of course, using these services costs money, but the presenter said it is very, very cheap. Brother Guo said that once you actually use it, it is not that cheap after all. Cheap or not, set that aside; at least nowadays the barrier to entry for bots is low, and what is needed is not software talent but people with domain data. So I see a prospect: the linguists and library-science people who used to graduate straight into unemployment may become the main force of AI in the future; only people sensitive to data and detail will ultimately be the ones who build the flesh and blood of AI interfaces, since the architecture is ready-made and generic anyway. On reflection this makes sense. Here is the pricing of Watson API calls.
For NLP (one kind of AI), I have written n blog posts stressing that all the off-the-shelf platforms and toolkits (e.g., the long-established GATE), and even small plug-ins (e.g., the Brill Tagger or some Chinese word segmenter), are not really usable. They are fine for prototyping, but with even a slightly long-term view, to build a large-scale NLP application it is still better to build everything in-house. Of course, the barrier to building in-house is high; most cannot afford it, nor do they have the architect to direct it. But in the end the in-house build wins in terms of quality (quality meaning speed, robustness, precision and coverage, domain adaptability, and other key overall metrics).
OK, you have mastered machine translation. That is because MT has almost unlimited "natural" labeled data (not really natural either; it is human labor, but fortunately that labor is the accumulation of history, a by-product of human translation activity, a free ride the developer does not have to pay for). But what about the other AI and NLP applications? Can they be as lucky as MT and enjoy the same free lunch?
Thinking about it now, what is the hot application right after MT that has big data? It has to be bots.
For bots, some data has already accumulated, and their biggest characteristic is that as bots are used, data keeps pouring in. The problem is that this data is the right kind, real-life data from the field, but it is still unlabeled. So the future of bots is a war fought with data: you can hire people to annotate data endlessly. It is a scene like the AI factory out of Chaplin's Modern Times, or Comrade Lenin's human-wave storming of the Winter Palace. It looks clumsy, but what is certain is that bots will become more and more "intelligent" and handle more and more scenarios. As the old saying goes, there is only as much intelligence as there is human labor. However, this is not, and should not be, the only way to overcome the knowledge bottleneck.
The original Watson was built on top of UIMA (Unstructured Information Management Architecture), and it did use Prolog (The Prolog Interface to the Unstructured Information Management Architecture, https://arxiv.org/ftp/arxiv/papers/0809/0809.0680.pdf).
"IBM Watson's tooling uses deep neural networks."
"直到今天他们仍然是没有句法分析,更甭提深度分析。"
After Watson beat the Jeopardy! champions back then, the IBM Journal of Research and Development published a special issue, and its description of how Watson was built does not seem to match that claim. Parsing, for example, is described as follows: http://ieeexplore.ieee.org/document/6177729/
“Two deep parsing components, an English Slot Grammar (ESG) parser and a predicate-argument structure (PAS) builder, provide core linguistic analyses of both the questions and the text content used by IBM Watson™ to find and hypothesize answers. Specifically, these components are fundamental in question analysis, candidate generation, and analysis of passage evidence. As part of the Watson project, ESG was enhanced, and its performance on Jeopardy!™ questions and on established reference data was improved. PAS was built on top of ESG to support higher-level analytics. ”
The analysis of the second sentence in the figure above is the handling of structural ambiguity I have talked about before. Looking at the whole parse tree, there are three O (object) paths. Two of them are correct: [到-上海] ("go to Shanghai") and [买-上海的飞机票] ("buy the plane ticket to Shanghai"). The third O, [到-上海的飞机票] ("go to the plane ticket to Shanghai"), is wrong. One can say 到上海 ("go to Shanghai") and 买飞机票 ("buy a plane ticket"), but not 到飞机票 ("go to a plane ticket"). This kind of structural ambiguity is especially common in Chinese, because Chinese has no accusative marking, and on top of that the scope of the small word 的 is a major headache.
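A minimal sketch of how such attachment candidates might be filtered by subcat constraints; the lexicon entries and relation names here are hypothetical, purely for illustration, not the actual system's rules.

# Hypothetical subcat constraints: 到 "go to" takes a location object,
# 买 "buy" takes a commodity object; 上海 "Shanghai" is a location,
# 飞机票 "plane ticket" is a commodity.
OBJECT_SUBCAT = {"到": "location", "买": "commodity"}
NOUN_SUBCAT = {"上海": "location", "飞机票": "commodity", "上海的飞机票": "commodity"}

candidates = [("到", "上海"), ("买", "上海的飞机票"), ("到", "上海的飞机票")]

for verb, obj in candidates:
    ok = OBJECT_SUBCAT[verb] == NOUN_SUBCAT[obj]
    print(f"O({verb}, {obj}): {'keep' if ok else 'reject'}")
# O(到, 上海): keep
# O(买, 上海的飞机票): keep
# O(到, 上海的飞机票): reject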
QUOTE Tian Tian has lived here for as long as she can remember, so Buffalo is naturally the one irreplaceable hometown in her mind. I remember taking her back to Beijing for the first time four years ago to visit family. The first night we stayed at her grandmother's; everything was strange to her, there was none of the American cartoon TV she was used to, and with a face full of grievance she cried and fussed that she wanted to go home ("I want to go home!"), meaning, of course, the home in Buffalo. I told her this was home, Mom's home, but she simply could not accept it.
Our "semantic computing" group was discussing the syntactic structure of this sentence: The asbestos fiber, crocidolite, is unusually resilient once it enters the lungs, with even brief exposures to it causing symptoms that show up decades later, researchers said.
I said: it looks fine in its entirety. The "once-clause" has a main clause before it, so it is perfectly grammatical. The PP "with even brief exposures to it" is an adverbial of "causing ...": usually a PP modifies a preceding verb, but here it modifies the following ING-verb, which is ok.
I said: this is a rule system built by a linguist-programmer; it is not a statistical method. The sentence is not in my dev corpus. Parsing is a tractable task; with enough effort it can always be done, to a level approaching that of a human linguist and surpassing ordinary (non-linguist) people. I am speaking from my own practice and observation. Reliable parsing is something an experienced linguist-programmer can get done, with no need to rely on machine learning. To illustrate the point, I tested my Chinese parser:
Gu: Moreover, some high-powered groups seem inclined to omit function words. For example, in certain exchanges among Wall Street investment bankers and Silicon Valley people, too many function words actually get you looked down on as not concise, not sexy; this is probably human nature, not something unique to China. One example, from Liar's Poker: a trader was about to jump ship, the boss appealed to loyalty to keep him, and he replied, "You want loyalty, hire a cocker spaniel."
Gu: "Long time no see" is said to have arisen after Chinese invaded English; people simply found it natural, and the British and Americans came to use it too. This expression puzzled me for a long time; I searched online and that is supposedly the story, though perhaps not rigorously attested.
Me: "Long time no see" is the most direct showcase of the unadorned, "naked" beauty of our Eastern tongue. Westerners suddenly realized that language can be this concise, this unconcealed. They found it acceptable because it happens to match a common pragmatic scenario, one of the stock phrases friends exchange when they meet, in any culture. With pragmatics helping out, syntax can afford to be sloppier; that is also the underlying reason such new idioms (set phrases) take shape.
Speaking of substance: many projects, especially application-oriented ones, are not rocket science, and one cannot demand that a proposal promise some breakthrough. The proposal writer is the lobbying party and is obliged to highlight the bright spots of the proposal; when the vision is laid out, the rhetoric of this breakthrough or that revolution is unavoidable, partly catering to the funding agencies' appetite for grand achievements, but in reality very few research projects contain that many brilliant ideas or revolutionary turns of science. (Even in pure science breakthroughs are rare, let alone in applied research.) In application areas, "miracles" usually grow out of the accumulation of details (the Devil is in the details), not out of breakthroughs in principle. And on the details of the problem domain I am confident; that is my strength, and the reason my research plans tend to be convincing. Sometimes one has to make concessions to "fashion": for instance, when bootstrapping and other machine self-learning algorithms are in vogue in the field, immature as they are and unable to solve real problems, listing them helps get the proposal approved. There is no need to worry that the fashionable-sounding approach will not work in the end; given the exploratory nature of research, the final solution can perfectly well take a different route. To put it bluntly: hanging a sheep's head while selling dog meat is not an honest research attitude, but hanging up both the sheep's head and the dog's head and then selling dog meat is fine. Never hang yourself from a single tree.
Of course. Subcategories mean small rules, and small rules take precedence over big rules. Among language rules, rules for major categories (POS-based rules) are the coarsest; they are the default rules and involve no specific subcategory (subcat in the broad sense). Subcat-based rules come next, sub-subcat rules after that, and so on down, all the way to rules keyed on literals (word-driven), which are the most specific and highest priority, including idioms and fixed collocations.
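A minimal sketch of this priority ordering, assuming hypothetical rule records of my own devising: the matcher tries the most specific rule level first and falls back to the POS-based default.

# Hypothetical rules, ordered from most specific (word-driven) to coarsest (POS-based).
# Each rule is (level, condition, action); earlier in the list = higher priority.
RULES = [
    ("literal",    lambda w: w["word"] == "胸有成竹",                     "idiom reading"),
    ("sub-subcat", lambda w: w.get("subsubcat") == "ditransitive-give",  "double-object pattern"),
    ("subcat",     lambda w: w.get("subcat") == "ditransitive",          "double-object pattern"),
    ("POS",        lambda w: w["pos"] == "V",                            "default verb pattern"),
]

def match(word_entry):
    """Return the action of the most specific rule that fires (small rules first)."""
    for level, condition, action in RULES:
        if condition(word_entry):
            return level, action
    return None

print(match({"word": "给", "pos": "V", "subcat": "ditransitive"}))  # ('subcat', 'double-object pattern')
print(match({"word": "跑", "pos": "V"}))                            # ('POS', 'default verb pattern')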
王伟DL
The article glows with the luster of real deployment experience, and different readers will absorb and reflect different spectral lines from it. I read it greedily in one sitting; in many places I could only agree, yes, yes, exactly, and in some places I wanted to offer my own view but, wishing to argue, found I had already forgotten the words. "...refers to the forest automatically formed from the grammar trees over a large corpus, several orders of magnitude larger than PennTree." How I envy that big fellow! Big guys have big wisdom!
@算文解字: This dialogue between top masters, full of ideas, an article one could study like a martial-arts secret manual, and nobody has reposted it... Strongly recommended!
算文解字
Dependency relations are indeed more usable //@立委_米拉: (1) Layering is the right way. At the very least two layers: a basic-phrase layer and a syntactic-relation layer. (2) Incidentally, as the output of analysis, phrase-structure representation is far inferior to dependency representation. Phrase structure piles layer upon layer; it is unwieldy, and neither logical nor universal enough (it does not suit free-word-order languages). Of course, this latter point is a separate topic, no longer just the CFG vs FSG debate.
算文解字
Fair enough. What Prof. Jing criticizes is the "fundamentalist" CFG generative approach that handles phenomena of different levels with rules at the same level, and the remedy he proposes is layered FST processing. Using a coarse2fine (layered) strategy under CFG amounts to arriving at the same place by a different road. //@沈李斌AI: No need to reject CFG. The CFG tree is the result of generation, not the steps of generation. Design a good coarse-to-fine generation strategy and control the perplexity and recall of each step.
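For the contrast drawn above between phrase-structure and dependency representations, a minimal sketch of the same toy sentence in both forms; this is a hypothetical example of my own, not output from any parser mentioned here.

# "他 买 飞机票" (He buys a plane ticket), two hypothetical analyses.

# Phrase structure: nested constituents, extra layers of bracketing.
constituency = ("S", ("NP", "他"), ("VP", ("V", "买"), ("NP", "飞机票")))

# Dependency: a flat set of labeled head-dependent arcs, closer to logical form.
dependency = [("买", "S", "他"),        # 他 is the Subject of 买
              ("买", "O", "飞机票")]    # 飞机票 is the Object of 买

print(constituency)
print(dependency)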
Sydney Brenner, Senior Fellow, Salk Institute for Biological Studies (2002 Nobel laureate, for outstanding contributions in the field of the genetic code)
Marvin Minsky, Professor of Media Arts and Sciences, MIT
Noam Chomsky, Professor, Department of Linguistics and Philosophy, MIT
Emilio Bizzi, Professor, Institute for Brain Research, MIT
Barbara H. Partee, Professor, Department of Linguistics and Philosophy, University of Massachusetts
Patrick H. Winston, Professor of Artificial Intelligence and Computer Science, MIT
Chomsky's main points:
A. Chomsky holds that statistical language models have achieved engineering success, but that has nothing to do with science.
B. Modeling linguistic facts is like collecting butterfly specimens. What science (and linguistics in particular) wants is fundamental principles.
C. Statistical models are incomprehensible; they offer no insight into the object of study.
D. Statistical models may simulate some phenomena accurately, but that is a wrong turn. People do not predict the next word from the two preceding words. People generate sentences (word sequences) by going from inner semantics to tree structure and then to the surface linear sequence of words.
E. Statistical models have been shown to be incapable of learning language; therefore language must be innate. Using language models to explain language is a waste of time.
Norvig's main responses:
A. Engineering success is indeed not a scientific goal. But science and engineering advance side by side, and engineering success can serve as evidence for a scientifically successful model.
B. Science is a mixture of facts and theories. It is unwise to let theory ride roughshod over facts. In the history of science, the steady accumulation of facts is the normal course of research, not an aberration, and the science of language should be no exception.
C. A statistical model with billions of parameters is indeed hard to understand intuitively; no individual can inspect the meaning of every single parameter. But one can judge whether a statistical model is reasonable by understanding the properties of the model as a whole: how a statistical model works or why it fails, how it learns its function from the data, and so on.
D. Markov models based on word probabilities indeed cannot model all linguistic phenomena, just as simple tree-structure models without probabilities cannot model all linguistic phenomena either. The language models we need are richer probabilistic models that cover words, tree structure, semantics, context, discourse, and the other levels of linguistic phenomena. Chomsky cannot dismiss all statistical language models because of the shortcomings of the old ones. Among those who study how language is interpreted (e.g., speech recognition), the great majority agree that interpretation is a probabilistic problem: when a stream of speech reaches my ears, recovering the speaker's meaning from it is a matter of probability. Einstein said to make things as simple as possible, but no simpler. Many scientific phenomena involve randomness, and the simplest model of such phenomena is a probabilistic model. Language is such a phenomenon; probabilistic models are therefore the best tool for expressing the facts of language.
E. In 1967, Gold's theorem established theoretical limits on what can be logically inferred about formal mathematical languages. But this has nothing to do with the problem facing learners of natural language. In any case, by 1969 we already knew that probabilistic inference is not subject to that limitation (Horning proved that probabilistic context-free grammars, PCFGs, are learnable; a toy PCFG is sketched below). I agree with Chomsky that humans have an innate gift for learning language. But we still know too little about how probabilistic representations of language are acquired and about statistical learning. I think it is very likely that human language learning involves probabilistic and statistical inference, but we do not know the details.
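As a concrete illustration of the probabilistic grammars at issue (a toy example of my own, not from the Norvig piece), here is a minimal PCFG and how it scores a parse as the product of its rule probabilities.

# A toy PCFG: each left-hand side's rule probabilities sum to 1 (hypothetical numbers).
PCFG = {
    ("S",  ("NP", "VP")):   1.0,
    ("NP", ("she",)):       0.6,
    ("NP", ("fish",)):      0.4,
    ("VP", ("eats", "NP")): 0.7,
    ("VP", ("sleeps",)):    0.3,
}

def tree_prob(tree):
    """Probability of a parse tree = product of the probabilities of its rules."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = PCFG[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p

# P(tree) for "she eats fish" = 1.0 * 0.6 * 0.7 * 0.4 = 0.168
tree = ("S", ("NP", "she"), ("VP", "eats", ("NP", "fish")))
print(tree_prob(tree))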
1) I never, ever, ever, ever, ... fiddle around in any way with electrical equipment.
2) She never, ever, ever, ever, ... fiddles around in any way with electrical equipment.
3) * I never, ever, ever, ever, ... fiddles around in any way with electrical equipment.
4) * She never, ever, ever, ever, ... fiddle around in any way with electrical equipment.
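One way to see the point of these examples (my own illustration, not from the original exchange): the choice between "fiddle" and "fiddles" is governed by the subject, which can be arbitrarily far away, so the local n-gram context just before the verb is identical for the grammatical and ungrammatical versions.

# The last n tokens before the verb are the same however many "ever"s intervene,
# so a fixed-order n-gram model cannot decide between "fiddle" (I) and "fiddles" (She).
def ngram_context(sentence, n=2):
    tokens = sentence.split()
    verb_index = next(i for i, t in enumerate(tokens) if t in ("fiddle", "fiddles"))
    return tokens[max(0, verb_index - n):verb_index]

s1 = "I never ever ever ever fiddle around in any way with electrical equipment"
s2 = "She never ever ever ever fiddles around in any way with electrical equipment"
print(ngram_context(s1))  # ['ever', 'ever']
print(ngram_context(s2))  # ['ever', 'ever']  -- same context, different correct form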
· "It is neutral green, colorless green, like the glaucous water lying in a cellar." The Paris we remember, Elisabeth Finley Thomas (1942).
· "To specify those green ideas is hardly necessary, but you may observe Mr. [D. H.] Lawrence in the role of the satiated aesthete." The New Republic: Volume 29 p. 184, William White (1922).
· "Ideas sleep in books." Current Opinion: Volume 52, (1912).
· "Not gonna do it. Wouldn't be prudent." (Dana Carvey, impersonating George H. W. Bush)
· "Thinks he can outsmart us, does he?" (Evelyn Waugh, The Loved One)
· "Likes to fight, does he?" (S.M. Stirling, The Sunrise Lands)
· "Thinks he's all that." (Kate Brian, Lucky T)
· "Go for a walk?" (countless dog owners)
· "Gotcha!" "Found it!" "Looks good to me!" (common expressions)
Linguists can argue endlessly about how to explain the phenomena above. But the diversity of language seems far more complex than describing the pro-drop parameter with a Boolean value (true or false). A theoretical framework should not place simplicity above accuracy in reflecting reality.
Technology is not the problem (fools aside; if you manage to hire a fool who can only talk the talk, your due diligence was too poor and you have no one else to blame).
Nick: Ha, the same old routine: bashing others in order to praise yourself.
Sure, the melon seller praising his own melons. But it also happens to be objective fact; when recommending from within, one does not shy away from naming oneself, and one should not claim inability just because the able person happens to be oneself. In the end it is the system that has to speak for itself.
Of course, doing this well (precision approaching human analytic ability, robustness sufficient to cope with a monster like social media, efficiency reaching a linear-time implementation for real-time applications) certainly cannot be achieved overnight. There is an n-times-ten-thousand-hour rule here. Roughly: getting started in NLP takes 10,000 hours (about five years on the job); finding your feel takes 20,000; taking a few instructive falls takes 30,000; reaching true fluency takes 40,000; and if by 50,000 hours (25 years in the field) you still have not been weeded out, you can attain mastery. That is a feeling of almost divine assistance, of moving as if through undefended territory; few people get to experience it. I will stop here.
For the sake of the so-called recursion of language, the human brain, or the computer, supposedly must have a stack structure. That is far removed from the facts of language and violates the limits of human short-term memory. Where in the world do people talk by only opening doors and never closing them, only adding left brackets and never right ones, leaving everything hanging in suspense? Three levels of nesting is about the limit; beyond that the ordinary person cannot take it. Even if you are superhuman and can take it, your audience cannot; they cannot parse it. Speech is for communication; surely one does not speak deliberately to confound people, so that they will not understand? That makes no sense.
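A minimal sketch of the point about nesting depth (my own illustration): tracking how deep the "open doors" go in a center-embedded sentence, with brackets standing in for unclosed clause boundaries.

# Depth of unclosed embeddings, brackets standing in for clause boundaries.
def max_nesting_depth(s):
    depth, max_depth = 0, 0
    for ch in s:
        if ch == "[":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "]":
            depth -= 1
    return max_depth

# "The rat [the cat [the dog chased] bit] died" -- classic center embedding.
print(max_nesting_depth("The rat [ the cat [ the dog chased ] bit ] died"))  # 2
# One more level is already near the limit of what listeners can process.
print(max_nesting_depth("The rat [ the cat [ the dog [ the man owned ] chased ] bit ] died"))  # 3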
Quora has a question with discussions on "Why is machine learning used heavily for Google's ad ranking and less for their search ranking?" A lot of people I've talked to at Google have told me that the ad ranking system is largely machine learning based, while search ranking is rooted in functions that are written by humans using their intuition (with some components using machine learning).
Surprise? Contrary to what many people have believed, Google search consists of hand-crafted functions using heuristics. Why?
One very popular reply there is from Edmond Lau, ex-Google Search Quality Engineer, who said something we have experienced and pointed out over and over in my past blogs on Machine Learning vs. Rule System, i.e. it is very difficult to debug an ML system for specific observed quality bugs, while a rule system, if designed modularly, is easy to control for fine-tuning:
From what I gathered while I was there, Amit Singhal, who heads Google's core ranking team, has a philosophical bias against using machine learning in search ranking. My understanding for the two main reasons behind this philosophy is:
In a machine learning system, it's hard to explain and ascertain why a particular search result ranks more highly than another result for a given query. The explainability of a certain decision can be fairly elusive; most machine learning algorithms tend to be black boxes that at best expose weights and models that can only paint a coarse picture of why a certain decision was made.
Even in situations where someone succeeds in identifying the signals that factored into why one result was ranked more highly than other, it's difficult to directly tweak a machine learning-based system to boost the importance of certain signals over others in isolated contexts. The signals and features that feed into a machine learning system tend to only indirectly affect the output through layers of weights, and this lack of direct control means that even if a human can explain why one web page is better than another for a given query, it can be difficult to embed that human intuition into a system based on machine learning.
Rule-based scoring metrics, while still complex, provide a greater opportunity for engineers to directly tweak weights in specific situations. From Google's dominance in web search, it's fairly clear that the decision to optimize for explainability and control over search result rankings has been successful at allowing the team to iterate and improve rapidly on search ranking quality. The team launched 450 improvements in 2008 [1], and the number is likely only growing with time.
Ads ranking, on the other hand, tends to be much more of an optimization problem where the quality of two ads are much harder to compare and intuit than two web page results. Whereas web pages are fairly distinctive and can be compared and rated by human evaluators on their relevance and quality for a given query [2], the short three- or four-line ads that appear in web search all look fairly similar to humans. It might be easy for a human to identify an obviously terrible ad, but it's difficult to compare two reasonable ones:
Branding differences, subtle textual cues, and behavioral traits of the user, which are hard for humans to intuit but easy for machines to identify, become much more important. Moreover, different advertisers have different budgets and different bids, making ad ranking more of a revenue optimization problem than merely a quality optimization problem. Because humans are less able to understand the decision behind an ads ranking decision that may work well empirically, explainability and control -- both of which are important for search ranking -- become comparatively less useful in ads ranking, and machine learning becomes a much more viable option.
Edmond Lau's answer is great, but I wanted to add one more important piece of information.
When I was on the search team at Google (2008-2010), many of the groups in search were moving away from machine learning systems to the rules-based systems. That is to say that Google Search used to use more machine learning, and then went the other direction because the team realized they could make faster improvements to search quality with a rules based system. It's not just a bias, it's something that many sub-teams of search tried out and preferred.
I was the PM for Images, Video, and Local Universal - 3 teams that focus on including the best results when they are images, videos, or places. For each of those teams I could easily understand and remember how the rules worked. I would frequently look at random searches and their results and think "Did we include the right Images for this search? If not, how could we have done better?". And when we asked that question, we were usually able to think of signals that would have helped - try it yourself. The reasons why *you* think we should have shown a certain image are usually things that Google can actually figure out.
Part of the answer is legacy, but a bigger part of the answer is the difference in objectives, scope and customers of the two systems.
The customer for the ad-system is the advertiser (and by proxy, Google's sales dept). If the machine-learning system does a poor job, the advertisers are unhappy and Google makes less money. Relatively speaking, this is tolerable to Google. The system has an objective function ($) and machine learning systems can be used when they can work with an objective function to optimize. The total search-space (# of ads) is also much much smaller.
The search ranking system has a very subjective goal - user happiness. CTR, query volume etc. are very inexact metrics for this goal, especially on the fringes (i.e. query terms that are low-volume/volatile). While much of the decisioning can be automated, there are still lots of decisions that need human intuition.
To tell whether site A better than site B for topic X with limited behavioural data is still a very hard problem. It degenerates into lots of little messy rules and exceptions that tries to impose a fragile structure onto human knowledge, that necessarily needs tweaking.
An interesting question is - is the Google search index (and associated semantic structures) catching up (in size and robustness) to the subset of the corpus of human knowledge that people are interested in and searching for?
My guess is that right now, the gap is probably growing - i.e. interesting/search-worthy human knowledge is growing faster than Google's index. Amit Singhal's job is probably getting harder every year. By extension, there are opportunities for new search providers to step into the increasing gap with unique offerings.
p.s: I used to manage an engineering team for a large search provider (many years ago).