I showed the First Lady's news pictures to my daughter. Tanya was intrigued: "Dad, Mom told me that you used to teach the First Lady many years ago, is that true?" "It is true, but only for a short time, one or two semesters, and it was not her major subject. As a part-time lecturer, I was teaching Advanced English to graduate students at the music conservatory, and she happened to be in my class. She was already famous then as a new star of folk songs." Tanya got excited: "Well, you never know, maybe her English training in graduate school helps her on state visits today. My Dad is cool." She continued, "Dad, Mom also told me that you were an interpreter for the foreign minister when she dated you, is that true?" "Well, that was largely an accident. It only happened once, when I substituted for a professor as interpreter for Mr. Huang Hua, the former foreign minister and former vice-chairman of the Chinese congress. Your Mom agreed to date me partly because she saw a picture of me interpreting for Mr. Huang. So I guess I benefited from that 'accident'." Tanya was amused and felt very proud: "I have the coolest Dad in the world. He was so successful even when he was young, teaching the future First Lady and interpreting for the then foreign minister. Wow!"
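(2) Still, over the past year there were more than ten brief bright spots when his reputation rose above zero (praise outweighing criticism), though none of them lasted. Judging from the chart, the stretch from early July to early September last year was the longest run of positive sentiment (dipping below zero only briefly in August). I do not know whether this reflects some strong political performance or an effective PR team; interested readers can check. In mid-May, when President Ma was sworn in, the net sentiment index was still hovering around 30 below zero. How did it warm up so quickly to above zero by July and stay there for about two months, up to the peak of +35 on September 2? I am not familiar with Taiwan politics and have no energy to dig into the data and the chain of evidence (although our tool offers a number of drill-down functions), but this stretch does seem to be the period when President Ma enjoyed the most public approval since his re-election. After that he never recovered, with only brief rebounds in October, November, and this January. The lowest point of the year was -44 on March 4; December 16 was also dismal, dropping at one point to -42, bone-chilling cold. All in all, things have not gone smoothly for Ma Ying-jeou since his election early last year, and public disappointment and complaints pervade the online forums.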
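A chance system test revealed that Baidu (百度) and the query "哪里有小姐" (roughly, "where to find call girls") go hand in hand. The finding immediately caused an uproar among my friends: some applauded (way to go, u r onto sth), some joked (saying that the name Baidu comes from the line "众里寻她千百度", "searching for her in the crowd a thousand times", after all), and some were skeptical (the results are not faked?). A conspiracy theorist emailed me, charging that this cloud looks like an insult to Baidu.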
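Now for my impressions. I must admit that until now I was still stuck at the level of the lessons learned from rule systems; beyond that, I worried about the heavy manual workload involved. If Teacher Li, by taking such a bird's-eye view of language and reducing complexity to simplicity, can adjust rules effortlessly, then in today's world where machine learning is everywhere, this rare flower of the rule-based school will have its spring. This is a pluralistic world that lets heroes of every school compete, as long as they have something unique to offer. All the more so since nobody has yet reached the pearl in the crown of artificial intelligence, and any verdict is premature. I have also heard that many reliable rule systems are quietly running in industry, while academia piles up systems just to gain a tiny percentage point, feeding them handpicked "cake" data by the handful, with no regard for whether the result can actually land in industry. Of course, as for how the contest of hand-crafted rule systems VS machine learning systems will end, I really have no conclusion: either the side with the good stuff will naturally keep doing well, or both sides will do well and it will be hard to call a winner, or they will find that only by going hand in hand can they go further. And who could stop them then!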
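But once this Chinese term reached the English-speaking world, not every audience could remember the exact wording. As a result, the "standard" popular translation "one belt one road" got misremembered by some foreigners as "one road one belt" or "the road and belt", etc. That is understandable: foreigners have no political study sessions and no exams on current affairs, so remembering the gist is good enough.
Although the wording is different and the order has changed, the two keywords road and belt are both there. This kind of idiom "generalization" poses no challenge to human translation, because the foreigners' memory slips and the ways they "generalize" match the interpreter's own cognition, so human interpreting will never have a problem with such cases. But big-data-driven machine translation was dumbfounded this time and truly went "neural" (the Chinese word 神经 also means "nuts"): these generalized variants are mostly sparse data from spoken language and cannot be back-translated into the Chinese "一带一路", so out came the joke.
A widely circulated joke from the early days of machine translation also picks on idioms (The spirit is willing, but the flesh is weak, 心有余而力不足, was said to have been translated as "The whiskey is fine, but the meat has gone bad"), because most people assume that idioms are the hardest to understand and must therefore be a challenge for machines. This is a complete layman's view. The essence of idioms is memory, and when it comes to memory, the computer is the master while the human brain is tofu.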
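It is truly a case of man proposes, heaven disposes, as we watch the tides rise and fall. When old friends talk about keeping healthy, with all the taboos, I have never been impressed: when you are old, you are old, so why live so long? Recently I found a reason for longevity: to see how this world changes at an accelerating pace. Many things we see happening today were unimaginable 30 years ago: NMT, voice, image, parsing, iPhone, GPS, Tesla, you name it.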
White House spokesman Sanders said 14th that TV commentator Codro Larry Kudlow will serve as president of the National Economic Council.
Sanders said in a statement that Trump to Codro as president of the economic policy assistant, as well as the President of the National Economic Council, Codro also accepted, the White House will announce later, Codro time.
Cohn, the president of the National Economic Council, resigned on 6th for not agreeing to Trump’s move, announced earlier this month that it would impose tariffs on imported steel and aluminium.
After Cohen resigned, Codro was one of the successors, and in the past year Codro was Trump’s unofficial advisor and advisor to Trump during the 2016 presidential election campaign. Trump 13th to the media that he had a high chance of appointing Codro chief economic advisor Cohen to resign casual vacancies. Trump said, “Codro has been my best friend for a while, and he’s been my first supporter of my early campaign.” He is a very talented person.
” The 70-year-old Codro oversaw the White House administration and Budget Office for 5 years in former President Ronald Reagan. Codro joined the financial media CNBC in 2001 and became a well-known critic and moderator.
Larry Kudlow, a television commentator, will become President of the national economic council, White House spokesman jay sanders said Sunday.
In a statement, Mr. Sanders said he had accepted Mr. Cordelo as an assistant to the President’s economic policy and the President of the national economic council. The White House would then announce that Mr. Cordelo took office.
The chairman of the national economic council, Gary Cohn, said he would resign on June 6 after he announced that he would impose tariffs on imported steel and aluminum.
When cohn resigned, he was one of his successors. Over the past year, he has been an informal adviser to trump and an adviser to trump during the 2016 presidential campaign.
Trump told the media on Wednesday that he had a high chance of appointing the resignation of coderot’s chief economic adviser, korn/ferry. “He was a good friend for a while,” trump said. “he supported me early in my campaign and was one of my first supporters. He is a very talented person.
The 70-year-old has overseen the White House administration and budget office for five years under former President Reagan. Cordelojoined CNBC in 2001 and became a well-known critic and host.
White House spokesman Saunders said on the 14th that television commentator Larry Kudlow will serve as chairman of the National Economic Council.
Saunders pointed out through the statement that Trump had proposed to Cadero as the president’s economic policy assistant and the chairman of the national economic conference. Caldero also accepted it; the White House will later announce the time when Calderon took office.
Trump announced at the beginning of the month that it would impose tariffs on imported steel and aluminum products. Gary Cohn, chairman of the National Economic Council, resigned on the 6th because he did not agree with Trump.
After Keen resigned, Kedlow was one of the candidates for the replacement. In the past year, Kedro was an informal advisor to Trump and a consultant during Trump’s 2016 presidential campaign.
Trump stated to the media on the 13th that he has a high chance of appointing Kodlow to replace his chief economic advisor. Trump said, “Kadelo has been a friend of mine for some time. He supported me early in my campaign and was one of my initial supporters. He is a very talented person.”
Kodlow, 70, oversees the White House’s Office of Management and Budget for 5 years under former President Reagan. He joined the financial media CNBC in 2001 and later became a famous critic and host.
With the development of productive forces, the team of intellectuals has increased faster than the proletariat, the role of the productive forces leaped to the top, and the Marxist proletarian dictatorship theory in the era of electricity will be outdated. Second, the Bolshevik dictatorship of the proletariat will rapidly evolve into a one-party dictatorship, then become a leader dictatorship.
A society based on deception and violence, in itself, contains self-destructive explosives that, once the truth is revealed, Fall apart immediately.
Third, the “cloth” party will encounter four major crises: Famine crisis, ideological crisis, social economic crisis and collapse crisis, the final regime fell apart, this process may last for decades, but this end can notchange. The greatness of the state is not in its territory or even its history, but in its democratic traditions and the living standards of its citizens. As long as citizens are still poor, as long as there is no democracy, the state will not be in turmoil until it collapses.
Google:
The words of Plekhanov:
First, with the development of productive forces, the ranks of intellectuals have grown faster than the proletariat, and they have taken the lead in productivity. In the age of electricity, the Marxist theory of the dictatorship of the proletariat will become obsolete.
2. The dictatorship of the proletariat of the Bolsheviks will rapidly evolve into a one-party dictatorship and will become a dictatorship of leaders. The society based on deception and violence itself contains self-destructive explosives. Once the truth is revealed, it will soon fall apart.
Third, the “cloth” party will encounter four major crises in sequence: the famine crisis, the ideological crisis, the socio-economic crisis, and the collapse crisis. The final regime collapsed. This process may last for decades, but no one can change this outcome.
Fourth, the greatness of the country does not lie in its territory or even its history. It is the democratic tradition and the standard of living of its citizens. As long as citizens are still living in poverty, as long as there is no democracy, the country cannot guarantee that there will be no turmoil until collapse.
Youdao:
His last words:
One, with the development of productive forces, intellectual team increase faster than that of the proletariat, in the role as the first productivity, at the age of electricity of marxism’s theory of the dictatorship of the proletariat will be out of date.
The dictatorship of the proletariat of the bolsheviks will quickly turn into a one-party state and become a leadership dictatorship. Society, based on deception and violence, contains self-destructive explosives, and when the truth comes out, it will fall apart.
Three, “cloth” party will, in turn, have four big crisis: hunger crisis, the collapse of the ideology, social and economic crisis and crisis, the regime collapse, this process may last for decades, but the end no one can change.
The greatness of a nation lies not in its territory or even its history, but in its democratic traditions and the living standards of its citizens. As long as the citizens are still poor, as long as there is no democracy, there will be no unrest until the country collapses.
“Draw a moon for the lonely night sky.
Draw me under the moon and sing.
Draw a large window for the cold house.
Draw another bed.
Draw a girl with me.
Draw another lace bed.
Draw a stove and firewood.
We were born to live together.
Draw a flock of birds around me.
Let me draw green ridge and green slope.
Picture peace and serenity.
The rain fell on the rice fields.
There’s a rainbow you can touch with your hands.
There are stars in the picture that I have decided not to destroy.
There are endless smooth paths.
The end of the family dream has entered.
Picture mother’s peaceful pose.
There’s also an eraser argument.
Paint food that is not sad in four seasons.
A leisurely person never worries.
I didn’t wipe out the quarrel eraser.
There was only one painting of a lonely pen.
The night sky was no longer bright.
Only a sad child was singing.
Draw a moon for the lonely night sky.”
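Naturally there are mistranslations (such as there’s also an eraser argument. I didn’t wipe out the quarrel eraser), but overall, even I, a professional by training, would not dare claim that I could do better unless I agonized over it for weeks or months. That machine translation has surpassed amateur translators is already an indisputable fact.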
Russian President Vladimir Putin — the country's longest-serving leader since former Soviet dictator Joseph Stalin — was headed to an overwhelming victory in Sunday's election for another six-year term, according to incomplete returns.
Putin's re-election was widely expected, and elections officials had pushed hard for a strong turnout to claim an indisputable mandate. Putin has been president or prime minister since December 1999, making him the only leader that a generation of Russians have ever known.
With ballots counted from 60% of the precincts, Putin won more than 75% of the vote, according to the Central Elections Commission.
By 7 p.m. Moscow time, authorities said turnout had hit nearly 60%.
Putin thanked thousands of people who rallied near Red Square after the vote. He hailed those who voted for him as a “big national team,” adding that “we are bound for success.”
He said the nation needs unity to move forward and urged the crowd to “think about the future of our great motherland.” He then led the enthusiastic crowd to chant “Russia!” the Associated Press reported.
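Google's problems:
1. Inappropriate word choice: (incomplete) "回报" ("repayment", for returns) and (indisputable) "任务" ("task", for mandate). These count as minor errors.
2. The as-phrase is attached in the wrong place: 他赞扬那些投他为“大国家队”的人 (roughly "he praised those who voted him as a 'big national team'", for He hailed those who voted for him as a “big national team”). A medium-sized error.
3. “we are bound for success.” ("我们一定会成功", "we will surely succeed"): Google is the only one that got this wrong, rendering it as "我们必须取得成功" ("we must achieve success"). A considerable deviation.
Baidu's problems:
1. Poor word choice: (incomplete) "回报" ("repayment", for returns) and (indisputable) "任务" ("task", for mandate). These count as minor errors.
2. Stiff, with the two adverbials badly placed: "【用选票从60%的选区】,普京赢得超过75%的选票,【根据中央选举委员会】" (literally "[with ballots from 60% of the precincts], Putin won more than 75% of the vote, [according to the Central Elections Commission]").
3. The as-phrase is attached in the wrong place: 他称赞那些投票支持他为“大国家队”的人 (roughly "he praised those who voted to support him as a 'big national team'", for He hailed those who voted for him as a “big national team”). A medium-sized error.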
A couple of months ago one of my old buddies recommended Youdao to me, and for some reason I fell in love with its service and app. So I switched to Youdao. I downloaded Youdao to my iPhone and use it almost every day, sometimes for fun and sometimes for real. The interface is carefully designed and very user-friendly, and most of the time I am very happy with its performance. Although the app is named Youdao Dictionary, it can be used as an instant speech translator, as if we were accompanied by a personal interpreter all the time. The instant translation is often just amazing, though some crazy translations make me laugh from time to time. From the perspective of MT as a business, Youdao seems to be gaining momentum. Xunfei is also a big player, especially in speech translation.
some more examples: 红白喜事,冷热风,高低端,东南向,南北向,软硬件,中青年,中老年,黑白道,大小布什 ......
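These things look like an open set (compounds), yet they are not all that open; call the set closed, and a dictionary can still hardly enumerate them all. They pose some challenges for both word segmentation and parsing. This is a Chinese phenomenon in which a compound is formed by eliding a morpheme; once restored, the relation is conjunction (Ax conj Bx). As for the logical semantics of ABx --> AxBx, it is really hard to say and varies from word to word. It can be: (1) and: 南北美 --> 南美 and 北美 (South America and North America); 大小布什 --> 大布什 and 小布什 (Bush senior and Bush junior); (2) or: 冷热风 --> 冷风 or 热风 (cold air or hot air); 正负能量 --> 正能量 or 负能量 (positive or negative energy); (3) range: 中青年 --> from 中年 to 青年 (middle-aged to young), 中老年 --> from 中年 to 老年 (middle-aged to elderly); (4) and/or: 进出口 --> 进口 and/or 出口 (import and/or export); (5) a muddle (and/or/ranging): 高低端 --> 高端 and/or 低端 or from 高端 to 低端 (high end and/or low end, or from high end to low end).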
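The ABx --> AxBx restoration described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the original: A and B are each a single character, x is whatever follows, and the logical relation is looked up in a small hand-made table (`RELATION` is hypothetical), since the text notes that the relation varies from word to word.

```python
def expand_compound(word: str) -> list[str]:
    """Restore the elided morpheme: ABx -> [Ax, Bx].

    Assumes A and B are single characters and x is the remainder.
    """
    if len(word) < 3:
        raise ValueError("expected an ABx compound of at least three characters")
    a, b, x = word[0], word[1], word[2:]
    return [a + x, b + x]


# Hypothetical per-word table; the relation cannot be predicted from form alone.
RELATION = {
    "南北美": "and",
    "大小布什": "and",
    "冷热风": "or",
    "中青年": "range",
    "进出口": "and/or",
}


def analyze(word: str) -> tuple[list[str], str]:
    """Return the restored parts and the listed relation ('unknown' if unlisted)."""
    return expand_compound(word), RELATION.get(word, "unknown")


if __name__ == "__main__":
    for w in ["南北美", "大小布什", "冷热风", "高低端"]:
        print(w, analyze(w))
```

The expansion itself is mechanical; the hard part, as the text says, is deciding when a string really is such a compound and which relation holds, which this sketch leaves to a lookup table.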
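Bai:
小微银行 (small and micro banks); 三五度 (three to five degrees)
Li:
Set the logical-semantic analysis aside for now (the speaker may well be muddled about it himself, so do not force the listener or the machine to work out whether it is and, or, or ranging), and just talk about how to cope with the challenges for segmentation and parsing. 冷热风 is a tug-of-war in traditional segmentation: 【冷热】风 vs 冷【热风】; likewise "南北美": 【南北】美 vs 南【北美】.
Question: can a segmenter or parser really supply missing language material? Of course it can. If it could not, how would bank become bank1 (as in bank of a river) and bank2 (as in a commercial bank)? Here is a more obvious example, of so-called coreference: John Smith gave a talk yesterday. Prof Smith (== John Smith), or John (== John Smith) as most people call him, is an old linguist with new tricks.
Bai:
高低杠 (uneven bars), 南北朝 (Northern and Southern Dynasties), 推拉门 (sliding door), 父母官 ("parent official", a local magistrate) ...
Li: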
This last example below demonstrates the need for recovering the missing language material:
A: Recently the interest rate remains low.
B: How low is the rate (== interest rate)? // Without restoring the missing word, it would no longer be the interest rate but a speed rate (速率).
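Chapter 1 summary: without any theoretical exploration, this die-hard old revolutionary experienced two kinds of facts in the field work of deep parsing: a multi-layer parser that is not troubled by the problem, and a single-layer parser that is deeply mired in it. So when Teacher Bai insisted on this challenge to deep analysis, I felt I had a headful of arguments yet could not make my case. At least not in a sentence or two, so I chose to flee.
For the vast majority of mainstream NLP-ers, there is only one school of grammar for NL, namely CFG, whatever its variants. The algorithms are also much the same, some form of chart parsing. This view is overwhelming. Parsing with multi-layer finite-state grammars, although it has a history of more than half a century, has been ignored all along. First it was ignored by the Chomskyan mainstream in linguistics, because the very name finite state (FSA) sounds bad (nobody bothered to look into whether it was multi-layer or not): too low-end, petty, and lowly. Since linguistics itself ignored it, one could hardly expect the statistical mainstream to take it seriously; they barely have any impression of this kind of parsing (fine for shallow pattern matching, NE tagging and the like, but its potential for deep parsing is hard for them to imagine). Yet in being finite-state, the statistical school and the FSA linguistic school actually grow from the same root; both are targets of Chomsky's condescending criticism, and in theory both seem unable to fend it off or hit back.
Bai: the relation between probabilistic automata and Markov processes
Li: But the essence of multi-layer FSA is not finite state but multi-layer (just as the essence of deep learning also lies in multiple layers; what it broke through was the single layer where traditional neural networks had stagnated for many years). That is why I said the other day that with one hand I criticize the statistical school, all statistics included, single-layer or multi-layer: as long as they do not use syntactic relations, they are all in the line of fire. On this point Chomsky still sees it right: without syntax there is no understanding, ngram is but a poor imitation of syntax, your success will always be shallow success, and what you pick is just low-hanging fruit. It just so happens that there is a lot of such fruit, which creates a false prosperity.
On the other hand, I stand with the statistical school in criticizing Chomsky's high-handedness. In practice it goes without saying: almost everything that works is finite state. If Chomsky wants to kill off single-layer finite state, I have no objection. Almost all the statistical models (before deep learning took off) were single-layer; they lingered in the single layer too long without seeking progress, and deserve no pity, :). The high-handedness lies in Chomsky's disregard for the diversity of finite-state and ngram approaches, knocking a whole boatload of people over with one pole.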
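Li: Of course. Subcategories are small rules, and small rules take priority over big rules. Among language rules, rules on major classes (POS-based rules) are the coarsest; they are default rules and involve no specific subcategory (subcat in the broad sense). Subcat-based rules come next, then sub-subcat rules. Pushing down all the way, you reach rules that use literals (word-driven rules), which are the most specific and have the highest priority, including idioms and fixed collocations.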
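The priority ordering described here (word-driven rules first, then sub-subcat, then subcat, with POS-based rules as the default) can be sketched as follows. This is a toy illustration, not the author's system: the `Rule` class, the level names in `LEVELS`, and the three sample rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed specificity levels, from the coarsest default to the most specific.
LEVELS = {"pos": 0, "subcat": 1, "sub-subcat": 2, "word": 3}


@dataclass
class Rule:
    name: str
    level: str                       # one of the keys in LEVELS
    matches: Callable[[dict], bool]  # hypothetical test on a token's features


def pick_rule(rules: list, token: dict) -> Optional[Rule]:
    """Return the most specific rule that matches the token, or None."""
    candidates = [r for r in rules if r.matches(token)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: LEVELS[r.level])


# Toy rules, invented for illustration only.
RULES = [
    Rule("generic-verb", "pos", lambda t: t.get("pos") == "V"),
    Rule("ditransitive-verb", "subcat", lambda t: t.get("subcat") == "ditransitive"),
    Rule("give-word-rule", "word", lambda t: t.get("word") == "give"),
]

if __name__ == "__main__":
    tokens = [
        {"word": "give", "pos": "V", "subcat": "ditransitive"},
        {"word": "send", "pos": "V", "subcat": "ditransitive"},
        {"word": "sleep", "pos": "V", "subcat": "intransitive"},
    ]
    for token in tokens:
        print(token["word"], "->", pick_rule(RULES, token).name)
```

The coarse POS rule still fires for any verb that nothing more specific claims, which is what makes it a default rather than a competitor.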
QUOTE: Countless lessons learned over the years in the NLP system development show that a robust real life system should not be too sophisticated just as man should not be too smart. As a rule of thumb, anything involving more than 3 levels of dependency is too delicate. You can "make" it work today, but it will break some day.
Me:
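NLP has a platform side. Of course, one can say that the AI platform subsumes the NLP platform, but existing AI platforms do not seem to fully cover the exploration of the "caterpillar" mechanism (formalism).
We happen to live in an era when domain experts are discriminated against. The mainstream has regarded domain experts as data clerks, or as a burden, not for a day or two but for a whole generation; a whole generation is missing. But the big trend is that domain experts will be in hot demand in the next era: they are the main force of AI, the key to deployment, and the guarantee of quality. For platform maintainers, who may be downgraded to robots, domain experts are their customers, their God. Everything serves the domain experts.
This is not a utopian picture but a trend with fairly clear signs. In fact, on a small scale, it is a model that has already been realized. In the environment I built over the past 18 years, this is basically the model we adopted. The team of linguists are the domain experts and have always been waited on. One deep lesson is that there are two kinds of domain experts: one kind can be trained to acquire a certain engineering sense and can therefore adapt to this AI model. Other domain experts just never get it: however deep their domain expertise, they simply have no affinity with AI.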
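Me:
Liking a good deed done by a bad guy, and having that counted as evidence of support for the bad guy, is not wrong in sentiment analysis. In big data analysis, every little bit is evidence. When a bad guy does a good deed, each mention of it adds one point (positive) for the bad guy. This does not at all affect the overall picture of public opinion about him. On the contrary, this is the real public opinion. If the bad guy's good deeds are mentioned n times, his bad deeds m times, expressions of pure dislike for him o times, and expressions of pure liking for him p times (p is usually close to zero), then this bad guy's praise/criticism index is the ratio of (n+p) to (m+o). Rest assured, p is basically 0 and n will be far smaller than m, so this reading of public opinion is not only reliable but also real, with no bias.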
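The ratio of (n+p) to (m+o) can be written out as a small function. This is a minimal sketch: the function name `praise_criticism_ratio` and the counts in the example are made up for illustration, and raising an error when there are no negative mentions is an assumption the text does not address.

```python
def praise_criticism_ratio(n: int, p: int, m: int, o: int) -> float:
    """Ratio of positive to negative mentions, (n + p) / (m + o).

    n: mentions of good deeds      p: expressions of pure liking
    m: mentions of bad deeds       o: expressions of pure dislike
    """
    positive = n + p
    negative = m + o
    if negative == 0:
        # Assumption: the text does not say what to do here, so fail loudly.
        raise ValueError("no negative mentions; the ratio is undefined")
    return positive / negative


if __name__ == "__main__":
    # Invented counts: a few good-deed mentions barely move the overall picture.
    print(praise_criticism_ratio(n=5, p=0, m=80, o=15))
```

With n much smaller than m and p near zero, the ratio stays far below 1, which matches the point that counting the occasional good deed does not distort the overall reading.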
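We live in the era of big data and information overload. I used to feel that, as an NLPer, my calling was to help solve this overload problem. Just as Jack Ma's grand wish is that no business anywhere under heaven should be hard to do, our vision in big data should be that no information should be impossible to access. So Google appeared and, with crude keywords and a the-more-data-the-better spirit, solved the long tail problem of information. So we began to criticize Google: the price of solving the long tail is poor data quality. So the AI camp came along, leveraging deep processing (whether deep learning or deep parsing), attempting both to solve the long tail of big data and to greatly raise data quality, so that every mind in the world interested in information has a steady stream of it. This is how it looks from the perspective of us practitioners.
The scariest part is the next generation, whose struggle and helplessness are plain to see. Games, social media, and the internet have devoured countless youths. And the world is basically at a loss, letting them sink. As for parents, all they can do is worry. We cannot resist the temptation ourselves, so how can we expect the young generation to? Minds full of curiosity and restlessness are bound to be the most deeply enslaved by information overload. The social cost does not yet seem to have received the in-depth study it deserves.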
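subcat: in its original sense, subcat refers to a subclass of predicates that corresponds to a specific sentence pattern of the word (e.g., the double-object pattern, the object + object complement pattern, etc.). Teacher Bai's subcat extends to subclasses that do not necessarily have a corresponding sentence pattern. For example, behind 碗 (bowl) are the subcats "容器" (container) and "餐具" (tableware); behind 汤 (soup) are "液体" (liquid) and "食物" (food). This is in fact the hierarchy of ontological semantics (ontology), as in the ISA taxonomy chain: 碗 ISA 餐具, 餐具 ISA 工具 (tool), 工具 ISA 商品 (commodity); 商品 ISA 人造物品 (artifact); 人造物品 ISA 物品 (object); 物品 ISA 实体 (entity; the logical noun, which is the top node TOP of this chain).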
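The ISA chain above can be sketched as a simple parent lookup. The dictionary below contains only the links listed in the text; giving each concept a single parent is a simplification for this sketch (the text itself lists two subcats for 碗), and the function names are hypothetical.

```python
# Single-parent ISA links copied from the chain in the text.
ISA = {
    "碗": "餐具",
    "餐具": "工具",
    "工具": "商品",
    "商品": "人造物品",
    "人造物品": "物品",
    "物品": "实体",  # TOP of the chain
}


def isa_chain(concept: str) -> list[str]:
    """Walk up the taxonomy from a concept to the top node."""
    chain = [concept]
    seen = {concept}
    while chain[-1] in ISA:
        parent = ISA[chain[-1]]
        if parent in seen:
            raise ValueError("cycle in the ISA table")
        chain.append(parent)
        seen.add(parent)
    return chain


def is_a(concept: str, ancestor: str) -> bool:
    """True if the ancestor is the concept itself or appears above it in the chain."""
    return ancestor in isa_chain(concept)


if __name__ == "__main__":
    print(" ISA ".join(isa_chain("碗")))
    print(is_a("碗", "工具"), is_a("汤", "餐具"))
```

A rule written against a higher node such as 工具 then applies to 碗 automatically, which is how this broad notion of subcat connects to the rule hierarchy.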
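"耍流氓" ("playing the hooligan"): refers to the case where a binary dependency relation cannot be qualified, yet some relation can be asserted. In Chinese syntax, before the sentence-initial noun phrase is determined to be a subject, object, attributive, or adverbial, it is often first given a Topic label and attached to the predicate that follows; Teacher Bai considers this playing the hooligan. Likewise, when the relation between two content words can basically be confirmed but not qualified, we often have the parser link the two with a Next label according to their order of appearance, as a patch to enhance the parser's robustness and recall. This too counts as playing the hooligan for the moment, because in theory a semantic module is still needed afterwards to confirm what the relation is before the deep analysis is complete. If the system gives Next to two Chinese verbs, one after the other, the default relation is [succession], the so-called "serial verb" (连动) construction in Chinese grammar books.
Topic: in Chinese analysis, if the sentence-initial noun phrase is not directly a subject, object, etc., many analyses give it a Topic label. A prominent sentence pattern in Chinese grammar is the so-called double-subject sentence (often analyzed as a Topic, or big subject, plus a small subject, e.g., 他身体特别好 "He, (his) health is especially good"; 这家公司业绩直线上升 "This company, (its) performance is rising steeply"). Since the logical-semantic nature of this relation is unclear and the label is merely better than nothing, establishing this kind of binary relation is also called "playing the hooligan".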
TRUMP: Chief Justice Roberts, President Carter, President Clinton, President Bush, President Obama, fellow Americans and people of the world, thank you.
We, the citizens of America, are now joined in a great national effort to rebuild our country and restore its promise for all of our people.
Together, we will determine the course of America and the world for many, many years to come. We will face challenges, we will confront hardships, but we will get the job done.
Every four years, we gather on these steps to carry out the orderly and peaceful transfer of power, and we are grateful to President Obama and First Lady Michelle Obama for their gracious aid throughout this transition. They have been magnificent. Thank you.
Today's ceremony, however, has very special meaning because today, we are not merely transferring power from one administration to another or from one party to another, but we are transferring power from Washington, D.C. and giving it back to you, the people.
For too long, a small group in our nation's capital has reaped the rewards of government while the people have borne the cost. Washington flourished, but the people did not share in its wealth. Politicians prospered, but the jobs left and the factories closed. The establishment protected itself, but not the citizens of our country. Their victories have not been your victories. Their triumphs have not been your triumphs. And while they celebrated in our nation's capital, there was little to celebrate for struggling families all across our land.
That all changes starting right here and right now because this moment is your moment, it belongs to you.
It belongs to everyone gathered here today and everyone watching all across America. This is your day. This is your celebration. And this, the United States of America, is your country.
What truly matters is not which party controls our government, but whether our government is controlled by the people.
January 20th, 2017 will be remembered as the day the people became the rulers of this nation again.
The forgotten men and women of our country will be forgotten no longer.
Everyone is listening to you now. You came by the tens of millions to become part of a historic movement, the likes of which the world has never seen before.
At the center of this movement is a crucial conviction, that a nation exists to serve its citizens. Americans want great schools for their children, safe neighborhoods for their families, and good jobs for themselves. These are just and reasonable demands of righteous people and a righteous public.
But for too many of our citizens, a different reality exists: mothers and children trapped in poverty in our inner cities; rusted out factories scattered like tombstones across the landscape of our nation; an education system flush with cash, but which leaves our young and beautiful students deprived of all knowledge; and the crime and the gangs and the drugs that have stolen too many lives and robbed our country of so much unrealized potential.
This American carnage stops right here and stops right now.
We are one nation and their pain is our pain. Their dreams are our dreams. And their success will be our success. We share one heart, one home, and one glorious destiny. The oath of office I take today is an oath of allegiance to all Americans.
For many decades, we've enriched foreign industry at the expense of American industry; subsidized the armies of other countries, while allowing for the very sad depletion of our military. We've defended other nations' borders while refusing to defend our own.
And spent trillions and trillions of dollars overseas while America's infrastructure has fallen into disrepair and decay. We've made other countries rich, while the wealth, strength and confidence of our country has dissipated over the horizon.
One by one, the factories shuttered and left our shores, with not even a thought about the millions and millions of American workers that were left behind. The wealth of our middle class has been ripped from their homes and then redistributed all across the world.
But that is the past. And now, we are looking only to the future.
We assembled here today are issuing a new decree to be heard in every city, in every foreign capital, and in every hall of power. From this day forward, a new vision will govern our land. From this day forward, it's going to be only America first, America first.
Every decision on trade, on taxes, on immigration, on foreign affairs will be made to benefit American workers and American families. We must protect our borders from the ravages of other countries making our products, stealing our companies and destroying our jobs.
Protection will lead to great prosperity and strength. I will fight for you with every breath in my body, and I will never ever let you down.
America will start winning again, winning like never before.
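Of course, using these services costs money, but the speaker said it is very, very cheap; Brother Guo says that once you really use it, it is not that cheap after all. Cheap or not, at least nowadays the threshold for bots is very low, and what is needed is not software talent but people with domain data. So I see a prospect: linguists and librarians, who used to be unemployed upon graduation, may become the main force of AI in the future. Only people sensitive to data and details will ultimately build the flesh and blood of AI interfaces; the architecture is ready-made and generic anyway. Thinking it over, this makes sense. This is the pricing of Watson API calls.
For NLP (one kind of AI), I have written n blog posts stressing that all off-the-shelf platforms and toolkits (e.g., the long-standing GATE), and even a small plug-in (e.g., the Brill Tagger or some Chinese word segmenter), are hard to use well. They are fine for prototyping, but if you take even a slightly long-term view and want to build a large-scale NLP application, it is better to build everything in-house. Of course, the threshold for in-house building is high: most people cannot afford it and have no architect to direct it. But in the end, what is built in-house wins out on quality (quality includes key overall indicators such as speed, robustness, precision and coverage, and domain adaptability).
OK, you have mastered machine translation. That is because MT has an almost unlimited supply of "natural" labeled data (actually not natural either, it is also human labor; luckily that labor is a historical accumulation, a by-product of human translation activity, a free ride that costs developers nothing). But what about other AI and NLP applications? Can you still be as lucky as with MT and enjoy such a free lunch?
Come to think of it now, what is the hot application with big data right after MT? None other than bots.
For bots, a certain amount of data has already accumulated, and their biggest feature is that as bots are used, data keeps flowing in. The problem is that although these data are on target, real life data from the field, they are still unlabeled. So the prospect for bots is a war fought with data: you can hire people to label data endlessly. It is a scene much like an AI factory out of Chaplin's Modern Times, or the human-wave tactics of Comrade Lenin storming the Winter Palace. It looks clumsy, but what is certain is that bots will get more and more "intelligent" and handle more and more scenarios. It bears out the old saying: as much human labor, as much intelligence. However, this is not, and should not be, the only way to overcome the knowledge bottleneck.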
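Back then, Watson was built on UIMA (Unstructured Information Management Architecture) and did use Prolog (The Prolog Interface to the Unstructured Information Management Architecture, https://arxiv.org/ftp/arxiv/papers/0809/0809.0680.pdf).
"IBM Watson's tools use deep neural (networks)."
"To this day they still have no syntactic analysis, let alone deep analysis."
After Watson beat the Jeopardy! champions, the IBM Journal of Research and Development put out a special issue, and its description of how Watson was built does not seem to match this. For example, parsing is described as follows: http://ieeexplore.ieee.org/document/6177729/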
“Two deep parsing components, an English Slot Grammar (ESG) parser and a predicate-argument structure (PAS) builder, provide core linguistic analyses of both the questions and the text content used by IBM Watson™ to find and hypothesize answers. Specifically, these components are fundamental in question analysis, candidate generation, and analysis of passage evidence. As part of the Watson project, ESG was enhanced, and its performance on Jeopardy!™ questions and on established reference data was improved. PAS was built on top of ESG to support higher-level analytics. ”
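The analysis of the second sentence in the figure above shows the handling of structural ambiguity that I mentioned before. Look at the whole syntactic tree: there are three O (object) paths. Two of them are correct: 【到-上海】 (go to, Shanghai) and 【买-上海的飞机票】 (buy, plane ticket to Shanghai). The third O, 【到-上海的飞机票】, is wrong. One can say "到上海" (go to Shanghai) and "买飞机票" (buy a plane ticket), but not "到飞机票" (go to a plane ticket). This kind of structural ambiguity is especially common in Chinese, because Chinese has no accusative case, and on top of that the scope of the Chinese particle "的" is a big headache.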
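QUOTE: For as long as Tiantian (甜甜) can remember, she has lived here, so Buffalo is naturally the one irreplaceable hometown in her mind. I remember the first time we took Tiantian back to Beijing to visit family, four years ago: the first night we stayed at her grandma's, and everything was so strange to her. There was none of the American cartoon TV she was used to, and with a face full of grievance she clamored to go home (“I want to go home!”), meaning, of course, home in Buffalo. I told her that this was home too, Mom's home, but she just could not accept it.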