《The AI Wave: On ChatGPT's Chain-of-Thought Ability》

 

立委: Chain-of-thought (CoT, 思维链) could also be loosely rendered as "not dropping the chain" (不掉链子, i.e. never losing the thread). We notice this ability all the time when playing with ChatGPT. Compared with earlier models, how rarely it loses the thread is genuinely impressive.

鲁为民: I suspect Google's LaMDA may not be weak either, especially with their Chain of Thought technique; we will see whether they can come from behind the way they once did in search. But OpenAI holds the early advantage for now: through the DALL·E models, the GPT series, and especially ChatGPT, OpenAI has accumulated a large amount of user interaction data, plus the code data (GitHub) it gets via Microsoft. (When I earlier asked Yao Fu why one might choose to join OpenAI, he pointed to this data advantage.)

李志飞: Is there really any concrete technique behind chain of thought? As I recall, it is just adding the incantation "Let's think step by step" to the prompt? Is there an actual technical paper?

鲁为民: I used to suspect that OpenAI's early chain-of-thought ability borrowed Google's technique, but it now looks like a by-product of training on code. Also, judging from the results Google has published, its language model does somewhat better than ChatGPT at mathematical reasoning.

I posted this earlier: GPT lacks basic reasoning ability (including this kind of multi-step arithmetic reasoning). Google found chain of thought helps somewhat: "In 'Chain of Thought Prompting Elicits Reasoning in Large Language Models,' we explore a prompting method for improving the reasoning abilities of language models. Called chain of thought prompting, this method enables models to decompose multi-step problems into intermediate steps. With chain of thought prompting, language models of sufficient scale (~100B parameters) can solve complex reasoning problems that are not solvable with standard prompting methods."

https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html

The key question is how to elicit the Chain of Thought ability. That has to be built in at training time.

刘群: CoT as originally proposed was elicited with a handful of worked examples (few-shot). Later, others showed you can use nothing but "Let's think step by step", a practice now called Zero-shot CoT. CoT appears to be an ability that only emerges once the model is large enough.
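To make the two elicitation styles concrete, here is a minimal prompt-construction sketch in Python. The exemplar problem, the question, and the `complete()` call are all made up for illustration; `complete()` merely stands in for whatever LLM completion API one happens to use.

```python
# Sketch only: building a few-shot CoT prompt vs. a zero-shot CoT prompt.
# complete() is a hypothetical placeholder for an LLM completion API.

QUESTION = "A farm has 15 cows. It buys 3 groups of 4 cows each. How many cows are there now?"

# Few-shot CoT: one worked exemplar whose answer spells out the intermediate steps.
FEW_SHOT_TEMPLATE = """\
Q: Tom has 5 apples. He buys 2 bags with 3 apples each. How many apples does he have?
A: He buys 2 * 3 = 6 more apples. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

# Zero-shot CoT: no exemplar, just the trigger phrase appended after the question.
ZERO_SHOT_TEMPLATE = """\
Q: {question}
A: Let's think step by step."""

def build_prompts(question: str) -> dict:
    """Return both prompt variants for the same question."""
    return {
        "few-shot CoT": FEW_SHOT_TEMPLATE.format(question=question),
        "zero-shot CoT": ZERO_SHOT_TEMPLATE.format(question=question),
    }

if __name__ == "__main__":
    for name, prompt in build_prompts(QUESTION).items():
        print(f"=== {name} ===\n{prompt}\n")
        # answer = complete(prompt)  # hypothetical call; a sufficiently large model
        #                            # tends to continue with intermediate steps.
```

Either prompt leaves the final reasoning to the model itself; the difference is only whether the step-by-step pattern is demonstrated (few-shot) or merely named by the trigger phrase (zero-shot).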

吕正东: I think the core idea of CoT is sound; it amounts to forcing the LLM to do some symbol-level representation and reasoning. But incantation-style interaction does not feel like enough.

李志飞: Why is there CoT ability at all? My hypothesis: 1. the model has read large amounts of code and learned how to "think logically" step by step; 2. the model has seen a small number of incantations (e.g. "let's think step by step") followed by examples of logical reasoning, and learned how those incantations activate the reasoning; 3. through this, the model has decoupled the trigger (which needs only a few examples) from the actual step-by-step reasoning (which needs massive amounts of examples).

I have no idea whether this is right. If you worked at OpenAI, you could simply run experiments to test whether the hypothesis holds; unfortunately it remains a hypothesis, since large-model research has become OpenAI's exclusive privilege these days.

As for how each of these is "learned", my only reading is that symbolic representations, task categories, reasoning logic, relations and so on all get vectorized, and attention (whether cross or self) then ties these vectors together.

Does academia have a deeper or better explanation?
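As a rough illustration of the "vectors tied together by attention" picture sketched above, here is a toy single-head self-attention computation in numpy, with random weights and made-up dimensions. It says nothing about CoT itself; it only shows how each token's output vector ends up as a softmax-weighted mixture of the value vectors of the tokens it attends to.

```python
import numpy as np

# Toy single-head self-attention: 4 "tokens", embedding dim 8, head dim 4.
rng = np.random.default_rng(0)
T, d_model, d_head = 4, 8, 4

X = rng.normal(size=(T, d_model))         # token embeddings ("vectorized symbols")
W_q = rng.normal(size=(d_model, d_head))  # random projections, purely illustrative
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores = Q @ K.T / np.sqrt(d_head)        # pairwise "relatedness" between tokens
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
out = weights @ V                         # each output mixes the values it attends to

print("attention weights:\n", weights.round(3))
print("output shape:", out.shape)         # (4, 4): one mixed vector per token
```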

立委: Let's just ask ChatGPT itself:

wei.li1@
what is CoT (Chain-of-Thought)? How does an LLM acquire this ability? What role does it play in complex reasoning and in enhancing long conversation flows.


I'm not familiar with the term "CoT" or "Chain-of-Thought." It is possible that you may be thinking of a different concept. Could you provide more context or clarify what you are asking about?


wei.li1@
It is claimed that large language model has learned this ability of chain-of-thought, which help the response to a complex problem to be step by step and more logical. I am just wondering how this can be learned


I see. It is possible that the term "Chain-of-Thought" is being used to refer to the ability of a language model to generate coherent and logical text that follows a chain of ideas or arguments.
Language models are trained on large amounts of text data and are able to learn patterns of language usage and structure. This enables them to generate text that is coherent and follows the conventions of written or spoken language.
In terms of complex reasoning, language models may not necessarily be able to reason in the same way that a human can, but they can generate text that appears to be logical and follows a chain of ideas. This can be useful for tasks such as generating responses to questions or creating natural-sounding text in a conversation. However, it is important to note that language models do not have the same kind of understanding or general knowledge as a human and should not be used to make decisions or solve problems that require critical thinking or judgment.


wei.li1@
elaborate ...


Certainly. A language model is a machine learning model that is trained to predict the likelihood of a sequence of words. This is done by analyzing a large amount of text data and learning the statistical patterns and structures that are characteristic of the language. Once trained, a language model can generate new text by predicting the next word in a sequence based on the words that come before it.
One way that a language model can learn to generate text that follows a chain of ideas or arguments is by being trained on data that contains examples of such text. For example, if the model is trained on a dataset of news articles, it may learn to generate text that follows a logical structure, such as an introduction, a body with supporting details, and a conclusion. Similarly, if the model is trained on a dataset of academic papers, it may learn to generate text that presents a research question, describes the methods used to investigate it, presents the results, and discusses the implications of the findings.
While language models can generate text that appears to be logical and coherent, it is important to note that they do not have the same kind of understanding or general knowledge as a human. They are simply following the patterns and structures that they have learned from the training data and do not have the ability to think or reason in the same way that a human can.


wei.li1@
good. learning by examples, that sounds like few shots learning for the chain of thought. Can it be learned by zero shot following natural language instruction such as "decompose it step by step"


It is possible that a language model could be trained to generate text that follows a chain of ideas or arguments in a step-by-step manner by being provided with examples of such text along with explicit instructions or labels indicating the structure of the text. For example, if the model is trained on a dataset that includes text that has been annotated with labels indicating the beginning and end of each step in a process, it may learn to generate text that follows a step-by-step structure.
It is also possible that a language model could be trained to generate text that follows a chain of ideas or arguments by being provided with more general instructions or prompts, such as "decompose it step by step." However, this would likely require a more complex model and more sophisticated training method, such as reinforcement learning, where the model is trained to maximize a reward signal by taking actions that produce the desired output.
It is important to note that while a language model may be able to generate text that follows a logical structure or chain of ideas, it does not have the same kind of understanding or general knowledge as a human and should not be used to make decisions or solve problems that require critical thinking or judgment.

Regarding few-shot, its answer was: "One way that a language model can learn to generate text that follows a chain of ideas or arguments is by being trained on data that contains examples of such text." It then followed up with examples.

Regarding zero-shot, its answer was even more striking: "It is also possible that a language model could be trained to generate text that follows a chain of ideas or arguments by being provided with more general instructions or prompts, such as 'decompose it step by step.' However, this would likely require a more complex model and more sophisticated training method, such as reinforcement learning, where the model is trained to maximize a reward signal by taking actions that produce the desired output."

Granted, it is almost certainly not some insider (human or machine) trying to leak proprietary know-how; its answers are just the expected, standard responses based on public information. Still, its chain of one argument after another, each followed by its own explanation or illustration, is practically airtight. CoT, flowing, isn't it amazing.

There is another interesting point in this example: at the start it did not know the term at all, it was in a state of ignorance. That shows its later answers arose only once I began explaining the term; it picked up the topic in-context and was then able to "piece together" appropriate, well-organized-looking responses.

李志飞: A word of criticism: doesn't this count as mindless fanboying?

立委: Criticism accepted.

李志飞: I have seen explanations stressing that models have CoT because the model is big and the history context is long (say 4096 tokens). I think that is wrong, or at least not of much help. For instance, you could train a 4096-gram n-gram model: big enough, same context length, yet that n-gram model could never have CoT ability.
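A minimal sketch of 李志飞's point, under the assumption of a plain count-based n-gram model: with the context window pushed to thousands of tokens, virtually every test-time context is unseen, so the model has nothing to predict from and can only replay sequences it has literally memorized. The toy corpus and n=8 below are just to keep the demo small; the sparsity only gets worse at n=4096.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count-based n-gram model: map an (n-1)-token context to next-token counts."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = train_ngram(corpus, n=8)

seen = tuple("the cat sat on the mat and".split())   # context copied from training data
unseen = tuple("if x is 3 then what is".split())     # novel "reasoning" context, 7 tokens

print(model.get(seen))    # Counter({'the': 1}) -- it can only replay what it memorized
print(model.get(unseen))  # None -- no counts at all, nothing from which to compose a new chain
```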

鲁为民: That sounds like a reasonable explanation, @李志飞. Still, if the ability has to be activated by a CoT prompt, that is indeed quite limiting, and the reasoning is not guaranteed to be correct. @魯東東

立委: Lengthening the context (say to 4096 tokens) surely helps the model learn the patterns of this kind of long-chain discourse. It is hard to imagine the old, much narrower context cutoffs accommodating that sort of learning. Model size, in turn, helps incubate this emerging/amazing ability.

I strongly suspect the people building ChatGPT are puzzled too .... so the rest of us can only guess at the riddle.

李志飞: We aligned on this yesterday: attributing large-model abilities to "emergence" alone is lazy; we have to dig down to the root cause. We ourselves need to learn to think step by step, or we will end up worse than GPT.

立委: I sincerely admit I am no match for it. Whether giving a talk or answering questions, I simply lack its orderliness. I "beat" it on only one count: I have more passion, and I regularly get carried away by myself.

鲁为民: ChatGPT probably exceeds the average human level in quite a few respects. So for any individual, it likely surpasses us across broad areas.

 

 

