After leaving Jingdong, Zhou Bowen had not been this excited in a long time.
ChatGPT burst onto the scene and stirred up the world like a spring thunderclap, waking practitioners in every field, all of whom seemed to hear, at the same moment, the footsteps of AGI approaching reality.
Amid the boom, people watched Wang Huiwen and Wang Xiaochuan leave to found startups, and saw formidable players at Baidu and Alibaba stirring as well. Zhou Bowen, former director of the AI research institutes at two major companies, IBM and Jingdong, has spent more than 25 years researching the basic theory of artificial intelligence along with its core frontier technologies, applications, and industrialization. As early as the end of 2021 he founded Articulate Technology, which builds its own large models and uses generative AI, multi-turn dialogue, and human-machine collaboration technology to help enterprises and consumers complete product innovation and digital-intelligence transformation in the new era of artificial intelligence. "Rather than my deciding to start a business in this field, the cause found me," Zhou said, describing it as something that had to be done, as if a sense of mission were urging him to act.
After graduating from the University of Science and Technology of China, he went to the University of Colorado Boulder for his Ph.D. He served as director of the AI Foundations Institute at IBM Research headquarters in the U.S., and after presiding over AI-related work there for many years he returned to China, where he served as senior vice president of Jingdong Group, chairman of the Group Technology Committee, president of Jingdong Cloud and AI, and founding director of the Jingdong AI Research Institute. As the creator of Jingdong AI, he was responsible for its technical research and business development: he built the Jingdong AI Division, the AI Research Institute, and the Jingdong AI Accelerator from zero, built the national open platform for intelligent supply chain AI, grew daily call volume from zero to 10 billion, led the technical reconstruction of Jingdong's AI customer service and its external productization, and managed a multi-billion-scale technical service business with an integrated technology, product, marketing, and sales team of thousands of people.
In 2021, Zhou Bowen foresaw that generative AI would explode in the near future and decided to leave Jingdong to found Articulate Technology, which is dedicated to helping enterprises in vertical fields carry out product innovation and digital-intelligence transformation with general-purpose large-model capabilities, reshaping the value of goods with AI. In 2022 he became Huiyan Chair Professor and tenured professor in the Department of Electronic Engineering at Tsinghua University, and in May of that year founded Tsinghua's Collaborative Interaction Intelligence Research Center. His research direction coincides with GPT's approach of prompt-driven generative AI.
When ChatGPT arrived, Zhou Bowen posted on his WeChat Moments: "I believe China's OpenAI needs to explore a new path!" Beneath the grand sentiment lies a thirst for talent. Unlike other entrepreneurs, however, Zhou Bowen and Articulate Technology chose to rely on a model with tens of billions of parameters and distinctive training methods, so that on top of general capabilities the large model is better at understanding the relationships between people and commodities, helping companies use generative AI to reconstruct the full innovation chain from commodity insight, positioning, design, and R&D through to marketing.
Zhou Bowen has said publicly that his entrepreneurial direction is to take the lead in integrating artificial intelligence with traditional industries to bring higher value to enterprises' digital-intelligence innovation, that is, to achieve a breakthrough in general large-model capabilities in vertical scenarios.
Recently, an AI Technology Review reporter spoke with Zhou Bowen. The following is a transcript of the conversation, edited by AI Technology Review without changing its original intent.
Letting AI learn human wisdom: a new paradigm of interaction and collaboration

AI Technology Review: ChatGPT has brought the prompt style of interaction. What do you think distinguishes it from past interaction methods?
Zhou Bowen: One of my research directions is the interaction between AI and humans, and learning through that interaction. Human-computer interaction is not the same as human-computer dialogue: through interaction, AI can learn something in the process, so interaction is not simply task execution but a means of learning.
The Analects record how Confucius and his seventy-two disciples learned through interaction. Similarly in the West, at the academies of Plato and Aristotle in Athens, the oldest transmission of knowledge and wisdom was carried out through human dialogue, with the teacher helping the student learn better by interacting with him.
For example, if a teacher tells a student to pour a glass of water, that simple "command-execute" exchange hardly grows wisdom; but if a teacher teaches a student how to write an essay and shows him how to overcome difficulties in the writing process, that is wisdom-growing interaction. This reflects the core of my view of human-AI interaction.
The essence of AI is to collaborate and interact with humans, learning from the interaction so as to work with humans to solve problems better. This view will become more and more important in the near future, but it will also face more challenges at the technical and ethical levels, and holding the bottom line will not be easy. As people say of AI alignment, humans pass their intentions to the AI and then work with it to decompose the task, letting the AI learn and realize human intentions in the process. This is a new way of collaborating, namely collaborative interactive intelligence.
AI Technology Review: Do you think achieving value alignment through interaction is an effective way for the human brain and GPT to work together? How should humans and AI collaborate better?
Zhou Bowen: After the explosion of generative AI, AI that learns through collaborative interaction with humans will become stronger and stronger.
In his best-selling book Thinking, Fast and Slow, Daniel Kahneman, winner of the 2002 Nobel Prize in Economics, proposed that humans have two modes of thinking, System 1 and System 2. System 1 is fast thinking and intuitive judgment; System 2 is slow thinking, requiring extensive reasoning and calculation.
Initially, people thought AI was better suited to "System 1" work; face recognition and quality inspection, for example, are "System 1" pattern recognition. The emergence of ChatGPT validates the feasibility of AI doing System 2 work, which means AI can discover new knowledge, and that new knowledge, in fields such as brain science and computational optimization, will in turn help humans design better AI. A flywheel for creating new knowledge emerges: AI makes the whole system better at discovering new knowledge, and the new knowledge helps design better AI systems, forming a virtuous circle. AI, knowledge, and innovation thus reinforce one another, and this requires a shift in how AI and humans work together.
I previously proposed a "3+1" research direction: take trustworthy AI as the research base and long-term goal, with multimodal representational interaction, human-machine collaborative interaction, and co-evolution with the environment as the research focus, and human-machine co-creation as the core goal, so that humans help AI innovate and AI helps humans innovate.
The first is multimodal representational interaction, where there may be a grand unified theory. In 2022 people were still skeptical, but with the advent of GPT-4, unified multimodal representational interaction has become more convincing. The second is human-machine collaborative interaction. This too was viewed with skepticism in 2022, but now that such interaction has become more plausible, people are starting to believe it is likely to happen. The third is the co-evolution of AI and its environment, meaning AI must not only cooperate with humans but also adapt to its surroundings. We pioneered this concept in early 2022, and to date I have not seen a success story in this direction, not even from OpenAI.
You can't copy OpenAI, and you can't be Microsoft: domestic large-model startups have to do subtraction

AI Technology Review: The distinctive feature of the Transformer model is its use of the attention mechanism to model text. We noticed that you carried out research on AI attention mechanisms early on.
Zhou Bowen: The core highlights of the Transformer are the self-attention mechanism and the multi-head mechanism. In June 2017, Google Brain published "Attention Is All You Need," which proposed the self-attention mechanism and the Transformer. OpenAI's GPT was later deeply influenced by this paper.
Before that, I was the corresponding author of the first paper to introduce a multi-hop self-attention mechanism to improve encoders, "A Structured Self-Attentive Sentence Embedding." The paper was completed and uploaded to arXiv in 2016 and formally published at ICLR in early 2017. We were the first team to propose this mechanism, and more crucially, it was the first natural-language representation model that did not consider downstream tasks at all. People had used attention or self-attention before in some settings, but always in a task-dependent way.
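The mechanism from that paper can be sketched in a few lines. Below is an illustrative NumPy reconstruction of the annotation matrix A = softmax(W_s2 tanh(W_s1 H^T)) and sentence embedding M = A H described in "A Structured Self-Attentive Sentence Embedding," with toy dimensions chosen for the example; it is a sketch of the idea, not the authors' code:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """Multi-hop self-attention: A = softmax(W_s2 tanh(W_s1 H^T)), M = A H.

    Each of the r attention "hops" is a probability distribution over the
    n input tokens; no downstream task or output is involved -- the model
    attends purely to the structure of the input.

    H:    (n, 2u) token hidden states (e.g. from a BiLSTM)
    W_s1: (d_a, 2u) first projection
    W_s2: (r, d_a)  one row per attention hop
    Returns A (r, n) and the matrix sentence embedding M = A @ H (r, 2u).
    """
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=-1)  # (r, n)
    M = A @ H                                          # (r, 2u)
    return A, M

# Toy example with made-up dimensions.
rng = np.random.default_rng(0)
n, two_u, d_a, r = 6, 8, 4, 3
H = rng.normal(size=(n, two_u))
A, M = structured_self_attention(H,
                                 rng.normal(size=(d_a, two_u)),
                                 rng.normal(size=(r, d_a)))
print(A.shape, M.shape)  # each row of A sums to 1
```

The key design point the interview emphasizes is visible in the signature: the function sees only the input states H, never a task label or target output.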
AI Technology Review: What kind of discoveries did you make in this paper? And how did these findings influence the later technical iterations of Transformer?
Zhou Bowen: We proposed in that paper that the best way to represent natural language is structured self-attention; the paper has been cited more than 2,300 times since publication.
Before that, OpenAI's chief scientist Ilya Sutskever had argued that the best representation is "Seq2Seq": a model is trained so that sequences in one domain are transformed into sequences in another, as in machine translation, where the source language corresponds to the target language, or in question answering, where the question is one sequence and the answer another. On this basis, a deep neural network learns the mapping between the two.
Then the team of Yoshua Bengio, the deep learning pioneer and Turing Award winner, proposed the "attention mechanism," whose core idea is that not all words are equally important when answering a question: if one can identify the more critical parts based on the correspondence between question and answer, and pay more attention to them, one can give better answers. In 2015 I led a team at IBM in research based on the "Seq2Seq + attention" architecture, launching some of the earliest generative models for natural-language AI writing; the related papers have been cited more than 3,000 times.
However, I was not satisfied, because the approach had a problem: attention was built based on the given answers. An AI trained this way is, figuratively speaking, like a college student who asks the teacher to highlight the key points before the final exam and then reviews with targeted focus. Such an AI can improve on specific problems, but it does not generalize. So we proposed having the AI read the input multiple times and learn which parts are more important and how they relate to each other, relying only on the intrinsic structure of the input language rather than on any given task or output; this is representation learning with self-attention plus the multi-head mechanism. Because it looks only at the input, it is more like a student who studies the course repeatedly and systematically before the exam, instead of cramming fragments around the teacher's highlighted points. This comes closer to the goal of general AI and greatly enhances the AI's ability to learn.
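The task-dependent attention described here can be sketched as follows. This is an illustrative NumPy reconstruction of Bahdanau-style additive attention with toy dimensions, not any paper's actual code; the point it shows is that the weights depend on the decoder state s, i.e. on the output being produced, which is exactly the dependence that the input-only self-attention approach removed:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s, H, W_a, U_a, v_a):
    """Bahdanau-style additive attention (illustrative sketch).

    The score of each encoder state h_j depends on the decoder state s,
    so the learned focus is tied to the output being generated -- the
    "teacher highlights the exam points" situation described above.

    s: (d,)   decoder state
    H: (n, d) encoder states
    W_a, U_a: (d_a, d) projections; v_a: (d_a,) scoring vector
    Returns attention weights alpha (n,) and the context vector (d,).
    """
    scores = np.array([v_a @ np.tanh(W_a @ s + U_a @ h) for h in H])
    alpha = softmax(scores)   # distribution over the n input positions
    context = alpha @ H       # weighted sum of encoder states
    return alpha, context

# Toy example with made-up dimensions.
rng = np.random.default_rng(1)
n, d, d_a = 5, 4, 3
alpha, context = additive_attention(rng.normal(size=d),
                                    rng.normal(size=(n, d)),
                                    rng.normal(size=(d_a, d)),
                                    rng.normal(size=(d_a, d)),
                                    rng.normal(size=d_a))
print(alpha.shape, context.shape)
```

Contrast this with the structured self-attention formulation, where the weights are computed from the input states alone and no decoder state appears.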
AI Technology Review: How does the paper "Attention is All You Need" relate to you?
Zhou Bowen: I am proud to have done some forward-looking work in this area. "Attention Is All You Need," the landmark paper that brought the Transformer model to the world, cited our paper "A Structured Self-Attentive Sentence Embedding," which first proposed the multi-hop self-attention mechanism, in early 2017. Its first author, Ashish Vaswani, was a student I mentored at IBM. The title "Attention Is All You Need" says exactly what we had come to believe: self-attention is important, multi-head is important, but RNNs may not be as important as people thought.