Large Models Bring "End-to-End" Value Realization to Industry
I am glad to participate in this forum and share some thoughts on the development of China's large model industry from the perspective of industrial development.
First, why should we attach such importance to large models? The current wave of large models represented by ChatGPT is really only the vanguard of general artificial intelligence, and a series of technological innovations has followed: image generation models such as DALL-E 2 and Midjourney, whose outputs can sometimes pass for the real thing, and embodied multimodal large models such as PaLM-E, which are closely tied to the future development of the robotics industry.
Since last December, a series of revolutionary, milestone technological innovations has arrived one after another in just three or four months. This announces the arrival of the era of general artificial intelligence, which will certainly set off new industrial waves and revolutions.
We believe this wave of AGI is a new "meta-revolution". Previous industrial and technological revolutions, such as the invention of electricity and the steam engine, were all products of human intelligence; this revolution is the only one about "intelligence" itself: machine intelligence is expected to approach, or even fully surpass, human intelligence.
A revolution of intelligence itself is in no way comparable to technological revolutions in the traditional sense, so it is better understood as a "meta-revolution" that demands strategic attention. This is not only my personal view; a recent Politburo meeting reached a similar conclusion. AGI is developing so fast that views are now refreshed not by the year but by the month. Some reports are only two months old, yet in this environment of rapid development some of their views may already be obsolete.
Large models matter greatly for industrial development because they bear on a topic I have raised on many occasions: realizing the value of data. "Digital Intelligence China" and the digital economy are important national development strategies, and local governments everywhere are promoting the digital economy. A crucial part of the digital economy is turning data into value. In the past, the path from big data and AI to actual returns was tortuous, difficult, and heavy; many friends in the investment community say that most of the companies they invested in over the years never seemed to make ends meet. Large models now bring a new opportunity: "end-to-end" value realization.
Large models are not selective about data: whatever the data, it can be "refined" into a large model, and the model then empowers applications. Little human intervention is required. With earlier big data and AI projects, the client (Party A) had to pay not only for people but also to supply the knowledge system, business logic, and other elements, a very heavy way of realizing value. With large models and ChatGPT, the path of realizing data's value through unified empowerment by a large model is becoming clearer and clearer. In particular, with technologies such as AgentGPT and AutoGPT, the capabilities of large models and information systems can be linked together to solve very complex tasks in business scenarios. The "end-to-end" realization that large models enable is therefore a major opportunity.
ChatGPT reached 200 million users in just a few months, and the plug-in ecosystem behind it is now very diverse. Instead of opening WeChat to chat with friends, we can first chat with ChatGPT about, say, which movie tickets to buy, which taxi to call, which map to open, and what meals to order. It is therefore very likely to become the new entrance to the Internet. Every change of entrance in the Internet industry is a revolution, so for the To C market, a new entrance is clearly coming.
For the To B market, the first meaning of large models is an engine upgrade. We have been building cars for more than a hundred years; a car is still a shell on four wheels, but the engine has kept changing, from steam to oil and gas to today's electricity. Our AI and big data products, information systems, and software used to run on small models; now we can replace them with large models. This engine upgrade is the first meaning.
Second, large models are expected to become a new controller in the To B market. In To B scenarios, an enterprise has various information systems: customer relationship management (CRM), enterprise resource planning (ERP), OA office systems, as well as databases, knowledge bases, and industry document libraries. These used to be scattered. Once a large model is connected above them, it is expected to become a controller that collaborates with the various information systems in the enterprise and its ecosystem to complete new, more complex decision and planning tasks.
A large model acting as controller can link the whole traditional information system together and thereby realize complex decision making in To B scenarios in the real sense. Since To B scenarios are essentially about complex decision making, large models are of great significance to the To B industry.
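The controller idea above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's actual architecture: the `route()` function stands in for the large model's intent understanding (in a real deployment it would be a model call), and the system names and handlers are made up for the example.

```python
# Hypothetical sketch: a large model as "controller" routing a user request
# to the right enterprise system (CRM, ERP, OA). The keyword matching below
# is a stand-in for the model's actual language understanding.

def query_crm(request: str) -> str:
    return f"CRM: customer record for '{request}'"

def query_erp(request: str) -> str:
    return f"ERP: resource/inventory data for '{request}'"

def query_oa(request: str) -> str:
    return f"OA: workflow status for '{request}'"

# Map an intent keyword to the system that handles it (illustrative only).
SYSTEMS = {"customer": query_crm, "inventory": query_erp, "approval": query_oa}

def route(request: str) -> str:
    # Stand-in for the model deciding which system can answer the request.
    for keyword, handler in SYSTEMS.items():
        if keyword in request.lower():
            return handler(request)
    return "No system matched; decompose the task further."

print(route("Check the approval status of my leave form"))
```

The point of the sketch is the shape, not the matching logic: one model-driven dispatcher sits above previously scattered systems and coordinates them for a multi-step task.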
Large Models Announce That AI Has Entered a Heavy-Industry Era
Looking at the key factors of the large model industry, the development of general AI represented by generative language models has essentially announced that AI has entered a heavy-industry era. Previously, AI was a typical manual workshop: every sector, every department hand-crafted its own AI product. Now, with large models, priority basically goes to the large model; many companies are doing exactly this, reserving all their computing power for it.
What is the logic behind this? The key to empowering industries with large models is, first, whether a large model can be refined well, and second, whether the many applications and the surrounding ecosystem can be built well. Once large models enter the heavy-industry era, several factors become important:
First, the large model itself. How big is "large"? In the BERT era we had only a few hundred million parameters, then a billion; now the mainstream has reached six or seven billion parameters, even more than ten billion. The ever-growing parameter count of the model itself is the primary factor.
Second, large computing power. Large models have created unprecedented demand for computing power, as everyone has deeply felt. Enterprises are all either buying compute or on their way to buy it; what we lack most is compute. The first wave of large model competition is a competition of computing power; the second wave may be a competition of data; but for now the focus remains on compute. Whoever has computing power has the initiative and the say, and this is already plainly visible.
Third, big data. We have entered an era of data competition, and lacking core, high-quality data is a critical weakness. The current winner, at least, is the computing power provider; NVIDIA's market value has broken a trillion dollars, and that is the main logic behind it. But the ultimate winner is likely to be the owner of the data.
Fourth, the refining process. This is a factor the small-model era never focused on; we call it the "process". The process of refining a large model matters enormously. Refining a large model today is very much like alchemy: all the raw materials are dropped into the furnace. I visited Baosteel a few days ago and took a photo of the 2,500-cubic-meter steelmaking furnace they keep. The furnace is very large; whatever the raw materials, after initial cleaning they all go in, are refined for days, and come out of the furnace. For steelmaking, what comes out is steel; for us, what comes out is a large model, and everything steelmaking must do, we must do too. The first focus is the recipe for the raw materials. The data recipe is now the most critical thing: which data, in what ratios. Much of this is a closely held trade secret; the rapid progress of OpenAI's large models lies in large part in a well-chosen recipe, but we do not know that recipe, and many people are trying to work it out.
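A "data recipe" in the sense above can be made concrete as weighted sampling across corpora. The source names and ratios below are invented for illustration; real recipes (including OpenAI's) are, as the text says, unknown.

```python
import random

# Illustrative "data recipe": draw training examples from several corpora
# according to fixed mixing ratios. These ratios are made up, not any
# lab's actual recipe.
RECIPE = {"web": 0.6, "books": 0.2, "code": 0.15, "dialogue": 0.05}

def sample_sources(recipe: dict, n: int, seed: int = 0) -> list:
    """Return n corpus names drawn according to the recipe's ratios."""
    rng = random.Random(seed)
    names = list(recipe)
    weights = [recipe[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

print(sample_sources(RECIPE, 10))
```

Tuning which sources appear and in what proportion is exactly the "secret recipe" competition the passage describes.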
The second focus is data cleaning. Before raw materials enter the refining furnace, Baosteel has a dedicated branch plant that specializes in cleaning the raw materials, designing parameters such as fire, temperature, and humidity, and handling process design and quality control. The analogous step is a very critical factor in refining large models, and this process know-how is what is genuinely missing in the current development of our large model industry.
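As a minimal sketch of what "cleaning before the furnace" means for text data, the pipeline below normalizes whitespace, drops fragments, and removes exact duplicates. The length threshold and the dedup-by-lowercase key are assumptions for the example; production pipelines use far more elaborate filters.

```python
# Minimal data-cleaning sketch for pretraining text: normalize, filter,
# deduplicate. Thresholds here are illustrative assumptions.

def clean_corpus(docs: list, min_len: int = 20) -> list:
    seen = set()
    kept = []
    for doc in docs:
        text = " ".join(doc.split())   # collapse runs of whitespace
        if len(text) < min_len:        # drop too-short fragments
            continue
        key = text.lower()
        if key in seen:                # exact-duplicate removal
            continue
        seen.add(key)
        kept.append(text)
    return kept

raw = ["Hello   world, this is a sample document for pretraining.",
       "hello world, this is a sample document for pretraining.",
       "too short",
       "Another unique document with enough length to keep."]
print(clean_corpus(raw))
```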
Our data base is as good as other countries'. On the model side, everyone uses open-source models, and there is no secret there. What, then, are we really lacking? Most large models in China only started training last December, so it is hard to catch up in three or four months with the process OpenAI spent four years polishing. We have to calm down and polish the process. This is absolutely critical.
Drive the Model Base from Applications; Break Through to the Kernel from the Periphery
Has ChatGPT, or its successor GPT-4, really become all-powerful? In fact, it has not; large models still have a capability ceiling. There are two arguments circulating now. One is pessimism: others are strong in everything, and we can do nothing. The other is blind optimism: it is nothing at all. Both extremes are problematic. Looked at objectively, it is indeed very strong, but it also has a ceiling and real problems, and its problems are precisely where our opportunities lie. The most important thing is not to be overwhelmed by the speed of others' progress, but to calmly analyze what it can and cannot do; what it cannot do is exactly where we should open up new tracks. As long as we do better where it cannot, we have our value.
These large models cannot yet do everything. In many complex scenarios, such as To B scenarios and the enterprise service market, large models still have many shortcomings. ChatGPT is a very important opportunity, but it is still difficult to apply it directly to solve domain problems.
And how do we bring intelligence to the real economy, such as industry and agriculture? Such intelligence shares a common characteristic: most tasks involve complex decisions, for example troubleshooting industrial equipment, diagnosing disease, or making investment decisions. These are serious application scenarios, and the capabilities they require go far beyond the open-ended chat ability that ChatGPT gives us today.
We grant that ChatGPT's open-ended chat ability is very strong; one could chat with it for three days and nights without getting bored. But however interesting and fun chat may be, it cannot solve the problems of these scenarios, which depend on many complex capabilities.
The first is the knowledge an industry expert would have. When a server fails, what exactly failed, and what is the root cause? That cannot be answered without IT knowledge.

The second is complex logic. Diagnosing a disease, for instance, requires a chain of logical reasoning.

The third is judging the macro situation. In investment decisions, the outlook for a stock is completely different in different environments.

The fourth is decomposing integrated tasks: breaking a very complex task down into single atomic tasks.

The fifth is sophisticated planning: facing many possible actions, deciding what to do first and what later.

The sixth is making trade-offs among complex constraints. Decisions often face constraints, such as cost, so we must decide which constraints must be satisfied and which can be discarded.

The seventh is foreseeing the unknown. In the course of an investment, an enterprise may encounter new conditions that are hard to anticipate, and ChatGPT may not match humans in handling such unexpected situations.

The eighth is reasoning under uncertainty. Most of our decisions are made with insufficient, incomplete information; otherwise we would lose the first-mover advantage.

All of these are capabilities that ChatGPT, and general-purpose large models, do not yet possess.
Whether large models can solve these problems and acquire these capabilities will directly affect their investment value. With this logic understood, there are two key elements for large models to ultimately create business value:
On the one hand, the base model must be powerful; on the other, domain applications must not be neglected. The base model is like internal strength in martial arts: even with internal strength well trained, one still needs to practice the routines. If you read Jin Yong's martial arts novels, the dispute between the Qi Sect and the Sword Sect makes the same point: the Qi Sect holds that internal strength is king, while the Sword Sect holds that the routines, the forms, are king. In fact, both factors matter.
Focusing only on the base model is not enough; there must also be domain applications, and domain knowledge is needed for applications to be effective.
What is our current situation? We have followed in the footsteps of the large model pioneers, so homogenization among large models is serious. In general, technology-driven enterprises tend to "emphasize models over applications", while application-driven enterprises tend to "emphasize applications over models"; neither is desirable. Only when both reach a certain level can business value be created.
We do have our own opportunities, even though our base models are not yet as good as ChatGPT or GPT-4 and we must catch up there. Our development strategy for the large model industry is therefore very clear: develop the whole large model industrial ecosystem by driving the model base from applications and breaking through to the kernel from the periphery.
This is a very important idea of ours: take a road of "encircling the cities from the countryside", driving the base from applications and breaking through to the kernel from the periphery. We first build good applications in various industries, and through them drive the development of data, computing power, models, and process, which eventually improves the base model. As I said earlier, it is unrealistic to reach in three or four months the model level OpenAI spent four years developing; we may have to endure a period in which our base models are not as good as others'. I estimate this state will last one year, or two to three years, or longer, and we may have to keep catching up. But if our applications are good, many opportunities will follow.
Don't Let Large Models Become a Magnificent Fireworks Show
Having proposed improvement strategies at the macro-strategic level, we can now consider specific responses at the tactical level:
First, promote data alliances, which is our advantage. We have a relatively large number of data exchanges, such as the Shanghai Data Exchange and those in Guiyang and the north, and our laws and regulations on data trading are relatively sound and quite advanced, with substantial regulatory protection. We can therefore build up the data trading system specifically for large model development; we have the technology and the advantage, and can rely on the data exchanges to carry out this work.
Second, computing power coordination. We must accelerate a sound domestic computing power ecosystem. We are planning to bring the computing power enterprises together to discuss establishing an alliance. Computing power can only improve through feedback from actual use, otherwise progress is very troublesome, and this problem is now serious. Note also that computing power is not only about GPUs: network cards and other components are scattered and heterogeneous, and these problems constrain the development of large models.
Third, the model ecosystem. A sound, open-source ecosystem around large model technology should be established as soon as possible. The open-source ecosystem is especially important: OpenAI is closed-source, so we can develop the open-source route, which pools ideas and lets volunteers bring improvements and optimizations to the models themselves.
Fourth, talent training. This is also a critical point for our large model industry. Here are some core figures. Around January and February, many in the industry thought that no more than 1,000 people in China could build large models; conservative estimates put the number at only two or three hundred. My own experience of this runs deep. My team was relatively lucky: we started working on large models two or three years ago, and this year some doctoral and master's students graduated, but every student who had worked on large models was poached at double the salary. At this point, we may have no more than 20 students at Fudan with hands-on experience refining large models, out of nearly several thousand students in the whole computer science school. This is because the requirements for refining large models are very high: for example, several A100 servers must be available, each now costing one million yuan, ten costing ten million, and not many schools can meet the equipment requirements. The talent shortage is thus a very big problem, one that government and universities must think about.
In addition, with the arrival of large models, the abilities and qualities required of talent differ from the past. Working with many manufacturers, I find that what we actually lack is people who can do product design for large models. People who understand large models often do not understand products, and people who understand products rarely understand large models, which are themselves still at an early stage. Such interdisciplinary, composite talent is in especially high demand, and the shortage here is very acute.
Fifth, develop a diagnosis and evaluation system. Right now every player talks up its own model. In the future, the market needs objective evaluation: which model is good, and where. The better state is for each to have its own specialty, one good at this, another good at that. The worst case is everyone claiming to be good at everything, which certainly means serious homogenization. So we need to establish evaluation standards and systems.
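The evaluation idea above can be sketched as a per-category scoring harness, so "where each model is good" becomes measurable. The categories and toy predictions below are invented for the example; a real system would run actual model inference over a benchmark.

```python
# Sketch of a per-category evaluation harness. Model outputs here are toy
# stand-ins for real inference results over a benchmark.

def evaluate(predictions: list, references: list) -> dict:
    """Per-category accuracy over (category, answer) pairs."""
    totals, correct = {}, {}
    for (cat, pred), (_, ref) in zip(predictions, references):
        totals[cat] = totals.get(cat, 0) + 1
        correct[cat] = correct.get(cat, 0) + (pred == ref)
    return {cat: correct[cat] / totals[cat] for cat in totals}

refs  = [("math", "4"), ("math", "9"), ("law", "guilty"), ("law", "appeal")]
preds = [("math", "4"), ("math", "9"), ("law", "guilty"), ("law", "dismiss")]
print(evaluate(preds, refs))   # e.g. math: 1.0, law: 0.5
```

Breaking scores out by category is what lets an objective evaluation show specialization ("this one is good at this, that one at that") instead of a single leaderboard number.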
Sixth, continue to reduce the cost of deploying large models. The cost is very high, and many parties are waiting and watching; the market over the past three months has been particularly interesting. Some parties were even ready to pay and then stopped. Why? They wonder whether large models are the next-generation technology: if they rashly invest in a particular technical solution and it is immediately displaced by large models, the investment will surely suffer, so they wait and watch. A very important factor in this wait-and-see is that the cost of deploying large models is too high. How to reduce that cost, so that the benefit far exceeds the input, is very important.
Seventh, how to develop the large model industry in a green, ecological direction. I joked two months ago that we don't even need to deliberate: this summer will definitely be hotter. NVIDIA's market value is said to have passed one trillion dollars, and I wonder how many more graphics cards will enter the market. Those cards will consume electricity, and consuming electricity generates heat, so this summer will definitely be hotter; we should be prepared. I expect next year will be the same. What is the key problem? It is energy consumption: we consume too much energy to do the computation. Making the AI industry green and ecological is a very important issue, and I believe it will not take long for everyone to realize its seriousness.
Eighth, continue to accelerate research on large model technology itself. Large model technology is not as perfect as we imagine; many problems remain, for example models talking serious nonsense, hallucination, the question of whose values and ideology a model ultimately reflects, privacy leakage, security, and so on. Too many problems are waiting to be solved.
The transformation of the general AI industry triggered by ChatGPT has, I believe, only just begun. We need to think more deeply and practice more solidly to firmly grasp the new opportunities that large models and other general cognitive intelligence technologies bring to China's digital transformation and high-quality development. Large models must never be a gimmick in promotional literature, and must never become a gorgeous fireworks show, but a real advanced productive force that can drive social development and progress.