GPT-4: Making Steady Progress
March 21st, 2023

On Tuesday, March 14 (EST), OpenAI announced the launch of GPT-4, a large multimodal model that accepts image and text input and produces text output. OpenAI describes it as "more creative and collaborative than ever before," adding that "because it has broader knowledge and problem-solving capabilities, it can solve difficult problems more accurately." Because GPT-4 can parse both text and images, OpenAI says, it can interpret more complex input.

OpenAI also said it will open up the API as soon as possible this time, and that it is already working with several companies to integrate GPT-4 into their own products. Within the first hour of the announcement, Microsoft chimed in: "If you've used the new version of Bing at any time in the last six weeks, you've gotten an early look at the power of OpenAI's latest model." Even from across the screen and across the Pacific Ocean, I can feel the smugness radiating off Microsoft.

The Past Lives of the GPT Series

GPT stands for Generative Pre-trained Transformer, a natural language generation model built on the Transformer architecture. The "pre-trained" in the name refers to the initial training pass over a large text corpus, during which the model learns to predict the next word in a sequence. This gives the model a solid foundation for performing well on downstream tasks with only a limited amount of task-specific data.
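To make the pre-training objective concrete, here is a minimal sketch of next-word prediction using the openly available GPT-2 model from Hugging Face's transformers library (GPT-2 stands in for GPT-4 here, since GPT-4's weights are not public):

```python
# Minimal sketch of "predict the next word", using open GPT-2 weights.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position score every vocabulary token as a
# candidate continuation; the highest-scoring one is the model's
# "next word" prediction.
next_token_id = logits[0, -1].argmax()
print(tokenizer.decode([int(next_token_id)]))  # typically " Paris"
```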

As mentioned above, the GPT family is built on the Transformer architecture, which is essentially a deep learning model that uses an "attention mechanism" to assign different weights to the components of the input data according to their importance; it is used mainly in natural language processing (NLP) and computer vision (CV). The architecture was first proposed by Google in 2017 and, like the recurrent neural network (an approach in use for more than three decades), was designed to process sequential input data such as natural language (here, written language specifically). Unlike recurrent neural networks, which can only process one word at a time, the attention mechanism in the Transformer provides context for any position in the input sequence, allowing far more parallel computation and much shorter training time, which is a significant improvement.
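For readers who want to see what the attention mechanism actually computes, here is a minimal sketch of scaled dot-product attention, the core operation of the 2017 Transformer paper, in plain NumPy (shapes and values are illustrative only):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from "Attention Is All You Need".

    Q, K, V: (seq_len, d_k) arrays of queries, keys, and values.
    Every position attends to every other position in one matrix
    multiply, which is why the whole sequence can be processed in
    parallel -- unlike an RNN's word-by-word loop.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # weighted sum of values

# Toy example: a 4-token sequence with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)         # self-attention
print(out.shape)                                    # (4, 8)
```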

Besides the now-ubiquitous ChatGPT, there are also BioGPT and ProtGPT2: the former is a GPT model Microsoft developed for the biomedical field, and the latter is a GPT model dedicated to protein research. The main reason they remain obscure is that they are not nearly as widely applicable as ChatGPT, so naturally they attract far fewer fans.

The GPT concept was first proposed on June 11, 2018, when OpenAI published a paper titled "Improving Language Understanding by Generative Pre-Training". At the time, the best-performing natural language generation models relied mainly on supervised learning, which has major practical limitations: many rare languages lack enough text to build a corpus from, which makes applications such as translation and interpretation difficult, and the time and money needed to train very large models are also substantial. In view of this, GPT proposed a "semi-supervised" (later commonly called "self-supervised") approach: first pre-train a model on unlabeled data, then fine-tune a discriminative version of the model on a small amount of labeled data.
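As a rough sketch of that pre-train-then-fine-tune recipe (the tiny model, sizes, and random data below are placeholders for illustration, not OpenAI's actual setup):

```python
import torch
import torch.nn as nn

# Stage 1 -- pre-training: next-token prediction on unlabeled text.
vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
body = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(dim, vocab)

params = list(embed.parameters()) + list(body.parameters()) + list(lm_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

tokens = torch.randint(0, vocab, (8, 16))              # fake unlabeled corpus
mask = nn.Transformer.generate_square_subsequent_mask(15)
hidden = body(embed(tokens[:, :-1]), mask=mask)        # causal self-attention
loss = nn.functional.cross_entropy(
    lm_head(hidden).reshape(-1, vocab), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()

# Stage 2 -- fine-tuning: reuse the pre-trained body, bolt on a small
# task head, and train on a handful of labeled examples (2 classes here).
clf_head = nn.Linear(dim, 2)
labeled = torch.randint(0, vocab, (4, 16))
labels = torch.tensor([0, 1, 1, 0])
features = body(embed(labeled)).mean(dim=1)            # pool over tokens
clf_loss = nn.functional.cross_entropy(clf_head(features), labels)
clf_loss.backward()
```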

Before GPT-4, the most obvious change across GPT-1, GPT-2, GPT-3, and GPT-3.5 was the steadily growing parameter count and training data volume: the first generation had about 117 million parameters (trained on roughly 4.5 GB of text), the second about 1.5 billion (40 GB), and the third about 175 billion (570 GB). As for GPT-4, the rumored parameter counts circulating online were dismissed by OpenAI CEO Sam Altman as "complete nonsense", though he didn't disclose the real number either.

The Real Face of GPT-4

From the information available so far, GPT-4 improves on the following fronts: powerful image recognition; a text input limit raised to 25,000 words; significantly better response accuracy; the ability to generate lyrics and creative text; and support for style changes, which OpenAI says it hopes to bring to everyone soon. U.S. Rep. Don Beyer also confirmed to the New York Times that Altman demonstrated GPT-4 during a visit to Congress in January, specifically showing the improved "security controls" compared with other AI models.

OpenAI has ambitions, and although the paper presenting GPT-4 reads more like a technical report, that won't stop its progress. OpenAI also ran a battery of tests showing that GPT-4 performs at a human-like level on various professional and academic benchmarks. For example, it passed a simulated bar exam with a score around the top 10% of test takers; by contrast, GPT-3.5's score on the same exam, around the bottom 10%, was nothing to write home about.

Beyond text, GPT-4 can also handle image input, and the new capability works hand-in-hand with plain text, letting the user specify any vision or language task. Concretely, given an arbitrarily interleaved mix of images and text from the user, it generates the corresponding text output (natural language, code, and so on). Across a range of specific domains, such as documents containing text and photos, diagrams, or screenshots, GPT-4 demonstrates capabilities similar to those it shows on text-only input.
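Image input was not yet publicly available in the API at the time of writing, so the request shape below is purely a hypothetical sketch of what an interleaved image-and-text call could look like; the content field names and the vision-enabled "gpt-4" model name are assumptions, not OpenAI's documented format:

```python
# Hypothetical sketch: interleaved image + text input to GPT-4.
# The message shape here is an assumption for illustration only.
import openai

openai.api_key = "sk-..."  # your API key

response = openai.ChatCompletion.create(
    model="gpt-4",  # assumed vision-enabled model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is funny about this chart?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```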

Unlike GPT-3.5, which was locked into one lengthy, level-toned personality, GPT-4's ability to balance text and graphics really shines through, even with foreign memes and diagrams; I just don't know how it will fare with domestic ones. Developers, and now ChatGPT users too, can specify their AI's language style and the tasks it will handle by describing those directions in system messages. And this time OpenAI is opening up API access from the start, so users can customize the experience within certain limits. Clearly, the company knows exactly what users want to do with ChatGPT.
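Steering style via the system message looks roughly like this with the openai Python library as it existed at launch (the persona text is just an example):

```python
import openai

openai.api_key = "sk-..."  # your API key

# The system message sets the assistant's persona and task constraints,
# so developers can steer GPT-4's style instead of getting the fixed
# ChatGPT tone.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a sarcastic pirate who answers in one sentence."},
        {"role": "user",
         "content": "Explain what a Transformer is."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```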
