OpenAI has released a powerful new image- and text-understanding AI model, GPT-4, which the company calls “the latest milestone in its effort in scaling up deep learning.”
GPT-4 is available today to paying users of OpenAI via ChatGPT Plus (with a usage limit), and developers can sign up for a waiting list to access the API.
Pricing is $0.03 per 1,000 “prompt” tokens (about 750 words) and $0.06 per 1,000 “completion” tokens (again, about 750 words). Tokens represent raw text; for example, the word “fantastic” would be split into the tokens “fan”, “tas” and “tic”. Prompt tokens are the word fragments fed into GPT-4, while completion tokens are the content generated by GPT-4.
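To show how that pricing adds up, here is a minimal sketch in Python using the per-1,000-token prices above; the token counts in the example are invented for illustration, since in practice they come from OpenAI’s tokenizer.

```python
# Rough cost estimate for a single GPT-4 request, using the prices quoted above.
# Token counts are illustrative; real counts come from OpenAI's tokenizer.

PROMPT_PRICE_PER_1K = 0.03      # USD per 1,000 prompt tokens
COMPLETION_PRICE_PER_1K = 0.06  # USD per 1,000 completion tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the approximate USD cost of one request."""
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K \
         + (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K

# Example: a 1,500-token prompt (~1,100 words) and a 500-token reply
print(f"${estimate_cost(1500, 500):.3f}")  # prints $0.075
```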
It turns out that GPT-4 has been hiding in plain sight. Microsoft confirmed today that Bing Chat, its chatbot technology co-developed with OpenAI, runs on GPT-4.
Other early adopters include Stripe, which uses GPT-4 to scan business websites and deliver a summary to customer support staff. Duolingo has added GPT-4 to a new subscription level for language learning. Morgan Stanley is creating a system powered by GPT-4 that will retrieve information from company documents and deliver it to financial analysts. And Khan Academy is leveraging GPT-4 to create a kind of automated tutor.
GPT-4 can generate text and accept both image and text inputs, an improvement over its predecessor, GPT-3.5, which only accepted text, and it performs at a “human level” on various professional and academic benchmarks. For example, GPT-4 passes a simulated bar exam with a score around the top 10% of test takers; by contrast, GPT-3.5’s score was around the bottom 10%.
OpenAI spent six months “iteratively aligning” GPT-4, drawing on lessons from an internal adversarial testing program as well as ChatGPT, which the company says yielded its “best-ever results” on factuality, steerability and refusing to go off the rails. Like previous GPT models, GPT-4 was trained using publicly available data, including from public web pages, as well as data that OpenAI licensed.
OpenAI worked with Microsoft to develop a “supercomputer” from scratch in the Azure cloud, which was used to train GPT-4.
“In casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle,” OpenAI wrote in a blog post announcing GPT-4. “The difference arises when the complexity of the task reaches a sufficient threshold: GPT-4 is more reliable, creative, and capable of handling much more nuanced instructions than GPT-3.5.”
Undoubtedly, one of the most interesting aspects of GPT-4 is its ability to understand both images and text. GPT-4 can caption, and even interpret, relatively complex images, for example, identifying a Lightning cable adapter from an image of a plugged-in iPhone.
The image understanding capability isn’t yet available to all OpenAI customers; for now, OpenAI is testing it with a single partner, Be My Eyes. Powered by GPT-4, Be My Eyes’ new Virtual Volunteer feature can answer questions about images sent to it. The company explains how it works in a blog post:
“For example, if a user sends a photo of the inside of their refrigerator, the Virtual Volunteer will not only be able to correctly identify what it contains, but also extrapolate and analyze what can be prepared with those ingredients. The tool can also offer a series of recipes for those ingredients and send a step-by-step guide on how to prepare them.”
A potentially more significant improvement in GPT-4 is the aforementioned steerability. With GPT-4, OpenAI is introducing a new API capability, “system” messages, which let developers prescribe style and task by giving specific instructions. Also coming to ChatGPT in the future, system messages are essentially instructions that set the tone and establish boundaries for the AI’s subsequent interactions.
For example, a system message might say: “You are a tutor who always responds in the Socratic style. You never give the student the answer, but always try to ask just the right question to help them learn to think for themselves. You should always tune your question to the student’s interest and knowledge, breaking the problem down into simpler parts until it’s at just the right level for them.”
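To make the idea concrete, here is a minimal sketch of how a system message might be passed to the API using OpenAI’s Python client; the API key, model name, user prompt and response handling are placeholders for illustration, not details taken from OpenAI’s announcement.

```python
# Sketch: steering GPT-4 with a "system" message via the openai Python client.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",  # illustrative model name
    messages=[
        # The system message fixes the tone and boundaries for the session.
        {"role": "system", "content": (
            "You are a tutor who always responds in the Socratic style. "
            "You never give the student the answer, but always ask just the "
            "right question to help them learn to think for themselves."
        )},
        # The user message is the actual request the model responds to.
        {"role": "user", "content": "How do I solve 3x + 5 = 20?"},
    ],
)

print(response["choices"][0]["message"]["content"])
```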
However, even with the system messages and the other upgrades, OpenAI acknowledges that GPT-4 is far from perfect. It still “hallucinates” facts and makes reasoning errors, sometimes with great confidence. In one example cited by OpenAI, GPT-4 described Elvis Presley as the “son of an actor,” an obvious misstep.
“GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021), and does not learn from its experience,” OpenAI wrote. “It can sometimes make simple reasoning errors which do not seem commensurate with competence across so many domains, or be overly gullible in accepting obvious false statements from a user. And sometimes it can fail at hard problems the same way humans do, such as introducing security vulnerabilities into the code it produces.”
However, OpenAI notes that it has made improvements in particular areas; GPT-4 is, for example, less likely to respond to requests for instructions on synthesizing dangerous chemicals. The company says GPT-4 is 82% less likely overall to respond to requests for “disallowed” content compared to GPT-3.5, and responds to sensitive requests, such as medical advice and anything related to self-harm, in accordance with OpenAI’s policies 29% more often.
There is clearly a lot to unpack with GPT-4. But OpenAI, for its part, is going full steam ahead, evidently confident in the improvements it has made.
“We hope that GPT-4 will become a valuable tool to improve people’s lives by powering many applications,” OpenAI wrote. “There is still a lot of work to be done, and we hope to improve this model through the collective efforts of the community building, exploring, and contributing to the model.”