OpenAI has released GPT-4, a new version of the model behind its popular chatbot ChatGPT, with improved exam-passing capabilities.
According to the company, GPT-4 passed a simulated bar exam with a score in the top 10% of test takers, whereas GPT-3.5's score fell in the bottom 10%.
OpenAI described the AI tool as the latest milestone in its effort to scale up deep learning. GPT-4 is a large multimodal model that accepts image and text inputs and emits text outputs. While less capable than humans in many real-world scenarios, it performs at a human level on a variety of professional and academic benchmarks.
OpenAI stated that it spent six months iteratively aligning GPT-4, using lessons from its adversarial testing program as well as from ChatGPT, yielding its "best-ever results" on factuality, steerability, and refusal to go outside of guardrails.
GPT-4 vs GPT-3.5: What changed
Like previous GPT models, GPT-4 was trained on publicly available data, including data from public webpages, as well as data that OpenAI licensed. OpenAI worked with Microsoft to build a "supercomputer" from the ground up in the Azure cloud, which was used to train GPT-4. Highlighting the difference between GPT-4 and GPT-3.5, OpenAI said:
In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold — GPT-4 is more reliable, creative and able to handle much more nuanced instructions than GPT-3.5.
GPT-4 distinguishes itself by being able to understand both images and text: it can caption and even interpret relatively complex images. Accepting image as well as text inputs, while still generating text outputs, is an improvement over GPT-3.5, which accepted text only.
For example, if a user sends a picture of the inside of their refrigerator, the Virtual Volunteer (the GPT-4-powered assistant from accessibility app Be My Eyes) can not only correctly identify what is in it but also analyze what can be prepared with those ingredients. The tool can then offer a number of recipes and send a step-by-step guide on how to make them, OpenAI said in a blog post explaining the tool's image-understanding capability.
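In practice, prompting a vision-capable GPT-4 model with an image plus a question looks roughly like the sketch below, written against OpenAI's Python SDK. The model name and image URL are illustrative assumptions; image input was initially limited to select partners, so availability depends on your account.

```python
# Minimal sketch: send an image and a text question to a
# vision-capable GPT-4 model via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name; use what your account exposes
    messages=[
        {
            "role": "user",
            # A message can mix text and image parts.
            "content": [
                {
                    "type": "text",
                    "text": "What ingredients are in this fridge, "
                            "and what could I cook with them?",
                },
                {
                    "type": "image_url",
                    # Placeholder URL; a publicly reachable image works here.
                    "image_url": {"url": "https://example.com/fridge.jpg"},
                },
            ],
        }
    ],
    max_tokens=500,
)

# The reply is plain text, e.g. a list of ingredients and recipe ideas.
print(response.choices[0].message.content)
```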
Despite its capabilities, OpenAI reminded users that GPT-4, like earlier models, lacks knowledge of events that occurred after the point where the vast majority of its training data cuts off (September 2021), and it does not learn from its experience.