OpenAI has announced the release of its latest model, GPT-4, a large multimodal model that accepts image and text inputs and produces text outputs. The company says that although the model is less capable than humans in many real-world scenarios, it exhibits human-level performance on various professional and academic benchmarks.

For example, GPT-4 reportedly passes a simulated bar exam with a score around the top 10% of test takers. This is a significant improvement over GPT-3.5, whose score was around the bottom 10%.

To develop GPT-4, OpenAI spent six months iteratively aligning the model using lessons from its adversarial testing program as well as from its own ChatGPT system, the company said in a statement. It added, “Over the past two years, we rebuilt our entire deep learning stack and, together with Azure, co-designed a supercomputer from the ground up for our workload. A year ago, we trained GPT-3.5 as a first ‘test run’ of the system. We found and fixed some bugs and improved our theoretical foundations.”

“As a result, our GPT-4 training run was (for us at least!) unprecedentedly stable, becoming our first large model whose training performance we were able to accurately predict ahead of time. As we continue to focus on reliable scaling, we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance—something we view as critical for safety.”

OpenAI is releasing GPT-4's text input capability via its ChatGPT system and API, but the image input capability will be made available to a single partner first before being rolled out more widely. The company is also open-sourcing its OpenAI Evals framework, which automates the evaluation of AI model performance, to allow anyone to report shortcomings in OpenAI's models and help guide further improvements.

How is GPT-4 better than GPT-3.5?