GPT-4 passed a simulated bar exam with a score around the top 10% of test takers, whereas GPT-3.5 (which was used to build ChatGPT) scored around the bottom 10%. GPT-4 is also more reliable, more creative, and able to handle much more nuanced instructions, and it surpasses ChatGPT in advanced reasoning. In 24 of the 26 languages tested, GPT-4 outperformed the English-language performance of GPT-3.5 and other large language models (Chinchilla, PaLM), according to OpenAI. OpenAI also claims that GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5.