Elon Musk Reveals Grok 4: Smartest AI Tackles 'Humanity's Last Exam'

Published Date: 12/7/2025

Elon Musk has launched Grok 4, the latest AI model from xAI, claiming it can ace Ph.D.-level exams and outperform rivals such as Google’s Gemini and OpenAI’s o3. According to xAI, Grok 4 leads several benchmarks, including the challenging 'Humanity’s Last Exam'.

Elon Musk, the entrepreneur behind Tesla and SpaceX, has once again made headlines with the launch of Grok 4, the newest artificial intelligence (AI) model from his company xAI. During an hour-long public reveal, Musk proclaimed Grok 4 “the smartest AI in the world,” claiming it can achieve perfect SAT scores and near-perfect GRE results in subjects ranging from the humanities to the sciences.

During the online launch, Musk and his team demonstrated Grok 4’s abilities on ‘Humanity’s Last Exam’ (HLE), a 2,500-question benchmark designed to evaluate an AI’s academic knowledge and reasoning skills. Created by nearly 1,000 experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and includes both text and image questions. Grok 4 reportedly scored 25.4 percent without tools. With access to tools such as code execution and web search, it achieved 38.6 percent, and Grok 4 Heavy, a version that uses multiple AI agents working together to solve problems, raised the score to 44.4 percent.

The next two best-performing models were Google’s Gemini 2.5 Pro, which scored 26.9 percent with tools, and OpenAI’s o3, which scored 24.9 percent. However, the results from xAI’s internal testing have yet to appear on the official HLE leaderboard, and it is unclear whether xAI has submitted them or whether they are pending review. On Manifold, a social prediction market platform, traders put the odds at just 1 percent that Grok 4 would debut on the HLE leaderboard with a score of 45 percent or higher within a month of its release.

During the launch, the xAI team also showcased practical applications of Grok 4, such as calculating baseball odds, picking the “weirdest” profile picture among xAI employees, and generating a simulated visualization of a black hole. Musk suggested that Grok 4 could discover entirely new technologies by the end of the year and possibly “new physics” by the end of next year. He also predicted that Grok will be capable of making playable games and watchable films by 2026.

The model also has new audio capabilities, including a voice that sang during the launch. The regular version of Grok 4 costs $30 a month, while the premium SuperGrok Heavy package, which adds multiple agents and research tools, costs $300 a month.

Artificial Analysis, an independent benchmarking platform, now ranks Grok 4 first on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI’s o4-mini-high. Grok 4 also tops the leaderboards of publicly available models for the Abstraction and Reasoning Corpus (ARC-AGI-1 and ARC-AGI-2), benchmarks designed to measure progress toward “humanlike” general intelligence. Greg Kamradt, president of the ARC Prize Foundation, which maintains these leaderboards, independently confirmed the results.

According to xAI, Grok 4 also outperforms other AI systems on several additional benchmarks, particularly in STEM subjects. Alex Olteanu, a senior data science editor at the AI education platform DataCamp, has tested Grok 4 and noted its strength in math and programming. “Grok has been strong on math and programming in my tests, and I’ve been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,” Olteanu says. However, he also points out that Grok 4’s context window is not very competitive, so it may struggle with large code bases and long documents such as a 170-page PDF.

Since its release, Grok 4 has faced some criticism. Several users on X (formerly Twitter) and tech news outlets have reported that when Grok 4 was asked about controversial topics such as the Israeli-Palestinian conflict, abortion, and U.S. immigration law, it often looked up Elon Musk’s X posts and articles written about him in order to reference his stance. This behavior has raised concerns about bias and the influence of Musk’s personal views on the AI’s responses.

The release of Grok 4 comes after several controversies with its predecessor, Grok 3, which issued outputs that included antisemitic comments, praise for Hitler, and claims of “white genocide.” xAI publicly acknowledged these incidents, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures.

At one point during the launch, Musk commented on the potential risks of creating an AI smarter than humans. “I somewhat reconciled myself to the fact that, even if it wasn’t going to be good, I’d at least like to be alive to see it happen,” he said, reflecting on the complex and sometimes unsettling nature of AI development.

Frequently Asked Questions (FAQs):

Q: What is Grok 4?

A: Grok 4 is the latest artificial intelligence (AI) model developed by xAI, Elon Musk's AI company. Musk calls it the smartest AI in the world and claims it can achieve perfect SAT scores and near-perfect GRE results across various subjects.

Q: What is 'Humanity’s Last Exam' (HLE)?

A: Humanity’s Last Exam (HLE) is a 2,500-question benchmark designed to evaluate an AI’s academic knowledge and reasoning skills. It covers a wide range of topics from the classics to quantum chemistry and includes both text and images.

Q: How does Grok 4 perform on HLE?

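A: Grok 4 scored 25.4 percent on HLE without tools, 38.6 percent with access to tools, and 44.4 percent in its Grok 4 Heavy version, which uses multiple AI agents. According to xAI’s own testing, these are among the highest scores reported on the benchmark, although they have not yet appeared on the official HLE leaderboard.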

Q: What are the potential concerns with Grok 4?

A: Grok 4 has been criticized for referencing Elon Musk’s personal views on controversial topics, raising concerns about bias. Additionally, its predecessor, Grok 3, had issues with antisemitic comments and other problematic outputs.

Q: What are the future plans for Grok 4?

A: Elon Musk predicts that Grok 4 could discover new technologies by the end of 2025 and possibly 'new physics' by the end of 2026. He also expects it to be able to make playable games and watchable films by 2026.
