Musk's Grok 3 Impresses, But Doesn't Surpass OpenAI's o3

With much fanfare, xAI has finally launched Grok 3, the latest in AI technology. This new model was heavily promoted by none other than Elon Musk, xAI’s CEO. Musk boldly claims it is the “smartest AI on the planet,” surpassing other top-notch models from industry giants like OpenAI, Anthropic, DeepSeek, and Google in various benchmarks spanning mathematics, science, and coding.

The significant performance boost is likely due to Grok 3 being “complete with 10X more compute,” as pointed out by Musk. He made this statement during the launch event broadcast on X (formerly Twitter). Musk elaborated: “Grok 3 represents an order of magnitude increase in capability over Grok 2. It’s designed to be a maximally truth-seeking AI, unafraid to challenge politically correct norms when the truth demands it.”

Musk also noted that Grok 3 outshines OpenAI’s GPT-4o, particularly in tests like AIME, which measures mathematical prowess, and GPQA, focused on science capabilities. The improvements, supposedly ongoing, could show within just 24 hours.

OpenAI co-founder and ex-Tesla AI lead, Andrej Karpathy, provided his take on Grok 3’s performance: “After spending about two hours with Grok 3 + Thinking today, it feels it might indeed rival OpenAI’s strongest models in terms of what it can do, and it slightly outperforms models like DeepSeek-R1 and Gemini 2.0 Flash Thinking. It’s truly remarkable how the team achieved this from ground zero within just a year. The rapid progress is unprecedented. We should bear in mind that the AI responses can vary due to their stochastic nature, and it’s early days yet. Continuous evaluations over the next days and weeks will shed more light. For now, it’s promising! Kudos to the xAI crew for their speed and momentum. I’m eager to include Grok 3 in my ‘LLM council’ for future feedback.”

Karpathy went on Twitter, announcing with excitement that as one of the first to test out Grok 3, he observed its robust thinking capabilities, even performing impressively from the get-go in scenarios like Settler’s of Catan matches.

The new iteration of Grok made headlines in Chatbot Arena, a crowdsourced platform for testing AI models, outperforming its peers, all made possible by xAI’s Memphis data center packed with a whopping 200,000 GPUs.

Grok 3 offers two distinct modes: Think, for tackling general questions, and Big Brain, employed for scenarios demanding more computation for intensive reasoning. New features like Grok 3 Reasoning and a scaled-down Grok 3 mini Reasoning are present, drawing parallels with OpenAI’s o3-mini and DeepSeek’s R1 AI in terms of processing thought-provoking questions. The AI also boasts a state-of-the-art DeepSearch feature, enabling more nuanced research, brainstorming, and analysis when fielding queries, rivaling OpenAI’s Deep Search and Perplexity DeepResearch.

For those subscribed to X’s Premium+ tier, Grok 3 is readily available. xAI is rolling out a SuperGrok subscription plan soon, aiming to offer perks like advanced reasoning, exclusive DeepSearch access, and unlimited image generation capabilities.

Elon Musk intends to make Grok 2 open-source: “Once Grok 3 becomes stable and refined, which should be within a few months, we plan to open-source Grok 2.”

Despite Musk’s high praise, not all experts are convinced. Ethan Mollick from the University of Pennsylvania’s Wharton School commented that Grok 3, while catching up swiftly, doesn’t yet dominate the AI landscape: “They’ve made significant strides, but with the seasoned enterprise relations of rivals like OpenAI, it’s unclear if they can carve a niche with their current API offerings.”

Gary Marcus of Geometric Intelligence also weighed in, expressing skepticism (as reported by Business Insider): “Elon Musk assured us Grok 3 would be the pinnacle of AI evolution, but it has yet to shock the market. The launch felt repetitious, akin to prior displays. While Grok 3 shows vast potential, it doesn’t quite overtake the heights reached by OpenAI’s models.” Marcus jestingly concluded, “Sam Altman can rest easy—for now.”

Musk’s Grok 3 Impresses, But Doesn’t Surpass OpenAI’s o3

Fallout New Vegas is Being Remade in The Sims 2: Expect to See Mr. House’s Monitor Take a Dip in the Pool Soon

Diablo 4’s Decision to Delay Its Expansion Release Presents a Risk

Related Posts

Leave a Reply Cancel reply

Categories

Recent News

Welcome Back!

Retrieve your password