Gemini 3
10 articles found in this topic.
GPT-5.2 Faces Widespread Criticism and Underperforms Against Gemini 3 Pro in Benchmarks
OpenAI's GPT-5.2 has drawn significant criticism and trails Google's Gemini 3 Pro on several benchmarks, including Epoch AI's ECI score, according to third-party evaluations. OpenAI has issued a "red alert" and re-prioritized, while Google re-emerges as an AI frontrunner on the strength of Gemini 3 Pro's performance.
AI Models Struggle with Six-Fingered Hands, Exposing Architectural Limitations
AI models consistently miscount fingers, especially when a hand has more than five digits, a phenomenon dubbed the "finger problem." The failure highlights architectural limitations and biases in visual reasoning that stem from pre-training data dominated by five-fingered hands.
Google's Gemini 2.5 Flash Native Audio Model Enhances Real-Time Speech Translation
Google's new Gemini 2.5 Flash native audio model significantly enhances real-time speech translation by processing sound directly, preserving intonation, and enabling more natural AI interactions. It aims to humanize AI communication, supporting features such as Live Speech Translation and Style Transfer in more than 70 languages.
Google Reduces Free Gemini API Access, Prompting Developer Concerns
Google has drastically cut the free Gemini API daily request limit from 250 to 20, affecting developers and small projects. The unannounced change, which also removes the Pro series from the free tier, has sparked significant developer backlash. The move suggests a strategic shift towards profitability after attracting users with extensive free access.
Former DeepMind Researchers Achieve SOTA in AI Reasoning with Poetiq Meta-System
Former DeepMind researchers at Poetiq have developed a meta-system that optimizes large language models, achieving state-of-the-art performance on the ARC-AGI-2 leaderboard. Their system delivers 54% accuracy at half the cost of previous methods, leveraging existing models to autonomously generate strategies for specific tasks. This innovation establishes a new Pareto frontier for AI reasoning.
OpenAI Releases GPT-5.2, Targeting Enterprise Productivity and Accuracy
OpenAI has launched its GPT-5.2 model for ChatGPT paid users and developers, following a 'Code Red' alert. Available in Instant, Thinking, and Pro versions, it targets enterprise productivity with significant improvements in programming, complex task handling, and reduced hallucination rates. The model aims to generate economic value and enhance workplace applications.
Google Introduces Gemini Deep Research Agent, New Benchmarks, and API Amid AI Competition
Google has unveiled its Gemini Deep Research Agent, designed for complex, long-term information synthesis. This release also includes the DeepSearchQA benchmark and the Interactions API, enhancing AI research and development capabilities. The agent demonstrates strong performance in new benchmarks, competing with top AI models.
OpenAI's GPT-5.2 Surfaces on Cursor Amid Intensifying AI Competition
OpenAI's GPT-5.2, codenamed 'Project Garlic,' has surfaced in the Cursor IDE, signaling OpenAI's focus on programming and advanced reasoning. This model aims to compete directly with Google's Gemini 3, offering enhanced capabilities in mathematical and academic reasoning, efficiency, and reliability. OpenAI is also developing 'Shallotpeat,' an even larger model.
OpenAI Tests "Emperor" Model and New "Penguin" Family, Memory Search for ChatGPT Revealed
OpenAI is reportedly testing new "Penguin family" models, including the flagship "Emperor," and a "memory search" feature for ChatGPT. These developments signal an accelerated product roadmap amidst increasing competition and user feedback. The new features aim to enhance model performance and user experience.
Google Introduces Gemini 3 Deep Think, Achieving High Scores in Advanced AI Benchmarks
Google has introduced Gemini 3 Deep Think, a new deep reasoning AI model with parallel thinking capabilities. It achieved high scores on advanced benchmarks such as ARC-AGI-2, Humanity's Last Exam, and GPQA Diamond, significantly outperforming competitors such as GPT-5.1.