Goblin News
AI news by promptgoblins.ai
Filtered by: Benchmarks
Monday
Inception Labs' Mercury 2 Diffusion LLM Outperforms Google's DiffusionGemma on Speed and Reasoning Benchmarks
Signal 7 · Models · 1 src · 1d ago
Sakana AI's AB-MCTS Algorithm Enables Multi-Model Collective Intelligence, Beats Individual Frontiers on ARC-AGI-2
Signal 6 · Research · 1 src · 1d ago
Last Week
NVIDIA Blackwell Platform Sweeps All 7 MLPerf Training 6.0 Benchmarks at Record 8,192-GPU Scale
Signal 7 · Infra · 1 src · 6d ago
OpenAI Launches LifeSciBench: Expert-Written AI Benchmark for Life Science Research
Signal 7 · Research · 1 src · 6d ago
LangSmith Launches Shareable Benchmark System for Community-Driven LLM Architecture Evaluation
Signal 6 · Products · 1 src · Jun 16
AARRI-Bench: Best AI Agents Score Only 68.3% on Research Intern Tasks, Revealing Key Judgment Gaps
Signal 6 · Research · 1 src · Jun 15
2 Weeks Ago
Recursive's Automated AI Research System Achieves State-of-the-Art on Three ML Benchmarks
Signal 8 · Research · 2 srcs · 1d ago · Updated · Top
Xiaomi Open-Sources MiMo Code Agentic Coding Assistant, Claims to Outperform Claude Code on SWE Benchmarks
Signal 7 · Open Source · 1 src · Jun 12 · Top
New Benchmark Reveals Best ASR Models for Bilingual Code-Switched Voice Agents
Signal 6 · Research · 1 src · Jun 9