Goblin
News
AI news by
promptgoblins.ai
Filtered by:
reinforcement-learning
Yesterday
7
Wafer Startup Uses AI to Erode Nvidia's Software Moat on Rival Chips
Markets
1
3h ago
Last Week
6
Sol-RL Achieves 2.4x Faster Diffusion Model RL Training via FP4/BF16 Two-Stage Design
Research
1
5d ago
7
Tesla FSD v14.3: Fleet Learning, MLIR Compiler Rewrite, 20% Faster Reactions
Products
1
Apr 8
7
SandMLE Framework Makes On-Policy RL Training Tractable for ML Engineering Agents
Research
1
Apr 8
7
Generalist's GEN-1 Robot Claims 99% Success Rate on Dexterous Tasks
Models
1
Apr 7
7
Arcee Releases Trinity Large Thinking: 400B Open-Weight Reasoning Model on $20M Budget
Models
2
Apr 7
6
Taxonomy of RL Environments for LLM Agents: A Framework for What Models Actually Practice On
Research
1
Apr 6
6
Three-Layer Framework for Continual Learning in AI Agents: Model, Harness, and Context
Research
1
Apr 6
2 Weeks Ago
6
The 'Straight Lines on Graphs' Thesis: AI Progress Is Regular and Predictable
Research
1
Apr 4
7
DeepMind Research: Predicting When RL Training Breaks CoT Monitorability
Research
1
Apr 3
7
Aurora: Open-Source RL Framework Makes LLM Speculative Decoding Self-Improving
Research
1
Apr 3
6
NomadicML Raises $8.4M to Structurize Autonomous Vehicle Fleet Data
Markets
1
Apr 3
6
DexDrummer: Robot System Achieves Real-World Drumming via Dexterous Bimanual Manipulation
Research
1
Apr 3
6
Yupp Shuts Down After Raising $33M, Citing AI Market Shift
Markets
1
Mar 31
6
Agent Labs: Vertical Model Training vs. Agent Engineering as Competing Strategies
Products
1
Mar 31
6
AlphaGo at 10: How a Go-Playing Algorithm Became the Blueprint for Modern AI Reasoning
Research
1
Mar 31
8
DGM-Hyperagents: Self-Improving AI That Rewrites Its Own Improvement Process
Research
1
Mar 30
3 Weeks Ago
7
Cursor: Real-Time RL Trains Composer Every Five Hours
Products
1
Mar 27
6
AWS Bedrock Adds Reinforcement Fine-Tuning with OpenAI-Compatible API Support
Updated
Products
2
Apr 8
6.82
Semantic Calibration in LLMs: Why Base Models Know What They Know
Research
1
Mar 25
8
NVIDIA Releases Nemotron-Cascade 2: Open 30B MoE Achieves Gold-Medal Reasoning with 20x Efficiency
Models
1
Mar 23
Last Month
8
Moonshot AI Releases Kimi K2 Open-Source Model and Kimi-Researcher Agent
Models
1
Mar 20
7
Google DeepMind Algorithm Achieves 10x RLHF Data Efficiency
Research
1
Mar 20
7
Cursor Launches Composer 2: Frontier-Level Coding Model with Sharp Benchmark Gains
Models
2
Mar 20
7
MiniMax Launches M2.7 Model With Self-Improving Agent Architecture
Models
1
Mar 19
6
AWS Nova Forge SDK: LLM Customization via SFT+RFT Pipeline
Products
2
Mar 19
6
Amazon Brings Alexa+ to the UK in First International Expansion
Products
1
Mar 19
7
Cursor Trains Composer Model to Self-Summarize Long Agent Contexts
Products
1
Mar 18
6
Why AI Still Cannot Write Well, Despite Ingesting All Literature
Research
1
Mar 17
7
LATENT: Humanoid Robot Learns Competitive Tennis Skills from Imperfect Motion Data
Research
1
Mar 15
7
OpenClaw-RL: Open-Source Async RL Framework Trains Agents from Live Conversations
Open Source
1
Mar 13
7
Reasoning Boosts Factual Recall in LLMs — Even for Simple Single-Hop Questions
Research
1
Mar 13
Signal
Title
Category
Sources
Posted