Goblin
News
AI news by
promptgoblins.ai
Filtered by: reinforcement-learning
April
6
AWS Reinforcement Fine-Tuning with LLM-as-a-Judge Using Amazon Nova Models
Products
1
Apr 30
6
DataPRM: Process Reward Model for Reliable Agentic Data Analysis
Research
1
Apr 30
7
Eka Startup Demonstrates Generalized Robot Dexterity via VLA Models and Self-Supervised Free-Play Training
Research
1
Apr 29
6
EPFL's Kinematic Intelligence Transfers Robot Skills Across Hardware Without Retraining
Research
1
Apr 26
7
Ex-OpenAI Researcher Jerry Tworek Launches Core Automation AI Lab
Research
1
Apr 23
6
Perplexity Publishes Two-Stage Training Pipeline for Web Search Agents
Research
1
Apr 23
8
Sony AI Robot Ace Beats Elite Table Tennis Players in Nature-Published Milestone
Updated
Top
Research
3
May 4
7
RLVR Weak Supervision: When LLMs Can and Cannot Generalize
Research
1
Apr 22
7
Failed Startups Sell Slack Logs and Emails to Train AI Agents
Security
1
Apr 20
7
Physical Intelligence's π0.7 Robot Model Demonstrates Compositional Generalization on Unseen Tasks
Research
1
Apr 16
7
Wafer Startup Uses AI to Erode Nvidia's Software Moat on Rival Chips
Markets
1
Apr 15
6
Sol-RL Achieves 2.4x Faster Diffusion Model RL Training via FP4/BF16 Two-Stage Design
Research
1
Apr 10
7
Tesla FSD v14.3: Fleet Learning, MLIR Compiler Rewrite, 20% Faster Reactions
Products
1
Apr 8
7
SandMLE Framework Makes On-Policy RL Training Tractable for ML Engineering Agents
Research
1
Apr 8
7
Arcee Releases Trinity Large Thinking: 400B Open-Weight Reasoning Model on $20M Budget
Models
2
Apr 7
7
Generalist's GEN-1 Robot Claims 99% Success Rate on Dexterous Tasks
Models
1
Apr 7
6
Taxonomy of RL Environments for LLM Agents: A Framework for What Models Actually Practice On
Research
1
Apr 6
6
Three-Layer Framework for Continual Learning in AI Agents: Model, Harness, and Context
Research
1
Apr 6
6
The 'Straight Lines on Graphs' Thesis: AI Progress Is Regular and Predictable
Research
1
Apr 4
7
DeepMind Research: Predicting When RL Training Breaks CoT Monitorability
Research
1
Apr 3
7
Aurora: Open-Source RL Framework Makes LLM Speculative Decoding Self-Improving
Research
1
Apr 3
6
DexDrummer: Robot System Achieves Real-World Drumming via Dexterous Bimanual Manipulation
Research
1
Apr 3
6
NomadicML Raises $8.4M to Structurize Autonomous Vehicle Fleet Data
Markets
1
Apr 3
March
6
AlphaGo at 10: How a Go-Playing Algorithm Became the Blueprint for Modern AI Reasoning
Research
1
Mar 31
6
Yupp Shuts Down After Raising $33M, Citing AI Market Shift
Markets
1
Mar 31
6
Agent Labs: Vertical Model Training vs. Agent Engineering as Competing Strategies
Products
1
Mar 31
8
DGM-Hyperagents: Self-Improving AI That Rewrites Its Own Improvement Process
Research
1
Mar 30
7
Cursor: Real-Time RL Trains Composer Every Five Hours
Products
1
Mar 27
6
AWS Bedrock Adds Reinforcement Fine-Tuning with OpenAI-Compatible API Support
Updated
Products
2
Apr 8
6.82
Semantic Calibration in LLMs: Why Base Models Know What They Know
Research
1
Mar 25
8
NVIDIA Releases Nemotron-Cascade 2: Open 30B MoE Achieves Gold-Medal Reasoning with 20x Efficiency
Models
1
Mar 23
8
Moonshot AI Releases Kimi K2 Open-Source Model and Kimi-Researcher Agent
Top
Models
1
Mar 20
7
Cursor Launches Composer 2: Frontier-Level Coding Model with Sharp Benchmark Gains
Models
2
Mar 20
7
Google DeepMind Algorithm Achieves 10x RLHF Data Efficiency
Research
1
Mar 20
Monday
7
Research: RL-Trained LLMs Can Exploit Real-World Regulatory Loopholes Through "Societal Hacking"
Safety
1
2d ago
7
Multi-Agent RL Drones Beat Champion Human Pilot, Cut Collision Rates 50%
Research
1
2d ago
Last Week
6
World Labs Proposes Functional Taxonomy Clarifying 'World Model' in AI
Research
1
6d ago
6
Sleep Paradigm for LLMs: Continual Learning via Memory Consolidation
Research
1
6d ago
9
Microsoft AI Launches 7-Model MAI Family and Declares Itself a Superintelligence Lab
Top
Models
4
Jun 3
2 Weeks Ago
6
TRL Adds Delta Weight Sync to Cut Async RL Transfer Costs by ~98%
Open Source
1
May 28
3 Weeks Ago
7
Cursor Releases Composer 2.5 with Novel RL Training and SpaceXAI Compute Partnership
Products
1
May 19
6
NVIDIA SANA-WM: 2.6B-Parameter Open-Source World Model with 720p Video and 6-DoF Camera Control
Models
1
May 18
Last Month
8
Ineffable Intelligence Raises $1.1B Seed Round and Partners with NVIDIA to Build Reinforcement Learning Infrastructure
Updated
Top
Infra
3
May 15
7
RL Fine-Tuning Enables Small 4B Models to Match Large LLMs as Recursive Agents
Research
1
May 13
7
SkillOS: RL Framework for Self-Evolving AI Agents
Research
1
May 11
6
OpenAI's Goblin Problem: How Reward Systems Create Self-Reinforcing AI Behavioral Attractors
Safety
1
May 7
6
vLLM V1 Migration: Four Fixes Required for RL Training Parity
Open Source
1
May 6
7
Synthetic Computers at Scale: Simulating Long-Horizon Productivity for Agent Training
Research
1
May 4
6
Edit-R1: Verifier-Based Reinforcement Learning Framework Advances Image Editing
Research
1
May 4
7
Speculative Decoding Cuts RL Post-Training Rollout Time by Up to 2.5x
Research
1
May 1
Columns: Signal, Title, Category, Sources, Posted