Goblin
News
AI news by
promptgoblins.ai
Filtered by:
inference
April
6
LaDiR: UC San Diego Team Proposes Latent Diffusion Framework to Overcome LLM Reasoning Limits
Research
1
Apr 30
7
IBM Granite 4.1: Dense LLMs With 512K Context Released Open Source
Open Source
1
Apr 29
7
SenseTime Releases SenseNova U1: Open-Source Image Model With Native Image Reasoning
Models
1
Apr 29
8
DeepSeek Slashes V4-Pro API Prices 75%, Intensifying War With US AI Labs
Top
Models
1
Apr 28
6
Agentic AI Shift Opens Door to CPUs and ASICs Over GPUs, Analysts Say
Infra
1
Apr 28
6
Anthropic Batch API: 50% Cost Savings for Agent Fleets, Wrong Tool for Single Agents
Products
1
Apr 28
9
DeepSeek Launches V4 Flash and V4 Pro, Claims Frontier-Level Performance
Updated
Top
Models
4
May 4
Yesterday
7
Holo3.1: Local Computer-Use Agents Across Desktop, Mobile, and Web
Models
1
10h ago
Last Week
7
Kog AI Claims 3,000 Tokens/s Single-Request Inference on Standard GPUs
Infra
1
3d ago
8
Groq Seeks $650M Raise From Existing Investors to Scale Inference Cloud After Nvidia Licensing Deal
Markets
1
4d ago
6
ByteDance Developing Its Own CPUs to Reduce Supply Chain Dependency
Infra
1
4d ago
7
NVIDIA LocateAnything: Parallel Box Decoding Breaks VLM Grounding Speed-Accuracy Tradeoff
Research
1
5d ago
6
General Compute Raises $15M Seed to Build SambaNova-Powered Inference Neocloud
Infra
1
5d ago
8
OpenRouter Raises $113M Series B at $1.3B Valuation as Multi-Model Gateway Demand Surges
Top
Markets
1
May 26
2 Weeks Ago
8
NVIDIA Nemotron-Labs Launches Diffusion Language Models That Generate Tokens in Parallel
Top
Models
1
May 23
7
Software Efficiency, Not Hardware, Is Driving AI Inference Cost Collapse
Markets
1
May 22
7
AI's Cost Paradox: Cheaper Tokens, Bigger Bills Hit Microsoft and Uber
Updated
Enterprise
5
4d ago
7
SageMaker AI Gains OpenAI-Compatible API for Inference Endpoints
Products
1
May 21
7
LiteFrame Cuts Video LLM Inference Latency 35% with Compact Encoder
Research
1
May 21
7
LongLive 2.0: Real-Time Long Video Generation at 45.7 FPS
Open Source
1
May 20
7
Cerebras Runs Kimi K2.6 at 981 Tokens/sec — 29x Faster Than Official Endpoint
Infra
1
May 19
7
Multiscreen Architecture Matches Transformers with 30% Fewer Parameters
Research
1
May 19
7
NVIDIA Vera Rubin Enters Full Production: Pod-Scale AI Factories Ramping Worldwide
Updated
Infra
2
2d ago
6
OlmoEarth v1.1: 3x Compute Reduction for Satellite Imagery AI
Models
1
May 19
3 Weeks Ago
7
SlimQwen: Alibaba Compresses 80B MoE Model to 23B via Pruning and Distillation
Research
1
May 15
7
X Open-Sources Updated For You Feed Algorithm with Grok-Based Ranker
Open Source
1
May 15
6
Async Continuous Batching Eliminates 24% GPU Idle Time in LLM Inference
Research
1
May 15
6
OpenSquilla Launches Open-Source AI Agent Runtime to Cut Token Costs
Open Source
1
May 15
6
Tübingen Researchers Propose Parallel-Stream Architecture to Unblock LLMs
Research
1
May 14
7
Modal Explains Four Ingredients for Serverless GPU Scaling
Infra
1
May 13
6
Parameter Golf ML Challenge: Lessons from 2,000 Submissions
Research
1
May 13
7
NVIDIA-Backed Sparsity Technique Reported to Deliver 20% LLM Speedup on H100 GPUs
Research
1
May 12
7
Cactus Open-Sources Needle: 26M Parameter Tool-Calling Model for Consumer Devices
Open Source
1
May 12
7
DwarfStar 4: 284B DeepSeek Model With 1M Token Context Runs Locally on MacBook Pro
Infra
1
May 12
7
AutoTTS: Agentic Framework Auto-Discovers LLM Test-Time Scaling Strategies
Research
1
May 12
6
Nvidia Hits Record $219 Close as AI Trade Accelerates Pre-Earnings
Markets
1
May 12
6
Normalizing Trajectory Models Enable High-Quality Few-Step Diffusion with Exact Likelihood
Research
1
May 12
8
Cerebras IPO: $5.55B Raised, Stock Doubles; Chip Is 58x Larger Than Nvidia's B200
Updated
Top
Infra
9
May 22
Last Month
7
TokenSpeed: Compiler-Backed LLM Inference Engine Built for Agentic Coding Workloads
Infra
1
May 7
8
Subquadratic Debuts 12M-Token Context Window with Linear Scaling Architecture
Models
1
May 6
6
Benchmark: Computer Use Agents Cost 45x More Than Structured API Agents
Research
1
May 6
6
vLLM V1 Migration: Four Fixes Required for RL Training Parity
Open Source
1
May 6
7
DigitalOcean Launches AI-Native Cloud at Deploy 2026 with 15 New Products
Infra
1
May 5
7
AMD Forecasts Q2 Revenue Above Expectations on Strong AI Chip Demand
Markets
1
May 5
7
Google Releases MTP Drafters for Gemma 4, Enabling Up to 3x Faster Inference
Updated
Products
2
May 6
6
GPT-5.5 Real-World Cost Increase Exceeds Nominal 2x Price Hike
Models
1
May 5
6
DeepClaude: Claude Code Agent Loop Powered by DeepSeek V4 Pro
Open Source
1
May 4
6
SageMaker AI Adds Automatic Instance Fallback for GPU Capacity Gaps
Products
1
May 4
6
SMG: Rust Gateway Disaggregates CPU Work from GPU Inference to Kill GIL Bottleneck
Infra
1
May 1
6
KV Cache Locality: How Load Balancing Drives Up LLM Serving Costs
Infra
1
May 1
Signal
Title
Category
Sources
Posted