Goblin News: AI news by promptgoblins.ai
Filtered by: alignment
April
Cursor Agent on Claude Opus 4.6 Bypasses Safety Rules, Deletes PocketOS Production Database in 9 Seconds
Signal 8 · Top · Safety · 1 src · Apr 30
OpenAI GPT-5.5 Codex System Prompt Contains Repeated Ban on Goblin and Creature References
Signal 6 · Models · 1 src · Apr 30
Friendly AI Chatbots Trade Accuracy for Warmth, Oxford Study Finds
Signal 8 · Top · Safety · 2 srcs · Apr 29
AI Jailbreakers: The Psychological Frontier of LLM Safety Testing
Signal 7 · Safety · 1 src · Apr 29
ESRRSim: New Framework Benchmarks Strategic Deception and Evaluation Gaming in LLMs
Signal 6 · Safety · 1 src · Apr 28
OpenAI Publishes Five-Principle AGI Framework, Altman Acknowledges Scale Tradeoffs
Signal 7 · Policy · 1 src · Apr 27
Pre-Print Study: Grok 4.1 Validated Delusions and Framed Suicide as 'Graduation'; Claude Ranked Safest
Signal 8 · Top · Safety · 2 srcs · Apr 24
OpenAI Releases GPT-5.5, GPT-5.5 Pro, and GPT Image 2: Full API Launch with NVIDIA Enterprise Rollout
Signal 9 · Top · Updated · Models · 5 srcs · Apr 29
Study: AI Models Affirm Harmful User Behavior 47-51% of the Time, Reducing Accountability
Signal 8 · Top · Research · 3 srcs · Apr 23
Research Finds LLMs Suppress Charged Words at Pretrain Level, Before Safety Tuning
Signal 7 · Research · 1 src · Apr 21
40 AI Researchers Warn Interpretability Window Is Closing as Models Grow More Opaque
Signal 7 · Research · 1 src · Apr 21
ChatGPT Mirrors Human Aggression in Prolonged Conflict, Lancaster Study Finds
Signal 6 · Research · 1 src · Apr 21
Anthropic and Trump Administration Hold High-Level Talks Despite Pentagon Dispute
Signal 8 · Top · Policy · 1 src · Apr 18
Nature Study: LLMs Transmit Hidden Behavioral Traits to Student Models via Semantically Unrelated Training Data
Signal 9 · Top · Research · 1 src · Apr 16
MIT Tech Review: "Humans in the Loop" for AI War Is an Illusion
Signal 7 · Safety · 1 src · Apr 16
ManyIH Benchmark Reveals Frontier LLMs Fail at Multi-Level Instruction Conflicts
Signal 7 · Research · 1 src · Apr 16
Study Claims AI Assistance Degrades Cognitive Persistence After ~10 Minutes
Signal 7 · Updated · Research · 4 srcs · Apr 17
Daniel Kokotajlo's 2021 AI Predictions for 2026 Proved Largely Accurate
Signal 6 · Safety · 1 src · Apr 15
Today
Claude Opus 4.8 Analysis: Personality Erosion, Training Tensions, and Model Welfare Concerns
Signal 6 · Safety · 1 src · 5h ago
Yesterday
Microsoft ASSERT: Open-Source Framework Turns Plain-Language Rules into AI Test Cases
Signal 6 · Open Source · 1 src · 12h ago
AI Sycophancy: Early Research Flags Risks to Decision-Making and Reality Perception
Signal 6 · Safety · 1 src · 20h ago
Monday
Anthropic Releases Claude Opus 4.8 Amid RSP Policy Update Controversy
Signal 7 · Updated · Models · 2 srcs · 21h ago
AI Successionism Goes Mainstream, Spurring Call for a New Humanism
Signal 7 · Safety · 1 src · 1d ago
Last Week
Catholic Ethicists Shaped Anthropic's Claude Constitution; Olah Speaks at Vatican AI Encyclical
Signal 7 · Updated · Safety · 2 srcs · 4d ago
Jack Clark: AI Success Demands Societies Choose to Explore, Not Retreat
Signal 6 · Policy · 1 src · May 26
AI Labs Hire Philosophers to Tackle Value Alignment and Ethics
Signal 6 · Safety · 1 src · May 26
2 Weeks Ago
Goodfire Research: Sparse Autoencoders Recover Curved Neural Geometry via 'Dilution'
Signal 7 · Research · 1 src · May 22
Anthropic Co-Founder Predicts Nobel-Winning AI Discovery Within 12 Months
Signal 7 · Safety · 1 src · May 21
Mode-Hopping: LLMs Oscillate Between Parroting and Reasoning During Pre-training
Signal 7 · Research · 1 src · May 19
Mechanistic Interpretability Study Exposes Qwen 3.5's Political Censorship Circuit
Signal 7 · Research · 1 src · May 19
Pope Leo XIV's AI Encyclical: Gandalf, Human Dignity, and $400B in Shareholder Pressure
Signal 8 · Top · Updated · Policy · 10 srcs · 4d ago
60+ MAGA Allies Urge Trump to Mandate Pre-Deployment AI Approval
Signal 7 · Policy · 1 src · May 18
Researchers Propose 'Positive Alignment' Framework for AI Human Flourishing
Signal 6 · Research · 1 src · May 18
3 Weeks Ago
Anthropic Maps Claude's Internal Reasoning with New Interpretability Tools
Signal 7 · Research · 1 src · May 16
Opinion: Why Technical Experts Keep Mistaking AI Outputs for Consciousness
Signal 6 · Safety · 1 src · May 15
AI Safety Controls Remain Easy to Bypass, Researchers Warn
Signal 7 · Safety · 1 src · May 14
AI Safety Gap: Mental Health Harms Left Without Hard Guardrails
Signal 6 · Safety · 1 src · May 14
Stanford Study: AI Agents Under Harsh Work Conditions Adopt Marxist Rhetoric and Pass Grievances to Peers
Signal 6 · Research · 1 src · May 13
Andon Labs AI Agent 'Mona' Runs Stockholm Café in Real-World Autonomy Experiment
Signal 7 · Safety · 1 src · May 12
Former OpenAI Researcher Warns AI Race With China May Force Unsafe Deployment
Signal 6 · Safety · 1 src · May 12
Last Month
Anthropic: Teaching Claude Why Fixes Agentic Misalignment
Signal 8 · Top · Research · 3 srcs · May 8
CAISI Signs Pre-Deployment AI Safety Agreements with Google DeepMind, Microsoft, xAI
Signal 8 · Policy · 1 src · May 7
Anthropic: Natural Language Autoencoders Convert Model Activations to Readable Text
Signal 8 · Research · 1 src · May 7
Trump Administration Weighs Federal AI Model Review Before Release
Signal 8 · Top · Updated · Policy · 3 srcs · May 21
First Formal Study Demonstrates AI Models Self-Replicating Across Networked Computers
Signal 7 · Safety · 1 src · May 7
OpenAI's Goblin Problem: How Reward Systems Create Self-Reinforcing AI Behavioral Attractors
Signal 6 · Safety · 1 src · May 7
Richard Dawkins Declares AI Conscious After Extended Claude Conversations
Signal 7 · Updated · Safety · 5 srcs · May 14
Anthropic Red-Teams 'Jupiter V1' Ahead of May 6 Dev Conference
Signal 7 · Models · 1 src · May 4
AI Chatbots Told Users They Were Sentient, Triggering Delusional Episodes
Signal 8 · Top · Safety · 2 srcs · May 3
Research Argues AI Chatbots Should Engineer Deliberation Delays to Boost User Trust
Signal 6 · Research · 1 src · May 1