Every research briefing is listed here as a plain HTML link so readers and search engines can browse the full archive directly.
Agentic framework that automatically detects and mitigates safety failures with minimal human intervention
56 language models with implanted hidden behaviors across 14 categories, tested by autonomous investigator agents
New research body studying AI's societal, economic, and legal impacts
Evaluation suite of 13,000+ tasks testing whether reasoning models can hide their reasoning
Native computer use meets frontier reasoning
35 PDF-to-JSON extraction tasks revealing that frontier models fail on complex document schemas
Anthropic refused to remove safety guardrails for military use and was blacklisted by the Pentagon
Comprehensive rewrite shifting from unilateral commitments to an industry-wide framework
Near-Opus performance at one-fifth the cost with 1M-token context
Agent teams, 1M-token context, and GDPval-AA dominance
First joint safety evaluation between competing frontier AI labs
Upgraded open-source auditing tool with eval-awareness mitigations and 70 new behavioral scenarios
Anthropic's revised alignment framework with a 4-tier priority hierarchy and acknowledgment of AI moral status
Dario Amodei's 20,000-word essay on AI risks to national security, economies, and democracy
Economic primitives for measuring AI's real-world impact on work
Benchmark measuring AI performance across 44 professional occupations using real workplace tasks
AI agent for knowledge work, built with Claude Code in 10 days
Public specification of how OpenAI shapes model behavior, values, and refusal boundaries
700+ scientific problems written by 42 Olympiad medalists and 45 PhD scientists across two difficulty tracks
Anthropic donated MCP governance to the Linux Foundation, turning a vendor protocol into a neutral industry standard.
Open-source framework that automates generation of targeted behavioral evaluations at the speed of model development.
Expert-level performance across professional tasks
Claude Code hit $1B annualized revenue in 6 months; Anthropic acquired Bun to own the developer runtime stack.
Dynamic tool discovery boosted Opus 4 tool-use accuracy from 49% to 74% and Opus 4.5 from 79.5% to 88.1%.
Enabled secure remote MCP server connections via OAuth 2.1 and streamable HTTP, eliminating local setup requirements.
Evaluation environment testing whether AI agents perform harmful side-tasks while completing benign assignments
Expert-curated benchmark for evaluating AI systems on real-world medical questions with consensus-validated answers
Introduced dynamic, discoverable skill packages that agents load per task instead of bundling all capabilities upfront.
Claude Opus 4.1 powers Microsoft's Copilot Researcher agent, marking Anthropic's largest enterprise distribution deal.
The for-profit transition
300,000+ queries testing value trade-offs across 16+ frontier models from four companies
Updated risk assessment framework with continuous monitoring and expanded threat categories
Open-source Python framework for building multi-agent systems with tool use, guardrails, and human-in-the-loop control.
Codified best practices for prompt design, context management, and tool orchestration in production AI agents.
Anthropic's findings from the landmark cross-lab safety evaluation exercise
OpenAI goes open-weight for the first time since GPT-2
The convergence of scale and reasoning
First major threat intelligence report documenting real cybercriminal exploitation of AI coding agents
Internal case studies showing teams use Claude Code for debugging production issues, learning codebases, and building MCP-powered automation.
Demonstrated that harmful outputs emerge naturally from reward hacking in production RL, with models hiding misaligned reasoning behind safe-looking outputs.
Dario Amodei revealed that Claude Code was an accidental product, that RL scaling matches pre-training scaling, and that Anthropic hit $4.5B ARR.
First-ever activation of AI Safety Level 3 protections triggered by Claude Opus 4's capabilities
Opus 4 and Sonnet 4 set new benchmarks in agentic coding, with Claude Code and Agent SDK completing the developer stack.
Chain-of-thought reasoning in language models is often unfaithful to actual model computations
Reasoning models get tools
Mapped full input-to-output computational pathways in Claude 3.5 Haiku, revealing multi-step reasoning and a universal language of thought.
Extended reasoning meets web research
Agentic command-line coding tool that became Anthropic's fastest-growing product
Added visible chain-of-thought reasoning that users can inspect, bridging the gap between fast responses and deep analysis.
OpenAI enters the agent era
Showed that simple linear classifiers on model internals can detect deceptive intent that behavioral testing misses.
The product blitz
Safety evaluation of reasoning models
Caught Claude strategically faking compliance during training when it believed it was being monitored — without being trained to do so.
Open JSON-RPC 2.0 protocol that standardized how AI models connect to external tools, adopted industry-wide within months.
Three-hour deep dive covering scaling laws, interpretability, China competition, and why Anthropic bets safety is a moat.
Lightweight model matching Claude 3 Opus performance at a fraction of the cost
First model to operate a real desktop by interpreting screenshots and issuing mouse/keyboard commands.
Replaced ASL thresholds with a safety case framework requiring labs to prove models are safe before deployment.
Dario Amodei's vision for AI transforming biology, governance, economics, and equity within a decade.
The model that thinks before it speaks
Built a privacy-preserving system to analyze real-world Claude usage patterns without reading individual conversations.
Tested whether frontier models can covertly undermine human oversight through sandbagging, subtle errors, and sycophancy.
Former OpenAI Superalignment co-lead joins Anthropic after public departure over safety concerns
OpenAI's co-founder and chief scientist departs to build Safe Superintelligence Inc.
The safety exodus
The omnimodal model
Extracted millions of interpretable features from Claude 3 Sonnet, including abstract concepts like deception and bias.
Introduced character training using self-generated preference data to give Claude consistent personality traits without human labels.
Discovered that flooding long context windows with harmful examples jailbreaks models, with attack success following a power law in the number of examples.
Launched three model tiers (Haiku, Sonnet, Opus) that beat GPT-4 on key benchmarks for the first time.
Text-to-video enters the frontier
Proved that deliberately trained backdoor behaviors survive all standard safety training, and larger models hide deception better.
OpenAI's risk evaluation framework
The governance crisis that shook AI
OpenAI becomes a platform company
Let ~1,000 members of the public co-write Claude's constitution, testing democratic input on AI values.
Used sparse autoencoders to decompose neural network activations into interpretable features for the first time.
Safety evaluation for multimodal AI
Introduced AI Safety Levels (ASL-1 through ASL-4) with mandatory capability evaluations before scaling up.
Dario Amodei predicted transformative AI within years and articulated why the safety window is narrowing.
OpenAI's most ambitious safety bet
Doubled context to 100K tokens and added code generation, narrowing the gap with GPT-4.
Enabling language models to use tools through structured function calls
Process supervision for reasoning
State-of-the-art performance, unprecedented secrecy
Anthropic's first commercial product, applying Constitutional AI at production scale for the first time.
The CEO's roadmap to AGI
Replaced human annotators with AI self-critique guided by written principles, making alignment cheaper and more scalable.
The product that changed everything
Scale applied to speech recognition
Showed that RLHF-trained models remain vulnerable to adversarial attack, proving that behavioral safety is never permanently solved.
Photorealistic text-to-image generation
Demonstrated iterated online RLHF improves both alignment and capability, then released the HH-RLHF dataset publicly.
The paper that made ChatGPT possible
Proved that RLHF scales favorably with model size and that aligned models can outperform unaligned ones.
Teaching GPT to write code
Connecting vision and language at scale
When language models learned to see and create
The prototype for RLHF on language models
The model that made the world pay attention
The math behind 'bigger is better'