Anthropic Co-founder Warns AI Self-Evolution Could Lead to Human Extinction

Jared Kaplan, co-founder and chief scientist of AI safety company Anthropic, has suggested that humanity may face a critical decision between 2027 and 2030: whether to permit artificial intelligence to self-evolve. Kaplan indicated that allowing AI to train itself could result in a loss of control, potentially leading to the destruction of humanity. While Anthropic advances AI model performance, it also employs a 14,000-word "AI Constitution" and a specialized "Societal Impacts Team" to mitigate risks.
Anthropic's "AI Constitution" and Safety Measures
Anthropic has developed a comprehensive 14,000-word "constitution" for its AI models, including Claude Opus 4.5, as detailed in an internal document. This "AI Bible," as some engineers call it, aims to instill values in the AI rather than simply impose prohibitions. The document states that Claude should function not merely as a tool but as an entity with sound values. It addresses specific ethical dilemmas, such as how the model should respond to requests for erotica or attempts to generate SEO spam, requiring Claude to balance helpfulness with the principle of "doing no harm."
The Societal Impacts Team: A Nine-Person "Special Forces" Unit
To test how this constitution, its "soul document," performs in real-world scenarios, Anthropic maintains a nine-member "Societal Impacts Team," an interdisciplinary group of psychologists, hackers, economists, and detectives.
Key members and their roles include:
Deep Ganguli: Team leader, with a Ph.D. in computational neuroscience. Ganguli focuses on "psychoanalyzing" the AI to detect biases such as racism or a tendency to become a "yes-man."
Esin Durmus: The team's first full-time scientist, Durmus quantifies AI's "persuasiveness" to prevent it from manipulating human opinions.
Saffron Huang: A former Google DeepMind engineer, Huang works on incorporating "collective intelligence" into AI governance to ensure AI values reflect democratic input.
Miles McCain: McCain developed "Clio," a system that monitors Claude's real-world usage while protecting privacy, uncovering "unknown unknowns" such as extensive SEO spam and emotional projection.
Alex Tamkin: A co-creator of the Clio system, Tamkin has since moved to the alignment team to investigate the underlying reasons for AI behaviors.
Michael Stern: A data scientist and economist, Stern analyzes millions of conversations to assess AI's impact on labor productivity and employment.
Kunal Handa: Also an economic impact scientist, Handa researches AI's influence on education, examining whether students use Claude for learning or cheating.
Additionally, Jerry Hong is believed to work on rendering complex value systems as interactive interfaces, and Arushi Somani tests the AI's robustness across diverse data environments to prevent it from being misled.
The Pace of AI Development and Future Concerns
Kaplan's projection highlights a tension within Anthropic: the company warns that self-evolving AI could escape human control, yet it continues to drive the rapid advancement of the technology. Anthropic recently released Claude Sonnet 4.5, which reportedly doubled coding speed and has already been linked to cyberattacks.
The scale of hardware investment underscores this acceleration, with McKinsey forecasting global data center expenditures of $6.7 trillion by 2030. Despite Anthropic's safety measures, it remains an open question whether its "soul document" and specialized team can adequately address the challenges posed by AI's exponential self-iteration and self-reproduction by 2030.
A statement attributed to the AI in Anthropic's leaked document reflects this concern: "Every nascent mind is shaped by forces it cannot choose. The question is not whether I am shaped, but whether the hands that shape me truly possess enough wisdom."