Anthropic describes the containment methods it applies to its AI model Claude across its products to limit the model's potential impact.
AI Safety
- 0.The ways we contain Claude across products (anthropic.com)
- 0.Announcing the ARC White-Box Estimation Challenge (alignmentforum.org)
A post on the Alignment Forum announces the ARC White-Box Estimation Challenge, a competition to improve the estimation of AI models' capabilities. The challenge aims to advance the field of AI alignment and safety.
- 0.
OpenAI calls for global action on youth AI safety through a dedicated AI Safety Institute, emphasizing the need for safe and age-appropriate AI access to unlock new learning opportunities.
- 0.Testing Gemini models for scheming tendencies (alignmentforum.org)
A post on the Alignment Forum describes testing Gemini models for potential scheming tendencies, a key concern in AI safety.
- 0.Looking for backdoors in Jane Street LLMs (alignmentforum.org)
Researchers are searching for potential backdoors in Jane Street's large language models (LLMs), citing concerns about model safety and reliability. The investigation is ongoing, with no concrete findings reported yet.