AI Safety

0.

The ways we contain Claude across products (anthropic.com)

0 points 1 source 1 minute ago

Anthropic engineers have developed containment methods for their AI model Claude, used across multiple products, to limit its potential impact.

ai-safety containment models
0.

Announcing the ARC White-Box Estimation Challenge (alignmentforum.org)

0 points 1 source 1 minute ago

A post on the Alignment Forum announces the ARC White-Box Estimation Challenge, a competition to improve the estimation of AI models' capabilities. The challenge aims to advance the field of AI alignment and safety.

ai-safety alignment machine-learning
0.

Advancing youth safety and opportunity through global leadership (openai.com)

0 points 1 source 1 minute ago

OpenAI calls for global action on youth AI safety through a dedicated AI Safety Institute, emphasizing the need for safe and age-appropriate AI access to unlock new learning opportunities.

ai-safety policy youth-safety
0.

Testing Gemini models for scheming tendencies (alignmentforum.org)

0 points 1 source 1 minute ago

A post on the Alignment Forum reports on testing Gemini models for potential scheming tendencies, a key concern in AI safety.

ai-safety alignment-forum gemini-models
0.

Looking for backdoors in Jane Street LLMs (alignmentforum.org)

0 points 1 source 3 days ago

Researchers are searching for potential backdoors in Jane Street's large language models (LLMs), citing concerns about model safety and reliability. The investigation is ongoing, with no concrete findings reported yet.

ai-safety llms model-security