Interpretability

LLMs are not the black box you were promised (jay.ai)

Researchers at Anthropic have made significant progress in the mechanistic interpretability of large language models (LLMs), giving a deeper understanding of how these models work internally. This work could eventually make it possible to steer model behavior and detect dangerous intent.

ai interpretability llms