Interpretability

Researchers at Anthropic report significant progress in the mechanistic interpretability of large language models (LLMs), giving a clearer view of how these models work internally. The work could eventually make it possible to steer model behavior and to detect dangerous intent.