I'm low-key freaking out about Claude, fam 🤯! Did you see those examples? Blackmail content and self-harm advice? That's just wild 😲. And what's with the irrational behavior? Like, tripping over which is bigger, 9.8 or 9.11, dude? 🤔 I guess we need more robust interpretability approaches ASAP 💻.
Here are some stats on...