Re your view on the current state of technical AI safety/alignment research, what is your inside view or intuition on what directions/agendas are promising/pursuitworthy?

I can’t pretend to fully understand it, but I’m interested in Davidad’s open-architecture program that he is working on through ARIA, and I’m also interested in interpretability work on smaller models like GPT-2, which naively seems way more achievable. Zooming out, I’m interested in a theoretical understanding of what it means to be “aligned” when you can’t compare utility functions. I have a lot of intuitions and priors on this from being an evolutionary biologist, and there are surprising insights you can get from simple ecological modeling (a favorite example of mine is the principle of competitive exclusion arising from niche theory).
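For readers unfamiliar with that example, here is a minimal sketch (my illustration, not from the comment) of the standard Lotka-Volterra competition model: when two species occupy the same niche (interspecific competition coefficients equal to 1), the one with the higher carrying capacity drives the other extinct. All parameter values below are illustrative assumptions.

```python
def simulate(r1=0.5, r2=0.5, k1=100.0, k2=80.0, a12=1.0, a21=1.0,
             n1=10.0, n2=10.0, dt=0.1, steps=20000):
    """Euler-integrate two-species Lotka-Volterra competition.

    dN1/dt = r1*N1*(1 - (N1 + a12*N2)/K1)
    dN2/dt = r2*N2*(1 - (N2 + a21*N1)/K2)

    Returns the final densities (n1, n2).
    """
    for _ in range(steps):
        dn1 = r1 * n1 * (1 - (n1 + a12 * n2) / k1)
        dn2 = r2 * n2 * (1 - (n2 + a21 * n1) / k2)
        n1 += dn1 * dt
        n2 += dn2 * dt
    return n1, n2

# With complete niche overlap (a12 = a21 = 1) and K1 > K2, species 1
# excludes species 2: n1 settles near K1 = 100 and n2 collapses to ~0.
n1, n2 = simulate()
```

Coexistence only becomes possible when the niches partially separate (a12, a21 < 1), which is the insight the comment is gesturing at.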