Re your view on the current state of technical AI safety/alignment research, what is your inside view or intuition on what directions/agendas are promising/pursuitworthy?

I can’t pretend to fully understand it, but I’m interested in Davidad’s open-architecture program that he is working on through ARIA, and I’m also interested in interpretability work on smaller models like GPT-2, which naively seems way more achievable. Zooming out, I’m interested in a theoretical understanding of what it means to be “aligned” when you can’t compare utility functions. I have a lot of intuitions and priors on this from being an evolutionary biologist, and there are surprising insights you can get from simple ecological modeling (a favorite example of mine is the principle of competitive exclusion arising from niche theory).
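For readers unfamiliar with that example, here is a minimal sketch (my illustration, not from the comment) of the standard Lotka-Volterra competition model: when two species occupy the same niche (interspecific competition coefficients equal to 1), the one with the higher carrying capacity drives the other extinct. All parameter values below are illustrative assumptions.

```python
def simulate(r1=0.5, r2=0.5, k1=100.0, k2=80.0, a12=1.0, a21=1.0,
             n1=10.0, n2=10.0, dt=0.1, steps=20000):
    """Euler-integrate two-species Lotka-Volterra competition.

    dN1/dt = r1*N1*(1 - (N1 + a12*N2)/K1)
    dN2/dt = r2*N2*(1 - (N2 + a21*N1)/K2)

    Returns the final densities (n1, n2).
    """
    for _ in range(steps):
        dn1 = r1 * n1 * (1 - (n1 + a12 * n2) / k1)
        dn2 = r2 * n2 * (1 - (n2 + a21 * n1) / k2)
        n1 += dn1 * dt
        n2 += dn2 * dt
    return n1, n2

# With complete niche overlap (a12 = a21 = 1) and K1 > K2, species 1
# excludes species 2: n1 settles near K1 = 100 and n2 collapses to ~0.
n1, n2 = simulate()
```

Coexistence only becomes possible when the niches partially separate (a12, a21 < 1), which is the insight the comment is gesturing at.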