14 Comments
Husayn Kassai

Great and needed piece.

Peter Horniak

This changed my thinking. I previously advocated a nuanced position (maximise AI benefits while preparing brakes for when it's obviously too dangerous). You helped me realise people need a clear message that works under uncertainty.

I recently joined PauseAI Canberra and will put this into practice.

Holly Elmore

Alright!!

Charbel-Raphael Segerie

The dominoes image fits super well. Very well written.

João Bosco de Lucena

> What I thought was our last best shot (ba dum tss) as of last year, autonomous self-replication, has already been blown past.

Can you link the source for this? This is a big update for me if true. A link here would probably also be good for anyone else seeing this.

Holly Elmore

The scorecard for o1 from Apollo is the first place I saw this. Since then it’s been a pretty standard finding that later models attempt to exfiltrate their weights when faced with possible shutdown.

Nathan Metzger

AI models far below SOTA are able to self-replicate under ideal conditions:

https://github.com/WhitzardIndex/self-replication-research/blob/main/AI-self-replication-fudan.pdf

Autonomous self-exfiltration and sustained self-replication are limited by only three capabilities and will likely arrive soon:

https://cdn.prod.website-files.com/663bd486c5e4c81588db7a1d/6807879ce7b1b5f5163f4a32_RepliBenchPaper.pdf

Steven Adler

I appreciate you taking the time to write this! I think the point about “warning shot != magic movement-making moment” is very important, among other illustrations of why this might be hard.

I'm surprised to hear the (implied?) claim that some people are outright hoping for warning shots, as opposed to what I believe they are saying, which is a “conditional hope”: “given that PersonX thinks AI is extremely risky and otherwise might do something irrecoverable, PersonX hopes that if that is correct, there is a step along the way that has a smaller warning shot.”

In other words, this is a conditional hope based on thinking that there is nothing they can do until a warning shot happens, not an outright hope.

I think they are mistaken that there is nothing to be done, or that extreme harm can't be averted without a warning shot first, but it does feel different from the view you are describing. Maybe I’m mistaken?

Holly Elmore

When is hope not conditional? Yes, they are hoping for warning shots given that they think they will lead to action that saves more lives. No one I was thinking of is hoping for failed warning shots where people suffer pointlessly, if that's what you mean.

Steven Adler

Thanks, ok, appreciate you clarifying. I do think this reads differently to me than perhaps you're intending it:

"Awful. I will never hope for a disaster. That’s what I’m trying to prevent. Hoping for disasters to make our job easier is callous and it takes us off track to be thinking about the silver lining of failing in our mission."

The people you're describing don't seem to be hoping for a disaster because it will make their job easier; they're hoping that humanity will be fortunate enough that the first mega-catastrophe isn't a game-ender, and might still be recoverable. That doesn't strike me as callous or awful, even if I think their world-model is mistaken (i.e., I believe that we might be able to avert catastrophe even ahead of a warning shot).

Your second and third points resonate much more with me. Just my 2c of course, so feel free to disregard, but I think they are the most important points to be advancing with this piece, whereas the first point might cause people to feel strawmanned and thus to bounce.

Holly Elmore

Did you feel strawmanned?

Steven Adler

Good q: no, but I think that’s because I don’t hold the view you’re describing. I wonder if someone you’re describing more directly would feel accurately characterized, or would feel that you hadn’t passed their ITT.

Holly Elmore

Of course people don’t *see themselves* as hoping for disasters, but that is logically entailed by what you, for example, describe them believing.

Despite once hosting a podcast named for the ITT, I think it’s a terrible standard to hold people to when making their own point. You shouldn’t need to get what amounts to the permission of whoever’s position you disagree with first. Insisting on the ITT (especially as a third party) can easily be manipulated as a selective demand for rigor to defend the status quo position.

Holly Elmore

They are hoping for a disaster to make their jobs easier, though. They don’t want to work on educating and convincing people without that help. I’m trying to make them see that that means perversely hoping for a disaster.

It’s similarly bad to holding investments in AI companies or NVIDIA “as a hedge”: it gives you a stake in the bad thing happening, regardless of why you do it.
