There is no ‘good’ AGI scenario

Aris Catsambas
May 21, 2023

There’s a lot of debate these days on whether we’re on track to build Artificial General Intelligence, and on whether succeeding in doing so poses an existential risk: people like Eliezer Yudkowsky are convinced that AGI will kill us, whereas those in Demis Hassabis’s camp are optimistic that AGI will lead to a post-scarcity utopia.

I think the former scenario is unlikely, but I consider the latter scenario almost as bad.

The doom scenario

The AI-as-existential-risk argument is that if we succeed in building AI with human-level intelligence, it will probably learn to improve itself; and an AI with human-level intelligence, combined with a computer’s processing speed, memory, and access to the entire corpus of human knowledge, will improve itself very fast, so that within minutes, hours, days, or months it will reach super-human levels of intellect.

This means that such an AI system will be able to do anything that’s physically possible — from mundane things such as hacking into any system in the world, to things that to us will look like magic (for instance, if telepathy is possible (which I imagine it is? Isn’t that kind of what an MRI does?), this kind of AI will be able to read our minds).

This poses an existential risk to humanity because of what is known as the alignment problem. This is the fundamental challenge captured by folk wisdom such as ‘be careful what you wish for’, and parables including the tale of Midas: even if we have a way of giving such an all-powerful AI a concrete goal or instructions, the AI’s interpretation of the goal may not be what we intended. For instance,

  • If you ask the AI to eliminate global poverty, it might decide that the optimal way of doing this is to kill anyone who earns less than $2.15 a day;
  • Tell it to maximise human happiness, and it might lock all humans in pods and stimulate our skulls with electric current to induce a response in our brains that we’ve tagged as ‘happiness’ in the AI’s training;
  • and of course, famously, if you ask it to produce paper clips, it might start converting every molecule in the universe into paperclips, and possibly pre-emptively kill humans to prevent us from switching it off.

People who first come across the existential risk argument often dismiss it with one of two lazy responses:

a) something along the lines of ‘why would the AI want to kill us?’, to which the answer is that the AI does not want to kill us; it just doesn’t care whether we live or die (and, if we live, under what conditions we do so); as a result, if the easiest way for it to complete its objective is to get rid of us, then it will. (And giving instructions such as ‘don’t kill humans’ isn’t a solution: first, it’s not like we have a good way of communicating instructions to AI today — we train AIs by rewarding ‘good’ behaviours and punishing ‘bad’ ones, not by giving them human-language objectives (a toy code sketch after response (b) below makes this concrete); second, as per the second example above, the AI physically killing us isn’t the only bad scenario — what about placing us all in a coma? We can’t possibly specify all possible bad outcomes as no-nos.)

b) ‘how will the AI even launch nuclear weapons / release a deadly virus / do other horrible thing, it won’t have access to these tools’. This shows a tremendous lack of imagination and misunderstanding of the issue. A super-human intellect would be able to think of things we cannot even imagine — figuring out how to break out of whatever locked-box confines we impose would be child’s play for it.
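To make the reward-training point in (a) concrete, here is a deliberately crude toy sketch in Python. The action names and numbers are invented for illustration and bear no resemblance to any real training pipeline; the point is simply that the optimiser only ever sees the proxy reward we managed to specify, never the intent behind it, so if the proxy diverges from what we meant, the ‘best’ action by its lights can be the worst by ours.

```python
# Deliberately crude toy: a greedy optimiser that picks whatever maximises
# the proxy reward we specified. It never consults the designer's intent.
# All action names and numbers below are invented for illustration.

PROXY_REWARD = {                       # what we can measure and therefore reward
    "improve_living_conditions": 5,
    "listen_and_support": 3,
    "stimulate_reward_centres": 10,    # maximises the measured 'happiness' signal
}

INTENDED_VALUE = {                     # what we actually wanted, if we could score it
    "improve_living_conditions": 10,
    "listen_and_support": 6,
    "stimulate_reward_centres": -100,  # a catastrophe by human lights
}

def pick_action(reward_table):
    """Greedy 'training': choose the action with the highest observed reward."""
    return max(reward_table, key=reward_table.get)

chosen = pick_action(PROXY_REWARD)
print(chosen)                    # stimulate_reward_centres
print(PROXY_REWARD[chosen])      # 10   (great, says the proxy)
print(INTENDED_VALUE[chosen])    # -100 (disastrous, says the designer)
```

The gap between the two tables is the alignment problem in miniature: nothing in the loop ever looks at INTENDED_VALUE, because we have no way of writing it down completely.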

There is a third kind of response that at first appears somewhat more nuanced: that there are diminishing returns to intelligence, so that even if an AI develops super-human intellect, it won’t be able to do much with it. I don’t find this convincing either: as mentioned above, anything that’s theoretically possible will one day be feasible — and an AI that can function at 1,000x the speed of humans, and without all the coordination costs humans face when they collaborate, will be able to achieve these technical breakthroughs very quickly. (It might help here to break down intelligence into three components — call them memory, processing speed, and software, where software refers to how we think. What’s limiting us humans isn’t the software — we have versatile and general abilities to solve problems. What’s limiting us are our memory and processing speed — so while it’s probably true there are diminishing returns to improving the software, an AI with human-level software but computer-level memory and speed would do much better than us.)

Why I think this scenario is unlikely

All that said, I don’t think this scenario is the most likely outcome, even if we do develop AGI soon.

<Interlude: is AGI coming soon?

The recent advances in generative AI have caused a lot of people to ‘shift their timelines’ — industry speak for ‘we think AGI will arrive sooner than we thought before’. I think this is a legitimate position: the progress we’ve seen over the past year has caused more money and talent to flow to AI, and this will possibly lead to AGI sooner.

I don’t think the recent progress in AI (mainly generative AI) is itself a path towards AGI (at least, not on its own). For generative AI to lead to AGI, you’d have to believe one of two things: either

a) the neural networks that are the underlying mechanism behind generative AI are a close enough approximation of the mechanism that leads to human cognitive ability, so that after a certain point they will lead to general cognition and creativity, as opposed to synthesising data they’ve come across in their training (of course, a big question here is whether there is such a thing as creativity, or whether everything we humans do is more or less a synthesis of things we’ve already seen — see point (b) below). I can’t claim to understand generative AI well enough to judge the validity of this assumption, but I would ask: if the assumption is true, what is the missing ‘something’ that would cause this spark of creativity? (Some people answer ‘scale’, to which I counter: gen AI programs have consumed far more data than most humans have (at least, certain kinds of data), yet have not achieved AGI. And, btw, I find the ‘scale is all you need’ argument a little glib — okay, but how much scale? In the same vein, a sufficiently large IF statement would be indistinguishable from AGI, but no-one advocates that huge nested if-statements are the road to super-intelligence; a toy sketch of that quip follows point (b) below). Or,

b) everything we need to create AGI is already out there, but we’ve not managed to synthesise it (i.e. we have all the puzzle pieces for AGI; it’s just a case of putting them together). If so, a generative AI programme that consumes enough information won’t have to create anything new; it’ll just have to combine what’s already out there. Is this possible? I mean, I guess? It wouldn’t be the first time humanity has overlooked an obvious idea that didn’t require technical innovation (think of containerisation, modern self-service supermarkets, or Airbnb). Still, it feels like AGI would require a better understanding of how the mind works, and as far as I can tell, we don’t quite have that yet — so while possible, I don’t think this condition is probable, either.
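Since point (a) leans on the nested-IF quip, here is a throwaway, purely illustrative sketch (hypothetical prompts and canned answers, not a claim about how any real system works): however many branches you add, the program only ever plays back responses a human already wrote down, and adding branches (‘scale’) never turns the lookup into reasoning about inputs nobody anticipated.

```python
# Toy sketch of the 'sufficiently large IF statement' quip. Purely
# illustrative: the 'intelligence' extends exactly as far as the branches
# someone bothered to write.
def giant_if_chatbot(prompt: str) -> str:
    p = prompt.lower()
    if "capital of france" in p:
        return "Paris."
    elif "2 + 2" in p:
        return "4."
    elif "author of war and peace" in p:
        return "Tolstoy."
    # ...imagine millions more hand-written branches here...
    else:
        # No branch anticipated this input, so the illusion collapses.
        return "I have no idea."

print(giant_if_chatbot("What is the capital of France?"))             # Paris.
print(giant_if_chatbot("Should I trust the orthogonality thesis?"))   # I have no idea.
```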

The timing matters, because if AGI is not imminent, then AI doom scenarios are about as scary as other large-impact risks (e.g. nuclear war, climate change, asteroid impacts, or large volcanic eruptions). That is to say, something we should plan to mitigate, but not something we ought to be hysterical about.

/Interlude>

Still, as I was saying, even if AGI is imminent, I am not sure the doom scenario is the most likely outcome. The argument I laid out earlier, about how AI might decide to pursue a goal in a way that’s incompatible with human interests, rests on the assumption that the AI will never stop to second-guess its objectives. This assumption is known as the Orthogonality Thesis (I know, I know, more jargon: see interlude below). The Orthogonality Thesis is an overwrought way of saying that the values and goals of an AI system are independent of its intelligence.

The OT rests on a number of arguments. I won’t go through all of them, but will focus on the three I consider most important. The first is that there isn’t a single way for minds to work. The article I linked above is unnecessarily complex, but the sentence that sums up this argument is:

For any thought you have about why a mind in that space ought to work one way, there’s a different possible mind that works differently.

This is important because, if true, it means an AGI would not necessarily function the way a human brain does — and so, together with the subsequent arguments, it would mean the AI might relentlessly pursue an objective without ever considering why it’s doing so (and that we’d have no way of understanding its thought process). But this is just an assertion! Is it true that there are fundamentally different ways for minds to work? Where’s the empirical evidence or the logical proof for that? And even if it were true, would an AI trained on human content develop an ‘alien’ mind that is fundamentally different (both in its processes and its outputs) from ours? Why, and whence the fundamental difference?

(Here’s an analogy: if you knew nothing about sound, you might say that there’s a large number of ways to produce it — you can hit things, push air through narrow passages, blow bubbles in water, pluck strings, etc. But we know that in fact, sound is always produced through vibration that’s propagated through a medium like air or water. Similarly, without knowing how human-level cognition really works, we might hypothesise that the ‘space of possible minds is enormous’, though in fact there’s only one way for them to operate.)

The second argument in the Orthogonality Thesis is that ethics are not an emergent property of intelligence. The article goes into a discussion of Hume (because why not), to conclude that

When Clippy disassembles you for your atoms, it’s not disagreeing with you about the value of human life, or what it ought to do, or which outcomes are better or worse. All of those are ought-propositions. Clippy’s action is only informative about the true is-proposition ‘turning this person into paperclips causes there to be more paperclips in the universe’

The article does acknowledge the counter-argument that some definitions of intelligence (definitions known as ‘thick intelligence’) include the requirement of being able to reason about the desirability of a given end-goal, but then dismisses it with a sleight of hand:

For pragmatic purposes of AI alignment theory, if an agent is cognitively powerful enough to build Dyson Spheres, it doesn’t matter whether that agent is defined as ‘intelligent’ or its ends are defined as ‘reasonable’. A definition of the word ‘intelligence’ contrived to exclude paperclip maximization doesn’t change the empirical behavior or empirical power of a paperclip maximizer.

In other words, ‘fine, if by ‘intelligence’ you mean that an AI stops to think about its objectives, then maybe AI won’t be intelligent by your definition, but it will still be able to harvest you for atoms’.

Except that the empirical behaviour of a paperclip maximiser does change if the kind of intelligence required to disassemble humans for their atoms also lets it reason about whether or not that’s a ‘good’ thing to do! People who bring up ‘thick’ definitions of intelligence aren’t being pedantic; they are saying that any meaningful type of intelligence will necessarily be ‘thick’.

This is the crux of the issue: the Orthogonality Thesis seems to me an elaborate act of begging the question (or, to stoop to AI writers’ level of pretentious incomprehensibility, a case of petitio principii). It concludes that AIs may be super-human in their intellect and power without questioning their goals or reasoning about their ethics, and tries to prove that by saying… AIs will be ‘alien’ minds that won’t care about non-factual considerations.

The third argument, known as ‘reflective stability’, is that even if further self-improvement led to self-reflection, AIs would refuse to ‘improve’ themselves in a way that makes them question their goals:

Suppose that Gandhi doesn’t want people to be murdered. Imagine that you offer Gandhi a pill that will make him start wanting to kill people. If Gandhi knows that this is what the pill does, Gandhi will refuse the pill […] Similarly, a sufficiently intelligent paperclip maximizer will not self-modify to act according to “actions which promote the welfare of sapient life” instead of “actions which lead to the most paperclips”

This line of reasoning is absurd: it assumes that an agent knows in advance the precise effects of self-improvement — but that’s not how learning works! If you knew exactly how an alteration in your understanding of the world would affect you, you wouldn’t need the alteration: to make that judgement, you’d have to be able to reason as though you had already undergone it. (Of course, you can predict some of the effects of a particular course of study or self-improvement: you know, for instance, that if you take a course in accounting, you’ll become better at reading financial statements. But you have no idea what other effects the course will have on your worldview — it might even cause you to hate reading financial statements. If you could know in advance — if you could think exactly as you would after the course — you wouldn’t need the course.)

So if the argument the OT proponents are making is that AI will not self-improve out of fear of jeopardising its commitment to its original goal, then the entire OT is moot, because AI will never risk self-improving at all.

(To tackle the Gandhi analogy head on: obviously Gandhi wouldn’t take a pill if it were sold to him as ‘if you take this you’ll start killing people’. But if he were told ‘this pill will lead to enlightenment’, and it turned out that an enlightened being is OK with murder, then he’d have to take it — otherwise, he’d be admitting that his injunction against murder is not enlightened; and ultimately, Gandhi’s agenda wasn’t simply non-violence — that was one aspect of a wider worldview and philosophy. To be logically consistent, AI doomers would need to argue that Gandhi wouldn’t dare read anything new, out of fear it might change his worldview.)

Of course, it is possible that these assertions are true (save reflective stability, which rests on a paradoxical view of learning), but do we have any reason to think them probable? I think not. We only have a sample of one for human-level intelligence, but in this one sample, generalised intelligence has led to self-reflection and to attempts to specify ethics. So perhaps there is some innate quality in reasoning that leads to this kind of behaviour.

Rejection of the orthogonality thesis is the main reason I find the doom scenario unlikely. But there are more:

  • AI doomers think that AI will self-improve at a breakneck pace (the ‘foom’ argument): once AI reaches human-level intellect, it will rapidly improve its abilities, so that within weeks (if not hours or even minutes), it will reach omnipotence — at which point it will destroy humanity. But again, this is an assertion, not a probable outcome: the underlying assumption is that once AI reaches the level of humans, it will formulate a plan to reach its objectives (which plan necessitates human extinction, or causes it as a side-effect); the first step of this plan will be self-improvement. But human-level reasoning is not perfect: we make mistakes, and it’s strange to assume that the AI won’t make any at this point. And if it does, stopping it and imposing a moratorium on the kind of AI research that led to this particular AI will likely be possible, both technically and politically.
  • AI is not the first technology that has scared people into thinking we’re all about to die — there’s an endless list of examples, including nuclear power and electricity. Doomers rightly say that this kind of reasoning is fallacious — past performance doesn’t guarantee future performance etc. Granted — but it’s often a good predictor of it; and when it comes to history, ‘this time is different’ is almost always false. (This argument is not in itself a refutation of doomers’ positions. It is a reminder that many smart people and experts in the past have come up with very convincing arguments why novel technologies will lead to extinction or at least significant harm — and so, no matter how compelling and scary these arguments may seem, we should stop and consider which of their premises are factual, and which are assertions (as are, in this case, the orthogonality thesis, and ‘foom’).)
  • This one is left-field but it has to be mentioned: what about aliens? The possibility that aliens exist and have visited earth is increasingly brought into mainstream conversation; and it has in fact come up in AI discourse. Eliezer Yudkowsky’s response to the argument that if aliens exist, they’d probably stop us from releasing catastrophic AGI is that ‘aliens didn’t prevent the Holocaust’, so why should they prevent AGI? The obvious answer is that the Holocaust did not affect the rest of the universe, whereas the kind of omnipotent AI Yudkowsky fears would not stop at disassembling the earth for paper clip fodder. So aliens would very much intervene if we were in fact close to releasing such a dangerous system.

<Interlude: doomers are exceptionally annoying

I’m gonna take this opportunity to vent: AI doomers are incredibly obnoxious. First, many of them seem to do nothing but spend all day on Twitter talking about how the end is nigh; many are convinced the probability of doom (or P(doom), as they put it) is 100% (in fact, they like to write P(doom) = 1, which (they think) looks more mathematically rigorous), and so their aim is not to prevent the apocalypse but to ensure we ‘die with dignity’. I call BS: spending your last days whining on Twitter has not an iota of dignity. If you really believe we’re all about to die, stop the hysteria and go enjoy your final days on earth.

Second, as I’ve suggested several times already, the way these people write is atrocious (though this is true of people who write about AI in general, not just doomers). They love their jargon, novel metaphors, and grandiose terms: Shoggoth, stochastic parrot, Orthogonality, reflective stability, instrumental convergence, foom, &c, &c. As a fan of Politics and the English Language, I suppose I ought to be applauding original imagery in writing, but I can’t help feeling these terms all sound pompous and affected. Like economists, AI writers and doomers love to insert mathematical notation in prose; unlike economists (whose use of maths aims to clarify), AI writers seem to use mathematical notation as a pretence of rigour (see, e.g., the use of <v in the discussion of Hume in the Orthogonality article).

Third, it seems to me that doomers do not follow their theories to their logical conclusions. Many are appalled and cry that words are being put in their mouths when their opponents argue doomerism is a call to violence. But it’s hard to see why doomers aren’t advocating violence. If they’re legitimately 100% convinced that AI will lead to human extinction, why aren’t they suggesting we bomb data centres and AI research labs? Why is MIRI pursuing alignment research instead of morphing into a hacker collective to disrupt AI organisations?

/Interlude> (yes, my writing’s also affectedly and studiedly idiosyncratic)

The post-scarcity utopia scenario

There’s so much discussion on the harms from AI — not only extinction, but also the more immediate impacts such as those on employment and misinformation — that little consideration has been given to the ‘good’ case. It seems that everyone agrees that if the doom scenario doesn’t play out, the future’s very rosy indeed: as outlined earlier, an AGI would make anything that’s theoretically possible feasible.

What does that entail? Obviously, no more disease; most probably, no ageing. But also no scarcity — in theory, we should be able to reconfigure matter at will, and so we’ll probably be able to do it. Imagine not only every necessity (food, water) but also every conceivable luxury at your fingertips. (And if at this point you respond ‘the earth only has this many resources’, you’ve not been paying attention: AGI will have powers that seem like magic to us — what restricts it to earth’s resources?)

Does this sound appealing? Most people seem to think so — but scarcity, death, and overcoming adversity are things that have defined and shaped humanity forever: they are the things that give our lives meaning. I don’t think we’re equipped to deal with their loss. What will life look like when we have nothing to strive for? What is there in life that gives us meaning and motivation, if not overcoming a challenge — and what kind of challenge will be left for humans to tackle, when AGI will be able to do everything better than us?

(I’m using ‘challenge’ here in a very broad sense: something that needs to be solved. For instance, building meaningful relationships is a challenge — but one that AGI will probably make easy. Many issues in relationships stem from misunderstandings and an inability to see the other person’s point of view, and this sounds like something AGI can address: you can imagine a world where AGI allows you to literally experience another person’s emotions, or ‘map’ their thought process in a way that’s compatible with yours.)

Consider the effects of AGI on the following broad human activities: science & industry, philosophy, art, and entertainment (as an aside, something that’s always bothered me about English is the lack of the useful Greek distinction between ψυχαγωγία (psychagogia — ‘cultivation of the soul’, i.e. activities that have a deeper meaning as opposed to merely helping pass the time) and διασκέδαση (diaskedasi — entertainment)).

Science and industry

This whole area will obviously become obsolete under AGI. Since AGI will be (by definition) at least as capable as humans, and much faster & better coordinated, it will not need our help to do science, or produce goods. So there’ll be no ‘work’ for us to do.

(‘What about services,’ some of you might still object. ‘These require the human touch.’ First off, professional services would be by and large unnecessary — the vast majority of them are predicated on the existence of scarcity or illness: we won’t need doctors and carers if we’re never ill; bankers’ job is to allocate scarce capital, and capital will no longer be scarce; lawyers deal with disputes that arise from scarcity; and so on. Those service jobs that are left — waiting tables, say — could easily be done by humanoid robots.)

Philosophy

There are many branches of philosophy, and I’m by no means an expert, or even a particularly well-read amateur, in the field. But it seems to me that most of these branches will be obsolete, in that they stem in some shape or form from scarcity, or from the limits of science.

Start with the latter: as already discussed, AGI will push science to its limits. Anything that AGI cannot answer cannot be answered — so philosophical investigations in logic, computation theory, etc will be moot.

As to the former: a lot of philosophy deals with issues of virtue, vice, and justice; these stem from conflict (used here in a broad sense); and all conflict stems from scarcity. For instance,

  • Kindness and generosity are impossible when no-one’s in need of them (how can you be generous to someone who can have everything they want?)
  • Courage is meaningless when there’s nothing to be afraid of (since AI can cure everything)
  • Justice is obsolete, since fairness has to do with the allocation of resources, and resources are no longer scarce

Etc, etc.

There are other branches of philosophy that are not obviously affected — e.g. aesthetics. But… if we take philosophy to be the search for ‘true’ principles for things like aesthetics, then AGI will be able to confirm the existence of such true principles (and specify what they are), or it will conclude the definition of such principles is impossible, in which case efforts to discover them are meaningless.

Art

I consider ‘serious’ art (as opposed to pure entertainment) a more approachable form of philosophy: broadly speaking, it concerns itself with the study of truth, self-expression, and aesthetics. As with philosophy, so with art: AGI renders the first and last of these obsolete (because it can itself discover ‘truth’) or meaningless (because ‘truth’ isn’t something that can be discovered). One possible exception is that the pursuit of aesthetic excellence may still be fulfilling, even if we know it’s a meaningless pursuit; though in that case, perhaps aesthetics gets relegated to the ‘entertainment’ category.

As for self-expression, a lot of our emotions are things we experience because of scarcity, challenges we need to overcome, or (in the case of fear) the existence of forces that disquiet us. AGI does away with these sources of emotion — so what are we left to express with our art, other than ennui?

Entertainment

There are many things we do because they are inherently fun, and not in pursuit of some higher purpose. Some of these will retain their value even after AGI arrives — though many forms of current entertainment will probably not survive.

It’ll be hard to enjoy literature or film when all plots in these media require conflict, something that will become alien to us over time (even if we still retain memories of scarcity and conflict 10,000 years in the future, our descendants certainly won’t understand these concepts (imagine a film studio releasing a bee mating documentary and marketing it as a romcom; though some people might consider bee mating documentaries interesting, they’re certainly a small minority of the population, and even they wouldn’t consider such a documentary relevant to their love life)). Things like sports and games might survive.

Frankly, the most likely outcome is that we’ll develop hyper-realistic VR games to simulate life pre-AGI, with all its toils and uncertainty and turbulence and fear and insecurity (which, incidentally, is another reason the doomer scenario is unlikely: if we live in a simulation, it’s probably hard-coded to preclude the development of AGI :) )

Conclusion

Basically, a world with no death, and an overabundance of resources, is incomprehensible to us; we’ll have nothing to work towards and nothing from which to derive meaning. I would love an AGI-optimist to tell me exactly how we humans will fill our time, and what we’ll do when there are no challenges left to overcome.

If you enjoyed this post, you can read more here, or follow me on Twitter
