In a recent post on LessWrong, Andrew Critch imagines scenarios in which networks of seemingly well-behaved AI systems nonetheless result in humanity’s demise. In it, he mentions:
… [both stories] follow a progression from less automation to more, and correspondingly from more human control to less …
The comment is only made in passing, but it seems (a) interesting and (b) important: is this really the case? Does an increase in automation always result in a decrease in human control?
If our definition of “control” is that humans should complete every task, then yes: increasing automation decreases control.
However, I’d argue that it’s possible for us humans to act upon complex systems at a higher level of abstraction, and that although we have less hands-on control at the object level, this meta-level work confers more control on us as agents.
There are many examples of emergent behaviour in science, where we find it useful to think in terms of abstractions rather than the object level:
We also see the same pattern in daily life:
In both of these examples, abstraction and automation have removed tasks from people’s to-do lists, but as a result those people gain meaningful control over their work and their lives.
In Critch’s stories, two key things happen as automation is increased:
Critch’s stories are (knowingly) optimistic here, in that he basically assumes we have solved the single/single alignment problem. Specifically, each AI system is assumed to have been created by careful, well-meaning humans to serve sensible, limited goals, and to be pretty well-aligned with its creators’ goals.
CAIS is an example of an attempt to better define such an outcome—where AIs exist as a suite of tools for us to choose from.
Unfortunately, as AIs become more capable, even anodyne goals can result in extremely undesirable behaviour. See, for example, Bostrom’s instrumental convergence thesis, or Gwern Branwen’s arguments that even limited tool-style AIs are incentivised to start acting like agents.
Even if we assume that the agents are well-aligned with their creators’ preferences, the increasingly inscrutable interactions between automated agents are disempowering for humanity.
Chattering AI systems can negotiate complex multi-party decisions faster than we can comprehend, due to increased communication bandwidth and efficient encoding of information.
Additionally, although we might have some model of our own agent’s goals, the same won’t be true of the other AI agents in the environment. Under this constraint, predicting—and therefore controlling—what the overall complex system does would be extremely challenging.
It seems like somewhere between household gadgets and an AI-driven global economy, things went awry.
There isn’t a bright line between a world in which more automation would emancipate humanity, and one in which more automation would enslave humanity (or worse).
This unprecedented phase transition could be what gives rise to flippant dismissals of AI safety risk. “I’m not worried that my robot vacuum is going to take over the world”, someone might say.
And why wouldn’t they! The last 150 years of technological progress and increased automation have led to radical improvements in life expectancy, moral enlightenment, equality of opportunity, wealth creation, and medical treatment.
It is exactly this recent history which lulls us, at our peril, into an implicit expectation that more automation will necessarily lead to further gains.