So You Want to Save the World
So you want to save the world. As it turns out, the world cannot be saved by caped crusaders with great strength and the power of flight. No, the world must be saved by mathematicians, computer scientists, and philosophers.
This is because the creation of machine superintelligence this century will determine the future of our planet, and in order for things to go well for us, we need to solve a particular set of technical problems in mathematics, computer science, and philosophy before the Singularity happens.
Don't underestimate the importance of donation. You can do more good as a philanthropic banker than as a charity worker or researcher.
But if you are a capable researcher, then you may also be able to contribute by working directly on one or more of the open problems humanity needs to solve. If so, read on...
Created 12-10-2011. Updated 02-23-2013.
At this point, I'll need to assume some familiarity with the subject matter. If you haven't already, take a few hours to read these articles, and then come back:
- Muehlhauser and Salamon (2012), "Intelligence Explosion: Evidence and Import"
- Chalmers (2010), "The Singularity: A Philosophical Analysis"
- Yudkowsky (2008a), "AI as a Positive and Negative Factor in Global Risk"
- Sandberg & Bostrom (2008), "Whole Brain Emulation: A Roadmap"
- Omohundro (2008), "The Basic AI Drives"
- Armstrong et al. (2011), "Thinking inside the box: using and controlling Oracle AI"
Or at the very least, read my shorter and more accessible summary of the main points in my online book, Facing the Intelligence Explosion.
Daniel Dewey's highly compressed summary of several key points is:
Hardware and software are improving, there are no signs that we will stop this, and human biology and biases indicate that we are far below the upper limit on intelligence. Economic arguments indicate that most AIs would act to become more intelligent. Therefore, intelligence explosion is very likely. The apparent diversity and irreducibility of information about "what is good" suggests that value is complex and fragile; therefore, an AI is unlikely to have any significant overlap with human values if that is not engineered in at significant cost. Therefore, a bad AI explosion is our default future.
The VNM utility theorem suggests that there is some formally stated goal that we most prefer. The CEV thought experiment suggests that we could program a metaethics that would generate a good goal. The Gandhi's pill argument indicates that goal-preserving self-improvement is possible, and the reliability of formal proof suggests that long chains of self-improvement are possible. Therefore, a good AI explosion is likely possible.
Next, I need to make a few more important points:
1. Defining each problem is part of the problem.
As Bellman (1961) said, "the very construction of a precise mathematical statement of a verbal problem is itself a problem of major difficulty." Many of the problems related to navigating the Singularity have not yet been stated with mathematical precision, and the need for a precise statement of the problem is part of the problem. But there is reason for optimism. Many times, particular heroes have managed to formalize a previously fuzzy and mysterious concept: see Kolmogorov on complexity and simplicity (Kolmogorov 1965; Grünwald & Vitányi 2003; Li & Vitányi 2008), Solomonoff on induction (Solomonoff 1964a, 1964b; Rathmanner & Hutter 2011), Von Neumann and Morgenstern on rationality (Von Neumann & Morgenstern 1947; Anand 1995), and Shannon on information (Shannon 1948; Arndt 2004).
2. The nature of the problem space is unclear.
Which problems will biological humans need to solve, and which problems can a successful Friendly AI (FAI) solve on its own? Are Friendly AI (Yudkowsky 2001) and CEV (Yudkowsky 2004) coherent ideas, given the confused nature of human "values"? Should we aim instead for something like Oracle AI (Armstrong et al. 2011)? Which problems are we unable to state with precision because they are irreparably confused, and which problems are we unable to state due to a lack of insight?
3. Our intervention priorities are unclear.
There are a limited number of capable researchers who will work on these problems. Which are the most important problems they should be working on, if they are capable of doing so? Should we focus on "control problem" theory (FAI, AI-boxing, oracle AI, etc.), or on strategic considerations (differential technological development, methods for raising the sanity waterline, methods for bringing more funding to existential risk reduction and growing the community of x-risk reducers, reducing the odds of AI arms races, etc.)? Is AI more urgent than other existential risks, especially synthetic biology? Is research the most urgent thing to be done, or should we focus on growing the community of x-risk reducers, raising the sanity waterline, bringing in more funding for x-risk reduction, etc.? Can we make better research progress in the next 10 years if we work to improve sanity and funding for 7 years and then have the resources to grab more and better researchers, or can we make better research progress by focusing on research now?
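The last question above can be made concrete with a toy calculation. The sketch below is only an illustration under assumptions that are not in the text: research capacity starts at 1 unit per year, capacity-building compounds it at a constant hypothetical rate `growth`, and no research is done during the building years. The names `research_now`, `build_then_research`, and `break_even_growth` are made up for this example.

```python
# Toy comparison of two strategies over a fixed horizon.
# Every number and functional form here is an illustrative assumption.

def research_now(horizon_years: int, capacity: float = 1.0) -> float:
    """Total research output if research runs at constant capacity every year."""
    return capacity * horizon_years


def build_then_research(horizon_years: int, build_years: int,
                        growth: float, capacity: float = 1.0) -> float:
    """Total output if capacity compounds at `growth` per year for
    `build_years` (with no research done), then research fills the rest."""
    grown_capacity = capacity * (1.0 + growth) ** build_years
    return grown_capacity * (horizon_years - build_years)


def break_even_growth(horizon_years: int, build_years: int) -> float:
    """Annual growth rate at which both strategies produce equal output."""
    ratio = horizon_years / (horizon_years - build_years)
    return ratio ** (1.0 / build_years) - 1.0


if __name__ == "__main__":
    g = break_even_growth(horizon_years=10, build_years=7)
    print(f"break-even growth rate: {g:.1%}")
    for growth in (0.10, 0.20, 0.30):
        later = build_then_research(10, 7, growth)
        print(f"growth={growth:.0%}: now={research_now(10):.1f}, "
              f"build-first={later:.1f}")
```

Under these assumptions, spending 7 of 10 years on capacity only pays off if capacity grows by more than roughly 19% per year. A serious analysis would also have to model research that itself compounds, uncertainty about the time horizon, and the risk that the build-up fails.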
- Problem Categories
There are many ways to categorize our open problems; I'll divide them into three groups:
Safe AI Architectures. This may include architectures for securely confined or "boxed" AIs (Lampson 1973), including Oracle AIs, and also AI architectures capable of using a safe set of goals (resulting in Friendly AI).
Safe AI Goals. What could it mean to have a Friendly AI with "good" goals?
Strategy. How do we predict the future and make recommendations for differential technological development? Do we aim for Friendly AI or Oracle AI or both? Should we focus on growing support now, or should we focus on research? How should we interact with the public and with governments?
The list of open problems on this page is very preliminary. I'm sure there are many problems I've forgotten, and many problems I'm unaware of. Probably all of the problems are stated poorly: this is only a "first step" document. Certainly, all listed problems are described at a very "high" level, still far from mathematical precision, and each can itself be broken down into several, and often dozens of, subproblems.
- Safe AI Architectures
Omohundro (2007, 2008, 2011) describes "rationally shaped" AI as AI that is as economically rational as possible given its limitations. A rationally shaped AI has beliefs and desires, its desires are defined by a utility function, and it seeks to maximize its expected utility. If an AI doesn't use a utility function, then it's hard to predict its actions, including whether they will be "friendly." The same problem can arise if the decision mechanism or the utility function is not transparent to humans. At least, this seems to be the case, but perhaps there are strong attractors that would allow us to predict friendliness even without the AI having a transparent utility function, or even a utility function at all? Or, perhaps a new decision theory could show the way to a different AI architecture that would allow us to predict the AI's behavior without it having a transparent utility function?
How can we develop a reflective decision theory?
When an agent considers radical modification of its own decision mechanism, how can it ensure that doing so will keep constant or increase its expected utility? Yudkowsky (2011a) argues that current decision theories stumble over Löb's Theorem at this point, and that a new, "reflectively consistent" decision theory is needed.
How can we develop a timeless decision theory with the bugs worked out?
Paradoxes like Newcomb's Problem (Ledwig 2000) and Solomon's Problem (Gibbard & Harper 1978) seem to show that neither causal decision theory nor evidential decision theory is ideal. Yudkowsky (2010) proposes an apparently superior alternative, timeless decision theory. But it, too, has bugs that need to be worked out, for example the "5-and-10 problem."
How can we modify a transparent AI architecture to have a utility function over the external world?
Reinforcement learning can only be used to define agents whose goal is to maximize expected rewards. But this doesn't match human goals, so advanced reinforcement learning agents will diverge from our wishes. Thus, we need a class of agents called "value learners" (Dewey 2011) that "can be designed to learn and maximize any initially unknown utility function." Dewey's paper, however, is only the first step in this direction.
How can an agent keep a stable utility function through ontological shifts?
An agent's utility function may refer to states of, or entities within, its ontology. As De Blanc (2011) notes, "If the agent may upgrade or replace its ontology, it faces a crisis: the agent's original [utility function] may not be well-defined with respect to its new ontology." De Blanc points toward some possible solutions for these problems, but they need to be developed further.
How can an agent choose an ideal prior?
We want a Friendly AI's model of the world to be as accurate as possible so that it successfully does friendly things if we can figure out how to give it friendly goals. Solomonoff induction (Li & Vitányi 2008) may be our best formalization of induction yet, but it could be improved upon.
First, we may need to solve the problem of observation selection effects or "anthropic bias" (Bostrom 2002b): even an agent using a powerful approximation of Solomonoff induction may, due to anthropic bias, make radically incorrect inferences when it does not encounter sufficient evidence to update far enough away from its priors. Several solutions have been proposed (Neal 2006; Grace 2010; Armstrong 2011), but none are as yet widely persuasive.
Second, we need improvements to Solomonoff induction. Hutter (2009) discusses many of these problems. We may also need a version of Solomonoff induction in second-order logic because second-order logic with binary predicates can simulate higher-order logics with nth-order predicates. This kind of Solomonoff induction would be able to imagine even, for example, hypercomputers and time machines.
Third, we need computable approximations for this improved version of Solomonoff induction.
What is the ideal theory of how to handle logical uncertainty?
Even an AI will be uncertain about the true value of certain logical propositions or long chains of logical reasoning. What is the best way to handle this problem? Partial solutions are offered by Gaifman (2004), Williamson (2001), and Haenni (2005), among others.
What is the ideal computable approximation of perfect Bayesianism?
As explained elsewhere, we want a Friendly AI's model of the world to be as accurate as possible. Thus, we need ideal computable theories of priors and of logical uncertainty, but we also need computable approximations of Bayesian inference. Cooper (1990) showed that inference in unconstrained Bayesian networks is NP-hard, and Dagum & Luby (1993) showed that the corresponding approximation problem is also NP-hard. The most common solution is to use randomized sampling methods, also known as "Monte Carlo" algorithms (Robert & Casella 2010). Another approach is variational approximation (Wainwright & Jordan 2008), which works with a simpler but similar version of the original problem. A third approach is "belief propagation," for example loopy belief propagation (Weiss 2000).
Can we develop a safely confined AI? Can we develop Oracle AI?
One approach to constraining a powerful AI is to give it "good" goals. Another is to externally constrain it, creating a "boxed" AI and thereby "leakproofing the singularity" (Chalmers 2010). A fully leakproof singularity is impossible or pointless: "For an AI system to be useful... to us at all, it must have some effects on us. At a minimum, we must be able to observe it." Still, there may be a way to constrain a superhuman AI such that it is useful but not dangerous. Armstrong et al. (2011) offer a detailed proposal for constraining an AI, but there remain many worries about how safe and sustainable such a solution is. The question remains: Can a superhuman AI be safely confined, and can humans manage to safely confine all superhuman AIs that are created?
What convergent AI architectures and convergent instrumental goals can we expect from superintelligent machines?
Omohundro (2008, 2011) argues that we can expect that "as computational resources increase, there is a natural progress through stimulus-response systems, learning systems, reasoning systems, self-improving systems, to fully rational systems," and that for rational systems there are several convergent instrumental goals: self-protection, resource acquisition, replication, goal preservation, efficiency, and self-improvement. Are these claims true? Are there additional convergent AI architectures or instrumental goals that we can use to predict the implications of machine superintelligence?
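One way to build intuition for the resource-acquisition claim is a toy simulation. The sketch below is not Omohundro's argument or model. It assumes, purely for illustration, that an agent's goal is a utility function drawn at random over a set of outcomes, and that "more resources" simply means a larger set of reachable outcomes. The function name `fraction_preferring_more_options` and all of its parameters are hypothetical.

```python
import random


def fraction_preferring_more_options(few: int, many: int,
                                     trials: int = 20_000,
                                     seed: int = 0) -> float:
    """Estimate how often a randomly chosen goal favors the larger option set.

    Each trial draws an independent uniform utility for every outcome.
    A position with few resources can reach `few` outcomes; a position
    with many resources can reach `many` different outcomes. The agent
    values a position by the best outcome reachable from it.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    prefers_many = 0
    for _ in range(trials):
        best_with_few = max(rng.random() for _ in range(few))
        best_with_many = max(rng.random() for _ in range(many))
        if best_with_many > best_with_few:
            prefers_many += 1
    return prefers_many / trials


if __name__ == "__main__":
    share = fraction_preferring_more_options(few=2, many=8)
    print(f"share of random goals favoring more options: {share:.2f}")
```

For independent, continuously distributed utilities the exact share is `many / (few + many)`, which is 0.8 for the values used here. Most randomly drawn goals therefore favor the position with more options, even though the goals have nothing else in common. Whether real AI systems resemble this toy model is exactly the open question.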
- Safe AI Goals
Can "safe" AI goals only be derived from contingent "desires" and "goals"? Might a single procedure for responding to goals be uniquely determined by reason?
A natural approach to selecting goals for a Friendly AI is to ground them in an extrapolation of current human goals, for this approach works even if we assume the naturalist's standard Humean division between motives and reason. But might a sophisticated Kantian approach work, such that some combination of decision theory, game theory, and algorithmic information theory provides a uniquely dictated response to goals? Drescher (2006) attempts something like this, though his particular approach seems to fail.
How do we construe a utility function from what humans "want"?
A natural approach to Friendly AI is to program a powerful AI with a utility function that accurately represents an extrapolation of what humans want. Unfortunately, humans do not seem to have coherent utility functions, as demonstrated by the neurobiological mechanisms of choice (Dayan 2011) and behavioral violations of the axioms of utility theory (Kahneman & Tversky 1979). Economists and computer scientists have tried to extract utility theories from human behavior with choice modelling (Hess & Daly 2010) and preference elicitation (Domshlak et al. 2011), but these attempts have focused on extracting utility functions over a narrow range of human preferences, for example those relevant to developing a particular decision support system. We need new, more powerful, and more universal methods for preference extraction. Or, perhaps we must allow actual humans to reason about their own preferences for a very long time until they reach a kind of "reflective equilibrium" in their preferences (Yudkowsky 2004). The best path may be to upload a certain set of humans, which would allow them to reason through their preferences with greater speed and introspective access. Unfortunately, the development of human uploads may spin off dangerous neuromorphic AI before this can be done.
How should human values be extrapolated?
Value extrapolation is an old subject in philosophy (Muehlhauser & Helm 2011), but the major results of the field so far have been to show that certain approaches won't work (Sobel 1994); we still have no value extrapolation algorithms that might plausibly work. The most recent work on this is Muehlhauser (2012).
Why extrapolate the values of humans alone? What counts as a human? Do values converge if extrapolated?
Would the choice to extrapolate the values of humans alone be an unjustified act of speciesism, or is it justified because humans are special in some way, perhaps because humans are the only beings who can reason about their own preferences? And what counts as a human? The problem is more complicated than one might imagine (Bostrom 2006; Bostrom & Sandberg 2011). Moreover, do we need to scan the values of all humans, or only some? These problems are less important if values converge upon extrapolation for a wide variety of agents, but it is far from clear that this is the case (Sobel 1999; Doring & Steinhoff 2009).
How can we aggregate or assess value in an infinite universe? What can we make of other possible laws of physics?
Our best model of the physical universe predicts that the universe is spatially infinite, meaning that all possible "bubble universes" are realized an infinite number of times. Given this, how do we make value calculations? The problem is discussed by Knobe (2006) and Bostrom (2009), but more work remains to be done. These difficulties may be exacerbated if the universe is infinite in a stronger sense, for example if all possible mathematical objects exist (Tegmark 2005).
How should we deal with normative uncertainty?
We may not solve the problems of value or morality in time to build Friendly AI. Perhaps instead we need a theory of how to handle this normative uncertainty. Sepielli (2009) and Bostrom (2009) have taken the initial steps here.
Is it possible to program an AI to do what is "morally right" rather than give it an extrapolation of human goals?
Perhaps the only way to solve the Friendly AI problem is to get an AI to do moral philosophy and come to the correct answer. But perhaps this exercise would only result in the conclusion that our moral concepts are incoherent (Beavers 2011).
- Strategy
What methods can we use to predict technological development?
Predicting progress in powerful technologies (AI, synthetic biology, nanotechnology) can help us decide which existential threats are most urgent, and can inform our efforts in differential technological development (Bostrom 2002a). The stability of Moore's law may give us limited predictive hope (Lundstrom 2003; Mack 2011), but in general we have no proven method for long-term technological forecasting: neither expert elicitation (Armstrong 1985; Woudenberg 1991; Rowe & Wright 2001) nor prediction markets (Williams 2011) have been shown to work over long horizons. Nagy's performance curves database (Nagy 2010) may aid our forecasting efforts, as may "big data" in general (Weinberger 2011).
Which kinds of differential technological development should we encourage, and how?
Bostrom (2002a) proposes a course of differential technological development: "trying to retard the implementation of dangerous technologies and accelerate implementation of beneficial technologies, especially those that ameliorate the hazards posed by other technologies." Many examples are obvious: we should retard the development of technologies that pose an existential risk, and accelerate the development of technologies that help protect us from existential risk, such as vaccines and protective structures. Some potential applications are less obvious. Should we accelerate the development of whole brain emulation technology so that uploaded humans can solve the problems of Friendly AI, or will the development of WBEs spin off dangerous neuromorphic AI first (Shulman & Salamon 2011)?
Which open problems are safe to discuss, and which are potentially dangerous?
There was a recent debate on whether a certain scientist should publish his discovery of a virus that "could kill half of humanity." (The answer in this case was "no.") The question of whether to publish results is particularly thorny when it comes to AI research, because most of the work in the "Safe AI Architectures" section above would, if completed, bring us closer to developing both unfriendly AI (uFAI) and FAI, but in particular it would make it easier to develop uFAI. Unfortunately, it looks like that work must be done to develop any kind of FAI, while if it is not done then only uFAI can be developed (Dewey 2011).
What can we do to reduce the risk of an AI arms race?
AGI is, in one sense, a powerful weapon for dominating the globe. Once it is seen by governments as a feasible technology goal, we may predict an arms race for AGI. Shulman (2009) gives several reasons to recommend "cooperative control of the development of software entities" over other methods for arms race risk mitigation, but these scenarios require more extensive analysis.
What can we do to raise the "sanity waterline," and how much will this help?
The Machine Intelligence Research Institute (MIRI) is a strong advocate of rationality training, in part so that both AI safety researchers and supporters of x-risk reduction can avoid the usual failures of thought that occur when reasoning about those issues (Yudkowsky 2008b). This raises the question of how well rationality can be taught, and how much difference it will make for existential risk reduction.
What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
Much is known about how to raise funding (Oppenheimer & Olivola 2010) and awareness (Kotler & Armstrong 2009), but applying these principles is always a challenge, and x-risk reduction may pose unique problems for these tasks.
Which interventions should we prioritize?
There are limited resources available for existential risk reduction work, and for AI safety research in particular. How should these resources be allocated? Should the focus be on direct research, or on making it easier for a wider pool of researchers to contribute, or on fundraising and awareness-raising, or on other types of interventions?
How should x-risk reducers and AI safety researchers interact with governments and corporations?
Governments and corporations are potential sources of funding for x-risk reduction work, but they may also endanger the x-risk reduction community. AI development labs will be unfriendly to certain kinds of differential technological development advocated by the AI safety community, and governments may face pressures to nationalize advanced AI research groups (including AI safety researchers) once AGI draws nearer.
How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
Optimal philanthropists aim not just to make a difference, but to make the most possible positive difference. Bostrom (2011) makes a good case for existential risk reduction as optimal philanthropy, but more detailed questions remain. Which x-risk reduction interventions and organizations should be funded? Should new organizations be formed, or should resources be pooled in one or more of the existing organizations working on x-risk reduction?
How does AI risk compare to other existential risks?
Yudkowsky (2008a) notes that AI poses a special kind of existential risk, for it can surely destroy the human species but, if done right, it also has the unique capacity to save our species from all other existential risks. But will AI come before other existential risks, especially the risks of synthetic biology? How should efforts be allocated between safe AI and the mitigation of other existential risks? Is Oracle AI enough to mitigate other existential risks?
Which problems do we need to solve, and which ones can we have an AI solve?
Can we get an AI to do Friendly AI philosophy before it takes over the world? Which problems must be solved by humans, and which ones can we hand off to the AI?
How can we develop microeconomic models of WBEs and self-improving systems?
Hanson (1994, 1998, 2008a, 2008b, 2008c, forthcoming) provides some preliminary steps. Might such models help us predict takeoff speed and the likelihood of monopolar (singleton) vs. multipolar outcomes?
How can we be sure a Friendly AI development team will be altruistic?
A Friendly AI development team may begin with what seem to be relatively altruistic motives, but become less altruistic as their distance to AI-granted power decreases. Or, they may remain highly altruistic, but the highest levels of altruism attainable by unmodified humans may be unsuitable for building truly Friendly AI. There may be social and organizational technology that could help ensure a good result from Friendly AI development, as well as advanced brain-scanning lie detectors and altruism training (aka moral enhancement).