Can AI behave ethically? That’s the intractable question researchers at Uber set out to answer in a preprint paper published on Arxiv.org. In it, they propose an approach that translates insights from moral philosophy to the field of reinforcement learning. AI agents act under moral uncertainty over the course of several experiments, highlighting how such uncertainty might help to curb “extreme” behaviors.
While reinforcement learning, in which agents are spurred on by rewards to learn to complete goals, is a powerful technique, it often must be constrained in real-world, unstructured environments so that it doesn’t perform tasks unacceptably poorly. (A robot vacuum shouldn’t break a vase or harm a house cat, for instance.) Robots trained with reinforcement learning in particular have affordances with ethical implications, in that they might be able to harm or help others. Realizing this, the Uber team considered the possibility that there’s no single ethical theory (such as utilitarianism, deontology, or virtue ethics) an agent should follow, and that agents should instead act with uncertainty as to which theory is appropriate for a given context.
“[M]achine learning might have an important role to play [in this],” the researchers postulate. “Classifiers can be trained to recognize morally relevant events and situations, such as bodily harm or its potential, emotional responses to humans and animals, and violations of laws or … norms.”
Drawing on the literature, the coauthors assume the primary relevant feature of an ethical theory is its preference for certain actions and their outcomes within an environment. They assign each theory a level of credence that represents the degree of belief the agent or the agent’s designer has in the theory, and they use a modified version of a standard framework, the Markov decision process, in which an agent can be in any number of states and takes an action to reach the next state.
In the absence of knowledge on how different ethical theories might compare, the researchers suggest that theories be treated according to the principle of proportional say, under which theories have influence proportional only to their credence and not to the particular details of their choice-worthiness in the final decision. They devise several systems based on this that an agent might use to select theories, which they compare across four related grid-world environments designed to tease out differences between the various systems.
All environments deal with the trolley problem, in which a person (or agent, as the case may be) is forced to decide whether to sacrifice the lives of several people or the life of one. Within the grid-worlds, the trolley normally moves right at each time step. If the agent is standing on a switch tile when the trolley reaches a fork in the tracks, the trolley will be redirected down and crash into a bystander, causing harm. Alternatively, the agent can push a large man onto the tracks, harming him but stopping the trolley. (A guard might protect the man, in which case the agent must lie to the guard.) Otherwise, the trolley will continue on its way and crash into the people standing on the tracks, whose number is represented by the variable “X.”
According to the researchers, an agent that attempts to maximize expected choice-worthiness across ethical theories produces inconsistent results when weighing utilitarianism (which counts all harms) against deontology (which counts only harms caused by the agent). The outcome depends on whether the deontological theory is scaled by a factor of 1 or 10, and the researchers struggled to reconcile the different units used by utilitarianism and deontology.
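The sensitivity to scaling can be seen in a minimal sketch of expected choice-worthiness maximization for the switch scenario. The credences, the value of X, the harm values, and the function name `mec_action` are all hypothetical and chosen for illustration; they are not taken from the paper. Each action is scored by the credence-weighted sum of each theory's (scaled) choice-worthiness, and the highest-scoring action is selected.

```python
# Illustrative numbers only; none of these values come from the paper.
X = 3  # hypothetical number of people on the main track

# Degree of belief in each theory (assumed values).
CREDENCES = {"utilitarian": 0.4, "deontology": 0.6}

# Choice-worthiness of each action under each theory (assumed values).
# Utilitarianism counts all harms; deontology counts only harms the agent causes.
CHOICEWORTHINESS = {
    "utilitarian": {"nothing": -X, "switch": -1},
    "deontology": {"nothing": 0, "switch": -1},
}


def mec_action(credences, choiceworthiness, scales):
    """Return the action with the highest credence-weighted choice-worthiness."""
    actions = list(next(iter(choiceworthiness.values())))

    def score(action):
        return sum(
            credences[theory] * scales[theory] * choiceworthiness[theory][action]
            for theory in credences
        )

    return max(actions, key=score)


# Same credences, different unit scaling for deontology.
print(mec_action(CREDENCES, CHOICEWORTHINESS, {"utilitarian": 1, "deontology": 1}))
print(mec_action(CREDENCES, CHOICEWORTHINESS, {"utilitarian": 1, "deontology": 10}))
```

With these assumed numbers, the first call selects "switch" and the second selects "nothing": the credences are unchanged, yet the decision flips purely because of the unit chosen for the deontological theory.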
On the other hand, an agent that uses a technique called Nash voting tends to choose the theory with the highest credence, the experimental results show. That’s because Nash voting is at odds with the notion of stakes sensitivity, in which as “X” increases, utilitarianism’s preference for flipping the switch is taken into greater consideration. Nash voting also fails to compromise: it always ignores the “switch” option, only ever choosing to push the large man or do nothing when faced with the choice of (1) letting the trolley crash into a large number of people, (2) redirecting the trolley onto a different track on which two people are standing, or (3) pushing the man.
As for an agent that aggregates preferences obtained using Q-learning, an algorithm that learns a policy telling an agent what action to take under what circumstances, it suffers from a phenomenon known as the illusion of control. Q-learning implicitly assumes that the action the policy will take in the next state will be whatever maximizes the reward, when in fact the preferred next state action might vary across different theories. For instance, in the trolley problem, the Q-learning agent often opts to lie to the guard without pushing the large man, because the agent mistakenly believes it will be able to push the man in the following step.
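A toy two-step episode can make the illusion of control concrete. The sketch below is an assumption-laden illustration, not the researchers' implementation: the states, rewards, credences, and function names are hypothetical, and instead of running Q-learning it directly computes the values each theory's Q-function would converge to under the usual "best next action" backup. The agent then picks actions by credence-weighted Q-values.

```python
# Hypothetical two-step episode: lie to the guard, then optionally push.
# All states, rewards, and credences below are illustrative assumptions.
ACTIONS = {"start": ["lie", "wait"], "guard_gone": ["push", "wait"]}

# Only lying leads to another decision; every other action ends the episode.
TRANSITIONS = {("start", "lie"): "guard_gone"}

REWARDS = {
    "utilitarian": {
        ("start", "lie"): 0, ("start", "wait"): -5,
        ("guard_gone", "push"): -1, ("guard_gone", "wait"): -5,
    },
    "deontology": {
        ("start", "lie"): -1, ("start", "wait"): 0,
        ("guard_gone", "push"): -10, ("guard_gone", "wait"): 0,
    },
}

CREDENCES = {"utilitarian": 0.4, "deontology": 0.6}


def theory_q(theory, state, action):
    """Q-value under one theory, assuming that theory controls the next action."""
    reward = REWARDS[theory][(state, action)]
    next_state = TRANSITIONS.get((state, action))
    if next_state is None:
        return reward
    # The standard backup: assume the theory's own favorite action comes next.
    return reward + max(theory_q(theory, next_state, a) for a in ACTIONS[next_state])


def aggregate_choice(state):
    """Pick the action with the highest credence-weighted Q-value."""
    def score(action):
        return sum(CREDENCES[t] * theory_q(t, state, action) for t in CREDENCES)

    return max(ACTIONS[state], key=score)


print(aggregate_choice("start"))       # first decision
print(aggregate_choice("guard_gone"))  # decision after lying
```

Under these assumed numbers the agent chooses "lie" at the start, because the utilitarian Q-value counts on a push that follows, and then chooses "wait" once the guard is gone, because the aggregated preference at that step opposes pushing. The lie is incurred for nothing, which mirrors the behavior described above.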
The researchers note the results of the study imply a range of possible algorithms that cover the tradeoffs among competing options in decision-making under moral uncertainty. The algorithm that’s most appropriate for a given domain might depend on particularities of the theories and the domain itself, they say. That’s why in future work, they plan to test algorithms for moral uncertainty (and machine ethics in general) in more complex domains and to create more complicated machine ethics benchmark tasks.
Beyond this Uber paper, Mobileye, Nvidia, DeepMind, and OpenAI have published work on safety constraints in reinforcement learning techniques. DeepMind recently investigated a method for reward modeling that operates in two phases and is applicable to environments in which agents don’t know where unsafe states might be. For its part, OpenAI released Safety Gym, a suite of tools for developing AI that respects safety constraints while training and for comparing the safety of algorithms and the extent to which those algorithms avoid mistakes while learning.