Abandoning Objectives: Evolution through the
         Search for Novelty Alone

                Joel Lehman and Kenneth O. Stanley
             Evolutionary Complexity Research Group
       School of Electrical Engineering and Computer Science
                     University of Central Florida
                       Orlando, FL 32816 USA
                  {jlehman,kstanley}@eecs.ucf.edu

      In: Evolutionary Computation 19(2):189-223, Cambridge, MA: MIT Press, 2011

                                        Abstract
       In evolutionary computation, the fitness function normally measures
   progress towards an objective in the search space, effectively acting as an objective
   function. Through deception, such objective functions may actually prevent
   the objective from being reached. While methods exist to mitigate deception,
   they leave the underlying pathology untreated: Objective functions themselves
   may actively misdirect search towards dead ends. This paper proposes an
   approach to circumventing deception that also yields a new perspective on
   open-ended evolution: Instead of either explicitly seeking an objective or
   modeling natural evolution to capture open-endedness, the idea is to simply
   search for behavioral novelty. Even in an objective-based problem, such novelty
   search ignores the objective. Because many points in the search space collapse
   to a single behavior, the search for novelty is often feasible. Furthermore,
   because there are only so many simple behaviors, the search for novelty leads
   to increasing complexity. By decoupling open-ended search from artificial life
   worlds, the search for novelty is applicable to real world problems. Coun-
   terintuitively, in the maze navigation and biped walking tasks in this paper,
   novelty search significantly outperforms objective-based search, suggesting the
   strange conclusion that some problems are best solved by methods that ignore
   the objective. The main lesson is the inherent limitation of the objective-based
   paradigm and the unexploited opportunity to guide search through other means.

   Keywords: Evolutionary algorithms, deception, novelty search, open-ended evo-
   lution, neuroevolution.

1    Introduction
It is tempting to believe that measuring progress with respect to an objective,
which is the customary role of the fitness function in evolutionary computation
(EC; De Jong 2006; Fogel et al. 1966; Holland 1975; Mitchell 1997), lights a path
to the objective through the search space. Yet as results in this paper will remind
us, increasing fitness does not always reveal the best path through the search
space. In this sense, a sobering message of this paper is the inherent limitation of
objective-based search. Yet at the same time, this message is tempered by the caveat
that search need not always be guided by explicit objectives. In fact, a key insight of
this paper is that sometimes, opening up the search, more in the spirit of artificial life
than traditional optimization, can yield the surprising and paradoxical outcome that
the more open-ended approach more effectively solves the problem than explicitly
trying to solve it. There will be no silver bullet to be found in these results; rather,
the benefit is to widen our perspective on search, both to be more sober and at the
same time more open-minded about the potential breadth of our toolbox.
    The concept of the objective function, which rewards moving closer to the goal,
is ubiquitous in machine learning (Mitchell 1997). The overriding intuition behind
the idea of the objective function, which is widely accepted, is that the best way
to improve performance is to reward improving performance with respect to the
objective. In EC, this objective measure of performance is called the fitness function
(De Jong 2006; Holland 1975; Mitchell 1997), which is a metaphor for the pressure to
adapt in nature.
    Yet although they are pervasive, objective functions often suffer from the pathol-
ogy of local optima, i.e. dead ends in the search space with respect to increasing the
value of the objective function. In this way, landscapes induced by objective (e.g.
fitness) functions are often deceptive (Goldberg 1987; Liepins and Vose 1990; Pe-
likan et al. 2001). Although researchers are still working to characterize the various
reasons that search methods may fail to reach the objective (Davidor 1990; Gold-
berg 1987; Grefenstette 1992; Hordijk 1995; Jones and Forrest 1995; Mahfoud 1995;
Mitchell et al. 1992; Weinberger 1990), as a rule of thumb, the more ambitious the
goal, the more difficult it may be to articulate an appropriate objective function and
the more likely it is that search can be deceived by local optima (Ficici and Pollack
1998; Zaera et al. 1996). The problem is that the objective function does not necessar-
ily reward the stepping stones in the search space that ultimately lead to the objective.
For example, consider fingers stuck within a Chinese finger trap. While the goal is to
free one’s fingers, performing the most direct action of pulling them apart yields no
progress. Rather, the necessary precursor to solving the trap is to push one’s fingers
together, which seems to entrap them more severely. In this way, the trap is deceptive
because one must seemingly move farther from the goal to ever have the hope of
reaching it.
    Because of deception, ambitious objectives in EC are often carefully sculpted
through a curriculum of graded tasks, each chosen delicately to build upon the prior
(Elman 1991; Gomez and Miikkulainen 1997; Van de Panne and Lamouret 1995). Yet
such incremental evolution is difficult and ad hoc, requiring intimate domain knowl-
edge and careful oversight. Incremental evolution, along with other methods em-

ployed in EC to deal with deception (Ficici and Pollack 1998; Knowles et al. 2001;
Mahfoud 1995; Pelikan et al. 2001; Stewart 2001), does not fix the underlying pathol-
ogy of local optima; if local optima are pervasive, search will still likely be deceived.
Thus, deception may be an unavoidable consequence of certain objective functions
irrespective of the underlying search algorithm. Paradoxically, in these cases pursu-
ing the objective may prevent the objective from being reached.
    In contrast to the focus on objective optimization in machine learning and EC,
researchers in artificial life often study systems without explicit objectives, such as
in open-ended evolution (Channon 2001; Maley 1999; Standish 2003). An ambitious
goal of this research is to reproduce the unbounded innovation of natural evolu-
tion. A typical approach is to create a complex artificial world in which there is
no final objective other than survival and replication (Adami et al. 2000; Channon
2001; Yaeger 1994). Such models follow the assumption that biologically-inspired
evolution can support an open-ended dynamic that leads to unbounded increasing
complexity (Bedau 1998; Channon 2001; Maley 1999).
    However, a growing yet controversial view in biology is that the drive towards
complexity in natural evolution is a passive force, i.e. not driven primarily by se-
lection (Gould 1996; Lynch 2007b; McShea 1991; Miconi 2007). In fact, in this view,
the path towards complexity in natural evolution can sometimes be inhibited by se-
lection pressure. If selection pressure is too high, then any deviation from a locally
optimal behavior will be filtered out by selection. Thus, in this view, instead of being
a byproduct of selection, perhaps the accumulation of complexity is better explained
by a different characteristic of natural evolution and open-ended evolutionary sys-
tems in general: They continually produce novel forms (Standish 2003).
    This perspective leads to a key idea in this paper that approaches the problems
in both EC and artificial life in a new way: Instead of modeling natural evolution
with the hope that novel individuals will be continually discovered, it is possible to
create an open-ended dynamic by simply searching directly for novelty. Thus this
paper offers a comprehensive introduction to the novelty search algorithm, which
was first described in Lehman and Stanley (2008). The main idea is to search with
no objective other than continually finding novel behaviors in the search space. By
defining novelty in this domain-independent way, novelty search can be applied to
real world problems, bridging the gap between open-ended evolution and the prac-
tical application of EC. Interestingly, because there are only so many ways to behave,
some of which must be more complex than others (Gould 1996), the passive force in
nature that leads to increasing complexity is potentially accelerated by searching for
behavioral novelty.
    Paradoxically, the search for novelty often evolves objectively superior behavior
to evolution that is actually driven by the objective, as demonstrated by experiments
in this paper in both a deceptive two-dimensional robot maze navigation task and a
challenging three-dimensional biped locomotion domain, as well as in experiments
in several other independent works (Lehman and Stanley 2010a; Mouret 2009; Risi
et al. 2009). Counterintuitively, in this paper, novelty search, which ignores the ob-
jective, evolves successful maze navigators that reach the objective in significantly
fewer evaluations than the objective-based method. This result is further investi-
gated under several different conditions, including expanding the dimensionality of

the behavior space, and found to be robust. In the biped domain, novelty search
evolves controllers that walk significantly farther than those evolved by the
objective-based method. These results challenge the premise that the objective is
always the proper impetus for search.
    The conclusion is that by abstracting the process through which natural evolu-
tion discovers novelty, it is possible to derive an open-ended search algorithm that
operates without pressure towards the ultimate objective. Novelty search is immune
to the problems of deception and local optima inherent in objective optimization be-
cause it entirely ignores the objective, suggesting the counter-intuitive conclusion
that ignoring the objective in this way may often benefit the search for the objective.
While novelty search is not a panacea, the more salient point is that objective-based
search, which is ubiquitous in EC, clearly does not always work well. The implica-
tion is that while it seems natural to blame the search algorithm when search fails
to reach the objective, the problem may ultimately lie in the pursuit of the objective
itself.

2       Background
This section reviews deception in EC, complexity in natural evolution, open-
endedness in EC, and the neuroevolution method used in the experiments.

2.1     Deception in Evolutionary Computation
The study of deception is part of a larger study by EC researchers into what may
cause an evolutionary algorithm (EA) to fail and how to remedy such failures. For
the purpose of this paper, it is instructive to study the role of the objective (fitness)
function in such failures and remedies.

2.1.1    Deception and Problem Difficulty
The original definition of deception by Goldberg (1987) is based on the building
blocks hypothesis, in which small genetic building blocks may be integrated by
crossover to form larger blocks (Holland 1975). In the original conception, a prob-
lem is deceptive if lower-order building blocks, when combined, do not lead to a
global optimum. A variety of work has further refined this definition and investi-
gated performance on deceptive problems (Goldberg 1987; Liepins and Vose 1990;
Pelikan et al. 2001). Some researchers have argued that the importance of decep-
tion may be over-emphasized (Grefenstette 1992; Mitchell et al. 1992), while Whitley
(1991) concluded that the only challenging problems are those with some degree of
deception. Such disagreements are natural because no measure of problem difficulty
can be perfect; in general it is impossible to know the outcome of an algorithm on a
particular set of data without actually running it (Rice 1953). Hence, many different
metrics of EA problem hardness have been explored (Davidor 1990; Hordijk 1995;
Jones and Forrest 1995; Kauffman 1993; Manderick et al. 1991; Weinberger 1990).

Some alternative measures of problem difficulty attempt to model or quantify
the ruggedness of the fitness landscape, motivated by the intuition that optimizing
more rugged landscapes is more difficult (Davidor 1990; Hordijk 1995; Kauffman
1993; Manderick et al. 1991; Weinberger 1990). Such approaches are often based on
the concepts of correlation, i.e. the degree to which the fitness of individuals is well
correlated with that of their neighbors in the search space, or epistasis, i.e. the degree
of interaction among genes’ effects. A low degree of correlation or a high degree of epistasis
may indicate a rough landscape that may be difficult to optimize (Davidor 1990;
Hordijk 1995; Kauffman 1993; Manderick et al. 1991; Weinberger 1990). Importantly,
because the fitness landscape is induced by the objective function, the problem of
ruggedness, presupposing reasonable settings for the EA, can be attributed to the
objective function itself.
    Other researchers suggest that ruggedness is overemphasized and that neutral
fitness plateaus (neutral networks) are key influences on evolutionary dynamics
(Barnett 2001; Harvey and Thompson 1996; Stewart 2001), a view which has some
support in biology (Kimura 1983). However, neutral networks actually suggest a
deficiency in the objective function: In a neutral network the map defined by the ob-
jective function is ambiguous with respect to which way search should proceed.
    Another approach, by Jones and Forrest (1995), suggests that measuring the de-
gree to which the heuristic of fitness relates to the real distance to the goal is a good
measure of difficulty. This perspective perhaps most clearly demonstrates that prob-
lem difficulty may often result from an uninformative objective function: When the
heuristic has weak basis in reality there is little reason to expect search to perform
well. However, as demonstrated by the experiments in this paper, sometimes pur-
suing what appears to be a reasonable objective produces an unreasonable objective
function.
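
    To make this measure concrete, the following minimal sketch (in Python; the
function and variable names are illustrative, not from Jones and Forrest) estimates
the fitness-distance correlation from a sample of evaluated individuals:

    import numpy as np

    def fitness_distance_correlation(fitnesses, distances_to_goal):
        """Correlation between fitness and true distance to the goal.

        For a maximization problem, a strongly negative correlation
        (fitness rises as distance to the goal falls) suggests an
        informative objective; values near zero or positive suggest a
        deceptive or uninformative objective function.
        """
        f = np.asarray(fitnesses, dtype=float)
        d = np.asarray(distances_to_goal, dtype=float)
        return np.corrcoef(f, d)[0, 1]  # Pearson correlation coefficient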
    While in general the exact properties of a problem that make it difficult for EAs
are still a subject of research, in this paper the term deception will refer to an intu-
itive definition of problem hardness: A deceptive problem is one in which a reason-
able EA will not reach the desired objective in a reasonable amount of time. That
is, by exploiting objective fitness in a deceptive problem, a population’s trajectory
is unlikely to uncover a path through the search space that ultimately leads to the
objective. It is important to note that this definition of deception is different from
the traditional definition (Goldberg 1987) and is not meant to trivialize the impact or
difficulty of choosing the correct representation (Rothlauf and Goldberg 2002), pa-
rameters (De Jong 2006), or search operators (Yao 1993), all of which affect the per-
formance of the EA and the structure of the fitness landscape. Rather, the intuitive
approach helps to isolate the general problem with particular objective functions be-
cause the word “deception” itself reflects a fault in the objective function: A deceptive
objective function will deceive search by actively pointing the wrong way.

2.1.2   Mitigating Deception
A common approach to preventing premature convergence to local optima in EC is
by employing a diversity maintenance technique (Goldberg and Richardson 1987;
Hornby 2006; Hu et al. 2005; Hutter and Legg 2006; Mahfoud 1995; Stanley and

Miikkulainen 2002). Many of these methods are inspired by speciation or niching
in natural evolution, wherein competition may be mostly restricted to occur within
the same species or niche, instead of encompassing the entire population. For exam-
ple, fitness sharing (Goldberg and Richardson 1987) enforces competition between
similar solutions so that there is pressure to find solutions in distant parts of the
search space. Similarly, hierarchical fair competition (Hu et al. 2005) enforces com-
petition among individuals with similar fitness scores, and the age-layered popula-
tion structure (ALPS; Hornby 2006) approach enforces competition among genomes
of different genetic ages. The Fitness Uniform Selection Scheme (FUSS; Hutter and
Legg 2006) removes the direct pressure to increase fitness entirely: An individual is
not rewarded for higher fitness, but for a unique fitness value. This approach can be
viewed as a search for novel fitness scores, which is related to the approach in this pa-
per. Although these methods encourage exploration, if local optima are pervasive or
genotypic difference is not well-correlated with phenotypic/behavioral difference,
these methods may still be deceived.
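
    As an illustration of how such niching works, below is a minimal sketch of
explicit fitness sharing in the style of Goldberg and Richardson (1987); the linear
sharing function and all names are illustrative:

    def shared_fitness(population, raw_fitness, distance, sigma_share):
        """Divide each individual's raw fitness by its niche count so that
        crowded regions of the search space are rewarded less."""
        scores = []
        for x in population:
            # Niche count: how crowded x's neighborhood is, computed with a
            # linear sharing function that falls to zero at sigma_share.
            niche_count = sum(max(0.0, 1.0 - distance(x, y) / sigma_share)
                              for y in population)
            scores.append(raw_fitness(x) / niche_count)  # niche_count >= 1
        return scores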
    Other methods for avoiding deception tackle the problem of ruggedness by rear-
ranging the genome so that crossover respects genes whose effects are linked (Gold-
berg et al. 1989), or by building models of interactions among genes (Pelikan et al.
2001). When successful, the effect is to smooth a rugged fitness landscape by de-
riving additional information from an imperfect objective function. However, given
a sufficiently uninformative objective function, the advantage of such modeling is
impaired.
    Still other methods seek to accelerate search through neutral networks (Barnett
2001; Stewart 2001). While these methods may decrease the amount of meandering
that occurs in neutral networks, if an unlikely series of specific mutations is needed
to exit the neutral network then search may still be stalled for a long time.
    Some researchers incrementally evolve solutions by sequentially applying care-
fully crafted objective functions to avoid local optima (e.g. Elman 1991; Gomez and
Miikkulainen 1997; Van de Panne and Lamouret 1995). These efforts demonstrate
that to avoid deception it may be necessary to identify and analyze the stepping stones
that ultimately lead to the objective so a training program of multiple objective func-
tions and switching criteria can be engineered. However, for ambitious objectives
these stepping stones may be difficult or impossible to determine a priori. Addition-
ally, the requirement of such intimate domain knowledge conflicts with the aspira-
tion of machine learning.
    In addition to single-objective optimization, there also exist evolutionary meth-
ods that aim to optimize several objectives at once: Multi-Objective Evolutionary Al-
gorithms (MOEAs; Veldhuizen and Lamont 2000). These MOEAs are not immune to
the problem of deception (Deb 1999), and adding objectives does not always make
a problem easier (Brockhoff et al. 2007), but the idea is that perhaps deception is
less likely when optimizing multiple objectives because if a local optimum has been
reached in one objective then sometimes progress can be made with respect to an
alternate objective (Knowles et al. 2001). In this spirit, some researchers have experi-
mented with multi-objectivization, i.e. extending single-objective problems into multi-
objective problems to avoid deception (Knowles et al. 2001). Decomposing a single-
objective problem into a multi-objective problem can either make it easier or harder

(Handl et al. 2008b), and it is necessary to verify that the single-objective optima are
multi-objective optima in the transformed multi-objective problem (Knowles et al.
2001). There have been several successful applications of multi-objectivization (De
Jong and Pollack 2003; Greiner et al. 2007; Handl et al. 2008a; Knowles et al. 2001),
but as in other reviewed methods, the fundamental pathology of deception remains
a concern.
    Yet another avenue of research in EC related to deception is coevolution. Coevo-
lutionary methods in EC attempt to overcome the limitations of a static fitness func-
tion by making interactions among individuals contribute towards fitness. The hope
is that such interaction will spark an evolutionary arms race that will continually cre-
ate a gradient for better performance (Cliff and Miller 1995). There have been several
impressive successes (Chellapilla and Fogel 1999; Hillis 1991; Sims 1994), but a com-
mon problem is that in practice such arm races may converge to mediocre stable-
states (Ficici and Pollack 1998; Pollack et al. 1996; Watson and Pollack 2001), cycle
among various behaviors without further progress (Cartlidge and Bullock 2004b;
Cliff and Miller 1995; Watson and Pollack 2001), or one species may so far out-adapt
another that they are evolutionarily disengaged (Cartlidge and Bullock 2004a,b; Wat-
son and Pollack 2001). The difficulty for practitioners in coevolution, much like the
difficulty of crafting an effective fitness function facing researchers in standard EC,
is to construct an environment that provides sustained learnability in which the gra-
dient of improvement is always present (Ficici and Pollack 1998).
    Finally, outside of EC there is also general interest in machine learning in avoid-
ing local optima. For example, simulated annealing probabilistically accepts delete-
rious changes with respect to the objective function (Kirkpatrick et al. 1983), and tabu
search avoids re-searching areas of the search space (Glover 1989). However, if local
optima are pervasive, then these methods too can fail to reach the global optimum.
    While methods that mitigate deception may work for some problems, ultimately
such methods do not address the underlying pathology: The gradient of the objec-
tive function may be misleading or uninformative. Instead, current methods that
deal with deception attempt to glean as much information as possible from an im-
perfect objective function or encourage exploration in the search space. Given a
sufficiently uninformative objective function, it is an open question whether any
method relying solely on the objective function will be effective. Thus an interest-
ing yet sobering conclusion is that some objectives may be unreachable in a reason-
able amount of time by direct objective-based search alone. Furthermore, as task
complexity increases it is more difficult to successfully craft an appropriate objec-
tive function (Ficici and Pollack 1998; Zaera et al. 1996). Thus, as experiments in EC
become more ambitious, deception may be a limiting factor for success.
    The next section discusses complexity in natural evolution, which, in contrast to
traditional EAs, is a search process without any final objective.

2.2   Complexity in Natural Evolution
Natural evolution fascinates practitioners of search because of its profuse creativity,
lack of volitional guidance, and perhaps above all its apparent drive towards com-
plexity.

A subject of longstanding debate is the arrow of complexity (Bedau 1998; Miconi
2007), i.e. the idea that evolutionary lineages sometimes tend towards increasing
complexity. What about evolutionary search in nature causes complexity to increase?
This question is important because the most difficult problems in search, e.g. an
intelligent autonomous robot, may require discovering a prohibitive level of solution
complexity.
    The topic of complexity in natural evolution is much in contention across biol-
ogy, artificial life, and evolutionary computation (Arthur 1994; Gould 1996; Lynch
2007b; McShea 1991; Miconi 2007; Nehaniv and Rhodes 1999; Stanley and Miikku-
lainen 2003). One important question is whether there is a selective pressure towards
complexity in evolution. A potentially heretical view that is gaining attention is that
progress towards higher forms is not mainly a direct consequence of selection pres-
sure, but rather an inevitable passive byproduct of random perturbations (Gould
1996; Lynch 2007b; Miconi 2007). Researchers like Miconi (2007) in artificial life,
Sigmund (1993) in evolutionary game theory, and Gould (1996), McShea (1991), and
Lynch (2007a,b) in biology are arguing that natural selection does not always explain
increases in evolutionary complexity. In fact, some argue that to the extent that fit-
ness (i.e. in nature, the ability to survive and reproduce) determines the direction of
evolution, it can be deleterious to increasing complexity (Lynch 2007b; Miconi 2007;
Sigmund 1993). In other words, rather than laying a path towards the next major
innovation, fitness (like the objective function in machine learning) in effect prunes
that very path away.
    In particular, Miconi (2007) illustrates this point of view, noting that selection
pressure is in fact a restriction on both the scope and direction of search, allowing
exploration only in the neighborhood of individuals with high fitness. Similarly,
Sigmund (1993, p. 85) describes how high levels of selection pressure in nature can
oppose innovation because sometimes ultimately improving requires a series of dele-
terious intermediate steps. Gould (1996), a biologist, goes even further, and argues
that an increasing upper bound in complexity is not a product of selection, but sim-
ply the result of a drift in complexity space limited by a hard lower bound (the min-
imal complexity needed for a single cell to reproduce). Lynch (2007b), another bi-
ologist, argues that selection pressure in general does not explain innovation, and
that nonadaptive processes (i.e. processes not driven by selection) are often unde-
servedly ignored. The role of adaptation is similarly downplayed in Kimura’s influential
neutral theory of molecular evolution (Kimura 1983). Finally, Huyen (1995) suggests
that neutral mutations in nature allow a nearly limitless indirect exploration of phe-
notype space, a process that this paper seeks to directly accelerate.
    These arguments lead to the main idea in this paper that abandoning the search
for the objective and instead simply searching explicitly for novel behaviors may
itself exemplify a powerful search algorithm.

2.3   Open-Ended Evolutionary Computation
The open-ended evolution community in artificial life aims to produce simulated
worlds that allow a similar degree of unconstrained exploration as Earth. While a
precise definition of open-ended evolution is not uniformly accepted (Maley 1999;

Standish 2003), the general intuition is that it should continually create individuals
“of a greater complexity and diversity than the initial individuals of the system”
(Maley 1999). Tierra (Ray 1992), PolyWorld (Yaeger 1994) and Geb (Channon 2001)
are typical attempts to attain such a dynamic. In such systems, there is no objective
beyond that of survival and reproduction. The motivation behind this approach
is that as evolution explores an unbounded range of life forms, complexity will
inevitably increase (Channon 2001; Miconi 2007).
    Tierra (Ray 1992) is an open-ended evolutionary model in which self-replicating
digital programs compete for resources on a virtual computer. In several runs of the
system, cycles of parasitism and corresponding immunity arose, demonstrating a
basic coevolutionary arms race. However, Tierra and other similar systems (Adami
et al. 2000; Taylor and Hallam 1997) all inevitably struggle to continually produce
novelty (Channon and Damper 2000; Kampis and Gulyás 2008; Standish 2003).
    PolyWorld (Yaeger 1994) is also a simulation of an ecological system of compet-
ing agents, but gives these basic agents embodiment in a two-dimensional world.
The only goal for the agents is survival. Interestingly, emergent behaviors such as
flocking and foraging, although not directly specified by the system, result from the
interactions and evolution of agents. However, as in the Tierra-like systems, eventu-
ally innovation appears to slow (Channon and Damper 2000). Geb (Channon 2001),
which addresses criticisms of PolyWorld, follows a similar philosophy.
    Bedau and Packard (1991) and Bedau et al. (1998) have contributed to formaliz-
ing the notion of unbounded open-ended dynamics by deriving a test that classifies
evolutionary systems into categories of open-endedness. Geb and others are distin-
guished by passing this test (Channon 2001; Maley 1999), but the results neverthe-
less do not appear to achieve the levels of diversity or complexity seen in natural
evolution. This apparent deficiency raises the question of what element is missing
from current models (Maley 1999; Standish 2003), with the common conclusion that
more detailed, lifelike domains must be constructed (Kampis and Gulyás 2008; Ma-
ley 1999; Yaeger 1994).
    However, this paper presents a more general approach to open-ended evolution
that is motivated well by the following insight from Standish (2003): “The issue of
open-ended evolution can be summed up by asking under what conditions will an evo-
lutionary system continue to produce novel forms.” Thus, instead of modeling nat-
ural selection, the idea in this paper is that it may be more efficient to search directly
for novel behaviors. It is important to acknowledge that this view of open-endedness
contrasts with the more commonly accepted notion of prolonged production of adap-
tive traits (Bedau and Packard 1991; Bedau et al. 1998). Nevertheless, the simpler
view of open-endedness merits consideration on the chance that a dynamic that ap-
pears adaptive might be possible to capture in spirit with a simpler process. Another
benefit of this approach is that it decouples the concept of open-ended search from
artificial life worlds, and can thus be applied to any domain, including real-world
tasks.
    The experiments in this paper combine this approach to open-ended evolution
with the NEAT method, which is explained next.

2.4   NeuroEvolution of Augmenting Topologies (NEAT)
Because in this paper behaviors are evolved that are controlled by artificial neural
networks (ANNs), a neuroevolution (NE) method is required. The NEAT method
serves this purpose well because it is both widely applied (Aaltonen et al. 2009; Allen
and Faloutsos 2009; Stanley et al. 2005; Stanley and Miikkulainen 2002, 2004; White-
son and Stone 2006) and well understood. However, the aim is not to validate the
capabilities of NEAT yet again. In fact, in some of the experiments NEAT performs
poorly. Rather, the interesting insight is that the same algorithm can appear ineffec-
tive in an objective-based context yet excel when the search is open-ended. Thus, as
a common approach to NE, NEAT is a natural conduit to reaching this conclusion.
    The NEAT method was originally developed to evolve ANNs to solve difficult
control and sequential decision tasks (Stanley et al. 2005; Stanley and Miikkulainen
2002, 2004). Evolved ANNs control agents that select actions based on their sensory
inputs. Like the SAGA method (Harvey 1993) introduced before it, NEAT begins
evolution with a population of small, simple networks and complexifies the network
topology into diverse species over generations, leading to the potential for increas-
ingly sophisticated behavior; while complexifying the structure of an ANN does not
always increase the Kolmogorov complexity of the behavior of the ANN, it does in-
crease the upper bound of possible behavioral complexity by adding free parame-
ters. Thus in NEAT simpler behaviors must be encountered before more complex
behaviors. A similar process of gradually adding new genes has been confirmed in
natural evolution (Martin 1999; Watson et al. 1987), and fits well with the idea of
open-ended evolution.
    However, a key feature distinguishing NEAT from prior work in complexifi-
cation is its unique approach to maintaining a healthy diversity of complexifying
structures simultaneously. Complete descriptions of the NEAT method, including
experiments confirming the contributions of its components, are available in Stanley
et al. (2005), Stanley and Miikkulainen (2002), and Stanley and Miikkulainen (2004).
Let us review the key ideas on which the basic NEAT method is based.
    To keep track of which gene is which while new genes are added, a historical mark-
ing is uniquely assigned to each new structural component. During crossover, genes
with the same historical markings are aligned, producing meaningful offspring ef-
ficiently. Speciation in NEAT protects new structural innovations by reducing com-
petition among differing structures, thereby giving newer, more complex structures
room to adjust. Networks are assigned to species based on the extent to which they
share historical markings. Complexification, which resembles how genes are added
over the course of natural evolution (Martin 1999), is thus supported by both his-
torical marking and speciation, allowing NEAT to establish high-level features early
in evolution and then later elaborate on them. In effect, then, NEAT searches for
a compact, appropriate network topology by incrementally complexifying existing
structure.
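
    A minimal sketch of how historical markings can align genes during crossover
appears below; the data layout is simplified from the full NEAT procedure (for
instance, disjoint and excess genes are inherited here from the first parent, assumed
to be the fitter one):

    import random
    from dataclasses import dataclass

    @dataclass
    class ConnectionGene:
        innovation: int  # historical marking assigned when the gene first arose
        in_node: int
        out_node: int
        weight: float

    def crossover(fitter_parent, other_parent):
        """Align genes by innovation number to produce meaningful offspring."""
        other = {g.innovation: g for g in other_parent}
        child = []
        for gene in fitter_parent:
            match = other.get(gene.innovation)
            # Matching genes are inherited at random from either parent;
            # disjoint and excess genes come from the fitter parent.
            child.append(random.choice((gene, match)) if match else gene)
        return child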
    It is important to note that a complexifying neuroevolutionary algorithm like
NEAT induces an approximate order over the complexity of behaviors discovered
during search from simple to complex. An important difference between an ANN
with five connections and one with five million connections is that the larger net-

work, by virtue of having more free parameters, can exhibit more complex behav-
iors. Thus, as new nodes and connections are added over the course of evolution,
the potential complexity of the behaviors representable by the network increases.
Encountering simple behaviors first is significant because the most complex behav-
iors are often associated with irregularity and chaos. Thus a search heuristic that
encounters them last makes sense.
     In the experiments in this paper, NEAT is combined with novelty search, which
is explained next.

3    The Search for Novelty
Recall that the problem identified with the objective fitness function in EC is that
it does not necessarily reward the intermediate stepping stones that lead to the ob-
jective. The more ambitious the objective, the harder it is to identify a priori these
stepping stones.
    The suggested approach is to identify novelty as a proxy for stepping stones; in-
stead of searching for a final objective, the learning method is rewarded for finding
any instance whose functionality is significantly different from what has been dis-
covered before. This idea is related to the concept of curiosity and seeking novelty
in reinforcement learning research and developmental robotics (Kaplan and Hafner
2006; Oudeyer et al. 2007, 2005; Schmidhuber 2003, 2006), though novelty search op-
erates on an evolutionary timescale and is motivated by open-ended evolution. In
novelty search, instead of measuring overall progress with a traditional objective
function, evolution employs a measure of behavioral novelty called a novelty metric.
In effect, a search guided by such a metric performs explicitly what natural evolution
does passively, i.e. gradually accumulating novel forms that ascend the complexity
ladder.
    For example, in a biped locomotion domain, initial attempts might simply fall
down. The novelty metric would reward simply falling down in a different way,
regardless of whether it is closer to the objective behavior or not. In contrast, an ob-
jective function may explicitly reward falling the farthest, which likely does not lead
to the ultimate objective of walking and thus exemplifies a deceptive local optimum.
In the search for novelty, meanwhile, a set of instances is maintained that represents
the most novel discoveries. Further search then jumps off from these representative
behaviors. After a few ways to fall are discovered, the only way to be rewarded is
to find a behavior that does not fall right away. In this way, behavioral complexity
rises from the bottom up. Eventually, to do something new, the biped would have to
successfully walk for some distance even though walking itself is not an objective.
    At first glance, this approach may seem naive. What confidence can we have that
a search process can solve a problem when the objective is not provided whatsoever?
Where is the pressure to adapt? Yet its appeal is that it rejects the misleading intuition
that objectives are an essential means to discovery. The idea that the objective may be
the enemy of progress is a bitter pill to swallow, yet if the proper stepping stones do
not lie conveniently along its gradient then it provides little more than false security.
    Still, what hope is there that novelty is any better when it contains no information

about the direction of the solution? Is not the space of novel behaviors unboundedly
vast, creating the potential for endless meandering? One might compare novelty
search to exhaustive search: Of course a search that enumerates all possible solutions
will eventually find the solution, but at enormous computational cost.
    Yet there are good reasons to believe that novelty search is not like exhaustive
search, and that in fact the number of novel behaviors is reasonable and limited in
many practical domains. The main reason for optimism is that task domains on
their own provide sufficient constraints on the kinds of behaviors that can exist or
are meaningful, without the need for further constraint from an objective function.
    For example, a biped robot can only enact so many behaviors related to loco-
motion; the robot is limited in its motions by physics and by its own morphology.
Although the search space is effectively infinite if the evolutionary algorithm can
add new genes (like NEAT), the behavior space into which points in the search space
collapse is limited. For example, after an evaluation, a biped robot finishes at a spe-
cific location. If the robot’s behavior is characterized only by this ending location, all
of the many ways to encode a policy that arrives at a particular point will collapse
to the same behavior. In fact, the behavior space may often collapse into a manage-
able number of points, significantly differentiating novelty search from exhaustive
enumeration.
    Furthermore, novelty search can succeed where objective-based search fails by
rewarding the stepping stones. That is, anything that is genuinely different is re-
warded and promoted as a jumping-off point for further evolution. While we cannot
know which stepping stones are the right ones, if the primary pathology in objective-
based search is that it cannot detect the stepping stones at all, then that pathology is
remedied.
    A natural question about novelty search is whether it follows any principle be-
yond naively enumerating all possible behaviors. The answer is that while it does
attempt to find all possible behaviors over time, when combined with a complexify-
ing algorithm like NEAT, the order in which they are discovered is principled and not
random. To understand why, recall that NEAT evolves increasingly complex ANNs.
That way, the number of nodes and connections and thus the maximal possible com-
plexity of behaviors discovered by novelty search increases over time, ensuring that
simple behaviors must be discovered before more complex behaviors. Furthermore,
this ordering from simple to complex is generally beneficial because of Occam’s ra-
zor. Thus there is an order in a complexifying search for novelty; it is just a different
one than in fitness-based search. While fitness-based search generally orders the
search from low to high fitness, a structurally-complexifying search for novelty gen-
erally orders it from low to high complexity, which is principled in a different way.
    The next section introduces the novelty search algorithm by replacing the objec-
tive function with the novelty metric and formalizing the concept of novelty itself.

4    The Novelty Search Algorithm
Evolutionary algorithms like NEAT are well-suited to novelty search because the
population of genomes that is central to such algorithms naturally covers a wide

range of expanding behaviors. In fact, tracking novelty requires little change to any
evolutionary algorithm aside from replacing the fitness function with a novelty met-
ric.
     The novelty metric measures how unique an individual’s behavior is, creating a
constant pressure to do something new. The key idea is that instead of rewarding
performance based on an objective, novelty search rewards diverging from prior be-
haviors. Therefore, novelty needs to be measured. There are many potential ways to
measure novelty by analyzing and quantifying behaviors to characterize their dif-
ferences. Importantly, like the fitness function, this measure must be fitted to the
domain.
     The novelty of a newly generated individual is computed with respect to the be-
haviors of an archive of past individuals whose behaviors were highly novel when
they originated; because novelty is measured relative to other individuals in evo-
lution, it is driven by a coevolutionary dynamic. In addition, if the evolutionary
algorithm is steady state (i.e. one individual is replaced at a time) then the current
population can also supplement the archive by representing the most recently visited
points.
     The aim is to characterize how far away the new individual is from the rest of the
population and its predecessors in behavior space, i.e. the space of unique behaviors.
A good metric should thus compute the sparseness at any point in the behavior space.
Areas with denser clusters of visited points are less novel and therefore rewarded
less.
     A simple measure of sparseness at a point is the average distance to the k-nearest
neighbors of that point, where k is a fixed parameter that is determined experimen-
tally. If the average distance to a given point’s nearest neighbors is large then it is in
a sparse area; it is in a dense region if the average distance is small. The sparseness
ρ at point x is given by

$$
\rho(x) = \frac{1}{k} \sum_{i=0}^{k} \operatorname{dist}(x, \mu_i), \qquad (1)
$$

where µi is the ith-nearest neighbor of x with respect to the distance metric dist,
which is a domain-dependent measure of behavioral difference between two indi-
viduals in the search space. The nearest neighbors calculation must take into con-
sideration individuals from the current population and from the permanent archive
of novel individuals. Candidates from more sparse regions of this behavioral search
space then receive higher novelty scores. It is important to note that this behavior
space cannot be explored purposefully; it is not known a priori how to enter areas of
low density just as it is not known a priori how to construct a solution close to the
objective. Therefore, moving through the space of novel behaviors requires explo-
ration.
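
    In code, Equation (1) amounts to a k-nearest-neighbor average. The following
minimal sketch assumes behaviors are real-valued vectors compared with Euclidean
distance; the value of k is illustrative:

    import numpy as np

    def sparseness(x, visited, k=15):
        """Average distance from behavior x to its k nearest neighbors among
        the behaviors visited so far (current population plus archive)."""
        dists = np.sort([np.linalg.norm(np.asarray(x) - np.asarray(mu))
                         for mu in visited])
        return float(np.mean(dists[:k]))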
    If novelty is sufficiently high at the location of a new individual, i.e. above some
minimal threshold ρmin, then the individual is entered into the permanent archive
that characterizes the distribution of prior solutions in behavior space, similarly to
archive-based approaches in coevolution (De Jong 2004). The current generation
plus the archive give a comprehensive sample of where the search has been and

where it currently is; that way, by attempting to maximize the novelty metric, the
gradient of search is simply towards what is new, with no explicit objective. Al-
though novelty search does not directly seek to reach the objective, to know when
to stop the search it is necessary to check whether each individual meets the goal
criteria.
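
    Combined with the sparseness function sketched above, the archive logic
reduces to a threshold test; the sketch below simplifies the bookkeeping, and the
value of ρmin is illustrative and domain-dependent:

    def score_and_archive(new_behavior, population_behaviors, archive,
                          k=15, rho_min=6.0):
        """Assign a novelty score and archive the behavior if novel enough."""
        score = sparseness(new_behavior, population_behaviors + archive, k)
        if score > rho_min:
            # Behaviors above the threshold permanently record where
            # the search has already been.
            archive.append(new_behavior)
        return score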
    It is important to note that novelty search resembles prior diversity maintenance
techniques (i.e. speciation) popular in EC (Goldberg and Richardson 1987; Hornby
2006; Hu et al. 2005; Mahfoud 1995). These also in effect open up the search by re-
ducing selection pressure, but the search is still ultimately guided by the fitness func-
tion. In contrast, novelty search takes the radical step of only rewarding behavioral
diversity with no concept of fitness or a final objective, inoculating it against traditional
deception.
    It is also important to note that novelty search is not a random walk; rather, it
explicitly maximizes novelty. Because novelty search includes an archive that ac-
cumulates a record of where search has been, backtracking, which can happen in a
random walk, is mitigated in behavioral spaces of any dimensionality.
    The novelty search approach in general allows any behavior characterization and
any novelty metric. Although generally applicable, novelty search is best suited to
domains with deceptive fitness landscapes, intuitive behavioral characterizations,
and domain constraints on possible expressible behaviors.
    Once objective-based fitness is replaced with novelty, the NEAT algorithm oper-
ates as normal, selecting the highest-scoring individuals to reproduce. Over genera-
tions, the population spreads out across the space of possible behaviors, continually
ascending to new levels of complexity (i.e. by expanding the neural networks in
NEAT) to create novel behaviors as the simpler variants are exhausted. The power
of this process is demonstrated in this paper through two experiments with results
that conflict with common intuitions, which are described next.

5    Maze Experiment
An interesting domain for testing novelty search would have a deceptive fitness
landscape. In such a domain, the search algorithm following the fitness gradient may
perform worse than an algorithm following novelty gradients because novelty can-
not be deceived with respect to the objective; it ignores objective fitness entirely. A
compelling, easily-visualized domain with this property is a two-dimensional maze
navigation task. A reasonable fitness function for such a domain is how close the
maze navigator is to the goal at the end of the evaluation. Thus, dead ends that
lead close to the goal are local optima to which an objective-based algorithm may
converge, which makes a good model for deceptive problems in general.
   The maze domain works as follows. A robot controlled by an ANN must navi-
gate from a starting point to an end point in a fixed time. The task is complicated
by cul-de-sacs that prevent a direct route and that create local optima in the fitness
landscape. The robot (figure 1) has six rangefinders that indicate the distance to the
nearest obstacle and four pie-slice radar sensors that fire when the goal is within the
pie-slice. The robot’s two effectors result in forces that respectively turn and propel

[Figure 1: (a) Neural Network — the evolved-topology ANN, with rangefinder,
radar, and bias inputs and Left/Right and Forward/Back outputs; (b) Sensors.]

Figure 1: Maze Navigating Robot. The artificial neural network that controls the maze
navigating robot is shown in (a). The layout of the sensors is shown in (b). Each arrow outside
of the robot’s body in (b) is a rangefinder sensor that indicates the distance to the closest
obstacle in that direction. The robot has four pie-slice sensors that act as a compass towards
the goal, activating when a line from the goal to the center of the robot falls within the pie-slice.
The solid arrow indicates the robot’s heading.

                        (a) Medium Map                      (b) Hard Map

Figure 2: Maze Navigation Maps.         In both maps, the large circle represents the starting
position of the robot and the small circle represents the goal. Cul-de-sacs in both maps that
lead toward the goal create the potential for deception.

the robot. This setup is similar to the successful maze navigating robots in NERO
(Stanley et al. 2005).
    Two maps are designed to compare the performance of NEAT with fitness-based
search and NEAT with novelty search. The first (figure 2a) has deceptive dead ends
that lead the robot close to the goal. To achieve a higher fitness than the local op-
timum provided by a dead end, the robot must travel part of the way through a
more difficult path that requires a weaving motion. The second maze (figure 2b)
provides a more deceptive fitness landscape that requires the search algorithm to ex-
plore areas of significantly lower fitness before finding the global optimum (which
is a network that reaches the goal). Note that this task is not pathfinding. Rather,
NEAT is searching for an ANN that itself can navigate the maze.
    Fitness-based NEAT, which will be compared to novelty search, requires a fitness
function to reward maze-navigating robots. Because the objective is to reach the
goal, the fitness f is based on the proximity of the robot to the goal at the end
of an evaluation: $f = b_f - d_g$, where $b_f$ is a constant bias and $d_g$ is the distance
from the robot to the goal. Given a maze with no deceptive obstacles, this fitness

function defines a monotonic gradient for search to follow. The constant $b_f$ ensures
all individuals will have positive fitness.
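
    A minimal sketch of this fitness function (the coordinate representation and
names are illustrative):

    import math

    def maze_fitness(final_pos, goal_pos, bias):
        """Objective-based reward f = b_f - d_g: proximity of the robot's
        final position to the goal; the bias keeps fitness positive."""
        d_g = math.dist(final_pos, goal_pos)  # Euclidean distance to the goal
        return bias - d_g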
     NEAT with novelty search, on the other hand, requires a novelty metric to distin-
guish between maze-navigating robots. Defining the novelty metric requires careful
consideration because it biases the search in a fundamentally different way than the
fitness function. The novelty metric determines the behavior-space through which
search will proceed. It is important that the types of behaviors that one hopes to
distinguish are recognized by the metric.
     Thus, for the maze domain, the behavior of a navigator is defined as its ending
position. The novelty metric is then the squared Euclidean distance between the
ending positions of two individuals. For example, two robots stuck in the same
corner appear similar, while a robot that simply sits at the start position looks very
different from one that reaches the goal, even though the novelty metric attaches no
special value to reaching the goal.
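
    A minimal sketch of this characterization and metric, assuming each evaluation
reports the navigator's final (x, y) position:

    def behavior(final_pos):
        """Characterize a navigator solely by where it ends the trial."""
        return final_pos  # an (x, y) pair

    def dist(b1, b2):
        """Squared Euclidean distance between two ending positions, used as
        the behavioral distance in Equation (1)."""
        return (b1[0] - b2[0]) ** 2 + (b1[1] - b2[1]) ** 2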
     This novelty metric rewards the robot for ending in a place where none have
ended before; the method of traversal is ignored. This measure reflects that what
is important is reaching a certain location (i.e. the goal) rather than the method of
locomotion. Thus, although the novelty metric has no knowledge of the final goal, a
solution that reaches the goal can appear novel. In addition, the comparison between
fitness-based and novelty-based search is fair because both scores are computed only
based on the distance of the final position of the robot from other points. Further-
more, NEAT is given exactly the same settings in both (Appendix A), so the only
difference is the reward scheme.
     Finally, to confirm that novelty search is indeed not anything like random search,
NEAT is also tested with a random fitness assigned to every individual regardless
of performance, which means that selection is random. If the maze is solved, the
number of evaluations is recorded.

6    Maze Results
On both maps, a robot that finishes within five units of the goal counts as a solu-
tion. On the medium map, both fitness-based NEAT and NEAT with novelty search
were able to evolve solutions in every run (figure 3a). Novelty search took on aver-
age 18,274 evaluations (sd = 20,447) to reach a solution, while fitness-based NEAT
was three times slower, taking 56,334 evaluations (sd = 48,705), averaged over 40
runs. This difference is significant (p < 0.001; Student’s t-test). NEAT with random
selection performed much worse than the other two methods, finding successful
navigators in only 21 out of 40 runs, which confirms the difference between novelty
search and random search.
    Interestingly, the average genomic complexity of solutions evolved by fitness-
based NEAT for the medium map (66.74 connections, sd = 56.7) was almost three
times greater (p < 0.05; Student’s t-test) than those evolved by NEAT with novelty
search (24.6 connections, sd = 4.59), even though both share the same parameters.
    On the hard map, fitness-based NEAT was only successful in three out of 40 runs,
while NEAT with random selection fared marginally better, succeeding in four out

(a) Medium Map                                  (b) Hard Map

Figure 3: Comparing Novelty Search to Fitness-based Search. The change in fitness over
time (i.e. number of evaluations) is shown for NEAT with novelty search, fitness-based NEAT,
and NEAT with random selection on the medium (a) and hard (b) maps, both averaged over
40 runs of each approach. The horizontal line indicates at what fitness the maze is solved.
The main result is that novelty search is significantly more effective. Only the first 75,000
evaluations (out of 250,000) are shown because the dynamics remain stable after that point.

of 40 runs, showing that deception in this map renders the gradient of fitness no
more helpful than random search. However, novelty search was able to solve the
same map in 39 out of 40 runs, in 35,109 evaluations (sd = 30,236) on average when
successful, using 33.46 connections on average (sd = 9.26). Figure 3b shows this
more dramatic divergence. Remarkably, because the second maze is so deceptive,
the same NEAT algorithm can almost never solve it when solving the maze is made
the explicit objective, yet solves it almost every time when finding novel behavior is
the objective.

6.1   Typical Behavior
Figure 4 depicts behaviors (represented as the final point visited by an individ-
ual) discovered during typical runs of NEAT with novelty search and fitness-based
NEAT on each map. Novelty search exhibits a more even distribution of points
throughout both mazes. Fitness-based NEAT shows areas of density around local
optima in the maze. For both methods, the typical behavior of a successful robot on
either maze was to traverse the maze directly.
    The results so far raise a number of important questions and concerns, which the
remainder of this section addresses.

6.2   Bounding the Size of the Archive in the Maze Domain
A possible concern about the computational effort required to search for novelty is
that the archive of past behaviors may grow without limit as the search progresses.
As the size of the archive grows, the nearest-neighbor calculations that determine
the novelty scores for individuals become more computationally demanding. Al-
though in most complex domains the evaluation of individuals will likely be the

(a) Medium Map Novelty              (b) Hard Map Novelty

                (c) Medium Map Fitness             (d) Hard Map Fitness

Figure 4: Final Points Visited Over Typical Runs. Each maze depicts a typical run, stopping
at either 250,000 evaluations or when a solution is found. Each point represents the end loca-
tion of a robot evaluated during the run. Novelty search is more evenly distributed because it
is not deceived.

computational bottleneck, it is true that the nearest-neighbor calculation increases
the amount of computation beyond that required for a simple objective-based search
method. Thus, it is interesting to consider ways in which the archive’s size may be
limited.
    A simple approach is to change the structure of the archive from an ever-
expanding list to a queue of limited size. Once the archive reaches its full size, each
newly archived individual overwrites the earliest entry instead of being appended to
the end of the list. While this approach may lead to some backtracking through behavior
space (which the archive is designed to avoid), the amount of backtracking may be
limited because the dropped behaviors may no longer be reachable from the current
population.
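
    In Python, for example, such a bounded archive is naturally expressed as a
fixed-size deque; a minimal sketch, with the bound matching the population size of
250 used in the experiment below:

    from collections import deque

    # When full, appending evicts the oldest behavior automatically, trading
    # a possibility of behavioral backtracking for a fixed memory bound and
    # cheaper nearest-neighbor queries.
    archive = deque(maxlen=250)

    def maybe_archive(behavior, novelty_score, rho_min):
        if novelty_score > rho_min:
            archive.append(behavior)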
    To explore the effects of limiting the archive size, 40 additional runs of novelty
search were conducted in the hard maze with the size of the archive limited to the
same size as the population (250). The hard maze was solved in all 40 runs, in 38,324
evaluations (sd = 42,229) on average, which is not significantly different from the
original results of novelty search on the hard maze without a bounded archive. This
result demonstrates that in some domains it is possible to limit the archive size, and
thus the additional computational effort, without significantly decreasing the per-
formance of novelty search.

6.3   Removing Walls in the Maze Domain
An important question for novelty search is whether it will suffer in relatively uncon-
strained domains. Domain constraints may provide pressure for individuals within a

Figure 5: Unenclosed Map. This unenclosed version of the hard map enlarges the behavior
space because the robot can travel outside the maze without restriction.

search for novelty to encode useful information about the domain because exploiting
such information may be the easiest way for evolution to maximize novelty.
    Both maps in figure 2 are enclosed; that is, the space of possible behaviors is
constrained by the outer walls of the map. It is interesting to consider an unenclosed
map, in which the behavior space has no such limits. Figure 5 shows a variant of the
hard map that has two walls removed; a navigator in such a map could travel into
the vast empty area outside the map.
    In such a map, the search for novelty may not efficiently explore the space of
interesting behaviors that in some way relate to maze navigation. That is, it may be
trivial to create variations of policies that appear novel by ending in significantly
different areas outside the maze, yet encode no useful information about how to
navigate within the maze. While in the constrained maze learning about how to
avoid walls may allow individuals to reach previously unexplored areas of the maze,
in the unbounded maze such learning may appear no more novel than a behavior
that wanders off in a slightly different direction than previous robots.
    To explore this issue, 100 runs of fitness-based NEAT and NEAT with novelty
search were attempted in the unenclosed map (figure 5). Interestingly, in this map,
novelty search solved the maze only five times out of 100, which is not significantly
better than fitness-based NEAT, which solved the maze two times out of 100. This
result confirms the hypothesis that constraining the space of possible behaviors is
important in some domains for novelty search to be efficient. However, fitness fares
no better, highlighting that fitness-based search is not necessarily a viable alternative
even when novelty search is not effective.

6.4   Conflation in the Maze Domain
Because the behavioral characterization in the maze experiment earlier consisted
only of the location of the robot at the end of an evaluation, all navigators that end at
the same point are conflated as equivalent even if they take entirely different routes.
Thus they appear identical to the novelty search algorithm. Intuitively, a natural as-
sumption is that in the maze domain, this conflation makes the search for novelty
more efficient: By conflating individuals that end at the same location, the behavior
space is greatly reduced and many possible uninteresting meandering behaviors that
end at the same location are not rewarded for being different.
   An important implication of this assumption is that lengthening the behavioral
characterization might render the search for novelty intractable because the number
of possible behaviors grows with the dimensionality of the behavior space.