How Learning Can Guide Evolution
Geoffrey E. Hinton & Steven J. Nowlan
Originally published in 1987 in Complex Systems, 1, 495-502.
Reprinted by permission.
Abstract. The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations that an organism learns during its lifetime cannot guide the course of evolution. This inference is incorrect.
1. INTRODUCTION
Many organisms learn useful adaptations
during their lifetime. These adaptations are often the result of an
exploratory search which tries out many possibilities in order to discover good
solutions. It seems very wasteful not to make use of the exploration performed
by the phenotype to facilitate the evolutionary search for good genotypes. The
obvious way to achieve this is to transfer information about the acquired
characteristics back to the genotype. Most biologists now accept that the Lamarckian
hypothesis is not substantiated; some then infer that learning cannot guide the
evolutionary search. We use a simple combinatorial argument to show that this
inference is incorrect and that learning can be very effective in guiding the search,
even when the specific adaptations that are learned are not communicated to the
genotype. In difficult evolutionary searches which require many possibilities to
be tested in order to discover a complex co-adaptation, we demonstrate that
each learning trial can be almost as helpful to the evolutionary search as the
production and evaluation of a whole new organism. This greatly increases the
efficiency of evolution because a learning trial is much faster and requires
much less expenditure of energy than the production of a whole organism.
Learning can provide an easy evolutionary
path towards co-adapted alleles in environments that have no good evolutionary
path for non-learning organisms. This type of interaction between learning and
evolution was first proposed by Baldwin (1896) and Lloyd Morgan (1896) and is
sometimes called the Baldwin effect.
2. AN EXTREME AND SIMPLE EXAMPLE
Baldwinism is best understood by considering an extreme (and unrealistic) case
in which the combinatorics are very clear. Imagine an
organism that contains a neural net in which there are many potential
connections. Suppose that the net only confers added reproductive fitness on
the organism if it is connected in exactly the right way. In this worst case,
there is no reasonable evolutionary path toward the good net and a pure
evolutionary search can only discover which of the potential connections should
be present by trying possibilities
at random. The good net is like a needle in a haystack.
The evolutionary search space becomes
much better if the genotype specifies some of the decisions about where to put
connections, but leaves other decisions to learning. This has the effect of
constructing a large zone of increased fitness around the good net. Whenever
the genetically specified decisions are correct, the genotype falls within this
zone and will have increased fitness because learning will stand a chance of
discovering how to make the remaining decisions so as to produce the good net.
This makes the evolutionary search much easier. It is like searching for a
needle in a haystack when someone tells you when you are getting close. The central
point of the argument is that the person who tells you that you are getting close
does not need to tell you anything more.
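The needle-in-a-haystack contrast can be made concrete with a small calculation (a sketch: the 20 binary decisions match the simulation below, but the choice of 10 genetically fixed decisions and 1000 learning trials per lifetime is our illustrative assumption):

```python
# A purely genetic search must get all 20 connection decisions right at
# once: the good net is a single point in a space of 2**20 genotypes.
p_pure = 0.5 ** 20  # about 1 in a million per random genotype

# If the genotype correctly fixes 10 decisions and leaves 10 to learning,
# each random learning trial hits the good net with probability 2**-10,
# so an organism given 1000 trials per lifetime will probably find it.
k, trials = 10, 1000
p_learn = 1.0 - (1.0 - 0.5 ** k) ** trials

print(f"pure evolutionary search, per genotype: {p_pure:.2e}")
print(f"partially correct genotype + learning:  {p_learn:.2f}")
```

Any genotype whose fixed decisions are all correct thus lands in the zone of increased fitness, even though none of its neighbors would score anything under a pure evolutionary search.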
3. A SIMULATION
We have simulated a simple example of
this kind of interaction between learning and evolution. The neural net has 20
potential connections, and the genotype has 20 genes[1], each of which has three alternative forms (alleles) called
1, 0, and ?. The 1 allele specifies that a connection should be present, 0
specifies that it should be absent, and ? specifies a connection containing a switch which can be open
or closed. It is left to learning to decide how the switches should be set. We
assume, for simplicity, a learning mechanism that simply tries a random
combination of switch settings on every trial. If the combination of the switch
settings and the genetically specified decisions ever produce the one good net
we assume that the switch settings are frozen. Otherwise they keep changing.[2]
The evolutionary search is modeled with a
version of the genetic algorithm proposed by Holland (1975).
The same problem was never solved by an evolutionary search without learning. This was not a surprising result; the problem was selected to be extremely difficult for an evolutionary search, which relies on the exploitation of small co-adapted sets of alleles to provide a better than random search of the space. The spike of fitness in our example (Figure 1) means that the only co-adaptation that confers improved fitness requires simultaneous co-adaptation of all 20 genes. Even if this co-adaptation is discovered, it is not easily passed to descendants. If an adapted individual mates with any individual other than one nearly identical to itself, the co-adaptation will probably be destroyed.

The crux of the problem is that only the one good genotype is distinguished, and fitness is the only criterion for mate selection. To preserve the co-adaptation from generation to generation it is necessary for each good genotype, on average, to give rise to at least one good descendant in the next generation. If the dispersal of complex co-adaptations due to mating causes each good genotype to have less than one expected good descendant in the next generation, the co-adaptation will not spread, even if it is discovered many times. In our example, the expected number of good immediate descendants of a good genotype is below 1 without learning and above 1 with learning.
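The dispersal of the co-adaptation under mating can be checked with a small Monte Carlo experiment (a sketch under our own assumptions: one-point crossover and mates drawn uniformly at random, neither of which is specified in the text above):

```python
import random

N = 20
GOOD = [1] * N  # the unique good genotype (no learning, so no '?' alleles)

def one_point_crossover(a, b, rng):
    """Offspring takes a prefix from parent a and a suffix from parent b."""
    point = rng.randint(1, N - 1)
    return a[:point] + b[point:]

rng = random.Random(1)
samples = 100_000
good_offspring = sum(
    one_point_crossover(GOOD, [rng.randint(0, 1) for _ in range(N)], rng) == GOOD
    for _ in range(samples)
)

# The offspring is good only when the random mate happens to match GOOD on
# the whole suffix it contributes.  Averaged over crossover points this
# happens only about 5% of the time, so a good genotype mating with
# unrelated partners leaves far fewer than one good descendant unless it
# has very many offspring.
print(good_offspring / samples)
```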
4. DISCUSSION
The most common argument in favor of
learning is that some aspects of the environment are unpredictable, so it is
positively advantageous to leave some decisions to learning rather than
specifying them genetically (e.g. Harley, 1981). This argument is clearly correct and is one good
reason for having a learning mechanism, but it is different from the Baldwin effect, which can operate even when the environment is fixed and the same net is always the good one.
To keep the argument simple, we started
by assuming that learning was simply a random search through possible switch
settings. When there is a single good combination and all other combinations
are equally bad a random search is a reasonable strategy, but for most learning
tasks there is more structure than this and the learning process should make
use of the structure to home in on good switch configurations. More
sophisticated learning procedures could be used in these cases (e.g. Rumelhart, Hinton, and Williams, 1986). Indeed, using a hillclimbing procedure as an inner loop to guide a genetic search can be very effective (Brady, 1985).
For simplicity, we assumed that the
learning operates on exactly the same variables as the genetic search. This is
not necessary for the argument. Each gene could influence the probabilities of
large numbers of potential connections and the learning would still improve the
evolutionary path for the
genetic search. In this more general case, any Lamarckian attempt to inherit
acquired characteristics would run into a severe computational difficulty: To know how to change the genotype in order
to generate the acquired characteristics of the phenotype it is necessary to invert
the forward function that maps from genotypes, via the processes of development
and learning, to adapted phenotypes. This is generally a very complicated, non-linear,
stochastic function and so it is very hard to compute how to change the genes
to achieve desired changes in the phenotypes even when these desired changes are
known.
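The difficulty can be illustrated with a toy version of such a forward function (entirely our own construction: genes set connection probabilities, as the more general case above suggests, and "development" samples from them):

```python
import random

def develop(genotype, rng):
    """A stochastic forward function: each gene is the probability that
    the corresponding connection forms during development and learning."""
    return [1 if rng.random() < p else 0 for p in genotype]

genotype = [0.9, 0.1, 0.5, 0.5]
rng = random.Random(0)

# The same genotype produces different phenotypes on different runs, and
# many different genotypes can produce the same phenotype.  A Lamarckian
# rule "alter the genes so as to reproduce this adapted phenotype" would
# therefore have to invert a stochastic, many-to-one map, which has no
# well-defined answer.
for _ in range(3):
    print(develop(genotype, rng))
```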
We have focused on the interaction between
evolution and learning, but the same combinatorial argument can be applied to
the interaction between evolution and development. Instead of directly
specifying the phenotype, the genes could specify the ingredients of an
adaptive process and leave it to this process to achieve the required end
result. An interesting model of this kind of adaptive process is described by
Von der Malsburg and Willshaw (1977). Waddington (1942) suggested this type of
mechanism to account for the inheritance of acquired characteristics within a
Darwinian framework. There is selective pressure for genes which facilitate the
development of certain useful characteristics in response to the environment. In
the limit, the developmental process becomes canalized: The same
characteristic will tend to develop regardless of the environmental factors
that originally controlled it. Environmental control of the process is
supplanted by internal genetic control. Thus, we have a mechanism which as
evolution progresses allows some aspects of the phenotype that were initially
specified indirectly via an adaptive process to become more directly specified.
Our simulation supports the arguments of
Baldwin and Waddington, and demonstrates that adaptive processes within the
organism can be very effective in guiding evolution. The main limitation of the Baldwin effect is that it only helps in search spaces that would be hard to search without an adaptive process to restructure the space.[3]
ACKNOWLEDGMENTS
This research was supported by grant
IST-8520359 from the National Science Foundation and by contract
N00014-86-K-00167 from the Office of Naval Research. We thank David Ackley,
Francis Crick, Graeme Mitchison, John Maynard Smith,
David Willshaw, and Rosalind Zalin
for helpful discussions.
REFERENCES
Ackley, D. H. "Stochastic Iterated Genetic Hillclimbing." Doctoral dissertation, Carnegie Mellon University, 1987.
Baldwin, J. M. "A New Factor in Evolution." American Naturalist 30 (1896): 441-451.
Brady, R. M. "Optimization Strategies Gleaned From Biological Evolution." Nature 317 (1985): 804-806.
Harley, C. B. "Learning the Evolutionarily Stable Strategy." Journal of Theoretical Biology 89 (1981): 611-633.
Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
Lloyd Morgan, C. "On Modification and Variation." Science 4 (1896): 733-740.
Rumelhart, D. E., G. E. Hinton, and R. J. Williams. "Learning Representations by Back-Propagating Errors." Nature 323 (1986): 533-536.
Von der Malsburg, C., and D. J. Willshaw. "How to Label Nerve Cells so that They Can Interconnect in an Ordered Fashion." Proc. Natl. Acad. Sci. U.S.A. 74 (1977): 5176-5178.
Waddington, C. H. "Canalization of Development and the Inheritance of Acquired Characters." Nature 150 (1942): 563-565.
Footnotes
[1] We assume, for
simplicity, that each potential connection is controlled by its own gene.
Naturally, we do not believe that the relationship between genes and
connections is so direct.
[2] This implicitly
assumes that the organism can "recognize" when it has achieved the
good net. This recognition ability (or an ability to tell when the switch
settings have been improved) is required to make learning effective and so it
must precede the Baldwin effect.
[3] One good reason for
believing the search space must be nicely structured is that evolution works. But
this does not show that the search space would be nicely structured in the
absence of adaptive processes.