Read an Excerpt
3: Genetic Algorithms
Natural evolution implies that organisms adapt to their environment. Evolution works over many generations, covering much longer periods than those of lifetime learning. How could an individual learn to use its eyes if it hadn't been equipped with eyes through evolution? An organism without any organ of sight might not be able to react to visual stimuli, but it could be the ancestor of a species with eyelike organs. Therefore, evolution can be considered as a process of metalearning on a generational level. Only evolutionary adaptations and innovations enable organisms to "optimally" react to environmental conditions. This involves an impressive potential for creativity and innovation. How else could such complex and useful organs as eyes be developed? John H. Holland, the "inventor" of genetic algorithms, talks about a perpetual novelty driven by evolution, which we want to use for evolution-based development of computer programs.
The first models of adaptation processes on the basis of genetic algorithms were introduced in the seminal book Adaptation in Natural and Artificial Systems, by John H. Holland (Holland 1975; Holland 1992a). In this book the term adaptation, the ability to adapt, plays a central role and is defined as a process of progressive modification of (general) structures, which leads to improved performance of the system interacting with its environment (see Section 2.2).
The term genetic algorithm (GA) describes the basic idea: algorithmic techniques, inspired by genetic principles, are used to simulate evolutionary processes. The objective is to design and implement robust, adaptive systems, following nature's paradigm for the evolution of genetic structures. In particular, genetic algorithms glean ideas from natural mechanisms of reproduction, especially among cells of sexually reproducing individuals. Four essential principles manifested in nature form the core ingredients of genetic algorithms: (1) the dualism principle of separating genetic information of the genotype from the expressed phenotype, (2) a discrete encoding of genotypical structures, and (3) recombination effects resulting from sexual reproduction. Finally, (4) elementary building blocks, which are combined according to specific templates, help in the composition of complex interacting systems, representing a core precondition of modular design and construction.
Dualism: In biological systems, the genetic information encoded in DNA is used in two ways-as genetic information, which is replicated, and as instructions, which have to be executed. This dualistic principle of information processing has already been pointed out by John von Neumann, a computer pioneer and visionary researcher, particularly in the area of selfreproducing cellular automata (Neumann and Burks 1966; see also Levy 1993a, p. 44, and Section 9.1.2). In genetic algorithms, the genotypical structures, modified by an evolutionary algorithm through recombinations and mutations, are clearly separated from the phenotypical structures, the computer programs or parameter vectors (Figure 3.1). The evolution-based variations are not performed in the space of the parameters to be optimized (as is the case, for example, with evolution strategies; see Chapter 4) but in an encoding, genotypical structure space. Hence, the principal separation of phenotype and genotype in nature is explicitly implemented by genetic algorithms.
Discrete encoding: In DNA strands, genetic information is encoded by a four-letter alphabet. For GA structures an analogous representation is used. Usually this is a (binary) bit string representation, but any other encoding scheme over a discrete, finite alphabet may be chosen as well (Section 3.1).
Sexual reproduction: The diploid cells of sexually reproducing individuals consist of a double set of homologous chromosomes, half each from the offspring's mother and father. During the cell division phase meiosis, parts of the single, paired chromosomes are mutually exchanged by crossover.
Genetic algorithms algorithmically replicate these recombination effects and integrate them into evolutionary simulation models and optimization algorithms. Successful application of genetic algorithms in the sense of a progressive creativity in essence depends on the potential of recombination, the mixing and recomposition of elementary structural building blocks. According to Holland, by reorganizing available information one of the great advantages of evolution is utilized, namely, "its perpetual novelty in its approaches to maintaining fitness " (Levy 1993a, p. 169).
Unlike recombinations, point mutations do not play a prominent role in genetic algorithms, in contrast to the evolution strategic view (Chapter 4).
Elementary building blocks: Complex adaptive systems, such as the interactions on genome structures, are hierarchically composed of simple elementary units. Only through the interplay of genes, the building blocks of DNA, are chains of amino acids and proteins composed during the transcription and translation phases. To translate the triplets into the corresponding amino acids, building blocks of RNA, nucleotide bases are needed. The proteins produced, the proteome, constitute the "atomic units of a cell." This modular organization principle plays a decisive role in the encoding structures of genetic algorithms as well. The genotypical GA structures-an analogy to computer programs used in Turing machine tables, in assembler code, and in C, LISP, or Mathematica-are also built from elementary building blocks: short binary strings, command sequences, variables, constants, or symbolic expressions.
Genetic algorithms have a surprisingly broad range of application domains, from solving technical optimization problems (Goldberg 1989), mathematical function optimization (De Jong 1975), simulated evolution of social behavior of ants (Jefferson, Collins, et al. 1992), or artificial, interacting organisms (Ray 1991) to the evolutionary "breeding" of computer programs (Koza 1992; Koza 1994; Koza, Andre, et al. 1999; Banzhaf, Nordin, et al. 1997). The latter paradigm is known as genetic programming, which we discuss in detail in Part II of this book (Chapter 7). An extensive list of further GA application areas can be found in, among others, Goldberg 1989, p. 126 ff. or Mitchell 1996.
At the end of the 1980s, genetic algorithms received worldwide attention through promotion by David E. Goldberg (Goldberg 1989) and the development of a number of practical GA applications. Since then, the representation of adaptive GA structures seems to be mainly restricted to binary strings. Today binary encoding is still considered one of the key features of genetic algorithms. Binary alphabets are preferred because a two-value representation maximizes the number of "schemata" with respect to a given encoding (Goldberg 1986). We will examine this aspect in the context of the controversial schema theorem in Section 3.8.
In the following sections we show that it is not necessary to restrict ourselves to binary GA chromosomes. Instead we introduce a number of representation schemes for GA chromosomes over arbitrary, finite, and discrete alphabets (Section 3.1). The set of variants covers haploid single-stranded chromosomes over binary strings, triplet chromosomes for simulating RNA sequences over a four-letter alphabet, and GA chromosomes with real-value alleles (Section 3.1.1). We also look at diploid chromosome strands and how to deal with dominant and recessive alleles (Section 3.1.2).
Various effects of point mutations on different variants of GA chromosomes (haploid, diploid, varying discrete allele alphabets) are described in Section 3.2.
In Section 3.3, we discuss recombination operators, developed from simple combination operators on lists into meiotic recombinations on diploid chromosomes.
Mutations on groups of chromosome subunits in the sense of biological inversions, translocations, and crossovers among nonhomologous chromosomes are introduced in Section 3.4...