Experimenting with Genetic Algorithms

Genetic algorithm trying to find a series of four mathematical operations (e.g. -3*4/7+9) that would result in the number 42.
Genetic algorithm trying to find a series of four mathematical operations (e.g. -3*4/7+9) that would result in the number 42.

I’m teaching a numerical methods class that’s partly an introduction to programming, and partly a survey of numerical solutions to different types of problems students might encounter in the wild. I thought I’d look into doing a session on genetic algorithms, which are an important precursor to things like networks that have been found to be useful in a wide variety of fields including image and character recognition, stock market prediction and medical diagnostics.

The ai-junkie, bare-essentials page on genetic algorithms seemed a reasonable place to start. The site is definitely readable and I was able to put together a code to try to solve its example problem: to figure out what series of four mathematical operations using only single digits (e.g. +5*3/2-7) would give target number (42 in this example).

The procedure is as follows:

  • Initialize: Generate several random sets of four operations,
  • Test for fitness: Check which ones come closest to the target number,
  • Select: Select the two best options (which is not quite what the ai-junkie says to do, but it worked better for me),
  • Mate: Combine the two best options semi-randomly (i.e. exchange some percentage of the operations) to produce a new set of operations
  • Mutate: swap out some small percentage of the operations randomly,
  • Repeat: Go back to the second step (and repeat until you hit the target).

And this is the code I came up with:

genetic_algorithm2.py

''' Write a program to combine the sequence of numbers 0123456789 and
    the operators */+- to get the target value (42 (as an integer))
'''

'''
Procedure:
    1. Randomly generate a few sequences (ns=10) where each sequence is 8
       charaters long (ng=8).
    2. Check how close the sequence's value is to the target value.
        The closer the sequence the higher the weight it will get so use:
            w = 1/(value - target)
    3. Chose two of the sequences in a way that gives preference to higher
       weights.
    4. Randomly combine the successful sequences to create new sequences (ns=10)
    5. Repeat until target is achieved.

'''
from visual import *
from visual.graph import *
from random import *
import operator

# MODEL PARAMETERS
ns = 100
target_val = 42 #the value the program is trying to achieve
sequence_length = 4  # the number of operators in the sequence
crossover_rate = 0.3
mutation_rate = 0.1
max_itterations = 400


class operation:
    def __init__(self, operator = None, number = None, nmin = 0, nmax = 9, type="int"):
        if operator == None:
            n = randrange(1,5)
            if n == 1:
                self.operator = "+"
            elif n == 2:
                self.operator = "-"
            elif n == 3:
                self.operator = "/"
            else:
                self.operator = "*"
        else:
            self.operator = operator
            
        if number == None:
            #generate random number from 0-9
            self.number = 0
            if self.operator == "/":
                while self.number == 0:
                    self.number = randrange(nmin, nmax)
            else:
                self.number = randrange(nmin, nmax)
        else:
            self.number = number
        self.number = float(self.number)

    def calc(self, val=0):
        # perform operation given the input value
        if self.operator == "+":
            val += self.number
        elif self.operator == "-":
            val -= self.number
        elif self.operator == "*":
            val *= self.number
        elif self.operator == "/":
            val /= self.number
        return val


class gene:

    def __init__(self, n_operations = 5, seq = None):
        #seq is a sequence of operations (see class above)
        #initalize
        self.n_operations = n_operations
        
        #generate sequence
        if seq == None:
            #print "Generating sequence"
            self.seq = []
            self.seq.append(operation(operator="+"))  # the default operation is + some number
            for i in range(n_operations-1):
                #generate random number
                self.seq.append(operation())

        else:
            self.seq = seq

        self.calc_seq()

        #print "Sequence: ", self.seq
    def stringify(self):
        seq = ""
        for i in self.seq:
            seq = seq + i.operator + str(i.number)
        return seq

    def calc_seq(self):
        self.val = 0
        for i in self.seq:
            #print i.calc(self.val)
            self.val = i.calc(self.val)
        return self.val

    def crossover(self, ingene, rate):
        # combine this gene with the ingene at the given rate (between 0 and 1)
        #  of mixing to create two new genes

        #print "In 1: ", self.stringify()
        #print "In 2: ", ingene.stringify()
        new_seq_a = []
        new_seq_b = []
        for i in range(len(self.seq)):
            if (random() < rate): # swap
                new_seq_a.append(ingene.seq[i])
                new_seq_b.append(self.seq[i])
            else:
                new_seq_b.append(ingene.seq[i])
                new_seq_a.append(self.seq[i])

        new_gene_a = gene(seq = new_seq_a)
        new_gene_b = gene(seq = new_seq_b)
                       
        #print "Out 1:", new_gene_a.stringify()
        #print "Out 2:", new_gene_b.stringify()

        return (new_gene_a, new_gene_b)

    def mutate(self, mutation_rate):
        for i in range(1, len(self.seq)):
            if random() < mutation_rate:
                self.seq[i] = operation()
                
            
            

def weight(target, val):
    if val <> None:
        #print abs(target - val)
        if abs(target - val) <> 0:
            w = (1. / abs(target - val))
        else:
            w = "Bingo"
            print "Bingo: target, val = ", target, val
    else:
        w = 0.
    return w

def pick_value(weights):
    #given a series of weights randomly pick one of the sequence accounting for
    # the values of the weights

    # sum all the weights (for normalization)
    total = 0
    for i in weights:
        total += i

    # make an array of the normalized cumulative totals of the weights.
    cum_wts = []
    ctot = 0.0
    cum_wts.append(ctot)
    for i in range(len(weights)):
        ctot += weights[i]/total
        cum_wts.append(ctot)
    #print cum_wts

    # get random number and find where it occurs in array
    n = random()
    index = randrange(0, len(weights)-1)
    for i in range(len(cum_wts)-1):
        #print i, cum_wts[i], n, cum_wts[i+1]
        if n >= cum_wts[i] and n < cum_wts[i+1]:
            
            index = i
            #print "Picked", i
            break
    return index

def pick_best(weights):
    # pick the top two values from the sequences
    i1 = -1
    i2 = -1
    max1 = 0.
    max2 = 0.
    for i in range(len(weights)):
        if weights[i] > max1:
            max2 = max1
            max1 = weights[i]
            i2 = i1
            i1 = i
        elif weights[i] > max2:
            max2 = weights[i]
            i2 = i

    return (i1, i2)
        
    
    
            


# Main loop
l_loop = True
loop_num = 0
best_gene = None

##test = gene()
##test.print_seq()
##print test.calc_seq()

# initialize
genes = []
for i in range(ns):
    genes.append(gene(n_operations=sequence_length))
    #print genes[-1].stringify(), genes[-1].val
    

f1 = gcurve(color=color.cyan)

while (l_loop and loop_num < max_itterations):
    loop_num += 1
    if (loop_num%10 == 0):
        print "Loop: ", loop_num

    # Calculate weights
    weights = []
    for i in range(ns):
        weights.append(weight(target_val, genes[i].val))
        # check for hit on target
        if weights[-1] == "Bingo":
            print "Bingo", genes[i].stringify(), genes[i].val
            l_loop = False
            best_gene = genes[i]
            break
    #print weights

    if l_loop:

        # indicate which was the best fit option (highest weight)
        max_w = 0.0
        max_i = -1
        for i in range(len(weights)):
            #print max_w, weights[i]
            if weights[i] > max_w:
                max_w = weights[i]
                max_i = i
        best_gene = genes[max_i]
##        print "Best operation:", max_i, genes[max_i].stringify(), \
##              genes[max_i].val, max_w
        f1.plot(pos=(loop_num, best_gene.val))


        # Pick parent gene sequences for next generation
        # pick first of the genes using weigths for preference    
##        index = pick_value(weights)
##        print "Picked operation:  ", index, genes[index].stringify(), \
##              genes[index].val, weights[index]
##
##        # pick second gene
##        index2 = index
##        while index2 == index:
##            index2 = pick_value(weights)
##        print "Picked operation 2:", index2, genes[index2].stringify(), \
##              genes[index2].val, weights[index2]
##

        (index, index2) = pick_best(weights)
        
        # Crossover: combine genes to get the new population
        new_genes = []
        for i in range(ns/2):
            (a,b) = genes[index].crossover(genes[index2], crossover_rate)
            new_genes.append(a)
            new_genes.append(b)

        # Mutate
        for i in new_genes:
            i.mutate(mutation_rate)
                

        # update genes array
        genes = []
        for i in new_genes:
            genes.append(i)


print
print "Best Gene:", best_gene.stringify(), best_gene.val
print "Number of iterations:", loop_num
##

When run, the code usually gets a valid answer, but does not always converge: The figure at the top of this post shows it finding a solution after 142 iterations (the solution it found was: +8.0 +8.0 *3.0 -6.0). The code is rough, but is all I have time for at the moment. However, it should be a reasonable starting point if I should decide to discuss these in class.

Gene Transfer from Algea to Sea Slug

The sea slug. "Elysia pusilla" by Katharina Händeler, Yvonne P. Grzymbowski, Patrick J. Krug & Heike Wägele - Händeler K., Grzymbowski Y. P., Krug P. J. & Wägele H. (2009) "Functional chloroplasts in metazoan cells - a unique evolutionary strategy in animal life". Frontiers in Zoology 6: 28. doi:10.1186/1742-9994-6-28 Figure 1F.. Licensed under CC BY 2.0 via Wikimedia Commons.
The sea slug. “Elysia pusilla” by Katharina Händeler, Yvonne P. Grzymbowski, Patrick J. Krug & Heike Wägele – Händeler K., Grzymbowski Y. P., Krug P. J. & Wägele H. (2009) “Functional chloroplasts in metazoan cells – a unique evolutionary strategy in animal life”. Frontiers in Zoology 6: 28. doi:10.1186/1742-9994-6-28 Figure 1F.. Licensed under CC BY 2.0 via Wikimedia Commons.

One of my students wanted to figure out how to make animals photosynthesize. Well, this article indicates that sea slugs have figured out how eat and digest the algae but keep the algal chloroplasts alive in their guts so the sea slug can use the fats and carbohydrates the chloroplasts produce (the stealing of the algal organelles is called kleptoplasty). To maintain the chloroplasts, the slugs have actually had to incorporate some of the algae DNA into their own chromosomes–this is called horizontal gene transfer and it’s what scientists try to do with gene therapies.

More details here.

Magnification ?.
Filamentous algae like the ones eaten by the sea slugs.

Mars Colonization Project

My high-school biology class is taking their exam on genetics and evolution. To make the test a little more interesting, and to point out that there may be some relevance for this knowledge in the future, I made the test a questionnaire for the new head of the Mars Colonization Project. It begins like this:

Friday, January 30th, 2054.

Dear Dr. ________________ (insert your name here):

We are excited that you have accepted our offer to head the Biomedical Division of the MCP. As we are engaged in the first ever effort to colonize another planet, we know that we will face many unique challenges. Your expertise in pluripotent stem cell research and oncology will be extremely valuable to us — even though some of us administrators still don’t know what pluripotent stem cells are.

Please fill out the questions in this document to help us with our planning for the colony and to help our Human Resources department assemble your research and medical team.

Because of the sensitivity of some of the personal information included in this document, please write out, and sign, the Honor Code below before turning the page.

Yours truly,

Board of Administrators,
Martian Colonization Project

Front page of the High Schooler's Biology exam.
Front page of the High Schooler’s Biology exam.

Then I pose all of the questions in this context. For example, to get their knowledge of vocabulary I ask them to define the scientific words and phrases (which they’ve used in their scientific publications many, many times), in terms that laymen — like the people on the board of administrators — could understand.

To get at more complex concepts, like the molecular process of gene expression and regulation, I phrased the question like this:

Medical Issues Related to Ongoing Colonization Planning

The trip to Mars will take five years, so we will be placing most of the colonists into cryogenic sleep for most of that time. We are still working out some of the bugs in the cryogenic technology, and we need your help.

To put people into cryogenic sleep, we need to stop their digestion of carbohydrates. Your predecessor, Dr. Malign, told us that we could do this using RNA interference, by injecting them with engineered microRNA that would block the production of the enzyme amalyse.

Could you draw a diagram of a cell showing how proteins are expressed from DNA, and how microRNA would interfere with protein production. Are there other methods for preventing protein expression?

We’ll see how the students do on the test, however at least one student glanced at the front page and said, “This is kinda cool,” (actually, she first asked if I’d stolen the idea from the internet somewhere), which is significant praise coming from a teenager.

Drawing Faces: An Exercise in Heredity

A.C.'s demonstration of how to draw a face.
A.C.’s demonstration of how to draw a face.

My biology students are doing an exercise in genetics and heredity that requires them to combine the genes of two parents to see what their offspring might look like. They do the procedure twice — to create two kids — so they can see how the same parents can produce children who look similar but have distinct differences. To actually see what the kids look like, the students have to draw the faces of their “children”.

“I’m not going to claim that child as my own!”

I was walking through the class when I heard that. Apparently one student, who’d had a bit of art training, was paired with another student who had not.

Fortunately, I was able to convince the more practiced artist to give the rest of the class a lesson on how to draw faces. She did an awesome job; first drawing a female face and then adapting it a bit to make it look more male.

If nothing else, I tried to make sure that the other students registered the idea that proportion is important in drawing biological specimens — like faces — from real life. Just getting the proportions right made a huge difference in the quality of their drawings. The forehead region should be the largest (from the top of the head to the eyebrows), then the area between the eyebrows and the bottom of the nose, then the nose to lips, and then, finally, the region from lips to chin should be shortest. You can see the proportion lines in the picture above.

The adaptation stage, where she made the facial features more masculine, was also quite useful. The students had to think about what were typical male features and if there were a genetic basis to things like square chins.

Although all of the other students’ drawings improved markedly, including her simulated spouse’s, I don’t think my art-teaching student was absolutely happy with the end results after the one lesson. She ended up handing in two drawings of her own even though everyone else (including her partner) did one each.

However, having all the students on the same page, working with the same basic drawing methods, helped improve the heredity exercise because it reduced a lot of the variability in the pictures that resulted from different drawing styles and skill levels.

I also think that taking these interludes for art lessons are quite useful in a science class, since it emphasizes the importance of accurate observation, shapes student’s abilities to represent what they see in diagrams, and demonstrates that they can — and should — be applying the skills they learn in other classes to their sciences.

Instruction on how to draw a face.
Instruction on how to draw a face.

Return to 3rd Degree

Glass tile using the DNA Writer codon translation table.

Last weekend, I took the Glass Art Sampler class at the Third Degree Glass Factory, and got to try my hand at making a paperweight, a glass tile, and a few beads. It was awesome.

I’d had the chance to make a paperweight when my Lamplighter class had visited St. Louis a couple years ago, so I had a general idea of some of the possibilities. This time, however, I had DNA sequences on the brain, and went in with a bit of a theme in mind.

The tiles were the easiest because all you need to learn how to do was cut glass — by scoring it and using a little pliers like device to break it along the score — and then arrange the tiles of colored and clear glass on a tile. The arrangement was placed in a flat kiln, and then a day or so later, you tile would be all melted together. Pretty simple for a beginner.

My glass tile arrangement sitting in the kiln.

There is, of course, a bit more to it than that. The way the glass is stacked can be used to create floating effects; some colors will react when melted in the kiln to give different colors; care needs to be taken to manage where bubbles show up in the cooled glass; among other things.

Since it’s easiest to make straight edged cuts in glass, I made four sets of square colored tiles — yellow, red, blue, and green — to make a nucleotide sequence based off the DNA Writer translation table (with the start and stop codons added in).


Paperweight

I tried something similar when making the paperweight.

A blob of molten glass.

Usually, you start with a blob of molten, clear glass on the end of a metal rod, and dip it into trays of colored glass shards that stick to the molten glass. You can then pull and twist the viscous glass with a large pair of tweezers to blend the colors and make pretty patterns. The twisted glass is then pushed into a blob at the end of the rod, and the whole thing encased in more clear glass.

Twisting the glass.

Instead, I wanted to create a discernible pattern of colors to create a multi-colored helix of molten phenocryst-like blobs in the clear glass. I really wasn’t sure how to make it work. I though perhaps I could dip the initial glass blob into a pattern of shards and then pull it out once while twisting to get the spiral pattern. Our instructor was patient as I tried to explain my ultimate goal, and he came up with a more subtle method for making the spiral.

A pattern of colored glass chips.

I laid out the short pattern of colored glass shards and carefully dipped the initial blob of clear glass into it. All the shards stuck, which was good. Then instead of pulling with tweezers, the instructor helped my gently roll the blob of glass along a metal surface at a slight downward angle. Contact with the metal cooled the tip of the glass faster than rest of the blob causing the whole thing to twist just nicely. After smoothing things with a block we covered it with more clear glass (and smoothed again), and were done.

One week later:

Half a double helix encased in glass.

Working with big blobs of extremely hot glass is quite challenging, so I couldn’t replicate this on my own at the moment. I may have to take another class.

Glass Beads

The instructor melts a yellow glass rod in the flame and drops the molten glass onto a thin metal rod to create a bead.

I would feel comfortable making glass beads after the one class, but mastering the art is going to take a lot of practice. The flame — created from a mix of fuel gas (propane I think) and oxygen — is quite hot, and it takes some expertise to be able to melt the glass and twirl it onto the rods to make a nice round bead. The trickiest part, however, is making little colored dots to decorate the bead. You need to melt small bits of glass for the dots, then move the bead through the flame to warm it up enough so the dot will stick to the bead while not melting the bead too much. Then you pass the bead through the flame again to set the dot. If the bead or the dot is too cool when they’re put together the dots will pop off. I had a lot of popping dots.

I was not able to get my nucleotide sequence onto a bead in the time I had, but I did at least get to make a couple beads.

The Genetics of Blondes

Photo by Graham Crumb (via Wikimedia Commons).

Hair color tends to come up pretty organically when talking about heredity and genetic traits. Blonde hair in people of European descent is a result of a the interactions of a combination of genes, but the blonde afros of Melanesians appears to be the result of a single mutation of a single gene.

Switching one “letter” of genetic code-replacing a “C” with a “T”-meant the difference between dark hair and blond hair.

— Loury (2012): The Origin of Blond Afros in Melanesia in Science

Meiosis: Passing on Half of Your Genes

Now that we have an idea of what a strand of DNA looks like we’re going to start talking about how our genes are passed on to our kids.

During normal time (interphase) our DNA is stored in the nucleus of our cells. Humans have 23 pairs of chromosomes. Of each pair, one comes from your mom and one from your dad.

The 23 pairs of human chromosomes. One chromosome in each pair comes from each parent. Image from the NIH.

When a cell is not reproducing (which is most of the time) the chromosomes are unspooled threads in the cell’s nucleus.

An unspooled strand of DNA in the cell nucleus.

When the cell is preparing to reproduce, each DNA strand duplicates.

Each chromosome duplicates in preparation for cell reproduction.

Then they fold up into the chromosomes and line up in the center of the cell.

DNA folded into chromosomes in the cell nucleus. The centrioles (plastic cups) move to opposite sides of the cell nucleus.

Now this is where interesting things start to happen. In mitosis, each chromosome pairs up with its duplicate, so when these are pulled apart you get two new cells with exactly the same DNA.

Mitosis produces two identical cells. Image from the NIH.

In meiosis however, where the cell breaks apart into reproductive cells called gametes, the two parent chromosomes pair up and exchange some DNA before being pulled apart (the DNA exchange is called crossing over). Since the DNA has duplicated before this happens, when the cell splits, you end up with two new daughters with mixed up DNA. Each daughter nucleus has two chromosomes, like all your other cells, but unlike every other (non-reproductive) cell in your body those chromosomes are different because of the DNA mixing. In addition, in the last step of meiosis (called Meiosis II) each daughter cell splits apart into two more daughter cells (granddaughter cells?) each with only one chromosome.

Meiosis produces four cells (gametes), each of which has only half as many chromosomes as one of your normal cells. Image from the NIH.

Again, it’s important to note that because of the crossing over and the second splitting, when everything is done, you end up with four cells — called gametes –, each of which has its own unique DNA. And unlike the other cells in your body, which have 23 pairs of chromosomes, each gamete only has 23 chromosomes.

Because a normal cell has 23 pairs of chromosomes is called a diploid cell. The gametes with only 23 single chromosomes is called haploid. These haploid gametes are the reproductive cells — eggs and sperm.

Thus, the DNA you contribute to your kids is not the same strands that you have in your cells, but a halved mixture of the two sets of genes you got from your parents.

References

The NIH has an excellent primer called “What is a Cell” on the history of cells, their parts, and how they split.