DNA Writer: Storing Information in DNA Exercise

DNA Writer: Translate text into a DNA code (and back again) using a simple lookup table.

I created this little DNA Writer webpage after seeing the article on scientists recording one of Shakespeare’s sonnets on DNA. I was inspired to put together something similar as an assignment for my middle-school science class to demonstrate how DNA records information. With the website to do quick translations for me, I’ll give each student the translation table and a simple message in DNA code and have them figure out the message.

Update: I’ve adapted the code to add a two- to five-letter sequence of non-coding DNA to the beginning and end of the message code. There are also start and stop codes.

The DNA sequence (or RNA in this figure) can be broken down into groups of three nucleotides called codons. Each codon codes for a specific amino acid, so the order of codons gives the sequence of amino acids in the proteins created by the DNA strand. Image by TransControl via Wikipedia.

The DNA Writer code uses a simple look-up table where each letter in the English alphabet is assigned a unique three-letter nucleotide code. The three letters are chosen from the letters of the DNA bases (AGCT), similar to the way codons are organized in mRNA. Any unknown characters or punctuation are ignored.
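As a minimal sketch of the look-up idea (in Python, not the site's actual code), the table can be built by pairing each letter with one three-base code. The assignment here is hypothetical: it simply takes the first 26 of the 64 possible codes in order, and the names `to_dna` and `from_dna` are made up for illustration.

```python
from itertools import product

BASES = "AGCT"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical assignment: the first 26 of the 64 possible three-base codes.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
ENCODE = dict(zip(ALPHABET, CODONS))
DECODE = {codon: letter for letter, codon in ENCODE.items()}


def to_dna(text: str) -> str:
    """Translate text to DNA code, skipping characters not in the table."""
    return "".join(ENCODE[ch] for ch in text.upper() if ch in ENCODE)


def from_dna(dna: str) -> str:
    """Read the sequence three bases at a time and look each code up."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing group
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(DECODE.get(codon, "?") for codon in codons)


if __name__ == "__main__":
    code = to_dna("Hello, world!")
    print(code)
    print(from_dna(code))
```

Because punctuation and spaces are not in this table, they are dropped on the way in, so a round trip returns only the letters, in upper case.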

Also, with a little tweaking, I think I can adapt this assignment to show how random mutations can be introduced into DNA sequences when they are copied (as happens during DNA replication). Maybe break the class into groups of four, give the first student a message as a nucleotide sequence, have them copy it and pass it on to the next student, and so on. If I structure this as a race between the groups, then someone’s bound to introduce some errors, so when they translate the final code back into English they should see how the random mutations affected their message.
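The copy-and-pass-on race can be previewed with a rough simulation. This sketch assumes each student miscopies any given base with a fixed probability and only ever swaps one base for another; real copying mistakes could also drop or add a letter, which would shift every code after it. The function names and the error rate are illustrative choices, not part of the DNA Writer.

```python
import random

BASES = "AGCT"


def copy_with_errors(dna: str, error_rate: float, rng: random.Random) -> str:
    """Copy a sequence base by base, sometimes substituting a different base."""
    copied = []
    for base in dna:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def pass_along(dna: str, students: int, error_rate: float, seed: int = 0) -> list[str]:
    """Return the original followed by each student's copy of the previous copy."""
    rng = random.Random(seed)
    copies = [dna]
    for _ in range(students):
        copies.append(copy_with_errors(copies[-1], error_rate, rng))
    return copies


def count_differences(a: str, b: str) -> int:
    """Count positions where two equal-length sequences disagree."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    chain = pass_along("ATGGCTGGTTGGTAA", students=4, error_rate=0.05)
    for step, seq in enumerate(chain):
        print(step, seq, count_differences(chain[0], seq))
```

Translating the last sequence in the chain back to English shows which letters of the message were changed along the way.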

UPDATE: Non-Coding (junk) DNA: I’ve updated the code so that you have the option of adding a short (2-5 character) string of non-coding DNA to the beginning and end of each sequence.
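One way the non-coding padding could work is sketched below. It assumes the start and stop codes are ATG and TAA, and that the padding never contains the start code so a reader can find where the message begins; the website may handle this differently, and `wrap` and `unwrap` are hypothetical helper names.

```python
import random

BASES = "AGCT"
START, STOP = "ATG", "TAA"  # assumed start and stop codes


def random_junk(rng: random.Random, min_len: int = 2, max_len: int = 5) -> str:
    """Return 2-5 random bases that do not contain the start code."""
    while True:
        length = rng.randint(min_len, max_len)
        junk = "".join(rng.choice(BASES) for _ in range(length))
        if START not in junk:
            return junk


def wrap(message_dna: str, rng: random.Random) -> str:
    """Surround the message with start/stop codes and non-coding padding."""
    return random_junk(rng) + START + message_dna + STOP + random_junk(rng)


def unwrap(sequence: str) -> str:
    """Recover the message: skip to the start code, read codes until stop."""
    begin = sequence.find(START)
    if begin == -1:
        raise ValueError("no start code found")
    codons = []
    for i in range(begin + 3, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if codon == STOP:
            return "".join(codons)
        codons.append(codon)
    raise ValueError("no stop code found")


if __name__ == "__main__":
    rng = random.Random()
    padded = wrap("GCTGGTTGG", rng)
    print(padded, "->", unwrap(padded))
```

This only works if the stop code is reserved, so that no letter in the message is ever written as TAA.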

A more personalized and printer friendly format for output.

UPDATE 2: Personalized and Printable output: Since I’m using the DNA writer to give each student a personalized message, I’ve created a button that gives “Printer Friendly Output” which will produce an individualized page with the code, the translation table, and some information on how it works, so I can print off individualized assignments more easily.

UPDATE 3: You can now get a color coded version of the sequence.

Ravenclaw’s DNA sequence color coded, and translated back to English.

Update 4: Now you can embed the nucleobase color patterns into other websites.

Update 5: Closer to the standard lettering

DNA Writer A: https://earthsciweb.org/js/bio/dna-writerA/

In constructing the codon-to-English conversion table I had to decide if I wanted to go with the standard coding (e.g. letting GCT, which codes for alanine, represent A) or make up a random encoding.

I opted for the random approach for a number of reasons, but the primary one was that multiple codons can code for the same amino acid. GCT, GCC, GCA, and GCG all code for alanine. This would not necessarily be a problem, except that if we respect all of the multiple encodings, we run out of codons to represent things like numbers and punctuation. A secondary reason is that U is used to represent the 21st amino acid, selenocysteine, but its codon (TGA) is also one of the stop codons (Croat, 2012), so its addition to the protein chain depends on more than a single codon in the sequence.

I’ve created a hybrid option, dnaWriterA, which respects the standard lettering as much as possible (based on the inverse DNA codon table on Wikipedia). In the table below, the bolded sequences are the ones that have been reassigned.

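| Letter/code | Amino acid | Codon |
| --- | --- | --- |
| start | | ATG |
| stop | | TAA |
| space (" ") | | GCA |
| . | | GGA |
| A | Ala | GCT GCC GCA GCG |
| B | Asn or Asp | AAC |
| C | Cys | TGT TGC |
| D | Asp | GAT GAC |
| E | Glu | GAA GAG |
| F | Phe | TTT TTC |
| G | Gly | GGT GGC GGA GGG |
| H | His | CAT CAC |
| I | Ile | ATT ATC ATA |
| J | | TTG |
| K | Lys | AAA |
| L | Leu | CTT CTC CTA CTG TTA TTG |
| M | Met | ATG |
| N | Asn | AAT AAC |
| O | | AGG |
| P | Pro | CCT CCC CCA CCG |
| Q | Gln | CAA CAG |
| R | Arg | CGT CGC CGA CGG AGA AGG |
| S | Ser | TCT TCC TCA TCG AGT AGC |
| T | Thr | ACT ACC ACA ACG |
| U | | AGA |
| V | Val | GTT GTC GTA GTG |
| W | Trp | TGG |
| X | | AGC |
| Y | Tyr | TAT TAC |
| Z | Gln or Glu | CAA CAG GAA GAG |
| 0 | | AGT |
| 1 | | GCG |
| 2 | | GGG |
| 3 | | CTG |
| 4 | | CCG |
| 5 | | CGG |
| 6 | | TCG |
| 7 | | ACG |
| 8 | | GTG |
| 9 | | GAG |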
Codons mapping to letters/codes used in the dnaWriterA version. The bolded sequences are the ones that have been reassigned.
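To see which codons are listed under more than one letter or code, the table can be typed into a small Python dictionary and inverted. This is only a check on the table as printed above; it does not try to reproduce how dnaWriterA itself chooses between the overlapping entries, and the variable names are my own.

```python
from collections import defaultdict

# Codons listed for each letter/code in the table above.
TABLE = {
    "start": "ATG", "stop": "TAA", " ": "GCA", ".": "GGA",
    "A": "GCT GCC GCA GCG", "B": "AAC", "C": "TGT TGC", "D": "GAT GAC",
    "E": "GAA GAG", "F": "TTT TTC", "G": "GGT GGC GGA GGG", "H": "CAT CAC",
    "I": "ATT ATC ATA", "J": "TTG", "K": "AAA",
    "L": "CTT CTC CTA CTG TTA TTG", "M": "ATG", "N": "AAT AAC", "O": "AGG",
    "P": "CCT CCC CCA CCG", "Q": "CAA CAG", "R": "CGT CGC CGA CGG AGA AGG",
    "S": "TCT TCC TCA TCG AGT AGC", "T": "ACT ACC ACA ACG", "U": "AGA",
    "V": "GTT GTC GTA GTG", "W": "TGG", "X": "AGC", "Y": "TAT TAC",
    "Z": "CAA CAG GAA GAG",
    "0": "AGT", "1": "GCG", "2": "GGG", "3": "CTG", "4": "CCG",
    "5": "CGG", "6": "TCG", "7": "ACG", "8": "GTG", "9": "GAG",
}

# Invert the table: for each codon, which letters/codes list it?
codon_to_symbols = defaultdict(list)
for symbol, codons in TABLE.items():
    for codon in codons.split():
        codon_to_symbols[codon].append(symbol)

# Codons that appear in more than one row of the table.
shared = {c: s for c, s in codon_to_symbols.items() if len(s) > 1}

if __name__ == "__main__":
    print(len(codon_to_symbols), "distinct codons listed, out of", 4 ** 3)
    for codon, symbols in sorted(shared.items()):
        print(codon, symbols)
```

The overlaps are where the standard lettering gave way: for example, GCA appears under both the space and A, and ATG under both start and M.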

I’ve also posted the code to GitHub: https://github.com/lurbano/dnaWriterA with instructions on how to adapt the sequence.