Infection of biological DNA with digital Computer Code

SPTH
Valhalla #4
November 2013


Biological life spreads in the biological-chemical world, while computer code can spread in the digital computer world. That has been the rule - no self-replicator has ever overcome the digital-biological barrier. Until today. Here I show a method by which digital computer code can infect biological DNA and thus spread in the biological-chemical world. The method is mainly based on the fantastic research by the J. Craig Venter Institute on synthetic life, and might raise new questions about the definition of life itself.

1) Introduction

In 2010, the J. Craig Venter Institute (JCVI) reported the creation of a bacterial cell with a chemically synthesized genome [1]. They sequenced the DNA of a bacterium (M.mycoides), modified several parts of its DNA on the computer, synthesized the novel genome and transplanted it into the cell of a different bacterium (M.capricolum). They observed that the cell was controlled only by the new DNA. For verification, they introduced artificial "watermark" sequences (non-coding parts of the DNA) into the genome, which contain, among other things, the names of the scientists involved (written in a specially designed DNA encoding alphabet). The artificially created genome was capable of continuous self-replication. They call their new artificial bacterium Mycoplasma mycoides JCVI-syn1.0.

This is, in my opinion, one of the greatest scientific achievements of recent years.

In this text I explain the implementation of a computer code that makes the step from the digital to the biological world. The computer code, written in C++, hosts the DNA sequence of M.mycoides JCVI-syn1.0. At runtime it acts as follows:

The code also has a classical self-replication mechanism, so that it may eventually end up on a computer in a microbiology laboratory that is able to create DNA from digital genomes (such as the laboratories of the JCVI).

If the scientists are incautious, the computer code's genome (instead of the intended original DNA) might be written to the biological cell. The new cell will start replicating in the biological world, and with it the representation of the digital computer code.

2) Craig Venter's synthetic cell

2.1) General concept

The team of Craig Venter has demonstrated how to create bacteria controlled by artificially designed and synthesized DNA. For that, they used the sequenced DNA of the bacterium M.mycoides, whose genome is about 1 mega-base pair long. They modified the genome on the computer - deactivated several genes, and introduced watermarks (artificial non-coding parts of the DNA). A company called Blue Heron synthesized 1000 bp fragments of the full DNA. With a three-step procedure, they assembled the full DNA from these fragments. This was transplanted into a recipient cell of the bacterium M.capricolum.

Amazingly, the cell with the new genome booted up, and was able to self-replicate. To verify that the expected genome was replicating, they built special sequences into the watermarks that can be detected with chemical methods.

In their article [1] they write: "This work provides a proof of principle for producing cells based on computer-designed genome sequences. DNA sequencing of a cellular genome allows storage of the genetic instructions for life as a digital file."

The project described here uses the method of their proof of principle.

2.2) Watermarks and DNA encoding language

The watermarks are parts of the genome that are not translated into functional proteins. That means: They are part of the DNA, but have no functional effect on the behaviour of the cell.

The watermarks are represented by the nucleotides A, C, G and T. JCVI developed an encoding from DNA to human-readable characters: three nucleotides (one codon) represent one letter or ASCII symbol. With that encoding method, they wrote readable information into the cell: it contains the names of the scientists involved, philosophical quotes and a piece of HTML code with an e-mail address.

The encoding from codons to letters has never been documented explicitly, but can be deduced mainly from the implicit information given in the article. The known alphabet looks like this:

    TAG = a        GCA = k        TCC = u        AGA = 4        CAC = /
    AGT = b        AAC = l        TTG = v        GCG = 5        CCA = =
    TTT = c        CAA = m        GTC = w        GCC = 6        CGA = .
    ATT = d        TGC = n        GGT = x        TAT = 7        GAG = !
    TAA = e        CGT = o        CAT = y        CGC = 8        CAG = :
    GGC = f        ACA = p        TGG = z        GTA = 9        GGA = "
    TAC = g        TTA = q        TCT = 0        ATA = space    GTG = ,
    TCA = h        CTA = r        CTT = 1        GGG = chr(10)  TCG = @
    CTG = i        GCT = s        ACT = 2        AGC = >        CCC = -
    GTT = j        TGA = t        AAT = 3        CGG = <

Four watermarks have been introduced to the modified bacterial DNA in the computer. As an example, a part of the DNA sequence of one watermark is:

    GCTTAATAAATATGATCACTGTGCTACGCTATATGCCGTTGAATATAGGCTATATGATC
    ATAACATATATAGCTATAAGTGATAAGTTCCTGAATATAGGCTATATGATCATAACATA
    TACAACTGTACTCATGAATAAGTTAACGA 

The sequence is divided into three-nucleotide parts (codons):

    GCT TAA TAA ATA TGA TCA CTG TGC TAC GCT ATA TGC CGT TGA ATA 
    TAG GCT ATA TGA TCA TAA CAT ATA TAG CTA TAA GTG ATA AGT TCC
    TGA ATA TAG GCT ATA TGA TCA TAA CAT ATA CAA CTG TAC TCA TGA
    ATA AGT TAA CGA

We can see in the above list that GCT stands for "s", TAA stands for "e", ATA is a space, TGA stands for "t" ... and so on. In the end we can extract the sentence: "see things not as they are, but as they might be."

Obviously we can also write in this encoding technique:

"hello vxers!" -> TCA TAA AAC AAC CGT ATA TTG GGT TAA CTA GCT GAG  

The full structure of the alphabet is not known, therefore only 49 of the 64 codons' representations are presented here. However, all codons are used in the watermarks (i.e. there is no biological reason for not using specific codons).
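The decoding and encoding described in this section can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the table is the partial 49-codon alphabet listed above, the function names decode_watermark and encode_text are made up for this sketch, and codons without a known meaning are shown as "?".

```python
# Partial codon alphabet as listed in this section (49 of 64 codons).
_PAIRS = (
    "TAG=a AGT=b TTT=c ATT=d TAA=e GGC=f TAC=g TCA=h CTG=i GTT=j "
    "GCA=k AAC=l CAA=m TGC=n CGT=o ACA=p TTA=q CTA=r GCT=s TGA=t "
    "TCC=u TTG=v GTC=w GGT=x CAT=y TGG=z TCT=0 CTT=1 ACT=2 AAT=3 "
    "AGA=4 GCG=5 GCC=6 TAT=7 CGC=8 GTA=9 AGC=> CGG=< CAC=/ CCA== "
    'CGA=. GAG=! CAG=: GGA=" GTG=, TCG=@ CCC=-'
)
CODON_TO_CHAR = dict(pair.split("=", 1) for pair in _PAIRS.split())
CODON_TO_CHAR["ATA"] = " "    # space
CODON_TO_CHAR["GGG"] = "\n"   # chr(10)
CHAR_TO_CODON = {char: codon for codon, char in CODON_TO_CHAR.items()}


def decode_watermark(dna: str) -> str:
    """Split a nucleotide string into codons and map each one to a character."""
    dna = "".join(dna.split()).upper()          # drop spaces and line breaks
    usable = len(dna) - len(dna) % 3            # ignore an incomplete last codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TO_CHAR.get(codon, "?") for codon in codons)


def encode_text(text: str) -> str:
    """Map each character to its codon; fail on characters outside the table."""
    codons = []
    for char in text:
        if char not in CHAR_TO_CODON:
            raise ValueError(f"no known codon for {char!r}")
        codons.append(CHAR_TO_CODON[char])
    return " ".join(codons)


print(decode_watermark("GCT TAA TAA"))   # see
print(encode_text("hello vxers!"))
```

Because only lowercase letters appear in the known part of the alphabet, encode_text rejects uppercase input instead of guessing a mapping.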

3) FASTA file format

FASTA files are text-based representations of nucleotide sequences, commonly used in microbiological libraries. There are two FASTA file types that I will describe here. The first one is the plain FASTA format (which usually has the file extension .fasta or .fas); the second one is an XML variant. Both are available from the genome database http://www.ncbi.nlm.nih.gov/.

There you can look at, for example, the DNA of Mycoplasma mycoides JCVI-syn1.0, or of something more common: E.coli.

3.1) plain fasta-files

The plain fasta-files have a small header line, followed by a plain representation of the DNA in terms of the nucleotide bases (A, T, G, C). Two examples:

a) Mycoplasma mycoides JCVI-syn1.0

This is about 1MB of data

     
>gi|296455217|gb|CP002027.1| Synthetic Mycoplasma mycoides JCVI-syn1.0 clone sMmYCp235-1, complete sequence
ATGAACGTAAACGATATTTTAAAAGAACTTAAACTAAGTTTAATGGCTAATAAAAATATTGATGAATCCG
TGTATAACGACTATATAAAGACAATAAATATTCATAAAAAGGGGTTTTCTGATTATATTGTTGTTGTTAA
ATCACAATTTGGTTTGTTAGCTATAAAACAGTTTCGTCAAACTATTGAAAATGAGATAAAAAATATTTTA
AAAGAACCTGTAAATATTAGTTTTACATACGAACAAGAATATAAAAAACAACTAGAAAAAGATGAATTAA
TTAATAAAGATCATTCTGATATCATTACTAAAAAAGTTAAAAAAACTAATGAAAACACTTTTGAAAATTT
...
Mycoplasma mycoides JCVI-syn1.0.fasta

b) Escherichia coli

This is about 5.5MB of data

       
>gi|47118301|dbj|BA000007.2| Escherichia coli O157:H7 str. Sakai DNA, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCACCACCATCACCATTACCATTACCACAGGTAACGGTGCGGGCTGACGCGTAC
AGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTCGACCAAAGGTAACGAGGTAACA    
...
E.coli.fasta
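As an illustration of how simple the plain format is, here is a minimal reader sketched in Python. It assumes only what the examples above show (a header line starting with ">", followed by sequence lines); the function name parse_fasta is made up for this sketch, and real-world files may contain details that are not handled here.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a plain FASTA file."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip empty lines
        if line.startswith(">"):
            if header is not None:        # a new record starts, emit the old one
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence lines are simply concatenated
    if header is not None:
        yield header, "".join(chunks)


example = """>gi|296455217|gb|CP002027.1| Synthetic Mycoplasma mycoides JCVI-syn1.0
ATGAACGTAAACGATATTTTAAAAGAACTTAAACTAAGTTTAATGGCTAATAAAAATATTGATGAATCCG
TGTATAACGACTATATAAAGACAATAAATATTCATAAAAAGGGGTTTTCTGATTATATTGTTGTTGTTAA
"""
for name, sequence in parse_fasta(example.splitlines()):
    print(name, len(sequence))
```

For a file on disk, the same function can be fed an open file object, since iterating over it also yields lines.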

3.2) xml fasta-files

The second form is pure DNA as well, but wrapped in a small xml-file. Two examples again:

<?xml version="1.0"?>
 <!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
 <TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>296455217</TSeq_gi>
  <TSeq_accver>CP002027.1</TSeq_accver>
  <TSeq_taxid>766747</TSeq_taxid>
  <TSeq_orgname>synthetic Mycoplasma mycoides JCVI-syn1.0</TSeq_orgname>
  <TSeq_defline>Synthetic Mycoplasma mycoides JCVI-syn1.0 clone sMmYCp235-1, complete sequence</TSeq_defline>
  <TSeq_length>1078809</TSeq_length>
  <TSeq_sequence>ATGAACGTAAACGATATTTTAAAAGAACTTAAACTAAGTTTAATGGCTAATAAAAATATTGATGAATCCGTGTATAACGACTATATAAAGACAATAAATATTCATAAAAAGGGGTTTTCTGATTATATTGTTGTTGTTAAATCA...</TSeq_sequence>
</TSeq>

</TSeqSet>
 
Mycoplasma mycoides JCVI-syn1.0.fasta.xml

or E.coli again:

<?xml version="1.0"?>
 <!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
 <TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>47118301</TSeq_gi>
  <TSeq_accver>BA000007.2</TSeq_accver>
  <TSeq_taxid>386585</TSeq_taxid>
  <TSeq_orgname>Escherichia coli O157:H7 str. Sakai</TSeq_orgname>
  <TSeq_defline>Escherichia coli O157:H7 str. Sakai DNA, complete genome</TSeq_defline>
  <TSeq_length>5498450</TSeq_length>
  <TSeq_sequence>AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATA...</TSeq_sequence>
</TSeq>

</TSeqSet>
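The xml variant can be read with a standard XML parser. The following Python sketch uses only the element names visible in the two examples above (TSeq, TSeq_accver, TSeq_orgname, TSeq_length, TSeq_sequence); the function name parse_tseq_xml and the keys of the returned dictionaries are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_tseq_xml(xml_text: str) -> List[Dict[str, object]]:
    """Extract accession, organism, length and sequence from a TSeqSet document."""
    root = ET.fromstring(xml_text)
    records = []
    for tseq in root.iter("TSeq"):
        records.append({
            "accession": tseq.findtext("TSeq_accver", default=""),
            "organism": tseq.findtext("TSeq_orgname", default=""),
            "length": int(tseq.findtext("TSeq_length", default="0")),
            "sequence": tseq.findtext("TSeq_sequence", default=""),
        })
    return records


# Tiny made-up document with the same structure as the examples above.
sample = (
    '<?xml version="1.0"?>'
    "<TSeqSet><TSeq>"
    "<TSeq_accver>DEMO0001.1</TSeq_accver>"
    "<TSeq_orgname>demo organism</TSeq_orgname>"
    "<TSeq_length>12</TSeq_length>"
    "<TSeq_sequence>ATGAACGTAAAC</TSeq_sequence>"
    "</TSeq></TSeqSet>"
)
for record in parse_tseq_xml(sample):
    # The declared length can be compared with the actual sequence length.
    print(record["accession"], record["length"] == len(record["sequence"]))
```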
 

4) Infection scenario

The strategy of this digitally and biologically self-replicating code is the following:

It starts as a digital computer file, and replicates itself via local networks, USB sticks and other removable devices. There are two potential scenarios to step from the digital to the biological world:

There is a different, interesting scenario: Mycoplasma mycoides bacteria usually infect cattle and goats. Imagine an unexplained outbreak of the bacteria presented here. Goats or cattle would get sick, and microbiologists would want to know the exact reason. They take samples of the infectious cells and sequence them in their laboratories. Now they see the DNA, and find out that the bacterium contains a rather big non-coding sequence - the watermark. They find this very unnatural and analyse the watermark, among other things by applying Craig Venter's DNA encoding alphabet (because it is very famous due to their first fascinating results). After decoding, they see that the text only contains the characters a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,2,3,4,5,6,7.

This is a curious structure; they research a bit and see that it is a base32 encoding. They decode it, and see 'M','Z',0x90,0x0,... They immediately see that it is a Windows executable, and I guess they would be surprised :)
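The check that the analysts perform in this scenario can be sketched in Python. This is only an illustration from the analyst's point of view, with assumptions: the decoded watermark is available as a text string, the alphabet is the standard base32 alphabet written in lowercase, padding characters were dropped, and the helper names are made up for this sketch.

```python
import base64

BASE32_ALPHABET = set("abcdefghijklmnopqrstuvwxyz234567")


def looks_like_base32(text: str) -> bool:
    """True if the text is non-empty and uses only base32 characters."""
    return len(text) > 0 and set(text) <= BASE32_ALPHABET


def decode_base32(text: str) -> bytes:
    """Decode lowercase, unpadded base32 text into raw bytes."""
    padded = text.upper() + "=" * (-len(text) % 8)   # restore the '=' padding
    return base64.b32decode(padded)


def starts_with_mz(data: bytes) -> bool:
    """True if the data begins with 'MZ', the signature of a Windows executable."""
    return data[:2] == b"MZ"


# Harmless four-byte sample that only mimics the first bytes named above.
sample = base64.b32encode(b"MZ\x90\x00").decode("ascii").lower().rstrip("=")
if looks_like_base32(sample):
    print(starts_with_mz(decode_base32(sample)))
```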

4.1) Stealth technique in the DNA

In their modified genome of M.mycoides JCVI-syn1.0, the JCVI team introduced four watermarks. Every watermark contains a special sequence which is used to test whether a cell has the intended genome or is a contamination (for example, from the receiving cell).

In the supplementary material of their article [1], they describe the exact representation of these sequences (primers). Each of the four watermarks contains one primer. When they perform a multiplex PCR, each watermark creates a specific characteristic.

In my code, I removed the entire original content of all watermarks, except for the identified primer sequences. As a result, when a team tests the bacterial cell carrying the representation of the digital code, it will show the same characteristic as their originally designed DNA. Thus the computer code's DNA will pass this test.

5) Conclusion

I've shown the implementation of a technique that allows a digital computer code to make the step to the biological world. This is done by infecting a DNA file with the genome of a self-replicating bacterium. The bacterium's genome contains the digital code of the self-replicator in the form of a base32 representation, encoded via Craig Venter's DNA encoding alphabet. The bacterium will self-replicate in the biological world, and so will the representation of the digital computer code.

The outbreak probability of such cross-domain infectors is very low. The researchers in [1] have made ethical studies, and I'm convinced that they came up with perfect protections against potential attacks such as this.

Finally, digital self-replicators are usually not considered a form of life, even though they fulfill the most important characteristics of life: the capability of self-replication and being subject to evolution [2]. I wonder whether this computer code can count as a form of life - if so, I would call it Mycoplasma mycoides SPTH-syn1.0 :)

Second Part To Hell
October 2013
http://spth.virii.lu/
twitter: @SPTHvx

References

  1. Daniel G. Gibson et al., "Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome", Science 329, 52 (2010).
  2. SPTH, "Taking the redpill: Artificial Evolution in native x86 systems", (2010); "Imitation of Life: Advanced system for native Artificial Evolution", in valhalla#1 (2011).

PS: Thanks to hh86 for motivation. Thanks to the JCVI team for their awesome research, looking forward to reading about more discoveries on the border between dead and living material!

PPS: I'm not a microbiologist (or a biologist at all). Even though I tried as hard as possible, I cannot rule out that some assumptions might be wrong, or that I might have misunderstood some things. In any case, the main idea should be valid.
