VX Heaven

Library Collection Sources Engines Constructors Simulators Utilities Links Forum

Implementing genetic algorithms in virusses

Ready Rangers Liberation Front [5]
November 2004

[Back to index] [Comments]


I have personally always been fascinated by evolution and am a strong believer that it is the truth about how organisms evolve. I have also spend many thoughts and codes and readings trying to figure out if it could also be implemented in computer virusses, of course something completely different.

Normally computer virusses either don't change them selves (static), they encrypt themselves with one or multiple decryptors or they completely change their body (hard to do). With the second and the third way virusses try to completely change their layout / code use / register use etc. This theory will try to prove that another way could be better.

Note! In this theory I mainly describe how a fileinfecting virus could use genetics. This can however be applied to any spreading program including worms, as long as you can be creative.

Darwinian evolution - theory

Ok, so in short. What is Darwinian evolution? Let assume we have one species, let us say dogs. And lets assume that they are put in the jungle without humans around. You probably know there are lots of different dogs. Small, thin, big with long/small tales large/small beaks etc. etc. So all these different ones are dumped into the jungle.

Some would die because they could not find something to eat. Some because they were caught and eaten by lions. Some would fall and drown in rivers. Etc. But as you could imagine: From all those tens of thousands of dogs a few would still probably survive, because they were best adapted to the jungle. These remaining dogs would then mate and produce offspring. Some of these new dogs would be better adjusted to the environment, some worse. So some of this new generation would die but a lot of them would not, because they resembled their dog parents which could survive.

With Darwinian evolution it is all about this. The strongest survive, and the offspring of these strongest will survive event better or worse. Ultimately creating the "perfect" species.

So in short

Okay, so reading from the text above there are x core factors:

Binary organisms

This theory said, lets talk about modern computer virusses and evolution theory. What's the use of genetics in computer virusses?

In the virus world, of course, there are no "natural" enemies. No lions, rivers, etc. etc. There is only one enemy: antivirusses. When an antivirus finds a virus, it will be removed. Killed. To detect virusses antivirusses have 2 ways: virus specific and virus unspecific (heuristic).

The first method means a virus on the loose is 'caught'. Someone who is infected by it and sends it over to antivirus offices for analysing. It will get fully or partially analysed by antivirus personnel and a detection routine for it is generated. This is sent to all people owning the antivirus and thus they become 'immune' to the virus.

The second way for detecting virusses in general is for the antivirus to check for virus-like signs and suspicious things. What kind of things are suspicious and what things are not have been decided by the software authors. Some virusses will not get detected though, because they don't match the 'description'. The programmer cannot add an unlimited number of 'recognisers' because the more he adds the higher the chance will be a non-virus will fit the description.

Binary and biologic: the connection

First let us talk about unspecific detection. With this detection virusses are detected on what they do. The antivirus can detect them on the changes they make to files they infect. But as I said before: Some virusses do not get detected. This is of course because each virus can have its own way of doing it.

If we would compare this with a dog with long or short legs, we get a remarkably good comparison. If a dog has short legs he dies, and with long legs he survives. If a virus has one way of infecting it gets detected, if it uses another it does not.

So what way is best for the virus to use? That's unknown. Just like a dog with short or long legs does not know if it will survive, a virus won't either. For the dog species in total it is not a problem. Some dogs will die but new dogs will take their places. For a virus species it *is* a problem. Simply because there are no different kinds of one virus. Even if the virus is polymorphic (self changing), it will still be a problem because a polymorphic virus survivor has no way of 'knowing' what combination worked. So it would be like a long-legged parent getting both long-legged and short-legged offspring.

Here comes the idea of genetics and DNA for virusses to mind. What if the virus could 'remember' what kind of infection it did and pass it on with some slight modifications to its children. If it did that it could truly evolute like the dogs would.

The same goes for specific detection. Let us say the antivirus researcher analyses the virus and finds a kind of infection. It is added to the virus database and all virusses using that kind will be detected and removed. Virusses using another one because of other DNA won't be removed and will take the place of the previous virus sample. That way virus evolution is complete: as long as not all virus samples have been added to the detection database, some will escape detection, live on and breed.

Genetic algorithms

Genetic algorithms can be considered reasonably simple with a few things maybe harder to understand (and code). The structure of the, in my opinion, best genetic engine would be (explained later):

For each file to infect:

  1. Save virus DNA
  2. Mutate virus DNA
  3. Infect file
  4. Restore virus DNA

Firstly let us talk about the infection of the file. In this routine every possible genetic step (*) is made in the way of:

> if (stepxgene==0){ do first way } else { do second way }

To make the process more efficient it is the easiest to give all steps a fixed number of possibilities, Fe. all four possibilities. This will help in determining the way you are going to store the DNA. If you are thinking about multiple DNA steps per byte I would encourage you to use a number in the range of 2^... so you can use the maximum number of bits available.

The save/mutate/restore may be a little bit harder to understand why it is done. The ultimate reason is the fact that this way the file will be infect according to DNA of the virus's OFFSPRING. That way the offspring 'knows' exactly what it consists of. If the virus would only give its copy a mutated DNA under infection evolution would stay behind a little. Fe. A virus has DNA A and infects a file in way A but gives its copy DNA B. Also a problem is the fact that all files infected by one virus will be infected in exactly the same way. And you don't see all the puppies of a dog look the same.

Selecting appropriate genes

What to make genetic and what not is another topic to think about and it is debatable. The real question is whether or not something is useful to be genetic. Fe. human hair colour is genetic, but fingerprints are even with one-egg twins different. Finger prints were in through the evolution of man clearly of no importance. You can add anything which you like to make genetic, but a problem comes up when you have something like:

if(gene==0){ ...
	if (anothergene==0){ ... } else { ... }
	} else {
		if (yetanothergene==0){ ... } else { ... }

Of course it is possible, but when you look at it more closely you will notice that the importance of a gene declines, f.e.:

if (...){ if (...){ if (...){ ... } ..}..}

If the first "if" gene will mutate all the inner genes won't get executed anymore, and a much bigger adaptation is made than when an inner gene is changed. So to be 'fair' you would have to make the chance of mutation smaller on more important genes. It is a problem which you can live with but I would recommend trying to avoid it and keeping it in this format:

	if (...) { ... }
	if (...) { ... }
	if (...) { ... }

Furthermore, do not get carried away when creating genes. I mean do not create a self destruct gene. ;) And make sure all options are compatible! The best way to test for this is to put all genes first to zero, than to one etc. to the end number of different options. You can't test all combination individually anyway.


Mutations are the essence of evolution. Without it, of course, everything would stay the same. So the engine should make a change sometimes. However this is another thing to think about: How often should something be changed? If too much is changed virusses could be completely different to soon and the effect of evolution lost. But if it changed too slow it could be not variable enough to escape detection and come up with new ways. Anyway, it is debatable.

Furthermore, the number of possible mutations shouldn't also be static. So to stick to nature, every gene should independently have a certain chance of changing. My algorithm is (ASM):

        call    rand            ; eax = random
        xchg    eax, edx        ; (each bit has a chance of 1 in 2 to be 1)
        call    rand            ; eax = random
        and     edx, eax        ; chance 1 in 4
        call    rand
        and     edx, eax        ; chance 1 in 8
        call    rand
        and     edx, eax        ; chance 1 in 16
        call    rand
        and     edx, eax        ; chance 1 in 32

So in this example, at the end, every bit in edx has a chance of one in thirty-two to be 1; while all others are zero. Thus because a dword consists out of 32 bits, the average should be one bit 1 each time at the end of this proc. However 0, 2, 3, 4 ... 32 bits 1 are also possibilities, but the chance 32 bits are 1 is a number of one in 48 digits. So we are save to assume too big mutations will almost never happen.

So after getting this value we simply xor it with the dna (given the fact it only consists out of one dword). Giving a small number or no mutations.

        xor     [dna], edx

Code example

To give and example of how this could be implement I have coded this simple polymorphic engine. Actually it is not that simple and may be hard to understand, since I optimized most things about it. It has 2 parts which contain genetics. A DNA variable (called DNA), and a register container. These are firstly saved at the start of the engine and then the originals are altered. Then it generates a decryptor according to these.

The decryptor is then generated in the following way: All parts of the decryptor have four possible options (see the part under <call load_table>). So from the DNA each time 2 bits (can be 0,1,2,3) are read and used to pick the correct genetically specified option. If you check this out maybe you will learn some new techniques, but don't try to understand it too much :) (blame my coding style ;)).

Final Word

First of all, I hope you have enjoyed reading my article. As i am very interested in the subject myself, i liked to write something about this. I have already read other people thinking and writing about this subject, and i like it. I also hope this will enspire you to do new things with your codings, even completely different. This is not necessarily the best idea.

BGPE - BlueOwls Genetic Poly Engine (Simple version)
BlueOwl november 2004
[Back to index] [Comments]
By accessing, viewing, downloading or otherwise using this content you agree to be bound by the Terms of Use! aka