The Art of Computer Virus Research and Defense

Péter Ször
ISBN 0-321-30454-3
February 2005

17

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

Symantec Press Publisher: Linda McCarthy

Editor in Chief: Karen Gettman

Acquisitions Editor: Jessica Goldstein

Cover Designer: Alan Clements

Managing Editor: Gina Kanouse

Senior Project Editor: Kristy Hart

Copy Editor: Christal Andry

Indexers: Cheryl Lenser and Larry Sweazy

Compositor: Stickman Studio

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:

U. S. Corporate and Government Sales
(800) 382-3419
[email protected]

International Sales
[email protected]

Visit us on the Web: www.awprofessional.com

Library of Congress Number: 2004114972

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:

Pearson Education, Inc.
Rights and Contracts Department
One Lake Street

Text printed in the United States on recycled paper at Phoenix BookTech in Hagerstown, Maryland.

First printing, February, 2005

Dedication

to Natalia

Peter Szor is a world renowned computer virus and security researcher. He has been actively conducting research on computer viruses for more than 15 years, and he focused on the subject of computer viruses and virus protection in his diploma work in 1991. Over the years, Peter has been fortunate to work with the best-known antivirus products, such as AVP, F-PROT, and Symantec Norton AntiVirus. Originally, he built his own antivirus program, Pasteur, from 1990 to 1995, in Hungary. Parallel to his interest in computer antivirus development, Peter also has years of experience in fault-tolerant and secured financial transaction systems development.

He was invited to join the Computer Antivirus Researchers Organization (CARO) in 1997. Peter is on the advisory board of Virus Bulletin Magazine and a founding member of the AntiVirus Emergency Discussion (AVED) network. He has been with Symantec for over five years as a chief researcher in Santa Monica, California.

Peter has authored over 70 articles and papers on the subject of computer viruses and security for magazines such as Virus Bulletin, Chip, Source, Windows NT Magazine, and Information Security Bulletin, among others. He is a frequent speaker at conferences, including Virus Bulletin, EICAR, ICSA, and RSA and has given invited talks at such security conferences as the USENIX Security Symposium. Peter is passionate about sharing his research results and educating others about computer viruses and security issues.

Preface

Over the last two decades, several publications appeared on the subject of computer viruses, but only a few have been written by professionals ("insiders") of computer virus research. Although many books exist that discuss the computer virus problem, they usually target a novice audience and are simply not too interesting for the technical professionals. There are only a few works that have no worries going into the technical details, necessary to understand, to effectively defend against computer viruses.

Part of the problem is that existing books have little if any information about the current complexity of computer viruses. For example, they lack serious technical information on fast-spreading computer worms that exploit vulnerabilities to invade target systems, or they do not discuss recent code evolution techniques such as code metamorphism. If you wanted to get all the information I have in this book, you would need to spend a lot of time reading articles and papers that are often hidden somewhere deep inside computer virus and security conference proceedings, and perhaps you would need to dig into malicious code for years to extract the relevant details.

I believe that this book is most useful for IT and security professionals who fight against computer viruses on a daily basis. Nowadays, system administrators as well as individual home users often need to deal with computer worms and other malicious programs on their networks. Unfortunately, security courses have very little training on computer virus protection, and the general public knows very little about how to analyze and defend their network from such attacks. To make things more difficult, computer virus analysis techniques have not been discussed in any existing works in sufficient length before.

I also think that, for anybody interested in information security, being aware of what the computer virus writers have "achieved" so far is an important thing to know.

For years, computer virus researchers used to be "file" or "infected object" oriented. To the contrary, security professionals were excited about suspicious events only on the network level. In addition, threats such as CodeRed worm appeared to inject their code into the memory of vulnerable processes over the network, but did not "infect" objects on the disk. Today, it is important to understand all of these major perspectivesthe file (storage), in-memory, and network viewsand correlate the events using malicious code analysis techniques.

During the years, I have trained many computer virus and security analysts to effectively analyze and respond to malicious code threats. In this book, I have included information about anything that I ever had to deal with. For example, I have relevant examples of ancient threats, such as 8-bit viruses on the Commodore 64. You will see that techniques such as stealth technology appeared in the earliest computer viruses, and on a variety of platforms. Thus, you will be able to realize that current rootkits do not represent anything new! You will find sufficient coverage on 32-bit Windows worm threats with in-depth exploit discussions, as well as 64-bit viruses and "pocket monsters" on mobile devices. All along the way, my goal is to illustrate how old techniques "reincarnate" in new threats and demonstrate up-to-date attacks with just enough technical details.

I am sure that many of you are interested in joining the fight against malicious code, and perhaps, just like me, some of you will become inventors of defense techniques. All of you should, however, be aware of the pitfalls and the challenges of this field!

That is what this book is all about.

What I Cover

The purpose of this book is to demonstrate the current state of the art of computer virus and antivirus developments and to teach you the methodology of computer virus analysis and protection. I discuss infection techniques of computer viruses from all possible perspectives: file (on storage), in-memory, and network. I classify and tell you all about the dirty little tricks of computer viruses that bad guys developed over the last two decades and tell you what has been done to deal with complexities such as code polymorphism and exploits.

The easiest way to read this book is, well, to read it from chapter to chapter. However, some of the attack chapters have content that can be more relevant after understanding techniques presented in the defense chapters. If you feel that any of the chapters are not your taste, or are too difficult or lengthy, you can always jump to the next chapter. I am sure that everybody will find some parts of this book very difficult and other parts very simple, depending on individual experience.

I expect my readers to be familiar with technology and some level of programming. There are so many things discussed in this book that it is simply impossible to cover everything in sufficient length. However, you will know exactly what you might need to learn from elsewhere to be absolutely successful against malicious threats. To help you, I have created an extensive reference list for each chapter that leads you to the necessary background information.

Indeed, this book could easily have been over 1,000 pages. However, as you can tell, I am not Shakespeare. My knowledge of computer viruses is great, not my English. Most likely, you would have no benefit of my work if this were the other way around.

What I Do Not Cover

I do not cover Trojan horse programs or backdoors in great length. This book is primarily about self-replicating malicious code. There are plenty of great books available on regular malicious programs, but not on computer viruses.

I do not present any virus code in the book that you could directly use to build another virus. This book is not a "virus writing" class. My understanding, however, is that the bad guys already know about most of the techniques that I discuss in this book. So, the good guys need to learn more and start to think (but not act) like a real attacker to develop their defense!

Interestingly, many universities attempt to teach computer virus research courses by offering classes on writing viruses. Would it really help if a student could write a virus to infect millions of systems around the world? Will such students know more about how to develop defense better? Simply, the answer is no...

Instead, classes should focus on the analysis of existing malicious threats. There are so many threats out there waiting for somebody to understand themand do something against them.

Of course, the knowledge of computer viruses is like the "Force" in Star Wars. Depending on the user of the "Force," the knowledge can turn to good or evil. I cannot force you to stay away from the "Dark Side," but I urge you to do so.

Acknowledgments

First, I would like to thank my wife Natalia for encouraging my work for over 15 years! I also thank her for accepting the lost time on all the weekends that we could have spent together while I was working on this book.

I would like to thank everybody who made this book possible. This book grew out of a series of articles and papers on computer viruses, several of which I have co-authored with other researchers over the years. Therefore, I could never adequately thank Eric Chien, Peter Ferrie, Bruce McCorkendale, and Frederic Perriot for their excellent contributions to Chapter 7 and Chapter 10.

This book could not be written without the help of many friends, great antivirus researchers, and colleagues. First and foremost, I would like to thank Dr. Vesselin Bontchev for educating me in the terminology of malicious programs for many years while we worked together. Vesselin is famous ("infamous?") for his religious accuracy in the subject matter, and he greatly influenced and supported my research.

A big thank you needs to go to the following people who encouraged me to write this book, educated me in the subject, and influenced my research over the years: Oliver Beke, Zoltan Hornak, Frans Veldman, Eugene Kaspersky, Istvan Farmosi, Jim Bates, Dr. Frederick Cohen, Fridrik Skulason, David Ferbrache, Dr. Klaus Brunnstein, Mikko Hypponen, Dr. Steve White, and Dr. Alan Solomon.

I owe a huge thanks to my technical reviewers: Dr. Vesselin Bontchev, Peter Ferrie, Nick FitzGerald, Halvar Flake, Mikko Hypponen, Dr. Jose Nazario, and Jason V. Miller. Your encouragements, criticisms, insights, and reviews of early handbook manuscripts were simply invaluable.

I need to thank Janos Kis and Zsolt Szoboszlay for providing me access to in-the-wild virus code for analysis, in the days when the BBS was the center of the computing universe. I also need to thank Gunter May for the greatest present that an east European kid could geta C64.

A big thanks to everybody at Symantec, especially to Linda A. McCarthy and Vincent Weafer, who greatly encouraged me to write this book. I would also like to thank Nancy Conner and Chris Andry for their outstanding editorial work. Without their help, this project simply would never have finished. I also owe a huge thanks to Jessica Goldstein, Kristy Hart, and Christy Hackerd for helping me with the publishing process all the way.

A big thanks to all past and present members of the Computer Antivirus Researchers Organization (CARO), VFORUM, and the AntiVirus Emergency Discussion (AVED) List for all the exciting discussions on computer viruses and other malicious programs and defense systems.

I would like to thank everybody at Virus Bulletin for publishing my articles and papers internationally for almost a decade and for letting me use that material in this book.

Last but not least, I thank my teacher parents and grandparents for the extra "home education" in math, physics, music, and history

Contact Information

If you find errors or have suggestions for clarification or material you would like to see in a future edition, I would love to hear from you. I am planning to introduce clarifications, possible corrections, and new information relevant to the content of this work on my Web site. While I think we have found most of the problems (especially in those paragraphs that were written late at night or between virus and security emergencies), I believe that no such work of this complexity and size can exist without some minor nits. Nonetheless, I made all the efforts to provide you with "trustworthy" information according to the best of my research knowledge.

Peter Szor,
Santa Monica, CA
[email protected]
http://www.peterszor.com

Chapter 1. Introduction to the Games of Nature

"To me art is a desire to communicate."

Endre Szasz

Computer virus research is a fascinating subject to many who are interested in nature, biology, or mathematics. Everyone who uses a computer will likely encounter some form of the increasingly common problem of computer viruses. In fact, some well-known computer virus researchers became interested in the field when, decades ago, their own systems were infected.

The title of Donald Knuth's book series1, The Art of Computer Programming, suggests that anything we can explain to a computer is science, but that which we cannot currently explain to a computer is an art. Computer virus research is a rich, complex, multifaceted subject. It is about reverse engineering, developing detection, disinfection, and defense systems with optimized algorithms, so it naturally has scientific aspects; however, many of the analytical methods are an art of their own. This is why outsiders often find this relatively young field so hard to understand. Even after years of research and publications, many new analytical techniques are in the category of art and can only be learned at antivirus and security vendor companies or through the personal associations one must forge to succeed in this field.

This book attempts to provide an insider's view of this fascinating research. In the process, I hope to teach many facts that should interest both students of the art and information technology professionals. My goal is to provide an extended understanding of both the attackers and the systems built to defend against virulent, malicious programs.

Although there are many books about computer viruses, only a few have been written by people experienced enough in computer virus research to discuss the subject for a technically oriented audience.

The following sections discuss historical points in computation that are relevant to computer viruses and arrive at a practical definition of the term computer virus.

1.1. Early Models of Self-Replicating Structures

Humans create new models to represent our world from different perspectives. The idea of self-replicating systems that model self-replicating structures has been around since the Hungarian-American, Neumann JE1nos (John von Neumann), suggested it in 1948 2,3,4.

Von Neumann was a mathematician, an amazing thinker, and one of the greatest computer architects of all time. Today's computers are designed according to his original vision. Neumann's machines introduced memory for storing information and binary (versus analog) operations. According to von Neumann's brother Nicholas, "Johnny" was very impressed with Bach's "Art of the Fugue" because it was written for several voices, with the instrumentation unspecified. Nicholas von Neumann credits the Bach piece as a source for the idea of the stored-program computer5.

In the traditional von Neumann machine, there was no basic difference between code and data. Code was differentiated from data only when the operating system transferred control and executed the information stored there.

To create a more secure computing system, we will find that system operations that better control the differentiation of data from code are essential. However, we also will see the weaknesses of such approaches.

Modern computers can simulate nature using a variety of modeling techniques. Many computer simulations of nature manifest themselves as games. Modern computer viruses are somewhat different from these traditional nature-simulation game systems, but students of computer virus research can appreciate the utility of such games for gaining an understanding of self-replicating structures.

1.1.1. John von Neumann: Theory of Self-Reproducing Automata

Replication is an essential part of life. John von Neumann was the first to provide a model to describe nature's self-reproduction with the idea of self-building automata.

In von Neumann's vision, there were three main components in a system:

1. A Universal Machine
2. A Universal Constructor
3. Information on a Tape

A universal machine (Turing Machine) would read the memory tape and, using the information on the tape, it would be able to rebuild itself piece by piece using a universal constructor. The machine would not understand the processit would simply follow the information (blueprint instructions) on the memory tape. The machine would only be able to select the next proper piece from the set of all the pieces by picking them one by one until the proper piece was found. When it was found, two proper pieces would be put together according to the instructions until the machine reproduced itself completely.

If the information that was necessary to rebuild another system could be found on the tape, then the automata was able to reproduce itself. The original automata would be rebuilt (Figure 1.1), and then the newly built automata was booted, which would start the same process.

Figure 1.1. The model of a self-building machine.

A few years later, Stanislaw Ulam suggested to von Neumann to use the processes of cellular automation to describe this model. Instead of using "machine parts," states of cells were introduced. Because cells are operated in a robotic fashion according to rules ("code"), the cell is known as an automaton. The array of cells comprises the cellular automata (CA) computer architecture.

Von Neumann changed the original model using cells that had 29 different states in a two-dimensional, 5-cell environment. To create a self-reproducing structure, he used 200,000 cells. Neumann's model mathematically proved the possibility of self-reproducing structures: Regular non-living parts (molecules) could be combined to create self-reproducing structures (potentially living organisms).

In September 1948, von Neumann presented his vision of self-replicating automata systems. Only five years later, in 1953, Watson and Crick recognized that living organisms use the DNA molecule as a "tape" that provides the information for the reproduction system of living organisms.

Unfortunately, von Neumann could not see a proof of his work in his life, but his work was completed by Arthur Burks. Further work was accomplished by E.F. Codd in 1968. Codd simplified Neumann's model using cells that had eight states, 5-cell environments. Such simplification is the base for "self-replicating loops"6 developed by artificial life researchers, such as Christopher G. Langton, in 1979. Such replication loops eliminate the complexity of universal machine from the system and focus on the needs of replication.

In 1980 at NASA/ASEE, Robert A. Freitas, Jr. and William B. Zachary7 conducted research on a self-replicating, growing lunar factory. A lunar manufacturing facility (LMF) was researched, which used the theory of self-reproducing automata and existing automation technology to make a self-replicating, self-growing factory on the moon. Robert A. Freitas, Jr. and Ralph C. Merkle recently authored a book titled Kinematic Self-Replicating Machines. This book indicates a renewed scientific interest in the subject. A few years ago, Freitas introduced the term ecophagy, the theoretical consumption of the entire ecosystem by out of control, self-replicating nano-robots, and he proposed mitigation recommendations8.

It is also interesting to note that the theme of self-replicating machines occurs repeatedly in works of science fiction, from movies such as Terminator to novels written by such authors as Neal Stephenson and William Gibson. And of course, there are many more examples from beyond the world of science fiction, as nanotech and microelectrical mechanical systems (MEMS) engineering have become real sciences.

1.1.2. Fredkin: Reproducing Structures

Several people attempted to simplify von Neumann's model. For instance, in 1961 Edward Fredkin used a specialized cellular automaton in which all the structures could reproduce themselves and replicate using simple patterns on a grid (see Figure 1.2 for a possible illustration). Fredkin's automata had the following rules9:

• On the table, we use the same kind of tokens.
• We either have a token or no token in each possible position.
• Token generations will follow each other in a finite time frame.
• The environment of each token will determine whether we will have a new token in the next generation.
• The environment is represented by the squares above, below, to the left, and to the right of the token (using the 5-cell-based von Neumann environment).
• The state of a square in the next generation will be empty when the token has an even number of tokens in its environment.
• The state of a square in the next generation will be filled with a token if it has an odd number of tokens in its environment.
• It is possible to change the number of states.

Figure 1.2. Generation 1, Generation 2, and... Generation 4.

Using the rules described previously with this initial layout allows all structures to replicate. Although there are far more interesting layouts to explore, this example is the simplest possible model of self-reproducing cellular automata.

1.1.3. Conway: Game of Life

In 1970, John Horton Conway10 created one of the most interesting cellular automata systems. Just as the pioneer von Neumann did, Conway researched the interaction of simple elements under a common rule and found that this could lead to surprisingly interesting structures. Conway named his game Life. Life is based on the following rules:

• There should be no initial pattern for which there is a single proof that the population can grow without limit.
• There should be an initial pattern that apparently does grow without limit.
• There should be simple initial patterns that work according to simple genetic law: birth, survival, and death.

Figure 1.3 demonstrates a modern representation of the original Conway table game written by Edwin Martin11.

Figure 1.3. Edwin Martin's Game of Life implementation on the Mac using "Shooter" starting structure.

It is especially interesting to see the computer animation as the game develops with the so-called "Shooter" starting structure. In a few generations, two shooter positions that appear to shoot to each other will develop on the sides of the table, as shown in Figure 1.4, and in doing so they appear to produce so-called gliders that "fly" away (see Figure 1.5) toward the lower-right corner of the table. This sequence continues endlessly, and new gliders are produced.

Figure 1.4. "Shooter" in Generation 355.

Figure 1.5. The glider moves around without changing shape.

On a two-dimensional table, each cell has two potential states: S=1 if there is one token in the cell, or S=0 if there is no token. Each cell will live according to the rules governed by the cell's environment (see Figure 1.6).

Figure 1.6. The 9-cell-based Moore environment.

The following characteristics/rules define Conway's game, Life:

• Birth: If an empty cell has three (K=3) other filled cells in its environment, that particular cell will be filled in a new generation.
• Survival: If a filled cell has two or three (K=2 or K=3) other filled cells in its environment, that particular cell will survive in the new generation.
• Death: If a filled cell has only one or no other filled cells (K=1 or K=0) in its environment, that particular cell will die because of isolation. Further, if a cell has too many filled cells in its environmentfour, five, six, seven, or eight (K=4, 5, 6, 7, or 8), that particular cell will also die in the next generation due to overpopulation.

Conway originally believed that there were no self-replicating structures in Life. He even offered $50 to anyone who could create a starting structure that would lead to self-replication. One such structure was quickly found using computers at the artificial intelligence group of the Massachusetts Institute of Technology (MIT). MIT students found a structure that was later nicknamed a glider. When 13 gliders meet, they create a pulsing structure. Later, in the 100th generation, the pulsing structure suddenly "gives birth" to new gliders, which quickly "fly" away. After this point, in each 30th subsequent generation, there will be a new glider on the table that flies away. This sequence continues endlessly. This setup is very similar to the "Shooter" structure shown in Figures 1.3 and 1.4. Games with Computers, written by Antal Csakany and Ferenc Vajda in 1980, contains examples of competitive games. The authors described a table game with rules similar to those of Life. The table game uses cabbage, rabbits, and foxes to demonstrate struggles in nature. An initial cell is filled with cabbage as food for the rabbits, which becomes food for the foxes according to predefined rules. Then the rules control and balance the population of rabbits and foxes. It is interesting to think about computers, computer viruses, and antiviral programs in terms of this model. Without computers (in particular, an operating system or BIOS of some sort), computer viruses are unable to replicate. Computer viruses infect new computer systems, and as they replicate, the viruses can be thought of as prey for antivirus programs. In some situations, computer viruses fight back. These are called retro viruses. In such a situation, the antiviral application can be thought to "die." When an antiviral program stops an instance of a virus, the virus can be thought to "die." In some cases, the PC will "die" immediately as the virus infects it. For example, if the virus indiscriminately deletes key operating system files, the system will crash, and the virus can be said to have "killed" its host. If this process happens too quickly, the virus might kill the host before having the opportunity to replicate to other systems. When we imagine millions of computers as a table game of this form, it is fascinating to see how computer virus and antiviral population models parallel those of the cabbage, rabbits, and foxes simulation game. Rules, side effects, mutations, replication techniques, and degrees of virulence dictate the balance of such programs in a never-ending fight. At the same time, a "co-evolution"12 exists between computer viruses and antivirus programs. As antivirus systems have become more sophisticated, so have computer viruses. This tendency has continued over the more than 30-year history of computer viruses. Using models along these lines, we can see how the virus population varies according to the number of computers compatible with them. When it comes to computer viruses and antiviral programs, multiple parallel games occur side by side. Viruses within an environment that consists of a large number of compatible computers will be more virulent; that is, they will spread more rapidly to many more computers. A large number of similar PCs with compatible operating systems create a homogeneous environmentfertile ground for virulence (sound familiar?). With smaller game boards representing a smaller number of compatible computers, we will obviously see smaller outbreaks, along with relatively small virus populations. This sort of modeling clearly explains why we find major computer virus infections on operating systems such as Windows, which represents about 95% of the current PC population around us on a huge "grid." Of course this is not to say that 5% of computer systems are not enough to cause a global epidemic of some sort. Note If you are fascinated by self-replicating, self-repairing, and evolving structures, visit the BioWall project, http://lslwww.epfl.ch/biowall/index.html . 1.1.4. Core War: The Fighting Programs Around 1966, Robert Morris, Sr., the future National Security Agency (NSA) chief scientist, decided to create a new game environment with two of his friends, Victor Vyssotsky and Dennis Ritchie, who coded the game and called it Darwin. (Morris, Jr. was the first infamous worm writer in the history of computer viruses. His mark on computer virus history will be discussed later in the book.) The original version of Darwin was created for the PDP-1 (programmed data processing) at Bell Labs. Later, Darwin became Core War, a computer game that many programmers and mathematicians (as well as hackers) play to this day. Note I use the term hacker in its original, positive sense. I also believe that all good virus researchers are hackers in the traditional sense. I consider myself a hacker, too, but fundamentally different from malicious hackers who break into other people's computers. The game is called Core War because the objective of the game is to kill your opponent's programs by overwriting them. The original game is played between two assembly programs written in the Redcode language. The Redcode programs run in the core of a simulated (for example, "virtual") machine named Memory Array Redcode Simulator (MARS). The actual fight between the warrior programs was referred to as Core Wars. The original instruction set of Redcode consists of 10 simple instructions that allow movement of information from one memory location to another, which provides great flexibility in creating tricky warrior programs. Dewdney wrote several "Computer Recreations" articles in Scientific American13,14 that discussed Core War, beginning with the May 1984 article. Figure 1.7 is a screen shot of a Core War implementation called PMARSV, written by Albert Ma, Na'ndor Sieben, Stefan Strack, and Mintardjo Wangsaw. It is interesting to watch as the little warriors fight each other within the MARS environment. Figure 1.7. Core Wars warrior programs (Dwarf and MICE) in battle. As programs fight in the annual tournaments, certain warriors might become the King of the Hill (KotH). These are the Redcode programs that outperform their competitors. The warrior program named MICE won the first tournament. Its author, Chip Wendell, received a trophy that incorporated a core-memory board from an early CDC 6600 computer14. The simplest Redcode program consists of only one MOV instruction: MOV 0,1 (in the traditional syntax). This program is named IMP, which causes the contents at relative address 0 (namely the MOV, or move, instruction itself), to be transferred to relative address 1, just one address ahead of itself. After the instruction is copied to the new location, control is given to that address, executing the instruction, which, in turn, makes a new copy of itself at a higher address, and so on. This happens naturally, as instructions are executed following a higher address. The instruction counter will be incremented after each executed instruction. The basic core consisted of two warrior programs and 8,000 cells for instructions. Newer revisions of the game can run multiple warriors at the same time. Warrior programs are limited to a specific starting size, normally 100 instructions. Each program has a finite number of iterations; by default, this number is 80,000. The original version of Redcode supported 10 instructions. Later revisions contain more. For example, the following 14 instructions are used in the 1994 revision, shown in Listing 1.1. Listing 1.1. Core War Instructions in the 1994 Revision  DATA data MOVA move ADDA add SUBA subtract MULA multiply DIVA divide MODA modula JMPA jump JMZA jump if zero JMNA jump if not zero DJNA decrement, jump if not zero CMPA compare SLTA skip if less than SPLA split execution Let's take a look at Dewdney's Dwarf tutorial (see Listing 1.2). Listing 1.2. Dwarf Bombing Warrior Program ;name Dwarf ;author A. K. Dewdney ;version 94.1 ;date April 29, 1993 ;strategy Bombs every fourth instruction. ORG 1 ; Indicates execution begins with the second ; instruction (ORG is not actually loaded, and is ; therefore not counted as an instruction). DAT.F #0, #0 ; Pointer to target instruction. ADD.AB #4,$-1         ; Increments pointer by 4.
MOV.AB  #0, @-2         ; Bombs target instruction.
JMP.A   $-2, #0 ; Loops back two instructions. Dwarf follows a so-called bombing strategy. The first few lines are comments indicating the name of the warrior program and its Redcode 1994 standard. Dwarf attempts to destroy its opponents by "dropping" DAT bombs into their operation paths. Because any warrior process that attempts to execute a DAT statement dies in the MARS, Dwarf will be a likely winner when it hits its opponents. The MOV instruction is used to move information into MARS cells. (The IMP warrior explains this very clearly.) The general format of a Redcode command is of the Opcode A, B form. Thus, the command MOV.AB #0, @-2 will point to the DAT statement in Dwarf's code as a source. The A field points to the DAT statement, as each instruction has an equivalent size of 1, and at 0, we find DAT #0, #0. Thus, MOV will copy the DAT instruction to where B points. So where does B point to now? The B field points to DAT.F #0, #0 statement in it. Ordinarily, this would mean that the bomb would be put on top of this statement, but the @ symbol makes this an indirect pointer. In effect, the @ symbol says to use the contents of the location to where the B field points as a new pointer (destination). In this case, the B field appears to point to a value of 0 (location 0, where the DAT.F instruction is placed). The first instruction to execute before the MOV, however, is an ADD instruction. When this ADD #4,$-1 is executed, the DAT's offset field will be incremented by four each time it is executedthe first time, it will be changed from 0 to 4, the next time from 4 to 8, and so on.

This is why, when the MOV command copies a DAT bomb, it will land four lines (locations) above the DAT statement (see Listing 1.3).

Listing 1.3. Dwarf's Code When the First Bomb Is Dropped

0       DAT.F #0, #8
1 -> ADD.AB 4, $-1 2 MOV.AB #0, @-2 ; launcher 3 JMP.A$-2, #0
<strong>4       DAT ; Bomb 1</strong>
5       .
6       .
7       .
<strong>8       DAT ; Bomb 2</strong>
9       .

The JMP.A $-2 instruction transfers control back relative to the current offset, that is, back to the ADD instruction to run the Dwarf program "endlessly." Dwarf will continue to bomb into the core at every four locations until the pointers wrap around the core and return. (After the highest number possible for the DAT location has been reached, it will "wrap" back around past 0. For example, if the highest possible value were 10, 10+1 would be 0, and 10+4 would be 3.) At that point, Dwarf begins to bomb over its own bombs, until the end of 80,000 cycles/iterations or until another warrior acts upon it. At any time, another warrior program might easily kill Dwarf because Dwarf stays at a constant locationso that it can avoid hitting itself with friendly fire. But in doing so, it exposes itself to attackers. There are several common strategies in Core War, including scanning, replicating, bombing, IMP-spiral (those using the SPL instruction), and the interesting bomber variation named the vampire. Dewdney also pointed out that programs can even steal their enemy warrior's very soul by hijacking a warrior execution flow. These are the so-called vampire warriors, which bomb JMP (JUMP) instructions into the core. By bombing with jumps, the enemy program's control can be hijacked to point to a new, predefined location where the hijacked warrior will typically execute useless code. Useless code will "burn" the cycles of the enemy warrior's execution threads, thus giving the vampire warrior an advantage. Instead of writing computer viruses, I strongly recommend playing this harmless and interesting game. In fact, if worms fascinate you, a new version of Core War can be created to link battles in different networks and allow warrior programs to jump from one battle to another to fight new enemies on those machines. Evolving the game to be more networked allows for simulating worm-like warrior programs. 1.2. Genesis of Computer Viruses Virus-like programs appeared on microcomputers in the 1980s. However, two fairly recounted precursors deserve mention here: Creeper from 1971-72 and John Walker's "infective" version of the popular ANIMAL game for UNIVAC15 in 1975. Creeper and its nemesis, Reaper, the first "antivirus" for networked TENEX running on PDP-10s at BBN, was born while they were doing the early development of what became "the Internet." Even more interestingly, ANIMAL was created on a UNIVAC 1100/42 mainframe computer running under the Univac 1100 series operating system, Exec-8. In January of 1975, John Walker (later founder of Autodesk, Inc. and co-author of AutoCAD) created a general subroutine called PERVADE16, which could be called by any program. When PERVADE was called by ANIMAL, it looked around for all accessible directories and made a copy of its caller program, ANIMAL in this case, to each directory to which the user had access. Programs used to be exchanged relatively slowly, on tapes at the time, but still, within a month, ANIMAL appeared at a number of places. The first viruses on microcomputers were written on the Apple-II, circa 1982. Rich Skrenta17, who was a ninth-grade student at the time in Pittsburgh, Pennsylvania, wrote "Elk Cloner." He did not think the program would work well, but he coded it nonetheless. His friends found the program quite entertainingunlike his math teacher, whose computer became infected with it. Elk Cloner had a payload that displayed Skrenta's poem after every 50th use of the infected disk when reset was pressed (see Figure 1.8). On every 50th boot, Elk Cloner hooked the reset handler; thus, only pressing reset triggered the payload of the virus. Figure 1.8. Elk Cloner activates. Not surprisingly, the friendship of the two ended shortly after the incident. Skrenta also wrote computer games and many useful programs at the time, and he still finds it amazing that he is best known for the "stupidest hack" he ever coded. In 1982, two researchers at Xerox PARC18 performed other early studies with computer worms. At that time, the term computer virus was not used to describe these programs. In 1984, mathematician Dr. Frederick Cohen19 introduced this term, thereby becoming the "father" of computer viruses with his early studies of them. Cohen introduced computer virus based on the recommendation of his advisor, Professor Leonard Adleman20, who picked the name from science fiction novels. 1.3. Automated Replicating Code: The Theory and Definition of Computer Viruses Cohen provided a formal mathematical model for computer viruses in 1984. This model used a Turing machine. In fact, Cohen's formal mathematical model for a computer virus is similar to Neumann's self-replicating cellular automata model. We could say, that in the Neumann sense, a computer virus is a self-reproducing cellular automata. The mathematical model does not have much practical use for today's researcher. It is a rather general description of what a computer virus is. However, the mathematical model provides significant theoretical foundation to the computer virus problem. Here is Cohen's informal definition of a computer virus: "A virus is a program that is able to infect other programs by modifying them to include a possibly evolved copy of itself." This definition provides the important properties of a computer virus, such as the possibility of evolution (the capability to make a modified copy of the same code with mutations). However, it might also be a bit misleading if applied in its strictest sense. This is, by no means, to criticize Cohen's groundbreaking model. It is difficult to provide a precise definition because there are so many different kinds of computer viruses nowadays. For instance, some forms of computer viruses, called companion viruses, do not necessarily modify the code of other programs. They do not strictly follow Cohen's definition because they do not need to include a copy of themselves within other programs. Instead, they make devious use of the program's environmentproperties of the operating systemby placing themselves with the same name ahead of their victim programs on the execution path. This can create a problem for behavior-blocking programs that attempt to block malicious actions of other programsif the authors of such blockers strictly apply Cohen's informal definition. In other words, if such blocking programs are looking only for viruses that make unwanted changes to the code of another program, they will miss companion viruses. Note Cohen's mathematical formulation properly encompasses companion viruses; it is only the literal interpretation of the single-sentence human language definition that is problematic. A single-sentence linguistic definition of viruses is difficult to come up with. Integrity checker programs also rely on the fact that one program's code remains unchanged over time. Such programs rely on a database (created at some initial point in time) assumed to represent a "clean" state of the programs on a machine. Integrity checker programs were Cohen's favorite defense method and my own in the early '90s. However, it is easy to see that the integrity checker would be challenged by companion viruses unless the integrity checker also alerted the user about any new application on the system. Cohen's own system properly performed this. Unfortunately, the general public does not like to be bothered each time a new program is introduced on their systems, but Cohen's approach is definitely the safest technique to use. Dr. Cohen's definition does not differentiate between programs explicitly designed to copy themselves (the "real viruses" as we call them) from the programs that can copy themselves as a side effect of the fact that they are general-purpose copying programs (compilers and so on). Indeed, in the real world, behavior-blocking defense systems often alarm in such a situation. For instance, Norton Commander, the popular command shell, might be used to copy the commander's own code to another hard drive or network resource. This action might be confused with self-replicating code, especially if the folder in which the copy is made has a previous version of the program that we overwrite to upgrade it. Though such "false alarms" are easily dealt with, they will undoubtedly annoy end users. Taking these points into consideration, a more accurate definition of a computer virus would be the following: "A computer virus is a program that recursively and explicitly copies a possibly evolved version of itself." There is no need to specify how the copy is made, and there is no strict need to "infect" or otherwise modify another application or host program. However, most computer viruses do indeed modify another program's code to take control. Blocking such an action, then, considerably reduces the possibility for viruses to spread on the system. As a result, there is always a host, an operating system, or another kind of execution environment, such as an interpreter, in which a particular sequence of symbols behaves as a computer virus and replicates itself recursively. Computer viruses are self-automated programs that, against the user's wishes, make copies of themselves to spread themselves to new targets. Although particular computer viruses ask the user with prompts before they infect a machine, such as, "Do you want to infect another program? (Y/N?)," this does not make them non-viruses. Often, novice researchers in computer virus labs believe otherwise, and they actually argue that such programs are not viruses. Obviously, they are wrong! When attempting to classify a particular program as a virus, we need to ask the important question of whether a program is able to replicate itself recursively and explicitly. A program cannot be considered a computer virus if it needs any help to make a copy of itself. This help might include modifying the environment of such a program (for example, manually changing bytes in memory or on a disk) orheaven forbidapplying a hot fix to the intended virus code itself using a debugger! Instead, nonworking viruses should be classified as intended viruses. The copy in question does not have to be an exact clone of the initial instance. Modern computer viruses, especially so-called metamorphic viruses (further discussed in Chapter 7, "Advanced Code Evolution Techniques and Computer Virus Generator Kits"), can rewrite their own code in such a way that the starting sequence of bytes responsible for the copy of such code will look completely different in subsequent generations but will perform the equivalent or similar functionality. References 1. Donald E. Knuth , The Art of Computer Programming, 2nd Edition, Addison-Wesley, Reading, MA, 1973, 1968, ISBN: 0-201-03809-9 (Hardcover). 2. John von Neumann , "The General and Logical Theory of Automata," Hixon Symposium, 1948. 3. John von Neumann , "Theory and Organization of Complicated Automata," Lectures at the University of Illinois, 1949. 4. John von Neumann , "The Theory of Automata: Contruction, Reproduction, Homogenity," Unfinished manuscript, 1953. 5. William Poundstone , Prisoner's Dilemma, Doubleday, New York, ISBN: 0-385-41580-X (Paperback), 1992. 6. Eli Bachmutsky , "Self-Replication Loops in Cellular Space," http://necsi.org:16080/postdocs/sayama/sdrs/java . 7. Robert A. Freitas, Jr. and William B. Zachary , "A Self-Replicating, Growing Lunar Factory," Fifth Princeton/AIAA Conference, May 1981. 8. Robert A. Freitas, Jr., "Some Limits to Global Ecophagy by Biovorous Nanoreplicators, with Public Policy Recommendations," http://www.foresight.org/nanorev/ecophagy.html . 9. György Marx , A Természet Játékai, Ifúsági Lap és Könyvterjesztô Vállalat, Hungary, 1982, ISBN: 963-422-674-4 (Hardcover). 10. Martin Gardner , "Mathematical Games: The Fantastic Combinations of John Conway's New Solitaire Game 'Life,'" Scientific American, October 1970, pp. 120-123. 11. Edwin Martin , "John Conway's Game of Life," http://www.bitstorm.org/gameoflife (Java version is available). 12. Carey Nachenberg , "Computer Virus-Antivirus Coevolution," Communications of the ACM, January 1997, Vol. 40, No. 1., pp. 46-51. 13. Dewdney, A. K. , The Armchair Universe: An Exploration of Computer Worlds, New York: W. H. Freeman (c) , 1988, ISBN: 0-7167-1939-8 (Paperback). 14. Dewdney, A. K. , The Magic Machine: A Handbook of Computer Sorcery, New York: W. H. Freeman (c) , 1990, ISBN: 0-7167-2125-2 (Hardcover), 0-7167-2144-9 (Paperback). 15. John Walker , "ANIMAL," http://fourmilab.ch/documents/univac/animal.html . 16. John Walker , "PERVADE," http://fourmillab.ch/documents/univac/pervade.html . 17. Rich Skrenta , http://www.skrenta.com . 18. John Shock and Jon Hepps , "The Worm Programs, Early Experience with a Distributed Computation," ACM, Volume 25, 1982, pp. 172-180. 19. Dr. Frederick B. Cohen , A Short Course on Computer Viruses, Wiley Professonal Computing, New York, 2nd edition, 1994, ISBN: 0471007684 (Paperback). 20. Vesselin Vladimirov Bontchev , "Methodology of Computer Anti-Virus Research," University of Hamburg Dissertation, 1998. Chapter 2. The Fascination of Malicious Code Analysis "The Lion looked at Alice wearily. 'Are you animalor vegetableor mineral?' he said, yawning at every other word." Lewis Carroll (18321898), Through the Looking-Glass and What Alice Found There (1871). For people who are interested in nature, it is difficult to find a subject more fascinating than computer viruses. Computer virus analysis can be extremely difficult for most people at first glance. However, the difficulty depends on the actual virus code in question. Binary forms of viruses, those compiled to object code, must be reverse-engineered to understand them in detail. This process can be challenging for an individual, but it provides a great deal of knowledge about computer systems. My own interest in computer viruses began in September of 1990, when my new PC clone displayed a bizarre message, followed by two beeps. The message read "Your PC is now Stoned!" I had heard about computer viruses before, but this was my first experience with one of these incredible nuisances. Considering that my PC was two weeks old at the time, I was fascinated by how quickly I encountered a virus on it. I had introduced the Stoned boot virus with an infected diskette, which contained a copy of a popular game named Jbird. A friend had given me the game. Obviously he did not know about the hidden "extras" stored on the diskette. I did not have antivirus software at the time, of course, and because this incident happened on a Saturday, help was not readily available. The PC clone had cost me five months' worth of my summer salary, so you can imagine my disappointment! I was worried that I was going to lose all the data on my system. I remembered an incident that had happened to a friend in 1988: His PC was infected with a virus, causing characters to fall randomly down his computer screen; after a while, he could not do anything with the machine. He had told me that he needed to format the drive and reinstall all the programs. Later, we learned that a strain of the Cascade virus had infected his computer. Cascade could have been removed from his system without formatting the hard drive, but he did not know that at the time. Unfortunately, as a result, he lost all his data. Of course I wanted to do the exact opposite on my machineremove the virus without losing my data. To find the Stoned virus, I first searched the files on the infected diskette for the text that was displayed on the screen. I was not lucky enough to find any files that contained it. If I had had more experience in hunting viruses at the time, I might have considered the possibility that the virus was encrypted in a file. But this virus was not encrypted, and my instinct about a nonfile system hiding place was heading in the right direction. This gave me the idea that the virus was not stored in the files but instead was located somewhere else on the diskette. I had Peter Norton's book, Programmer's Guide to the IBM PC, on-hand. Up to this point, I had only read a few pages of it, but luckily the book described how the boot sector of diskettes could be accessed using a standard DOS tool called DEBUG. After some hesitation, I finally executed the DEBUG command for the first time to try to look into the boot sector of the diskette, which was inserted in drive A. The command was the following: DEBUG -L 100 0 0 1 This command instructs DEBUG to load the first sector (the boot sector) from drive A: to memory at offset 100 hexadecimal. When I used the dump (D) command of DEBUG to display the loaded sector's content, I saw the virus's message, as well as some other text. -d280 1437:0280 03 33 DB FE C1 CD 13 EB-C5 07 59 6F 75 72 20 50 .3........Your P 1437:0290 43 20 69 73 20 6E 6F 77-20 53 74 6F 6E 65 64 21 C is now Stoned! 1437:02A0 07 0D 0A 0A 00 4C 45 47-41 4C 49 53 45 20 4D 41 .....LEGALISE MA 1437:02B0 52 49 4A 55 41 4E 41 21-00 00 00 00 00 00 00 00 RIJUANA!........ You can imagine how excited I was to find the virus. Finally, it was right there in front of me! I spent the weekend reading more of the Norton book because I did not understand the virus's code at all. I simply did not know IBM PC Assembly language at the time, which was required to understand the code. There were so many things to learn! The Norton book introduced me to a substantial amount of the information I needed to begin. For example, it provided detailed and superb descriptions of the boot process, disk structures, and various interrupts of the DOS and basic input-output system (BIOS) routines. I spent a few days analyzing Stoned on paper and commenting every single Assembly instruction until I understood everything. It took me almost a full week to absorb all the information, but, sadly, my computer was still infected with the virus. After a few more days of work, I created a detection program, then a disinfection program for the virus, which I wrote in Turbo Pascal. The disinfection program was able to remove the virus from all over: from the system memory as well as from the boot and Master boot sectors in which the virus was stored. A couple of days later, I visited the university with my virus detector and found that the virus had infected more than half of the PC labs' machines. I was amazed at how successfully this simple virus code could invade machines around the world. I could not fathom how the virus had traveled all the way from New Zealand where, I learned later, it had been released in early 1988, to Hungary to infect my system. The Stoned virus was in the wild. (IBM researcher, Dave Chess, coined the term in the wild to describe computer viruses that were encountered on production systems. Not all viruses are in the wild. The viruses that only collectors or researchers have seen are named zoo viruses.) People welcomed the help, and I was happy because I wanted to assist them and learn more about virus hunting. I started to collect viruses from friends and wrote disinfection programs for them. Viruses such as Cascade, Vacsina, Yankee_Doodle, Vienna, Invader, Tequila, and Dark_Avenger were among the first set that I analyzed in detail, and I wrote detection and disinfection code for them one by one. Eventually, my work culminated in a diploma, and my antivirus program became a popular shareware in Hungary. I named my program Pasteur after the French microbiologist Louis Pasteur. All my efforts and experiences opened up a career for me in antivirus research and development. This book is designed to share my knowledge of computer virus research. 2.1. Common Patterns of Virus Research Computer virus analysis has some common patterns that can be learned easily, lending efficiency to the analysis process. There are several techniques that computer virus researchers use to reach their ultimate goal, which is to acquire a precise understanding of viral programs in a timely manner to provide appropriate prevention and to respond so that computer virus outbreaks can be controlled. Virus researchers also need to identify and understand particular vulnerabilities and malicious code that exploits them. Vulnerability and exploit research has its own common patterns and techniques. Some of these are similar to the methods of computer virus research, but many key differences exist. This book will introduce these useful techniques to teach you how to deal with viral programs more efficiently. Along the way, you will learn how to analyze a computer virus more effectively and safely by using disassemblers, debuggers, emulators, virtual machines, file dumpers, goat files, dedicated virus replication machines and systems, virus test networks, decryption tools, unpackers, and many other useful tools. You can use this information to deal with computer virus problems more effectively on a daily basis. You also will learn how computer viruses are classified and named, as well as a great deal about state-of-the-art computer virus tricks. Computer virus source code is not discussed in this book. Discussions on this topic are unethical and in some countries, illegal1. More importantly, writing even a dozen viruses would not make you an expert on this subject. Some virus writers2 believe that they are experts because they created a single piece of code that replicates itself. This assumption could not be further from the truth. Although some virus writers might be very knowledgeable individuals, most of them are not experts on the subject of computer viruses. The masterminds who arguably at various times represented the state of the art in computer virus writing go (or went) by aliases such as Dark Avenger3, Vecna, Jacky Qwerty, Murkry, Sandman, Quantum, Spanska, GriYo, Zombie, roy g biv, and Mental Driller. 2.2. Antivirus Defense Development Initially, developing antivirus software programs was not difficult. In the late '80s and early '90s, many individuals were able to create some sort of antivirus program against a particular form of a computer virus. Frederick Cohen proved that antivirus programs cannot solve the computer virus problem because there is no way to create a single program that can detect all future computer viruses in finite time. Regardless of this proven fact, antivirus programs have been quite successful in dealing with the problem for a while. At the same time, other solutions have been researched and developed, but computer antivirus programs are still the most widely used defenses against computer viruses at present, regardless of their many drawbacks, including the inability to contend with and solve the aforementioned problem. Perhaps under the delusion that they are experts on computer viruses, some security analysts state that any sort of antivirus program is useless if it cannot find all the new viruses. However, the reality is that without antivirus programs, the Internet would be brought to a standstill because of the traffic undetected computer viruses would generate. Often we do not completely understand how to protect ourselves against viruses, but neither do we know how to reduce the risk of becoming infected by them by adopting proper hygiene habits. Unfortunately, negligence is one of the biggest contributors to the spread of computer viruses. The sociological aspects of computer security appear to be more relevant than technology. Carelessly neglecting the most minimal level of computer maintenance, network security configuration, and failing to clean an infected computer opens up a Pandora's box that allows more problems to spread to other computers. In the early phases of virus detection and removal, computer viruses were easily managed because very few viruses existed (there were fewer than 100 known strains in 1990). Computer virus researchers could spend weeks analyzing a single virus alone. To make life even easier, computer viruses spread slowly, compared to the rapid proliferation of today's viruses. For example, many successful boot viruses were 512 bytes long (the size of the boot sector on the IBM PC), and they often took a year or longer to travel from one country to another. Consider this: The spread time at which a computer virus traveled in the past compared to today's virus spread time is analogous to comparing the speed of message transfer in ancient times, when messengers walked or ran from city to city to deliver parcels, with today's instant message transfer, via e-mail, with or without attachments. Finding a virus in the boot sector was easy for those who knew what a boot sector was; writing a program to recognize the infection was tricky. Manually disinfecting an infected system was a true challenge in and of itself, so creating a program that automatically removed viruses from computers was considered a tremendous achievement. Currently, the development of antivirus and security defense systems is deemed an art form, which lends itself to cultivating and developing a plethora of useful skills. However, natural curiosity, dedication, hard work, and the continuous desire to learn often supersede mere hobbyist curiosity and are thus essential to becoming a master of this artistic and creative vocation. 2.3. Terminology of Malicious Programs The need to define a unified nomenclature for malicious programs is almost as old as computer viruses themselves4. Obviously, each classification has a common pitfall because classes will always appear to overlap, and classes often represent closely related subclasses of each other. 2.3.1. Viruses As defined in Chapter 1, "Introduction to the Games of Nature," a computer virus is code5 that recursively replicates a possibly evolved copy of itself. Viruses infect a host file or system area, or they simply modify a reference to such objects to take control and then multiply again to form new generations. 2.3.2. Worms Worms are network viruses, primarily replicating on networks. Usually a worm will execute itself automatically on a remote machine without any extra help from a user. However, there are worms, such as mailer or mass-mailer worms, that will not always automatically execute themselves without the help of a user. Worms are typically standalone applications without a host program. However, some worms, like [email protected], also spread as a file-infector virus and infect host programs, which is precisely why the easiest way to approach and contain worms is to consider them a special subclass of virus. If the primary vector of the virus is the network, it should be classified as a worm. 2.3.2.1 Mailers and Mass-Mailer Worms Mailers and mass-mailer worms comprise a special class of computer worms, which send themselves in an e-mail. Mass-mailers, often referred to as "@mm" worms such as [email protected], send multiple e-mails including a copy of themselves once the virus is invoked. Mailers will send themselves less frequently. For instance, a mailer such as [email protected] (also known as the Happy99 worm) sends a copy of itself every time the user sends a new message. 2.3.2.2 Octopus An octopus is a sophisticated kind of computer worm that exists as a set of programs on more than one computer on a network. For example, head and tail copies are installed on individual computers that communicate with each other to perform a function. An octopus is not currently a common type of computer worm but will likely become more prevalent in the future. (Interestingly, the idea of the octopus comes from the science fiction novel Shockwave Rider by John Brunner. In the story, the main character, Nickie, is on the run and uses various identities. Nickie is a phone phreak, and he uses a "tapeworm," similar to an octopus, to erase his previous identities.) 2.3.2.3 Rabbits A rabbit is a special computer worm that exists as a single copy of itself at any point in time as it "jumps around" on networked hosts. Other researchers use the term rabbit to describe crafty, malicious applications that usually run themselves recursively to fill memory with their own copies and to slow down processing time by consuming CPU time. Such malicious code uses too much memory and thus can cause serious side effects on a machine within other applications that are not prepared to work under low-memory conditions and that unexpectedly cease functioning. 2.3.3. Logic Bombs A logic bomb is a programmed malfunction of a legitimate application. An application, for example, might delete itself from the disk after a couple of runs as a copy protection scheme; a programmer might want to include some extra code to perform a malicious action on certain systems when the application is used. These scenarios are realistic when dealing with large projects driven by limited code-reviews. An example of a logic bomb can be found in the original version of the popular Mosquitos game on Nokia Series 60 phones. This game has a built-in function to send a message using the Short Message Service (SMS) to premium rate lines. The functionality was built into the first version of the game as a software distribution and piracy protection scheme, but it backfired6. When legitimate users complained to the software vendor, the routine was eliminated from the code of the game. The premium lines have been "disconnected" as well. However, the pirated versions of the game are still in circulation, which have the logic bomb inside and send regular SMS messages. The game used four premium SMS phone numbers such as 4636, 9222, 33333, and 87140, which corresponded to four countries. For example, the number 87140 corresponded to the UK. When the game used this number, it sent the text "king.001151183" as short message. In turn, the user of the game was charged a hefty A31.5 per message. Often extra functionality is hidden as resources in the applicationand remains hidden. In fact, the way in which these functions are built into an application is similar to the way so-called Easter eggs are making headway into large projects. Programmers create Easter eggs to hide some extra credit pages for team members who have worked on a project. Applications such as those in the Microsoft Office suite have many Easter eggs hidden within them, and other major software vendors have had similar credit pages embedded within their programs as well. Although Easter eggs are not malicious and do not threaten end users (even though they might consume extra space on the hard drive), logic bombs are always malicious. 2.3.4. Trojan Horses Perhaps the simplest kind of malicious program is a Trojan horse. Trojan horses try to appeal to and interest the user with some useful functionality to entice the user to run the program. In other cases, malicious hackers leave behind Trojanized versions of real tools to camouflage their activities on a computer, so they can retrace their steps to the compromised system and perform malicious activities later. For example, on UNIX-based systems, hackers often leave a modified version of "ps" (a tool to display a process list) to hide a particular process ID (PID), which can relate to another backdoor Trojan's process. Later on, it might be difficult to find such changes on a compromised system. These kinds of Trojans are often called user mode rootkits. The attacker can easily manipulate the tool by modifying the source code of the original tool at a certain location. At first glance, this minor modification is extremely difficult to locate. Probably the most famous Trojan horse is the AIDS TROJAN DISK7 that was sent to about 7,000 research organizations on a diskette. When the Trojan was introduced on the system, it scrambled the name of all files (except a few) and filled the empty areas of the disk completely. The program offered a recovery solution in exchange of a bounty. Thus, malicious cryptography was born. The author of the Trojan horse was captured shortly after the incident. Dr. Joseph Popp, 39 at the time, a zoologist from Cleveland, Ohio was prosecuted in the UK8. The filename scrambling function of AIDS TROJAN DISK was based on two substitution tables9. One was used to encrypt the filenames and another to encrypt the file extensions. At some point in the history of cryptography10, such an algorithm was considered unbreakable11. However, it is easy to see that substitution ciphers can be easily attacked based on the use of statistical methods (the distribution of common words). In addition, if given enough time, the defender can disassemble the Trojan's code and pick the tables from its code. There are two kinds of Trojans: • One hundred percent Trojan code, which is easy to analyze. • A careful modification of an original application with some extra functionality, some of which belong to backdoor or rootkit subclasses. This kind of Trojan is more common on open source systems because the attacker can easily insert backdoor functionality to existing code. Note The source code of Windows NT and Windows 2000 got into circulation in early 2004. It is expected that backdoor and rootkit programs will be created using these sources. 2.3.4.1 Backdoors (Trapdoors) A backdoor is the malicious hacker's tool of choice that allows remote connections to systems. A typical backdoor opens a network port (UDP/TCP) on the host when it is executed. Then, the listening backdoor waits for a remote connection from the attacker and allows the attacker to connect to the system. This is the most common type of backdoor functionality, which is often mixed with other Trojan-like features. Another kind of backdoor relates to a program design flaw. Some applications, such as the early implementation of SMTP (simple mail transfer protocol) allowed features to run a command (for example, for debugging purposes). The Morris Internet worm uses such a command to execute itself remotely, with the command placed as the recipient of the message on such vulnerable installations. Fortunately, this command was quickly removed once the Morris worm exploited it. However, there can be many applications, especially newer ones, that allow for similar insecure features. 2.3.4.2 Password-Stealing Trojans Password-stealing Trojans are a special subclass of Trojans. This class of malicious program is used to capture and send a password to an attacker. As a result, an attacker can return to the vulnerable system and take whatever he or she wants. Password stealers are often combined with keyloggers to capture keystrokes when the password is typed at logon. 2.3.5. Germs Germs are first-generation viruses in a form that the virus cannot generate to its usual infection processes. Usually, when the virus is compiled for the first time, it exists in a special form and normally does not have a host program attached to it. Germs will not have the usual marks that most viruses use in second-generation form to flag infected files to avoid reinfecting an already infected object. A germ of an encrypted or polymorphic virus is usually not encrypted but is plain, readable code. Detecting germs might need to be done differently from detecting second, and later, -generation infections. 2.3.6. Exploits Exploit code is specific to a single vulnerability or set of vulnerabilities. Its goal is to run a program on a (possibly remote, networked) system automatically or provide some other form of more highly privileged access to the target system. Often, a single attacker builds exploit code and shares it with others. "White hat" hackers create a form of exploit code for penetration (or "pen") testing. Therefore, depending on the actual use of the exploit, the exploitation might be malicious in some cases but harmless in othersthe severity of the threat depends on the intention of the attacker. 2.3.7. Downloaders A downloader is yet another malicious program that installs a set of other items on a machine that is under attack. Usually, a downloader is sent in e-mail, and when it is executed (sometimes aided with the help of an exploit), it downloads malicious content from a Web site or other location and then extracts and runs its content. 2.3.8. Dialers Dialers got their relatively early start during the heyday of dial-up connections to bulletin board systems (BBSs) in homes. The concept driving a dialer is to make money for the people behind the dialer by having its users (often unwitting victims) call via premium-rate phone numbers. Thus, the person who runs the dialer might know the intent of the application, but the user is not aware of the charges. A common form of dialer is the so-called porn dialer. Similar approaches exist on the World Wide Web using links to Web pages that connect to paid services. 2.3.9. Droppers The original term refers to an "installer" for first-generation virus code. For example, boot viruses that first exist as compiled files in binary form are often installed in the boot sector of a floppy using a dropper. The dropper writes the germ code to the boot sector of the diskette. Then the virus can replicate on its own without ever generating the dropper form again. When the virus regenerates the dropper form, the intermediate form is part of an infection cycle, which is not to be confused with a dedicated (or pure) dropper. 2.3.10. Injectors Injectors are special kinds of droppers that usually install virus code in memory. An injector can be used to inject virus code in an active form on a disk interrupt handler. Then, the first time a user accesses a diskette, the virus begins to replicate itself normally. A special kind of injector is the network injector. Attackers also can use legitimate utilities, such as NetCat (NC), to inject code into the network. Usually, a remote target is specified, and the datagram is sent to the machine that will be attacked using the injector. An attacker initially introduced the CodeRed worm using an injector; subsequently, the worm replicated as data on the network without ever hitting the disk again as a file. Injectors are often used in a process called seeding. Seeding is a process that is used to inject virus code to several remote systems to cause an initial outbreak that is large enough to cause a quick epidemic. For example, there is supporting digital evidence that W32/Witty worm12 was seeded to several systems by its author. 2.3.11. Auto-Rooters Auto-rooters are usually malicious hacker tools used to break into new machines remotely. Auto-rooters typically use a collection of exploits that they execute against a specified target to "gain root" on the machine. As a result, a malicious hacker (typically a so-called script-kiddie) gains administrative privileges to the remote machine. 2.3.12. Kits (Virus Generators) Virus writers developed kits, such as the Virus Creation Laboratory (VCL) or PSMPC generators, to generate new computer viruses automatically, using a menu-based application. With such tools, even novice users were able to develop harmful computer viruses without too much background knowledge. Some virus generators exist to create DOS, macro, script, or even Win32 viruses and mass-mailing worms. As discussed in Chapter 7 "Advanced Code Evolution Techniques and Computer Virus Generator Kits," the so-called "Anna Kournikova" virus (technically VBS/VBSWG.J) was created by a Dutch teenager, Jan de Wit, from the VBSWG kitsadly, de Wit got lucky and the kit, infamous for churning out mainly broken, intended code produced a working virus. De Wit was subsequently arrested, convicted, and sentenced for his role in this. 2.3.13. Spammer Programs Vikings: Spam spam spam spam Waitress: ...spam spam spam egg and spam; spam spam spam spam spam baked beans spam spam spam... Vikings: Spam! Lovely spam! Lovely spam! Monty Python Spam Song Spammer programs are used to send unsolicited messages to Instant Messaging groups, newsgroups, or any other kind of mobile device in forms of e-mail or cell phone SMS messages. Two lawyers helped to make spam an international, albeit notorious, superstar of the worldwide Internet virus scene. Their main objective was to send advertisements to Internet newsgroups. Spam mail has become the number one Internet nuisance for the global community. Many e-mail users complain that their inbox is littered with more than 70% spam each day. This ratio has been on the rise for the last couple of years. The primary motivation of spammers is to make money by generating traffic to Web sites. In addition, spam messages are often used to implement phishing attacks. For example, you might receive an e-mail message asking you to visit your bank's Web site and telling you that if you don't, they will disable your account. There is a link in the e-mail, however, that forwards you to the fraudster. If you fall victim to the attack, you might disclose personal information to the attacker on a silver plate. The fraudster wants to get your credit card number, account number, password, PIN (personal identification number), and other personal information to make money. In addition, you might become the prime subject of an identity theft as well. 2.3.14. Flooders Malicious hackers use flooders to attack networked computer systems with an extra load of network traffic to carry out a denial of service (DoS) attack. When the DoS attack is performed simultaneously from many compromised systems (so-called zombie machines), the attack is called a distributed denial of service (DDoS) attack. Of course, there are much more sophisticated DoS attacks including SYN floods, packet fragmentation attacks, and other (mis-)sequencing attacks, traffic amplification, or traffic deflection, just to name the most common types. 2.3.15. Keyloggers A keylogger captures keystrokes on a compromised system, collecting sensitive information for the attacker. Such sensitive information might include names, passwords, PINs, birthdays, Social Security numbers, or credit card numbers. The keylogger is installed on the system. Unbeknownst to the user, a computer could be compromised for weeks before the attack is ever noticed. Attackers often use keyloggers to commit identity theft. 2.3.16. Rootkits Rootkits are a special set of hacker tools that are used after the attacker has broken into a computer system and gained root-level access. Usually, hackers break into a system with exploits and install modified versions of common tools. Such rootkits are called user-mode rootkits because the Trojanized application runs in user mode. Some more sophisticated rootkits, such as Adore13, have kernel-mode module components. These rootkits are more dangerous because they change the behavior of the kernel. Thus, they can hide objects from even kernel-level defense software. For example, they can hide processes, files in the file system, registry keys, and values under Windows, and implement stealth capabilities for other malicious components. In contrast, user-mode rootkits cannot typically hide themselves effectively from kernel-level defense software. User-mode rootkits only manipulate with user-mode objects; therefore, defense systems relying on kernel objects have chance to reveal the truth. 2.4. Other Categories Some other categories of commonly encountered Internet pests are not necessarily malicious in their primary intent. However, they can be a nuisance to end users; therefore, antivirus and antispam products have been created to detect and remove such annoying burdens from computers. 2.4.1. Joke Programs Joke programs are not malicious; however, as Alan Solomon (author of one of the most widely used scanning engines today) once mentioned, "Whether a program should be classified as a joke program or as a Trojan largely depends on the sense of humor of the victim." Joke programs change or interrupt the normal behavior of your computer, creating a general distraction or nuisance. Colleagues often make fun of each other by installing a joke program or by tricking others to run one on their systems. A typical example of a joke program is a screen saver that randomly locks the system. However, such programs can be considered harmful in some cases. Consider, for example, a joke program that locks the system but never unlocks it. Thus, computers cannot be stopped safely. As a result, important data could be lost because it was never saved to the disk. Or worse, the file allocation table could get corrupted, and the machine would become unbootable. 2.4.2. Hoaxes: Chain Letters On computers, hoaxes typically spread information about computer virus infections and ask the recipient of the message to forward it to others. One of the most infamous hoaxes was the Good Times hoax. Good Times appeared in 1994 and warned users about a potential new kind of virus that would arrive in e-mail. The hoax claimed that reading a message with "Good Times" in the subject line would erase data from the hard disk. Although many believed at the time that such an e-mail based virus was a hoax, the reality is that such a payload might be possible. Hoaxes typically mix some reality with lies. Good Times claimed that a particular virus existed, which was simply not true. End users then spread the e-mail hoax to new people, "replicating" the message on the Internet by themselves and overloading e-mail systems with the hoax. At larger corporations, policies must be implemented to avoid the spread of hoaxes on local systems. In the past, a typical hoax circulating at large corporations tried to deceive people into believing an untrue story about a very sick child, attempting to collect money for the child's medical procedure. Most people were sympathetic and did not recognize the danger of forwarding the e-mail message in this case; they trusted the source and believed the fabricated story. With company policies intact, the problems that such hoaxes create can be effectively eliminated. However, hoaxes are considered one of the most successful Internet threats every year; take for example, the new chain letters that surface and rapidly spread around the world. 2.4.3. Other Pests: Adware and Spyware A new type of application has appeared recently as a direct result of increased residential Internet access. Many companies are interested in what people look for or research on the Web, especially what kinds of products consumers might buy. Therefore, some consumer retail businesses install little applications to collect information and display customized advertisements in pop-up messages. The most obvious problem with this type of application is that such applications were not written with malicious intent. In fact, many programmers make a living out of writing such tools. However, many of these Internet pests get installed on a system without the user's permission or knowledge, raising questions about privacy. Not surprisingly, corporations as well as home users dislike this type of program, referred to as spyware, which collects various information of user activity and then sends these data to a company via the Internet. Home users are undoubtedly disturbed by this invasive activity, not to mention the frustration that users feel in response to pop-ups. In addition, these programs are often very poorly written and are resource hogs, particularly when two or more become installed on the same machine. Many also have the highly undesirable habit of lowering Internet Explorer's already deplorable security settings to unconscionable levels, opening the (usually unwitting) "victim" up to even worse exploits and infections14. Because these applications are often a major source of business for organizations driven by consumer revenue, such businesses prefer that antivirus products not detect such programs at all, or at least not by default. Often such companies bring lawsuits against vendors who produce software to detect and remove their "applications." Such litigation makes the fight against this kind of pest much more difficult. It is expected, however, that such programs will be illegal to create in several countries in the future. To make things even more interesting, some corporations prefer to remove "unwanted" spyware but want to keep the few "tools" that they use to monitor their employees on a regular basis. 2.5. Computer Malware Naming Scheme Back in 1991, founding members of CARO (Computer Antivirus Researchers Organization) designed a computer virus naming scheme15 for use in antivirus (AV) products. Today, the CARO naming scheme is slightly outdated compared to daily practice, but it remains the only standard that most antivirus companies ever attempted to adopt. An up-to-date version of the document is in the works and is expected to be published by CARO soon at www.caro.org. In this short section, I can only show you a 10,000-foot view of malware naming. I strongly recommend Nick FitzGerald's AVAR 2002 conference paper16, which greatly expands on further naming considerations. Furthermore, credit must be given to all the respected antivirus researchers of CARO. Note The original naming scheme was designed by Dr. Alan Solomon, Fridrik Skulason, and Dr. Vesselin Bontchev. Virus naming is a challenging task. Unfortunately, there has been a major increase in widespread, fast-running computer virus outbreaks. Nowadays, antivirus researchers must add detections of 500, 1000, 1500, or even more threats to their products each month. Thus, the problem of naming computer viruses, even by the same common name, is getting to be a hard, if not impossible, task to manage. Nonetheless, representatives of antivirus companies still try to reduce the confusion by using a common name for at least the in-the-wild computer malware. However, computer virus outbreaks are on the rise, and researchers do not have the time to agree on a common name for each in-the-wild virus in advance of deploying response definitions. Even more commonly, it is very difficult to predict which viruses will be seen in the wild and which will remain zoo viruses. Most people remember textual family names better than the naked IDs that many other naming schemes have adopted in the security space. Let's take a look at malware naming in its most complex form: <malware_type>://<platform>/<family_name>.<group_name>.<infective_length>.<variant><devolution><modifiers> In practice, very little, if any, malware requires all name components. Practically anything other than the family name is an optional field: [<malware_type>://][<platform>/]<family_name>[.<group_name>][.<infective_length>][.<variant>[<devolution>]][<modifiers>] The following sections give a short description of each naming component. 2.5.1. <family_name> This is the key component of any malware name. The basic rule set for the family name follows: • Do not use company names, brand names, or the names of living people. • Do not use an existing family name unless the virus belongs to the same family. • Do not use obscene or offensive names. • Do not use another name if a name already exists for the family. Use a tool, such as VGrep, to check name cross-references for older malware. • Do not use numeric family names. • Avoid the malware writer's suggested or intended name. • Avoid naming malware after a file that traditionally or conventionally contains the malware. • Avoid family names such as Friday_13th, particularly if the dates represent payload triggers. • Avoid geographic names that are based on the discovery site. • If multiple acceptable names exist, select the original one, the one used by the majority of existing antivirus programs, or the most descriptive one. 2.5.2. <malware_type>:// This part of the name indicates whether a malware type is a virus, Trojan, dropper, intended, kit, or garbage type (Virus://, Trojan://, .., Garbage://). Several products have extended this set slightly, and these are expected to become part of the standard malware naming in the future. 2.5.3. <platform>/ The platform prefix indicates the minimum native environment for the malware type that is required for it to function correctly. An annotated list of officially recognized platform names is listed in the next section. Note Multiple platform names can be defined for the same threat, for example, virus://{W32,W97M}/Beast.41472.A17. This name indicates a file-infecting virus called Beast that can infect on Win32 platforms and also is able to infect Word 97 documents. 2.5.4. .<group_name> The group name represents a major family of computer viruses that are similar to each other. The group name is rarely used nowadays. It was mostly used to group DOS viruses. 2.5.5. <infective_length> The infective length is used to distinguish parasitic viruses within a family or group based on their typical infective length in bytes. 2.5.6. <variant> The subvariant represents minor variants of the same virus family with the same infective length. 2.5.7. [<devolution>] The devolution identifier is used most commonly with the subvariant name in the case of macro viruses. Some macro viruses have a common ability (mostly related to programming mistakes) to create a subset of their original macro set during their natural replication cycle. Thus, the subset of macros cannot regenerate the original, complete macro set but is still able to recursively replicate from the partial set. 2.5.8. <modifiers> The original intent of the modifier was to identify the polymorphic engine of a computer virus. However, most antivirus developers never used this modifier in practice. Nowadays, modifiers include the following optional components: [[:<locale_specifier>][#<packer>][@'m'|'mm'][!<vendor-specific_comment>]] 2.5.9. :<locale_specifier> This specifier is used mostly for macro viruses that depend on a particular language version of their environment, such as Word. For example, virus://WM/Concept.B:Fr is a virus that affects only the French version of Microsoft Word. 2.5.10. #<packer> The packer modifier is rarely used in practice. It can indicate that a computer malware was packed with a particular "on-the-fly" extractor unpacker, such as UPX. 2.5.11. @m or @mm These symbols indicate self-mailer or mass-mailer computer viruses. Suggested by Bontchev, this is probably the most widely recognized modifier. This modifier highlights computer viruses that are more likely to be encountered by the general public because of the way the viruses use e-mail to propagate themselves. 2.5.12. !<vendor-specific_comment> The vendor-specific modifier is a recent addition to the set of modifiers. Vendors are allowed to postfix any malware name with such a modifier. For example, a vendor might want to indicate that a virus is multipartite by using !mp in the name. 2.6. Annotated List of Officially Recognized Platform Names The platform names shown in Table 2.1 are the only officially recognized identifiers following the proposed naming standard. A platform name that does not appear on this list cannot be used as a platform identifier in a malware name following this standard. The Comments column helps to explain some of the finer points of platform name selection. This is intended to be an authoritative list at this book's publication date. The platform list will need to be extended in the future. Table 2.1. Officially Recognized Platform Names Short FormLong FormComments ABAPABAPMalware for the SAP /R3 Advanced Business Application Programming environment. ALSACADLispScriptMalware that requires AutoCAD Lisp Interpreter. BATBATMalware that requires a DOS or Windows command shell interpreter or close clone. BeOSBeOSRequires BeOS. BootBootRequires MBR and/or system boot sector of IBM PCcompatible hard drive and/or floppy. (Rarely used in practice.) DOSDOSInfects DOS COM and/or EXE (MZ) and/or SYS format files and requires some version of MS-DOS or a closely compatible OS. (Rarely used in practice.) EPOCEPOCRequires the EPOC OS up to version 5. SymbOSSymbianOSRequires Symbian OS (EPOC version 6 and later). JavaJavaRequires a Java run-time environment (standalone or browser-embedded). MacOSMacOSRequires a Macintosh OS prior to OS X. MeOSMenuetOSRequires MenuetOS. MSILMSILRequires the Microsoft Intermediate Language runtime. MulMultiThis is a pseudo-platform, and its use is reserved for a few very special cases. PalmOSPalmOSRequires a version of PalmOS. OS2OS2Requires OS/2. OSXOSXRequires Macintosh OS X or a subsequent, essentially similar version. W16Win16Requires one of the 16-bit Windows x86 OSes. (Note: Several products use the Win prefix.) W95Win95Requires Windows 9x VxD services. W32Win32Requires a 32-bit Windows (Windows 9x, Me, NT, 2000, XP on x86). W64Win64Requires Windows 64. WinCEWinCERequires WinCE. WMWordMacroMacro malware for WordBasic as included in WinWord 6.0, Word 95, and Word for Mac 5.x. W2MWord2MacroMacro malware for WordBasic as included in WinWord 2.0. W97MWord97MacroMacro malware for Visual Basic for Applications (VBA) v5.0 for Word (that shipped in Word 97) or later. Changes in VBA between Word 97 and 2003 versions (inclusive) are sufficiently slight that we do not distinguish platforms even if the malware makes a version check or uses one of the few VBA features added in versions subsequent to VBA v5.0. AMAccessMacroMacro malware for AccessBasic. A97MAccess97MacroMacro malware for Visual Basic for Applications (VBA) v5.0 for Access that shipped in Access 97 and later. As for W97M, changes in VBA versions between Access 97 and 2003 (inclusive) are insufficient to justify distinguishing the platforms. P98MProject98MacroMacro malware for Visual Basic for Applications (VBA) v5.0 for Project that shipped in Project 98 and later. As for W97M, changes in VBA versions between Project 98 and 2003 (inclusive) are insufficient to justify distinguishing the platforms. PP97MPowerPoint97MacroMacro malware for Visual Basic for Applications (VBA) v5.0 for Project, which shipped in Project 97 and later. As for W97M, changes in VBA between Project 97 and 2002 inclusive are insufficient to justify distinguishing the platforms. V5MVisio5MacroMacro malware for Visual Basic for Applications (VBA) v5.0 for Visio that shipped in Visio 5.0 and later. As for W97M, changes in VBA versions between Visio 5.0 and 2002 inclusive are insufficient to justify distinguishing the platforms. XFExcelFormulaMalware based on Excel Formula language that has shipped in Excel since the very early days. XMExcelMacroMacro malware for Visual Basic for Applications (VBA) v3.0 that shipped in Excel for Windows 5.0 and Excel for Mac 5.x. X97MExcel97MacroMacro malware for Visual Basic for Applications (VBA) v5.0 for Excel that shipped in Excel 97 and later. As for W97M, changes in VBA versions between Excel 97 and 2002 (inclusive) are insufficient to justify distinguishing the platforms. O97MOffice97MacroThis is a pseudo-platform name reserved for macro malware that infects across at least two applications within the Office 97 and later suites. Cross-infectors between Office applications and related products, such as Project or Visio, can also be labeled thus. AC14MAutoCAD14MacroVBA v5.0 macro viruses for AutoCAD r14 and later. As with W97M malware, minor differences in later versions of VBA are insufficient to justify new plat form names. ActnSActionScriptRequires the Macromedia ActionScript interpreter found in some ShockWave Flash (and possibly other) animation players. AplSAppleScriptRequires AppleScript interpreter. APMAmiProMacroMacro malware for AmiPro. CSCCorelScriptMalware that requires the CorelScript interpreter shipped in many Corel products. HLPWinHelpScriptRequires the script interpreter of the WinHelp display engine. INFINFScriptRequires one of the Windows INF (installer) script interpreters. JSJScript, JavaScriptRequires a JScript and/or JavaScript interpreter. Hosting does not affect the platform designatorstandalone JS malware that requires MS JS under WSH, HTML-embedded JS malware, and JS malware embedded in Windows-compiled HTML help files (.CHM) all fall under this platform type. MIRCmIRCScriptRequires the mIRC script interpreter. MPBMapBasicRequires MapBasic of MapInfo product. PerlPerlRequires a Perl interpreter. Hosting does not affect the platform designatorstandalone Perl infectors under UNIX(-like) shells, ones that require Perl under WSH and HTML-embedded Perl malware all fall under this platform type. PHPPHPScriptRequires a PHP script interpreter. PirchPirchScriptRequires the Pirch script interpreter. PSPostScriptRequires a PostScript interpreter. REGRegistryRequires a Windows Registry file (.REG) interpreter. (We do not distinguish .REG versions or ASCII versus Unicode.) SHShellScriptRequires a UNIX(-like) shell interpreter. Hosting does not affect the platform nameshell malware specific to Linux, Solaris, HP-UX, or other systems, or specific to csh, ksh, bash, or other interpreters currently all fall under this platform type. VBSVBScript, VisualBasicScriptRequires a VBS interpreter. Hosting does not affect the platform designatorstandalone VBS infectors that require VBS under WSH, HTML-embedded VBS malware, and malware embedded in Windows-compiled HTML help files (.CHM) all fall under this platform type. UNIXUNIXThis is a common name for binary viruses on UNIX platforms. (More specific platform names are available.) BSDBSDUsed for malware specific to BSD (-derived) platforms. LinuxLinuxUsed for malware specific to Linux platforms and others closely based on it. SolarisSolarisUsed for Solaris-specific malware. References 1. Joe Hirst, "Virus Research and Social Responsibility," Virus Bulletin, October 1989, page 3. 2. Sarah Gordon, "The Generic Virus Writer," Virus Bulletin Conference, 1994. 3. Vesselin Bontchev, "The Bulgarian and Soviet Virus Writing Factories," Virus Bulletin Conference, 1991, pp. 11-25. 4. Dr. Keith Jackson, "Nomenclature for Malicious Programs," Virus Bulletin, March, 1990, page 13. 5. Vesselin Bontchev, "Are 'Good' Computer Viruses Still a Bad Idea?," EICAR, 1994, pp. 25-47. 6. Jamo Niemela, "Mquito," http://www.f-secure.com/v-descs/mquito.shtml . 7. Jim Bates, "Trojan Horse: AIDS Information Introductory Diskette Version 2.0," Virus Bulletin, January 1990, page 3. 8. Mark Hamilton, "U.S. Judge Rules In Favour Of Extradition," Virus Bulletin, January, 1991. 9. Istvan Farmosi, Janos Kis, Imre Szegedi , "Viruslelektan," Alaplap Konyvek, Budapest, 1990, ISBN: 963-02-8675-0 (Paperback). 10. David Kahn, "The CODE-Breakers," Scribner, New York, 1967, 1996, ISBN: 0-684-83130-9. 11. Tibor Nemetz, Istvan Vajda , "Algorithmic Cryptography," Academic Press, Budapest, 1991, ISBN: 963-05-6093-2. 12. Peter Ferrie, Frederic Perriot and Peter Szor , "Chiba Witty Blues," Virus Bulletin, May 2004, pp. 9-10. 13. Sami Rautiainen, "Hidden Under the Hood: Linux Backdoors," Virus Bulletin Conference 2002, pp. 217-234. 14. Nick FitzGerald, Private Communication, 2004. 15. Vesselin Bontchev, Fridrik Skulason and Alan Solomon , "A Virus Naming Convention," available at the FTP site of University of Hamburg, ftp://ftp.informatik.uni-hamburg.de/pub/virus/texts/tests/naming.zip. 16. Nick FitzGerald, "A Virus by Any Other Name: The Revised CARO Naming Convention," AVAR Conference, 2002. 17. Peter Szor, "Beast Regards," Virus Bulletin, June 1999, pp. 6-7. Chapter 3. Malicious Code Environments "In all things of nature there is something of the marvelous." Aristotle One of the most important steps toward understanding computer viruses is learning about the particular execution environments in which they operate. In theory, for any given sequence of symbols we could define an environment in which that sequence could replicate itself. In practice, we need to be able to find the environment in which the sequence of symbols operates and prove that it uses code explicitly to make copies of itself and does so recursively1. A successful penetration of the system by viral code occurs only if the various dependencies of malicious code match a potential environment. Figure 3.1 is an imperfect illustration of common environments for malicious code. A perfect diagram like this is difficult to draw in 2D form. Figure 3.1. Common environments of malicious code. The figure shows that Microsoft Office itself creates a homogeneous environment for malicious code across Mac and the PC. However, not all macro viruses2 that can multiply on the PC will be able to multiply on the Mac because of further dependencies. Each layer might create new dependencies (such as vulnerabilities) for malicious code. It is also interesting to see how possible developments of .NET on further operating systems, such as Linux, might change these dependency points and allow computer viruses to jump across operating systems easily. Imagine that each ring in Figure 3.1 has tiny penetration holes in it. When the holes on all the rings match the viral code and all the dependencies are resolved, the viral code successfully infects the system. Figure 3.1 suggests how difficult virus research has become over the years. With many platforms already invaded by viruses, the fight against malicious code gets more and more difficult. Please note that I am not suggesting that viruses would need to exploit systems. An exploitable vulnerability is just one possible dependency out of many examples. Automation of malicious code analysis has also become increasingly more difficult because of diverse environment dependency issues. It is not uncommon to spend many hours with a virus in a lab environment, attempting natural replication, but without success, while the virus is being reported from hundreds or perhaps even thousands of systems around the world. Another set of viruses could be so unsuccessful that a researcher could never manage to replicate them. Steve White of IBM Research once said that he could give a copy of the Whale virus ("the mother of all viruses") to everybody in the audience, and it would still not replicate3. However, it turns out that Whale has an interesting dependency on early 8088 architectures4 on which it works perfectly. Even more interestingly, this dependency disappears on Pentium and above processors5. Thus Whale, "the dinosaur heading for extinction,"6 is able to return, theoretically, in a Jurassic Park like fashion. One of the greatest challenges facing virus researchers is the need to be able to recognize the types, formats, and sequences of code and to find its environment. A researcher can only analyze the code according to the rules of its en-vironment and prove that the sequence of code is malicious in that environment. Over the years, viruses have appeared on many platforms, including Apple II, C64, Atari ST, Amiga, PC, and Macintosh, as well as mainframe systems and handheld systems such as the PalmPilot7, Symbian phones, and the Pocket PC. However, the largest set of computer viruses exists on the IBM PC and its clones. In this chapter, I will discuss the most important dependency factors that computer viruses rely on to replicate. I will also demonstrate how computer viruses unexpectedly evolve, devolve, and mutate, caused by the interaction of virus code with its environment. 3.1. Computer Architecture Dependency Most computer viruses do spread in executable, binary form (also called compiled form). For instance, a boot virus will replicate itself as a single or couple of sectors of code and takes advantage of the computer's boot sequence. Among the very first documented virus incidents was Elk Cloner on the Apple II, which is also a boot virus. Elk Cloner modified the loaded operating system with a hook to itself so that it could intercept disk access and infect newly inserted disks by overwriting their system boot sectors with a copy of its own code and so on. Brain, the oldest known PC computer virus, was a boot sector virus as well, written in 1986. Although the boot sequences of the two systems as well as the structures of these viruses show similarities, viruses are highly dependent on the particularities of the architecture itself (such as the CPU dependency described later on in this chapter) and on the exact load procedure and memory layout. Thus, binary viruses typically depend on the computer architecture. This explains why one computer virus for an Apple II is generally unable to infect an IBM PC and vice versa. In theory, it would be feasible to create a multi-architecture binary virus, but this is no simple task. It is especially hard to find ways to execute the code made for one architecture to run on another. However, it is relatively easy to code to two independent architectures, inserting the code for both in the same virus. Then the virus must make sure that the proper code gets control on the proper architecture. In March of 2001, the PeElf virus proved that it was possible to create a cross-platform binary virus. Virus writers found another way to solve the multi-architecture and operating system issue by translating the virus code to a pseudoformat and then translating it to a new architecture. The Simile.D virus (also known as Etap.D) of Mental Driller uses this strategy to spread itself on Windows and Linux systems on 32-bit Intel (and compatible) architectures. It is interesting to note that some viruses refrain from replication in particular environments. Such an attempt was first seen in the Cascade virus, written by a German programmer in 1987. Cascade was supposed to look at the BIOS of the system, and if it found an IBM copyright, it would refrain from infecting. This part of the virus had a minor bug, so the virus infected all kinds of systems. Its author repeatedly released new versions of the virus to fix this bug, but the newer variants also had bugs in this part of the code8. Another kind of computer virus is dependent on the nature of BIOS updating. On so-called flashable or upgradeable BIOS systems, BIOS infection is feasible. There have been published attempts to do this by the infamous Australian virus-writer group called VLAD. 3.2. CPU Dependency CPU dependency affects binary computer viruses. The source code of programs is compiled to object code, which is linked in a binary format such as an EXE (executable) file format. The actual executable contains the "genome" of a program as a sequence of instructions. The instructions consist of opcodes. For instance, the instruction NOP (no operation) has a different opcode on an Intel x86 than on a VAX or a Macintosh. On Intel CPUs, the opcode is defined as 0x90. On the VAX, this opcode would be 0x01. Thus the sequences of bytes most likely translate to garbage code from one CPU to another because of the differences between the opcode table and the operation of the actual CPU. However, there are some opcodes that might be used as meaningful code on both systems, and some viruses might take advantage of this. Most computer viruses that are compiled to binary format will be CPU-dependent and unable to replicate on a different CPU architecture. There is yet another form of CPU dependency that occurs when a particular processor is not 100% backward compatible with a previous generation and does not support the features of another perfectly or at all. For example, the Finnpoly virus fails to work on 386 processors because the processor incorrectly executes the instruction CALL SP (make a call according to the Stack Pointer). Because the virus transfers control to its decrypted code on the stack using this instruction, it hangs the machine when an infected file is executed on a 386 processor. In addition, a similar error appeared in Pentium processors as well9. Another example is the Cyrix 486 clones, which have a bug in their single-stepping code10. Single-stepping is used by tunneling viruses (see Chapter 6, "Basic Self-Protection Strategies") such as Yankee_Doodle, thus they fail to work correctly on the bogus processors. Note It is not an everyday discovery to find a computer virus that fails because of a bug in the processor. Some viruses use instructions that are simply no longer supported on a newer CPU. For instance, the 8086 Intel CPU supported a POP CS instruction, although Intel did not document it. Later, the instruction opcode (0x0f) was used to trap into multibyte opcode tables. A similar example of this kind of dependency is the MOV CS, AX instruction used by some early computer viruses, such as the Italian boot virus, Ping Pong: OpcodeAssembly Instruction 8EC8MOV CS,AX 0EPUSH CS 1FPOP DS Other computer viruses might use the coprocessor or MMX (Multimedia Extensions) or some other extension, which causes them to fail when they execute on a machine that does not support them. Some viruses use analytical defense techniques based on altering the processor's prefetch queue. The size of the prefetch queue is different from processor to processor. Viruses try to overwrite code in the next instruction slot, hoping that such code is already in the processor prefetch queue. Such modification occurs during debugging of the virus code; thus, novice virus code analysts are often unable to analyze such viruses. This technique is also effective against early code emulationbased heuristics scanners. However, the disadvantage of such virus code is that it might become incompatible with certain kinds of processors or even operating systems. 3.3. Operating System Dependency Traditionally, operating systems were hard-coded to a particular CPU architecture. Microsoft's first operating systems, such as MS-DOS, supported Intel processors only. Even Microsoft Windows supported only Intel-compatible hardware. However, in the '90s the need to support more CPU architectures with the same operating system was increasing. Windows NT was Microsoft's first operating system that supported multiple CPU architectures. Most computer viruses can operate only on a single operating system. However, cross-compatibility between DOS, Windows, Windows 95/98, and Windows NT/2000/XP still exists on the Intel platforms even today. Thus, some of the viruses that were written for DOS can still replicate on newer systems. We tend to use less and less old, "authentic" software, however, thus reducing the risk of such infections. Furthermore, some of the older tricks of computer viruses will not work in the newer environments. On Windows NT, for example, port commands cannot be used directly to access the hardware from DOS programs. As a result, all DOS viruses that use direct port commands will fail at some point because the operating system generates an error. This might prevent the replication of the virus altogether if the port commands (IN/OUT operations) occur before the virus multiplies itself. A 32-bit Windows virus that will infect only portable executable (PE) files will not be able to replicate itself on DOS because PE is not a native file format of DOS and thus will not execute on it. However, so-called multipartite viruses are able to infect several different file formats or system areas, enabling them to jump from one operating environment to another. The most important environmental dependency of binary computer viruses is the operating system itself. 3.4. Operating System Version Dependency Some computer viruses depend not only on a particular operating system, but also on an actual system version. Young virus researchers often struggle to analyze such a virus. After a few minutes of unsuccessful test infections on their research systems, they might believe that a particular virus does not work at all. Especially at the beginning of a particular computer virus era, we can see a flurry of computer viruses repeating the same mistakes that make them dependent on some flavor of Windows. For example, the W95/Boza virus does not work on non-English releases of Windows 95, such as the Hungarian release of the operating system. This leads to the discovery that computer viruses might be used to target the computers of one particular nation more than others. For example, Russian Windows systems can be different enough from U.S. versions to become recognizable, enabling the author of a virus, intentionally or unintentionally, to target only a subset of computer users. In general, however, after a virus has been created, its author has very little or no control over exactly where his or her creation will travel. 3.5. File System Dependency Computer viruses also have file system dependencies. For most viruses, it does not matter whether the targeted files reside on a File Allocation Table (FAT), originally used by DOS; the New Technology File System (NTFS), used by Windows NT; or a remote file system shared across network connections. For such viruses, as long as they are compatible with the operating environment's high-level file system interface, they work. They will simply infect the file or store new files on the disk without paying attention to the actual storage format. However, other kinds of viruses depend strongly on the actual file system. 3.5.1. Cluster Viruses Some successful viruses can spread only on a specific file system. For instance, the Bulgarian virus, DIR-II, is a so-called cluster virus, written in 1991. DIR-II has features specific to certain DOS versions but, even more importantly, spreads itself by manipulating key structures of FAT-based file systems. On FAT on a DOS system, direct disk access can be used to overwrite the pointer (stored in the directory entry) to the first cluster on which the beginning of a file is stored. Files are stored on the disk as clusters, and the FAT is used by DOS to put the puzzle pieces together. The DIR-II virus overwrites the pointer in the directory entry that points to the first cluster of a file with a value that directs the disk-read to the virus body, which has been stored at the end of the disk. The virus stores the pointer to the real first cluster of each host program in an encrypted form, in an unused part of the directory entry structure. This is used later to execute the real host from the disk after the virus has been loaded in memory. In fact, when the virus is active in memory, the disk looks normal and files execute normally. Such viruses infect programs extremely quickly because they only manipulate a few bytes in the directory entries on the disk. These viruses are often called "super fast" infectors1. It is important to understand that there is only one copy of DIR-II on each infected disk. Consequently, when DIR-II is not active in memory, the file system appears "cross-linked" because all infected files point to the same start cluster: the virus code. A similar cluster infection technique appeared in the BHP virus on the Commodore 64 in Germany, written by "DR. DR. STROBE & PAPA HACKER" in circa 198611. This virus manipulates with the block entries of host programs stored on Commodore floppy diskettes. I decided to call this special infection technique the cluster prepender method. Let me tell you a little bit more about this ancient creature. Normally, the Commodore 1541 floppy drive can store up to 166KB on each side of a diskette. The storage capacity of each diskette side is split into 664 "blocks" that are 256 bytes each. When BHP infects a program on the diskette, the virus will attempt to occupy eight free blocks for itself. Next, it replaces the "block" pointer in the first block of the host program to point to the virus code instead. Except for the first block, the host program's code will not be moved on the diskette. Instead, the virus will link its own "blocks" with the "blocks" of the host program as a single cluster of blocks. The infected host program will be loaded with the virus in front. Unlike the DIR-II virus, the BHP virus has multiple copies per diskettes. In each infection, eight blocks of free space will be lost on the diskette, but the infected files will not appear to be larger in a directory listing even if the virus is not active in memory. Figure 3.2(1) shows when a BHP-infected program called TEST is loaded for the first time with a LOAD command. When I list the content of the loaded program with the LIST command, a BASIC command line appears as shown in Figure 3.2(2). This SYS command triggers the binary virus code. When I execute the infected program with the RUN command, the 6502 Assembly-written virus gets control. On execution of the virus code, BHP becomes active in memory. Finally, the virus runs the original host program. Figure 3.2(2) shows that a "HI" message is displayed when the loaded virus is executed. This message is displayed by the host program. Figure 3.2. The BHP virus on Commodore 64. When BHP virus is active in memory it becomes stealth just like the DIR-II virus. As shown in Figure 3.2(3), I load the infected TEST program a second time. When I list the content of the program, I see the original host program, a single PRINT command that displays "HI." Thus, the virus is already stealth; as long as the virus code is active in memory, the original content of the program is shown instead of the infected program. In addition, the BHP virus implements a set of basic self-protection tricks. For example, the virus disables restart and reset attempts to stay active in memory. Moreover, BHP uses a self checksum function to check if its binary code was modified or corrupted. As a result, a trivially modified or corrupted virus code will intentionally fail to run. 3.5.2. NTFS Stream Viruses FAT file systems are simple but very inefficient for larger hard disks (in FAT terms, a drive of several Gigabytes is considered very large). Operating systems such as Windows NT demanded modern file systems that would be fast and efficient on large disks and, more importantly, on the large disk arrays that span many Terabytes, as used in commercial databases. To meet this need, the NTFS (NT file system) was introduced. A little-known feature of NTFS is primarily intended to support the multiple-fork concept of Apple's Hierarchical File System (HPS). Windows NT had to support multiple-fork files because the server version was intended to service Macintosh computers. On NTFS, a file can contain multiple streams on the disk. The "main stream" is the actual file itself. For instance, notepad.exe's code can be found in the main stream of the file. Someone could store additional named streams in the same file; for instance, the notepad.exe:test stream name can be used to create a stream name called test. When the WNT/Stream12 virus infects a file, it will overwrite the file's main stream with its own code, but first it stores the original code of the host in a named stream called STR. Thus WNT/Stream has an NTFS file system dependency in storing the host program. Malicious hackers often leave their tools behind in NTFS streams on the disk. Alternate streams are not visible from the command line or the graphical file manager, Explorer. They generally do not increment the file size in the directory entries, although disk space lost to them might be noticed. Furthermore, the content of the alternate streams can be executed directly without storing the file content in a main stream. This allows the potential for sophisticated NTFS worms in the future. 3.5.3. NTFS Compression Viruses Some viruses attempt to use the compression feature of the NTFS to compress the host program and the virus. Such viruses use the DeviceIoControl() API of Windows and set the FSCTL_SET_COMPRESSION control mode on them. Obviously, this feature depends on an NTFS and will not work without it. For example, the W32/HIV virus, by the Czech virus writer, Benny, depends on this. Some viruses also use NTFS compression as an infection marker, such as the WNT/Stream virus. 3.5.4. ISO Image Infection Although it is not a common technique, viruses also attack image file formats of CD-ROMs, such as the ISO 9660, which defines a standard file system. Viruses can infect an ISO image before it is burnt onto a CD. In fact, several viruses got wild spread from CD-R disks, which cannot be easily disinfected afterwards. ISO images often have an AUTORUN.INF file on them to automatically lunch an executable when the CD-ROM is used on Windows. Viruses can take advantage of this file within the image and modify it to run an infected executable. This technique was developed by the Russian virus writer, Zombie, in early 2002. 3.6. File Format Dependency Viruses can be classified according to the file objects they can infect. This short section is an introduction to binary format infectors. Many of the techniques are detailed further in Chapter 4, "Classification of Infection Strategies." 3.6.1. COM Viruses on DOS Viruses such as Virdem and Cascade only infect DOS binary files that have the COM extension. COM files do not have a specific structure; therefore, they are easy targets of viruses. Dozens of variations of techniques exist to infect COM files. 3.6.2. EXE Viruses on DOS Other viruses can infect DOS EXE files. EXE files start with a small header structure that holds the entry point of the program among other fields. EXE infector viruses often modify the entry point field of the host and append themselves to the end of the file. There are more techniques for infecting EXE files than for infecting COM files because of the format itself. EXE files start with an MZ identifier, a monogram of the Microsoft engineer, Mark Zbikowski, who designed the file format. Interestingly, some DOS versions accept either MZ or ZM at the front of the file. This is why some of the early Bulgarian DOS EXE viruses infect files with both signatures in the front. If a scanner recognizes EXE files based on the MZ signature alone, it might have a problem detecting a virus with a ZM signature. Some tricky DOS viruses replace the MZ mark with ZM to avoid detection by antivirus programs, and yet others have used ZM as an infection marker to avoid infecting the file a second time. Disinfecting EXE files is typically more complicated than disinfecting a COM file. In principle, however, the techniques are similar. The header information, just like the rest of the executable, must be restored, and the file must be truncated properly (whenever needed). 3.6.3. NE (New Executable) Viruses on 16-bit Windows and OS/2 One of the first viruses on Windows was W16/Winvir. Winvir uses DOS interrupt calls to infect files in the Windows NE file format. This is because early versions of Windows use DOS behind the scene. NE files are more complicated in their structure than EXE files. Such NE files start with an old DOS EXE header at the front of the file, followed by the new EXE header, which starts with an NE identifier. One of the most interesting NE virus infection techniques was developed in the W16/Tentacle_II family, which was found in the wild in June 1996 in the U.S., U.K., Australia, Norway, and New Zealand. Not only was Tentacle_II in the wild, but it was also rather difficult to detect and repair because it took advantage of the complexity of the NE file format. This virus is discussed further in Chapter 4. 3.6.4. LX Viruses on OS/2 Linear eXecutables (LXs) were also introduced in later versions of OS/2. Not many viruses were ever implemented in them, but there are a few such creations. For instance, OS2/Myname is a very simple overwriting virus. Myname uses a couple of system calls, such as DosFindFirst(), DosFindNext(), DosOpen(), DosRead(), and DosWrite(), to locate executables and then overwrites them with itself. The virus searches for files with executable extensions in the current directory. It does not attempt to identify OS/2 LX files for infection; it simply overwrites any files with its own copy. Nonetheless, OS2/Myname is dependent on the LX file format and OS/2 environment for execution given that the virus itself is an LX executable. The OS2/Jiskefet version of the virus also overwrites files to spread itself. This virus looks specifically for files with a New Executable header that starts with the LX mark: cmp word ptr [si], 'XL' jnz NO The header of the file is loaded by the virus, and the si (source index) register is used as an index to check for the mark. If the marker is missing, the virus will not overwrite the file. As a result, Jiskefet is more dependent on the LX file format than Myname. 3.6.5. PE (Portable Executable) Viruses on 32-bit Windows The first virus known to infect PE files was W95/Boza, written by members of the Australian virus-writing group, VLAD, for the beta version of Windows 95. The virus was named Bizatch by its authors but got its current name, Boza, from Vesselin Bontchev. He called the virus Boza, referring to a bizarre Bulgarian drink with color and consistency of mud that is disliked by most non-Bulgarians. Bontchev picked the name not only because Boza sounds similar to "Bizatch," but also because the virus was "buggy and messily written." The Bulgarian idiom, "This is a big boza," means "this is extremely messy and unclear." Quantum, the virus writer, was unhappy about this, which was Bontchev's intention in choosing the name. In fact, other viruses attacked antivirus software databases to change the name of Boza to Bizatch so that the original name would be displayed when an antivirus program detected it. This illustrates the psychological battle waged between virus writers and antivirus researchers. Because PE file infection is currently one of the most common infection techniques, I will provide more information about it in Chapter 4. Many binary programs use the PE file format, including standard system components, regular applications, screen-saver files, device drivers, native applications, dynamic link libraries, and ActiveX controls. The new 64-bit PE+ files are already supported by 64-bit architectures, such as IA64, AMD64, and EM64T. Computer virus researchers expected that 64-bit Windows viruses will appear to infect this format correctly with native 64-bit virus code. The W64/Rugrat.334413 virus appeared in May 2004, written by the virus writer "roy g biv." Rugrat is written in IA64 Assembly. The virus is very compactabout 800 lines. Rugrat utilizes modern features of the Itanium processor, such as code predication. In addition, roy g biv released the W64/Shruggle virus during the summer of 2004. W64/Shruggle infects PE+ files that run on the upcoming 64-bit Windows on AMD64. 3.6.5.1 Dynamic Link Library Viruses The W95/Lorez virus was one of the first 32-bit Windows viruses that could infect a dynamic link library (DLL). A Windows DLL uses the same basic file format as regular PE executables. Dynamic linked libraries export functions that other applications can use. The interface between applications and dynamic link libraries is facilitated by exports from DLLs and imports into the executables. Lorez simply infects the user mode KERNEL client component, KERNEL32.DLL. By modifying the DLL's export directory, such viruses can hook an API interface easily. DLL infection became increasingly successful with the appearance of the Happy99 worm (also known as W32/SKA.A, the worm's CARO name), written by Spanska in early 1999. Figure 3.3 is a capture of Happy99's fireworks payload. Figure 3.3. The Happy99 worm's payload. Just as many other worms are linked to holidays, this worm took advantage of the New Year's period by mimicking an attractive New Year's card application. Happy99 injected a set of hooks into the WSOCK32.DLL library, hooking the connect() and send() APIs to monitor access to mail and newsgroups. Happy99 started a debate about computer malware classifications by carrying the following message for researchers: Is it a virus, a worm, a trojan? MOUT-MOUT Hybrid (c) Spanska 1999. 3.6.5.2 Native Viruses Recently, a new kind of 32-bit Windows virus is on the rise: native infectors. The first such virus, W32/Chiton, was created by the virus writer, roy g biv, in late 2001. Unlike most Win32 viruses, which depend on calling into the Win32 subsystem to access API functions to replicate, W32/Chiton can also replicate outside of the Win32 subsystem. A PE file can be loaded as a device driver, a GUI Windows application, a console application, or a native application. Native applications, such as autochk.exe, load during boot time. Because they load before subsystems are available, they are responsible for their own memory management. In their file headers, the PE.OptionalHeader.Subsystem value is set to 0001 (Native). The HKLM\System\CurrentControlset\Control\Session Manager\BootExecute value contains the names and arguments of native applications that are executed by the Session Manager at boot time. The Session Manager looks for such applications in the Windows\System32 directory, with the native executable names specified. Native applications use the NTDLL.DLL (Native API), where hundreds of APIs are stored and remain largely undocumented by Microsoft. Native applications do not rely on the subsystem DLLs, such as KERNEL32.DLL, as these DLLs are not yet loaded when native applications load. There are only a handful of APIs that a computer virus needs to be able to call from NTDLL.DLL, and virus writers have already discovered the interface for these functions and their parameters. W32/Chiton relies on the following NTDLL.DLL APIs for memory, directory, and file management: 1. Memory management: • RtlAllocateHeap() • RtlFreeHeap() 2. Directory and file search: • RtlSetCurrentDirectory_U() • RtlDosPathNameToNtPathName_U() • NtQueryDirectoryFile() 3. File management: • NtOpenFile() • NtClose() • NtMapViewOfSection() • NtUnmapViewOfSection() • NtSetInformationFile() • NtCreateSection() Native viruses can load very early in the boot process, which gives them great flexibility in infecting applications. Such viruses are similar in structure to kernel-mode viruses. Therefore, it is expected that kernel-mode and native infection techniques will be combined in the future. 3.6.6. ELF (Executable and Linking Format) Viruses on UNIX Viruses are not unknown on UNIX and UNIX-like operating systems, which generally use the ELF executable file format14. Typically, ELF files do not have any file extensions, but they can be identified based on their internal structure. Just like PE files, ELF files can support more than one CPU platform. Moreover, ELF files can properly support 32-bit as well as 64-bit CPUs in their original design, unlike PE files, which needed some minor updates to make them compatible with 64-bit environments (resulting in the PE+ file format). ELF files contain a short header, and the file is divided into logical sections. Viruses that spread on Linux systems typically target this format. Most Linux viruses are relatively simple15. For instance, the Linux/Jac.8759 virus can only infect files in the current folder. One of the most complex Linux viruses is {W32,Linux}/Simile.D (also known as Etap.D), which was the first entry-point obscuring Linux virus (more on this in Chapter 4). Of course, Simile.D's success will depend on how well security settings are used in the file system. Writeable files will be infected; however, the virus does not elevate privileges to infect files. It seems likely that future computer worm attacks (such as Linux/Slapper) will be combined with ELF infection on Linux. The elevated privileges often gained by exploiting network services can result in better access to binary files. The main problem for ELF viruses is the missing binary compatibility between various flavors of UNIX systems. The diversification of the binaries on various CPUs introduces library dependency. Because of this, many ELF-infecting viruses suffer serious problems and crash with core dumps rather than causing infections. 3.6.7. Device Driver Viruses Device driver infectors were not very common in the DOS days, although virus writer magazines such as 40Hex dedicated early articles to the subject. Device drivers for popular operating systems tend to have their own binary format, but as these are special forms of the more general executable formats for those platforms, all can be infected with known virus infection techniques. For example, 16-bit Windows drivers must be in the LE (linear executable) format. LE is very similar to the OS/2 LX file format. Of course, viruses can infect such files, too. On Windows 9x, the VxD (virtual device driver) file format was never officially documented by Microsoft for the general public's use. As a result, only a few viruses were created that could infect VxD files. For example, W95/WG can infect VxD files and modify their entry point to run an external file each time the infected VxD is loaded. Consequently, only the entry-point code of the VxD is modified to load the virus code from the external source. Other viruses, such as the W95/Opera family, infect VxD files by appending the virus code to the end of the file and modifying the real mode entry point of the VxD to run themselves from it. Recently, device driver infectors appeared on Windows XP systems. On NT-based systems, device drivers are PE files that are linked to NT kernel functions. The few such viruses that exist today hook the INT 2E (System Service on IA32-based NT systems) interrupt handler directly in kernel mode to infect files on the fly. For example, WNT/Infis and W2K/Infis families can infect directly in Windows NT and Windows 2000 kernel mode. The W32/Kick virus was created by the Czech virus writer, Ratter, in 2003. W32/Kick infects only SYS files in the PE device driver format. The virus loads itself into kernel mode memory but runs its infection routine in user mode to infect files through the standard Win32 API. Note More information about in-memory strategies of computer viruses is available in Chapter 5, "Classification of In-Memory Strategies." 3.6.8. Object Code and LIB Viruses Object and LIB infections are not very common. There are only about a dozen such viruses because they tend to be dependent on developer environments. Source code is first compiled to object code, and then it is linked to an executable format: Source Code - Object code / Library code - Executable. Viruses that attack objects or libraries can parse the object or library format. For instance, the Shifter virus16 can infect object files. Such viruses spread in a couple of stages as shown in Figure 3.4. Figure 3.4. The infection stages of the Shifter virus. Shifter was written by Stormbringer in 1993. The virus carefully checks whether an object file is ready to be linked to a COM, DOS executable. This is done by checking the Data Record Entry offset of object files. If this is 0x100, the virus attempts to infect the object in such a way that once the object is linked, it will be in the front of the COM executable. 3.7. Interpreted Environment Dependency Several virus classes depend on some sort of interpreted environment. Almost every major application supports users with programmability. For example, Microsoft Office products provide a rich programmable macro environment that uses Visual Basic for Applications. (Older versions of Word, specifically Word 6.0/Word 95, use WordBasic.) Such interpreted environments often enhance viruses with multi-platform capabilities. 3.7.1. Macro Viruses in Microsoft Products Today there are thousands of macro viruses, and many of them are in the wild. Users often exchange documents that were created with a Microsoft Office product, such as Word, Excel, PowerPoint, Visio, or even Access or Project. The first wild-spread macro virus, WM/Concept.A17, appeared in late 1995. Within a couple of months, only a few dozen such viruses were found, but by 1997 there were thousands of similar creations. The XM/Laroux18, discovered in 1996, was the first wild-spread macro virus to infect Excel spreadsheets. The first known Word macro virus was WM/DMV, written in 1994. The author of the WM/DMV virus also created a nearly functional Excel macro (XM) virus at the same time. Figure 3.5 illustrates a high-level view of an OLE2 file used by Microsoft products. Microsoft does not officially document the file structure for the public. Figure 3.5. A high-level view of the OLE2 file format. Please note that Microsoft products do not directly work with OLE2 files. As a result, technically a macro virus in any such Microsoft environments does not directly infect an OLE2 file because Microsoft products access these objects through the OLE2 API. Also note that different versions of such Microsoft programs use different languages or different versions of such languages. In the front of OLE2 files, you can find an identifier, a sequence of hex bytes "D0 CF 11 E0," which looks like the word DOCFILE in hex bytes (with a lowercase L). These bytes can appear in both big-endian and little-endian formats. Other values are supported by various beta versions of Microsoft Office products. The header information block contains pointers to important data structures in the file. Among many important fields, it contains pointers to the FAT and the Directory. Indeed, the OLE2 file is analogous to MS-DOS FAT-based storage. The problem is that OLE2 files have an extremely complex structure. They are essentially file systems in a file with their own clusters, file allocation table, root directory, subdirectories (called "storages"), files (called "streams"), and so on. The basic sector size is 512 bytes, but larger values are also allowed. (In some implementations, a mini-FAT19 allows even shorter "sector" sizes.) Office products locate macros by looking in the Directory of an OLE2 file for the VBA storage folder. The macros appear as streams inside the document. Obviously, any objects can get fragmented, as in a real file systemcorruptions of all kinds are also possible, including circular FAT or Directory entries, and so on. Unfortunately, even macros can get corrupted; as you will see, this fact contributes to the natural creation of new macro virus variants. In addition, documents have a special bit inside, the so-called template bit. WinWord 6/7 does not look for macros if the template bit is off20. Macro viruses are stored inside the document instead of at the front or at the very end of the file. Even worse, the macros are buried inside some of the streams, and the streams themselves have a very complex structure. When looking at the physical OLE2 document, without understanding its structure, the (otherwise logically continuous) body of a macro of a macro virus could be split into chunkssome of them as small as 64 bytes. A major challenge is the protection of user macros in the documents during the removal of virulent macros. In some cases, it is simply impossible to remove a macro virus safely without also removing the user macros. Obviously, users prefer to keep their own macros and remove the viruses from them, but such acrobatics are not always possible. Macro viruses are much easier to create than other kinds of file infectors. Furthermore, the source of the virus code is available to anybody with the actual infection. Although this greatly simplifies the analysis of macro viruses, it also helps attackers because the virus source code can be accessed and modified easily. To understand the internal structure of OLE2 documents better, look at a comment fraction of the W97M/Killboot.A virus in Microsoft's DocFile Viewer application, shown in Figure 3.6. DocFile Viewer is available as part of Microsoft Visual C++ 6.0. This tool can be used to browse the document storage and find the "ThisDocument" stream in the Macros\VBA directory. Figure 3.6. The W97M/Killboot.A virus in DocFile Viewer. The ThisDocument stream can be further browsed to find the virus code. In Figure 3.6, a comment by the virus writer can be seen encoded as VBA code: E0 00 00 00 39 00 73 65 74 20 74 68 65 20 64 61 ....9.set the da 79 20 6F 66 20 41 72 6D 61 67 65 64 64 6F 6E 2C y of Armageddon, 20 74 68 65 20 32 39 74 68 20 64 61 79 20 6F 66 the 29th day of 20 74 68 65 20 6E 65 78 74 20 6D 6F 6E 74 68 00 the next month.  The 0xE0 opcode is used for comments. The 0x39 represents the size of the comment. Thus the preceding line translates to 'set the day of Armageddon, the 29th day of the next month The opcode itself is VBA version-specific, so the 0xE0 byte can change to other values, resulting in Word up-conversion and down-conversion issues21. One of the most interesting aspects of macro viruses is that they introduced a new set of problems not previously seen in such quantities with any other type of computer virus. 3.7.1.1 Macro Corruption Many macro viruses copy themselves to new files using macro copy commands. A macro virus can copy itself into a new document in this way, often attacking the global template called NORMAL.DOT first and then copying itself from the global template back to user documents. A natural mutation often occurs in Microsoft Word environments22. The real reason for the corruption was never found, but it is believed to be connected to saving documents on floppy disks. Some users simply did not wait until the document was written perfectly to disk, which can result in a couple of bytes of corruption in the macro body. Because Word interprets the VBA code line by line, it will not generate an error message unless the faulty code is about to be executed1. As demonstrated earlier, macros are stored as binary data in Word documents. When the binary of the macro body gets corrupted, the virus code often can survive and work at least partially. The problem is that such corruptions are so common that often hundreds of minor variants of a single macro virus family are created by the "mutation engine" of Microsoft Word itself! For instance, the WM/Npad family has many members that are simply natural corruptions, which are not created intentionally. Corrupted macro viruses can often work after corruption. There are several reasons for this commonly observed behavior: • The VBA code necessary to copy a macro to another document is very short. • Even a single working macro can copy dozens of corrupted macros. • The corruption's side effects might only appear in conditional cases. • The corruption happens after the replication of viral code. • The virus supports an "On Error Resume Next" handler. Consider the example shown in Listing 3.1. Listing 3.1. A Corrupted Macro Example Sub MAIN SourceMacro$= FileName$()+ "Foobar" DestinationMacro$ = "Global:Foobar"

MacroCopy(SourceMacro$, DestinationMacro$)

// Corruption here //
End Sub

Because most macro viruses include an error handler at the beginning of their code, macro virus compilation and execution has tended to be resilient to all but the most traumatic corruptions.

Because many AV products use checksums to detect and identify macro viruses, the antivirus software can get confused by the corrupted macro virus variants. Using checksums is the only way to exactly identify each different variant.

Other types of viruses, such as Assembly-written viruses on DOS, most often fail immediately when the slightest corruption occurs in them. However, macro viruses often survive the corruption because the actual replicating instructions are so short in the macro body.

3.7.1.2 Macro Up-Conversion and Down-Conversion

When creating Word 97 and additional support for VBA, Microsoft decided to create new document formats and started to use a different, even richer macro language. To solve compatibility problems for customers, they decided to automatically convert old macros to the new formats. As a result, when a macro virus in the Word 95 WordBasic format was opened with the newer editions of Word, the virus might be converted to the new environment, creating a new virus. As a result, WM viruses are often converted to W97M format, and so on.

The macro up-conversion issue generated many problems for antivirus researchers that went beyond simple technicalities. Some researchers believed it was not ethical to up-convert all old macro viruses to the new format, while others believed it was the only choice to protect customers. Today, techniques are available23 to convert the different macro formats to a canonical form; thus, detection can be done on the canonical form using a single definition. This greatly simplifies the macro detection problems and reduces the antivirus scanner's database growth because less data need to be stored to detect the viruses, and the virus code no longer needs to be replicated on more than one Office platform.

3.7.1.3 Language Dependency

Given that Microsoft translated basic macro commands, such as FileOpen, into different language versions for Office products, most viruses that use these commands to infect files cannot spread to another language version of Microsoft Office, such as the German edition.

Table 3.1 lists some of the most common macro names in Microsoft Word in various localized versions.

Table 3.1. Common Macro Names in Microsoft Word in Some Localized Versions

EnglishFinnishGerman
FileNewTiedostoUusiDateiNeu
FileCloseTiedostoSuljeDateiSchliesen
FileSaveAsTiedostoTallennaNimmelläDateiSpeichernUnter
FileTemplatesTiedostoMallitDateiDokVorlagen
ToolsMacroTyökalutMacroExtrasMakro
SpanishFrenchItalian
ArchivoNuevoFichierNouveauFileNuovo
ArchivoAbrirFichierOuvrirFileApri
ArchivoCerrarFichierFermerFileChiudi
ArchivoGuardarFichierEnregisterFileSalva
ArchivoGuardarComoFichierEnregisterSousFileSalvaConNome
ArchivoPlantillasFichierModulesFileModelli
HerramMacroOutilsMacroStrumMacro

Various Office products use different versions of these macro names. A few common examples can be found in Table 3.2 for English Microsoft Office products.

Table 3.2. Differences in Macro Names Between Word and Excel

Microsoft WordMicrosoft Excel
AutoCloseAuto_Close
AutoOpenAuto_Open

The WM/CAP.A24 virus is an example of language independence, because it uses menu indexes. Using menu indexes was strongly recommended to macro developers by the Microsoft Access team. Of course, menu indexes only work reliably if the host environment has not been customized.

The WM/CAP.A virus also fools users to believe that they are saving their files in RTF (Rich Text Format) when, in fact, they are saving them as infected DOC files instead. Users would prefer to save files as RTF to avoid saving active macros into documents. The virus takes over the File/SaveAs... operation for this trick25.

3.7.1.4 Platform Dependency of Macro Viruses

Although most macro viruses are not platform-dependent, several have introduced some sort of dependency on the actual platform. Microsoft Office products are used not only on Windows but also on Macintosh systems. Not all macro viruses, however, are able to work on both platforms because of the following common reasons.

Win32 function calls
A few macro viruses define API function calls for their own use from the Win32 set of Windows. Such viruses might fail to replicate on the Mac because the API is not implemented on it. For instance, the virus WM/Hot.A used the GetWindowsDirectory() API calls in January, 199626.
Declare Function GetWindowsDirectory Lib "KERNEL.EXE" \
(Buffer As String, Size As Integer) As Integer
:
:
GetWindowsDirectory(WinPath$, SizeBuf) Tricky macro viruses use Win32 callback functions to run code outside of the context of the macro interpreter. For instance, a simple string variable is defined that has encoded Assembly code. Often the chr() function is used to build larger strings that contain code. Then the callback routine is used to run the string directly as code. This way, the macro virus jumps out of the context of the macro interpreter and becomes CPU and platform -dependent. For example, the {W32, W97M}/Heathen.12888 virus uses the CallBack12(), CallBack24(), and CreateThread() APIs of KERNEL32.DLL to achieve infection and dropping mechanism of both documents and 32-bit executables. Location of files in storage Another key difference among operating system platforms is the location of files on the disk. Some macro viruses use hard-coded path names, such as the location of the NORMAL.DOT template on the C: drive. Obviously, they cannot work on the Mac. In addition, viruses often assume a Windows-style file system, even if they use the "correct" VBA methods to get the configured folder locations. Registry modifications Some macro viruses modify Registry keys on Windows systems to introduce extra tricks or store variables. Such viruses introduce OS dependencies as a result. 3.7.1.5 Macro Evolution and Devolution Macro viruses consist of a single macro or set of macros. Because these individual macros must be recognized by the antivirus programs on a macro-to-macro basis, a set of interesting problems occurs. Some macro viruses will copy more than their own set of macros. They can snatch macros from the documents they had infected previously. This way, the virus might evolve into new forms naturally. Some viruses will lose macros from their sets and thus will naturally devolve27 to other forms. There are also sandwiches28, which are created when more than one macro or script virus shares a macro name or script file. A set of dangerous situations was introduced because of antivirus detection and disinfection, one of which was found by Richard Ford29. The problem occurs when an antivirus product detects a subset of known macros ("macro virus remnants") from a set of macros in a newer virus that has at least one new macro among the other, older known macros. If the antivirus product removes the known macros, it could create a new virus by leaving a macro or set of macros in the document that is still part of the virus and often remains viral itself. This problem can be avoided in several different ways, one of which is to remove all macros from infected documents (although this means removing user macros from the documents also). Researchers also suggested defining a minimal set of macros from a known virus to "safely remove" a set of viral macros from a document. However, there is a natural extension of Richard Ford's problem, which was found by Igor Muttik, described in a scientific paper of Vesselin Bontchev in detail30. This is known as "Igor's problem." Suppose there is a virus known as Foobar that consists of a single macro called M. The antivirus program identifies M in an infected document, but when it attempts to disinfect the document, a problem occurs. This happens because there is a variant of the Foobar virus in the document. This variant of Foobar consists of { M, P} macros. Unfortunately, macro P is not known to the antivirus program; thus, whenever the antivirus removes macro M, it will leave P behind. The major problem is that P could be a fully functional virus on its own. Consequently, an antivirus program, even with exact identification for Foobar, would create a new virus by accident when repairing a document in such situation. Indeed, sometimes, it is dangerous to remove a macro virus without removing all macros from the document. The environment of the malicious programs and agents within the environment of the programs can make changes in computer viruses that result in newly evolved or devolved creatures. In addition, multiple infections of different macro viruses in the same document can lead to "crossed" threats and behavior. Indeed, viruses can become "sexual" by accident: They can exchange their macros ("genes") and evolve and devolve accordingly. 3.7.1.6 Life Finds a WaySource, P-code, and Execode Microsoft file formats had to be reverse-engineered by AV companies to be able to detect computer viruses in them. Although Microsoft offered information to AV developers about certain file formats under NDA, the information received often contained major bugs or was incomplete31. Some AV companies were more successful in their reverse-engineering efforts than others. As a result, a new kind of expert quickly emerged at AV companies: the file format expert. Among the best file format experts are Vesselin Bontchev, Darren Chi, Peter Ferrie, Andrew Krukov ("Crackov"), Igor Muttik, and Costin Raiu to just name a few. Starting with VBA5 (Office 97), documents contain the compressed source of the macros, as well as their precompiled code, called p-code (pseudocode), and execode. Execode is a further optimization of p-code that simply runs without any further checks because its state is self-contained. A problem appears because under the right circumstances, any of these three forms can run. Unfortunately, some AV companies produced products that occasionally corrupted the documents they repaired. In other cases, the products removed any of the three forms, without removing at least one of the other two. For example, some antivirus programs might remove the p-code, but they leave the source behind. Normally the p-code would run first. The VBA Editor also displays decompiled p-code as "a source" for macros, instead of using the actual source code of macros which are saved in the documents. Given the right circumstances, however, when the p-code is removed but the source is not, the virus might be revived. This happens when the document is created in Office 97 but is opened with Office 2000. Most viruses break without the source because they often use a function such as MacroCopy() that copies the source. In other cases such as worms, however, the macro will continue to function properly because it does not refer to its source. In some other cases, the execode might run on its own without source and p-code in the document. If the VBA project does contain execode and is opened by the same version of the Office application as the one that created it, the execode runs, and everything else is ignored. In fact, antivirus researchers experienced a case with the X97M/Jini.A virus where both the p-code and the source were removed from a document, but the execode was left behind when an antivirus program "cleaned" the document. The virus runs from the execode when the infected document is opened in the same version of Office that created it26; thus, some of the "half-cooked repaired" viruses can still function and infect further. Life finds a way, so to speak! Indeed, not all viruses will survive, but those that do don't need to refer to their sources or modules. Jini survives because it does not copy any modules. Instead, it copies the victim's data sheets to the workbook where it resides and then overwrites the file of the victim with the file in which it resides. Of course these tricky cases introduce major problems for the antivirus programs. Viruses that exist only in execode form are especially hard to detect. (So far, Microsoft has not provided information about this format to AV developers even under NDA26.) 3.7.1.7 Macro Viruses in the Form of the Multipartite Infection Strategy There are a couple of binary viruses that attempt to infect documents. These viruses are not primarily dependent on the interpreted environments. For instance, the multipartite virus, W32/Coke, drops a specially infected global template with a little loader code. This loader will fetch polymorphic macro code (as discussed in Chapter 7) from a text file into the global template. As a result, Coke is one of the most polymorphic binary viruses, as well as a macro virus. Polymorphic macro viruses are usually very slow because of many iterations required to run their code. However, it is normally the polymorphic engine that is slow. Because Coke generates polymorphic macro virus code in a text file using its Win32 code, the polymorphic macro of Coke is not as slow as most polymorphic macro viruses based on macro polymorphic engines. Other viruses do not need Word to infect Office documents. These viruses are very rare and usually very buggy. Even the Word 6 file format is complicated enough to parse and modify it in such a way that a macro is inserted in the file. The W95/Navrhar virus injects macro code to load a binary file from the end of the Word document. Thus, Navrhar can infect documents without Word installed on the system. 3.7.1.8 New Formula Another set of problems occurred because Excel not only supported standard macros, but formula macros as well. As you might expect, formulas are not stored with macros; therefore, their locations had to be identified. Viruses that need the Microsoft Excel Formula language to replicate are predicated with the XF/ tag. Excel macros are stored in the Excel macro module area, but Excel formulas are stored in the Excel 4 macro area instead. Therefore, these viruses are not visible via the Tools/Macro menu, and users must create a special macro to find them. The first such virus, known as XF/Paix32, was of French origin. 3.7.1.9 Infection of User Macros Most macro viruses replicate their own set of macros to other documents. However, infection is also possible by modifying existing user macros to spread the virus code, similar to the techniques of binary infectors. In practice, very few macro viruses use these parasitic techniques. This is because most of the documents do not contain user macros, and thus the spreadability of such parasitic macro viruses is seriously limited. (In addition, macro viruses often delete any existing macros in the objects they are infecting.) This kind of macro virus is very difficult to detect and remove with precision. 3.7.1.10 New File Formats: XML (Extensible Markup Language) Microsoft Office 2003 introduced the ability to save documents in XML, textual format. This caused a major headache for antivirus developers, who must parse the entire file to find the embedded, encoded OLE2 files within such documents and then locate the possible macros within them. Currently, Word and Visio 2003 support the XML format with embedded macros33. Initially, such documents did not have any fields in their headers that would indicate whether or not macros were stored in them. Microsoft changed the file format of Word slightly in the release of this version due to pressure from the AV community. Visio 2003, however, was released without any such flags, leaving no choice for AV software but to parse the entire XML file to figure out whether there are macros in it. Thus, the overhead of scanning increases dramatically and is particularly noticeable when files are scanned over the network. Note XML infection was considered by virus writers years ago using VBS (Visual Basic Script) code. The idea was that an XML file can contain a Web link to reference code that is stored in an XSL (Extensible Stylesheet Language) file. This technique was first proposed by the virus writer, Rajaat, and was later introduced by the W32/Press virus of Benny. 3.7.2. REXX Viruses on IBM Systems IBM has a long tradition of implementing interpreted language environments. Examples include the powerful Job Control languages on mainframe systems. IBM also introduced the REXX command script language to better support both large batch-like installations and simple menu-based installation programs. Not surprisingly, virus writers used REXX to create new script viruses. In fact, some of the first mass-mailer script viruses, such as the infamous CHRISTMA EXEC34 worm, were written in REXX. The worm could execute on machines that supported the REXX interpreter on an IBM VM/CMS system and were also connected to a network. This worm was created by a German Informatics student35 in 1987. CHRISTMA EXEC displayed the Christmas tree and message shown in Figure 3.7 when the REXX script was executed by the user. Obviously, such viruses rely on social engineering for their execution on remote systems. However, users were happy to follow the instructions in the source of the script. The worm looked around for user IDs on the system and used the CMS command SENDFILE (or SF in short form) to send CHRISTMA EXEC files to other users. Figure 3.7. A snippet of the CHRISTMA EXEC worm. At one point, such viruses were so common that IBM had to introduce a simple form of content filtering on its gateways to remove them. REXX interpreters were made available on other IBM operating systems, such as OS/2, as well; thus, a few REXX viruses appeared on OS/2. 3.7.3. DCL (DEC Command Language) Viruses on DEC/VMS The Father Christmas worm was released in 1988. This worm attacked VAX/VMS systems on SPAN and HEPNET. It utilized DECNET protocols instead of Internet TCP/IP protocols and exploited TASK0, which allows outsiders to perform tasks on the system. This worm made copies of itself as HI.COM. Although DOS COM files have a binary format, the DCL files with COM extensions are simple text files. The worm sent mail from the infected nodes; however, it did not use e-mail to propagate itself. In fact, this worm could not infect the Internet at all. It attacked remote machines using the default user account and password and copied itself line by line (151 lines) to the remote machine. Then the worm exploited TASK0 to execute its own copy remotely. It used the SET PROCESS/NAME command to run itself as a MAIL_178DC process on the remote node36. Father Christmas mailed users on other nodes the following funny message:$ MAILLINE0 = "HI,"
$MAILLINE1 = ""$ MAILLINE2 = " HOW ARE YA ? I HAD A HARD TIME PREPARING ALL THE PRESENTS."
$MAILLINE3 = " IT ISN'T QUITE AN EASY JOB. I'M GETTING MORE AND MORE"$ MAILLINE4 = " LETTERS FROM THE CHILDREN EVERY YEAR AND IT'S NOT SO EASY"
$MAILLINE5 = " TO GET THE TERRIBLE RAMBO-GUNS, TANKS AND SPACE SHIPS UP HERE A T"$ MAILLINE6 = " THE NORTHPOLE. BUT NOW THE GOOD PART IS COMING."
$MAILLINE7 = " DISTRIBUTING ALL THE PRESENTS WITH MY SLEIGH AND THE"$ MAILLINE8 = " DEERS IS REAL FUN. WHEN I SLIDE DOWN THE CHIMNEYS"
$MAILLINE9 = " I OFTEN FIND A LITTLE PRESENT OFFERED BY THE CHILDREN,"$ MAILLINE10 = " OR EVEN A LITTLE BRANDY FROM THE FATHER. (YEAH!)"
$MAILLINE11 = " ANYHOW THE CHIMNEYS ARE GETTING TIGHTER AND TIGHTER"$ MAILLINE12 = " EVERY YEAR. I THINK I'LL HAVE TO PUT MY DIET ON AGAIN."
$MAILLINE13 = " AND AFTER CHRISTMAS I'VE GOT MY BIG HOLIDAYS :-)."$ MAILLINE14 = ""
$MAILLINE15 = " NOW STOP COMPUTING AND HAVE A GOOD TIME AT HOME !!!!"$ MAILLINE16 = ""
$MAILLINE17 = " MERRY CHRISTMAS"$ MAILLINE18 = "                AND A HAPPY NEW YEAR"
$MAILLINE19 = ""$ MAILLINE20 = "                        YOUR FATHER CHRISTMAS"

3.7.4. Shell Scripts on UNIX (csh, ksh, and bash)

Most UNIX systems also support script languages, commonly called shell scripts. These are used for installation purposes and batch processing. Naturally, computer worms on UNIX platforms often use shell scripts to install themselves. Shell scripts have the advantage of being able to run equivalently on different flavors of UNIX. Although binary compatibility between most UNIX systems is not provided, shell scripts can be used by attackers to circumvent this problem. Shell scripts can use standard tools on the systems, such as GREP, that greatly enhances the functionality of the viruses.

Shell scripts can implement most of the known infection techniques, such as the overwriter, appender, and prepender techniques. In 2004 some new worms appeared such as SH/Renepo.A that use bash script to copy themselves into the StartupItems folders of mounted drives on MAC OS X. This indicates a renewed interest of worm developments on MAC OS X. In addition, threats like Renepo exposes MAC OS X systems to a flurry of attacks by turning the firewall off, run the popular password cracker tool John The Ripper, and create new user accounts for the attackers. However, current attacks require root privileges.

(It is expected that MAC OS X will be the target of future remote exploitation attacks as well.)

3.7.5. VBScript (Visual Basic Script) Viruses on Windows Systems

Windows script viruses appeared after the initial macro virus attack period was over. The [email protected] worm spread very rapidly around the world in May of 2000. LoveLetter arrived with a simple message with the subject ILOVEYOU, as shown in Figure 3.8. The actual attachment has a "double extension." The "second" extension is VBS, which is necessary to run the attachment as a Visual Basic Script. This "second" extension is not visible unless the Windows Explorer Folder option Hide File Extensions for Known File Types is disabled. By default, this option is enabled. As a result, many novice users believed they were clicking a harmless text file, a "love letter."

Figure 3.8. Receiving a "love letter".

On execution of the attachment, the VBS file runs with the script interpreter WSCRIPT.EXE. Mass-mailer VBS script worms typically use Outlook MAPI functions via CreateObject ("Outlook.Application") followed by the NameSpace ("MAPI") method to harvest e-mail addresses with AddressLists(), and then they mass-mail themselves as an attachment to recipients via the Send() method. In this way, many users receive e-mail from people they know. As a result, many recipients are curious enough to run the attachmentoften on more than one occasion.

VBS viruses can use extended functionality via ActiveX objects. They have access to file system objects, other e-mail applications, and locally installed ActiveX objects.

3.7.6. BATCH Viruses

BATCH viruses were not particularly successful in the DOS years. Several unsuccessful attempts were made to develop in-the-wild BATCH viruses, none of which actually became wild. Nevertheless, common infection types, such as the prepender, appender, and overwriting techniques, were all developed as successful demonstrations. For example, BATCH files can be attacked with the appender technique by placing a goto label instruction at the front of the file and appending the extra lines of virus code to the end after the label.

BATCH viruses are also combined with binary attacks. BATVIR uses the technique of redirecting echo output to a DEBUG script; thus, the virus is a textual BATCH command starting with

rem [BATVIR] '94 (c) Stormbringer [P/S]

This is followed by a set of echo commands to create a batvir.94 file with the DEBUG script. The DEBUG command receives a G GO command via the script and runs the binary virus without ever creating it in a new file.

BAT/Hexvir uses a similar technique, but it simply echoes binary code into a file and runs that as a DOS COM executable to locate and infect other files.

Some other tricky BATCH viruses use the FOR % IN () commands to look for files with the BAT extension and insert themselves into the new files in packed form using PKZIP. BAT/Zipbat uses PKUNZIP on execution of the infected BATCH files to extract a new file called V.BAT, which will infect other files by placing itself in them, again in zipped form. Members of the BAT/Batalia family use the compressor, ARJ, instead. Batalia, however, uses random passwords to pack itself into BATCH files.

Similar to BAT/Zipbat, the BAT/Polybat family also uses the PKZIP and PKUNZIP applications to pack and unpack itself at the ends of files. Polybat is practically a polymorphic virus. The virus inserts garbage patterns of percent signs (%) and ampersands (&) that are ignored during normal interpretation. For instance, the ECHO OFF command is represented in some way similar to the following:

@ec%&%h%&%o o%&%f%&%f

@e%&%ch%o&% %&o%f%f&%

BATCH viruses, or at least multi-component viruses with a significant BATCH part, are becoming a bigger threat on Windows systems. For instance, the BAT/Mumu family got especially lucky in corporate environments by using a set of binary shareware tools (such as PSEXEC) in combination with the BAT filedriven virus code.

Several custom versions of BATCH languages do exist, such as the BTM files in 4DOS and 4NT productsjust to name a few which also have been used by malicious attackers.

3.7.7. Instant Messaging Viruses in mIRC, PIRCH scripts

Instant messaging software, such as mIRC, supports script files to define user actions and simplify communications with others. The script language allows the definition of commands whenever a new member joins a conference and is often stored in script.ini in the system's mIRC folder.

IRC worms attempt to create or overwrite this file with an INI file that sends copies of the worm to others on IRC. The command script supports the /dcc send command. This command can be used to send a file to a recipient on a connected channel.

3.7.8. SuperLogo Viruses

In April of 2001, a new LOGO worm was created and mass-mailed to some antivirus companies. It never became wild, though, and there is definitely more than one reason for that. Its author calls herself Gigabyte. Gigabyte has a background of creating other malware and has authored mIRC worms. As you will see, she tried to use her existing mIRC knowledge to create the Logic worm37. The actual worm is created in Super Logo, a reincarnation of the old Logo language for Windows platforms, which claims to be "the Windows platform for kids."

In 1984, I came across several Logo implementations for various 8-bit computers. Our 8-bit school computer, the HT 1080Za Z80-based TRS-80 clone built in Hungaryhad a top screen resolution of 128x48 dots in black and white. Although we had not paid too much attention to the fact at the time, the built-in Basic of HT 1080Z was created by Microsoft in 1980.

The Logo language's primary purpose is to provide drawing with the "Turtle." The Turtle is the pen, and its head can be turned and instructed to draw. For instance, in Super Logo, the following commands are common: HIDETURTLE, FORWARD, PENUP, PENDOWN, WAIT, and so on.

The set of commands can be formed as subroutines and saved in a Logo project file with an LGP extension. The actual project file is a pretokenized binary format, but commands and variable names remain easily readable and stored as Pascal-style strings. The project file can be loaded and executed with the Super Logo interpreter. The original Logo language is well extended in Super Logo to compete with other existing implementations. It can deal simultaneously with multiple graphical objects (see the cute Turtle as an example in Figure 3.9) and move them around the screen with complete mouse support.

Figure 3.9. Main Turtle ICON.

We can easily determine, however, that the Super Logo language does not support mailing or embedded executables; neither does it support spawning of other executables or scriptsyet Super Logo does support a PRINTTO "XYZ" command. XYZ can be a complete path to a file. With that statement, a Logo program might modify any file, such as winstart.bat, overwriting its content with something like the following:

@cls
@echo You think Logo worms don't exist? Think again!

Get the idea? When the logic.lgp project is loaded and executed, the worm will draw LOGIC on screen with a short message, as shown in Figure 3.10.

Figure 3.10. The payload of the Logic worm.

The worm will make sure that a STARTUP.VBS file is created in one of the Windows startup folders and, as such, will be executed automatically the next time Windows is booted. The worm also tries to modify the shortcuts (if any) of some common Windows applications, such as notepad.exe, to start the VBS file without a reboot.

This VBS file propagates the 4175-byte logic.lgp worm project file to the first 80 entries of the Outlook address book. This is a very standard VBS mail propagation that has a set of minor bugs. In 2004, Gigabyte was arrested by Belgian authorities. She is facing criminal prosecution; the penalty might include imprisonment and large fines.

3.7.9. JScript Viruses

One of the reasons to turn off JScript support in a Web browser such as Internet Explorer has to do with JScript viruses. JScript viruses typically use functions via ActiveX communication objects. They can access such objects in a way similar to VBS scripts. For instance, the very first overwriting JScript viruses accessed the file system object via the CreateObject ("Scripting.FileSystemObject") method. This kind of virus was first created by jacky of the Metaphase virus-writing group around 1999.

The File System Object provides great flexibility to attackers. For example, an attacker can use the CopyFile() method to overwrite files. This is how overwriting JScript viruses work. Of course, more advanced attacks have been implemented by the attackers using the OpenTextFile(), Read(), Write(), ReadAll(), and Close() functions. Thus JScript viruses can carry out complex file infection functionality similar to VBS viruses, using a slightly different syntax.

3.7.10. Perl Viruses

Perl is an extremely popular script language. Perl interpreters are commonly installed on various operating systems, including Win32 systems. The virus writer, SnakeByte, wrote many Perl viruses in this script language.

Perl scripts can be very short, but they have a lot of functionality in a very compact form. Attackers can use Perl to develop not only encrypted and metamorphic viruses, but also entry point obscuring ones. The open(), print, and close() functions are used to move newly created content to a target file located in storage with the foreach() function.

For example, the following Perl sequence reads its source to the CurrentContent variable:

open(File,$0); @CurrentContent=<File>; close(File); Perl viruses are especially easy to write because Perl is such a powerful script language to process file content. 3.7.11. WebTV Worms in JellyScript Embedded in HTML Mail Microsoft WebTV is a special embedded device that allows users to browse the Web over their televisions. In July 2002, a new, malicious WebTV worm appeared, which at first glance was believed to be a Trojan horse. The payload of the worm reconfigured the access number (dial-up number) for the WebTV network to call 911 (the phone emergency center of the U.S.) instead, to perform a DoS attack. WebTV HTML (Hypertext Markup Language) files can run HREF (hyperlink reference) within the <script> </noscript> tags using WebTV's Internet Explorer. The HREF would normally link a page to another location on the World Wide Web; however, in WebTV JellyScript, these special commands were used to set up the WebTV. Obviously, these commands have not been documented officially, though many people tried to figure out something more about WebTV and published detailed information about the available commands. This malicious program, NEAT, was later identified as a worm that used the sendpage commands to send HTML mail that contained the worm to others on the WebTV network. The mail was sent by various fake "from" addresses, such as Owner_, minimoo, masonman, and so on. The worm also introduced many pop-up advertising messages on the recipient's machine before it used the ConfirmPhoneSetup?AccessNumber command to reconfigure the dial-up number to 911 to overload the emergency network with a DoS attack. 3.7.12. Python Viruses Python is an extremely handy programming language. Unlike shell script, which can be rather limited in functionality because of speed issues, Python is fast and modular. Because of its more general data types, Python can solve a larger problem. It has built-in modules to support I/O, system calls, sockets, and even interfaces to graphical user interface toolkits. Although Python viruses are not extremely common, a few concept viruses written in Python scripts exist. They typically combine the open(), close(), read(), and write() functions to locate files with listdir() to replicate themselves to other files. However, this virus type is probably the simplest imaginable form for a Python virus, which could utilize much more on the system to implement a variety of infection strategies. 3.7.13. VIM Viruses A successor of the VI UNIX editor is VIM (VI IMproved). Unlike VI, VIM works on Windows, Macintosh, Amiga, OS/2, VMS, QNX, and other systems. VIM is a text editor that includes almost all VI commands and a lot of new ones. Among its many new features, VIM supports a very powerful scripting language that has already been used by virus writers to create worms. (The known example of such a worm is an intended worm, which will not replicate.) 3.7.14. EMACS Viruses Just like VIM, newer versions of the EMACS editor also support scripting. This kind of virus is not common, but proof-of-concept creations exist for the environment. 3.7.15. TCL Viruses TCL (Tool Command Language) is a portable script language that can run on systems such as HP-UX, Linux, Solaris, MAC, and even Windows. TCL is very similar language to Perl. TCL scripts are executed by the tclsh interpreter. The first virus implemented in TCL (pronounced "tickle") was Darkness, a very simple virus written by Gigabyte in 2003. TCL supports foreach(), open(), close(), gets(), and puts() functions, which are all TCL script viruses need to repli cate themselves. 3.7.16. PHP Viruses PHP (a recursive acronym for PHP: hypertext preprocessor) is an open-source, general-purpose scripting language. It is well suited to Web development and can be embedded into HTML. PHP is different from client-side scripting, such as JScript, because PHP runs on the server instead of on the local machine. However, PHP also can be used in command-line mode without any server or browser. PHP/Caracula was introduced in 2001 by the virus writer, Xmorfic, of the BCVG virus-writing group. The virus spreads as an overwriter and creates mIRC scripts to spread as a worm. PHP viruses typically use the fopen(), fread(), fputs(), fclose() sequence to write themselves to new files, which they locate with direct action infection techniques using the opendir(), readdir(), closedir() sequence in combination with the file_exists() function. There are examples of polymorphic PHP viruses, such as PHP/Feast, written by the virus writer, Kefi, in 2003. Feast looks for files to overwrite, but it overwrites them with an evolved copy of itself. In particular, each variable in the body of the virus will mutate to random character sequences. 3.7.17. MapInfo Viruses MapInfo, developed by Geo-Information Systems, is not a widely used application. It is used for mapping and geographical analysis. The MPB/Kynel38 virus demonstrated that it is possible to make this platform virulent. Kynel was created by Russian virus writers in late 2003. MapInfo Professional has it own development environment called MapBasic, which is a Basic-like language. MapBasic is very powerful and, as expected, supports Open, Close, Read, and Write to both ASCII and binary files. It also supports API calls from other DLLs, dynamic data exchange (DDE), and object linking and embedding (OLE). When these programs are compiled, a new executable, MBX, is created, called MapBasic eXecutable. As expected, however, these files can be only executed by MapInfo. The MPB/Kynel virus infects new tables. It enumerates for new tables each time the function WinChangedHandler() is called. WinChangedHandler() is triggered whenever the user changes something in a document. The virus hooks this function and uses this moment to create a copy of itself in the newly enumerated tables, as tablename.mif. It then inserts a Run Application line to this MBX executable into the TAB file of MapInfo documents. In this way, the MBX file will be run whenever the infected document is opened. MapInfo is available on both Windows and Macintosh platforms. It is not very common, but like the SuperLogo virus threats, it demonstrates virus writers' interest in all platforms as possible targets. 3.7.18. ABAP Viruses on SAP The first virus known to attempt to infect SAP was ABAP/Rivpas, written in April 2002. It is a proof-of-concept virus that is based on the Advanced Business Application Programming scripting language. This creation had a few intentional bugs and did not have a chance to replicate. However, other variants with the fix appeared quicklythat were real viruses. In about 20 lines of script, the virus replicates in databases by copying itself from one database to another. 3.7.19. Help File Viruses on WindowsWhen You Press F1 A very powerful but surprisingly unpopular virus infection target is Windows Help files. Windows Help files are in binary format and contain a script section. The scripts have access to Windows API calls. Most Help viruses inject a little script into the SYSTEM directory of HLP files. This script section will be executed the next time the Help file is loaded. As a result, such a virus is triggered simply by pressing the F1 button in an application that is associated with an infected HLP file. The major trick of such viruses is to define functions for their use, such as EnumWindows() of the USER32.DLL. For example, the Dream virus uses this technique to infect Windows Help files. The RR ('USER32.DLL','EnumWindows','SU') script line will define an EnumWindows() callback for use. Then an EnumWindows(virusbody) call is made by the script, which will execute the "string," the virus body, via the callback. Thus execution can continue in native code, getting out of its script context. The first virus to infect Windows Help files was the 32-bit polymorphic virus, W95/SK39, written in Russia. Unlike Demo, SK uses WinExec() functions to execute a set of command.com /c echo commands to print code into a binary for execution outside of the HLP file in the root directory. The first native Help infector, the HLP/Demo virus, also appeared to replicate from one Windows Help file to another. 3.7.20. JScript Threats in Adobe PDF The PDF format is used by Adobe Acrobat products. In 2003, the {W32,PDF}/Yourde virus infected PDF files using an executable that is dropped by a JScript exploit (a PDF form is also dropped). The binary is executed by the form when the form is loaded. The complete version of the Adobe Acrobat installation is required to infect files because the virus relies on the user's saving the infected file. (Saving the infected file cannot be forced externally with Adobe Acrobat. Additionally, the reader-only version cannot save PDFs at all.) The JScript runs automatically by Acrobat itself, without relying on an external interpreter such as Windows Scripting Host; thus, the vulnerability is Acrobat version-specific. 3.7.21. AppleScript Dependency AppleScript is used on Macintosh systems to support local scripting. Not surprisingly, some threats can replicate only if AppleScript is installed. For example, the [email protected] worm is written in AppleScript. After it is executed, it utilizes Outlook Express or Entourage to send a copy of itself to everybody in the address book. This particular worm was not reported frequently from the wild; however, AppleScript threats expose Mac users to similar security problems as those of other powerful script languages, such as VBS on Windows. 3.7.22. ANSI Dependency IBM PCs introduced ANSI.SYS drivers that fulfill the needs of many users by providing the ability to reconfigure certain key functions via escape (ESC) sequences. These sequences are usually stored in a file with an ANS extension. ESC sequences can start with a special escape code (accessible via holding the Alt key and typing40 on the numeric keypad). Whenever the line DEVICE=ANSI.SYS is included in the CONFIG.SYS file, the support to execute ESC sequences is available. For example, a simple ANSI sequence can redefine the N key to Y and the n key to y. Consequently, the user would give the wrong answer to confirmation questions asked by applications. This would be done the following way: ESC [78;89;13p ESC [110;121;13p This kind of redefinition might be desirable for other keys; the Enter key also can be redefined, and del *.* or format c: might be displayed when Enter is pressed. ANSI sequences also can be used to redefine entire commands. Thus, the wrong command name is displayed when a different command is typed. 3.7.23. Macromedia Flash ActionScript Threats A newcomer on the malicious scene is ActionScript malware. The LFM virus uses the ActionScript of Flash files to create and run a DOS COM executable. Such threats, then, are fairly limited because they introduce several other dependencies. For instance, LFM41 needs to be downloaded to the local machine from a Web page. It can only infect files if it is downloaded to a folder that contains other clean files and only as long as the external file V.COM can run properly. 3.7.24. HyperTalk Script Threats "An excellent beginning tool to teach average people, from 5th grade, on how to control their computers as masters rather than slaves." Steve Wozniak HyperCard is a versatile environment that supports a scripting language called HyperTalk. Created by Bill Atkinson, HyperTalk is one of the most linguistic script languages available. Not surprisingly, some of the oldest computer viruses were written in HyperTalk. The first HyperTalk script virus was Dukakis, written around 1988. HyperTalk scripts activate based on event handlers associated with a name in the stack. The scripts are stored in HyperCard data files, called stacks, which are in binary format. But the script code itself is purely textual inside the stacks. For instance, upon opening a HyperCard stack, the openStack event handler can be invoked. This is fairly similar to how Microsoft Office products work with macros, though HyperCard is much more than a scripted text editor. It can be used to create many different projects with menus and database front-ends for cards (records in the database), and different stacks can share their functions with each other. HyperCard extended the promise of easy-to-use systems to easy-to-program environments. HyperTalk script code is interpreted between the event handler tags of the keywords on and end. Here is an example: on openStack ask "What is your name?" put it * it into field "Name" end openStack HyperCard was developed well before Microsoft's Visual Basic. Like Microsoft Office products' global templates (or should I say, the other way around?), HyperCard supports a so-called Home stack, which contains an arsenal of useful scripts. Most HyperCard viruses infect the Home stack by copying themselves into it with the help of put keywords. After this, they can copy themselves to the newly opened stacks. Any stack can be a Home stack, as long as its name is home. The Dukakis virus uses the following lines to select its script body for a new copy: put the script of stack "home" into temp2 get offset (""-** The HyperAvenger **-,"temp2) put char it to it+2426 of temp2 into theCode This script snippet looks for the offset where the virus code starts in the home stack and copies the virus script (2426 bytes) from that location to the variable, theCode. The virus then only needs to copy theCode into another stack later. The this stack is a reference to the currently opened stack. Its content can be accessed with yet another put command. Several other HyperCard viruses exist on the Mac; the most famous ones are the Merry Xmas and 3 Tunes families. 3.7.25. AutoLisp Script Viruses HyperTalk script viruses are very readable and easy to understand; AutoLisp threats are a little more difficult to read. A few script viruses, such as Pobresito42 and ALS/Burstead43, use the AutoLisp scripting feature of AutoCAD environments. Note Newer versions of AutoCAD also support VBA. Pobresito was written during the summer of 2001. Burstead appeared much later, during December 2003 in Finland, and managed to infect a few major corporations that run modern versions of AutoCAD. AutoCAD is rather expensive software, and it is not used as widely as other script language environments. AutoLisp scripts are stored in text files with the LSP extension. Burstead.A looks for the location of the base.dcl file in the AutoCAD search path, using the findfile function: (setq path (findfile "base.dcl")) This is done to locate the directory where the other LISP files can be found. Such viruses attempt to modify files with a load command to load their own LSP file. Thus, whenever the modified LSP file is executed, the virus can get control via the load command: (load "foobar") Here, foobar is the name of a file that has an LSP extension in the default folder. Obviously, AutoLisp allows write-line functions, which could be used by attackers for different kinds of infection methods. 3.7.26. Registry Dependency Some viruses are implemented to infect from Windows Registry files. The Registry is a central storage database on Windows systems. Previous versions of Windows mostly used INI files to store application settings. On modern Windows systems, the Registry database, called a hive, is used to store such information in trees. An interesting capability of the Registry is that it stores file paths for system startup time execution under several different subentries of the hive, such as HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\RUN. Keys like this are commonly attacked by all kinds of malicious code, and other locations of the Registry provide similar attack points for virus writers. For instance, the W32/PrettyPark worm family modifies the Registry key located at HKEY_CLASSES_ROOT\exefile\shell\open\command to get executed whenever an EXE file is run by the user. The worm executes the program that the user wanted to runbut only after itself. Registry-dependent viruses use such keys to insert a reference to system commands for later execution. Registry entry installation files are stored in textual format, and they contain information about keys and values to install via Regedit. Such viruses are implemented as a single command entry in the REG files. Regedit will interpret the commands in the REG file; as a result, a new entry will be stored in the Registry for later execution. The malicious entry uses standard system commands with the passed parameters to look for other REG files on local and network sites and modifies them to include the command string to REG files. This technique is based on the fact that DOS batch commands can be executed from the Registry. 3.7.27. PIF and LNK Dependency Viruses also attack PIFs (program information files) and LNKs (link files) on Windows systems. PIFs are created when you create a shortcut to or modify the properties of an MS-DOS program and allow you to set default properties, such as font size, screen colors, and memory allocation. PIFs also store the path of the executable to run. Some viruses attack PIFs by modifying their internal links that point to an executable. The approach of typical PIF creations is to run commands via command.com execution, using this link path. They use the copy command to copy the PIF to other locations on the local disk, such as Windows, mIRC, or P2P folders, or to attack network resources. The LNK (link shortcut files) on Windows 95 and above can be attacked in a manner similar to PIFs. 3.7.28. Lotus Word Pro Macro Viruses Another class of macro viruses attacks Lotus Word Pro documents of Lotus SmartSuite. For example, the LWP/Spenty virus only replicates in the Chinese version of Word Pro. The virus infects files as they are opened by hooking the DocumentOpened() and DocumentedCreated() macros. The security settings of the document are changed in such a way that a password is set to 720401. In this way, the virus attempts to prevent any modifications of infected objects. The Spenty virus became widespread in China in 2002. Spenty introduced the problem of Word Pro file parsing for antivirus producers. Word Pro uses a script-like macro language. 3.7.29. AmiPro Document Viruses Viruses do not frequently attack AmiPro documents, and there is a good reason for this. Unlike most text editors, AmiPro saves documents and macros into two separate files. The documents are stored in files with SAM extensions, and the macro files are kept in files with SMM extensions. AmiPro viruses must connect the two files in such a way that when the SAM file is opened, it invokes execution of the SMM. The APM/Greenstripe virus consists of four functions: Green_Stripe_Virus(), Infect_File(), SaveFile(), and SaveAsFile(). The SaveFile() and SaveAsFile() functions are hooks installed with the ChangeMenuAction() function, and they correspond to the Save and Save As menus. The virus uses the AssignMacroToFile() function to establish the connection between SAM and SMM files. The virus uses the FindFirst() and FindNext() functions to search for new SAM files to attack. AmiPro viruses are much less likely to spread via e-mail than Microsoft Office macro viruses because of AmiPro's use of separate document and macro files, as opposed to a single container. 3.7.30. Corel Script Viruses Corel Draw products also support a script language that is saved in files with CSC extensions. (In addition, contemporary versions of Corel Draw also support VBA.) Corel Script viruses typically look for victim files with the FindFirstFolder() function. The CSC/CSV virus identifies infected victim files by checking for the "REM ViRUS" marker in the CSC files. If CSV does not find the marker in the file, it will attempt to infect it by prepending its script with print # commands. It then looks for the next file with the FindNextFolder() function. In practice, the virus creates a new host script with the same name, copies itself into it, and then appends the original host script to itself. REM ViRUS GaLaDRieL FOR COREL SCRIPT bY zAxOn/DDT The CSC/PVT virus follows a similar strategy. It uses the same functions to look for new files to infect. It even checks potential victims for REM PVT anywhere in the script before attempting to infect them. REM PVT by Duke/SMF Unlike CSV, the PVT virus appends itself to the end of the script. As a result, the original script runs first, and upon the exit of the original script, the appended script code is executed. 3.7.31. Lotus 1-2-3 Macro Dependency Although there are widespread rumors about a Lotus 1-2-3 macro virus with the name Ramble, the actual threat is not viral. Rather, the known threat is a dropper of a BATCH virus. (This is not to say, however, that Lotus 1-2-3 macros would not be able to infect another set of Lotus 1-2-3 worksheets.) The BAT/Ramble virus dropper, written by "Q The Misanthrope," works the following way: First, the user opens a Trojanized Lotus 1-2-3 document. The malicious Lotus macro activates upon when the document is opened. The malicious macro is then inserted in the A8167 ... A8191 range of the sheet. In this way, it is not visible to the user. After the macro runs, it creates a BATCH virus in the C:\WINSTART.BAT file. After the BATCH virus is created by the dropper, the macro dropper code removes itself from the sheet, using the /RE command (Range Erase). It also removes the \0 macro name that automatically runs whenever a worksheet is opened. It must be noted that newer versions of Lotus 1-2-3 have a different worksheet format, which has allowed a macro up-conversion problem to be introduced on this platform. 3.7.32. Windows Installation Script Dependency The 32-bit Windows versions introduced a new installation script language in INF files. These scripts are invoked via the Windows Setup API. The install scripts have various sections for installation and uninstallation. The script can be generated manually or by using tools such as Microsoft's BATCH.EXE or INF generators. One of the many features of installation scripts is the use of the autoexec.bat file. Commands can be directly installed into and removed from the automatically executed batch file on system startup. This is done via the UpdateAutoBat command in the Install section associated with a named section of the script. That section contains commands to delete linesas well as to add new malicious commandswith CmdDelete and CmdAdd, respectively. (CmdDelete is used to delete the malicious code in case it was inserted into the file in a previous attack.) The virus writer, 1nternal, introduced a couple of viruses, such as the INF/Vxer family, that take advantage of INF file infection via batch execution. The CmdAdd entries are used to deliver the source of the viral batch lines to AUTOEXEC.BAT. As a result, on each system startup the virus will look into the Windows\INF folder to infect other INF files. 3.7.33. AUTORUN.INF and Windows INI File Dependency AUTORUN.INF files and Windows INI files are very similar in structure to Windows installation scripts. Some viruses modify the AUTORUN.INF file to get auto-launched whenever a removable disk is loaded. AUTORUN.INF was a new feature in Microsoft Windows 95 systems. It was primarily designed to run an application automatically whenever a user inserted a CD into the CD-ROM drive. Whenever an AUTORUN.INF file exists in the root directory of a removable disk type, it is executed by most 32-bit Windows systems, although some of the newer editions of Windows primarily support the CD-ROMs only. There are a couple of Registry entries associated with Autorun functionality. Whenever such options are enabled, the AUTORUN.INF is interpreted, and its Autorun section is invoked. The Autorun section supports an Open command that can be used to run an executable via the feature. This is the command that malicious code sets alter to be invoked automatically. The HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer Registry entry must be modified with a NoDriveAutoRun or NoDriveTypeAutoRun entry set to customized values, such as 0xFF, to turn off the feature for each drive. Windows INI files are attacked for a similar reason. For instance, the WIN.INI supports a Windows section. In that section, a run= entry can be used to RUN an application during the startup of Windows. Malicious Trojans often modify this entry to load themselves via system startup. 3.7.34. HTML (Hypertext Markup Language) Dependency HTML does not support functionality to malicious attacks in its strict form, but it supports embedded scripting, such as VBScript or JScript. Several viruses attack HTML files. One of the most successful such attacks was implemented in the W32/Nimda worm in September 2001. Nimda attacks HTML files by inserting a little JScript section into them. This section opens an EML file that contains a malformed MIME exploit. The JScript code uses the window.open function to launch the EML file. The result is an automatically executed worm executable upon accessing a compromised HTML page using a vulnerable Internet Explorer. Some HTML threats get invoked from HTML files via HREF entries. They trick the user into clicking something that will, in turn, execute the referenced malicious code. The first viruses that attacked HTML files were created by the virus writer, 1nternal. Although some vendors initially classified these threats as HTML viruses, the proper classification is based on the actual script language used, such as VBS. 3.8. Vulnerability Dependency Fast spreading worms, such as W32/CodeRed, Linux/Slapper, W32/Blaster, or Solaris/Sadmind, can only infect a new host if the system can be exploited via a known vulnerability. If the system is not vulnerable or is already patched, such worms cannot infect them. However, several worms, such as W32/Welchia, exploits multiple vulnerabilities to invade new systems. Therefore, the system might remain exploitable by at least one of the nonpatched vulnerabilities. Chapter 10 , "Exploits, Vulnerabilities, and Buffer Overflow Attacks," is dedicated to computer virus attacks that utilize exploits to spread themselves. 3.9. Date and Time Dependency Tyrell: What seems to be the problem? Roy: Death! Tyrell: Death. Well, I am afraid that's a little out of my jurisdiction. Roy: I want more life... Blade Runner, 1982 Several viruses replicate only within a certain time frame of the day. Others refuse to replicate before or after a certain date. For instance, the W32/Welchia worm only attempted to invade systems until January 2004. Another example is the original W32/CodeRed worm, which was set to kill itself in 2001. However, other variants of the worm were modified to introduce an "endless life" version without this limitation. The life cycle manager of worms is discussed in more detail in Chapter 10. 3.10. JIT Dependency: Microsoft .NET Viruses A natural evolution of Microsoft's ambitious computer language and execution environment developments is .NET Framework's Just-in-Time compilation. .NET uses executables that are somewhat special portable executable (PE) files. Currently, such executables contain a minimal architecture-dependent code (a single API call to an init function)44. Elsewhere, the compiled PE file contains MSIL (Microsoft Intermediate Language) and metadata information. The first viruses that targeted .NET executables were not JIT-dependent. For example, Donut45 was created by Benny in February of 2002. This virus attacked .NET executables at their native entry point, replacing _CorExeMain() import (which currently runs the JIT initialization) with its own code and appending itself to the end of the file. A few months later, JIT-dependent viruses appeared that could infect other MSIL executables. The first such virus was written by Gigabyte. W32/HLLP.Sharpei40 implements a simple prepender infection technique. The MSIL code of the virus is JIT compiled by the CLR (common language runtime) of .NET Framework. JIT does not compile the module when it is loaded, but only when a particular method is first used. Only then is the MSIL code translated to the local architecture, and native code execution begins. Figure 3.11 shows the payload message of the W32/HLLP.Sharpei virus. Figure 3.11. The payload message of Sharpei. In 2004, new infection techniques appeared that targeted .NET executables. These new viruses parasitically infect MSIL programs. It is not surprising that such viruses did not show up any earlier because it is much more difficult to implement them. In fact, some researchers argued that such complex MSIL viruses will never appear. For example, the metamorphic virus, MSIL/Gastropod, uses the System.Reflection.Emit namespace to rebuild its code and the host program to alter the appearance of the virus body. Gastropod is a creation of the virus writer, Whale, who also authored the W95/Perenast viruses. (Whale was captured by the Russian police in November 2004. He was required to pay$50.)

On the other hand, the MSIL/Impanate virus is aware of both 32-bit and 64-bit MSIL files and infects them using EPO (Entry Point Obscuring) techniques without using any library code to do so. MSIL/Impanate was authored by the virus writer, roy g biv.

Note

More information on infection techniques is available in Chapter 4, "Classification of Infection Strategies." Metamorphic viruses are discussed in Chapter 7, "Advanced Code Evolution Techniques and Computer Virus Generator Kits."

3.11. Archive Format Dependency

Some viruses might not be able to spread without packed files. A few viruses only infect archive file formats. The majority of such viruses infect binary files, as well as ZIP, ARJ, RAR, and CAB files (to name the most common archive formats).

Spreading viruses in archive files gained popularity when Microsoft implemented a virus-protection feature for Outlook. Outlook no longer runs regular executable extensions, and recent versions simply do not provide such attachments to end users. However, virus writers quickly figured out that they could send packed files, such as zipped files, which Outlook does not remove from e-mail messages.

Some tricky mailer or mass-mailer worms, such as [email protected]46, even use password-protected attachments. Because the password and instructions on how to use it are available to the user, the malicious code can trick the user into running an application, such as Winzip, and typing the given password to unpack and then execute its content. Such viruses often carry their own packer engines, such as InfoZIP libraries, to create new packer containers.

File infector viruses typically insert a new file into an archive file. For example, ZIP infection is simple because ZIP stores a directory for each file in the container's archive. By locating such headers, viruses can insert new files into the project and trick the user into running the files. For example, viruses might insert a file with a name such as "readme.com" and simply hope that the user will execute it to read "the documentation" of the package.

Some very complex viruses, such as the Russian virus, Zhengxi47, infect self-extracting EXE files with multiple archive infection capability, including the packed file format, HA, inside such binary files.

3.12. File Format Dependency Based on Extension

Some viruses have extension dependency. Depending on the extension, a file might be placed in a different execution environment. A simple example of this is COM and BAT (ASCII) extension replacement. As a COM file, the file can function as binary. With a BAT extension, it looks like an ASCII BATCH file.

Other common examples of this kind of dependency are as follows:

• COM/VBS
• COM/OLE2 (a trivial variant has the header of an OLE2 file)
• HTA/SCRIPT
• MHTML (Binary+Script)
• INF/COM
• PIF/mIRC/BATCH

This method is often used as an attempt to confuse scanners about the type of object they are scanning. Because scanners often use header and extension information to determine the environment of the file, their scanning capabilities (such as heuristics analysis) might be affected if they do not identify the type of object properly.

For example, PIF worms typically use mIRC, BAT, or even VBS combinations, based on extension dependency. A file with a PIF extension will function as a PIF. However, with a BAT extension, it will run as a BATCH instead, and the PIF section in the front of the file is simply ignored. Other examples include an mIRC and BATCH combination based on extension dependency tricks.

Figure 3.12 demonstrates how the PIF is organized for extension dependency. The Phager virus uses the previously discussed technique.

Figure 3.12. A high-level structure of a PIF with extension dependency.

Another example that involves extension dependency tricks is INF/Zox, which infects Windows INF files. The main virus body is stored in INF/Zox in an INF file called ULTRAS.INF. However, this INF file can run as a DOS COM executable when renamed.

In the INF form, the virus uses CmdAdd (add command) entries to attack AUTOEXEC.BAT. It also uses the CopyFile entry of the DefaultInstall section to copy the ULTRAS.INF file as Z0X.SYS. The trick is that the new AUTOEXEC.BAT section will rename the Z0X.SYS file to Z0X.COM and run it. The virus starts with a comment entry in the INF form using a semicolon (;) (0x3b).

When the file is loaded as a DOS COM file, the marker is ignored as a compare (CMP) instruction. After the comment, binary code is inserted that "translates" to a jump (JMP) instruction to the binary portion of the virus code at the end of the file:

13BE:0100 3B00          CMP     AX,[BX+SI]      ; Compare instruction ignored

Zox is a direct-action overwriter virus. It overwrites INF files with itself.

3.13. Network Protocol Dependency

Nowadays, the Internet is the largest target of virus attacks. TCP and UDP protocols are used by malicious mobile code48 to attack new targets. There are some old worms, however, such as the Father Christmas worm, that could not spread on the Internet because they relied on DECNET protocols thus, computer worms are typically network protocol dependent.

3.14. Source Code Dependency

Some tricky computer viruses, such as those of the W32/Subit family, infect source files such as Visual Basic or Visual Basic .NET source files. Other viruses spread in C or Pascal sources. These threats have a very long history.

Consider the C source file shown in Listing 3.2, in clean and infected form.

Listing 3.2. A Source Infector Virus

#include <stdio.h>
void main(void)
{
printf("Hello World!");
}

The infected copy would look similar to the following:

#include <stdio.h>
void infect(void)
{
/* virus code to search for *.c files to infect */
}
void main(void)
{
infect(); /* Do not remove this function!! */
printf("Hello World!");
}

After the infected copy is compiled and executed, the virus will search for other C sources and infect them.

Source code viruses typically use a large string to carry their own source code, defined as a string. The W32/Subit family uses a concatenated string to define its source code, starting with the following lines:

J = "44696D20532041732053797374656D2E494F2E53747265616D5772697465720D"
J = J & "0A44696D204F2C205020417320446174650D0A44696D2052204173204D696372"
J = J & "6F736F66742E57696E33322E52656769737472794B65790D0A52203D204D6963"

This will be converted to Visual Basic .NET source code:

Dim S As System.IO.StreamWriter
Dim O, P As Date
Dim R As Microsoft.Win32.RegistryKey
:
:

The source code infectors replicate in two stages. The first stage is the running of an already infected application with the embedded virus code. After the New() function is called in the infected program, the virus code will search for other Visual Basic .NET project source files on the system and copy its own source code into those files. In the second stage, Subit inserts a function call to run the virus body itself. As a result, the virus can multiply again after the compromised source is compiled and executed on a system.

The major problem with such viruses is that they can appear virtually anywhere in the application, inserted somewhere in the code flow. The code of the virus will be translated differently, depending on the language and the compiler version and options, making the virus look different in binary form on various systems.

3.14.1. Source Code Trojans

The idea of source-only viruses originates in the famous "self-reproducing program" ideas of Ken Thomson (co-author of the UNIX operating system). In his article, "Reflections on Trusting Trust,"49 Thomson introduced the idea of C programs, so-called "guines," that print an exact copy of their source as an output. The idea is nice and simple. The program source's code is defined as a string that is printed to the output with the printf() function.

Thomson also demonstrated a CC (C compiler) hack. The idea was to modify the source code of CC in such a way that whenever the modified compiler binary is used, it will do the following two things:

• Recognize when the source code of login was compiled and insert a Trojan function into the original source. The Trojanized version of login would let anybody log in to the system with his or her own password. Furthermore, it would let an attacker connect with a specific password for any user account.
• Introduce source modifications to the CC sources on the fly. Thus, the modification in the source code was available only during the compilation, and it was quickly removed after the compiler's source was compiled.

Source code infectors use the Thomson principle to inject themselves into application source files. Such viruses will be more relevant in the future as open source systems gain popularity.

3.15. Resource Dependency on Mac and Palm Platforms

Some computer viruses are dependent on system resources. For example, the Macintosh environment is a very rich platform of resources. Various functions are implemented in the form of resources that can be edited easily via Resource Editors. For instance, there is a menu definition resource on the Mac. Such definitions get invoked according to the applications' menu items. Macs store information in two forks for each file on the disk: the data fork and the resource fork. Resources, stored in the resource fork, contain code. Because even data files can contain resources on the Mac, the distinction between data and code files is not as clear-cut as it is for the PC, for example.

The MDEF (menu definition) viruses on the Apple Macintosh use the technique of replacing menu definitions with themselves. Thus, the virus code gets invoked whenever a particular menu is activated.

Table 3.3 contains common resource types on the Mac. It is an incomplete list of the most commonly attacked resources by malicious code on the Mac platform36.

Table 3.3. Common Resource Types on the Mac

Resource TypeDescription
CDEFControl Definition Function
DRVRDevice Driver
FMTRDisk Format Code
CODECode Segment
INITInitialization Code Resource
WDEFWindows Definition Function
FKEYCommand-Shift-Number Function
PTCHROM Patch Routine
MMAPMouse Function

Similar dependencies exists in the Palm viruses. The Palm stores executable applications in PRC files with special application resources. When the application is executed, the resources are accessed from it. In particular, the DATA and CODE resources are important for program execution. The virus Palm/Phage, discovered in September 2000, reads its own DATA and CODE resources and overwrites other applications resources with these. This resource dependency is very similar to the one on the Macintosh platforms.

3.16. Host Size Dependency

To infect applications accurately, many computer viruses have limits on how small or how large the applications they infect can be. For instance, COM files on DOS cannot load if they are larger than a code segment. Consequently, most DOS viruses introduce limits to avoid infecting files that would grow past acceptable limits if the virus code were included in them.

In other cases, viruses such as W95/Zmist use an upper size limitation, such as 400KB, for a file. This enhances the virus infection's reliability by reducing the risks involved in infecting files that are too large. Furthermore, host size dependency also can be used as an antigoat technique (see more details in Chapter 6, "Basic Self-Protection Strategies") to avoid test files that computer virus researchers use.

3.17. Debugger Dependency

Some viruses use an installed debugger, usually DEBUG.EXE of DOS, to convert themselves from textual to binary forms or simply to create binary files. Such threats typically use a piped debug script input to DEBUG, such as

DEBUG <debugs.txt

The input file contains DEBUG commands such as the following:

N example.com
E 100 c3 RCX
1
W
Q

This script would create a 1-byte long COM file containing a single RET instruction. A single RET instruction in a COM file is the shortest possible COM program. COM files are loaded to offset 0x100 of the program segment. Before the program segment, the PSP (program segment prefix) is located at offset 0; thus, a single RET instruction will give control to the top of the PSP, assuming that the stack is clear and a zero is popped. The trick is that the top of the PSP contains a 0xCD, 0x20 (INT 20 Return to DOS interrupt) pattern:

13BA:0000 CD20          INT     20

So whenever the execution of a program lands at offset 0, the program will simply terminate.

Note

The N command is used to name an output file. The E command is used to enter data to a memory offset. The CX register holds the lower 16-bit word of the file size, and BX holds the upper 16-bit word. The W command is used to write the content to a file. Finally, the Q command quits the debugger. Viruses typically use several lines of data that use the Enter command to create the malicious code in memory.

The virus writer, Vecna, used this approach in the W95/Fabi family to create EXE files using Microsoft Word macros and debug scripts in combination. From the infected MS Office documents, Fabi creates a new file in the root directory as FABI.DRV and uses the PRINT commands to print the debug script into it:

OPEN "C:\FABI.DRV" FOR OUTPUT AS 1
PRINT #1, "N C:\FABI.EX"
PRINT #1, "E 0100 4D 5A 50 00 02 00 00 00 04 00 0F 00 FF FF 00 00"
PRINT #1, "E 0110 B8 00 00 00 00 00 00 00 40 00 1A 00 00 00 00 00"

The content of the FABI.DRV will look like the following:

N C:\FABI.EX
E 0100 4D 5A 50 00 02 00 00 00 04 00 0F 00 FF FF 00 00 ; DOS EXE header
E 0110 B8 00 00 00 00 00 00 00 40 00 1A 00 00 00 00 00

[Virus body is cut from here]

E 4D20 10 0F 10 0F 10 0F 10 0F 10 0F 10 0F 10 0F 10 0F
E 4D30 10 0F 10 0F 10 0F 10 FF FF FF
RCX
4C3A
W
Q

Another BATCH file is also created by the macro in a manner similar to the debug script. This contains the command to drive DEBUG with the debug script:

DEBUG <C:\FABI.DRV >NUL

Note that DEBUG cannot create EXE files. At least, it cannot save them from memory with an EXE suffix. It can, however, save the content of memory easily without an EXE extension, which works when the file is loaded without an extension in the first place. This is the approach that W95/Fabi uses. It first saves the file with DEBUG as FABI.EX and uses yet another BATCH file to copy FABI.EX as FABI.EXE to run it.

Evidently, if DEBUG.EXE is not installed on the system or is renamed, some of these viruses cannot function completely or at all.

3.17.1. Intended Threats that Rely on a Debugger

Some malicious code might require the user to trace code in a debugger to replicate the virus. In some circumstances, this might happen easily in the case of macro threats. For instance, an error occurs during the execution of the malicious macro. Microsoft Word might then offer the user an option to run the macro debugger to resolve the cause of the problem. When the user selects the macro debugger command and traces the problem, the error might be bypassed. As a result, the virus code can replicate itself in this limited, special environment. There is an agreement between computer virus researchers, however, that such threats should be classified as intended.

Several binary viruses spread their own source code during replication. This technique can be found in worms that target systems where binary compatibility is not necessarily provided. To enhance the replication of such worms on more than one flavor of Linux, the Linux/Slapper worm replicates its own source code to new systems. First, it breaks into the system via an exploit code, and then it uses gcc to compile and link itself to a binary. The worm encodes its source on the attacker's system and copies that over to the target system's temporary folder as a hidden file. Then it uses the uudecode command to decode the file:

/usr/bin/uudecode -o /tmp/.bugtraq.c /tmp/.uubugtraq;

The source code is compiled on the target with the following command:

gcc -o /tmp/.bugtraq /tmp/.bugtraq.c lcrypto;

The virus needs the crypto library to link its code perfectly, so not only must gcc be installed with standard source and header files on the target system, but the appropriate crypto libraries must also be available. Otherwise, the worm will not be able to infect the target system properly, although it might successfully penetrate the target by exploiting an Open SSL vulnerability.

The advantage of the source code-based infection method is the enhanced compatibility with the target operating system version. Fortunately, these techniques also have disadvantages. For example, it is a good practice to avoid installing sources and compilers on the path (unless it is absolutely necessary), greatly reducing the impact of such threats. Many system administrators tend to overlook this problem because it looks like a good idea to keep compilers at hand.

3.19. Device Translator Layer Dependency

Many articles circulated that concluded that no Windows CE viruses would ever be implemented, and for many years we did not know of any such creations. However, in July 2004, the virus writer, Ratter, released the first proof of concept virus, WinCE/Duts.1520, to target this platform, as shown in Figure 3.13.

Figure 3.13. The message of the WinCE/Duts virus on an HP iPAQ H2200 Pocket PC.

Many recent devices run WinCE/Duts successfully because the ARM processor is available on a variety of devices, such as HP iPAQ H2200 (as well as many other iPAQ devices), the Sprint PCS Toshiba 2032SP, T-Mobile Pocket PC 2003, Toshiba e405, and Viewsonic V36, among others. Several additional GSM devices are built on the top of Pocket PC.

Interestingly, WinCE/Duts.1520 is able to infect Portable Executable files on several systems, despite the fact that the virus code looks "hard-coded" to a particular Windows CE release. For instance, the virus uses an ordinal-based function importing mechanism that would appear to be a serious limitation in attacking more than one flavor of Windows CE. In fact, it appears that the author of the virus believed that WinCE/Duts was only compatible with Windows CE 4. In our tests, however, we have seen the virus run correctly on Windows CE 3 as well.

It was not surprising that Windows CE was not attacked by viruses for so long. Windows CE was released on a variety of processors that create incompatibility issues (an inhomogeneous environment) and appear somewhat to limit the success of such viruses.

In addition, Windows CE does not support macros in Microsoft products such as Pocket Word or Pocket Excel, but there might be some troubling threats to come.

Prior to Windows CE 3.0, it was painful to create and distribute Windows CE programs because of binary compatibility issues. The compiled executables were developed in binary format as portable executable (PE) files, but the executable could only run on the processor on which it was compiled. So for each different device, the developer must compile a compatible binary. This can be a time-consuming process for both the developer and the user (who is impatient to install new executables).

The CPU dependency is hard-coded in the header of PE files. For instance, on the SH3 processor, the PE file header will contain the machine type 0x01A2, and its code section will contain compatible code only for that architecture.

Someone can easily create an application that is compiled to run on an SH3 platform; however, Windows CE was ported to support several processors, such as the SH3, SH4, MIPS, ARM, and so on. Consequently, a native Windows CE virus would be unable to spread easily among devices that use different processors. For example, WinCE/Duts.1520 will not infect SH3 processor-based systems.

Virus writers might be able to create a Win32 virus that drops a Windows CE virus via the Microsoft Active Sync. Such a virus could easily send mail and propagate its Intel version (with an embedded Pocket version), but it would only be able to infect a certain set of handheld devices that use a particular processor. In the future, this problem is going to be less of an issue for developers as more compatible processors are released. For example, the new XScale processors are compatible with the ARM series. XScale appears not only in Pocket PC systems, but in Palm devices as well. Obviously, this opens up possibilities for the attackers to create "cross-platform" viruses to target Palm as well as Pocket PC systems with the same virus.

Microsoft developed a new feature on the Pocket PC that made the Windows CE developers' jobs easier. In the Pocket PC, Microsoft started to support a new executable file format: the common executable file (CEF) format.

CEF executables can be compiled with Windows CE development tools, such as eMbedded Visual C++ 3.0. A CEF executable is basically a special kind of PE file. CEF is a processor-neutral code format that enables the creation of portable applications across CPUs supported by Windows CE. In fact, CEF contains MSIL code.

In eMbedded Visual C++, CEF tools (compilers, linkers, and SDK) are made available to the developer the same way that a specific CPU target (such as MIPS or ARM) is selected. When a developer compiles a CEF application, the compiler and linker do everything but generate machine-specific code. You still get a DLL or EXE, but the file contains intermediate language instructions instead of native machine code instructions.

CEF enables WindowsCE application developers to deliver products that support all the CPU architectures that run the WindowsCE 3.0 and above operating systems. Because CEF is an intermediate language, processor vendors can easily add a new CPU family that runs CEF applications. For instance, HP Jornada 540 comes with such a built-in device translator layer. The CEF file might have an EXE extension when distributed, so nothing really changes from the user's perspective.

The device translator is specific to a particular processor and WindowsCE device. The device version normally translates a CEF executable to the native code of that processor when the user installs the CEF executable on the device. This occurs seamlessly, without any indication to the user, other than a brief pause for translation after the executable is clicked on. An operating system hook catches any attempt to load and execute a CEF EXE, DLL, or OCX file automatically and invokes the translator before running the file.

For example, if the Pocket PC is built on an SH3 processor, the translator layer will attempt to compile the CEF file to an SH3 format. The actual CEF executable will be replaced by its compiled SH3 native version, changing the content of the file completely to a native executable. Indeed, the first reincarnation of MSIL, JIT (Just-in-Time) compiling on Pocket PC rewrites the executables themselves on the file system.

Obviously, virus writers might take advantage of the CEF format in the near future. A 32-bit Windows virus could easily install a CEF version of itself to the Pocket device, allowing it to run on all Pocket PC devices because the OS would translate the CEF executable to native format. We can only hope that CEF will not be supported on systems other than Windows CE. A desktop implementation, for example, would be very painful to see in case the operating system would rewrite CEF objects to native executables.

Because executables are converted to new formats on the fly, the content of the file changes. This is an even bigger problem than the up-conversion of Macro viruses in Office products50. Obviously, this is going to be a challenge for antivirus software, integrity checkers, and behavior blocker systems.

First of all, it is clearly a major problem for antivirus software given that the virus code needs to be detected and identified in all possible native translations as well as the original MSIL form. If the MSIL virus is executed on a device, before a signature of the virus is known to the antivirus program, the virus will run and its code will be converted to any of a number of native formats according to the actual type of the system. As a result, the MSIL signature of the virus will not be useful to find the virus afterwards. The virus needs to be detected in all possible native translations as well, but this task is not trivial.

It is a problem for the integrity checkers because the content of the program changes on the disk, not only in memory. As a result, integrity checkers cannot be sure if the change was the result of a virus infection or a simple native code translation. Finally, it is a problem for behavior blocker systems because the content of an executable is changed on the disk, which easily can be confused with virus activity.

3.20. Embedded Object Insertion Dependency

The first known binary virus that could infect Word 6 documents, called Anarchy.609351, appeared in 1997. Not surprisingly, we have not seen many other viruses like this because attacking the document formats to add macros to them is no trivial task. Anarchy was a DOS-based COM, EXE, and DOC file infector.

The first virus to infect VBA documents from binary code was released from Russia. The virus is called {Win32,W97M}/Beast.41472.A, and it appeared in the wild in April 1999. The virus is written in Borland Delphi and compiled to 32-bit PE format.

Beast uses a different means of infection than other binary viruses that infect documents. Instead of having it attack the VBA format on a bit-by-bit level, the Beast author used OLE (Object Linking and Embedding) APIs, such as AddOLEObject(), to inject macro code and embedded executable code into documents by using the internal OLE support of Microsoft Word. Via OLE support, the virus injects an embedded object (executable) into VBA documents. However, this embedded object will not be visible to the user, as it normally would. This is because the virus uses a trick to hide the icon of the embedded object.

The virus looks for actively opened documents in Word. When a handle to an active document is available, the virus calls its infection module. First it tries to check whether there are no embedded objects in the document, but in some cases this routine fails because the virus might have added multiple embedded executables into the documents.

Next, Beast tries to add itself as C:\I.EXE shape into the document, named 3BEPb (Russian for beast). If this procedure goes as planned, then a new macro called AutoOpen()also will be injected into the document.

The execution of the embedded object is facilitated by using the Activate method for the 3BEPb shape in the active document:

ActiveDocument.Shapes("3BEPb").Activate

Beast introduced the need for detection and removal of malicious embedded objects in documents not the simplest problem to solve.

3.21. Self-Contained Environment Dependency

One interesting dependency appears when malicious code carries its own environment to the platform. The W32/Franvir virus family offers a good example.

Franvir is clearly a Win32 application. It is compiled with Borland Delphi to a 32-bit PE program. However, the actual Win32 binary part is known as the Game Maker, written by Mark Overmars of the Netherlands ( http://www.cs.uu.nl/people/markov/gmaker/doc.html ).

The Franvir virus was written by a French virus writer using the script language of Game Maker, called GML (Game Maker Language). This is only available in the registered version of Game Maker, which provides developers with security options for using these functions (turning them on and off). It is up to the developer to set the security settings; therefore, a malicious author can easily use GML of Game Maker for virus writing.

Game Maker is a professional game developer environment. Hundreds of brilliant games have been created in it by professionals. It can be used to develop all kinds of games, including scrolling shooters, puzzle games, and even isometric games. For instance, the shooter game called Doomed was created using Game Maker (see Figure 3.14).

Figure 3.14. Doomed in action.

GML provides functions for Registry, File, and program execution. The File operation functions are extremely rich and provide high flexibility for game developers to install and execute programsbut they also can be used by malicious attackers. Some of the functions of GML include the following:

• file_exists(fname)
• file_delete(fname)
• file_copy(fname,newname)
• file_open_write(fname)
• directory_create(dname)
• file_find_next()
• file_attributes(fname,attr)
• registry_write_string_ext()

GML scripts are stored in the resources of Game Maker, but they are accessed and executed by the environment, the interpreter in Game Maker itself. Franvir is an encrypted GML script. It copies itself all over the hard disk under various existing program names. It also installs itself to local P2P (peer-to-peer) folders or even creates the shared folder for KaZaA if the directory is not installed ("kazaa\my shared folder") and changes the KaZaA settings to share the folder. Furthermore, it does damage by deleting the win.com file of Windows. Thus, ultimately Franvir must be classified as a Win32 P2P worm. In reality, however, it is a GML script that is carried by its own environment to new platforms. When the virus successfully executes, it eventually uses the show_message() function to display the false error message shown in Figure 3.15.

Figure 3.15. The false error message of Franvir.

The virus could ultimately offer to play a game such as the DOS virus, Playgame, instead of executing the malicious file delete action as an activation routine, but well... what can we expect from a typical virus writer?

3.22. Multipartite Viruses

The first virus that infected COM files and boot sectors, Ghostball, was discovered by Fridrik Skulason in October 1989. Another early example of a multipartite virus was Tequila. Tequila could infect DOS EXE files as well as the MBR (master boot sector) of hard disks.

Multipartite viruses are often tricky and hard to remove. For instance, the Junkie virus infects COM files and is also a boot virus. Junkie can infect COM files on the hidden partitions52 that some computer manufacturers use to hide data and extra code by marking the partition entries specifically. Because Junkie loads to memory before these hidden files are accessed, these files can get infected easily. Scanners typically scan the content of the visible partitions only, so such infections often lead to mysterious reinfections of the system. This is because the virus has been cleaned from everywhere but from the hidden partition, so the virus can infect the system again as soon as the hidden partition is used to run one of the infected COM files.

In the past, boot and multipartite viruses were especially successful at infecting machines that used the DOS operating system. On modern Windows systems, such viruses are less of a threat, but they do exist.

The Memorial virus53 introduced DOS COM, EXE, and PE infection techniques in the same virus. The payload of the Memorial virus is show in Figure 3.16.

Figure 3.16. The message of the W95/Memorial virus.

W95/Memorial also used the VxD (Virtual Device Driver) format of Windows 9x systems to load itself into kernel mode and hook the file system to infect files on the fly whenever they were accessed. As a result, Memorial also infects 16-bit and 32-bit files.

Another interesting example of a multipartite infection is the Russian virus, 3APA3A, which was found in the wild in Moscow in October 199454. 3APA3A is a normal boot virus on a diskette, occupying two sectors for itself, but it uses a special infection method on the hard disk. It infects the DOS core file IO.SYS. First it makes a copy of IO.SYS, and then it overwrites the original. After the infection, the root directory contains two IO.SYS files, but the first is set as a volume label of the disk; thus, the DIR command does not display two files, but a volume label "IO SYS" and a single IO.SYS file. The point is to trick DOS into loading the infected copy of IO.SYS. Then the virus starts the original one after itself. This happens because DOS will load the first IO.SYS file regardless of its attributes. This method represents a special subclass of companion infection techniques.

3.23. Conclusion

New viral environments are discovered each year. Over the last 20 years of PC viruses, there has been tremendous dark energy in place to develop computer viruses for almost every platform imaginable. All over the world, thousands of people created computer viruses. Because of this we are experiencing an ever-growing security problem with malicious code and, consequently, seeing the development of computer virus research as a new scientific field. There is absolutely no question whether computer viruses will stay with us and evolve to future platforms in the upcoming decades.

Fred Cohen's initial research with computer viruses in 1984 concluded that the computer virus problem is ultimately an integrity problem. Over the last 20 years, the scope of integrity expanded dramatically from file integrity to the integrity of applications and operating system software. Modern computer viruses, such as W32/CodeRed and W32/Slammer, clearly indicate this new era: Computer viruses cannot be controlled by file-based integrity checking alone because they jump from system to system over the network, injecting themselves into new process address spaces in such a way that they are never stored on the disk.

Computer viruses changing their environments to suit their needs is a problem that will likely begin to emerge. For example, the W32/Perrun virus appends itself to JPEG picture files. Normally, pictures files are not infectious unless some serious vulnerability condition exists in a picture file viewer (such as the one described in Microsoft Security Bulletin MS04-02855). However, Perrun modifies the environment of the infected host to include an extractor component, resulting in Perrun-compromised JPEG files not being infectious on a clean system but on infected computers only. Such computer viruses can modify the host's environment in such a way that previous assumptions about the environments no longer hold.

References

1. Dr. Vesselin Bontchev, "Methodology of Computer Anti-Virus Research," University of Hamburg, Dissertation, 1998.

2. Dr. Harold Highland, "A Macro Virus," Computers & Security, August 1989, pp. 178-188.

3. Joe Wells, "Brief History of Computer Viruses," 1996, http://www.research.ibm.com/antivirus/timeline.htm.

4. Dr. Peter Lammer, "Jonah's Journey," Virus Bulletin, November 1990, p.20.

5. Peter Ferrie, personal communication, 2004.

6. Jim Bates, "WHALE...A Dinosaur Heading For Extinction," Virus Bulletin, November 1990, pp. 17-19.

7. Eric Chien, "Malicious Threats to Personal Digital Assistants," Symantec, 2000.

8. Dr. Alan Solomon, "A Brief History of Viruses," EICAR, 1994, pp. 117-129.

9. Intel Pentium Processor III Specification Update, http://www.intel.com/design/PentiumIII/specupdt/24445349.pdf.

10. Mikko Hypponen, Private Communication, 1996.

11. Thomas Lipp, "Computerviren," 64'er, Markt&Technik, March 1989.

12. Peter Szor, "Stream of Consciousness," Virus Bulletin, October 2000, p.6.

13. Peter Szor and Peter Ferrie, "64-bit Rugrats," Virus Bulletin, July 2004, pp. 4-6.

14. Marious Van Oers, "Linux VirusesELF File Format," Virus Bulletin Conference, 2000, pp. 381-400.

15. Jakub Kaminski, "Not So Quiet on the Linux Front: Linux Malware II," Virus Bulletin Conference, 2001, pp. 147-172.

16. Eugene Kaspersky, "Shifter.983," http://www.viruslist.com, 1993.

17. Sarah Gordon, "What a (Winword.) Concept," Virus Bulletin, September 1995, pp. 8-9.

18. Sarah Gordon, "Excel Yourself!" Virus Bulletin, August 1996, pp. 9-10.

19. Yoshihiro Yasuda, personal communication, 2004.

20. Dr. Igor Muttik, "Macro VirusesPart 1," Virus Bulletin, September 1999, pp. 13-14.

21. Dr. Vesselin Bontchev, "The Pros and Cons of WordBasic Virus Up-conversion," Virus Bulletin Conference, 1998, pp. 153-172.

22. Dr. Vesselin Bontchev, "Possible Macro Virus Attacks and How to Prevent Them," Virus Bulletin Conference, 1996, pp. 97-127.

23. Dr. Vesselin Bontchev, "Solving the VBA Up-conversion Problem," Virus Bulletin Conference, 2001, pp. 273-300.

24. Nick FitzGerald, "If the CAP Fits," Virus Bulletin, September 1999, pp.6-7.

25. Jimmy Kuo, "Free Anti-Virus Tips and Techniques: Common Sense to Protect Yourself from Macro Viruses," NAI White Paper, 2000.

26. Dr. Vesselin Bontchev, personal communication, 2004.

27. Jakub Kaminski, "Disappearing MacrosNatural Devolution of Up-converted Macro Viruses," Virus Bulletin Conference, 1998, pp. 139-151.

28. Katrin Tocheva, "Multiple Infections," Virus Bulletin, 1999, pp.301-314.

29. Dr. Richard Ford, "Richard's Problem," private communication on VMACRO mailing list, 1997.

30. Dr. Vesselin Bontchev, "Macro Virus Identification Problems," Virus Bulletin Conference, 1997, pp. 157-196.

31. Dr. Vesselin Bontchev, private communication, 1998.

32. Vesselin Bontchev, "No Peace on the Excel Front," Virus Bulletin, April 1998, pp. 16-17.

33. Gabor Szappanos, "XML Heaven," Virus Bulletin, February 2003, pp. 8-9.

34. Peter G. Capek, David M. Chess, Alan Fedeli, and Dr. Steve R. White, "Merry Christmas: An Early Network Worm," IEEE Security & Privacy, http://www.computer.org/security/v1n5/j5cap.htm.

35. Dr. Klaus Brunnstein, "Computer 'Beastware': Trojan Horses, Viruses, Worms A Survey," HISEC'93, 1993.

36. David Ferbrache, "A Pathology of Computer Viruses," Springer-Verlag, 1992, ISBN: 3-540-19610-2.

37. Peter Szor, "Warped Logic?" Virus Bulletin, June 2001, pp. 5-6.

38. Mikhail Pavlyushchik, "Virus Mapping," Virus Bulletin, November 2003, pp. 4-5.

39. Eugene Kaspersky, "Don't Press F1," Virus Bulletin, January 2000, pp.7-8.

40. Peter Szor, "Sharpei Behaviour," Virus Bulletin, April 2002, pp. 4-5.

41. Gabor Kiss, "SWF/LFM-926Flash in the Pan?" Virus Bulletin, February 2002, p. 6.

42. Dmitry Gryaznov, private communication, 2004.

43. Sami Rautiainen, private communication, 2004.

44. Philip Hannay and Richard Wang , "MSIL for the .NET Framework: The Next Battleground?," Virus Bulletin Conference, 2001, pp. 173-196.

45. Peter Szor, "Tasting Donut," Virus Bulletin, March 2002, pp. 6-8.

46. Peter Ferrie, "The Beagle Has Landed," http://www.virusbtn.com/resources/viruses/indepth/beagle.xml.

47. Eugene Kaspersky, "Zhengxi: Saucerful of Secrets," Virus Bulletin, April 1996, pp. 8-10.

48. Roger A. Grimes, Malicious Mobile Code, O'Reilly, 2001, ISBN: 1-56592-682-X (Paperback).

49. Ken Thomson, "Reflections on Trusting Trust," Communication of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763, http://cm.bell-labs.com/who/ken/trust.html.

50. Peter Szor, "Pocket Monsters," Virus Bulletin, August 2001, pp. 8-9.

51. Igor Daniloff, "Anarchy in the USSR," Virus Bulletin, October 1997, pp.6-8.

52. Lakub Kaminski, "Hidden Partitions vs. Multipartite Viruses I'll be back!," Virus Bulletin Conference, 1996.

53. Peter Szor, "Junkie Memorial," Virus Bulletin, September 1997, pp. 6-8.

54. Dr. Igor Muttik , "3apa3a," http://www.f-secure.com/v-descs/3apa3a.shtml, 1994.

55. "Buffer Overrun in JPEG Processing (GDI+) Could Allow Code Execution," MS04-028, http://www.microsoft.com/technet/security/bulletin/ms04-028.mspx.

Chapter 4. Classification of Infection Strategies

"All art is an imitation of nature."

Seneca

In this chapter, you will learn about common computer virus infection techniques that target various file formats and system areas.

4.1. Boot Viruses

The first known successful computer viruses were boot sector viruses. In 1986 two Pakistani brothers, on the IBM PC, created the first such virus called Brain.

Today the boot infection technique is rarely used. However, you should become familiar with boot viruses because they can infect a computer regardless of the actual operating system installed on it.

Boot sector viruses take advantage of the boot process of personal computers (PCs). Because most computers do not contain an operating system (OS) in their read-only memory (ROM), they need to load the system from somewhere else, such as from a disk or from the network (via a network adapter).

A typical IBM PC's disk is organized in up to four partitions, which have logical letters assigned to them on several operating systems such as MS-DOS and Windows NT, typically C:, D:, and so on. (Drive letters are particularities of the operating systemfor example, UNIX systems use mount points, not driver letters.) Most computers only use two of these partitions, which can be accessed easily. Some computer vendors, such as COMPAQ and IBM, often use hidden partitions to store additional BIOS setup tools on the disk. Hidden partitions do not have any logical names assigned to them, making them more difficult to access. Good tools such as Norton Disk Editor can reveal such areas of the disk. (Please use advanced disk tools very carefully because you can easily harm your data!)

Typically PCs load the OS from the hard drive. In early systems, however, the boot order could not be defined, and thus the machine would boot from the diskette, allowing great opportunity for computer viruses to load before the OS. The ROM-BIOS reads the first sector of the specified boot disk according to the boot order settings in the BIOS setup, stores it in the memory at 0:0x7C00 when successful, and runs the loaded code1.

On newer systems, each partition is further divided into additional partitions. The disk is always divided into heads, tracks, and sectors. The master boot record (MBR) is located at head 0, track 0, sector 1, which is the first sector on the hard disk. The MBR contains generic, processor-specific code to locate the active boot partition from partition table (PT) records. The PT is stored in the data area of the MBR. At the front of the MBR is some tiny code, often called a boot strap loader.

Each PT entry contains the following:

• The addresses of the first and last sectors of the partition
• A flag whenever the partition is bootable
• A type byte
• The offset of the first sector of the partition from the beginning of the disk in sectors
• The size of the partition in sectors

The loader locates the active partition and loads its first logical sector as the boot sector. The boot sector contains OS-specific code. The MBR is general-purpose code, not related to any OS. Thus IBM PCs can easily support more than one partition with different kinds of file systems and operating systems. This also makes the job of computer viruses very simple. The MBR code can be easily replaced with virus code that loads the original MBR after itself and stays in memory, depending on the installed operating system. In the case of MS-DOS, boot viruses can easily remain in memory and infect other inserted media on the fly. A few tricky boot viruses, like Exebug, always force the computer to load them on the system first and then complete the boot process themselves. Exebug changes the CMOS settings of the BIOS to trick the PC into thinking it has no floppy drives. Thus, the PC will boot using the infected MBR first. When the virus is executed (from the hard disk), it checks if there is a diskette in drive A:, and if there is one, it will load the boot sector of the diskette and transfer control to it. Thus when you try to boot from a boot diskette, the virus can trick you into believing that you indeed booted from the diskette, but in reality, you did not.

In the case of floppy diskettes, the boot sector is the first sector of the diskette. The boot record contains OS-specific filenames to load, such as IBMBIO.COM and IBMDOS.COM.

It is advisable to set the boot process in such a way that you boot from the hard drive first. In first-generation IBM PCs, the boot process was not designed that way, so whenever a diskette was left in drive A:, the PC attempted to boot from it. Boot viruses took advantage of this design mistake. By setting the boot process properly, you can easily avoid simple boot sector viruses.

Note

If your system has a SCSI disk connected to it, the system might not boot from those drives first because it is unable to handle these disks directly from its BIOS.

The following sections discuss in detail the basic kinds of MBR and boot sector infection techniques.

4.1.1. Master Boot Record (MBR) Infection Techniques

Infection of the MBR is a relatively trivial task for viruses. The size of the MBR is 512 bytes. Only a short code fits in there, but it is more than enough for a small virus. Typically the MBR gets infected immediately upon booting from an infected diskette in drive A.

4.1.1.1 MBR Infection by Replacement of Boot Strap Code

The classic type of MBR viruses uses the INT 13h BIOS disk routine to access the disks for read and write access. Most MBR infectors replace the boot strap code in the front of the MBR with their own copy and do not change the PT. This is important, because the hard disk is only accessible when booting from a diskette whenever the PT is in place. Otherwise, DOS has no way to find the data on the drive.

The Stoned virus is a typical example of this technique. The virus stores the original MBR on sector 7 (see Figure 4.1). After the virus gets control via the replaced MBR, it reads the stored MBR located on sector 7 in memory and gives it control. A couple of empty sectors are typically available after the MBR, and Stoned takes advantage of this. However, this condition cannot be 100% guaranteed, and this is exactly why some MBR viruses make a system unbootable after infection.

Figure 4.1. The typical layout of the disk before and after a Stoned infection.

4.1.1.2 Replacing the MBR Code but Not Saving It

Another technique of viruses to infect the MBR is to overwrite the boot strap code, leaving the PT entries in place but not saving the original MBR anywhere. Such viruses need to perform the function of the original MBR code. In particular, they need to locate the active partition, load it, and give control to it after themselves.

One of the first viruses that used this technique was Azusa2, discovered in January of 1991 in Ontario, Canada. Viruses like this cannot be disinfected with regular methods because the original copy of the MBR is not stored anywhere.

Antivirus programs quickly reacted to this threat by carrying a standard MBR code within them. To disinfect the virus, this generic MBR code was used to overwrite the virus code, thereby saving the system.

4.1.1.3 Infecting the MBR by Changing the PT Entries

An easy target of MBR viruses is the partition table record of the MBR. By manipulating the PT entry of the active partition, a virus can make sure it loads a different boot sector, where the virus body is stored. Thus the MBR will load the virus boot sector instead of the original one, and the virus will load the original after itself.

The StarShip virus is an example of this technique. Some tricky viruses, such as some members of the Ginger family, manipulate the PT entries in such a way as to create a "circular partition"3,4 effect. Apparently this trick causes MS-DOS v4.07.0 to run in an endless loop when booted. Thus only a clean MS-DOS 3.3x or some other non-Microsoft-made DOS system, such as PC DOS, must be used to be able to boot properly from a diskette.

4.1.1.4 Saving the MBR to the End of the Hard Disk

A common method of infecting the MBR is to replace the MBR completely and save the original at the end of the hard drive, in the hope that nothing overwrites it there. Some of the more careful viruses reduce the size of the partition to make sure that that this area of the disk will not be overwritten again. The multipartite virus, Tequila, uses this technique.

4.1.2. DOS BOOT Record (DBR) Infection Techniques

Boot sector viruses infect the first sector, the boot sector of the diskettes. They optionally infect the hard-disk boot sectors, as well. There are more known infection techniques to infect boot sectors than there are to infect MBRs.

4.1.2.1 Standard Boot Infection Technique

One of the most frequently used boot infection techniques was developed in viruses like Stoned. Stoned infects a diskette's boot sector by replacing the 512-byte boot sector with its own copy and saving the original to the end of the root directory.

In practice, this technique is safe most of the time, but accidental damage to the content of the diskette can happen if there are too many filenames stored in the diskette's directory. In such a case, the original sector's content might overwrite the content of the directory; as a result, only some garbage is displayed on-screen via a DIR command.

4.1.2.2 Boot Viruses That Format Extra Sectors

Some boot viruses are simply too large to fit in a single sector. Most diskettes can be formatted to store more data than their actual formatted size. Not all floppy disk drives support the formatting of extra sectors, but many do. For example, my first PC clone's diskette drive did not support the access to these areas of diskettes. As a result, some copy-protected software simply did not work properly on my system.

Copy-protection software often takes advantage of specially formatted "extra" diskette sectors placed outside of normal ranges. As a result, normal diskette copying tools, such as DISKCOPY, fail to make an identical copy of such diskettes.

Some viruses specially format a set of extra diskette sectors to make it more difficult for the antivirus program to access the original copy during repair. However, the typical use of extra sectors is to make more space for a larger virus body.

The Indonesian virus, Denzuko, is an example that uses this technique. Denzuko was released during the spring of 1988. Unlike with most other viruses, the author of this virus is known. It was written by Denny Yanuar Ramdhani. The nickname of the virus writer is Denny Zuko, which comes from "Danny Zuko," the character in the popular musical movie Grease played by John Travolta5. This boot virus was among the first to implement a counterattack against another computer virus. Denzuko killed the Brain virus whenever it encountered it on a computer.

Denzuko also displayed the graphical payload shown in Figure 4.2 for a fraction of a second when Ctrl-Alt-Del was pressed. Then the computer appeared to reboot, but the virus stayed in memory6.

Figure 4.2. Payload of the Denzuko virus.

The extremely complex and dangerous Hungarian stealth BOOT/MBR virus, Töltögetö (also known as Filler), uses this technique as well. This virus was written by a computer student at a technical high school in SzE8kesfehE8rvE1r, Hungary, in 1991. Filler has formatting records for both 360KB and 1.2MB diskettes and format sectors on track 40 or 80 on these, respectively. These areas of the diskette are not formatted normally.

A benefit of such an infection technique is the possibility of reviving dead virus code. Reviving attempts were first seen in computer viruses in the early '90s. For example, some COM infector viruses would attempt to load to the very end of the disk, outside of normally formatted areas, and give control to the loaded sector. Many early antivirus solutions did not overwrite the virus code everywhere on the disk during cleanup. The boot sector of the disk was often fixed, and the virus code was considered dead in the diskettes' "out of reach" areas. Unfortunately, this provided the advantage of allowing virus writers to revive such dead virus instances easily, using another virus.

4.1.2.3 Boot Viruses That Mark Sectors as BAD

An interesting method of viruses to infect boot sectors is to replace the original boot sector with the virus code and save the original sector, or additional parts of the virus body, in an unused cluster marked as BAD in the DOS FAT. An example of this kind of virus is the rather dangerous Disk Killer, written in April 19897.

4.1.2.4 Boot Viruses That Do Not Store the Original Boot Sector

Some boot sector viruses do not save the diskette's original boot sector anywhere. Instead, they simply infect the active boot sector or the MBR of the hard disk and give control to saved boot sectors on the hard disk. Thus the diskette infection cannot be repaired with standard techniques because the virus does not need to store the original sector anywhere. Because the boot sector is operating systemspecific, this task is not as simple as replacing the MBR code; there are too many different OS boot sectors to choose from. Not surprisingly, the most common antivirus solution to this problem has been to overwrite the virus code with a generic boot sector code that displays a message asking the user to boot from the hard disk instead. As a result, a system diskette cannot be repaired properly.

A second, less common method is to overwrite the diskette boot sector with the virus code, which will infect the MBR or the boot sector of the hard disk. The virus then displays a false error message, such as "Non-system disk or disk error," and lets the user load the virus from the hard disk. The Strike virus is an example that uses this technique.

A further method to infect the boot sector of diskettes without saving is to mimic the original boot sector functionality and attempt to load some system files. Obviously, this method will only work if the virus code matches the system files on the diskette. The Lucifer virus is an example of this technique.

4.1.2.5 Boot Viruses That Store at the End of Disks

A class of boot viruses replaces the original boot sector by overwriting it and saving it at the end of the hard disk, like MBR viruses, which also do this occasionally. The infamous Form virus uses this method. It saves the original boot sector at the very end of the disk. Form hopes that this sector will be used infrequently, or not at all, and thus the stored boot sector will stay on the disk without too much risk of being modified. Thus the virus does not mark this sector in any way; neither does it reduce the size of the partition that contains the saved sector.

Another class of boot viruses also saves the boot sector at the end of the active partition and makes the partition shorter in the partition table to be certain that this sector is not going to be "free" for other programs to use. Occasionally, the boot sector's data area is modified for the same reason.

4.1.3. Boot Viruses That Work While Windows 95 Is Active

Several boot viruses, typically the multipartite kind, attack the new floppy disk driver of Windows 95 systems stored in \SYSTEM\IOSUBSYS\HSFLOP.PDR. The technique appeared in the Slovenian virus family called Hare (also known as Krishna) in May of 1996, written by virus writer Demon Emperor.

Viruses delete this file to get access to INT 13h, BIOS, real-mode interrupt handler while Windows 95 is active on the system. Without this trick, other boot viruses cannot infect the diskettes using INT 13h because it is not available for them to use.

4.1.4. Possible Boot Image Attacks in Network Environments

Diskless workstations boot using a file image from the server. On Novell NetWare file servers, for instance, the command DOSGEN.EXE can create an image of a bootable diskette, called NET$DOS.SYS, for the use of terminals. The terminals have a special PROM chip installed that searches for the boot images over the network. This provides two obvious possibilities for the attacker. The first is to infect or replace the NET$DOS.SYS file on the server whenever access is available to it. The second is to simulate the functionality of the server code and host fake virtual servers via virus code on the network with images that contain virus code.

No such viruses are known. However, the NETDOS.SYS image file is often infected, which is ignored by many virus scanners. This exposes the "dumb terminals" to virus attacks. 4.2. File Infection Techniques In this section, you will learn about the common virus infection strategies that virus writers8 have used over the years to invade new host systems. 4.2.1. Overwriting Viruses Some viruses simply locate another file on the disk and overwrite it with their own copy. Of course, this is a very primitive technique, but it is certainly the easiest approach of all. Such simple viruses can do major damage when they overwrite files on the entire disk. Overwriting viruses cannot be disinfected from a system. Infected files must be deleted from the disk and restored from backups. Figure 4.3 shows how the content of the host program changes when an overwriting virus attacks it. Figure 4.3. An overwriting virus infection that changes host size. Normally, overwriting viruses are not very successful threats because the obvious side effects of the infections are easily discovered by users. However, such viruses have better potential when this technique is combined with network-based propagation. For instance, the [email protected] virus mass mails itself to other systems. When executed, it will overwrite with its own copy any local files with the following extensions: .vbs, .vbe, .js, .jse, .css, .wsh, .sct, .hta, .jpg, .jpeg, .wav, .txt, .gif, .doc, .htm, .html, .xls, .ini, .bat, .com, .avi, .qt, .mpg, .mpeg, .cpp, .c, .h, .swd, .psd, .wri, .mp3, and .mp2 Another overwriting virus infection method is used by the so-called tiny viruses. A classic family of this type is the Trivial family on DOS. During the early 1990s, many virus writers attempted to write the shortest possible binary virus. Not surprisingly, there are many variants of Trivial. Some of the viruses are as short as 22 bytes (Trivial.22). The algorithm for such viruses is simple: 1. Search for any (*.*) new host file in the current directory. 2. Open the file for writing. 3. Write the virus code on top of the host program. The shortest viruses are often unable to infect more than a single host program in the same directory in which the virus was executed. This is because finding the next host file would be "as expensive" as a couple of bytes of extra code. Such viruses are not advanced enough to attack a file marked read-only because that would take a couple of extra instructions. Often the virus code is optimized to take advantage of the content of the registers during program execution as they are passed in by the operating system. Thus the virus code itself does not need to initialize registers that have known content set by the system loader. By using this condition, virus writers can make their creation even shorter. Such optimization, however, can cause fatal errors when the virus code is executed on the wrong platform, which did not initialize the registers in the way that the virus expected. Some tricky overwriting viruses also use BIOS disk writes instead of DOS file functions to infect new files. A very primitive form of such a virus was implemented in 15 bytes. The virus overwrites each sector on the disk with itself. Evidently, the system corruption is so major that such viruses kill the host system very quickly, keeping the virus from spreading any further. Figure 4.4 illustrates an overwriting virus that simply overwrites the beginning of the host but does not change its size. Figure 4.4. An overwriting virus that does not change the size of the host. 4.2.2. Random Overwriting Viruses Another rare variation of the overwriting method does not change the code of the program at the top of the host file. Instead, the virus seeks to a random location in the host program and overwrites the file with itself at that location. Evidently, the virus code might not even get control during execution of the host. In both cases, the host program is lost during the virus's attack and often crashes before the virus code can execute. An example of this virus is the Russian virus Omud9, as shown in Figure 4.5. Figure 4.5. A random overwriter virus. To improve performance by reducing the disk I/O, modern antivirus scanners are optimized to find viruses at "well-known" locations of the file whenever possible. Thus random overwriting viruses are often problematic for scanners to find because a scanner would need to scan the contents of the complete host program for the virus code, which is too I/O expensive. 4.2.3. Appending Viruses A very typical DOS COM file infection technique is called normal COM. In this technique, a jump (JMP) instruction is inserted at the front of the host to point to the end of the original host. A typical example of this virus is Vienna, which was published in Ralf Burger's computer virus book in a slightly modified form with its source code. This was back in 1986-1987. The technique gets its name from the location of the virus body, which is appended to the end of the file. (It is interesting to note that some viruses infect EXE files as COM by first converting the EXE file to a COM file. The Vacsina virus family uses this technique.) The jump instruction is sometimes replaced with equally functional instructions, such as the following: 1. CALL start_of_virus 2. PUSH offset start_of_virus RET The first three overwritten bytes at the top of the host program (sometimes 416) are stored in the virus body. When the virus-infected program is executed, the virus loads in memory with the actual infected host. The jump instruction directs control to the virus body, and then the virus typically replicates itself by locating new host programs on the disk or by executing some sort of activation routine (also called a trigger). Finally, the virus virtually cleans the program in memory by copying the original bytes to offset CS:0x100 (the location where the COM files are loaded) and executes the original program by jumping back to CS:0x100. The COM files are loaded to CS:0x100 because the program segment prefix (PSP) is placed at CS:0CS:0xFF. Figure 4.6 shows how a DOS COM appender virus infects a host program. Figure 4.6. A typical DOS COM appender virus. Obviously the appender technique can be implemented for any other type of executable file, such as EXE, NE, PE, and ELF formats, and so on. Such files have a header section that stores the address of the main entry point, which, in most cases, will be replaced with a new entry point to the start of the virus code appended to the end of the file. Section 4.3 is dedicated to Win32 infection techniques to demonstrate the principles of file infection techniques in modern file formats. These formats often have complicated internal structures offering many more opportunities to attackers. 4.2.4. Prepending Viruses A common virus infection technique uses the principle of inserting virus code at the front of host programs. Such viruses are called prepending viruses. This is a simple kind of infection, and it is often very successful. Virus writers have implemented it on various operating systems, causing major virus outbreaks in many. An example of a COM prepender virus is the Hungarian virus Polimer.512.A, which prepends itself, 512 bytes long, at the front of the executable and shifts the original program content to follow itself. Let's take a look at the front of the Polimer virus in DOS DEBUG. Polimer is a good example to study because the top of the virus code is a completely harmless data area with a message that is displayed onscreen during execution of infected programs. >DEBUG polimer.com -d 142F:0100 E9 80 00 00 3F 3F 3F 3F-3F 3F 3F 3F 43 4F 4D 00 ....????????COM. 142F:0110 05 00 00 00 2E 8B 26 68-01 00 00 00 00 00 00 00 ......&h........ 142F:0120 00 00 00 00 00 00 00 00-41 20 6C 65 27 6A 6F 62 ........A le'job 142F:0130 62 20 6B 61 7A 65 74 74-61 20 61 20 50 4F 4C 49 b kazetta a POLI 142F:0140 4D 45 52 20 6B 61 7A 65-74 74 61 20 21 20 20 20 MER kazetta ! 142F:0150 56 65 67 79 65 20 65 7A-74 20 21 20 20 20 20 0A Vegye ezt ! . The virus body is loaded to offset 0x100 in memory. The virus code starts with a jump (0xe9) instruction to give control to the virus code after its own data area. Because Polimer is 512 bytes (0x200) long, at the front of COM executable, offset 0x300 in memory should be the original host program (0x100+0x200=0x300). Indeed, in this example, the actual infected host is the Free Memory Query Program. Prepending COM viruses can easily start their host programs by copying the original programs' content to offset 0x100 and giving it control. -d300 142F:0300 E9 9E 00 0D 46 72 65 65-20 4D 65 6D 6F 72 79 20 ....Free Memory 142F:0310 51 75 65 72 79 20 50 72-6F 67 72 61 6D 2C 20 56 Query Program, V 142F:0320 65 72 73 69 6F 6E 20 34-2E 30 33 0D 0A 53 4D 47 ersion 4.03..SMG 142F:0330 20 53 6F 66 74 77 61 72-65 0D 0A 28 43 29 20 43 Software..(C) C 142F:0340 6F 70 79 72 69 67 68 74-20 31 39 38 36 2C 31 39 opyright 1986,19 142F:0350 38 37 20 53 74 65 76 65-6E 20 4D 2E 20 47 65 6F 87 Steven M. Geo 142F:0360 72 67 69 61 64 65 73 0D-0A 1A 00 00 00 00 00 00 rgiades......... Figure 4.7 illustrates how a prepender virus is inserted at the front of a host program. Figure 4.7. A typical prepender virus. Prepender viruses are often implemented in high-level languages such as C, Pascal, or Delphi. Depending on the actual structure of the executable, the execution of the original program might not be as trivial a task as it is for COM files. This is exactly why a generic solution involves creation of a new temporary file on the disk to hold the content of the original host program. Then a function, such as system(), is used to execute the original program in the temporary file. Such viruses typically pass command-line parameters of the infected host to the host program stored in the temporary file. Thus the functionality of the application will not break because of missing parameters. 4.2.5. Classic Parasitic Viruses A variation of the prepender technique is known as the classic parasitic infection, as shown in Figure 4.8. Such viruses overwrite the top of the host with their own code and save the top of the original host program to the very end of the host, usually virus-size long. The first such virus was Virdem, written by Ralf Burger. In fact, Virdem is one of the first examples of a file virus ever seen; Burger's book did not even contain information about any other kinds of computer viruses but file viruses. Burger distributed his creation at the Chaos Computer Club conference in December 1986. Figure 4.8. A classic parasitic virus. Often when such viruses are repaired, a common problem occurs. In many cases, the repair definition directs to copy N number of bytes to the front of the file by calculating backward from the end of the infected program. Then the file is truncated at FILESIZE-N, where N is typically the size of the virus but the size of the file can change. The most common reason for this is a multiple infection, when the file is infected more than once. In other cases, the file has some extra data appended, such as inoculation information placed there by some other antivirus program. For instance, the Jerusalem virus uses the MsDos marker at the end of the infected file to "recognize" files that are already infected. Some early antivirus programs appended the string to the end of all COM and EXE files to inoculate files from recurrent Jerusalem infections. Although it might sound like a great idea, the extra modification of the files can easily cause trouble for disinfectors. This happens when the inoculated file is already infected with another parasitic virus. When the FILESIZE-N calculation is used, the repair routine will seek to an incorrect location 5 bytes after the top of the original program content. This repair will result in a garbage host program that will crash when executed. This kind of disinfection is often called a half-cooked repair10. Some special parasitic infectors do not save the top of the host to the end of the host program. Instead they use a temporary file to store this information outside of the file, sometimes with hidden attributes. For example, the Hungarian DOS virus, Qpa, uses this technique and saves 333 bytes (the size of the virus) to an extra file. Some members of the infamous W32/Klez family use this technique to store the entire host program in a new file. 4.2.6. Cavity Viruses Cavity viruses (as shown in Figure 4.9) typically do not increase the size of the object they infect. Instead they overwrite a part of the file that can be used to store the virus code safely. Cavity infectors typically overwrite areas of files that contain zeros in binary files. However, other areas also can be overwritten, such as 0xCC- filled blocks that C compilers often use for instruction alignment. Other viruses overwrite areas that contain spaces (0x20). Figure 4.9. A cavity virus injects itself into a cave of the host. The first known virus to use this technique was Lehigh, in 1987. Lehigh was a fairly unsuccessful virus. However, Ken van Wyk created a lot of publicity about the virus and eventually set up the VIRUS-L newsgroup on Usenet to discuss his findings. Cavity infectors are usually slow spreaders on DOS systems. The Bulgarian Darth_Vader viruses, for instance, never caused major outbreaks. This was also due to the fact that Darth_Vader was a slow infector virus. It waited for a program to be written, and only then did it infect the program using a cavity of the host. The W2K/Installer virus (written by virus writers Benny and Darkman) uses the cavity infection technique to infect Win32 PE on Windows 2000 without increasing the file's size. A special kind of cavity virus infection relies on PE programs' relocation sections. Relocations of most executables are not used in normal situations. Modern linker versions can be configured to compile PE executable files without a relocation tableto make them shorter. Relocation cavity viruses overwrite the relocation section when there are relocations in the host. When the relocation section is longer than the virus, the virus does not increase the file size. Such viruses make sure that the relocation section is the last or it has sufficient length. Otherwise, the file gets corrupted easily during infection. For example, the W32/CTX and W95/Vulcano virus families use this technique. 4.2.7. Fractionated Cavity Viruses A few Windows 95 viruses implement the cavity infection technique extremely successfully. The W95/CIH virus implements a variation of cavity infection called the fractionated cavity technique. In this case, the virus code is split between a loader routine and N number of sections that contain section slack space. First the loader (HEAD) routine of the virus locates the snippets of the virus code and reads them into a continuous area of memory, using an offset tablet kept in the HEAD part of the virus code. During infection, the virus locates the section slack gaps of portable executable (PE) files and injects its code into as many section slack holes as necessary. A new viral entry point will be presented in the header of the file to point to the start of the virus code, usually inside the header section of the host applications. Some shorter cavity infectors, such as Murkry, use this area to infect files in a single step. However, CIH is longer and needs to split its code into snippets. Eventually, the virus executes the original host program from the stored entry point (EP). The advantage of the technique is that the virus only needs to "remember" the original EP of the host and simply jump there to execute the loaded program in memory. Figure 4.10 represents the state of the host program before and after the infection of a fractionated cavity virus. The host would normally start at its entry point (EP) defined in its header section. The virus replaces that EP value with VEP, the viral entry point. The VEP points to the loader of the virus snippets. If there is not enough slack space in the file to present the loader in a single snippet, the file cannot get infected. Figure 4.10. A fractionated cavity virus. The section slacks are typically presented in modern file formats such as PE, and they can be easily located using the section header information of such files. One of the special problems of cavity virus repair is that the content of overwritten areas cannot be restored 100%. This happens when the virus overwrites areas of files that usually contain zeros, but in other cases contain some other pattern. Thus the cryptographic checksums of files after repair will be often different from the original program's content. Furthermore, exact identification of such viruses is complicated because the virus snippets need to be pieced together. Detection of the virus code is simple based on the content of the HEAD routine, which must be placed in a single snippet of code. 4.2.8. Compressing Viruses A special virus infection technique uses the approach of compressing the content of the host program. Sometimes this technique is used to hide the host program's size increase after the infection by packing the host program sufficiently with a binary packing algorithm. Compressor viruses are sometimes called "beneficial" because such viruses might compress the infected program to a much shorter size, saving disk space. (Runtime binary packers, such as PKLITE, LZEXE, UPX, or ASPACK, are extremely popular programs. Many of these have been used independently by attackers to pack the content of Trojan horses, viruses, and computer worms to make them obfuscated and shorter.) The DOS virus, Cruncher, was among the first to use the compression technique. Some of the 32-bit Windows viruses that use this technique include W32/HybrisF (a file infector plug-in of the Hybris worm), written by the virus writer, Vecna. Another infamous example is W32/Aldebera, which combines the infection method with polymorphism. Aldebera attempts to compress the host in such a way that the host remains equivalent in size to the original file. This virus was written by the virus writer, B0/S0 (Bozo) of the IKX virus writing group, in 1999. The W32/Redemption virus of the virus writer, Jacky Qwerty, also uses the compression technique to infect 32-bit PE files on Windows systems. Figure 4.11 shows how a compressor virus attacks a file. Figure 4.11. A compressor virus. 4.2.9. Amoeba Infection Technique A rarely seen virus infection technique, Amoeba, embeds the host program inside the virus body. This is done by prepending the head part of the virus to the front of the file and appending the tail part to the very end of the host file. The head has access to the tail and is loaded later. The original host program is reconstructed as a new file on the disk for proper execution afterwards. For example, W32/Sand.12300, written by the virus writer, Alcopaul, uses this technique to infect PE files on Windows systems. Sand is written in Visual Basic. Figure 4.12 shows the host program before and after infection by a virus that uses the Amoeba infection technique. Figure 4.12. The Amoeba infection method. 4.2.10. Embedded Decryptor Technique Some crafty viruses inject their decryptors into the executable's code. The entry point of the host is modified to point to the decryptor code. The location of the decryptor is randomly selected, and the decryptor is split into many parts. The overwritten blocks are stored inside the virus code for proper execution of the host program after infection. When the infected application starts, the decryptor is executed. The decryptor of the virus decrypts the encrypted virus body and gives it control. The Slovakian polymorphic virus, One_Half, used this method to infect DOS COM and EXE files in May 1994. Evidently, the proper infection of EXE files with this technique is a more complicated task. If relocations are applied to parts of the file that are overwritten with pieces of the virus decryptor, the decryptor might get corrupted in memory. This can result in problems in executing host programs properly. Figure 4.13 shows the "Swiss cheese" layout of infected program content. The detection of such viruses made scanning code more complicated. The scanner needed either to detect decryptor blocks split into many parts or to include some more advanced scanning technique, such as code emulation, to resolve detection easily. (These techniques are discussed in Chapter 11.) Figure 4.13. A "Swiss cheese" infection. The easiest way to analyze such virus code is based on the use of special goat (decoy) files filled with a constant pattern, such as 0x41 ("A") characters. After the test infection, the overwritten parts stand out in the infected test program the following way: 142F:0D80 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 142F:0D90 41 41 41 41 41 2E FD 16-2E F9 FB 36 E9 77 FD 41 AAAAA......6.w.A 142F:0DA0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 142F:0DB0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 142F:0DC0 41 41 41 41 41 41 41 41-41 3E 2E BB 88 14 2E F9 AAAAAAAAA>...... 142F:0DD0 EB 9B 41 41 41 41 41 41-41 41 41 41 41 41 41 41 ..AAAAAAAAAAAAAA Note the 0xE9 (JMP) and 0xEB (JMP short) patterns in the previous dump in two pieces of One_Half's decryptor. These are the pointers to the next decryptor block. In the past, several antivirus products would put together the pieces of the decryptor by following these offsets to decrypt the virus quickly and identify it properly. 4.2.11. Embedded Decryptor and Virus Body Technique A more sophisticated infection technique was used by the Bulgarian virus, Commander_Bomber, written by Dark Avenger as one of his last known viruses in late 1993. The virus was named after the string that can be found in the virus body: COMMANDER BOMBER WAS HERE. The Commander_Bomber virus body is split into several parts, which are placed at random positions on the host program, overwriting original content of the host. The head of the virus code starts in the front of the file and gives control to the next piece of the virus code, and so on. These pieces overwrite the host program in a way similar to the One_Half virus. The overwritten parts are stored at the end of the file, and a table is used to describe their locations. Figure 4.14 shows the sophistication of the virus code's location within the host program. Scanners must follow the spiral path of the control flow from block to block until they find the main virus body. Figure 4.14. Commander_Bomber-style infection. The control blocks are polymorphic, generated by the DAME (Dark Avenger Mutation Engine) of the virus. This makes the blocks especially difficult to read because they contain a lot of garbage code with obfuscated ways to give control to the next block, until the nonencrypted virus body is reached. Eventually, the control arrives at the main virus body, which can be practically anywhere in the file, not at its very end. This is a major advantage for such viruses, because scanners need to locate where the main body of the virus is stored. Back in 1993, this technique was extremely sophisticated, and only a few scanners were able to detect such viruses effectively. The host program is reconstructed by the virus in runtime. 4.2.12. Obfuscated Tricky Jump Technique W32/Donut, the first virus to infect .NET executables, was not dependent on JIT compilation as discussed in Chapter 3. This is because first editions of the .NET executable format can be attacked at its entry-point code, which is still architecture-dependent. (In later versions of Windows, this platform-dependent code will be eliminated by moving the functionality to the system loader itself.) Donut gets control immediately upon executing an infected .NET PE file. The virus uses the simplest possible infection technique to infect .NET images. In fact, Donut turns .NET executables to regular-looking PE files. This is because the virus nullifies the data directory entry of the CLR header when it infects a .NET application. The six-byte-long jump to the _CorExeMain() import at the entry point of .NET files is replaced by Donut with a jump to the virus entry point. The _CorExeMain() function is used to fire up the CLR execution of the MSIL code. The entry point in the header is not changed by the virus. This technique is called an obfuscated tricky jump. Evidently, this method can fool some heuristic scanners. The actual jump at the entry point will be replaced with a 0xE9 (JMP) opcode, followed by an offset to the start of the virus body in the first physical byte of the relocation section, as shown in Figure 4.15. Figure 4.15. An applied obfuscated tricky jump technique. The obfuscated tricky jump is a common technique to avoid changing the original entry point of the file. One of the first viruses that used this trick was DOS COM infector, Leapfrog, which followed the jump instruction at the front of the host and inserted its own jump to the actual entry point instead, as Figure 4.15 demonstrates. The first documented Win32 virus, W32/Cabanas11, used this technique as an antiheuristic feature to infect regular PE files on Windows 95 and Windows NT. When activated, W32/Donut displays the following message box shown in Figure 4.16. Figure 4.16. The message box of the W32/Donut virus. Note The virus writer wanted to call this creation ".dotNet," but because this is a platform name, it cannot be the name of the virus. For obvious reasons, viruses are not called "DOS," "Windows," and so on. So I decided to name the virus something that sounds similar to "dotNET," calling it Donut instead. 4.2.13. Entry-Point Obscuring (EPO) Viruses Entry-point obscuring viruses do not change the entry point of the application to infect it; neither do they change the code at the entry point. Instead, they change the program code somewhere in such a way that the virus gets control randomly. 4.2.13.1 Basic EPO Techniques on DOS Several viruses use the EPO strategy on DOS to avoid easy detection with fast scanners that scan the file near its entry-point code. For example, in early 1997 the Olivia virus12 infected DOS EXE and COM files using this method. This technique became increasingly popular among virus writers to defeat heuristics analyzer programs after 1995. Olivia infects COM and EXE files as they are run or renamed or as their attributes are changed. First, the virus clears the attributes of the file, and then it opens the file to analyze its structure. Figure 4.17 demonstrates the simplified look of an EPO virus-infected program. Figure 4.17. A typical encrypted DOS EPO virus. If the victim has a COM extension, Olivia uses a special function that reads four bytes in a loop from the beginning of the victim and checks for E9h (JMP), EBh (JMP short), 90h (NOP), F8h (CLC), F9h (STC), FAh (CLI), FBh (STI), FCh (CLD), and FDh (STD) each time. If one of the previous instructions is found, the virus seeks the place of the next such instruction. If that position is not in the last 64 bytes of the host, the virus modifies the host program at the location where the previous instruction sequence was detected. Olivia uses the 0x68 (Intel 286 PUSH) opcode to push a word value to the stack. This is followed by a 0xC3 (RET) instruction, which gives control to the virus code by popping the pushed offset to the decryptor of the virus. (0x68) PUSH offset DECRYPTOR (0xC3) RET In Figure 4.17, a jump instruction is shown to transfer control to a decryptor located at the end of the file, followed by the encrypted virus body. Other viruses often use a CALL instruction or similar trampoline to transfer control to the start of the virus body. Figure 4.18 shows the happy birthday message displayed by Olivia upon activation. Figure 4.18. The payload of the Olivia virus. 4.2.13.2 Advanced EPO Techniques on DOS The Nexiv_Der virus13 (shown in Figure 4.19) is polymorphic in COM files, and it also infects the disk's boot sector (DBS). The most interesting technique of this virus, however, is the special EPO technique that it uses to infect files. Nexiv_Der was named after a backward string contained in its encrypted body: "Nexiv_Der takes on your files." Figure 4.19. A polymorphic EPO virus. This virus traces the execution of a program as an application debugger does. Then it patches the code at a randomly selected location to a CALL instruction. This CALL instruction points to the polymorphic decryptor of the virus. The execution path through a program depends on many parameters, including command-line arguments passed to the program and DOS version number. Depending on the same parameters, an infected victim program will most likely run the virus code upon normal execution each time. However, the virus might not run at all on a different version of DOS because the virus code cannot take control. This generates a major problem for even sophisticated heuristic scanners that use a virtual machine to simulate the execution of programs because it is difficult to emulate all of the system calls and the execution path of the victim. The major idea of the Nexiv_Der virus is based on its hook of the INT 1 handler (TRACE) under DOS. This handler is the real infection routine. It starts to trace the host program for at least 256 instructions and stops at the maximum of 2,048 iterations. If the last instruction of the trace happens to be an E8h, E9h, or 80C0..CF opcode (CALL, JMP, ADD AL,byte .. OR BH,byte), then Nexiv_Der replaces it with a CALL instruction, which starts the virus at the end of the file. Figure 4.19 shows a high-level look at a Nexiv_Der-infected executable. The main advantage of this technique is the increased likelihood of virus code execution in a similar host-system environment. This technique, however, is too complicated and thus encountered very rarely. 4.2.13.3 EPO Viruses on 16-Bit Windows One of the first EPO viruses in the wild was the Tentacle_II family14 on Windows 3.x systems. This virus does not change the original entry point of the NE header, which is the obvious choice of typical 16-bit Windows viruses. How can it take control, then? The virus takes advantage of the NE file structure. Although the NE on disk structure is more complicated to parse, it provides many more possibilities for an attacker to inject code reliably into the execution flow. Tentacle_II takes advantage of the module reference table of NE files to find common function calls that are expected to be executed among the first function calls made by the host program. Tentacle_II checks for the KERNEL and VBRUN300 module names in the module reference table. The virus picks the module number of the found module name and reads the segment relocation records of every segment. It looks for the relocation record 91 (INITTASK) in the case of KERNEL or 100 (THUNKMAIN) in the case that VBRUN300 has been found previously. Both of these relocation records point to standard initialization code that must be called at the beginning of a Windows application. For example, the original KEYVIEW.EXE (a standard Windows application) has a relocation entry for KERNEL.91 for its first segment as follows: type offset target PTR 0053h USER.1 OFFS 007Eh KERNEL.178 PTR 0073h FAXOPT.12 PTR 00D3h USER.5 PTR 005Bh FAXOPT.44 PTR 00CAh KERNEL.30 PTR 0031h USER.176 PTR 00A0hKERNEL.91 (INITTASK) PTR 008Eh KERNEL.102 Relocations: 9  When KEYVIEW.EXE gets infected, the virus patches this record to point to a new segment, the VIRUS_SEGMENT. Segment relocation records: 1. Segment 0001h relocations type offset target PTR 0053h USER.1 OFFS 007Eh KERNEL.178 PTR 0073h FAXOPT.12 PTR 00D3h USER.5 PTR 005Bh FAXOPT.44 PTR 00CAh KERNEL.30 PTR 0031h USER.176 PTR 00A0h 0003h:002Eh (VIRUS_SEGMENT:2Eh) PTR 008Eh KERNEL.102 Relocations: 9  2. Segment 0003h relocations (VIRUS_SEGMENT) type offset target PTR 2964h SHELL.6 (REGQUERYVALUE) PTR 2968h SHELL.5 (REGSETVALUE) PTR 296Ch KERNEL.91 (INITTASK -> STARTHOST) Relocations: 3  Thus the infected file starts as it would before the infection, but when the application calls one of the preceding initialization functions, control is passed to the address where the virus starts. The VIRUS_SEGMENT has three relocation records. One of these will point to the original initialization procedure KERNEL.91 or VBRUN300. In this way, the virus is able to start the host program after itself. This infection technique is an NE entry pointobscuring infection technique, which makes Tentacle_II an antiheuristic Windows virus. The preceding analysis was made with the help of Borland's TDUMP (Turbo Dump) utility. In the analysis techniques and tools sections, I will give a longer introduction to such tools and their role in virus analysis. The payload of Tentacle_II is shown in Figure 4.20. The virus creates a TENTACLE.GIF file on the disk, which will be displayed each time a GIF image is viewed on the infected system. Figure 4.20. The payload of the Tentacle_II virus. 4.2.13.4 API-Hooking Technique on Win32 On Win32 systems, EPO techniques became highly advanced. The PE file format15 can be attacked in different ways. One of the most common EPO techniques is based on the hooks of an instruction pattern in the program's code section. A typical Win32 application makes a lot of calls to APIs (application program interfaces). Many Win32 EPO viruses take advantage of API CALL points and change these pointers to their own start code. For example, the W32/CTX and W32/Dengue viruses of GriYo locate a CALL instruction in the host program's code section that points to the import directory. In this way, the virus can reliably identify byte patterns that belong to a function call. After that, the CALL instruction is modified in such a way that it will point to the start of the virus code located elsewhere, typically appended to the end of the file. Such viruses typically search for one or both API call implementations: • Microsoft API Implementation CALL DWORD PTR [] • Borland API Implementation JMP DWORD PTR [] This kind of virus also makes its selection for an API hook location totally at random; in some cases, the virus might not even get control each time a host program is executed. Some families of computer viruses make sure that the virus will execute from the file most of the time. Viruses can hook an API that is called whenever the application exits back to the system. In this case, most programs call the ExitProcess() API. By replacing the call to ExitProcess() with the call to the virus body, a virus can trigger its infection routine more reliably whenever the application exits. To make antivirus detection more difficult, viruses often combine EPO techniques with code obfuscation techniques, such as encryption or polymorphism. Figure 4.21 illustrates a Win32 EPO virus that replaces a CALL to ExitProcess() API with a CALL to the virus code. After the virus takes control, it will eventually run the original code (C) by fixing the code in memory and giving control to the fixed block. Figure 4.21. An EPO virus that hooks API calls of the host. Normally disk activity increases whenever the application exits. This happens for several different reasons. For example, if an application has used a lot of virtual memory, the operating system will need to do a lot of paging, which increases disk activity. Thus it is likely that viruses like this remain unnoticed for a long time. 4.2.13.5 Function Call Hooking on Win32 Another common technique of EPO viruses is to locate a function call reliably in the application's code section to a subroutine of the program. Because the pattern of a CALL instruction could be part of another instruction's data, the virus would not be able to identify the instruction boundaries properly by looking for CALL instruction alone. To solve this problem, viruses often check to see whether the CALL instruction points to a pattern that appears to be the start of a typical subroutine call, similar to the following: CALL Foobar Foobar: PUSH EBP ; opcode 0x55 MOV EBP, ESP ; opcode 0x89E5 Figure 4.22 illustrates the replacement of a function call to Foobar() with a call to the start of virus code. The Foobar() function starts with the 0x55 0x89 0xe5 sequence; it is easily identified as a function entry point. A similar opcode sequence is 0x55 0x8B 0xEC, which also translates to the same assembly. This virus technique is used by variants of W32/RainSong (created by the virus writer, Bumblebee). Figure 4.22. Function callhooking EPO virus. Note The Russian virus, Zhengxi, uses a checksum of the preceding patterns, among others, to obfuscate the virus code further. Zhengxi uses the pattern to infect DOS EXE files using the EPO technique. 4.2.13.6 Import Table Replacing on Win32 Newer Win32 viruses infect Win32 executables in such a way that they do not need to modify the original code of the program to take control. Instead, such EPO viruses work somewhat similarly to the 16-bit Windows virus Tentacle_II. To get control, the virus simply changes the import address table entries of the PE host in such a way that each API call of the application via the import address directory will run the virus code instead. In turn, the activated virus code presents a new import table in the memory image of the program. As a result, consequential API CALLs run proper, original entry-point code via the fixed import table. This technique is used by the W32/Idele family of computer viruses, written by the virus writer, Doxtor L, as shown in Figure 4.23. W32/Idele changes the program section slack area of the code section with a small routine that allocates memory and decrypts the virus code into the allocated block and then executes it. Thus Idele avoids creating import entries with addresses that do not point to the code section. Figure 4.23. Import table-replacing EPO virus. 4.2.13.7 Instruction Tracing Technique on Win32 The Nexiv_Der virus inspired modern virus writing on 32-bit Windows systems. In 2003, new viruses started to appear that use EPO, based on the technique that was pioneered on DOS. For example, the W32/Perenast16 family of viruses is capable of tracing host programs before infection by running the host as a hidden debug process using standard Windows debug APIs. 4.2.13.8 Use of "Unknown" Entry Points Another technique to execute virus code in a semi-EPO manner involves code execution via non-well-known entry points of applications. The Win32 PE file format is commonly known to execute applications from the MAIN entry point stored in the PE.OptionalHeader.AddressOfEntryPoint field of the executable's header structure. Thus it is common knowledge that such programs always start wherever this field points. It might come as a surprise that this is not necessarily the first entry point in a PE file that the system loader executes. On Windows NT systems and above, the system loader looks for the thread local storage (TLS) data directory in the PE files header first. If it finds TLS entry points, it executes these first. Only afterward will it run the MAIN entry-point code. The following two message boxes are printed by a TLSDEMO program of Peter Ferrie. The demo was created when he discovered the TLS entry-point trick at Symantec during heuristic analysis research in 2000. When the application is executed, it prints a message box from both the TLS and the MAIN entry points of the applications. First, it prints the message box from the TLS, as shown in Figure 4.24. Figure 4.24. The TLS entry point is executed first. When you click on the OK button, you arrive at the real main entry point, as shown in Figure 4.25. Figure 4.25. The main entry point is executed next. Initially we did not talk about this trick because it could be used to develop even trickier viruses. However, the virus writer, roy g biv, discovered this undocumented trick and has already used it successfully in some of his W32/Chiton17 viruses in 2003. 4.2.13.9 Code IntegrationBased EPO Viruses A very sophisticated virus infection technique is called code integration. A virus using this technique inserts its own code into the execution flow of the host program using standard EPO techniques and merges its code with the host program's code. This is a complicated process that requires complete disassembling and reassembling of the host. Fortunately, it is extremely complicated to develop such viruses. Disassembling the host program is a fairly CPU-intensive operation that requires a lot of memory. Such viruses need to update the host program's content with proper relocations for code and data sections of the host. The W95/Zmist virus, by the Russian virus writer, Zombie, uses this approach. Because of its high sophistication, this technique is detailed in Chapter 7. Figure 4.26 shows a typical layout of a file infected with a sophisticated code-integration EPO virus. Figure 4.26. A poly and metamorphic code-integration virus. Code integration is a major challenge for scanners and computer virus analysts. The entire file must be examined to find the virus. The virus is camouflaged in the code section of the infected host program, and it is very difficult to locate the instruction that transfers control to the start of the virus. In the case of W95/Zmist, the decryptor of the virus code is not in one piece but is split in a manner similar to the One_Half or the Commander_Bomber virus. 4.2.14. Possible Future Infection Techniques: Code Builders After reading the previous sections, you might wonder what could get more complicated and sophisticated than the code-integration EPO technique. This section provides you with an example that has not yet been seen in the most complex implementations of known computer viruses with the kind of sophistication that is unknown in computer viruses. The closest example is the W95/Zmist virus. The Zmist virus makes use of the host program's content in a manner that is similar to the Code Builder technique. Zmist calls into the host program's code to execute an RET (Return) instruction from it. Thus, the virus code flows into the host program's code and back. The author of the virus probably intended to extend this approach to build the entire virus body on the fly, using the content of the host program. Consider the code-builder virus shown in Figure 4.27. Figure 4.27. A code-builder virus. The idea is based on the fact that any program might contain another set of programs in it as instructions or instruction sequences. A virus might be able to analyze the host program's code in such a sophisticated way that these strings of instruction could be used as the virus itself. It might be difficult to find code that would transfer control properly with accurate register state. However, to demonstrate the idea, imagine a simple code-builder virus that would find the letters V, I, R, U, and S in the host program's code. The builder of the virus would copy these pieces together into memory. The builder itself would look like a generic sequence of code, which could be easy to vary based on metamorphic techniques. The builder would be integrated into the code of the host program itself. Fortunately, this is a rather complicated virus, but certainly it would be very challenging to detect it in files. (A few members of the W95/Henky family use an approach similar to this, except that the viruses are not EPO, which simplifies their detection.) 4.3. An In-Depth Look at Win32 Viruses The world of computer antivirus research has changed drastically since Windows 95 appeared on the market18. One reason this happened was that a certain number of DOS viruses became incompatible with Windows 95. In particular, the tricky viruses that used stealth techniques and undocumented DOS features failed to replicate under the new system. Many simple viruses remained compatible with Windows 95, such as Yankee Doodle, a very successful old Bulgarian virus. Regardless of this, virus writers felt that the new challenge was to investigate the new operating system, to create new DOS executable viruses and boot viruses with special attention to Windows 95 compatibility. Because most virus writers did not have enough in-depth knowledge of the internal mechanisms of Windows 95, they looked for shortcuts to enable them to write viruses for the new platform. They quickly found the first one: macro viruses, which are generally not dependent on the operating system or on hardware differences. Some young virus writers are still happy with macro viruses and develop them endlessly. After writing a few successful macro viruses, however, most grow bored and stop developing them. You may think, fortunately, but the truth is otherwise. Virus writers are looking for other challenges, and they usually find new and different ways to infect systems. The first Windows 95 virus, W95/Boza, appeared in the same year that Windows 95 was introduced. Boza was written by a member of the Australian VLAD virus-writing group. It took a long time for other virus writers to understand the workings of the system but, during 1997, new Windows 95 viruses appeared, some of them in the wild. At the end of 1997, the first Win32, Windows NT compatible virus, Cabanas, was written by the same young virus writer (Jacky Qwerty/29A) who wrote the infamous WM/Cap.A virus. Cabanas is compatible with Windows 9x, Windows NT, and Win32s. (It is also compatible with Windows 98 and Windows 2000, even though the virus code was never tested on these systems by the virus writer because these systems appeared later than the actual virus.) Cabanas turned Microsoft's Win32 compatibility dream into a nightmare. Although it used to be difficult to write such viruses, we suspected that file-infecting DOS viruses from the early years of computer viruses would eventually be replaced by Win32 creations. This transition in computer virus writing was completed by 2004. Even macro viruses are now very rare; virus writers currently focus on 32-bit and 64-bit Windows viruses. 4.3.1. The Win32 API and Platforms That Support It In 1995, Windows 95 was introduced by Microsoft as a new major operating system platform. The Windows 95 system is strongly based on Windows 3.x and DOS technologies, but it gives real meaning to the term Win32. What is Win32? Originally, programmers did not even understand the difference between Win32 and Windows NT. Win32 is the name of an API no more, no less. The set of system functions available to be called from a 32-bit Windows application is contained in the Win32 API. The Win32 API is implemented on several platformsone of them being Windows NT, the most important Win32 platform. Besides DOS programs, Windows NT also is capable of executing 16-bit Windows programs, OS/2 1.x character applications (and, with some extensions, even Presentation Managerbased 1.3 programs with some limitations). In addition, Windows NT introduced the new portable executable (PE) file format (format very similar to, if not based on, the UNIX COFF format) that can run Win32 applications (which call functions in the Win32 API set). As the word portable indicates, this format is supposed to be an easily portable file format, which is actually the most common and important one to run on Windows NT. Other platforms are also capable of running Win32 applications. In fact, one of them was shipped before Windows NT. This platform is called Win32s. Anyone who has ever tried to develop software for Win32s knows that it was a very unstable solution. Because Windows NT is a robust system that needs strong hardware on which to run, Win32 technology did not take the market position Microsoft wanted quickly enough. That process ended up with the development of Windows 95, which supported the new PE format by default. Therefore, it supports a special set of Win32 APIs. Windows 95 is a much better implementation of the Win32 APIs than Win32s. However, Windows 95 does not contain the full implementation of the Win32 APIs found in Windows NT. Until Windows NT gained more momentum, Windows 9x was Microsoft's Win32 platform. After Windows NT, Windows 2000 and Windows 98/Me gained popularity and were replaced by Windows XP and the more secure Windows 2003 server editions, which support the .NET extension by default. On the horizon, Microsoft is talking about the next new Windows release, codenamed Longhorn. All of these systems will support a form of Win32 API that, in most cases, provides binary compatibility among all of these systems. Last but not least, the Win32 API and the PE format are supported by Windows CE (Windows Mobile edition), which is used primarily by handheld PCs. The main hardware requirement includes 486 and above Intel and AMD processors for a Windows CE platform. However, current implementations seem to use SH3, ARM, and Intel XScale processors. Now we get to the issue of CPUs. Both Windows NT and Windows CE are capable of running on machines that have different CPUs. The same PE file format is used on the different machines, but the actual executed code contains the compiled binary for the actual processor, and the PE header contains information about the actual processor type needed to execute the image. All of these platforms contain different implementations of Win32 functions. Most functions are available in all implementations. Thus a program can call them regardless of the actual platform on which it is running. Most of the API differences are related to the actual operating system capabilities and available hardware resources. For instance, CreateThread() simply returns NULL when called under Win32s. The Windows CE API set consists of several hundred functions, but it does not support trivial functions such as GetWindowsDirectory() at all because the Windows CE KERNEL is designed to be placed in ROM of the handheld PC. Due to the hardware's severe restrictions (Windows CE must run on machines with 2 or 4MB of RAM without disk storage), Microsoft was forced to create a new operating system that had a smaller footprint than either Windows NT or Windows 95. Although several manifestations of the Win32 API implement some of the Win32 APIs differently or not at all, in general it is feasible to write a single program that will work on any platform that supports Win32 APIs. Virus writers already understand this fact very well. Their first such virus creation attacked Windows 95 specifically, but virus writers slowly improved the infection methods to attack the PE file format in such a way that the actual infected program remains compatible and also executes correctly under Windows NT/2000/XP systems. Most Windows 95 viruses depend on Windows 95 system behavior and functionality, such as features related to VxD (virtual device driver) and VMM (virtual machine manager), but some of them contain only a certain amount of bugs and need only slight fixes to be able to run under more than one Win32 platform, such as Windows 95/Windows NT. Detection and disinfection of such viruses is not a trivial task. In particular, the disinfection can be difficult to implement. This is because, so far, the PE structure is much more complicated than any other executable file format used by DOS or Windows 3.x. However, it is also a fact that the PE format is a much nicer design than, for example, NE. Unfortunately, over the period from 1995 to 2004, virus writers utilized these platforms aggressively, resulting in the appearance of more than 16,000 variants of 32-bit Windows viruses. However, the principles of these viruses have not changed much. In the next section, you will find details about infection techniques of the PE file format from the perspective of an attacker. Note Win64 is almost the same as Win32, but for 64-bit Windows architectures. There are a couple of minor modifications in Win64 to accommodate the platform differences. 4.3.2. Infection Techniques on 32-Bit Windows This section describes the different ways in which a 32-bit Windows virus can infect different kinds of executable programs used by Windows 95/Windows NT. Because the most common file format is the PE format, most of the infection methods are related to that. The PE format makes it possible for viruses to jump easily from one 32-bit Windows platform to another. We shall concentrate on infection techniques that attack this particular format because these viruses have a strong chance of remaining relevant in the future. Early Windows 95 viruses have a VxD part, which is dropped by other infected objects such as DOS, EXE, and COM executables or a PE application. Some of these infection methods are not related to Win32 platforms on the API level. For instance, VxDs are only supported by Windows 9x and Windows 3.x, not by Windows NT. VxDs have their own 32-bit, undocumented, linear executable (LE) file format. It is interesting to note that this format was 32-bit even at the time of 16-bit Windows. Microsoft could not drop the support of VxDs from Windows 95 because of the many third-party drivers developed to handle special hardware components. The LE file format remained undocumented by Microsoft, but there are already several viruses, such as Navrhar, that infect this format correctly. I will describe these infection techniques briefly to explain the evolution of Win32 viruses. 4.3.2.1 Introduction to the Portable Executable File Format In the following section, I will provide an introductory tour of the PE file format that Microsoft designed for use on all its Win32 operating systems (Windows NT, Windows 95, Win32s, and Windows CE). There are several good descriptions of the format on the Microsoft Developer Network CD-ROM, as well as in many other Windows 95 related books, so I'll describe the PE format from the point of view of known virus infection techniques. To understand how Win32 viruses work, you need to understand the PE format. It is that simple. The PE file format will play a key role in all of Microsoft's operating systems for the foreseeable future. It is common knowledge that Windows NT has a VAX VMS and UNIX heritage. As I mentioned earlier, the PE format is very similar to COFF (common object file format), but it is an updated version. It is called portable because the same file format is used under various platforms. The most important thing to know about PE files is that the executable code on disk is very similar to what the module looks like after Windows has loaded it for execution. This makes the system loader's job much simpler. In 16-bit Windows, the loader must spend a long time preparing the code for execution. This is because in 16-bit Windows applications, all the functions that call out to a DLL (dynamic loaded library) must be relocated. Some huge applications can have thousands of relocations for API calls, which have to be patched by the system loader while reading the file in portions and allocating memory for its structures one by one. PE applications do not need relocation for library calls anymore. Instead, a special area of the PE file, the import address table (IAT), is used for that functionality by the system loader. The IAT plays a key role in Win32 viruses, and I shall describe it later in detail. For Win32, all the memory used by the module for code, data, resources, import tables, and export tables is in one continuous range of linear address space. The only thing that an application knows is the address where the loader mapped the executable file into memory. When the base address is known, the various pieces of the module can easily be found by following pointers stored as part of the image. Another idea you should become familiar with is the relative virtual address, or RVA. Many fields in PE files are specified in terms of RVAs. An RVA is simply the offset of an item to where the file is mapped. For instance, the Windows loader might map a PE application into memory starting at address 0x400000 (the most common base address) in the virtual address space. If a certain item of the image starts at address 0x401234, then the item's RVA is 0x1234. Another concept to be familiar with when investigating PE files and the viruses that infect them is the section. A section in a PE file is roughly equivalent to a segment in a 16-bit NE file. Sections contain either code or data (and occasionally a mixture of both). Some sections contain code or data declared by the actual application, whereas other data sections contain important information for the operating system. Before jumping into the important details of the PE file, examine Figure 4.28, which shows the overall structure of a PE file. Figure 4.28. A high-level view of the PE file image. 4.3.2.1.1 The PE Header The first important part of the PE format is the PE header. Just like all the other Microsoft executable file formats, the PE file has a header area with a collection of fields at an easy-to-find location. The PE header describes vital pieces of the portable executable image. It is not at the very beginning of the file; rather, the old DOS stub program is presented there. The DOS stub is just a minimal DOS EXE program that displays an error message (usually "This program cannot be run in DOS mode"). Because this header is presented at the beginning of the file, some DOS viruses can infect PE images correctly at their DOS stub. However, Windows 95 and Windows NT's system loaders execute PE applications correctly as 32-bit images, and the DOS stub program remains as a compatibility issue with 16-bit Windows systems. The loader picks up the PE header's file address from the DOS header lfanew field. The PE header starts with an important magic value of PE\0\0. After that is the image file header structure, followed by the image optional header. From now on, I will describe only the important fields of the PE header that are involved with Windows 9x/Win32 viruses. The fields are in order, but I will concentrate on the most commonly used values so several will be missing from the list. Figure 4.28 shows the high-level structure of a PE file image. The following paragraphs list important fields of the image file header. WORD Machine Indicates the CPU for which this file is intended. Many Windows 9x virus check this field by looking for the Intel i386 magic value before actual infection. However, some bogus viruses do not check the machine type and infect PE files for other platforms and cause such files to crash when the virus code is executed on the wrong platform. There is a certain risk that we will see viruses with multiprocessor support in the future. For example, the same viruses could target ARM as well as IA64 and regular X86 PE files. WORD NumberOfSections The number of sections in the EXE (DLL). This field is used by viruses for many different reasons. For instance, the NumberOfSections field is incremented by viruses that add a new section to the PE image and place the virus body in that section. (When this field is changed by the virus code, the section table is patched at the same time.) Windows NT based systems accept up to 96 sections in a PE file. Windows 95 based system do not inspect the section number. WORD Characteristics The flags with information about the file. Most viruses check these flags to be sure that the executable image is not a DLL but a program. (Some Windows 9x viruses infect KERNEL32.DLL. If so, the field is used to make sure that the executable is a DLL.) This field is not usually changed by viruses. Important fields of the image optional header follow. WORD Magic The optional header starts with a "magic" field. The value of the field is checked by some viruses to make sure that the actual program is a normal executable and not a ROM image or something else. DWORD SizeOfCode This field describes the rounded-up size of all executable sections. Usually viruses do not fix the value when adding a new code section to the host program. However, some future viruses might change this value. DWORD AddressOfEntryPoint The address where the execution of the image begins. This value is an RVA that normally points to the .text (or CODE) section. This is a crucial field for most Windows 9x/Win32 viruses. The field is changed by most of the known virus infection types to point to the actual entry point of the virus code. DWORD ImageBase When the linker creates a PE executable, it assumes that the image will be mapped to a specific memory location. That address is stored in this field. If the image can be loaded to the specified address (currently 0x400000 in Microsoft programs), then the image does not need relocation patches by the loader. This field is used by most viruses before infection to calculate the actual address of certain items, but it is not usually changed. DWORD SectionAlignment When the executable is mapped into memory, each section must start at a virtual address that is a multiple of this value. This field minimum is 0x1000 (4096 bytes), but linkers from Borland use much bigger defaults, such as 0x10000 (64KB). Most Win32 viruses use this field to calculate the correct location for the virus body but do not change the field. DWORD FileAlignment In the PE file, the raw data starts at a multiple of this value. Viruses do not change this value but use it in a similar way to SectionAlignment. DWORD SizeOfImage When the linker creates the image, it calculates the total size of the portions of the image that the loader has to load. This includes the size of the region starting at the image base up through the end of the last section. The end of the last section is rounded up to the nearest multiple of section alignment. Almost every PE infection method uses and changes the SizeOfImage value of the PE header. Not surprisingly, many viruses calculate this field incorrectly, which makes image execution impossible under Windows NT. This is because the Windows 9x's loader does not bother to check this value when executing the image. Usually (and fortunately) virus writers do not test their creations for long, if at all. Most Windows 95 viruses contain this bug. Some antivirus software used to calculate this field incorrectly when disinfecting files. This causes a side effect: A Windows NTcompatible Win32 program will not be executed by Windows NT but only by Windows 9x, even when the application has been disinfected. DWORD Checksum This is a checksum of the file. Most executables contain 0 in this field. All DLLs and drivers, however, must have a checksum. Windows 95's loader simply ignores the checking of this field before loading DLLs, which makes it possible for some Windows 95 viruses to infect KERNEL32.DLL very easily. This field is used by some viruses to represent an infection marker to avoid double infections. Another set of viruses recalculates it to hide an infection even better. 4.3.2.1.2 The Section Table and Commonly Encountered Sections Between the PE header and the raw data for the image's sections lies the section table. The section table contains information about each section of the actual PE image. (See the following dumps that I made with the PEDUMP tool.) Basically, sections are used to separate different functioning modules from each other, such as executable code, data, global data, debug information, relocation, and so on. The section table modification is important for viruses to specify their own code section or to patch an already existing section to fit actual virus code into it. Each section in the image has a section header in the section table. These headers describe the name of each section (.text, ... .reloc) as well as its actual, virtual, and raw data locations and sizes. First-generation viruses, like Boza, patch a new section header into the section table. (Boza adds its own .vlad section, which describes the location and size of the virus section.) Sometimes there is no place for a section header in the file, and the patch cannot take its place easily. Therefore, viruses today (such as W95/Anxiety19 variants) attack the last existing section header and modify its fields to fit the virus code in that section. This makes the virus code section less visible and the infection method less risky. Listing 4.1 is the section table example of CALC.EXE (the Windows Calculator). Listing 4.1. Looking at the Section Table of CALC.EXE with PEDUMP 01 .text VirtSize: 000096B0 VirtAddr: 00001000 raw data offs: 00000400 raw data size: 00009800 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: 60000020 CODE MEM_EXECUTE MEM_READ 02 .bss VirtSize: 0000094C VirtAddr: 0000B000 raw data offs: 00000000 raw data size: 00000000 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: C0000080 UNINITIALIZED_DATA MEM_READ MEM_WRITE 03 .data VirtSize: 00001700 VirtAddr: 0000C000 raw data offs: 00009C00 raw data size: 00001800 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: C0000040 INITIALIZED_DATA MEM_READ MEM_WRITE 04 .idata VirtSize: 00000B64 VirtAddr: 0000E000 raw data offs: 0000B400 raw data size: 00000C00 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: 40000040 INITIALIZED_DATA MEM_READ 05 .rsrc VirtSize: 000015CC VirtAddr: 0000F000 raw data offs: 0000C000 raw data size: 00001600 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: 40000040 INITIALIZED_DATA MEM_READ 06 .reloc VirtSize: 00001040 VirtAddr: 00011000 raw data offs: 0000D600 raw data size: 00001200 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: 42000040 INITIALIZED_DATA MEM_DISCARDABLE MEM_READ  The name of the section can be anything. It could even contain just zeros; the loader does not seem to worry about the name. In general, however, the name field describes the actual functionality of the section. There is a chance for confusion here because the actual code is placed into a .text section of the PE files. This is the traditional name, the same as in the old COFF format. The linker concentrates all the .text section of the various OBJ files to one big .text section and places this in the first position of the section table. As I will describe later, the .text section contains not only code, but an additional jump table for DLL library calls. The Borland linker calls the .text section CODE, which is not a traditional name (but not one beyond normal understanding). Another common section name is .data, where the initialized data goes. The .bss section contains uninitialized static and global variables. The .rsrc contains and stores the resources for the application. The .idata section contains the import tablea very important part of the PE format for viruses. (Note that sections are only used as logical separators in the file image. Because nothing is mandatory, the ".idata" section's content might be merged in any other sectionsor not presented at all.) The .edata section is also very important for viruses because it lists all the APIs that the actual module exports for other executables. The .reloc section stores the base relocation table. Some viruses take special care of relocation entries of the executables; however, this section seems to disappear from most Windows 98 executables from Microsoft. Somehow the .reloc section had an early PE format design problem. The actual program is loaded before its DLLs, and the application is executed in its own virtual address spacethere seems to be no real need for that. Last but not least, there is a common section name, the .debug section, which holds the debug information of the executable (if there is any). This is not important for viruses, although they could take advantage of it for infections. Because the name of the section can be specified by the programmer, some executables contain all kinds of special names by default. Three of the section table header's fields are very important for most viruses: VirtualSize (which holds the virtual size of the section), SizeOfRawData (which holds the size of the section after it has been rounded up to the nearest file alignment), and the Characteristics field. The Characteristics field holds a set of flags that indicate the section's attributes (code, data, readable, writable, executable, and so on). The code section has an executable flag but does not need writable attributes because the data are separated. This is not the same with appended virus code, which must keep its data area somewhere in its code. Therefore viruses must check for and change the Characteristics field of the section in which their code will be presented. All of this indicates that the actual disinfection of a 32-bit virus can be more complicated than that of a normal DOS EXE virus. The infection itself is not trivial in most methods, but so many sources are available on various Internet locations that virus writers have all the necessary support to write new virus variants easily. 4.3.2.1.3 PE File Imports: How Are DLLs Linked to Executables? Most of the Windows 9x and Windows NT viruses are based heavily on the understanding of the import table, which is a very important part of the PE structure. In Win32 environments, DLLs are linked through the PE file's import table to the application that uses them. The import table holds the names of the imported DLLs and also the names of the imported functions from those DLLs. Consider the following examples: ADVAPI32.DLL Ordn Name 285 RegCreateKeyW 279 RegCloseKey KERNEL32.DLL Ordn Name 292 GetProfileStringW 415 LocalSize 254 GetModuleHandleA 52 CreateFileW 278 GetProcAddress 171 GetCommandLineW 659 lstrcatW 126 FindClose 133 FindFirstFileW 470 ReadFile 635 WriteFile 24 CloseHandle 79 DeleteFileW  The executable code is located in the .text section of PE files (or in the CODE section, as the Borland linker calls it). When the application calls a function that is in a DLL, the actual CALL instruction does not call the DLL directly. Instead, it goes first to a jump (JMP DWORD PTR [XXXXXXXX]) instruction somewhere in the executable's .text section (or in the CODE section in the case of Borland linkers). The address that the jump instruction looks up is stored in the .idata section (or sometimes in .text) and is called an entry within the IAT (Import Address Table). The jump instruction transfers control to that address pointed by the IAT entry, which is the intended target address. Thus, the DWORD in the .idata section contains the real address of the function entry point, as shown in the following dump. In Listing 4.2, an application calls FindFirstFileA() in KERNEL32.DLL. Listing 4.2. Function Imports .text (CODE) 0041008E E85A370000 CALL 004137ED ; KERNEL32!FindFirstFileA 004137E7 FF2568004300 JMP [KERNEL32!GetProcAddress] ; 00430068 004137ED FF256C004300 JMP [KERNEL32!FindFirstFileA] ; 0043006C 004137F3 FF2570004300 JMP [KERNEL32!ExitProcess] ; 00430070 004137F9 FF2574004300 JMP [KERNEL32!GetVersion] ; 00430074 .idata (00430000) . 00430068 1E3CF177 ;-> 77F13C1E Entry of KERNEL32!GetProcAddress 0043006C DBC3F077 ;-> 77F0C3DB Entry of KERNEL32!FindFirstFileA 00430070 6995F177 ;-> 77F19569 Entry of KERNEL32!ExitProcess 00430074 9C3CF177 ;-> 77F13C9C Entry of KERNEL32!GetVersion  The calls are implemented in this way to make the loader's job easier and faster. By thunking all calls to a given DLL function through one location, there is no longer the need for the loader to patch every instruction that calls a DLL. All the PE loader has to do is patch the correct addresses into the list of DWORDs in the .idata section for each imported function. The import table is very useful for modern 32-bit Windows viruses. Because the system loader has to patch the addresses of all the APIs that a Win32 program uses by importing, viruses can easily get the address of an API they need to call by looking into the host program's import table. With traditional DOS viruses, this problem does not exist. When a DOS virus wants to access a system service function, it simply calls a particular interrupt with the corresponding function number. The actual address of the interrupt is placed in the interrupt vector table and is picked up automatically during the execution of the program. The interrupt vector table is not saved from the running programs; all applications can read and write into it because there are no privilege levels in DOS. The OS and all applications share the same available memory with equivalent rights. Therefore access to a particular system function does cause problems for a DOS virus. It has access to everything it needs by default, regardless of the infection method used. A Windows 95 virus must call APIs or system services to operate correctly. Most 32-bit applications use the import table, which the linker prepares for them. However, there are a couple of ways to avoid imports. Avoiding imports is often necessary for compatibility reasons. When an application is linked to a DLL, the actual program cannot be executed if the system loader cannot load all the DLLs specified in the import table. Moreover, the system loader checks all the necessary API calls and patches their addresses into the import table. If the loader is unable to locate a particular API by its name or ordinal value, the application cannot be executed. Some applications must overcome this problem. For instance, if a Win32 program wants to list by name all the running processes under both Windows 95 and NT, it must use system DLLs and API calls under Windows 95 that are different from those under Windows NT. In such a case, the application is not linked directly to all the DLLs it wants to access because the program could not be executed on any system. Instead, the LoadLibrary() function is used to load the necessary DLLs, and GetProcAddress() is used to get the API's address. The actual program can access the API address of LoadLibrary() and GetProcAddress() from its import table. This solves the chicken-and-egg problem of how to call an API without knowing its address if an API call is needed. As we will see later, Boza solves the problem by using hard-coded API addresses. Modern Win32 viruses, however, are capable of searching the import table during infection time and saving pointers to the .idata section's important entries. Whenever the application has imports for a particular API, the attached virus will be able to call it. Note One of the important differences in 64-bit and 32-bit PE files is their handling of import and export entries. The IA64 PE files use a PLABEL_DESCRIPTOR structure in place of any IAT entries. (This structure is detailed in Chapter 12.) 4.3.2.1.4 PE File Exports The opposite of importing a function is exporting a function for use by EXEs or other DLLs. A PE file stores information about its exported functions in the .edata section. Consider the following dump, which lists a few exports of KERNEL32.DLL: Entry Pt Ordn Name 000079CA 1 AddAtomA . 0000EE2B 38 CopyFileA . 0000C3DB 131 FindFirstFileA . 00013C1E 279 GetProcAddress  KERNEL32.DLL's export table consists of an Image_Export_directory, which has pointers to three different lists: the function address table, the function name table, and the function ordinal table. Modern Windows 95/NT viruses search for the "GetProcAddress" string in the function name table to be able to retrieve the API function entry-point value. When this value is added to the ImageBase, it gives back the 32-bit address of the API in the DLL. In fact, this is almost the same algorithm that the real GetProcAddress() from KERNEL32.DLL follows internally. This function is one of the most important for Windows 95 viruses that want to be compatible with more than one Win32-based system. When the address of GetProcAddress() is available, the virus can get all the API addresses it wants to use. 4.3.2.2 First-Generation Windows 95 Viruses The first Windows 95 virus, known as W95/Boza.A, was introduced in the VLAD virus writer magazine. Boza's authors obviously wanted to be the first with their creation, and they had to find a Windows 95 beta version very quickly to do so. Pioneer viruses used to be very buggy, and Boza was no exception. Basically, the virus cannot work on more than two Windows 95 versions: a beta release and the final version. Even on those two Windows 95 releases, the virus causes many general protection faults during replication. Infected files are often badly corrupted. Boza is a typical appending virus that infects PE applications. The virus body is placed in a new section called .vlad. First the .vlad section header is patched into the section table as the last entry, and the number of sections field is incremented in the PE header. The body of the virus is appended to the end of the original host program, and the PE header's entry point is modified to point to the new entry point in the virus section. Boza uses hard-coded addresses for all the APIs it has to call. That approach is the easiest, but, fortunately, it is not very successful. The authors of the virus worked on a beta version of Windows 95 first and used addresses hard-coded for that particular implementation of KERNEL32.DLL. Later they noticed that the actual virus did not remain compatible with the final release of Windows 95. This happened because Microsoft did not have to provide the same ordinal values and addresses for all the APIs for every system DLL in all releases. This would be impossible. Different Windows 95 implementationsbetas, language versions, OSR2 releasesdo not share the same API addresses. For instance, the first API call in Boza happens to be GetCurrentDirectoryA(). Figure 4.29 shows that the ordinal values and entry points of GetCurrentDirectoryA are different in the English version of Windows 95 and in the Hungarian OSR2 Windows 95 release of KERNEL32.DLL. Figure 4.29. The ordinal references on two releases of Windows 95.  Entry Pt Ordn A. 00007744 304 GetCurrentDirectoryA (Windows 95 ENG) B. 0000774C 307 GetCurrentDirectoryA (Windows 95 OSR2-HUN)  ImageBase is 0xBFF70000 in both KERNEL32.DLL releases, but the procedure address of GetCurrentDirectoryA() is 0xBFF77744 in the English release and 0xBFF7774C in the Hungarian OSR2 version. When Boza wants to replicate on the Hungarian version of Windows 95, it calls an incorrect address and, obviously, fails to replicate. Therefore, Boza cannot be called a real Windows 95compatible virus. It turns out that Boza is incompatible with most Windows 95 releases. Regardless of these facts, many viruses try to operate with hard-coded API addresses. Most of these Windows 95 viruses cannot become in the wild. Virus writers seem to understand Win32 systems much better already, creating viruses that are compatible not only with all Windows 95 releases but also with Windows 98 and Windows NT versions. 4.3.2.2.1 Header Infection This type of Windows 95 virus inserts itself between the end of the PE header (after the section table) and the beginning of the first section. It modifies the AddressOfEntryPoint field in the PE header to point to the entry point of the virus instead. The first known virus to use this technique was W95/Murkry. The virus code must be very short in Windows 95 header infections. Because sections must start at an offset that is a multiple of the FileAlignment, the maximum available place to overwrite cannot reach much more than the FileAlignment value. When the application contains too many sections and the FileAlignment is 512 bytes, there is no place for the virus code. The AddressOfEntryPoint field is an RVA; however, the virus code is not placed in any of the sections and, therefore, the actual RVA is the real physical offset in the file that the virus must place in the header. It is interesting to note that the entry point does not point into any code section but, regardless of that fact, Windows 95's loader happily executes the infected program. There is a chance that a scanner will fail to detect the second generation of such viruses. This happens when the scanner is only tested on first-generation samples. In first-generation samples, the AddressOfEntryPoint points to a valid section. When the scanner looks for the entry point of the program, it must check all the section headers and whether the AddressOfEntryPoint points to any of them. There is a chance that this function is not implemented to handle those cases in which the entry point does not point to any of the sections. Some scanners may skip the file instead of scanning it from the real entry point, thereby failing to detect the infection in second-generation samples. 4.3.2.2.2 Prepending Viruses The easiest way to infect PE files is to overwrite their beginning. Some DOS viruses infect PE files this way, but none of the known Windows 95 viruses use this infection method. Of course, the application will not work correctly after the infection. Such viruses are discovered almost immediately for this reason, which is why viruses that do not want to handle the complicated file format of PE files use the prepending method. Such viruses are usually written in a high-level language (HLL) such as C or even Delphi. This method consists of prepending the virus code to the PE file. The infected program starts with the EXE header of the virus. When the virus wants to transfer control to the original program code, it has to extract it to a temporary file and execute it from there. Disinfection of such viruses is easy. The original header information is available at the very end of the infected program in a nonencrypted format. Virus writers will recognize that and will encrypt the original header information later on. This will make disinfection more complicated. 4.3.2.3 Appending Viruses That Do Not Add a New Section Header A more advanced appending method is used by the W95/Anxiety virus. Anxiety is very similar to Boza in its infection mechanism, but its code is more related to the somewhat bogus W95/Harry virus. The Anxiety virus does not add a new section header at the end of the section table. Rather, it patches the last section's section header to fit into that section. In this way, the virus can infect all PE EXE files easily. There is no need to worry that the actual section header does not fit into the section table. By modifying the VirtualSize and SizeOfRawData fields, the virus code can be placed at the end of the executable. In this way, the NumberOfSection field of the PE header should not need to be modified. The AddressOfEntryPoint field is changed to point to the virus body, and the SizeOfImage is recalculated to represent the new size of the program. Listing 4.3 is the last section of CALC.EXE before and after the W95/Anxiety.1358 infection. Listing 4.3. The Section Modification of W95/Anxiety.1358 06 .reloc VirtSize: 00001040 VirtAddr: 00011000 raw data offs: 0000D600 raw data size: 00001200 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: 42000040 INITIALIZED_DATA MEM_DISCARDABLE MEM_READ 06 .reloc VirtSize: 00002040 VirtAddr: 00011000 raw data offs: 0000D600 raw data size: 00001640 relocation offs: 00000000 relocations: 00000000 line # offs: 00000000 line #'s: 00000000 characteristics: E0000040 INITIALIZED_DATA MEM_EXECUTE MEM_READ MEM_WRITE  The Characteristics field of the last section header is changed to have writable/executable attributes. The writable characteristic is enough in itself to execute self-modifying code from any section, but many virus writers initially did not realize that. Viruses like W32/Zelly use two or more infection strategies. In basic infection mode Zelly adds two sections to the host program. In advanced infection mode, it merges all sections of the host into a single section, and appends the virus to the end of the image. This integrates the virus body tighter into the host program. 4.3.2.4 Appending Viruses That Do Not Modify the Entry Point Some Windows 95 and Win32 viruses do not modify the AddressOfEntryPoint field of the infected program. The virus appends its code to the PE file, but it gets control in a more sophisticated way. It calculates where the original AddressOfEntryPoint points to and places a JMP instruction there that points to the virus body. Fortunately, it is very difficult to write such viruses. This is because the virus must take care of the relocation entries that point to the overwritten part of the code. The W32/Cabanas virus masks out the relocation entries that point to that area. W95/Marburg does not place a JMP instruction at the entry point if it finds relocations for that area; instead, it modifies the AddressOfEntryPoint field. The JMP instruction should not be the first instruction in the program. W95/Marburg shows this by placing the JMP instruction after a random garbage block of code when no relocations are present in the first 256 bytes of entry-point code. In this way, it is not obvious to scanners and integrity checkers how to figure out the entry point of the virus code. 4.3.2.5 KERNEL32.DLL Infection Most Windows 95 viruses attack the PE format, but some of them also infect DOS COM, EXE programs, VxDs, Word documents, and 16-bit Windows new executables (NE). Others may infect DLLs accidentally because these are linked in PE (or NE) formats, but the infection is not able to spread further because the standard entry point of the DLLs is not called by the system loader. Instead, the DLL's execution normally starts at its specified DLL entry point. KERNEL32.DLL infectors do not attack the entry point. Instead, this type of virus must gain control differently. PE files have many other entry points that are useful for viruses, especially DLLs, which are export APIs (their entry points) by nature. Therefore, the easiest way to attack KERNEL32.DLLs is to patch the export RVA of one of the APIs (for instance, GetFileAttributesA) to point to the virus code at the end of the DLL image. W95/Lorez20 uses this approach. Viruses like this are able to go "resident" easily. The system loads the infected DLL during the system initialization period. After that, every program that has KERNEL32.DLL imports will be attached to this infected DLL. Whenever the application has a call to the API in which the virus code has been attached, the virus code gets control. All the system DLLs contain a precalculated checksum in their PE header, placed there by the linker. Unlike Windows 95, Windows NT recalculates this checksum before it loads the DLL. If the calculated checksum is not the same as in the header of the DLL, the system loader stops with an error message during the blue screen boot-up period. However, this does not mean that such a virus cannot be implemented for Windows NTit just makes implementation a bit more complicated. Although the checksum algorithm is not documented by Microsoft, there are APIs available in IMAGEHLP.DLL for these purposeslike CheckSumMappedFile()which are efficient enough to calculate a new, correct checksum after the actual infection is done. This is not enough, however, for Windows NT's loader. There are several other steps to take, but there is no doubt that virus writers will be able to solve these questions soon. There is a need for virus scanners to check the consistency of a KERNEL32.DLL by recalculating the PE header checksum, especially if the scanner is a Win32 application itself and is attached to an infected KERNEL32.DLL. 4.3.2.6 Companion Infection Companion viruses are not very common. Nevertheless, some virus writers do develop Windows 95 companion viruses. A path companion virus depends on the fact that the operating system always executes files with a COM extension first in preference to an EXE extension, if the names of two files in the same directory differ only in their extensions. These viruses simply look for a PE application with an EXE extension and then copy themselves with the same name into the same directory (or somewhere on the path) with a COM extension, using the host's name. W95/Spawn.4096 uses this technique. This functionality is implemented by using FindFirstFileA(), FindNextFileA() APIs for search, CopyFileA() to copy the virus code, and CreateProcessA() to execute the original host program. 4.3.2.7 Fractionated Cavity Infection I originally predicted this infection technique as one that would possibly be developed in the future. However, the W95/CIH virus had already introduced this technique before my first lecture on Win32 viruses. There is slack space between most sections, which is usually filled with zeros (or 0xCC) by the linker. This is because the sections have to start at the file alignment, as described in the PE header's FileAlignment field. The actual virtual size each section uses is usually different from the raw data representation. Usually, the virtual size is a smaller value. In most cases, Microsoft's Link program generates PE files like that. The difference between the raw data size of the section and the virtual size is the actual alignment area, which is filled by zeros and not loaded when the program is mapped into its own address space. Because the default value of FileAlignment is 512 bytes (usual sector size), the usual slack area size is smaller than 512 bytes. When I first considered this kind of infection method, I thought that no such viruses would be developed because less than 512 bytes is not big enough for an average PE infector virus of that kind. However, two minutes later I had to recognize that this simple problem would not stop virus writers from developing such viruses. The only thing that has to be done by the virus is to split its virus body into several parts and then into as many section alignments as are available. The loader code for these blocks can be very short, first moving each separated code block to an allocated memory area, one by one. This code itself fits into a big enough section alignment area. This is the precise method used by the W95/CIH virus. This makes the job of the scanner and the disinfector much harder. The virus changes the virtual size of the section to be the same as the raw data size in each section header, into which it injects a part of its virus body. The exact identification of such viruses is more difficult than for normal viruses because the virus body must be fetched from different areas of the PE image first. W95/CIH uses the header infection method at the same time and infects Microsoft Linkercreated images without any problem. The fragmented cavity infection technique has a very important advantage from the virus's point of view. The infected file does not get bigger after the infection; its size remains the same. This makes noticing the virus much harder. The identification must be done very carefully because a virus like that may split its body at any offset, which might also separate the actual search string into several parts. This fact shows that it is very important to analyze new Windows 95 viruses with extreme care. Otherwise, the scanner might not find all generations of the same virus code. 4.3.2.8 Modification of the lfanew Field in an Old EXE Header This is the second infection method that I originally intended to describe as one that has not yet been developed. However, as with the fragmented cavity infection method (discussed in the previous section), this technique appeared in a virus during the time I was writing about it. This infection method is one of the simplest to implement and therefore is used in many viruses. The first known virus to use this method was W95/Cerebrus. The method itself works on Windows NT, but there is a trivial bug in the virus that makes this impossible. Basically, this infection method is an appending typethe virus body is attached to the very end of the original program. The important difference is that the virus code itself contains its own PE header. When the virus infects a PE application, it modifies the lfanew field (at 0x3c address) in the old EXE header. As described earlier, the lfanew field holds the file address of the PE header. Because this field points to a new PE header, the program is executed as if it contains only the virus code. The virus functions like a normal Win32 application. It has its own imports and can easily access any APIs it wants to call. When the replication is done, the virus creates a temporary file with a copy of the infected program. In this file, the lfanew field will point correctly to the original PE header. Thus, the original program is functional again when the virus executes the temporary file. 4.3.2.9 VxD-Based Windows 95 Viruses Most Windows 95 viruses are direct-action infectors. Virus writers recognized the importance of fast infection and tried to look for solutions to implement Windows 95 resident viruses. Though not the easiest, the evident solution was to write a VxD virus. One of the first VxD-based viruses was W95/Memorial. It infects DOS, COM, EXE, and PE applications. The virus does not replicate without Windows 95. The infected programs use a dropping mechanism to extract the real virus code a VxD into the root directory of drive C: as CLINT.VXD. When the VxD is loaded, the virus code is executed on ring 0, thus the virus can do anything it wants. VxDs can hook the file system easily, and that is exactly what most VxD viruses want to do. They simply hook the installable file system (IFS) with one simple VxD service routine. After that, the virus can monitor access to all files. The VxD code has to be extracted, and the dropper code needs different implementation for each and every format that the virus wants to infect. This makes the virus code very complicated and relatively big (12,413 bytes). Therefore, it is very unlikely that many viruses like this will be developed in the future. 4.3.2.10 PE Viruses That Operate as VxDs A much easier solution has been introduced by the W95/Harry and W95/Anxiety viruses. These viruses can overcome complications by patching their code into the VMM (virtual machine manager) of Windows 95. When an infected PE program is executed, the virus code takes control. Programs are executed on the application level, which is why they cannot call system-level functions (VxD calls) normally. These viruses bypass the system by installing their code into the VMM, which runs on ring 0. The installation routine of such a virus searches for a big enough hole in the VMM's code area after the 0C0001000h address. If a large enough area, consisting of only 0FFh bytes, is detected, the virus looks for the VMM header at 0x0C000157Fh and checks this area by comparing it to VMM. If this is detected, the virus picks up the Schedule_VM_Event system function's address from the VMM and saves it for later use. Then it copies its code into the VMM by overwriting the previously located hole and changes the original Schedule_VM_Event's address to point to a new function. Finally, it executes the original host program by jumping to the original entry point. This all is possible because Microsoft is unable to protect that area from changes to keep backward compatibility with old Windows 3.x VxDs. The full VMM area is available for read and write access for application-level programs. Before the host program can be executed, the VMM will call Schedule_VM_Event, which is now replaced by the initialization routine of the virus. This code is executed on ring 0 already, which enables it to call VxD functions. Anxiety hooks the IFS by calling IFSMgr_InstallFileSystemApiHook from there. This installs the new hook API of the virus. The virus replication code needs special care. When VxD code is executed, VxD calls are patched by the VMM. The VMM turns the 0CDh, 20h, DWORD function ID (INT 20H, DWORD ID)21 to FAR CALLS. Some of the VxD functions consist of a single instruction. In this case, the VMM patches the six bytes with this single instruction, which fits there. The VMM does this dynamically with all the executed VxDs to speed up their execution. When the virus code is executed, the VxD functions in the virus body are patched by the VMM, and the virus therefore cannot copy this image immediately to files again because the virus code would not work in a different Windows 95 environment. These viruses contain a function that patches all their VxD functions back to their normal format first and only after that replicates the code into the host program. Even if this technique looks very complicated, it is not very difficult for virus writers. W95/Anxiety variants used to be in the wild in many countries. There is no doubt that several viruses will try to overcome the ring 3 to ring 0 problem using similar methods even on Windows NT based systems. W95/CIH uses instructions that are available only from Intel 386 processors and above. It is interesting to note that the interrupt descriptor table is available to write under Windows 95 (because it is part of the VMM). W95/CIH uses the SIDT (store IDT) instruction to get a pointer to the IDT (this technique is detailed in Chapter 6). In this way, the virus can modify the gate descriptor of INT 3 (debug interrupt) in the IDT and allocate memory by using VxD services. The INT 3 routine will be executed as a ring 0 interrupt from its PE virus body. This trick shows how easy it is for virus writers to overcome the ring 3, ring 0 problem. Similar methods will be discovered by Windows 95 virus writers in the near future, resulting in an even simpler method. 4.3.2.11 VxD Infection A few viruses, such as Navrhar, infect Windows Virtual Device Drivers (VxDs). Navrhar also infects Word documents that are in the OLE2 format and some standard system VxDs. The virus does not infect unknown VxDs, but only known system VxDs that are listed in its PE dropper. When an infected Word document is opened, the virus extracts its PE dropper, which is attached to the very end of the document. Therefore, the only way to access this code is to use Win32 APIs, which is why the virus imports KERNEL32.DLL APIs in its macro code. When the dropper's code is extracted from the document, the dropper is executed, checking for the listed VxDs and infecting them one by one. When the system is rebooted, one of the infected VxDs will be loaded by Windows 95. The virus takes control from the infected VxD, hooks the file system, and checks for Word document access. Navrhar illustrates that, unlike DOC files, PE applications are not so commonly exchanged by usersnot to mention VxDs, which are not normally exchanged at all. This is why modern Win32 viruses use some form of worm propagation mechanism instead (see Chapters 9 and 10). 4.3.2.12 DLL Load Insertion Technique This particular infection technique is based on manipulation of PE files in such a way that when the host application is loaded, it will load an extra DLL, which is the virus code. For example, W32/Initx loads a DLL with the name INITX.DAT via a single LoadLibrary() call inserted into the host program. This extra code is inserted into a slack space of the code section of the host, and the entry point of the host is modified to point to the inserted code. On execution of the host program and whenever the INITX.DAT file is available, the virus code is launched before the host program's code. After this, control is given to the original host entry-point code. 4.3.3. Win32 and Win64 Viruses: Designed for Microsoft Windows? Microsoft's strategy is clear. The Designed for Microsoft Windows logo program's important requirement is that every application in your product must be a Microsoft Win32 program compiled with a 32-bit compiler that generates an executable file of the PE format. Not surprisingly, the number of Win32 programs developed by third parties has grown intensively during the last few years. People exchange and download more PE programs. The main reason that Windows 95 and Win32 viruses did not cause big problems for a long time was that virus writers had to learn a lot to "support" the new systems. Young virus writers understand Microsoft's message: "Windows everywhere!" Their answer seems to be "Windows viruses everywhere!" These young guys will not waste their time with DOS viruses anymore but will continuously explore Win32 and Win64 platforms instead. There is no longer any point in attackers' writing DOS viruses. Virus scanners are much weaker in handling Windows viruses generically and heuristicallydetection and disinfection are not that easy. Vendors must learn and understand the new 64-bit file formats and spend a reasonable amount of time researching and designing new scanning technology. Because Windows 95 and Windows NT are more complicated systems, it is natural that the first period of such viruses took more time than DOS viruses. However, the number of Win32 viruses surpassed 10,000 in 2004. It took about 10 years for DOS viruses to reach 10,000 known variants, but only 9 years for Win32 threats. This indicates that, although virus writing slows down as new platforms appear (replacing older ones), eventually the growth ratio of any virus type will be exponential. In the following section, I will describe some important issues that make a Windows 95 virus incompatible with Windows NT. This specifies the differences between the Windows 95 and Win32 prefixes that scanners use to identify 32-bit Windows viruses. 4.3.3.1 Important Windows 95 and NT System Loader Differences Before I understood W32/Cabanas, I had a different picture of Windows NT from the security point of view because I had incorrect conclusions about the level of system security when the first Windows 95 virus, Boza, appeared. Most antivirus researchers immediately performed some tests with Boza on Windows NT. The result looked reassuring: Windows NT did not even try to execute the infected image as shown on Figure 4.30. Figure 4.30. An error message is displayed when executing W95/Boza on Windows NT. What is good for Window 95's loader is not good for Windows NT. Why? I answered this question myself by patching PE files. The PE file format was designed by Microsoft for use by all its Win32 operating systems (Windows NT/2000/XP/2003, Windows 95/98/Me, Win32s, and Windows CE). (Later, the PE file format was extended to PE+ to accommodate the needs of 64-bit platforms.) That is why all the system loaders in Win32 systems have to understand this executable structure. However, the implementation of the loader is different from one system to another. Windows NT's loader simply checks more things in the PE file before it executes the image than Windows 95's loader does. Thus Windows NT finds the Boza-infected file suspicious. This happens because one field in the .vlad section header (which is patched into the section table of the host program) is not precisely calculated by the virus. As a result, correctly calculated sections and section headers can be added to a PE file without any problem. Thus the Windows NT's loader does not have any superior virus detection, as some may assume. If this problem were fixed in Boza, the virus would be capable of starting the host program even on a Windows NT platform. However, the virus would still not be able to replicate. This is because of another incompatibility problem, from which all the initial Windows 95 viruses have suffered. Every Windows 95 virus must overcome a specific problem: It must be able to call two Win32 KERNEL APIs: GetModuleHandle() and GetProcAddress(). Because those APIs are in KERNEL32.DLL, Windows 95 viruses could access those functions from KERNEL32.DLL directly with a hack. Most Windows 95 viruses have hard-coded pointers to GetModuleHandle() and GetProcAddress() KERNEL APIs. By using GetProcAddress(), the virus can access all the APIs it wants to call. (Alternatively, some viruses use LoadLibrary() to get a module handle to KERNEL32.DLL, but this method is less common. This is because most applications already map the KERNEL32 API in their process address space.) When the linker creates an executable, it assumes that the file will be mapped to a specific location in memory. In the PE file header, there is a field called ImageBase holding this address. For executables, this address is usually 0x400000 by default. In the case of Windows 95, the KERNEL32.DLL's ImageBase address is 0xBFF70000. Thus, the address of GetModuleHandle() and GetProcAddress() will be at a certain fixed location in the same release of KERNEL32.DLL. However, this address can be different in a new release, which makes Windows 95 viruses incompatible even with other Windows 95 releases. This ImageBase address is 0x77F00000 in Windows NT as the default. Thus Windows 95 viruses that operate with a Windows 95specific base address cannot work on Windows NT. (Interestingly enough, first-generation exploit code often suffers from similar problems and is only able to work on a single platform.) The third reason for incompatibility is obvious: Windows NT does not support VxDs. Viruses such as Memorial cannot operate on Windows NT because such viruses are VxD-based. They should have included different infection algorithms at the driver level for Windows NT and Windows 95 to operate on both systems, which would make them complicated. If a Windows 95 virus can overcome the preceding incompatibility and implementation problems, it will eventually work on Windows NT/2000/XP/2003 as well. Such viruses might have Unicode support, but it is not mandatory. W32/Cabanas supports all of these features, being able to trespass the OS barrier imposed by early Windows 95 creations. Both Boza and Cabanas are 32-bit Win32 programs. Cabanas infects files under Windows 95/98/Me (and any other localized versions) and under all major Windows NTbased systems releases, such as 3.51, 4.0, 5.0 (Windows 2000), and 5.1 (Windows XP). Boza replicates only under the English Windows 95 release. Therefore, the prefix part of the virus name is Win32 for Cabanas and Win95 for Boza. 4.4. Conclusion This chapter has presented a great deal about computer virus infection techniques in files and other objects. It is important to be familiar with these techniques because they have a great impact on the design of antivirus engines. Even more importantly, they affect the analysis process for both manual and automated methods, which will be demonstrated in Chapter 15. References 1. Adam Petho, ROM BIOS, 1989, ISBN: 963-553-129-X (Paperback). 2. Fridrik Skulason, "AzusaComplicating the Recovery Process," Virus Bulletin, April 1991, p. 23. 3. Jakub Kaminski, "Rainbow: To Envy or to Hate," Virus Bulletin, September 1995, pp. 2-7. 4. Mike Lambert, "Circular Extended Partitions: Round and Round with DOS," Virus Bulletin, September 1995, p. 14. 5. Fridrik Skulason, "Investigation: The Search for Den Zuk," Virus Bulletin, 1991, pp. 6-7. 6. Mikko Hypponen, "Virus Activation Routines," EICAR, 1995, pp.1-11. 7. Fridrik Skulason, "Disk Killer," Virus Bulletin, January 1990, pp.12-13. 8. Jan Hruska, "Virus Writers and Distributors," Virus Bulletin, July 1990, pp. 12-14. 9. Dr. Vesselin Bontchev, private communication, 1996. 10. Peter Morley, personal communication, 1999. 11. Peter Szor, "Coping with Cabanas," Virus Bulletin, November 1997, pp. 10-12. 12. Peter Szor, "Olivia," Virus Bulletin, June 1997, pp. 11-12. 13. Peter Szor, "Nexiv_Der: Tracing the Vixen," Virus Bulletin, April 1996, pp. 11-12. 14. Peter Szor, "Shelling Out," Virus Bulletin, February 1997, pp.6-7. 15. Matt Pietrek, Windows Internals, Addison-Wesley, 1993, ISBN: 0-201-62217-3 (Paperback). 16. Adrian Marinescu, "Russian Doll," Virus Bulletin, August 2003, pp. 7-9. 17. Peter Ferrie, "Unexpected Resutls [sic]," Virus Bulletin, June 2002, pp. 4-5. 18. Peter Szor, "Attacks on Win32," Virus Bulletin Conference, 1998. 19. Peter Szor, "High Anxiety," Virus Bulletin, January 1998, pp. 7-8. 20. Peter Szor, "Breaking the Lorez," Virus Bulletin, October 1998, pp. 11-13. 21. Andrew Schulman, Unauthorized Windows 95, IDG Books, 1994, ISBN: 1-568-84305-4. Chapter 5. Classification of In-Memory Strategies "Little by little, one travels far." J.R.R. Tolkien In this chapter, you will learn about common memory residency strategies that computer viruses use to infect other objects on a system or across systems. Depending on the in-memory residency strategy alone, some viruses can become much more virulent than others. 5.1. Direct-Action Viruses Some of the simpler computer viruses do not actively manifest themselves in computer memory. The very first file infector viruses on the IBM PC, such as Virdem and Vienna, belong to this category. Usually direct-action viruses do not spread fast and do not easily become in the wild. Direct-action viruses load with the host program into computer memory. Upon getting control, they look for new objects to infect by searching for new files. This is exactly why one of the most common kinds of computer virus is the direct-action infector. This kind of virus can be crafted with relative ease by the attacker on a variety of platforms, in binary or in script languages. Direct-action viruses typically use a FindFirst, FindNext sequence to look for a set of victim applications to attack. Typically such viruses only infect a couple of files upon execution, but some viruses infect everything at once by enumerating all directories for victims. In other cases, direct-action viruses simply copy themselves between the diskettes and the hard disk without waiting for the user to copy an infected file to the diskette. This technique, however, makes them much more likely to be noticed by a user because the extra diskette activity is a noisy operation. Depending on the location of the actual host, the virus might become luckier in network environments. On the network, the virus might enumerate network shares or simply attack files, assuming that writeable network resources are available in the A: to Z: range. In this way, direct-action viruses can be extremely slow infectorsunless they appear in a networked environment. Thousands of virus construction kitgenerated computer viruses use the direct-action method on DOS. An example of this kind of virus is VCL.428, created by the Virus Construction Laboratory. 5.2. Memory-Resident Viruses A much more efficient class of computer viruses remains in memory after the initialization of virus code. Such viruses typically follow these steps: 1. The virus gets control of the system. 2. It allocates a block of memory for its own code. 3. It relocates its code to the allocated block of memory. 4. It activates itself in the allocated memory block. 5. It hooks the execution of the code flow to itself. 6. It infects new files and/or system areas. This is the most typical pattern, but several other methods exist that do not require all of the preceding steps. On single-tasking operating systems such as DOS, only a single-user application can run at any one time; any other program code needs to make itself TSR (Terminate-and-Stay-Resident). DOS offers a variety of services in the form of interrupts to develop TSR code. A typical example of a TSR program on DOS is a clock application to display the time on the screen during the execution of any single program. Because all applications share a single "thread" of execution, any program can easily interfere with any other in more than one way. Indeed, even the code of DOS, some system data structures, device drivers, or interfaces can accidentally be changed by a buggy user application, which can lead to catastrophic system crashes and corruptions. Here is an anecdote about that. The first version of the Borland Quattro spreadsheet program for DOS was developed in 100% Assembly in Hungary. An interesting situation occurred during the development of the project. Sometimes during the execution of a loop, the control flow took the opposite direction than expected. The code did not explain why that would happen. It turned out that a loaded clock program on the system occasionally flipped the control flow because it modified the direction flag but in some cases forgot to set it back afterward. As a result, the not-intentionally-malicious clock program could easily do harm to the contents of spreadsheets and other programs. Of course, the bogus clock program was a TSR (Terminate-and-Stay-Resident). The point is that DOS applications are not separated or walled up from each other in any way. Malicious code can take advantage of this kind of system very easily. On standard DOS, the processor is used in a single mode, and therefore any program has the privilege to modify any other program's code in the physical memory, which is addressable up to 1MB long (with some computers capable of accessing an extra high memory area of 64KB above that). 5.2.1. Interrupt Handling and Hooking DOS programs use DOS and BIOS interrupts for system services. In the past, on microcomputers, programmers typically transferred control to a BIOS-based entry point, so programmers needed to keep such entry points in mind. The interrupt vector table (IVT) simplifies the programmer's task on the IBM PC for several reasons. Using the IVT, programs can refer to functions by their interrupt number and service number. As a result, hard-coded addresses to services need not be compiled into the program code. Instead, the INT x instruction can be used to transfer control to a service via the IVT. Figure 5.1 illustrates how a typical boot virus such as Brain installs itself into the execution flow by hooking the BIOS disk handler. Figure 5.1. A typical boot virus hooks INT 13h. Boot viruses typically hook the INT 13h BIOS disk interrupt handler and start to monitor its functions, wait for diskette access for read and write, and during such operations write their code (or part of it) into the boot sector of the diskettes. On DOS the IVT is placed at the beginning of physical memory at 0:0. The table holds the segment and offset values of each interrupt, so each entry in the table occupies four bytes. Thus INT 21h's vector can be found at 0:84h in memory. Table 5.1 shows common interrupts and their typical use by computer viruses. Table 5.1. Typical Interrupts Used by Computer Viruses INT IDFunction CategoryOffset in IVTIntercepted/Used by Virus Code INT 00Divide Error CPU Generated0:[0]Anti-Debugging, Anti-Emulation INT 01Single Step CPU Generated0:[4]Anti-Debugging, Tunneling, EPO INT 03Breakpoint CPU Generated0:[0Ch]Anti-Debugging, Tracing INT 04Overflow CPU Generated0:[10h]Anti-Debugging, Anti-Emulation (caused by an INTO instruction) INT 05Print Screen BIOS0:[14h]Activation routine, Anti-Debugging INT 06Invalid Opcode CPU Generated0:[18h]Anti-Debugging, Anti-Emulation INT 08System Timer CPU Generated0:[20h]Activation routine, Anti-Debugging INT 09Keyboard BIOS0:[24h]Anti-Debugging, Password stealing, Ctrl+Alt+Del handling INT 0DhIRQ 5 HD Disk (XT) Hardware0:[34h]Hardware level Stealth on XT INT 10hVideo BIOS0:[40h]Activation routine INT 12hGet Memory Size BIOS0:[48h]RAM size check INT 13hDisk BIOS0:[4Ch]Infection, Activation routine, Stealth INT 19hBootstrap Loader BIOS0:[64h]Fake rebooting INT 1AhTime BIOS0:[68h]Activation routine INT 1ChSystem Timer Tick BIOS0:[70h]Activation routine INT 20hTerminate Program DOS Kernel0:[80h]Infect on Exit, Terminate Parent INT 21hDOS Service DOS Kernel0:[84h]Infection, Stealth, Activation routine INT 23hControl-Break Handler DOS Kernel0:[8Ch]Anti-Debug, Non-Interrupted Infection INT 24hCritical Error Handler DOS Kernel0:[90h]Avoid DOS errors during Infections (usually hooked temporarily) INT 25hDOS Absolute Disk Read (DOS Kernel)0:[94h]Disk Infection, Stealth (Gets to INT 13 however) INT 26hDOS Absolute Disk Write (DOS Kernel)0:[98h]Disk Infection, Stealth (Gets to INT 13 however) INT 27hTerminate-and-Stay Resident (DOS Kernel)0:[9Ch]Remain in memory INT 28hDOS IDLE Interrupt DOS Kernel0:[A0h]To perform TSR action while DOS program waits for user input INT 2AhNetwork Redirector DOS Kernel0:[A8h]To infect files without hooking INT 21 INT 2FhMultiplex Interrupt Multiple use0:[BCh]Infect HMA memory, Access Disk Structures INT 40hDiskette Handler BIOS0:[100h]Anti-Behavior Blocker INT 76hIRQ 14 HD Operation Hardware0:[1D8h]Hardware Level Stealth on AT and above The x86 family of processors has the capability to store 256 different interrupts in the IVT. Information about the preceding interrupts (and many others) is available in the Ralf Brown Interrupt List, which offers 3,000 pages of further details. Initially, the available information about interrupts was minimal, but The Interrupt List became an essential guide for DOS virus researchers over the years and has increased understanding of undocumented interrupts. The resident virus technique clearly has a major advantage over direct-action viruses. Resident viruses can easily infect new objects "on the fly" whenever you access them on your system. Furthermore, such viruses also can hide themselves easily using stealth techniques. In eastern European countries, such as Bulgaria, such information was very hard to get in the pre-Internet days. In fact, programmers in such countries typically reverse-engineered DOS to figure out such details. Not surprisingly, many advanced Bulgarian viruses used such functions as DOS 2+ internal service calls such as "Get List of Lists" (INT 21hAH=52h), for tunneling, leaving wet-behind-the-ears virus researchers to wonder what the virus actually did with them. 5.2.2. Hook Routines on INT 13h (Boot Viruses) An interrupt is typically used with a set of registers that define subfunction identifiers and pointers to data structures. For instance, the INT 13h takes its subfunctions in AH. To read the disk, someone must set the following registers: • AH = 2 • AL = Number of sectors to read • CH = Cylinder • CL = Sector • DH = Head number • DL = Drive number • ES:BX = Pointer to allocated data buffer The memory must be allocated first, and the disk needs to be reset beforehand. In fact, the diskettes are usually slow and do not spin quickly enough, requiring a couple of extra reads, with disk resets in between. It is nice to know that the hard disk numbers start at 80h (with bit 7 set). When the interrupt is executed, the return address is pushed to the stack. When the called interrupt handler (or chained handlers) returns with an IRET instruction, it will use the return address from the stack. The interrupt handler also can return with an RETF instruction. Boot viruses are naturally curious about the AH values passed into INT 13h. By presenting a new interrupt handler in the IVT, the virus code can easily monitor this value with a set of compare (CMP) instructions and take action according to the value. Typically, boot viruses first save the original INT 13h handler upon execution: MOV AX,[004C] ; Offset of INT 13h MOV [7C09],AX ; Save it for later use MOV AX,[004E] ; Segment of INT 13h MOV [7C0B],AX ; Save it for later use And boot viruses generally allocate memory just below the top 640KB boundary by manipulating the BIOS DATA area at segment 40h and by changing the 40h:13h (0:[413h]) word value that holds the top available memory. When this value is changed, no memory allocation will be possible for any programs above a newly set limit, usually one or a couple of KBs less than the previous value. Next, the virus copies its code to the "allocated" block and hooks the INT 13h handler. It is interesting to note that viruses such as Stoned hook INT 13h before they relocate their code to the memory, which is set as the new handler. Obviously, the virus expects that no other disk reads can take place during boot time so that the code will not crash. Hooking a handler is therefore as simple as setting new values in the IVT. MOV [004C],AX ; Set new INT 13h Offset in IVT MOV [004E],ES ; Set new INT 13h Segment in IVT The new handler of Stoned is shown in Listing 5.1. Listing 5.1. A New Handler Installed by Stoned PUSH DS ; Save DS to stack PUSH AX ; Save AX to stack CMP AH,02 ; Disk Read? JB Exit ; Jump to Exit if Below CMP AH,04 ; Disk Verify? JNB Exit ; Jump to Exit if Not Read/Write OR DL,DL ; Diskette A: ? JNZ Exit ; Jump to Exit if Not XOR AX,AX ; Set AX=0 MOV DS,AX ; Set DS=0 MOV AL,[043F] ; Read Diskette Motor Status TEST AL,01 ; Is motor on in Drive A:? JNZ Exit ; Jump to Exit if Not CALL Infect ; Attempt infection Exit: POP AX ; Restore AX from top of stack POP DS ; Restore DS from top of stack CS:JMP FAR [0009] ; Jump to Previously Saved Handler Obviously, it would be unethical to illustrate the virus with more code, but the previous code should give you a good idea of hooking in general. It also shows computer virus research from the perspective of code analysis. In the past, we typically commented code on printed paper, line by line. Eventually, the prints became far too long, and analysis of code appeared to be a 100-meter tournament, so to speak. Fortunately, great tools such as IDA (the Interactive Disassembler) came to the rescue (which will be discussed in Chapter 15, "Malicious Code Analysis Techniques"). 5.2.3. Hook Routines on INT 21h (File Viruses) File viruses typically hook INT 21h on DOS and it is commonly done using the INT 21h sub-functions 35h and 25h, Get and Set Interrupt vectors, respectively. Not all viruses, however, need to change INT 21h's vector in the IVT itself. An example of a virus that does not change the INT 21h vector is Frodo, written in Israel in 1989. Frodo does not hook the INT 21h vector using normal methods. Instead, the virus modifies the real entry point of INT 21h by placing a jump instruction to the entry point of the handler to its own handler. Apparently, Frodo is among the first few full-stealth file viruses on MS-DOS. (The Dark Avenger virus, Number_Of_The_Beast1, used full file stealth techniques a few months earlier than Frodo, but Frodo made the technique famous.) By intercepting INT 21h subfunctions, Frodo can hide file changes from DOS programs, even when they read from the file. The virus is sophisticated enough to show the original file content instead. Let's look at the INT 21h vector on a Frodo-infected DOS system, using DEBUG. C:\>DEBUG (We enter to DEBUG.) We dump the INT 21h vector, which holds the value 19:40EB (segment:offset in memory). Even to a trained eye, this value is not suspicious at all and looks normal. This is because memory is typically filled from the lower segments toward the higher ones, and so "segment 19" might be in DOS itself, or even before it, pointing to a low memory segment. -d 0:84 l4 0000:0080 EB 40 19 00 .@.. Next, we take a look at the handler with the unassembly command from the address we found in the IVT (see Listing 5.2). Listing 5.2. The Jump (JMP) Instruction to Frodo's Hook Routine -u19:40eb 0019:40EB EAD502209E JMP 9E20:02D5 ; Jump to VIRSEG:02d5 0019:40F0 D280FC33 ROL BYTE PTR [BX+SI+33FC],CL 0019:40F4 7218 JB 410E 0019:40F6 74A2 JZ 409A 0019:40F8 80FC64 CMP AH,64 0019:40FB 7711 JA 410E 0019:40FD 74B5 JZ 40B4 0019:40FF 80FC51 CMP AH,51 0019:4102 74A4 JZ 40A8 The preceding code seems strange. Although we can see usual CMP (compare) instructions, the jump instruction at the entry point takes the control flow to 9E20:02D5. This code is patched there by the virus itself to take control in a sophisticated manner. Finally, we can take a look at a fraction of the entry-point code of Frodo. Another unassembly command reveals the virus code in memory, as shown in Listing 5.3. Listing 5.3. The Hook Routine Entry of Frodo -u9e20:02d5 9E20:02D5 55 PUSH BP ; Save BP 9E20:02D6 8BEC MOV BP,SP ; Set BP to SP : : 9E20:02F3 53 PUSH BX ; Save BX 9E20:02F4 BB9002 MOV BX,0290 ; Set BX to Function Table 9E20:02F7 2E CS: 9E20:02F8 3A27 CMP AH,[BX] ; Is this one hooked? 9E20:02FA 7509 JNZ 0305 ; Check all entries 9E20:02FC 2E CS: ; Found a match 9E20:02FD 8B5F01 MOV BX,[BX+01] ; offset of hook 9E20:0300 875EEC XCHG BX,[BP-14] ; Set the address to ; "return" to 9E20:0303 FC CLD 9E20:0304 C3 RET ; Run the hook routine 9E20:0305 83C303 ADD BX,+03 ; Get Next Entry 9E20:0308 81FBCC02 CMP BX,02CC ; Are we at the end? 9E20:030C 72E9 JB 02F7 ; If not, compare Frodo is very tricky, however. Instead of using a simple switch statement using compare (CMP) instructions, the virus uses a table at offset 290 of the virus segment. The virus transfers control to the subfunctions of INT 21h according to the table. The bold characters in the next table are DOS subfunctions, followed by the offset where the subhandlers are located. Let's dump the memory from virus segment:290. -d9e20:290 9E20:0290 30 7C 07 23 4E 04 37 8B-0E 4B 8B 05 3C D5 04 3D 0|.#N.7..K..<..= 9E20:02A0 11 05 3E 55 05 0F 9B 03-14 CD 03 21 C1 03 27 BF ..>U.......!..'. 9E20:02B0 03 11 59 03 12 59 03 4E-9F 04 4F 9F 04 3F A5 0A ..Y..Y.N..O..?.. 9E20:02C0 40 8A 0B 42 90 0A 57 41-0A 48 34 0E 3D 00 4B 75 @..B..WA.H4.=.Ku Note The preceding functions are listed with their descriptions in the "Full-Stealth Viruses" section. File viruses typically infect on interception of INT 21h, AH=4Bh (EXEC). This event is among the easiest to work with from the virus's point of view: A filename is presented on a silver platter because it is passed for the function as a parameter. The most successful viruses use this trick to replicate, but many of them also infect during file open and close events. This can help the spread of the virus dramatically. For instance, a virus scanner will open all objects for scanningthis is intercepted by the virus, and the virus can infect the scanned files immediately, saying thanks for the help to the antivirus scanner. (Modern antivirus solutions, such as F-PROT, check the file size by two different means to reduce the likelihood of this kind of attack. F-PROT uses the standard "get file size" function and also seeks to the end of the file to obtain the position of the seek pointer. Then F-PROT compares the results of these two methods. If they do not match, F-PROT assumes that a stealth virus is in control. However, a full stealth strategy can be effective against even this tricky solutionunless the virus is detected in memory and its hook routine gets deactivated2. Table 5.2 shows some of the early, in-the-wild viruses with their common interrupt hook distributions and infection characteristics. Table 5.2. Common Interrupt Hook Distributions in Early Computer Viruses Virus NameInfection CharacteristicsHooked Interrupts BrainDBR, StealthINT 13h StonedDBR, MBRINT 13h CascadeCOM, EncryptedINT 1Ch, INT 21h FrodoCOM, EXE, StealthINT 1, INT 23h, INT 21h TequilaMultipartite: EXE, MBR, Oligomorphic, StealthINT 13h, INT 1Ch, INT 21h Yankee_DoodleCOM, EXEINT 1, INT 1Ch, INT 21h 5.2.4. Common Memory Installation Techniques Under DOS In this section you will learn about the most common techniques computer viruses use to install themselves in memory of a DOS machine. Because DOS does not have any memory protection, viruses can easily manipulate any area of memory. Fortunately on a DOS system, viruses cannot effectively hide themselves in memory. This is because physical memory is continuous, and short pieces of virus code can easily be found. This is why memory stealth viruses are unknown on DOS, although a number of techniques exist that attempt to install virus code in unusual places of memory to confuse antivirus products that look for virus patterns only on the code path of certain interrupt handlers or certain areas of memory (up to 640KB, but not beyond that limit). • The easiest way to install a virus in memory is to not take care of memory allocations at all. This rare technique is used by the virus called Stupid. This virus simply installs itself below the 640KB memory limit, but it does not reduce the top memory field kept at 0:[413]. Thus the virus hopes that this area of memory will never be allocated for any program. Indeed, the virus would crash if some program used the same memory area. Occasionally this technique is improved by copying the virus code to the end of memory but not letting DOS allocate memory blocks above the start of virus code in memory. This avoids the unwanted overwrite effect that might occur. • A common method involves finding some sort of hole in the memory that is already allocated but rarely used. Such hole exists at a couple of locations in the DOS memory. For instance, the second half of the IVT (above 0:200h) is rarely used, so a short piece of virus code can install itself into this "hole" of the IVT. Obviously, such a virus is incompatible with DOS extensions and network shells that occupy interrupt vectors above 0:200h so that whenever such a shell is installed, the virus crashes. Other viruses, such as Darth_Vader (written by V.T. in Bulgaria), install themselves into the DOS kernel itself in a small hole of memory. A couple of other holes like this exist, and viruses that use them might not be able to spread if these places are occupied by something else. • Sometimes, but not often, DOS viruses use TSR (Terminate-and-Stay-Resident) functions, such as INT 27h, to allocate memory for the virus code with normal procedures. The Jerusalem virus uses this method. • One of the most common techniques was introduced by the boot viruses, such as Brain. The virus gets the top memory field of the BIOS data area by reading the word value at 0:[413] in memory and then decrements this value by a couple of kilobytes, reducing the 640KB limit to 639KB, 638KB, and so on. In this way, the top of the memory becomes a perfect place for the virus. Such viruses are very easy to spot in memory by checking for interrupt vectors that point to a high segment in memory. Boot viruses typically use this method. Occasionally, INT 12h is used to get the value of the top memory, and then the BIOS data area is manipulated to reduce the top memory to a smaller value. • One special technique is to manipulate with the MCB (memory control block) chain of DOS. Such viruses usually extend or shrink memory blocks to attach themselves to a particular application's memory allocations in a parasitic manner. Other viruses simply allocate a new MCB and set the owner of the MCB to COMMAND.COM, the command interpreter of DOS. Cascade viruses use this technique to confuse memory map tools that can show associations of applications with allocated memory blocks. Some boot viruses, such as Filler, also hook INT 21h to intercept when COMMAND.COM is loaded and manipulate COMMAND.COM's MCB to make space for the virus. • Some early DOS viruses, such as Lehigh, allocate memory for themselves in the DOS stack area. • A tricky technique was introduced in the Starship viruses. These viruses install their main part above the 640KB and 1MB (UMB: upper memory block) limit of DOS. They take advantage of unused areas of the UMB memory, such as a part of the video memory that is not associated with the visible screen. An additional example of a virus that installs itself into UMB is Tremor, written in 1992. • Advanced viruses can allocate virus code into the High Memory Area (HMA) that is available when the HIMEM.SYS device is loaded. This memory area is above the 1MB boundary and is 64KB long. The GoldBug virus is an example that uses the HMA on 286 and above computers. GoldBug was written in the U.S. in 1994 by Q the Misanthrope. Very few viruses install themselves to the memory regions, such as XMS (Extended Memory Specification), but some viruses dofor example, one variant of the Ginger family. It was written in 1995 by roy g biv and RT Fishel. An unusual memory allocation technique is used in the Reboot Panel (INT 2Fh, AX=4A06h) to force DOS to build the Memory Control Block (MCB) around the code; viruses written by 'Q the Misanthrope' claim to use this technique3. 5.2.4.1 Self-Detection Techniques in Memory A common technique of self-recognition in memory is based on the use of "Are you there?" calls. Boot viruses typically do not use this technique because they only load once during the booting of the system. However, other viruses that infect files need to hook the system only once, so the virus hooks an interrupt or file system and returns specific output for special input registers. The newly executed copies of the virus can check if a previous copy is installed by calling this routine. The memory resident copy answers the call, "Yes, I am here. Do not bother to install again." Table 5.3 contains some examples of this from DOS systems. Table 5.3. "Are You There?" Call Examples in Early Computer Viruses Virus Name"Are you there?"Call Return Values JerusalemINT 21h AH=E0hAX=0300h FlipINT 21h AX=FE01hAX=01Feh SundayINT 21h AH=FFhAX=0400h InvaderINT 21h AX=4243hAX=5678h NomenklaturaINT 21h AX=4BAAhCarry Flag is Cleared On other systems such as Windows, viruses often use ram semaphores, such as a global mutex, that they set during the first time the virus is loaded. This way, the newly loaded copies can simply quit when they are executed. Windows 95 viruses that hook the file system in kernel mode often have similar installation checks to DOS viruses. In some cases, viruses hook I/O port access and return values on these virtual I/O ports. The W95/SK virus got its name from such an I/O port routine. The virus hooks access to I/O port 0x534B (SK) and returns 0x21 (!) when this I/O port is read. Other viruses might examine the content of the memory at a specific location, check for the existence of a filename that is created as a flag, and so on. Early antivirus products used these calls to detect viruses in memory. Specific monitor programs were also written to simulate "are you there calls" of viruses. Such solutions tricked viruses into believing that their malicious code was already installed in memory; thus the viruses never loaded again actively on the system. Such methods, however, are not general enough to be useful; only virus variant specific antivirus tools might use them. 5.2.5. Stealth Viruses Stealth viruses always intercept a single function or set of functions in such a way that the caller of the function will receive manipulated data from the hook routine of the virus. Therefore, computer virus researchers only call a virus "stealth" if the virus is active in memory and manipulates the data being returned. Virus writers always attempt to challenge users, virus researchers, and virus scanners. Some techniques, such as antiheuristics and antiemulation, were only invented by virus writers when scanners started to get stronger; however, stealth viruses appeared very quickly. In fact, one of the first-known viruses on the PC, Brain (a boot virus), was already stealth. Brain showed the original boot sector whenever an infected sector was accessed and the virus was active in memory, hooking the disk interrupt handling. This was in the golden days when Alan Solomon (author of one of the most widely used virus-scanning engines) was challenged to figure out what exactly was going on in Brain-infected systems. The stealth technique also quickly appeared in DOS file infector viruses. This method was a sure way for a virus to go unnoticed for a relatively long period of time. In fact, in the DOS days, users would remember sizes of system files in an attempt to apply their own integrity checking. By knowing the original size of a file such as COMMAND.COM, the command interpreter was halfway to success in finding an on-going infection. According to how difficult it was to find a virus in files and what kind of method was used, virus researchers started to describe the techniques differently. The following sections depict the most common stealth techniques: semistealth, read stealth, full stealth, cluster and sector-level stealth, and hardware-level stealth. 5.2.5.1 Semistealth (Directory Stealth) We call a virus semistealth if it hides the change of file size but the changed content of the infected objects remains visible via regular file access. The first known semistealth virus, called Eddie-2, was written in Bulgarian virus factories4. The semistealth technique requires the following basic attack strategy: 1. Virus code is installed somewhere in memory. 2. The virus intercepts file functions such as FindFirstFile or FindNextFile using FCB. 3. It infects files of a constant size (usually). 4. It marks infected files with a flag. 5. When an already infected file is intercepted, the virus reduces the file size in the returned data. Because such viruses need to determine quickly if a file is already infected, the easiest approach is to set a special marker on the file date/time stamp. One of the most popular methods was first seen in the Vienna viruses (although this virus was a direct-action type and thus did not use the trick in conjunction with stealth). Vienna sets the seconds field of the infected file's time/date stamp to an "impossible" value of 30 (which means "60 seconds") or 31 (which means "62 s0econds"). This is because the MS-DOS time/date stamp is stored as a 32-bit value. The lower 5 bits (0-4) of the time/date stamp store the seconds in "compressed" form. The real seconds are divided by 2. Thus, a stored 2 translates to 4 seconds, and 29 translates to 58. However, 5 bits is enough to claim "60" and "62" seconds, which viruses could use as an infection marker. Because FindFirstFile and FindNextFile return this information in a data structure, the infection marker is readily available when the hook routine of a stealth virus calls the original function to get proper data about the file. So there is not much overhead to figure if the file is already infected, which is an advantage for the attacker. The data structure is manipulated for the file size, and the false data with reduced file size is returned. Semistealth is not a very common technique on modern operating systems, such as 32-bit Windows. Nevertheless, the first documented Win32 virus, W32/Cabanas, used semistealth (or so-called directory stealth). 5.2.5.1.1 VxDCallINT21_Dispatch Handler This technique was introduced by W95/HPS5. W95/HPS monitors 714Eh, 714Fh LFN (long file name) FindFirst/FindNext functions, which is mandatory under Windows 95 from the virus's point of view. The actual implementation of the stealth handler is unique. The virus patches the return address of FindFirst/FindNext functions on the fly on the stack to its own handler. This handler checks that the actual program size is divisible by 101 without a remainder, and if so, the virus opens the program with an extended open LFN function and then reads the virus size from the last four bytes of the infected program and subtracts this value as a 32-bit variable from the original return value of FindFirst/FindNext on the stack. Finally, it returns to the caller of the function. In this way, the virus can hide the file size differences from most applications while the size of the virus body should not be a constant value. 5.2.5.1.2 Hook on Import Address Table (IAT) This method was introduced by W32/Cabanas and is likely to be reused in new Win32 viruses. The same technique can work under most major Win32 platforms by using the same algorithm. The hook function is based on the manipulation of the IAT. Because the host program holds the addresses of all imported APIs in its .idata section, all the virus has to do is replace those addresses to point to its own API handlers. First, Cabanas searches the IAT for all the possible function names it wants to hook: GetProcAddress, GetFileAttributesA, GetFileAttributesW, MoveFileExA, MoveFileExW, _lopen, CopyFileA, CopyFileW, OpenFile, MoveFileA, MoveFileW, CreateProcessA, CreateProcessW, CreateFileA, CreateFileW, FindClose, FindFirstFileA, FindFirstFileW, FindNextFileA, FindNextFileW, SetFileAttrA, or SetFileAttrW. Whenever it finds one, it saves the original address to its own jump table and replaces the .idata section's DWORDs (which holds the original address of the API) with a pointer to its own API handlers. Consider dump, shown in Listing 5.4, to illustrate hooked GetProcAddress(), FindFirstFileA() functions. Listing 5.4. Hooking the IAT .text (CODE) 0041008E E85A370000 CALL 004137ED 004137E7 FF2568004300 JMP <strong>[00430068]</strong> 004137ED FF256C004300 JMP <strong>[0043006C]</strong> 004137F3 FF2570004300 JMP [KERNEL32!ExitProcess] 004137F9 FF2574004300 JMP [KERNEL32!GetVersion] .idata (00430000) <strong>00430068</strong> 830DFA77 ;-> 77FA0D83 Entry of new GetProcAddress <strong>0043006C</strong> A10DFA77 ;-> 77FA0DA1 Entry of new FindFirstFileA 00430070 6995F177 ;-> 77F19569 Entry of KERNEL32!ExitProcess 00430074 9C3CF177 ;-> 77F13C9C Entry of KERNEL32!GetVersion NewJMPTable: 77FA0D83 B81E3CF177 MOV EAX,KERNEL32!GetProcAddress ; Original 77FA0D88 E961F6FFFF JMP 77FA03EE ;-> New handler . . 77FA0DA1 B8DBC3F077 MOV EAX,KERNEL32!FindFirstFileA ; Original 77FA0DA6 E9F3F6FFFF JMP 77FA049E ;-> New handler GetProcAddress is used by many Win32 applications to make dynamical, instead of import address table-based ("static") calls. When the host application calls GetProcAddress, the new handler of the virus first calls the original GetProcAddress to get the address of the requested API. Afterward, it checks whether the function is a KERNEL32 API and whether it is one of the APIs that the virus needs to hook. If the virus wants to hook the API, it returns a new API address that will point into the hook table (NewJMPTable). Thus the host application will also get an address to the new handler. W32/Cabanas is a directory stealth virus: during FindFirstFileA, FindFirstFileW, FindNextFileA, and FindNextFileW, the virus checks for already-infected programs. If a program is not infected, the virus will infect it; otherwise, it hides the file size difference by returning the original size of the host program. Because the cmd.exe (Command Interpreter of Windows NT) uses the preceding APIs during the DIR command, every uninfected file will be infected (if the cmd.exe was infected previously by W32/Cabanas). 5.2.5.2 Read Stealth The read stealth technique is an attack strategy that is a bit more advanced. Read stealth shows the original content of an infected object using content simulation, usually by intercepting seek and/or read functions only. In fact, the first stealth viruses, such as Brain, use the read stealth technique. The virus simply intercepts any access to the first sector of diskettes. When the first sector is accessed and it is not infected, the virus infects it and stores the original sector elsewhere on the diskette. When an application attempts to read the infected DBR, the virus reads the originally stored DBR sector and returns that to the caller. As a result, programs accessing "the boot sector" believe that the fake sector is the true one. See Figure 5.2 for an illustration. Figure 5.2. A read stealth computer virus. Evidently, read stealth on the diskette is one of the simplest stealth methods. Virus writers have also implemented read stealth in file infectors on DOS. The virus does not need to do muchjust intercept read and seek access in files, returning the simulated content of the file instead. For example, a prepender virus can easily intercept the open request to any infected file. Whenever any application attempts to read the content of the infected file, the virus can easily seek the position where the original file header starts. The caller application will read the host application's content without any hesitation. 5.2.5.3 Read Stealth on Windows You might wonder what happened to read stealth viruses on Windows systems. Do you happen to know any Windows users who remember the size of any Windows application? Who would pay attention to that in these days, when typical applications are so huge that they hardly fit on a diskette? This appears to be the primary reason why there have been only a few attempts so far to develop stealth viruses on 32-bit Windows systems. The first read stealth virus on Windows 9x, W95/Sma6, was discovered in June of 2002, about seven years after the discovery of the first 32-bit virus on Windows. Evidently, development of stealth techniques on Windows did not mature as quickly as on DOS for several reasons. When I initially attempted to replicate the Sma virus on my test system, it was only a few minutes before I figured that I was "in the Matrix" of the virus. The "Matrix" had me, so seemingly I could not replicate it. First I believed that I had replicated the virus. I knew that because the size of my goat files changed on the hard disk of my replication machine. I then copied the infected files to a diskette to move them over to my virus research machine. Surprisingly enough, I copied clean files. I repeated the procedure twice before I started to suspect that something was just not right with W95/Sma. I used my Windows Commander tool on the infected machine to look into the file. Sure enough, there was nothing new in the file. In fact, the file was bigger, but nothing seemed to be appended to it. Then I accessed the file on the diskette one more time. Suddenly, the size of the file changed on the diskette also. I quickly inserted the diskette into my virus research system and saw that W95/Sma was in there. Gotcha! The virus attempts to set the second field of the infected PE files to 4 to hide its size in such specially marked, infected files. However, there is a minor bug in the virus: It clears the bit that it wants to detect before it compares, so it will always fail to hide the size change. Infected files will appear 4KB longer, but the file seemingly does not have anything more than zeros appended to it. Whenever an infected PE file is opened that is marked infected, the virus virtualizes the file content. In fact, it hides the changes so well that it is very difficult to see any changes at all. The virus assumes zeros for all the places where unknown data has been placed in the executable, such as the places where the decryptor of the virus would be stored in PE section slack areas. Otherwise, original content is returned for all previously modified fields of PE headers and section headers. Evidently, if the bug were not in the code, the virus would be totally hidden from the eye. Is it? Yes and no. The virus code remains hidden from regular file _open() _read() functions. Consequently, when someone copies an infected file via such functions, the copy will first be "cleaned" from the virus. W95/Sma, however, does not hook memory mapping at all. This means that a sequence of memory mapping APIs can reveal the infected file content, so the virus can easily be detected via these routines! This is definitely good news. It is unfortunate that most antivirus software is written using regular C functions for reasons of portability. Such functions, however, are all monitored properly by the virus, and as a result, some of the on-demand scanners can easily miss such infections in files. Even more interesting is the payload of W95/Sma. The virus listens on a UDP port, and whatever datagram it receives will execute it in kernel mode. This allows the attacker to do practically anything he wants to do on the systemfor example, to burn the FLASH BIOS remotely. The next section offers information about full-stealth techniques on DOS. 5.2.5.4 Full-Stealth Viruses Resident, file infector viruses usually hook several DOS functions. (Table 5.4 lists Frodo's hook table.) Frodo hooks several functions that return the size of the file or the content of the file. The virus increments the infected files' date stamp by 100 years, which can be easily accessed later as a virus marker. Table 5.4. The Function Hook Table of the Full-Stealth Frodo Virus Sub Function in AHFunction Description 30hGet DOS version 23hGet File Size for FCB (File Control Block) 37hGet/Set AVAILDEV flag 4BhExec Load or Execute Program 3ChCreate or Truncate File 3DhOpen Existing File 3EhClose Existing File 0FhOpen File using FCB 14hSequential Read from FCB 21hRead Random Record from FCB file 27hRandom Block Read from FCB file 11hFind First Matching File using FCB 12hFind Next Matching File using FCB 4EhFind First Matching File 4FhFind Next Matching File 3FhRead from File 40hWrite to File 42hSeek to File Position 57hGet File Time/Date stamp 48hAllocate Memory The DOS DIR command only shows the year field in the directory, such as 1/09/89, so you can easily miss it if the file date stamp of an executable is "in the future" at, say, 2089. Frodo can easily manipulate the data that are returned to the caller based on the detection of the extra bit of information. Whenever an application such as a virus scanner or an integrity checker tries to check the size of the file or its content, false data are returned based on the virus marker. The virus will decrement the file size of each file that has a date stamp greater than or equal to the year 2044 by 4096 (the size of the virus). Evidently, this trick only works correctly before 2008. This is because the virus adds 100 years to the date, and the DOS date runs out at 2107, and thus the date wraps around, and Frodo starts to fail. Interestingly, the virus starts to fail even more after 2044 because it can no longer distinguish between infected and clean files anymore. (Jokingly, I can say, that over time, even viruses show signs of getting old, though, of course, this is not a real world concern of the author of the virus.) Otherwise, all files are believed to be infected, and their size is incorrectly reduced by 4,096 bytes, Frodo's file infection size. 5.2.5.5 Cluster and Sector -Level File Stealth The Bulgarian Number_of_the_Beast virus uses a remarkably advanced stealth technique. The virus infects files, but it hides the changes in them by hooking INT 13h (the BIOS disk handler). It infects the fronts of the files using the classic parasitic technique and will not infect a file if the last cluster occupied by the host does not have at least 512 bytes of free space. The idea is simple. It is based on the fact that most DOS disks will be formatted with cluster sizes of 2,048 bytes. As a result, this will be the minimum size occupied by a file, even if it is only a couple of bytes long. This means that a cluster, as well as a sector slack space (usually less than 512 bytes), exists in which no content is saved by the system. Number_of_the_Beast uses this space to store the overwritten part of the host program. Even when the virus is not active in memory, the size of the file remains the same as it was before infection because the file size is displayed according to the directory entries. The virus can easily manipulate the content of files when they are accessed by simply presenting the original content at the front of the file. The virus uses many tricks and undocumented interfaces, making it rather DOS version dependent. Viruses less than 512 bytes long evidently could place themselves into individual sector slacks. Thus it is crucial to overwrite the virus-occupied disk areas with some pattern, such as zeros, when disinfecting the disk. Otherwise, antivirus programs can produce an in-memory ghost positive. A ghost positive is a special case when a computer virus code is detected in partial or complete form, but the virus code is not active. This can happen if a virus less than 512 bytes long infects a file in its sector slack, and the disinfector program restores the original file content but does not overwrite the virus body in the slack space. Because the BIOS will read the file according to its size in sectors, the virus code will be loaded in memory in an inactive form. This can trigger the attention of a memory scanner that looks for patterns of virus code in the system memory. Figure 5.3 shows how a 1,536-byte program occupies disk space with 2,048 byte cluster size. The virus "recycles" the slack space to store original program content in it. After the infection, the DIR command will show 1,536 bytes as the file size, even when the virus is not active in memory. Thus the virus remains directory stealth, but the changes in the file are visible to antivirus programs and integrity checkers at that point. Figure 5.3. A cluster-level stealth virus. 5.2.5.6 Hardware-Level Stealth Finally, we arrive at a highly sophisticated technique, the hardware-level stealth, that was used in the Russian boot virus Strange7, written in 1993. Strange hooks INT 0Dh, which corresponds to IRQ 5. On an XT system, upon completion of the INT 13 call, the disk buffer is filled with data, and INT 0Dh is generated by the hardware. The Strange virus hooks INT 0Dh and is thereby able to intercept each disk read after the buffer has been filled with the data. In the INT 0Dh routine, the virus uses port 6 to get the pointer to the disk buffer and then checks whether the content of the buffer is infected. If Strange detects itself in the buffer, the virus overwrites the buffer with the original, clean sector content and completes INT 0Dh. In this way, the application that accessed the virus-infected sector will not be able to see any differences, even if the product manages to call into the BIOS INT 13h handler in an attempt to get around common virus handlers. On AT systems, the virus cannot use INT 0Dh but uses INT 76h instead. In hooking INT 76h, it can monitor which sector is accessed using ports 1F3h1F6h. If the particular sector in which the virus code is stored is accessed, the virus rewrites the port contents with the descriptor of the original sector. This is because INT 76h is generated before the completion of INT 13h interrupt. Unbeknownst to the application that accesses the sector, its read access is forwarded on the fly to the clean, original sector. Figure 5.4 shows the three steps that hardware viruses use to manipulate INT 76h. Figure 5.4. A hardware-level stealth virus. Note The INT 76h trick does not work if Windows 3.0+ is running. 5.2.6. Disk Cache and System Buffer Infection A very interesting attack strategy involves infection of files in the operating system's buffers or file system cache organized by the cache manager. Such viruses are both anti-behavior blocking and antiheuristic. Behavior blockers typically get invoked in system events, such as opening an existing executable file to write and blocking the event to prevent virus infections. This is based on the idea that viruses write to files to replicate to them, so it seems logical that blocking write events to existing binaries would reduce the likelihood of virus infections. The Darth_Vader virus was the first to inject its code into the DOS kernel itself in such a sophisticated way; the virus does not need to allocate extra memory for itself because it uses a memory cave of the DOS kernel instead. It then modifies the DOS kernel to get control from the operating system itself without ever modifying the interrupt vector table. In this way, the virus can monitor the system buffer for executable content by checking whenever a new file gets fetched into the system buffer. The virus is able to modify the file in the system buffer itself! Cache and system buffer infections usually are implemented as a cavity infection type. The virus does not need to worry about extending the file to a larger size, which would introduce complications. This infection strategy was also implemented on Windows 9x systems in the W95/Repus family of viruses by the virus writer who calls himself Super. The Repus virus uses a unique trick to jump to kernel mode where it is able to call a VxD function to query the content of each file system's cache. If the virus finds a PE file in any of the caches, it writes its code into the header of the cached file and marks the page "dirty." If the file is copied from one location to another, the virus will inject its code into the cached buffers. Thus, the virus does not need to access any executable's content on the disk itself for write. It can simply wait until you copy a file to a new location; the "copy" of the file will be infected. (See the overly simplified drawing in Figure 5.5.) Figure 5.5. A disk cache infector virus. 5.3. Temporary Memory-Resident Viruses A slightly more exotic type of computer virus is not always resident in the computer's memory. Instead, the virus remains in memory for a short period of time or until a particular event occurs. Such an event might be triggered after a certain number of successful infections. For example, the Bulgarian virus, Anthrax, uses this method. Anthrax infects the MBR and installs itself in memory during the booting of the infected PC. The virus remains in memory until it successfully infects one EXE file. At that point, the virus removes itself from memory4 and becomes a direct-action virus that will only infect another file when an infected EXE file is executed. Such viruses tend to be much less successful at becoming in the wild. First of all, direct-action viruses are much easier to spot because they increase the disk activity considerably, though this problem could be mitigated by the attacker. Permanent resident viruses, however, are usually more infectious and spread much more rapidly than temporary memory resident viruses. Nevertheless, there are a few successful viruses, such as the Hungarian DOS virus, Monxla, that use a similar technique to infect files. Monxla monitors the INT 20h (return to DOS) interrupt. The virus remains active in memory with the host and intercepts when the host returns to DOS. The virus quickly infects all COM files in the current directory. In this way, the virus might be able to spread to new systems, successfully avoiding user attention because the increased disk activity is more typical when you execute new programs or exit from them. 5.4. Swapping Viruses Another exotic technique in computer virus writing relies on loading a small piece of virus code actively into memory all the time. This small piece of code might be a hook event. Whenever the hook event is triggered, the virus loads a segment of viral code from the disk and infects a new object. After that, the virus again clears the loaded segment from memory. Although it appears that there are certain advantages to this technique, such as the fact that the virus consumes less physical memory and can keep its code encrypted in files most of the time, there are also many disadvantagesfor instance, the possibility of introducing heavily increased disk activity that makes it much easier to spot the attack. 5.5. Viruses in Processes (in User Mode) On modern, multitasking operating systems, viruses need to use slightly different strategies. The virus does not have to become "resident" in the traditional sense. It is usually enough if the virus runs itself as a part of the process. Memory space is divided according to security rings associated with the mode of the processors. Most modern operating systems, such as Windows NTbased systems, separate regular applications, which use user mode, from those that use kernel mode, such as the OS, drivers, and relevant security data structuresfor better security and system stability. For this reason, applications normally do not interfere with the system kernel, as DOS programs do. An attacker has several options: • The virus loads with the infected process, gets control using one of the techniques listed in this chapter, creates a thread (or a set of threads) in the running process itself in user mode, and infects files using regular direct-action techniques. • Alternatively, the virus loads before the original host program; it does not create any threads but infects files before the execution of its host. Usually the host is created as a temporary file on the disk and executed in its own process by passing command-line parameters of the original program. This is a very primitive but fairly common approach. • The virus also can run as its own process in user mode. • Furthermore, the virus can use the Service Control Manager to load as a service process. • Per-process resident viruses also hook APIs in the user-mode process and are able to replicate whenever the host process executes the hooked API. • The virus uses a DLL injection technique. The easiest approach is to load a DLL via the modification of one of many Registry keys. When the host is executed, the viral DLL is loaded into the host process. User-mode rootkits usually combine this technique with per-process API hooking. • Some hybrid viruses also load to kernel mode and hook operations of the system there but execute their infection routines in user mode, with user mode APIs. Some of these techniques will be detailed further in correlation with memory-scanning techniques in Chapter 12, "Memory Scanning and Disinfection." 5.6. Viruses in Kernel Mode (Windows 9x/Me) Quite a few viruses can hook the file system on Windows 9x and Me. The first such viruses introduced a VxD, a 9x-specific kernel-mode driver using functions such as IFSMgr_InstallFileSystemApiHook() API8. Virus writers, however, realized that it was completely unnecessary to use VxDs because regular PE files on Windows 9x systems could call into kernel-mode functions via such tricks as a call gate mechanism. W95/CIH is an infamous example of this type of virus, which takes advantage of kernel-mode access to ports to damage the system hardware (by overwriting the content of FLASH BIOS). 5.7. Viruses in Kernel Mode (Windows NT/2000/XP) Infis was the first memory-resident parasitic kernel-mode driver virus under Windows NT environments. One variant of the virus operates under Windows NT only; another variant supports Windows 2000. The virus stays in memory as a kernel-mode driver and hooks the main NT service interrupt (INT 2Eh), so it can replicate on the fly when files are opened. This method is a nonstandard way to hook the file operations and is therefore not 100% successful, but unfortunately it turns out to be a good enough solution for a virus. The installation routine copies the virus to the system and registers it in the Registry. The virus is attached to the end of the infected files with its own PE header and extracts itself as a standalone driver called INF.SYS into the %SystemRoot%\system32\drivers directory. The virus installs the proper Registry key to allow itself to load on next system startup: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\inf Type = 1 Start = 2 ErrorControl = 1  Each driver needs this entry to introduce itself to the Service Control Manager. The driver will be loaded each time the system starts and, just like any similar driver installation, it needs a reboot under Windows NT and 2000. When the virus driver takes control, it allocates memory from the nonpaged pool and reads its complete copy from its file image (INF.SYS) for further use in its infection routine. Finally, the virus hooks INT 2Eh by patching the interrupt descriptor table (IDT) (see Figure 5.6). Because the virus is running in kernel mode, it enjoys the most powerful rights on the system. Figure 5.6. A kernel-mode virus that operates on Windows NT-based systems. Normally, INT 2Eh points to the KiSystemService() function. However, when the virus hooks INT 2Eh, it will take control before KiSystemService() can transfer control to the corresponding NTOS kernel function using the system service table. A Win32 application normally calls an API from the Win32 subsystem. The subsystem translates the documented API calls to undocumented ones that have been exported from NTDLL.DLL, called the native API. This DLL is mapped in user mode, but for most functions it switches to kernel mode by using the INT 2Eh service interrupt with a function ID in the EAX register (under IA32 platforms). Ultimately, each file open function will eventually hit the INT 2Eh handler, KiSystemService(), which is hooked by Infis in a manner similar to a regular DOS TSR virus. The INT 2Eh hook of Infis intercepts the file open function only, checks the file name and extension, and then opens the file. The hook function of Infis checks if the host programs' file format is PE and attempts to infect. So how can you check the system for an Infis infection? It is possible to see the name of the loaded drivers by checking the driver list. In that way, you can find the name of the virus, but such viruses could hide themselves from the driver list using stealth as well. Windows 2000/XP places the driver list under Computer Management. First, turn on the Display Administrative Tools option for the taskbar. Then click Computer Management and select Device Manager. The View has to be changed to Show Hidden Devices. The inf driver should appear on the list. Check the dialog box shown in Figure 5.7, which was captured on an Infis-infected system. Figure 5.7. The properties of the inf driver of the Infis virus. Infis has many limitations. Obviously the author of the virus did not understand the various NT contexts, so Infis lacks system privileges for most file access because it does not schedule a kernel-mode infection thread that would run in the system context, bypassing the privilege level of the user-mode thread that requested a file to be opened. As a result, unless the user-mode requestor (such as Windows Explorer) runs with appropriate rights (such as Administrator) to be able to write the file object, Infis cannot infect the file in question. Furthermore, each service ID is created by a macro when Microsoft compiles the kernel, and so the service ID can be different from one release of Windows to another. Because Infis uses hard-coded service IDs, it will not be compatible with all Windows releases, but, of course, there are reliable methods to determine the service IDs on any Windows NTbased systems. Future kernel-mode viruses will most likely support this kind of mechanism. 5.8. In-Memory Injectors over Networks High profile computer worms such as CodeRed and Slammer inject their code into a vulnerable process connected to the network stack. The virus does not need to manifest itself as an object stored on the disk; instead, it can travel to the new system as a set of network packets. These highly important techniques, as well as other Win32 memory attacks, are detailed further in Chapter 12. References 1. Jim Bates, "666The Number of The Beast," Virus Bulletin, May 1990, pp. 13-15. 2. Dr. Alan Solomon, "Mechanism of Stealth," Computer Virus and Security Conference, 1992, pp. 232-238. 3. Peter Ferrie, private communication, 2004. 4. Fridrik Skulason (with contribution of Dr.Vesselin Bontchev ), "The Bulgarian Computer Viruses, The Virus Factory," Virus Bulletin, June 1990, pp. 6-9. 5. Peter Szor, "HPS," Virus Bulletin, June 1998, pp. 11-13. 6. Peter Szor, "Stealth Survival," Virus Bulletin, July 2002, pp.10-11. 7. Eugene Kaspersky, "Strange," http://viruslist.com/eng/viruslist.html?id=2836. 8. Paul Ducklin, "Not the Virus Writer's Guide to Windows 95," Virus Bulletin Conference, 1996, pp. IX-XXIV Chapter 6. Basic Self-Protection Strategies "There is every reason to believe that, as software technology evolves over the next century or so, there will be plenty of important and interesting new problems that must be solved in this field." Dr. Steve R. White, IBM Thomas J. Watson Research Center This chapter illustrates the most common self-protection techniques that viruses use to survive as long as possibleeven on systems that are protected with a generic solution, such as a computer virusmonitoring program. Developing defense systems against computer viruses without giving consideration to these techniques is a sure path to failure. 6.1. Tunneling Viruses Memory-resident viruses often use a tunneling technique to get around behavior blocker systems1. Resident tunneling viruses attempt to be the first on a call chain of interrupts, installing themselves in front of other resident applications, to call interrupts directly at the entry point of their original handlers. In this way, control gets to the virus first, and the virus proceeds to execute the original handler to bypass antivirus monitoring programs. Obviously nonresident viruses can also use this technique to look for the original handler and call that directly, but most tunneling viruses are memory resident. In the following sections, you will learn about some of the most common tunneling methods. 6.1.1. Memory Scanning for Original Handler On a DOS PC, it is possible to scan the entire physical memory for interrupt handler addresses, allowing the virus to keep a short piece of instruction sequence in its body to search for the original entry point of interrupts, such as INT 21h or INT 13h. Indeed, even the code of the BIOS is available for read access. After the virus has obtained the address of INT 21h, it can hook the interrupt by placing a jump instruction onto the front of the INT 21h routine, as the Frodo virus does. Calling the original handler bypasses the interrupt change and might be incompatible with installed software. For example, if there is a disk-encryption system installed, such viruses might be able to bypass the encryption driver and cause a crash. This is a general problem with tunneling, but keep in mind that computer viruses, unlike antiviruses, do not need to be perfect. The Eddie virus (also known as Dark_Avenger.1800.A) was among the first that I have analyzed that use this technique to detect the entry point of the INT 13h handler for the MFM hard disk controller. Virus writers often used this technique and even created engine plug-ins to help less-experienced virus writers create new viruses of this kind. 6.1.2. Tracing with Debug Interfaces The Bulgarian Yankee_Doodle virus was among the first viruses to use INT 1 to trace for original interrupt handlers to implement a tunneling technique. The idea is to hook INT 1, turn the trace flag of the processor ON, and run a harmless interrupt call. INT 1 will be called each time an instruction is executed, and the virus can trace the code path until it arrives at the particular handler, such as INT 21h. Then the virus saves the address of the handler and is ready to use it or hook it at that location, bypassing any installed behavior blockers. Of course, the virus needs to take care of many problems during the tracing. Several instructions can affect the trace flag and prevent the tracing of code. The virus controls the execution and looks ahead in the execution path to avoid such situations. 6.1.3. Code EmulationBased Tunneling An obvious, safe alternative to the preceding method is to use a code emulator that mimics the processor well enough to trace the execution path to the desired function entry point without using the debug interfaces at all. This technique was first published in the infamous Australian magazine, VLAD, as a general-use tunneling engine. 6.1.4. Accessing the Disk Using Port I/O A common technique of copy protection schemes is to obtain access to the hard drive and diskettes by "talking directly to the metal" using port I/O. Not only is this confusing to the defender (because such port sequences are difficult to read), but it provides the ability to access the disk on a low enough layer to avoid using regular interrupts or APIs to access the disk. Furthermore, it is possible to do tricks with port commands that interrupt calls and other APIs would not allow. The disadvantage of this technique is rather obvious. Just like copy protection, computer viruses with such methods can be incompatible among systems. Thus the virus is simply less infectious than viruses that infect on a higher level. Only a few known viruses have attempted to use this technique, such as the Slovenian virus, NoKernel. 6.1.5. Using Undocumented Functions Other viruses simply use undocumented APIs to get access to original handlers. As I mentioned previously, gaining knowledge of undocumented interfaces and file formats is among the great challenges facing serious computer antivirus researchers. The early Dark Avenger viruses use tricks to call the "Get List of Lists," INT 21h, internal DOS function2. Because this was not documented by Microsoft, it was difficult to understand what the virus attempted to do with the structures. Apparently, the virus can query the chain of device drivers on the system with this function, thereby obtaining a handler that can be called directly, bypassing the monitoring programs. A large part of Microsoft operating systems is not documented, including the native API and important parts of the kernel APIs. This makes virus code analysis and virus detection much more complicated. 6.2. Armored Viruses The term armored virus was coined by the computer crime unit of New Scotland Yard to describe computer viruses that make it even more difficult to detect and analyze their functions quickly. A computer virus's primary goal is to spread as far as possible without being noticed. In previous chapters, you learned about advanced techniques, such as stealth, that can hide the virus code and prevent it from being detected too quickly. The authors of armored viruses want to be sure that the virus code is even more difficult for scanners to detect, even if the scanners use techniques such as heuristics that can pinpoint previously unknown computer viruses. Furthermore, if a virus sample is obtained by any means, its author wants to make the analysis of the virus code as difficult as possible to further delay rapid response to the virus attack. The following section describes basic methods of armored viruses: • Antidisassembly • Antidebugging • Antiheuristics • Antiemulation • Antigoat 6.2.1. Antidisassembly Computer viruses written in Assembly language are challenging to understand because they often use tricks that normal programs never or very rarely use. There are several good disassemblers on the market. I can strongly recommend Data Rescue's IDA for computer virus analysis, which allows you to redefine the location of code and data on the fly. Several disassemblers have achieved great success in the past, such as the almost automated disassembler called Sourcer. To prevent such analysis, virus writers deploy techniques to trick such solutions. The greatest attack on disassemblers are simple variations of code obfuscation techniques such as encryption, polymorphism, and especially metamorphism. Because these techniques are rather important to discuss in detail, Chapter 7, "Advanced Code Evolution Techniques and Computer Virus Generator Kits" is dedicated to them. 6.2.2. Encrypted Data One of the most obvious ways to avoid disassembling is to use encrypted data in the virus code. All constant data in the virus can be encrypted. When the virus is loaded into the disassembler, the virus code will reference encrypted snippets, which you need to decrypt one by one to understand what the virus does with themmaking virus analysis an even more tedious process. For example, the W95/Fix2001 worm sends stolen account information to the attacker via e-mail. The author of the worm does not want you to discover quickly the e-mail address to which it sends the information, so examining the code of the worm in a file view or disassembler will not reveal the e-mail addresses to which the worm sends the stolen information. Can you guess what the encrypted block shown in Figure 6.1 hides? Figure 6.1. An encrypted block of data in the W95/Fix2001 worm. When you analyze the worm in a disassembler such as IDA (the interactive disassembler), you will find a decryption routine that uses a simple XOR decryption. The decryption loop will decrypt 87h bytes, with the constant value 19h coming backwards, as shown in Figure 6.2. Figure 6.2. A data decryptor of the W95/Fix2001 worm. Note The "xor byte ptr encrypted[ecx], 19h" instruction could be also shown as "xor byte ptr [ecx+004048DB],19". To continue your analysis, you need to decrypt the data. For example, you might prefer to use a debugger, which will obviously take some time and make your analysis more difficult. The decrypted data will look like that shown in Figure 6.3. Figure 6.3. Decrypted data in the W95/Fix2001 worm. On many occasions, incorrect information is published by teams of incompetent virus analysts. This happens even more often when a virus is encrypted. Such misleading information often reaches the media, causing damage by misinforming users about the threat. The only reliable way to analyze virus code is with comprehensive care. Anything else is unprofessional and must be avoided. 6.2.3. Code Confusion to Avoid Analysis Another possibility for the attacker to challenge disassembling is to use some sort of self-modifying code. When the code is examined in the disassembler, it might not be easily read. Consider a simple file-writing function under DOS: MOV CX, 100h ; this many bytes MOV AH, 40h ; to write INT 21h ; use main DOS handler Such code is very easy to read in a disassemblerbut the attacker knows this. In fact, heuristic analyzers that use disassembling also will find similar code sequences. When the code is written in an obfuscated way, as shown in Listing 6.1, you need to make your own calculations to figure which function will be called. Thus you might prefer to use a debugger; however, the attacker also might present antidebugging techniques to prevent you from using the debugger. Listing 6.1. Slightly Obfuscated Code MOV CX,003Fh ; CX=003Fh INC CX ; CX=CX+1 (CX=0040h) XCHG CH,CL ; swap CH and CL (CX=4000h) XCHG AX,CX ; swap AX and CX (AX=4000h) MOV CX,0100h ; CX=100h INT 21h As a result, when the code reaches INT 21h, the registers will be set accordingly. The attacker also might prefer to introduce many jump instructions into the code flow. This can make the use of a disassembler hard, especially when several kilobytes of code are used to jump back and forth to confuse you. The preceding code can easily be dealt with using a code emulator in the scanner. The attacker, however, might introduce antiemulation techniques to prevent you from using one. 6.2.4. Opcode Mixing Based Code Confusion Consider the example shown in Figure 6.4, selected from the W98/Yobe virus. Notice that the CALL instruction has an offset of 4013E6+1. The +1 is a side effect of a B8h (MOV) opcode, which was inserted into the code flow to confuse the disassembling. Figure 6.4. Opcoding mixing code confusion in W98/Yobe. Fortunately, a modern disassembler such as IDA allows the quick redefinition of an instruction as data. When the B8h is redefined as a single byte of data, the code flow is cleared. Notice the proper disassembling of the code at loc_4013e7 in the disassembly shown in Figure 6.5. Figure 6.5. Correctly disassembled code by defining a B8h data byte. In fact, a CALL to a POP instruction is a common sequence in computer viruses to adjust for their location in the file. Note that CALL pushes the offset where the code would need to return. This is immediately picked from the top of the stack to get the location of the virus body. Consequently the preceding trick might confuse a heuristic analyzer that uses disassembling because it will easily miss the CALL to a POP pair. 6.2.5. Using Checksum Many viruses use some sort of checksum, such as the CRC32 algorithm or something simpler, to avoid using any string matching in code. The code is straightforward to read, but you can get confused about its meaning. In some cases, this kind of checksum becomes an extremely tricky puzzle. The W95/Drill virus3, written by Metal Driller, contains such a trick. The virus picks API names according to the name's checksum, which is stored in the virus. When I analyzed the virus, I could not resolve a few API checksums to anything meaningful as a string. In fact, a few APIs were only used by an antiemulation trick in the virus, so they were not essential to the proper functioning of the virus code. In my article about W95/Drill for Virus Bulletin, I could never explain exactly to what some API checksums corresponded. A couple of months later when we received the source of the virus code in a zine, I found the following minor typo: ;; DWORD GetCurrentProcessID(void) dd 8D91AE5Fh, 0 The proper way to write the name of the API is GetCurrentProcessId(). Because the last character in the API name is incorrect, the precalculated checksum will never match any API string. Another mystery solved! 6.2.6. Compressed, Obfuscated Code Many successful computer worms, such as W32/Blaster (further discussed in Chapter 10, "Exploits, Vulnerabilities, and Buffer Overflow Attacks"), are written in a high-level language. The attacker often uses compression because it has several advantages, including making the relatively large virus code more compact. The analysis of the code is more difficult because the executable needs to be unpacked first. Furthermore, scanners and heuristics analyzers both are challenged unless appropriate support is available. There are well over 500 packer versions that attackers can use. Fortunately, many of these packers are buggy and do not run on all Win32 platforms. Several solutions do exist, such as ASPack and ASProtect, that support polymorphism for individual compressions as well as forms of antidebugging. In fact, it is known that ASPack uses the polymorphic engine of the W95/Marburg virus4. Consider the example from W32/Blaster, shown in Figure 6.6. The header of the worm's file will not reveal the usual section names, but it allows you to guess the name of the packer, which is UPX. Figure 6.6. The PE header area of the W32/Blaster worm. Later in the worm's body, you can only find readable snippets of strings, as shown in Figure 6.7. Figure 6.7. Some string snippets in the packed content of the W32/Blaster worm. To analyze the code effectively in a disassembler, it must be unpacked first. You might be able to use a Win32 debugger, such as Turbo Debugger, OllyDBG, or SoftICE, to unpack the content of the program in memory. However, you might first need to deal with antidebugging tricks. Antidebugging is explained in the next section. 6.2.7. Antidebugging Attackers can use a number of tricks as antidebugging features. The attacker's goal is to prevent you from using a debugger easily. Because hardware supports debugging, the antidebug features can be rather platform-specific. In this section, you will learn about a few common techniques on more than one platform. Several of these methods are incompatible with certain systems. For example, DESQView/DOS was a multitasking DOS operating system that simply did not like such tricks. I was disappointed to learn that I could not protect my antivirus, Pasteur, by several such techniques in this environment. Indeed, antidebugging techniques are useful to protect the code of the antivirus program and DRM (Digital Rights Management) software from novice attackers. 6.2.7.1 Hooking INT 1 and INT 3 on x86 A common technique is to hook INT 1 and INT 3 interrupts. This is harmless during the normal execution of the virus code, but the debuggers will lose their context as a result. Typically the hook routine is set to an IRET, as it is supposed to be without a debugger installed. Some viruses, such as V2Px, use hooks to decrypt their body with INT 1 and INT 3. 6.2.7.2 Calculating in the Interrupt Vectors of INT 1 and INT 3 Instead of hooking INT 1 and INT 3, the armoring code can simply make essential calculations in the interrupt vector table (IVT). Attackers learned this trick from common copy protection wrappers5, similar to that of the powerful EltGuard. The idea is to use the vectors of INT 1 and INT 3 continuously during the execution of the code, such as calculating a decryption key to decrypt the next encrypted layer in the code. By itself, this kind of code is easy to bypass using a more powerful debugger, such as Turbo Debugger in Virtual 86 mode. In that mode, the debugger uses a virtual machine, so the vectors of INT 1 and INT 3 can be modified without harming code execution. 6.2.7.3 Calculating Checksum of the Code to Detect Break Points Because a debugger uses a 0xCC opcode byte (INT 3) inserted into the code flow as a break point, the code changes when the break point is placed. The debugger mimics the proper code by displaying correct code in a dump or disassembly wherever a break point was put. When the running code checks that location, however, the debugger no longer provides the original code, and thus the change can be detected easily. Several viruses checksum their running code to see changes in their code and halt the execution of the virus code if the debugger is detected in this way. Emulation-based analysis does not require code manipulation based on break points. You can defeat such a trick easily in a code emulator based analyzer. 6.2.7.4 Checking the State of the Stack During Execution of Code A very simple way to detect a debugger is to check the state of the stack during execution. The debugger saves a trace record to the stack during single-stepping, which includes CS:IP and the flags. So by simply comparing the states of the stack for a given value, single-stepping can be detected, as shown in Listing 6.2. Listing 6.2. Detecting Single-Stepping Using Stack State MOV BP,SP ; Let's pick the Stack Pointer PUSH AX ; Let's store any AX mark on the stack POP AX ; Pick the value from the stack CMP WORD PTR [BP-2], AX ; Compare against the stack JNE DEBUG ; Debugger detected! The memory-resident monitor portion of some antivirus products also contains such routines to avoid tunneling techniques that use tracing. Needless to say, emulation-based tunneling will not be detectable with this method. 6.2.7.5 Using INT 1 or INT 3 to Execute Another Interrupt This is similar to the previously mentioned hooking method, but instead of simply hooking INT 1 and INT 3 with do-nothing routines, the attacker can easily call another interrupt, such as the original INT 21h. Whenever the virus uses an INT 21h, it can now use INT 1 instead. Consider the following example: MOV AH, 40 ; write function INT 3 ; call original INT 21h via previous hook The INT 3 vector will be hooked to INT 21h, and the preceding function will perform a write-into file, which you cannot trace with a debugger in this form. 6.2.7.6 Using INT 3 to Enter Kernel Mode on Windows 9x Some computer viruses use a trick for the transition from user mode to kernel mode on Windows 9x by manipulating the entries in the IDT for a particular interrupt. Unfortunately, this is rather easy to achieve because the IDT is writeable for user-mode code on Windows 9x. This is, of course, a huge problem: Any interrupt will be associated with kernel-level access, so the moment that the virus code executes such a "hook," the virus code switches to kernel mode. On Intel 386 and higher processors, the IDT holds the offsets and selectors for each interrupt with special flags. The trick is that the offset of the handler is stored as two words split on the "sides" of a QWORD (8 bytes) IDT entry. There are 256 interrupts. The first 32 entries are reserved in the table for processor exceptions; the rest are software interrupts. The address of the IDT is available via the SIDT instruction, which can read the content of the processor's IDTR register that holds the pointer to the IDT. An antidebug technique can be developed by hooking INT 3 in the IDT and executing code via the handler. Unfortunately, however, this confuses debuggers such as SoftICEin case you wanted to use a break point to trace and analyze the virus code. This is exactly how the W95/CIH virus jumps to kernel mode and is antidebugging at the same time. Consider Listing 6.3 as an example. Listing 6.3. "My Precious!" Ring 0 PUSH EAX SIDT FWORD PTR [ESP-2] ; Get IDT Address POP EBX ; and move it to EBX ADD EBX, 1CH ; Points into INT 3's slot CLI ; (3*8+4 = 1Ch) MOV EBP, [EBX] ; Get high half of current MOV BP, [EBX-4] ; Rest of INT 3 handler LEA ESI, NEW_HANDLER ; Offset of new handler in ESI MOV [EBX-4], SI ; Set low half in IDT SHR ESI, 10H MOV [EBX+2], SI ; Set high half in IDT INT 3 ; Run the new handler in Ring0 Evidently, CIH cannot easily be traced in SoftICE; however, debug register break points can be used to step into the virus code without using an INT 3based break-point condition. Obviously, the preceding code violates security and generates an exception on systems such as Windows NT/2000/XP/2003 (which is handled by CIH using an exception handler). However, the virus can use the same trick if appropriate rights are granted. 6.2.7.7 Using INT 0 to Generate a Divide-by-Zero Exception I mentioned earlier that the Virtual 86 mode of Turbo Debugger is an excellent way to deal with malicious armored code, but there are several counterattacks possible against Turbo Debugger. For instance, the attacker can hook INT 0 (the division-by-zero interrupt) to generate a division by zero and run the next instruction in the code flow as the "handler." This will confuse Turbo Debugger. Examples of such a virus are Velvet and W95/SST.951. 6.2.7.8 Using INT 3 to Generate an Exception A variation of the preceding method is to generate an exception on 32-bit Windows systems with a handler to catch it. In this way, the virus code can recover easily, but an application-level debugger like Turbo Debugger will lose the context because the exception handling gets into the picture, which runs kernel-mode code. Some viruses on Win32 systems use INT 3 to generate the exceptions. In such cases, the exception handler functions as a general API caller routine with function IDs and parameters passed in on the stack. W32/Infynca uses this method. 6.2.7.9 Using Win32 with IsDebuggerPresent() API Probably the simplest method for the attacker is to use the IsDebuggerPresent() API, which returns TRUE when a user-mode debugger is running on the system. 6.2.7.10 Detecting a Debugger via Registry Keys Look-up There are many ways to detect modern debuggers such as SoftICE. An example is using Registry keys. Such keys are very easy to find. 6.2.7.11 Detecting a Debugger via Driver-List or Memory Scanning There are, of course, other means for the attacker to find out that you are running a debugger. For instance, the device list can be queried for driver names and the list checked to see whether it contains the name of a debugger driver. An even simpler technique is to use memory scanning and simply detect the debugger's code in memory. 6.2.7.12 Decryption Using the SP, ESP (Stack Pointer) The Cascade virus was the first known virus to use an SP (stack pointer) in its decryption routine. Because the stack is used by INT 1 during tracing, the decryption fails. Other viruses simply decrypt or build their code on the stack, such as the W95/SK virus. Not surprisingly, it is difficult to use debuggers against such code-armoring attacks. 6.2.7.13 Backward Decryption of the Virus Body A standard decryption routine might be used by the virus, but the direction of the decryption loop can be turned around by the attacker. Typically you can trace the virus code until the first byte of the encrypted virus body is decrypted and put a break point to the location of the decrypted byte. This will not work, however, if the decryption routine comes backward because the break-point opcode (0xCC INT 3) will be overwritten by the decryption loop. Several viruses use backward decryption. For example, the W95/Marburg virus can generate both forward and backward decryption loops. 6.2.7.14 Prefetch-Queue Attacks (and When They Backfire) Whale was dubbed "the mother of all viruses." In fact, it used so much armoring that the virus could hardly replicate because of its high complexity. The original variant of Whale was only able to infect early XT (8088) systems because its self-modification routine backfired. This is an interesting hardware dependency in the code, which was fixed in later variants of the virus. Many researchers, however, could not replicate the virus at the time because it was incompatible with 8086, AT, 386, and 486 systems. Thus they believed the virus was headed for complete extinction. Surprisingly, Whale "reincarnated" on Pentium CPU series, at least so in theory, because these processors flush the prefetch queue. Consider the following code of Whale to understand this discovery more thoroughly. The Whale virus hooks INT 3 to force a handler execution. Apparently, the handler executes rubbish, most likely as the result of a bug in calculating the handler's address to a nearby RET instruction. The code is obfuscated, as you can see in Listing 6.4. Listing 6.4. The Obfuscated Trick of Whale pop ax ; POP 0xE9CF into AX register xor ax,020C ; decrypt 0xEBC3 in AX (0xc3 RET) cs: mov [trap],al ; try to overwrite INT 3 with RET add ax,020C ; fill the prefetch queue trap: INT 3 ; Will change to RET ; Only if the prefetch queue is ; already full (on 8088 only) or ; flushed (Pentium+) INT3: ; Points to Rubbish Invd ; Random Rubbish (2 bytes) ret The virus writer expected that the INT 3 would be successfully replaced with a RET instruction to take the control flow to the proper place. His computer was an XT (8088), which has a 4-byte processor prefetch queue size (later replaced with 6 bytes on 8086). This is why the preceding code worked on his computer. Other viruses use prefetch queue attacks to mislead debuggers and emulators. In single-stepping (or emulation not supporting the prefetch queue), such self-modification always takes place. Therefore the attacker can detect tracing easily by checking that the modified code is running instead of the instructions in the prefetch queue. 6.2.7.15 Disabling the Keyboard When you are tracing code, you are using the keyboard. The attacker knows that and disables the keyboard to prevent you from continuing to trace, perhaps by reconfiguring the content of the I/O PORT 21h or PORT 61h. Such ports might be different according to how modern operating systems map them. For instance, the Whale virus on DOS uses the following sequence to disable the keyboard: IN AL,21 OR AL,02 OUT 21,AL Other methods hook INT 9 (keyboard handler) early on. This prevents some debuggers from popping up for some time after the code is executed. A common technique to break into armored code is to run the virus and break in on the fly using the debugger's hot key. If the hot key is disabled, the debugger has no chance to break into the code. Another interesting attack is performed by the Cryptor virus, which stores its encryption keys in the keyboard buffer, which is destroyed by a debugger6. 6.2.7.16 Using Exception Handlers Many Win32 viruses use exception handlers to challenge you while debugging them. The W32/Cabanas7 was the first to use this trick. Pay close attention to the FS:[0] access in malicious code, which is used as a reference to get and set exception handler records. Also pay attention to anything loading from FS:[18] because that can be used to access the FS: data without using the FS: override. Both of these data are in the TIB (Thread Information Block). 6.2.7.17 Clearing the Content of Debug (DRn) Registers Some viruses, such as CIH, use the debug registers for the purpose of "Are you there?" calls. Running SoftICE on a CIH-infected system can easily confuse the virus, which might install itself a second time in such a situation. But it is also true that debug registers can be cleared to confuse even advanced debugging tricks, such as those that use memory break points. 6.2.7.18 Checking the Content of Video Memory Although I have not seen this dangerous trick in too many viruses, a few viruses do use it. The idea is based on the hooking of the timer interrupt (INT 8 or INT 1Ch, which is triggered by it). The routine continuously checks the contents of the video memory for a search string. When the virus is running and you attempt to locate the virus on the system by dumping the memory in a debugger, the search string is detected in the video memory, and the virus can notice this and attack the system, possibly by trashing the disk's content. 6.2.7.19 Checking the Content of the TIB (Thread Information Block) Some Windows viruses check whether the TIB has a nonzero value stored at FS:[20h]. Application-level debuggers can be detected with this method. This can easily be ignored during your analysis, however. The trick is to know what to look for. 6.2.7.20 Using the CreateFile() API (the "Billy Belcebu" Method) Kernel-mode drivers usually communicate with a Win32 user-mode component. To gain a handle to the driver, the Win32 applications can simply use the CreateFile() API with the name of a device. For example, SoftICE's name is \\.\SICE on Windows 9x and \\.\NTICE on Windows NT. This technique is easily noticed during analysis, but many malicious programs simply refuse to work when they detect SoftICE. This can make test replications more difficult if you happened to load the debugger by default on the "dirty" PC that you use for analysis. This technique was first introduced by the virus writer known as Billy Belcebu. 6.2.7.21 Using Hamming Code to Attack Break Points Some viruses such as variants in the Yankee_Doodle family, written by the Bulgarian virus writer "T.P," used error correction with Hamming code to self-heal their code. Hamming code is an invention of Richard Hamming from the 1940s, which allows programs not only to detect errors in transmitted data, but also to correct them (with some limitations). In fact, Yankee_Doodle had a few bugs and could really get corrupted when infecting EXE files that had ES:SP (program stack) header fields pointing into the virus body. As a result, when incorrectly infected samples started to run, the initial PUSH instructions in the virus code could easily corrupt the main part of the virus body because the program stack (ES:SP) accidentally pointed into the virus. The error checking code in the virus could potentially detect these corruptions and repair them to some extentor at least stop the execution of the incorrect samples. However, the real intention of the author of the virus was to eliminate break-points inserted by debuggers8. Consequently, when a debugger inserted a break-point 0xCC byte (INT 3) into the virus code (and when lucky enough), the virus could find out about this and fix itself, removing the break-point. The virus code fixes up to 16 break-points, or corrupted bytes in itself using the hamming data. 6.2.7.22 Obfuscated File Formats and Entry Points Many debuggers fail if the file format containing the virus code is obfuscated or trickily constructed. For example, it is completely legal to overlap some areas of a Portable Executable file via manipulations of internal file structures. Such manipulations, however, might confuse the debugger, which is prepared to work with "standard" file structures. Not surprisingly, debuggers often fail to pick the correct entry-point of the application in case the entry point is obscure. For example, as discussed in Chapter 3, "Malicious Code Environments," the Thread Local Storage (TLS) of Portable Executable files offers an alternate entry point, on NT-based systems, before the main entry point is executed. Some debuggers such as SoftICE will break only on the main entry point, and thus, the virus that loads via the TLS is already running by the time the debugger offers a prompt to a virus analyst. Run-time packers expose similar problems to debuggers because they often cause unexpected file structure changes. 6.2.8. Antiheuristics In 1998, Windows virus development was in a relatively early stage9. This is why a wide variety of different infection methods were introduced, making it possible to consider heuristic analysis against 32-bit Windows viruses. Heuristic analysis can detect unknown viruses and closely related variants of existing viruses using static and dynamic methods. Static heuristics rely on file format and common code fragment analysis. Dynamic heuristics use code emulation to mimic the processor and the operating system environment and detect suspicious operations as the code is "running" in the virtual machine of the scanner. Virus writers developed antiheuristic and antiemulation techniques to fight back against heuristic analyzers. Careful analysis of different infection types led to the development of first-generation Win32 heuristic detectors, which used static heuristics. Static heuristics are capable of pinpointing suspicious portable executable (PE) file structures and therefore can catch first-generation 32-bit Windows viruses with a very high detection rate. The idea was based on DOS virus detection that uses similar methods. By the end of 1999, many new virus replication methods had already been developed. Moreover, a major part of 32-bit Windows viruses used some sort of encryption, polymorphic, or metamorphic technique. Encrypted viruses are more difficult to detectand very dangerous when we look ahead to the future of scanners. This is because scanners can become very slow in attempting decryption for too many clean files on your system. First-generation heuristics were extremely successful against PE file viruses. Even virus writers were surprised by their success, but, as always, we did not have to wait long before they came up with attacks against heuristic detectors. Several antiheuristic infection types have been introduced during the last couple of years. Some virus writers also introduced antiemulation techniques against the strongest component of the antivirus product: the emulator. 6.2.8.1 Attacks Against First-Generation Win32 Heuristics This section describes attacks against first-generation Win32 heuristics recently introduced by virus writers. Examination of these new methods enables us to enhance heuristic scanners to deal with these new virus-writing tricks. (The heuristics detection techniques are the subject of Chapter 11, "Antivirus Defense Techniques".) 6.2.8.1.1 New PE File-Infection Techniques Many PE file viruses add a new section or append to the last section of PE files. Even today, many viruses can be detected with the simplest possible PE file heuristic. This heuristic has the benefit that it can easily be performed by endusers. The heuristic checks to see if the entry point of the PE file points into the last section of the application. This heuristic can cause false positives, but additional checks can confirm whether the file is likely compressed. Compressed PE files cause the most significant false alarm rate, because the actual extractor code is very often patched into the last section of the application. A large number of commercial PE file on-the-fly packers were introduced for Win32 platforms, such as UPX, Neolite, Petite, Shrink32, ASPack, and so on. Such packers are commonly used by virus writers and Trojan code creators to hide their creations. Many people can perform heuristic examination, and a file can be extremely suspicious if it contains certain words or phrases. Consequently, virus writers use packers to hide any suspicious material from the heuristics of scanners and humans. Unfortunately, certain worm or Trojan code can be changed with packers. The W32/ExploreZip worm was the first family of computer worms that were packed. The worm had several variants that were created by using various 32-bit PE packers, such as UPX. Modern antivirus software must deal with new packers. It should not matter to the scanner if a Word document is placed in a ZIP archive that contains another embedded document that also has an embedded executable object packed with a 32-bit packer. The scanner needs to get the last possible decomposed object and perform its scanning task on the decompressed content. Obviously, the decomposer needs to have a recognizer module that can easily sort out the packed object. This means that the false positive rate of 32-bit Windows virus heuristic will decrease significantly, while the detection rate of regular scanning will be much better. Virus writers have realized that infecting the last section of PE files is a far too obvious method and have come up with new infection techniques to avoid heuristic detection that raised the alarm on such objects. 6.2.8.1.2 More Than One Virus Section Several Win32 viruses append not only to one section but to many sections at a time. For instance, the W32/Resure virus appends four sections (.text, .rdata, .data, and .reloc) to each PE host. Because the entry point of the application will be changed to point into the new .text section of the virus, the heuristic is fooled. This is because the last section does not receive any control. Furthermore, the virus will maintain a very high-compatibility factor with most Win32 systems. This is because the virus is written in C instead of Assembly. The file can be suspect heuristicallya human might easily notice the same four sections added to the section tables of a few PE files, but a heuristic scanner has more difficulty. This is because some linkers create suspicious structures in some circumstances, so a heuristic developed for this virus to detect multiple section names would have to be considered a low-value heuristic. 6.2.8.1.3 Prepending Viruses that Encrypt the Host File Header Another simple antiheuristic trick is used by prepender viruses written in a high-level language. These viruses do not need to deal with file formats. First-generation heuristics can look for a PE file header at the purported end of the PE file, according to the prepended virus file header. Again, a human can perform such a heuristic easily, even if the end of the file is somewhat encrypted (as with W32/HLLP.Cramb). However, it is more difficult to create a program for this case because the encryption of the original application can be performed in various ways, and thus the program heuristic will likely fail. 6.2.8.1.4 First-Section Infection of Slack Area Some viruses, such as W95/Invir10, do not modify the entry point of the application to point to the last section. Rather, the virus overwrites the slack area of the first section and jumps from there to the start of virus code that is placed somewhere else, such as the last section. As we will see later, this technique is often combined with antiemulation tricks. This is because second-generation, dynamic heuristics use code emulation to see whether the actual application jumps from one section to another. 6.2.8.1.5 First-Section Infection by Shifting the File Sections The first known virus utilizing this method was the W32/IKX virus, by the virus writer Murkry. This virus is also known as "Mole," referring to a special cavity infection style used by the virus. The virus body of IKX is longer than would fit into a single section's slack area, so the virus makes a "hole" for itself by shifting each section of the PE host after its body. The virus appends itself to the code section and adjusts the raw data offset of each subsequent section by 512 bytes. Clearly, dynamic heuristics are necessary in such a casethat is, heuristics performed on the code instead of on the structure of the application. 6.2.8.1.6 First-Section Infection with Packing One of the most interesting antiheuristic viruses was W95/Aldabera. W95/Aldabera infects the last section of PE files. The virus cannot infect small files; rather, it looks for larger PE files and tries to pack their code section. If the code section of the application can be packed such that the actual packed code and the virus code fit in the very same place, the virus will shrink the code section and place the results together in the code section. After that, it will unpack the code section on execution, similar to an on-the-fly packer. Obviously, W95/Aldabera causes problems for heuristic scanners and regular scanners alike. This is because the virus is rather big and decrypts/unpacks itself and the code section very slowly during emulation, taking a lot of emulator iterations. The virus also uses the RDA (random decryption algorithm) decryption technique11. RDA viruses do not need to store decryption keys to decrypt their code. Instead, the virus decryptor generates keys and decryption methods applying brute-force decryption. (This technique is further discussed in Chapter 7.) Moreover, the virus accesses and modifies too much virtual memory. More than 64KB of "dirty pages" must be kept by the emulator to decrypt the code properly, and this exceeds the maximum of some of the early 32-bit code emulatorsthose that still try to support the XT platform (oh, well). 6.2.8.1.7 Entry-Point Obscuring Techniques Various 32-bit Windows viruses apply a very effective antiheuristic infection method known as the entry-point obscuring or inserting technique. Many old DOS viruses used this technique, several of which were introduced in Chapter 4, "Classification of Infection Strategies." As expected, virus writers implemented inserting polymorphic viruses to evade detection by heuristic scanners. To date, the entry-point obscuring method is the most advanced technique of virus writers. I strongly suspect that future entry-point obscuring polymorphic viruses will use internal mass-mailing capabilities. A further indication of this tendency is the W32/Perenast virus family, which also spreads quickly on networks. W32/Perenast is a highly obfuscated virus that is very difficult to detect. Antivirus software will not survive in this decade without utilizing a scanning engine that supports the detection of such complex viruses. 6.2.8.1.8 Selecting a Random Entry Point in the Code Section The W95/Padania was one of the first 32-bit Windows viruses that occasionally did not modify the entry point of the application (in the PE header) to point to the start of virus code. Sometimes the virus searches the relocation section of the application to find a safe position to perform code replacement in the code section of PE files. The virus does change certain CALL or PUSH instructions to a JMP instruction that points to the start of the virus. The virus selects this position near to the original entry point of the application; therefore, the virus code will likely get control when the original application is executedthough there is no guarantee it will. Whenever the virus is not encrypted, the virus is detected easily by those scanning engines that do not just apply an entry pointbased scanning engine model, but that also scan the top and bottom of each PE file for virus strings. Scanners that do not support this kind of scanning or some sort of algorithmic scanning (discussed in Chapter 11) will be more difficult to update to detect such viruses. Moreover, heuristic scanning needs to deal with the problem of PE emulation in a different manner. Emulators typically stop when encountering an "unknown" API call. Thus the actual position where the jump (JMP) is made to the virus entry point will not be reached. The issue here is that different APIs have a variable number of parameters, and the APIs are responsible for their own stack cleaning. It is simply impossible to know all Win32 APIs and the number of parameters passed in to the function, which is the minimum needed to follow an accurate program code chain. Someone could implement the standard APIs, but what about the thousands of other DLLs? Even dynamic heuristic file scanning is not effective against such creations. Some files might be detected heuristically when the jump to the virus start is placed near the actual entry point of the PE application. 6.2.8.1.9 Recycling Compiler Alignment Areas Several viruses, such as W95/SK and W95/Orez, recycle the compiler alignment areas that are often filled with zeros or 0xCCs by various compilers. W95/Orez fragments its decryptor into these "islands" of the PE files. W95/SK uses a relatively short but efficiently permuted decryptor that replaces such an area in the PE files. Control is given from a randomly selected place via a CALL. 6.2.8.1.10 Viruses That Do Not Change Any Section to Writeable It is a very useful heuristic to check whether a code section is also marked writeable. In particular, a writeable entry-point section is suspicious. Some viruses do not need to mark any section to be writeable or do not necessarily change any section to writeable. For instance, the W95/SK virus decrypts itself on the stack and not at the place of the actual virus body in the last section. Therefore, SK does not need to set the writeable flag on itself. Most viruses that decrypt on the stack will not need to rely on the writeable flag at all, so heuristics scanners have less chance to detect such viruses. 6.2.8.1.11 Accurate File Infection First-generation heuristics can check file structures for accuracy. Several first- generation viruses modify the file structure in a way that corrupts the file. When someone tries to execute an infected file on a Windows NT/2000 system, the system loader will refuse to execute such an application. Unfortunately, the majority of 32-bit Windows viruses are already classified in the Win32 class, meaning that the infection strategy is properly done on more than one major Win32 system. Therefore, this heuristic detection is declining in usefulness against new viruses. 6.2.8.1.12 Checksum Recalculation A possible heuristic is the recalculation of the system DLL's checksum, such as KERNEL32.DLL. When the KERNEL32.DLL's checksum is incorrect, Windows 95 still loads the DLL for execution (unless major corruption is identified, such as a too-short image). Win32 viruses that access KERNEL32.DLL will try to recalculate the checksum of the DLL by using Win32 APIs. Others implement the checksum calculation. The actual checksum API does the following12: 1. Sum up the entire file word by word. Add the carry as another word each time (for example: 0x8a24+0xb400 = 0x13e24 -> 0x3e24 + 0x1 = 0x3e25). 2. Add the final carry if there is one. 3. Add the file size (as DWORD). W32/Kriz was the first virus that implemented the DLL checksum recalculation algorithm without calling any system APIs. It is more difficult to see an application inconsistency when the checksum is properly calculated. Other viruses also recalculate the checksum for PE files whenever the checksum is a nonzero value in the PE header. Therefore, this first-generation heuristic will fail to identify modern Win32 viruses. 6.2.8.1.13 Renaming Existing Sections Some of the first-generation heuristic scanners try to check whether a known noncode section gets control during emulation. Under normal circumstances, several sections (such as ".reloc," ".data," and so on) do not contain any code (unless an on-the-fly packer patches itself to an existing section similarly to a virus). Some viruses change the section name to a random string. For instance, the W95/SK virus renames the ".reloc" section name to a five-character-long random name with a one-in-eight probability. As a result, heuristic scanners cannot pinpoint the virus easily based on the section name and its characteristics. 6.2.8.1.14 Avoiding Header Infection First-generation heuristics can check whether the entry point is located directly after the section headers. W95/Murkry, W95/CIH, and a large number of W95/SillyWR variants use this header-infection strategy and easily can be detected heuristically or generically. Some of the modern 32-bit Windows viruses avoid infecting the PE header area for this reason. 6.2.8.1.15 Avoiding Imports by Ordinal A few early Windows 95 and Win32 viruses patched the host program's import table to include extra imports by ordinal. This is because many hosts programs do not import GetProcAddress() and GetModuleHandle() APIs. Because imports can be resolved using ordinal numbers (with some limitations) instead of function names, a few viruses insert ordinal-based imports into the executables import directory to resolve their need to call Win32 APIs. Most of the new Win32 viruses do not use such logic, so this heuristic became useless against them. 6.2.8.1.16 No CALL-to-POP Trick Most 32-bit Windows appending viruses use the CALL-to-POP trick to locate the start address for their data relocations. Because first-generation heuristics could easily look for that, virus writers tried to implement new ways to get the base address of the code. Normally, the trick looks like this: 0040601A E800000000 call 0040601F ; pick current offset to ESI 0040601F 5E pop esi ; ESI=0040601F In normal circumstances, code similar to the preceding is not generated by compilers, so the use of E800000000 opcode is suspicious. An example is the W32/Kriz virus, which avoids using E800000000 opcode that could be very useful to a static heuristic scanner looking for small, suspicious strings. Listing 6.5 shows the trick implemented in the W32/Kriz virus. Listing 6.5. Hiding CALL-to-POP trick Offset Opcode Instruction 0040601A E807000000 call 00406026h 0040601F 34F4 xor al,F4 00406021 F0A4 lock movsb 00406023 288C085EB934AC sub [eax+ecx-53CB46A2],cl 0040602A 0200. add al,[eax]  The call instruction at the 0040601A offset will reach a POP ESI (opcode 5Eh) instruction. Dynamic heuristics are necessary to see whether the CALL instruction points to an actual POP. 6.2.8.1.17 Fixing the Code Size in the Header A possible heuristic logic is the recalculation of the code section sizes that are actually used. The PE header holds the size of all code sections. Most linkers will calculate a proper code size for the header according to the raw data size of the actual code sections in the file. Some viruses set their own section to be executable but forget about the recalculation in the PE header. Static heuristics can take advantage of this. Unfortunately, some Win32 viruses, such as W32/IKX, recalculate the PE header's code section to a proper value to avoid heuristic detection. 6.2.8.1.18 No API String Usage A very effective antiheuristic/antidisassembly trick appears in various modern viruses. An example is the W32/Dengue virus, which uses no API strings to access particular APIs from the Win32 set. Normally, an API import happens by using the name of the API, such as FindFirstFileA(), OpenFile(), ReadFile(), WriteFile(), and so on, used by many first-generation viruses. A set of suspicious API strings will appear in nonencrypted Win32 viruses. For instance, if we use the strings command to see the strings in the virus body of the W32/IKX virus, we will get something like the following: Murkry\IKX EL32 CreateFileA *.EXE CreateFileMappingA MapViewOfFile CloseHandle FindFirstFileA FindNextFileA FindClose UnmapViewOfFile SetEndOfFile  The name of infamous virus writer Murkry appears on the list. (Actually, the names of certain virus writers or certain vulgar words are useful for heuristic detection.) Moreover, we see the *.EXE string, as well as almost a dozen APIs that search for files and make file modifications. This can make the disassembly of the virus much easier and is potentially useful for heuristic scanning. Modern viruses, instead, use a checksum list of the actual strings. The checksums are recalculated via the export address table of certain DLLs, such as KERNEL32.DLL, and the address of the API is found. It is more difficult to understand a virus that uses such logic because the virus researcher needs to identify the APIs used. By examining the checksum algorithm of the actual virus, the virus researcher can write a program that will list the API strings. The virus researcher, Eugene Kaspersky, employs this technique to identify the used API names relatively quickly. Moreover, Kaspersky creates a reference for IDA that can be used directly by the disassembler13. Although this technique is very useful, it is virus-specific and cannot be applied in heuristics because of latency problems. 6.2.9. Antiemulation Techniques First-generation heuristics used to be virus-infection-specific. The heuristics checked for possible infection methods used on a particular application. Virus writers realized that some scanners use emulation for detecting 32-bit Windows viruses and started to create attacks specifically made against the strongest component of the scanner: the emulator. In this section, I explain some of the antiemulation techniques that have been used by virus writers over the years. 6.2.9.1 Using the Coprocessor (FPU) Instructions Some virus writers realized the power of emulators and looked for weaknesses and quickly realized that coprocessor emulation was not implemented. In fact, most emulators skipped coprocessor instructions until recently, whereas most processors that are currently used support coprocessor instructions by default. Viruses do not limit themselves any longer when they rely on using coprocessor instructions. In the early years of virus development (back in the '80s), virus writers could not utilize coprocessor techniques much because early processors did not include a coprocessor unit. With the introduction of 486 processors, Intel placed the unit in the actual processor. (Even 486 SX systems were built with an FPU. However, their FPU got disabled for two possible reasons: to be able to sell 486DX processors with defective FPUs as SX and to reduce production costs to market "different" processors with lower prices.) Nowadays, the majority of systems have the capability to execute coprocessor code. Some of the nonpolymorphic but encrypted viruses already used coprocessor instructions to decrypt themselves. The Prizzy polymorphic engine (PPE) was first to use coprocessor instructions. The PPE is capable of generating 43 different coprocessor instructions for the use of its polymorphic decryptor. Coprocessor instructions, of course, must be supported by the code emulators of modern antivirus software. 6.2.9.2 Using MMX Instruction Other virus writers went so far as to implement a virus that used the MMX (multimedia extension) instructions of Pentium processors. The first of this kind was W95/Prizzy as well. The virus was too buggy to work, and no antivirus researchers managed to replicate the distributed sample of W95/Prizzy, regardless of the extra care taken to replicate it. The Prizzy polymorphic engine was capable of generating as many as 46 different MMX instructions. Other more successful viruses, such as W32/Legacy14 and W32/Thorin, were considerably better written than their predecessor. Most MMX-capable polymorphic engines do not assume that MMX is available in the processor. Some viruses try to create two polymorphic decryption loops and check the existence of MMX support by using the CPUID instruction. The virus might attempt to replicate on those Pentium systems that do not have MMX support, but it will fail unless it somehow checks the processor type. The CPUID instruction is not available in older processors, whereas a 386 processor is enough to execute a PE image. Not surprisingly, some of the MMX viruses fail to determine MMX support properly and execute MMX instructions on non-MMX systems. As a result, they fail and generate a processor exception. Virus writers might remove the CPUID check from polymorphic engines of the future because the number of Pentium systems that support MMX is growing very quickly. As of the year 2000, MMX emulation has become a must for all antivirus software. 6.2.9.3 Using Structured Exception Handling The use of structured exception handling in 32-bit Windows viruses is as old as the first real Win32 virus. W32/Cabanas used a structured exception handler (SEH) for antidebug purposes and to protect itself from potential crash situations. Many 32-bit Windows viruses set up an exception handler at their start and automatically return to the host program's entry point in case of a bug condition. Viruses often set up an exception handler to create a trap for the emulators used in antivirus products. Such a trick was introduced in the W95/Champ.5447.B virus. In addition, some polymorphic engines generate random code as part of their decryptors. After setting up an exception handler, the virus executes the garbage block, forcing an exception to occur. During actual code execution, the virus indirectly executes its own handler, which gives control to another part of the polymorphic decryptor. When the AV product's emulator cannot handle the exception, the actual virus code will not be reached during code emulation. Unfortunately, solving this problem is not a matter of implementing an exception handler recognizer module in AV products. Although some exception conditions can be determined easily, the emulated environment (what we can build into an AV product) cannot handle all exceptions perfectly. In a real-world situation, it is impossible to be 100% sure that a particular instruction will cause an exception to be generated15. As a result, a virus that uses random garbage blocks to cause a fault might not be realized in the emulation, so the heuristic scanner might not trigger perfectly or at all. 6.2.9.4 Executing Random Virus Code: Is This a Virus Today? Some viruses use a random code execution trap at their entry point. This problem is already known from those DOS viruses that only execute on a randomly generated date or time condition. One of the first viruses to use random execution logic was W95/Invir. This virus either transfers control to the host program (original entry point) or gives control to the virus body. In other words, executing an infected program does not guarantee that the virus will be loaded. W95/Invir uses the FS:[0Ch] value as the seed of randomness. On Win32 systems under Intel machines, the data block at FS:0 is known as the thread information block (TIB). The WORD value FS:[0Ch] is called W16TDB and only has meaning under Windows 9x. Windows NT specifies this value as 0. When the value is 0, the virus will execute the host program. How elegantthe virus will not try to load itself under Windows NT because it is not compatible with it. W16TDB is basically random under Windows 95. The TIB is directly accessible without calling any particular API, which is one of the simplest ways to get a random number. The basic scheme of the first tricky code block is the following: MOV reg, FS:[0C] AND reg, 8 ADD reg, jumptable JMP [reg] The problem for emulators is obvious. Without having the proper value at FS:[0Ch], the virus decryptor will not be reached at all. The issue is a matter of complexity, and the detection of such viruses could be extremely difficult even with virus-specific detection methods heuristics can easily fail to detect such a tricky virus. 6.2.9.5 Using Undocumented CPU Instructions Although there are not too many undocumented Intel processor instructions, there are a few. For example, W95/Vulcano uses the undocumented SALC instruction in its polymorphic decryptor as garbage to stop the processor emulators of certain antivirus engines that cannot handle it. Intel claims that SALC can be emulated as a NOP (a no-operation instruction). Hardly a NOP, this instruction sets AL=FFh if the carry flag is set (CF=1), or resets AL to zero if the carry flag is clear (CF=0)16. Some emulators' implementation of these instructions might differ subtly from the processor's so that a virus could detect when it is executing under emulation. Obviously, emulators should not stop when encountering unknown instructions. However, if the size of the actual opcode is not perfectly calculated, the dynamic heuristic scanner misses the virus. 6.2.9.6 Using Brute-Force Decryption of Virus Code Some viruses, use a brute-force algorithm also known as RDA (Random Decryption Algorithm) to decrypt themselves. This method was used by DOS viruses, such as Spanska virus variants and some older Russian viruses, such as the RDA family. All of these old tricks of virus writers have been recycled in Win32 viruses. Brute-force decryption does not use fixed keys but tries to determine the actual method and the proper keys by trial and error. This logic is relatively fast in the case of real-time execution, but it generates very long loops causing zillions of emulation iterations, ensuring that the actual virus body will not be reached easily. The decryption itself can be suspicious activity, and modern dynamic heuristics can make good use of these tricks. RDA is a very effective attack against code emulators built into antivirus software. 6.2.9.7 Using Multithreaded Virus Functionality Many viruses tried to use threads to give emulators a hard time. Emulators were first used to emulate DOS applications. DOS only supported single-threaded execution, a much simpler model for emulators than the multithreaded model. Emulation of multithreaded Windows applications is challenging because the synchronization of various threads is crucial but rather difficult. As emulators become stronger, virus writers will certainly try to use this antiemulation trick. Recently, there are some sophisticated Win32 emulators, such as the one developed in the Norwegian Norman Antivirus, which can deal with multithreaded Win32 code effectively. 6.2.9.8 Using Interrupts in Polymorphic Decryptors Just like some of the old DOS viruses, a few Windows 95 viruses used random INT instruction insertion in their polymorphic decryptors. Some old generic decryptors might stop emulating the program when the first interrupt call is reached. This is because most encrypted viruses do not use any interrupts before they are completely decrypted. The very same attack was built into the W95/Darkmil virus, which uses INT calls such as INT 2Bh, INT 08, INT 72h, and so on. 6.2.9.9 Using an API to Transfer Control to the Virus Code The W95/Kala.7620 virus was one of the first to use an API to transfer control to its decryptor. The virus writes a short code segment into the code section of the host application. This short code makes a call to the CreateThread() API via the import address table of the host program. The actual thread start address is specified in the last section of the host program that is created by the virus. Obviously, the virus code will not be reached with emulation if the CreateThread() API is not emulated by the scanner. Modern antivirus software needs to address this problem by adding support for certain APIs whenever necessary. It is crucial to have API emulation as part of the arsenal of the AV software. During the last couple of years, virus researchers expected to see new creations that use random API calls in their polymorphic decryptors, similarly to the technique described in the previous section. The W95/Drill virus was the first to implement it in practice. 6.2.9.10 Long Loops Computer viruses also attempt to use long loops to defeat generic decryptors that use emulation. The loop is often polymorphic, do-nothing code, but in some cases an overly long loop takes place to calculate a decryption key that is passed to the polymorphic decryptor. An example of such a virus is W32/Gobi. Gobi is an EPO virus. It uses long key-generator loops that run as many as 40 million iterations before passing a decryption key to the polymorphic decryptor. Thus the emulation of the virus code becomes extremely slow. 6.2.10. Antigoat Viruses Computer virus researchers typically build goat files to better understand the infection strategy of a particular virus (see Chapter 15, "Malicious Code Analysis Techniques" for more details). The infection of goat files helps virus analysis because it visually separates known file content from the virus body. The goat files typically contain do-nothing instructions (such as NOPs) and return to the operating system without any special functionality. Goat files are created with various file formats and internal structures for different kinds of viruses. For example, a set of goat files is created that are 4, 8, 16, or 32KB in size. Antigoat viruses use heuristic rules to detect possible goat files. For example, a virus might not infect a file if it is too small or if it contains a large number of do-nothing instructions, or if the filename contains numbers. Obviously, antigoat viruses are a little bit more time-consuming to analyze because the special taste of the virus needs to be fulfilled first. 6.3. Aggressive Retroviruses A retrovirus is a computer virus that specifically tries to bypass or hinder the operation of an antivirus, personal firewall, or other security programs17. There are many possible ways for an attacker to achieve this because most Windows users work with their computers as a user with administrative privileges. This gives computer viruses the potential to kill the processes and files that belong to antivirus software or to disable the antivirus programs. When Microsoft DOS was first released with the MSAV antivirus product, many viruses deployed techniques to kill the integrity checksum files of the antivirus scanner and the antivirus signatures; they also disabled the on-access virus protection18. At one point, the MSAV/VSAFE disabling routine (a single interrupt call with special parameters) was so popular in computer viruses that it became one of the best heuristic scanning methods to generically pinpoint possible retroviruses! Retroviruses have the potential to make way for other computer viruses that are otherwise known and easy for the antivirus software to handle. Therefore, virus writers routinely reverse engineer antivirus products to learn tricks that can be used in retro attacks. For example, aggressive retroviruses often do the following actions upon execution: • Disable or kill the antivirus programs in memory and/or on the disk. • Disable or bypass behavior-blocking products. • Bypass or kill personal firewall software. • Delete the integrity-checking database files. • Modify the integrity-checking database files. (For example, the IDEA.6155 virus of Spanska attacks TBSCAN's ANTI-VIR.DAT file by patching it. The virus modifies the first character of the non-encrypted host filename stored in ANTI-VIR.DAT. Thus the integrity checker component will not find out that the virus infected the host and simply takes a new checksum of the infected file the next time the integrity checker is used19.) • Execute virus code via the antivirus program. (For example, the Varicella virus escapes when Frans Veldman's TBCLEAN program attempts to clean it generically.) • Trojanize the antivirus database files. Such viruses activate when the scanner searches for viruses. (For instance, the AVP antivirus was challenged by a few viruses, such as a Mr. Sandman creation called Anti-AVP, intended to be called AntiCARO. This virus inserts a new, hacked AVP database into AVP's set to force the Boza virus being detected as Bizatch. Another variation of this attack forces AVP to not detect any viruses and to delete other antivirus programs from the disk during scanning. This virus was authored by the virus writer nicknamed TCP.) • Detect the execution of antivirus software and perform damage. • Take action against disinfectors by making a clean boot more difficult. (An example of this is the Ginger virus that causes circular partition problems.) • Remove the antivirus program's integrity mark from files. (An older version of McAfee SCAN appended an integrity check record to the end of the executable files, which the Tequila virus removed before infection.) • Attack common CRC checksum algorithms. (For example, the HybrisF virus of the virus writer, Vecna, attacks CRC checksums. This technique was previously developed by the virus writer, Zhengxi. CRC checksums are not cryptographically strong. To prove this, the HybrisF virus infected files in such a way that their CRC checksum remained unchanged after infectioneven the file size was the same. Many integrity-checker products did not manage to detect changes in the files.) • Prevent infected systems from connecting and downloading updates from antivirus Web sites. (W95/MTX and W32/Mydoom are examples of this. Other retroviruses can easily block access to security updates as well.) • Require extra password protection. (For example, several members of the [email protected] family send themselves in password-encrypted ZIP files. The password is randomly set and sent in the e-mail message with the worm. Although brute-force encryption of ZIP files is feasible, antivirus products cannot always brute-force attack such files because it can take minutes to decrypt even a simple file. An obvious solution is to use the words in the e-mail message to decrypt the content of the e-mail message. However, the attacker could send a second e-mail message saying, "Here is the password that I forgot to send you," or send an image file such as a JPEG or BMP file that contains the password in handwritten form. (I only wrote this paragraph a few weeks ago, and several variants of Beagle appeared in the meantime that use Microsoft GDIPLUS APIs to convert randomly generated passwords to picture files. Thus antivirus solutions that attempt to pick the password from the e-mail quickly failed to decrypt these variants.) • A few retroviruses not only target antivirus software, but virus analysis tools as well. For example, viruses might interfere with tools such as Filemon and Regmon that are commonly used to perform dynamical code analysis, as I illustrate in Chapter 15. Similar attacks are possible using other file formats, such as self-extracting archives and Microsoft document formats. When documents are protected with a password, the macros in the document are also protected. In early editions of Microsoft Office products, password protection was weak, and therefore antivirus products could decrypt password-protected macros to find the virus in a matter of seconds. Newer Microsoft Office releases have a stronger password protection for documents that can withstand a known plain-text attack and thus cannot be scanned anymore. Although the PKZIP password protection is breakable, it cannot be done in seconds, but minutes only, and so antivirus programs do not have the luxury to execute a brute-force attack to scan them. Retroviruses are particularly challenging for antivirus software. Modern antivirus solutions require extra protection to prevent attacks such as process termination to protect themselves better from unknown computer viruses. References 1. Vesselin Bontchev , "Future Trends in Virus Writing," Virus Bulletin Conference, 1994, pp. 65-82. 2. Andrew Schulman, Raymond J. Michaels, Jim Kyle, Tim Paterson, David Maxey, and Ralf Brown, Undocumented DOS, Addison-Wesley, 1990, ISBN: 0-201-57064-5 (Paperback). 3. Peter Szor, "Drill Seeker," Virus Bulletin, January 2001, pp. 8-9. 4. Peter Ferrie, private communication, 2003. 5. György Ráth, "Copy-Protection on the IBM PC," LSI, Budapest, 1989, ISBN: 963-592-902-1 (Paperback). 6. Peter Ferrie, personal communication, 2004. 7. Peter Szor, "Coping with Cabanas," Virus Bulletin, November 1997, pp. 10-12. 8. Dr. Vesselin Bontchev, "Cryptographic and Cryptanalytic Methods Used in Computer Viruses and Anti-Virus Software," RSA Conference, 2004. 9. Peter Szor, "Attacks on Win32Part II," Virus Bulletin Conference, September 2000, pp. 101-121. 10. Peter Szor, "The Invisible Man," Virus Bulletin, May 2000, pp. 8-9. 11. Igor Daniloff, "New Polymorphic Random Decoding Algorithm in Viruses," EICAR, 1995, pp. 9-18. 12. Wason Han, personal communication, 1999. 13. Eugene Kaspersky, personal communication, 1998. 14. Snorre Fageland, "Merry MMXmas!," Virus Bulletin, December 1999, pp. 10-11. 15. Kurt Natvig, personal communication, 1999. 16. "Undocumented OpCodes: SALC," http://www.x86.org/secrets/opcodes/salc.htm. 17. Mikko H. Hypponen, "Retroviruses - How Viruses Fight Back," Virus Bulletin Conference, New Jersey, 1994. 18. Yisreal Radai, "The Anti-Viral Software of MS-DOS 6," http://www.virusbtn.com/old/OtherPapers/MSAV/. 19. Peter Szor, "Bad IDEA," Virus Bulletin, April 1999, pp. 18-19. Chapter 7. Advanced Code Evolution Techniques and Computer Virus Generator Kits "In mathematics you don't understand things. You just get used to them." John von Neumann In this chapter you will learn about the advanced self-protection techniques computer virus writers have developed over the years to fight back against scanners. In particular, you will learn about encrypted, oligomorphic, polymorphic1, and advanced metamorphic computer viruses2. Finally, we will look at computer virus generator kits3 that use similar techniques to create different-looking virus variants. 7.1. Introduction We will examine the various ways in which virus writers have challenged our scanning products over the last decade. Although most of these techniques are used to obfuscate file-infector viruses, we can surely expect similar techniques to appear in future computer worms. Over the years, code evolution has come a long way in binary viruses. If someone were to trace the development result of viruses, it might appear that almost everything possible has already been done, and problems are not escalating. However, there are still computing distribution models that have not yet been seen in viruses. 7.2. Evolution of Code Virus writers continually challenge antivirus products. Their biggest enemies are the virus scanner products that are the most popular of current antivirus software. Generic AV solutions, such as integrity checking and behavior blocking, never managed to approach the popularity of the antivirus scanner. In fact, such generic virus detection models need a lot more thought and technology in place under Windows platforms. These technologies were beaten by some of the old DOS viruses in the DOS days. As a result, some people draw the incorrect conclusion that these techniques are not useful. Scanning is the market's accepted solution, regardless of its drawbacks. Thus it must be able to deal with the escalating complexity and emerging number of distributed and self-distributing malware. Although modern computing developed extremely quickly, for a long time binary virus code could not catch up with the technological challenges. In fact, the DOS viruses evolved to a very complex level until 1996. At that point, however, 32-bit Windows started to dominate the market. As a result, virus writers had to go back years in binary virus development. The complexity of DOS polymorphism peaked when Ply was introduced in 1996 with a new permutation engine (although the metamorphic virus, ACG, was introduced in 1998). These developments could not continue. New 32-bit infection techniques had to be discovered by the pioneer virus writers and later on Win32 platforms. Some virus writers still find the Windows platforms far too challenging, especially when it comes to Windows NT/2000/XP/2003. The basic infection techniques, however, have already been introduced, and standalone virus assembly sources are distributed widely on the Internet. These sources provide the basis of new mass-mailing worms that do not require major skillsjust cut and paste abilities. In the following sections, we will examine the basic virus code obfuscation techniques, from encrypted viruses to modern metamorphic techniques. 7.3. Encrypted Viruses From the very early days, virus writers tried to implement virus code evolution. One of the easiest ways to hide the functionality of the virus code was encryption. The first known virus that implemented encryption was Cascade on DOS4. The virus starts with a constant decryptor, which is followed by the encrypted virus body. Consider the example extracted from Cascade.1701 shown in Listing 7.1. Listing 7.1. The Decryptor of the Cascade Virus lea si, Start ; position to decrypt (dynamically set) mov sp, 0682 ; length of encrypted body (1666 bytes) Decrypt: xor [si],si ; decryption key/counter 1 xor [si],sp ; decryption key/counter 2 inc si ; increment one counter dec sp ; decrement the other jnz Decrypt ; loop until all bytes are decrypted Start: ; Encrypted/Decrypted Virus Body Note that this decryptor has antidebug features because the SP (stack pointer) register is used as one of the decryption keys. The direction of the decryption loop is always forward; the SI register is incremented by one. Because the SI register initially points to the start of the encrypted virus body, its initial value depends on the relative position of the virus body in the file. Cascade appends itself to the files, so SI will result in the same value if two host programs have equivalent sizes. However, the SI (decryption key 1) is changed if the host programs have different sizes. The SP register is a simple counter for the number of bytes to decrypt. Note that the decryption is going forward with word (double-byte) key length. The decryption position, however, is moved forward by one byte each time. This complicates the decryption loop, but it does not change its reversibility. Note that simple XOR is very practical for viruses because XORing with the same value twice results in the initial value. Consider encrypting letter P (0x50) with the key 0x99. You see, 0x50 XOR 0x99 is 0xC9, and 0xC9 XOR 0x99 will return to 0x50. This is why virus writers like simple encryption so muchthey are lazy! They can avoid implementing two different algorithms, one for the encryption and one for the decryption. Cryptographically speaking, such encryption is weak, though early antivirus programs had little choice but to pick a detection string from the decryptor itself. This led to a number of problems, however. Several different viruses might have the same decryptor, but they might have completely different functionalities. By detecting the virus based on its decryptor, the product is unable to identify the variant or the virus itself. More importantly, nonviruses, such as antidebug wrappers, might have a similar decryptor in front of their code. As a result, the virus that uses the same code to decrypt itself will confuse them. Such a simple code evolution method also appeared in 32-bit Windows viruses very early. W95/Mad and W95/Zombie use the same technique as Cascade. The only difference is the 32-bit implementation. Consider the decryptor from the top of W95/Mad.2736, shown in Listing 7.2. Listing 7.2. The Decryptor of the W95/Mad.2736 Virus mov edi,00403045h ; Set EDI to Start add edi,ebp ; Adjust according to base mov ecx,0A6Bh ; length of encrypted virus body mov al,[key] ; pick the key Decrypt: xor [edi],al ; decrypt body inc edi ; increment counter position loop Decrypt ; until all bytes are decrypted jmp Start ; Jump to Start (jump over some data) key DB 86 ; variable one byte key Start: ; encrypted/decrypted virus body In fact, this is an even simpler implementation of the simple XOR method. Detection of such viruses is still possible without trying to decrypt the actual virus body. In most cases, the code pattern of the decryptor of these viruses is unique enough for detection. Obviously, such detection is not exact, but the repair code can decrypt the encrypted virus body and easily deal with minor variants. The attacker can implement some interesting strategies to make encryption and decryption more complicated, further confusing the antivirus program's detection and repair routines: • The direction of the loop can change: forward and backward loops are supported (see all cases in Figure 7.1). Figure 7.1. Decryption loop examples. • Multiple layers of encryption are used. The first decryptor decrypts the second one, the second decrypts the third, and so on (see Figure 7.1c.). Hare5 by Demon Emperor, W32/Harrier6 by TechnoRat, {W32, W97M}/Coke by Vecna, and W32/Zelly by ValleZ are examples of viruses that use this method. • Several encryption loops take place one after another, with randomly selected directionsforward and backward loops. This technique scrambles the code the most (see Figure 7.1c.). • There is only one decryption loop, but it uses more than two keys to decrypt each encrypted piece of information on the top of the others. Depending on the implementation of the decryptor, such viruses can be much more difficult to detect. The size of the key especially mattersthe bigger the key size (8, 16, 32 -bit, or more), the longer the brute-force decryption might take if the keys cannot be extracted easily. • The start of decryptor is obfuscated. Some random bytes are padded between the decryptor and the encrypted body and/or the encrypted body and the end of the file. • Nonlinear decryption is used. Some viruses, such as W95/Fono, use a simple nonlinear algorithm with a key table. The virus encryption is based on a substitution table. For instance, the virus might decide to swap the letters A and Z, the letters P and L, and so on. Thus the word APPLE would look like ZLLPE after such encryption. Because the virus decryption is not linear, the virus body is not decrypted one byte after another. This easily might confuse a junior virus analyst because in some cases, the virus body might not look encrypted at all. Consequently, if a detection string is picked from such a sample, the virus detection will be partial. This technique easily can confuse even advanced detection techniques that use an emulator. Although in normal cases the emulation can continue until linear detection is detected, such as consecutive byte changes in the memory of a virtual machine used by the scanner, a nonlinear algorithm will force the emulation to continue until a hard-to-guess minimum limit. A variant of the W32/Chiton ("Efish") virus uses a similar approach to Fono's, but Chiton makes sure it always replaces each byte of the virus body with another value using a complete substitution table. In addition, Chiton uses multiple values to correspond to each byte in the code, significantly complicating the decryption. Viruses such as W95/Drill and { W32, Linux} /Simile.D represent the state of the art in nonlinear encryption, decrypting each piece of the encrypted virus body in a semi-random order, hitting each position in the virus only once.7 • The attacker can decide not to store the key for encryption anywhere in the virus. Instead, the virus uses brute force to decrypt itself, attempting to recover the encryption keys on its own. Viruses like this are much harder to detect and said to use the RDA (random decryption algorithm) technique. The RDA.Fighter virus is an example that uses this method. • The attacker can use a strong encryption algorithm to encrypt the virus. The IDEA family of viruses, written by Spanska, utilizes this method. One of several decryptors uses the IDEA cipher.8 Because the virus carries the key for the decryption, the encryption cannot be considered strong, but the repair of such viruses is painful because the antivirus needs to reimplement the encryption algorithm to deal with it. In addition, the second decryption layer of IDEA virus9 uses RDA. • The Czech virus W32/Crypto by Prizzy demonstrated the use of Microsoft crypto API in computer viruses. Crypto encrypts DLLs on the system using a secret/public key pair generated on the fly. Other computer worms and backdoor programs also use the Crypto API to decrypt encrypted content. This makes the job of antivirus scanners more difficult. An example of a computer worm using the Crypto API is [email protected], which encrypts EXE files. • Sometimes the decryptor itself is not part of the virus. Viruses such as W95/Resur10 and W95/Silcer are examples of this method. These viruses force the Windows Loader to relocate the infected program images when they are loaded to memory. The act of relocating the image is responsible for decrypting the virus body because the virus injects special relocations for the purpose of decryption. The image base of the executable functions as the encryption key. • The Cheeba virus demonstrated that the encryption key can be external to the virus body. Cheeba was released in 1991. Its payload is encrypted using a filename. Only when the virus accesses the file name will it correctly decrypt its payload11. Virus researchers cannot easily describe the payload of such virus unless the cipher in the virus is weak. Dmitry Gryaznov managed to reduce the key size necessary to attack the cipher in Cheeba to only 2,150,400 possible keys by using frequency cryptanalysis of the encrypted virus body, assuming that the code under the encryption was written in a similar style as the rest of the virus code12. This yielded the result, and the magic filename, "users.bbs" was found. This filename belonged to a popular bulletin board software. It is expected that more, so-called "clueless agents"13 will appear as computer viruses to disallow the defender to gain knowledge about the intentions of the attacker. • Encryption keys can be generated in different ways, such as constant, random but fixed, sliding, and shifting. • The key itself can be stored in the decryptor, in the host, or nowhere at all. In some cases, the decryptor's code functions as a decryption key, which can cause problems if the code of the decryptor is modified with a debugger. Furthermore, this technique can attack emulators that use code optimization techniques to run decryptors more efficiently. (An example of such as virus is Tequila.) • The randomness of the key is also an important factor. Some viruses only generate new keys once per day and are said to use a slow generator. Others prefer to generate keys every single time they infect an object; these are known as fast generators. The attacker can use many different methods to select the seed of randomness. Simple examples include timer ticks, CMOS time and date, and CRC32. A complicated example is the Mersenne Twister14 pseudo-number generator used by W32/Chiton and W32/Beagle. • The attacker can select several locations to decrypt the encrypted content. The most common methods are shown in Figure 7.2. Figure 7.2. Possible places of decryption. A) The decryptor decrypts the data at the location of the encrypted virus body. This method is the most common; however, the encrypted data must be writeable in memory, whichdepends on the actual operating system. B) The decryptor reads the encrypted content and builds the decrypted virus body on the stack. This is very practical for the attacker. The encrypted data does not need to be writeable. C) The virus allocates memory for the decrypted code and data. This can be a serious disadvantage for the attacker because nonencrypted code needs to allocate memory firstbefore the decryptor. Note Metamorphic viruses such as Simile circumvent this disadvantage because the code that allocates memory is made variable without providing the ability to pick a search string. The preceding techniques work very effectively when combined with variable decryptors that keep changing in new generations of the virus. Oligomorphic and polymorphic decryption are discussed in the following sections. 7.4. Oligomorphic Viruses Virus writers quickly realized that detection of an encrypted virus remains simple for the antivirus software as long as the code of the decryptor itself is long enough and unique enough. To challenge the antivirus products further, they decided to implement techniques to create mutated decryptors. Unlike encrypted viruses, oligomorphic viruses do change their decryptors in new generations. The simplest technique to change the decryptors is to use a set of decryptors instead of a single one. The first known virus to use this technique was Whale. Whale carried a few dozen different decryptors, and the virus picked one randomly. W95/Memorial had the ability to build 96 different decryptor patterns. Thus the detection of the virus based on the decryptor's code was an impractical solution, though a possible one. Most products tried to deal with the virus by dynamic decryption of the encrypted code. The detection is still based on the constant code of the decrypted virus body. Consider the example of Memorial shown in Listing 7.3, a particular instance of 96 different cases. Listing 7.3. An Example Decryptor of the W95/Memorial Virus mov ebp,00405000h ; select base mov ecx,0550h ; this many bytes lea esi,[ebp+0000002E] ; offset of "Start" add ecx,[ebp+00000029] ; plus this many bytes mov al,[ebp+0000002D] ; pick the first key Decrypt: <strong>nop ; junk</strong> <strong>nop ; junk</strong> xor [esi],al ; decrypt a byte inc esi ; next byte <strong>nop ; junk</strong> inc al ; slide the key dec ecx ; are there any more bytes to decrypt? jnz Decrypt ; until all bytes are decrypted jmp Start ; decryption done, execute body ; Data area Start: ; encrypted/decrypted virus body Notice the sliding-key feature. The order of the instructions can be slightly changed, and the decryptor can use different instructions for looping. Compare this example with another instance, shown in Listing 7.4. Listing 7.4. A Slightly Different Decryptor of W95/Memorial mov ecx,0550h ; this many bytes mov ebp,013BC000h ; select base lea esi,[ebp+0000002E] ; offset of "Start" add ecx,[ebp+00000029] ; plus this many bytes mov al,[ebp+0000002D] ; pick the first key Decrypt: <strong>nop ; junk</strong> <strong>nop ; junk</strong> xor [esi],al ; decrypt a byte inc esi ; next byte <strong>nop ; junk</strong> inc al ; slide the key loop Decrypt ; until all bytes are decrypted jmp Start ; Decryption done, execute body ; Data area Start: ; Encrypted/decrypted virus body Notice the appearance of a "loop" instruction in this instance, as well as the swapped instructions in the front of the decryptor. A virus is said to be oligomorphic if it is capable of mutating its decryptor only slightly. Interestingly, some products that we tested could not detect all instances of Memorial. This is because such viruses must be examined to their smallest details to find and understand the oligomorphic decryptor generator. Without such careful manual analysis, the slow oligomorphic virus techniques are impossible to detect with any reliability. For example, the decryptor of the Badboy virus15 changes in one instructionand only very rarely. Obviously, they are a great challenge for automated virus analysis centers. Another early example of an oligomorphic virus is the Russian virus family called WordSwap. 7.5. Polymorphic Viruses The next step in the level of difficulty is a polymorphic attack. Polymorphic viruses can mutate their decryptors to a high number of different instances that can take millions of different forms. 7.5.1. The 1260 Virus The first known polymorphic virus, 1260, was written in the U.S. by Mark Washburn in 199016. This virus has many interesting techniques that were previously predicted by Fred Cohen. The virus uses two sliding keys to decrypt its body, but more importantly, it inserts junk instructions into its decryptor. These instructions are garbage in the code. They have no function other than altering the appearance of the decryptor. Virus scanners were challenged by 1260 because simple search strings could no longer be extracted from the code. Although 1260's decryptor is very simple, it can become shorter or longer according to the number of inserted junk instructions and random padding after the decryptor for up to 39 bytes of junk instructions. In addition, each group of instructions (prolog, decryption, and increments) within the decryptor can be permutated in any order. Thus the "skeleton" of the decryptor can change as well. Consider the example of an instance of a decryptor extracted from 1260 (see Listing 7.5). Listing 7.5. An Example Decryptor of 1260 ; Group 1 Prolog Instructions <strong>inc si ; optional, variable junk</strong> mov ax,0E9B ; set key 1 <strong>clc ; optional, variable junk</strong> mov di,012A ; offset of Start <strong>nop ; optional, variable junk</strong> mov cx,0571 ; this many bytes - key 2 ; Group 2 Decryption Instructions Decrypt: xor [di],cx ; decrypt first word with key 2 <strong>sub bx,dx ; optional, variable junk</strong> <strong>xor bx,cx ; optional, variable junk</strong> <strong>sub bx,ax ; optional, variable junk</strong> <strong>sub bx,cx ; optional, variable junk</strong> <strong>nop ; non-optional junk</strong> <strong>xor dx,cx ; optional, variable junk</strong> xor [di],ax ; decrypt first word with key 1 ; Group 3 Decryption Instructions inc di ; next byte <strong>nop ; non-optional junk</strong> <strong>clc ; optional, variable junk</strong> inc ax ; slide key 1 ; loop loop Decrypt ; until all bytes are decrypted slide key 2 ; random padding up to 39 bytes Start: ; Encrypted/decrypted virus body In each group of instructions, up to five junk instructions are inserted (INC SI, CLC, NOP, and other do-nothing instructions) with no repetitions allowed in the junk. There are two NOP junk instructions that always appear. 1260 does not have register replacement, but more complicated polymorphic attacks use that trick. Nonetheless, 1260 is an effective polymorphic engine that generates a high variety of decryptors. 7.5.2. The Dark Avenger Mutation Engine (MtE) The next important development in the history of polymorphic viruses was MtE17, a mutation engine written by the Bulgarian Dark Avenger. The first version MtE was released during the summer of 1991, later followed by another version in early 1992. The idea of the mutation engine is based on modular development. For novice virus writers, it was difficult to write a polymorphic virus. However, more advanced virus writers came to their rescue. The MtE engine was released as an object that could be linked to any simple virus. The concept of MtE is to make a function call to the mutation engine function and pass control parameters in predefined registers. The engine takes care of building a polymorphic shell around the simple virus inside it. The parameters to the engine include the following: • A work segment • A pointer to the code to encrypt • Length of the virus body • Base of the decryptor • Entry-point address of the host • Target location of encrypted code • Size of decryptor (tiny, small, medium, or large) • Bit field of registers not to use In response, the MtE engine returns a polymorphic decryption routine with an encrypted virus body in the supplied buffer. (See Listing 7.6.) Listing 7.6. An Example Decryptor Generated by MtE mov bp,A16C ; This Block initializes BP ; to "Start"-delta mov cl,03 ; (delta is 0x0D2B in this example) ror bp,cl mov cx,bp mov bp,856E or bp,740F mov si,bp mov bp,3B92 add bp,si xor bp,cx sub bp,B10C ; Huh ... finally BP is set, but remains an ; obfuscated pointer to encrypted body Decrypt: mov bx,[bp+0D2B] ; pick next word ; (first time at "Start") add bx,9D64 ; decrypt it xchg [bp+0D2B],bx ; put decrypted value to place mov bx,8F31 ; this block increments BP by 2 sub bx,bp mov bp,8F33 sub bp,bx ; and controls the length of decryption jnz Decrypt ; are all bytes decrypted? Start: ; encrypted/decrypted virus body This example of MtE illustrates how remarkably complicated this particular engine is. Can you guess how to detect it? It makes sense to look at a large set of samples first. The first time, it took me five days before I managed to write a reliable detector for it. MtE could produce some decryptor cases that appeared only in about 5% or less of all cases. However, the engine had a couple of minor limitations that were enough to detect the virus reliably using an instruction size disassembler and a state machine. In fact, there is only one constant byte in an MtE decryptor, the 0x75 (JNZ), which is followed by a negative offsetand even that is placed at a variable location (at the end of the decryptor, whose length is not constant). Note MtE does not have garbage instructions in the decryptor, as 1260 does. Therefore MtE attacks techniques that attempt to optimize decryptors to defeat polymorphism. MtE's impact on antivirus software was clear. Most AV engines had to go through a painful rearchitecting to introduce a virtual machine for the use of the scanning engine. As Frans Veldman used to say, "We simply let the virus do the dirty work." MtE was quickly followed by many similar engines, such as TPE (Trident Polymorphic Engine), written by Masouf Khafir in Holland in 1993. Today, hundreds of polymorphic engines are known. Most of these engines were only used to create a couple of viruses. After a polymorphic decryptor can be detected, using it becomes a disadvantage to virus writers because any new viruses are covered by the same detection. Such detections, however, usually come with the price of several false positives and false negatives. More reliable techniques detect and recognize the virus body itself. This opens up the possibility for virus writers to use the same polymorphic engine in many different viruses successfully unless such viruses are handled with heuristic or generic detection methods. 7.5.3. 32-Bit Polymorphic Viruses W95/HPS and W95/Marburg18 were the first viruses to use real 32-bit polymorphic engines. These two viruses were authored by the infamous Spanish virus writer, GriYo, in 1998. He also created several highly polymorphic viruses earlier on DOS, such as the virus Implant19. Just like Implant's polymorphic engine, HPS's polymorphic engine is powerful and advanced. It supports subroutines using CALL/RET instructions and conditional jumps with nonzero displacement. The code of the polymorphic engine takes about half of the actual virus code, and there are random byte-based blocks inserted between the generated code chains of the decryptor. The full decryptor is built only during the first initialization phase, which makes the virus a slow polymorphic. This means that antivirus vendors cannot test their scanner's detection rate efficiently because the infected PC must be rebooted to force the virus to create a new decryptor. The decryptor consists of Intel 386 based instructions. The virus body is encrypted and decrypted by different methods, including XOR/NOT and INC/DEC/SUB/ADD instructions with 8, 16, or 32-bit keys, respectively. From a detection point of view, this drastically reduces the range of ideas. I am sad to say that the polymorphic engine was very well written, just like the rest of the virus. It was certainly not created by a beginner. Consider the following example of a decryptor, simplified for illustration. The polymorphic decryptor of the virus is placed after the variably encrypted virus body. The decryptor is split between small islands of code routines, which can appear in mixed order. In the example shown in Listing 7.7, the decryptor starts at the Decryptor_Start label, and the decryption continues until the code finally jumps to the decrypted virus body. Listing 7.7. An Illustration of a W95/Marburg Decryptor Instance Start: ; Encrypted/Decrypted Virus body is placed here Routine-6: dec esi ; decrement loop counter ret Routine-3: mov esi,439FE661h ; set loop counter in ESI ret Routine-4: xor byte ptr [edi],6F ; decrypt with a constant byte ret Routine-5: add edi,0001h ; point to next byte to decrypt ret Decryptor_Start: call Routine-1 ; set EDI to "Start" call Routine-3 ; set loop counter Decrypt: call Routine-4 ; decrypt call Routine-5 ; get next call Routine-6 ; decrement loop register cmp esi,439FD271h ; is everything decrypted? jnz Decrypt ; not yet, continue to decrypt jmp Start ; jump to decrypted start Routine-1: call Routine-2 ; Call to POP trick! Routine-2: pop edi sub edi,143Ah ; EDI points to "Start" ret The preceding decryptor is highly structural, with each differently functioning piece placed in its own routine. The result is millions of possible code patterns filled with random garbage instructions between the islands. Polymorphic viruses can create an endless number of new decryptors that use different encryption methods to encrypt the constant part (except their data areas) of the virus body. Some polymorphic viruses, such as W32/Coke, use multiple layers of encryption. Furthermore, variants of Coke also can infect Microsoft Word documents in a polymorphic way. Coke mutates its macro directly with the binary code of the virus instead of using macro code directly. Normally, polymorphic macro viruses are very slow because they need a lot of interpretation. Because Coke generates mutated macros with binary code, it does not suffer from slow-down issues and, as a result, is harder to notice. Consider the following example of Coke taken from two instances of mutated AutoClose() macros shown in Listing 7.8. Listing 7.8. An Example Snippet of Coke's Polymorphic Macro 'BsbK Sub AuTOclOSE() oN ERROr REsuMe NeXT SHOWviSuAlBASIcEditOr = faLsE If nmnGG > WYff Then For XgfqLwDTT = 70 To 5 JhGPTT = 64 KjfLL = 34 If qqSsKWW < vMmm Then For QpMM = 56 To 7 If qtWQHU = PCYKWvQQ Then If lXYnNrr > mxTwjWW Then End If If FFnfrjj > GHgpE Then End If  The second example is a little longer because of the junk. I have highlighted some of the essential instructions in these examples. Notice that even these are not presented in the same order whenever possible. For example, the preceding instance turns off the Visual Basic Editor, so you will no longer see it in Word's menus. However, in Listing 7.9, this happens later, after lots of inserted junk and other essential instructions. Listing 7.9. Another Snippet of Coke's Polymorphic Macro 'fYJm Sub AUtOcLOse() oN ERRor REsUME NexT optIOns.saVenorMALPrOmpT = FAlsE DdXLwjjVlQxU = "TmDKK"
NrCyxbahfPtt$= "fnMM" If MKbyqtt > mHba Then If JluVV > mkpSS Then jMJFFXkTfgMM$ = "DmJcc"
For VPQjTT = 42 To 4
If PGNwygui = bMVrr Then
dJTkQi = 07
'wcHpsxllwuCC
End If
Next VPQjTT
quYY = 83
End If
DsSS = 82
bFVpp = 60
End If
tCQFv=1
Rem kJPpjNNGQCVpjj
LyBDXXXGnWW$= "wPyTdle" If cnkCvCww > FupJLQSS Then VbBCCcxKWxww$ = "Ybrr"
End If
opTiONS.COnFIrmCOnvErsiOnS = faLSe
Svye = 55
PgHKfiVXuff$= "rHKVMdd" ShOwVisUALbaSiCEdITOR = fALSe  Newer polymorphic engines use an RDA-based decryptor that implements a brute-force attack against its constant but variably encrypted virus body in a multiencrypted manner. Manual analysis of such viruses might lead to great surprises. Often there are inefficiencies of randomness in such polymorphic engines, which can be used as a counterattack. Sometimes even a single wildcard string can do the magic of perfect detection. Most virus scanners, years ago, already had a code emulator capable of emulating 32-bit PE (portable executable) files. Other virus researchers only implemented dynamic decryption to deal with such viruses. That worked just as in the previous cases because the virus body was still constant under encryption. According to the various AV tests, some vendors were still sorry not to have support for difficult virus infection techniques. Virus writers used the combination of entry-point obscuring techniques with 32-bit polymorphism to make the scanner's job even more difficult. In addition, they tried to implement antiemulation techniques to challenge code emulators. Nevertheless, all polymorphic viruses carry a constant code virus body. With advanced methods, the body can be decrypted and identified. Consider Figure 7.3 as an illustration. Figure 7.3. Instances of encrypted and decrypted polymorphic virus bodies. 7.6. Metamorphic Viruses Virus writers still must often waste weeks or months to create a new polymorphic virus that does not have chance to appear in the wild because of its bugs. On the other hand, a researcher might be able to deal with the detection of such a virus in a few minutes or few days. One of the reasons for this is that there are a surprisingly low number of efficient external polymorphic engines. Virus writers try, of course, to implement various new code evolution techniques to make the researcher's job more difficult. The W32/Apparition virus was the first-known 32-bit virus that did not use polymorphic decryptors to evolve itself in new generations. Rather, the virus carries its source and drops it whenever it can find a compiler installed on the machine. The virus inserts and removes junk code to its source and recompiles itself. In this way, a new generation of the virus will look completely different from previous ones. It is fortunate that W32/Apparition did not become a major problem. However, such a method would be more dangerous if implemented in a Win32 worm. Furthermore, these techniques are even more dangerous on platforms such as Linux, where C compilers are commonly installed with the standard system, even if the system is not used for development. In addition, MSIL (Microsoft Intermediate Langauge) viruses already appeared to rebuild themselves using the System.Reflection.Emit namespace and implement a permutation engine. An example of this kind of metamorphic engine is the MSIL/Gastropod virus, authored by the virus writer, Whale. The technique of W32/Apparition is not surprising. It is much simpler to evolve the code in source format rather than in binary. Not surprisingly, many macro and script viruses use junk insertion and removal techniques to evolve in new generations20. 7.6.1. What Is a Metamorphic Virus? Igor Muttik explained metamorphic viruses in the shortest possible way: "Metamorphics are body-polymorphics." Metamorphic viruses do not have a decryptor or a constant virus body but are able to create new generations that look different. They do not use a data area filled with string constants but have one single-code body that carries data as code. Material metamorphosis does exist in real life. For instance, shape memory polymers have the ability to transform back to their parent shape when heated21. Metamorphic computer viruses have the ability to change their shape by themselves from one form to another, but they usually avoid generating instances that are very close to their parent shape. Figure 7.4 illustrates the problem of metamorphic virus bodies as multiple shapes. Figure 7.4. The virus body keeps changing in different generations of a metamorphic virus. Although there are some DOS metamorphic viruses, such as ACG (Amazing Code Generator), these did not become a significant problem for end users. There are already more metamorphic Windows viruses than metamorphic DOS viruses. The only difference between the two is in their respective potentials. The networked enterprise gives metamorphic binary worms the ability to cause major problems. As a result, we will not be able to turn a blind eye to them, assuming that we do not need to deal with them because they are not causing problems. They will. 7.6.2. Simple Metamorphic Viruses In December of 1998, Vecna (a notorious virus writer) created the W95/Regswap virus. Regswap implements metamorphosis via register usage exchange. Any part of the virus body will use different registers but the same code. The complexity of this, clearly, is not very high. Listing 7.10 shows some sample code fragments selected from two different generations of W95/Regswap that use different registers. Listing 7.10. Two Different Generations of W95/Regswap a.) 5A pop edx BF<strong>04000000</strong> mov edi,0004h <strong>8B</strong>F5 mov esi,ebp B8<strong>0C000000</strong> mov eax,000Ch <strong>81</strong>C2<strong>88000000</strong> add edx,0088h <strong>8B</strong>1A mov ebx,[edx] <strong>89</strong>9C86<strong>18110000</strong> mov [esi+eax*4+00001118],ebx b.) 58 pop eax BB<strong>04000000</strong> mov ebx,0004h <strong>8B</strong>D5 mov edx,ebp BF<strong>0C000000</strong> mov edi,000Ch <strong>81</strong>C0<strong>88000000</strong> add eax,0088h <strong>8B</strong>30 mov esi,[eax] <strong>89</strong>B4BA<strong>18110000</strong> mov [edx+edi*4+00001118],esi The bold areas show the common areas of the two code generations. Thus a wildcard string could be useful in detecting the virus. Moreover, support for half-byte wildcard (indicated with the ? mark) bytes such as 5? B? (as described by Frans Veldman) could lead to even more accurate detection. Using the 5?B? wildcard pattern we can detect snippets such as 5ABF, 58BB, and so on. Depending on the actual ability of the scanning engine, however, such a virus might need an algorithmic detection because of the missing support of wildcard search strings. If algorithmic detection is not supported as a single database update, the product update might not come out for several weeksor months for all platforms! Other virus writers tried to re-create older permutation techniques. For instance, the W32/Ghost virus has the capability to reorder its subroutines similarly to the BadBoy DOS virus family (see Figure 7.5). Badboy always starts in the entry point (EP) of the virus. Figure 7.5. The Badboy virus uses eight modules. The order of the subroutines will be different from generation to generation, which leads to different virus generations, where is the number of subroutines. BadBoy had eight subroutines, and 8!=40,320 different generations. W32/Ghost (discovered in May 2000) has 10 functions, so 10!=3,628,800 combinations. Both of them can be detected with search strings, but some scanners need to deal with such a virus algorithmically. Two different variants of the W95/Zmorph virus appeared in January of 2000. The polymorphic engine of the virus implements a build-and-execute code evolution. The virus rebuilds itself on the stack with push instructions. Blocks of code decrypt the virus instruction-by-instruction and push the decrypted instructions to the stack. The build routine of the virus is already metamorphic. The engine supports jump insertion and removal between any instructions of the build code. Regardless, code emulators can be used to deal easily with such viruses. A constant code area of the virus is useful for identification because the virus body is decrypted on the stack. 7.6.3. More Complex Metamorphic Viruses and Permutation Techniques The W32/Evol virus appeared in July of 2000. The virus implements a metamorphic engine and can run on any major Win32 platform. In Listing 7.11, section a. shows a sample code fragment, mutated in b. to a new form in a new generation of the same virus. Even the "magic" DWORD values (5500000Fh, 5151EC8Bh) are changed in subsequent generations of the virus, as shown in c. Therefore any wildcard strings based on these values will not detect anything above the third generation of the virus. W32/Evol's engine is capable of inserting garbage between core instructions. Listing 7.11. Different Generations of the W32/Evol Virus a. An early generation: C7060F000055 mov dword ptr [esi],5500000Fh C746048BEC5151 mov dword ptr [esi+0004],5151EC8Bh b. And one of its later generations: BF0F000055 mov edi,5500000Fh 893E mov [esi],edi 5F pop edi 52 push edx B640 mov dh,40 BA8BEC5151 mov edx,5151EC8Bh 53 push ebx 8BDA mov ebx,edx 895E04 mov [esi+0004],ebx c. And yet another generation with recalculated ("encrypted") "constant" data: BB0F000055 mov ebx,5500000Fh 891E mov [esi],ebx 5B pop ebx 51 push ecx B9CB00C05F mov ecx,5FC000CBh 81C1C0EB91F1 add ecx,F191EBC0h ; ecx=5151EC8Bh 894E04 mov [esi+0004],ecx  Variants of the W95/Zperm family appeared in June and September of 2000. The method used is known from the Ply DOS virus. The virus inserts jump instructions into its code. The jumps will be inserted to point to a new instruction of the virus. The virus body is built in a 64K buffer that is originally filled with zeros. The virus does not use any decryption. In fact, it will not regenerate a constant virus body anywhere. Instead, it creates new mutations by the removal and addition of jump instructions and garbage instructions. Thus there is no way to detect the virus with search strings in the files or in the memory. Most polymorphic viruses decrypt themselves to a single constant virus body in memory. Metamorphic viruses, however, do not. Therefore the detection of the virus code in memory needs to be algorithmic because the virus body does not become constant even there. Figure 7.6 explains the code structure changes of Zperm-like viruses. Figure 7.6. The Zperm virus. Sometimes the virus replaces instructions with other, equivalent instructions. For example, the instruction xor eax, eax (which sets the eax register to zero) will be replaced by sub eax, eax (which also zeroes the contents of the eax register). The opcode of these two instructions will be different. The core instruction set of the virus has the very same execution order; however, the jumps are inserted in random places. The B variant of the virus also uses garbage instruction insertion and removal such as nop (a do-nothing instruction). It is easy to see that the number of generations can be at least , where is the number of core set instructions in the virus body. Zperm introduced the real permutating engine (RPME). RPME is available for other virus writers to create new metamorphic viruses. We should note here that permutation is only a single item on the list of metamorphic techniques. To make the virus truly metamorphic, instruction opcode changes are introduced. Encryption can be used in combination with antiemulation and polymorphic techniques. In October 2000, two virus writers created a new permutation virus, W95/Bistro, based on the sources of the Zperm virus and the RPME. To further complicate the matter, the virus uses a random code block insertion engine. A randomly activated routine builds a do-nothing code block at the entry point of the virus body before any active virus instructions. When executed, the code block can generate millions of iterations to challenge a code emulator's speed. Simple permutating viruses and complex metamorphic viruses can be very different in their complexity of implementation. In any case, both permutating viruses and metamorphic viruses are different from traditional polymorphic techniques. In the case of polymorphic viruses, there is a particular moment when we can take a snapshot of the completely decrypted virus body, as illustrated Figure 7.7. Typically, antivirus software uses a generic decryption engine (based on code emulation) to abstract this process. It is not a requirement to have a complete snapshot to provide identification in a virus scanner, but it is essential to find a particular moment during the execution of virus code when a complete snapshot can be madeto classify a virus as a traditional polymorphic virus. It is efficient to have a partial result, as long as there is a long-enough decrypted area of each possible generation of the virus. Figure 7.7. Snapshot of a decrypted "polymorphic virus." On the contrary, a complex metamorphic virus does not provide this particular moment during its execution cycle. This is true even if the virus uses metamorphic techniques combined with traditional polymorphic techniques. 7.6.4. Mutating Other Applications: The Ultimate Virus Generator? Not only does W95/Bistro itself mutate in new generations, it also mutates the code of its host by a randomly executed code-morphing routine. In this way, the virus might generate new worms and viruses. Moreover, the virus cannot be perfectly repaired because the entry-point code area of the application could be different. The code sequence at the entry point of the host application is mutated for a range 480 bytes long. The next listing shows an original and a permutated code sequence of a typical entry point code: Original entry-point code: 55 push ebp 8BEC mov ebp, esp 8B7608 mov esi, dword ptr [ebp + 08] 85F6 test esi, esi 743B je 401045 8B7E0C mov edi, dword ptr [ebp + 0c] 09FF or edi, edi 7434 je 401045 31D2 xor edx, edx  Permutated entry-point code: 55 push ebp 54 push esp 5D pop ebp 8B7608 mov esi, dword ptr [ebp + 08] 09F6 or esi, esi 743B je 401045 8B7E0C mov edi, dword ptr [ebp + 0c] 85FF test edi, edi 7434 je 401045 28D2 sub edx, edx  Thus an instruction such as test esi, esi can be replaced by or esi, esi, its equivalent format. A push ebp; mov ebp, esp sequence (very common in high-level language applications) can be permutated to push ebp; push esp, pop ebp. It would certainly be more complicated to replace the code with different opcode sizes, but it would be possible to shorten longer forms of some of the complex instructions and include do-nothing code as a filler. This is a problem for all scanners. If a virus or a 32-bit worm were to implement a similar morphing technique, the problem could be major. New mutations of old viruses and worms would be morphed endlessly! Thus, a virtually endless number of not-yet-detectable viruses and worms would appear without any human intervention, leading to the ultimate virus generator. Note An even more advanced technique was developed in the W95/Zmist virus22, which is described in the following section. At the end of 1999, the W32/Smorph Trojan was developed. It implements a semimetamorphic technique to install a backdoor in the system. The standalone executable is completely regenerated during the installation of the Trojan. The PE header is re-created and will include new section names and section sizes. The actual code at the entry point is metamorphically generated. The code allocates memory and then decrypts its own resources, which contain a set of other executables. The Trojan uses API calls to its own import address table, which is filled with many nonessential API imports, as well as some essential ones. Thus everything in the standalone Trojan code will be different in new generations. 7.6.5. Advanced Metamorphic Viruses: Zmist During Virus Bulletin 2000, Dave Chess and Steve White of IBM demonstrated their research results on undetectable viruses. Shortly after, the Russian virus writer, Zombie, released his Total Zombification magazine, with a set of articles and viruses of his own. One of the articles in the magazine was titled "Undetectable Virus Technology." Zombie has already demonstrated his set of polymorphic and metamorphic virus-writing skills. His viruses have been distributed for years in source format, and other virus writers have modified them to create new variants. Certainly this is the case with Zombie's "masterpiece" creation, W95/Zmist. Many of us have not seen a virus approach this complexity for some time. We could easily call Zmist one of the most complex binary viruses ever written. W95/SK, One_Half, ACG, and a few other virus names popped into our minds for comparison. Zmist is a little bit of everything: It is an entry-point obscuring (EPO) virus that is metamorphic. Moreover, the virus randomly uses an additional polymorphic decryptor. The virus supports a unique new technique: code integration. The Mistfall engine contained in the virus is capable of decompiling PE files to their smallest elements, requiring 32MB of memory. Zmist will insert itself into the code; it moves code blocks out of the way, inserts itself, regenerates code and data references (including relocation information), and rebuilds the executable. This is something that has never been seen in any previous virus. Zmist occasionally inserts jump instructions after every single instruction of the code section, each of which points to the next instruction. Amazingly, these horribly modified applications will still run as before, just as the infected executables do, from generation to generation. In fact, we have not seen a single crash during test replications. Nobody expected this to worknot even the virus's author Zombie. Although it is not foolproof, it seems to be good enough for a virus. It takes some time for a human to find the virus in infected files. Because of this extreme camouflage, Zmist is easily the perfect antiheuristic virus. They say a good picture is worth a thousand words. The T-1000 model from the film Terminator 2 is the easiest analogy to use. Zmist integrates itself into the code section of the infected application as the T-1000 model could hide itself on the floor. 7.6.5.1 Initialization Zmist does not alter the entry point of the host. Instead, it merges with the existing code, becoming part of the instruction flow. However, the code's random location means that sometimes the virus will never receive control. If the virus does run, it will immediately launch the host as a separate process and hide the original process (if the RegisterServiceProcess() function is supported on the current platform) until the infection routine completes. Meanwhile, the virus will begin searching for files to infect. 7.6.5.2 Direct Action Infection After launching the host process, the virus will check whether there are at least 16MB of physical memory installed. The virus also checks that it is not running in console mode. If these checks pass, it will allocate several memory blocks (including a 32MB area for the Mistfall workspace), permutate the virus body, and begin a recursive search for PE files. This search will take place in the Windows directory and all subdirectories, the directories referred to by the PATH environment variable, and then all fixed or remote drives from A: to Z:. This is a rather brute-force approach to spreading. 7.6.5.3 Permutation The permutation is fairly slow because it is done only once per infection of a machine. It consists of instruction replacement, such as the reversing of branch conditions, register moves replaced by push/pop sequences, alternative opcode encoding, XOR/SUB and OR/TEST interchanging, and garbage instruction generation. It is the same engine, RPME, that is used in several viruses, including W95/Zperm, which also was written by Zombie. 7.6.5.4 Infection of Portable Executable Files A file is considered infectable if it meets the following conditions: • It is smaller than 448KB. • It begins with MZ. (Windows does not support ZM-format Windows applications.) • It is not infected already. (The infection marker is Z at offset 0x1C in the MZ header, a field generally not used by Windows applications.) • It is a PE file. The virus will read the entire file into memory and then choose one of three possible infection types. With a one-in-ten chance, only jump instructions will be inserted between every existing instruction (if the instruction was not a jump already), and the file will not be infected. There is also a one in ten chance that the file will be infected by an unencrypted copy of the virus. Otherwise, the file will be infected by a polymorphically encrypted copy of the virus. The infection process is protected by structured exception handling, which prevents crashes if errors occur. After the rebuilding of the executable, the original file is deleted, and the infected file is created in its place. However, if an error occurs during the file creation, the original file is lost, and nothing will replace it. The polymorphic decryptor consists of "islands" of code that are integrated into random locations throughout the host code section and linked by jumps. The decryptor integration is performed in the same way as for the virus body integrationexisting instructions are moved to either side, and a block of code is placed between them. The polymorphic decryptor uses absolute references to the data section, but the Mistfall engine will update the relocation information for these references, too. An antiheuristic trick is used for decrypting the virus code: Instead of making the section writeable to alter its code directly, the host is required to have, as one of the first three sections, a section containing writeable, initialized data. The virtual size of this section is increased by 32KB, large enough for the decrypted body and all variables used during decryption. This allows the virus to decrypt code directly into the data section and transfer control there. If such a section cannot be found, the virus will infect the file without using encryption. The decryptor receives control in one of four ways: • Via an absolute indirect call (0xFF 0x15) • Via a relative call (0xE8) • Via a relative jump (0xE9) • As part of the instruction flow itself If one of the first three methods is used, the transfer of control will appear soon after the entry point. In the case of the last method, though, an island of the decryptor is simply inserted into the middle of a subroutine somewhere in the code (including before the entry point). All used registers are preserved before decryption and restored afterward, so the original code will behave as before. Zombie calls this last method UEP, perhaps an acronym for "unknown entry point" because there is no direct pointer anywhere in the file to the decryptor. When encryption is used, the code is encrypted with ADD, SUB, or XOR with a random key, which is altered on each iteration by ADD/SUB/XOR with a second random key. Between the decryption instructions are various garbage instructions. They use a random number of registers and a random choice of loop instruction, all produced by the executable trash generator (ETG) engine, which was also written by Zombie. Randomness features heavily in this virus. 7.6.5.5 Code Integration The integration algorithm requires that the host have fixups to distinguish between offsets and constants. After infection, however, the fixup data is not required by the virus. Therefore, though it is tempting to look for a gap of about 20KB in the fixup area (which would suggest that the virus body is located there), it would be dangerous to rely on this during scanning. If another application (such as one of an increasing number of viruses) were to remove the fixup data, the infection would be hidden. The algorithm also requires that the name of each section in the host be one of the following: CODE, DATA, AUTO, BSS, TLS, .bss, .tls, .CRT, .INIT, .text, .data, .rsrc, .reloc, .idata, .rdata, .edata, .debug, or DGROUP. These section names are produced by the most common compilers and assemblers in use: those of Microsoft, Borland, and Watcom. The names are not visible in the virus code because the strings are encrypted. A block of memory is allocated that is equivalent to the size of the host memory image, and each section is loaded into this array at the section's relative virtual address. The location of every interesting virtual address is noted (import and export functions, resources, fixup destinations, and the entry point), and then the instruction parsing begins. This is used to rebuild the executable. When an instruction is inserted into the code, all following code and data references must be updated. Some of these references might be branch destinations, and in some cases the size of these branches will increase as a result of the modification. When this occurs, more code and data references must be updated, some of which might be branch destinations, and the cycle repeats. Fortunatelyat least from Zombie's point of view this regression is not infinite; although a significant number of changes might be required, the number is limited. The instruction parsing consists of identifying the type and length of each instruction. Flags are used to describe the types, such as instruction is an absolute offset requiring a fixup entry, instruction is a code reference, and so on. In some cases, an instruction cannot be resolved in an unambiguous manner to either code or data. In that case, Zmist will not infect the file. After the parsing stage is completed, the mutation engine is called, which inserts the jump instructions after every instruction or generates a decryptor and inserts the islands into the file. Then the file is rebuilt, the relocation information is updated, the offsets are recalculated, and the file checksum is restored. If overlay data is appended to the original file, then it is copied to the new file too. 7.6.6. {W32, Linux}/Simile: A Metamorphic Engine Across Systems W32/Simile is the latest "product" of the developments in metamorphic virus code. The virus was released in the 29A #6 issue in early March 2002. It was written by the virus writer who calls himself The Mental Driller. Some of his previous viruses, such as W95/Drill (which used the Tuareg polymorphic engine), were already very challenging to detect. W32/Simile moves yet another step in complexity. The source code of the virus is approximately 14,000 lines of Assembly code. About 90% of the virus code is spent on the metamorphic engine itself, which is extremely powerful. The virus's author has called it MetaPHOR, which stands for "metamorphic permutating high-obfuscating reassembler." The first-generation virus code is about 32KB, and there are three known variants of the virus in circulation. Samples of the variant initially released in the 29A issue have been received by certain AV companies from major corporations in Spain, suggesting a minor outbreak. W32/Simile is very obfuscated and challenging to understand. The virus attacks disassembling, debugging, and emulation techniques, as well as standard evaluating-based virus-analysis techniques. As with many other complex viruses, Simile uses EPO techniques. 7.6.6.1 Replication Routine Simile contains a fairly basic direct-action replication mechanism that attacks PE files on the local machine and the network. The emphasis is clearly on the metamorphic engine, which is unusually complex. 7.6.6.2 EPO Mechanism The virus searches for and replaces all possible patterns of certain call instructions (those that reference ExitProcess() API calls) to point to the beginning of the virus code. Thus the main entry point is not altered. The metamorphic virus body is sometimes placed, together with a polymorphic decryptor, into the same location of the file. In other cases, the polymorphic decryptor is placed at the end of the code section, and the virus body is not placed there but in another section. This is to further obfuscate the location of the virus body. 7.6.6.3 Polymorphic Decryptor During the execution of an infected program, when the instruction flow reaches one of the hooks that the virus places in the code section, control is transferred to a polymorphic decryptor responsible for decoding the virus body (or simply copying it directly, given that the virus body is intentionally not always encrypted). This decryptor, whose location in the file is variable, first allocates a large chunk of memory (about 3.5MB) and then proceeds to decipher the encrypted body into it. It does this in a most unusual manner: Rather than going linearly through the encrypted data, it processes it in a seemingly random order, thus avoiding the triggering of some decryption-loop recognition heuristics. This "pseudo-random index decryption" (as the virus writer calls it) relies on the use of a family of functions that have interesting arithmetic properties, modulo 2^n. Although the virus writer discovered this by trial and error, it is in fact possible to give a mathematical proof that his algorithm will work in all cases (provided the implementation is correct, of course). This proof was made by Frederic Perriot at Symantec. The size and appearance of the decryptor varies greatly from one virus sample to the next. To achieve this high variability, the virus writer simply generates a code template and then puts his metamorphic engine to work to transform the template into a working decryptor. Additionally, in some cases the decryptor might start with a header whose intent is not immediately obvious upon reading it. Further study reveals that its purpose is to generate antiemulation code on the fly. The virus constructs a small oligomorphic code snippet containing the instruction RDTSC (ReaD Time Stamp Counter), which retrieves the current value of an internal processor ticks counter. Then, based on one random bit of this value, the decryptor either decodes and executes the virus body or bypasses the decryption logic altogether and simply exits. Besides confusing emulators that would not support the somewhat peculiar RDTSC instruction (one of Mental Driller's favorites, already present in W95/Drill), this is also a very strong attack against all algorithms relying on emulation either to decrypt the virus body or to determine viral behavior heuristicallybecause it effectively causes some virus samples to cease infecting completely upon a random time condition. When the body of the virus is first executed, it retrieves the addresses of 20 APIs that it requires for replication and for displaying the payload. Then it will check the system date to see whether either of its payloads should activate. Both payloads require that the host import functions from USER32.DLL. If this is the case, the virus checks whether or not it should call the payload routine (which is explained further on). 7.6.6.4 Metamorphism After the payload check has been completed, a new virus body is generated. This code generation is done in several steps: 1. Disassemble the viral code into an intermediate form, which is independent of the CPU on which the native code executes. This allows for future extensions, such as producing code for different operating systems or even different CPUs. 2. Shrink the intermediate form by removing redundant and unused instructions. These instructions were added by earlier replications to interfere with disassembly by virus researchers. 3. Permutate the intermediate form, such as reordering subroutines or separating blocks of code and linking them with jump instructions. 4. Expand the code by adding redundant and unused instructions. 5. Reassemble the intermediate form into a final, native form that will be added to infected files. Not only can Simile expand as most first-generation metamorphic viruses did, it can also shrink (and shrink to different forms). Simile.D is capable of translating itself to different metamorphic forms (V1, V2...Vn) and does so on more than one operating system (O1,O2...On). So far, the virus does not use multiple CPU forms, but it could also introduce code translation and metamorphism for different processors (P1..Pn) in the future, as shown in Figure 7.8. Figure 7.8. Stages of Simile from Linux to Windows and from one generation to another. 7.6.6.5 Replication The replication phase begins next. It begins by searching for *.exe in the current directory and then on all fixed and mapped network drives. The infection will scan recursively into directories, but only to a depth of three subdirectories, avoiding completely any directory that begins with the letter W. For each file that is found, there is a 50% chance that it will be skipped. Additionally, files will be skipped if they begin with F-, PA, SC, DR, or NO or contain the letter V anywhere in the name. Due to the nature of the comparison, other character combinations are unintentionally skipped, such as any directory that begins with the number 7, any file that begins with FM, or any file that contains the number 6 anywhere in the name. The file-infection routine contains many checks to filter files that cannot be infected safely. These checks include such conditions as the file must contain a checksum, it must be an executable for the Intel 386+ platform, and there must exist sections whose names are .text or CODE and .data or DATA. The virus also checks that the host imports some kernel functions, such as ExitProcess. For any file that is considered infectable, random factors and the file structure will determine where the virus places its decryptor and virus body. If the file contains no relocations the virus body will be appended to the last section in the file. (Apparently, this can happen randomly with a small chance as well, regardless if there are any relocations in the file.) In this case, the decryptor will be placed either immediately before the virus body or at the end of the code section. Otherwise, if the name of the last section is .reloc, the virus will insert itself at the beginning of the data section, move all of the following data, and update all of the offsets in the file. 7.6.6.6 Payload The first payload activates only during March, June, September, or December. Variants A and B of W32/Simile display their messages on the 17th of these months. Variant C will display its message on the 18th of these months. Variant A will display "Metaphor v1 by The Mental Driller/29A," and variant B will display "Metaphor 1b by The Mental Driller/29A." Variant C intends to display "Deutsche Telekom by Energy 2002 **g**"; however, the author of that variant had little understanding of the code, and the message rarely appears correctly. In all cases, the cases of the letters are mixed randomly, as shown in Figure 7.9. Figure 7.9. The "metamorphic" activation routines of the Simile virus. a.) Variant A b.) Variant B c.) The "Unofficial" Variant C d.) The .D variant (which was the "official" C of the original author) The second payload activates on the 14th of May in variants A and B, and on the 14th of July in variant C. Variants A and B will display "Free Palestine!" on computers that use the Hebrew locale. Variant C contains the text "Heavy Good Code!" but due to a bug this is displayed only on systems on which the locale cannot be determined. The first W32/Linux cross-infector, {W32,Linux}/Peelf, uses two separate routines to carry out the infection on PE and ELF files. Simile.D shares a substantial amount of code between the two infection functions, such as the polymorphic and metamorphic engines. The only platform-specific part in the infection routine is the directory traversal code and the API usage. The virus was confirmed to infect successfully under versions 6.2, 7.0, and 7.2 of Red Hat Linux, and it very likely works on most other common Linux distributions. Infected files will grow by about 110KB on average, but the size increase is variable due to the shrinking and expansion capability of the metamorphic engine and to the insertion method. When the .D variant activates on Linux, it simply prints a console message, as shown in Figure 7.10. Figure 7.10. The payload of the Simile.D virus on Linux. 7.6.7. The Dark FutureMSIL Metamorphic Viruses As discussed in this chapter, some new MSIL viruses, such as MSIL/Gastropod, already support semi-metamorphic (permutating) code generation under the .NET Framework. Such viruses have a great advantage because they do not need to carry their own source code. Instead, viruses can simply decompile themselves to generate new binaries, for example, by using the System.Reflection.Emit namespace. Listing 7.12 illustrates the MSIL metamorphic engine of the MSIL/Gastropod virus in two generations. Listing 7.12. Different Generations of the MSIL/Gastropod Virus a.) .method private static hidebysig specialname void .cctor() { ldstr "[ .NET.Snail - sample CLR virus (c) whale 2004 ]" stsfld class System.String Ylojnc.lgxmAxA::WaclNvK nop ldc.i4.6 ldc.i4.s 0xF call int32 [mscorlib]System.Environment::get_TickCount() nop newobj void nljvKpqb::.ctor(int32 len1, int32 len2, int32 seed) stsfld class nljvKpqb Ylojnc.lgxmAxA::XxnArefPizsour call int32 [mscorlib]System.Environment::get_TickCount() nop newobj void [mscorlib]System.Random::.ctor(int32) stsfld class [mscorlib]System.Random Ylojnc.lgxmAxA::aajqebjtoBxjf ret } b.) .method private static hidebysig specialname void .cctor() { ldstr "[ .NET.Snail - sample CLR virus (c) whale 2004 ]" stsfld class System.String kivAklozuas.ghqrRrlv::ngnMTzqo ldc.i4.6 ldc.i4.s 0xF call int32 [mscorlib]System.Environment::get_TickCount() newobj void xiWtNaocl::.ctor(int32 len1, int32 len2, int32 seed) stsfld class xiWtNaocl kivAklozuas.ghqrRrlv::yXuzlmssjjp call int32 [mscorlib]System.Environment::get_TickCount() newobj void [mscorlib]System.Random::.ctor(int32) stsfld // line continues on next line class [mscorlib]System.Random kivAklozuas.ghqrRrlv::kaokaufdiehjs nop ret }  The extracted snippets represent the class constructor (".cctor") function of MSIL/Gastropod in two different generations. The semi-metamorphic engine of the virus obfuscates class names and method names23. In addition, the permutation engine also inserts and removes junk instructions (such as nop) into its body. Indeed, it is not well known to MSIL developers that they can invoke a compiler and generate code and assembly from a running application, but virus writers already use this feature. MSIL/Gastropod is a code-builder virus: It reconstructs its own host program with itself. This method allows the virus body to be placed to an unpredictable position within the host program. The main entry-point method of the host program is replaced by the entry-point method of the virus. When the infected host is executed, the virus code is invoked, which will run the original main entry-point method eventually. In addition, some viruses do not rely on the use of .NET Framework namespace to implement parasitic MSIL infection. For example, the MSIL/Impanate virus, written by roy g. biv, is fully aware of both 32-bit and 64-bit MSIL files and infects them using the EPO (Entry-Point Obscuring) technique. Thus next generation MSIL metamorphic viruses might not rely on the use of System.Reflection.Emit and similar namespaces anymore. 7.7. Virus Construction Kits Virus writers continuously try to simplify the creation of virus code. Because most viruses were written in Assembly language, writing them remained out of reach for many kids. This inspired virus writers to create generators that can be used by just about anyone who can use a computer. Virus construction kits evolved over the years as viruses targeted new platforms. Construction kits and virus mutators were built to generate a wide variety of malicious code, including DOS COM and EXE viruses; 16-bit Windows executable viruses; BATCH and Visual Basic Script viruses; Word, PowerPoint, and Excel viruses; mIRC worms; and so on. Recently, even PE virus code can be generated by such kits. Virus construction kits are a major concern of antivirus producers. It is impossible to predict whether or not virus writers will use a particular kit. Thus the virus researchers need to spend time with even the most primitive kits to see what kinds of viruses they can produce and to design detection and repair capabilities against all possible cases. To add to this problem, many kits generate source code instead of binary. Novice attackers can change the source code further, beyond the capabilities of the kit itself, so it is not always possible to develop perfectly adequate protection. To make the job of antivirus scanners more difficult, kits deploy armoring techniques such as encryption, antidebugging, antiemulation, and anti-behavior blocking techniques. Furthermore, some kits can mutate the code of the viruses similarly to metamorphic techniques. 7.7.1. VCS (Virus Construction Set) The first virus generator was VCS, written in 1990. The authors of the kit were members of the Verband Deutscher Virenliebhaber (the Association of German Virus Lovers). VCS was a rather simple kit. All viruses that the kit can generate are 1,077 bytes and saved to VIRUS.COM on the disk. The user's only options are to specify the name of a text file with a message and to set a generation number to display that message. The VCS viruses can only infect DOS COM files. The payload of the virus is to kill AUTOEXEC.BAT and CONFIG.SYS files and display the user's message. VCS viruses are simple but encrypted. The only remarkable feature of VCS viruses is that they can check whether the FluShot behavior blocker is loaded in memory and avoid infecting if it is. 7.7.2. GenVir In 1990-1991, a kit called GenVir was released as a shareware in France by J. Struss. The original intention was to "test" antivirus product capabilities with a tool that could generate newly replicating viruses. Very few viruses were ever created with GenVir. In 1993, a new version of GenVir was released to support newer versions of DOS. 7.7.3. VCL (Virus Creation Laboratory) In 1992, the VCL virus construction kit was written in the U.S. (see Figure 7.11). The author of this kit was a member of the NuKE virus-writer team who called himself Nowhere Man. Figure 7.11. The GUI of the Virus Creation Laboratory. VCL looked rather advanced at the time because it supported a Borland C++ 3.0based IDE with mouse support. Instead of creating binary files, VCL creates Assembly source code of viruses. These sources need to be complied and linked afterwards by the attacker to make them live viruses. VCL supports a large selection of payloads, as well as encryption and various infection strategies. Not surprisingly, not all viruses that VCL generates are functional. Nevertheless, the viruses look rather different because of the generator options, and thus their detection is far less straightforward than VCS. Several VCL viruses managed to become widespread. For the first time, VCL indicated that even beginner attackers could manage to cause problems in the field. 7.7.4. PS-MPC (Phalcon-Skism Mass-Produced Code Generator) Because virus-writing groups are in competition, we did not have to wait long before another group announced its kit. PS-MPC was created in 1992 in the United States, by the virus writer, Dark Angel. PS-MPC does not have the fancy user interface of VCL, which actually makes it a much more dangerous kit. Because PS-MPC is script-driven, it can be much simpler to generate hundreds of copycat viruses. PS-MPC is also a source generator, just like VCL. However, the PS-MPC viruses are much more functional than VCL's. Not surprisingly, virus writers utilized this particular kit to generate more than 15,000 PS-MPC variants and upload them to several antivirus companies' FTP sites. Although the initial versions of PS-MPC could only create direct-action viruses, other versions were released that could create memory resident viruses as well. Furthermore, later versions support infections of EXE files also. PS-MPC is one of the reasons I decided to introduce kits in this chapter. The creator of PS-MPC realized that to be successful, generators must create different-looking viruses. To achieve this, PS-MPC uses a generator that effectively works as a code-morphing engine. As a result, the viruses that PS-MPC generates are not polymorphic, but their decryption routines and structures change in variants. PS-MPC was quickly followed by G2, a second-generation kit. G2 adds antidebug/antiemulation features to PS-MPC, with even more decryptor morphing. 7.7.5. NGVCK (Next Generation Virus Creation Kit) NGVCK was introduced in 2001 by the virus writer, SnakeByte. NGVCK (see Figure 7.12) is a Win32 application written in Visual Basic. This kit is very similar to VCL in many ways, but it can generate 32-bit viruses that infect PE files. There are well over 30 versions of NGVCK alone. Figure 7.12. The main menu of NGVCK. NGVCK has a rather advanced Assembly source-morphing engine. The viruses that it creates have random function ordering, junk code insertion, and user-defined encryption methods. It also attacks the SoftICE debugger on Windows 95 systems. Later versions of the kit also have support to infect files using different techniques. The viruses that NGVCK creates are automatically morphed, so every single time somebody uses the kit, a new minor variant of the virus is generated. The code generation uses principles described in Chapter 6, "Basic Self-Protection Strategies." Unfortunately, it is much simpler to develop source morphing than real code-morphing engines, so such techniques are likely to be widely adopted by the attackers in the future. 7.7.6. Other Kits and Mutators Several other copycat generators have been used by amateur virus writers. Amateurs have many choiceswell over 150 kits and code mutators are available, and many of them have been used to produce working viruses. In 1996 such tools became extremely popular. In particular, we have received new viruses from English schools that were generated by the IVP (Instant Virus Production Kit). IVP was written by Admiral Bailey of the YAM (Youngsters Against McAfee) group in Turbo Pascal. It supports EXE and COM infection, as well as encryption and code mutationand kids loved it because of its Trojan payloads. One of the most infamous examples of a virus generatorbased attack was known to the general public as the "Anna Kornikova" virus outbreak. This worm was created by the VBSWG kit, and the kit's user, a 20-year-old Dutch man, admittedly did not know how to write a program. Nevertheless, the mass-mailing ability of the VBS worm, combined with social engineering, worked effectively. Many users were eager to see a recent picture of Anna Kornikovabut executed the script instead. Table 7.1 lists some other common generators. Table 7.1. Examples of Virus Generator Kits Name of Generator KitDescription NRLG (NuKE's Randomic Life Generator)Released in 1994 by the virus writer Azrael. Very similar to VCL. OMVCK (Odysseus Macro Virus Construction Kit)This kit was released in early 1998. It can generate Word Basic macro-virus source code. SSIWG (Senna Spy Internet Worm Generator)Released in 2000 in Brazil. This generator supports the creation of VBS worms. NEG (NoMercy Excel Generator)This was the first Excel macro virus generator kit (1998). It creates .bas files. VBSWG (VBS Worm Generator)Released in 2000 by [K]Alamar. Generates various script worms. AMG (Access Macro Generator)Created in 1998 by the virus writer Ultras to generate Access97 macro viruses. DREG (Digital Hackers' Alliance Randomized Encryption Generator)Released in 1997 by Gothmog. Supports advanced code morphing and antiheuristics. Evidently, in the future we can expect the tendency of virus construction kits to continue toward networked viruses, worms in particular. There are already a few kits that can be driven via a CGI script over a Web interface. This allows the attacker to avoid releasing the complete code of the virus generator, allowing fewer opportunities for antivirus vendors to test the capabilities of such tools. 7.7.7. How to Test a Virus Construction Tool? The use of virus construction kits is not purely a technical question; it is also an ethical question. As Alan Solomon has argued, it is ethical for the antivirus researcher to use kits as long as the samples are stored on a dirty PC only and destroyed as soon as the researcher has developed a detection solution for them. Thus the generated samples shall not be stored anywhere, not even for further testing. This method is the only approach widely accepted as ethical among computer virus researchers. References 1. Fridrik Skulason, "Latest Trends in Polymorphism The Evolution of Polymorphic Computer Viruses," Virus Bulletin Conference, 1995, pp.I-VII. 2. Peter Szor and Peter Ferrie, "Hunting for Metamorphic," Virus Bulletin Conference, September 2001, pp. 123-144. 3. Tim Waits, "Virus Construction Kits," Virus Bulletin Conference, 1993, pp. 111-118. 4. Fridrik Skulason, "Virus Encryption Techniques," Virus Bulletin, November 1990, pp. 13-16. 5. Peter Szor, "F-HARE," Documentation by Sarah Gordon, 1996. 6. Eugene Kaspersky, "Picturing Harrier," Virus Bulletin, September 1999, pp. 8-9. 7. Peter Szor, Peter Ferrie, and Frederic Perriot, "Striking Similarities," Virus Bulletin, May 2002, pp. 4-6. 8. X. Lai, J. L. Massey, "A Proposal for New Block Encryption Standard," Advances in Cryptology Eurocrypt'90, 1991. 9. Peter Szor, "Bad IDEA," Virus Bulletin, April 1998, pp. 18-19. 10. Peter Szor, "Tricky Relocations," Virus Bulletin, April 2001, page 8. 11. Dmitry Gryaznov, "Analyzing the Cheeba Virus," EICAR Conference, 1992, pp. 124-136. 12. Dr. Vesselin Bontchev, "Cryptographic and Cryptanalytic Methods Used in Computer Viruses and Anti-Virus Software," RSA Conference, 2004. 13. James Riordan and Bruce Schneider, "Environmental Key Generation Towards Clueless Agents," Mobile Agents and Security, Springer-Verlag, 1998, pp. 15-24. 14. Makoto Matsumoto and Takuji Nishimura, "Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudorandom Number Generator," ACM Transactions on Modeling and Computer Simulations: Special Issue on Uniform Random Number Generator, 1998, http://www.math.keio.ac.jp/~nisimura/random/doc/mt.pdf. 15. Peter Szor, "The Road to MtE: Polymorphic Viruses," Chip, June 1993, pp. 57-59. 16. Fridrik Skulason, "1260The Variable Virus," Virus Bulletin, March 1990, page 12. 17. Vesselin Bontchev, "MtE Detection Test," Virus News International, January 1993, pp. 26-34. 18. Peter Szor, "The Marburg Situation," Virus Bulletin, November 1998, pp. 8-10. 19. Dr. Igor Muttik, "Silicon Implants," Virus Bulletin, May 1997, pp. 8-10. 20. Vesselin Bontchev and Katrin Tocheva, "Macro and Script Virus Polymorphism," Virus Bulletin Conference, 2002, pp. 406-438. 21. "Shape Shifters," Scientific American, May 2001, pp. 20-21. 22. Peter Szor and Peter Ferrie, "Zmist Opportunities," Virus Bulletin, March 2001, pp. 6-7. 23. Peter Ferrie, personal communication, 2004. Chapter 8. Classification According to Payload "I think computer viruses should count as life. I think it says something about human nature that the only form of life we have created so far is purely destructive. We've created life in our own image." Stephen Hawking This chapter describes the most common computer-virus activation methods. Computer viruses can use an endless number of events to trigger their activation routines. Common examples of such events include, but are not limited to, the following basic methods: • System date or time • A particular file's file date stamp • A particular filename • The default language setting of the system • The name of a visited Web site • The system's IP address • The operating system used • A particular vulnerability that only exists in a certain version of an operating system, such as Russian Windows versions. Even targeted virus attacks are now feasible, although general control over virus code replication remains difficultbut not impossible. 8.1. No-Payload In the layman's world, often everything unusual that happens to a computer is believed to be caused by a "mysterious virus." In fact, however, most computer-related problems are caused by anything but a computer virus. As a result, IT departments tend to be less and less paranoid, and when real infections explode, they are initially skeptical and waste time. This is the proverbial "crying wolf" situation. Another problem is that people believe that for something to be classified as a "computer virus," it needs to destroy user data, such as reformatting the hard disk. People often do not understand why someone would write a program that "only replicates." In fact, the majority of computer viruses do nothing but replicate. Many proof-of-concept viruses belong to this class, such as the WM/Concept virus. Such viruses might carry a message that is never displayed and is usually left for people who are expected to discover the virus, such as virus researchers. The most boring viruses do not contain any text other than replication code. Virus replication, however, has many side effects. This includes the possibility of accidental data loss when the machine crashes due to a bug in the virus code or accidental overwriting of a part of the disk with relevant data. Virus researchers call this kind of virus a no payload virus. However, there is no such thing as a harmless virus. By itself, the replication of the virus can be extremely annoying to the user. I have never met more than a few users who have said to me, "Oh, no problem about these three viruses. They just infect files. I can live with them on my system." Such thinking seems to be very unusual. Most people feel very stressed by computer virus infections for fear of data loss, among other things. Removal of the virus code can be very costly. For example, when a large software or hardware manufacturing company gets hit by a computer virus, the production of new systems must be stopped, causing millions of dollars of damage during every nonproductive day. 8.2. Accidentally Destructive Payload Some viruses, such as Stoned, can cause data loss as a result of replication. When the virus saves the original boot sector, it might overwrite important data. For instance, if there are many directory entries on the diskette, Stoned will overwrite some of these entries because it stores the original boot sector at the end of the root directory. As a result, garbage is displayed on the screen instead of filenames when the diskette is looked at with the DIR command. The file can be restored from the diskettes with a disk editor at that point, but most users will experience this as permanent data loss. 8.3. Nondestructive Payload Almost half of all computer viruses belong to this class. Many computer viruses simply display a message on the screen when they activate. Several such examples are given in previous chapters of this book. Virus writers and malicious code authors are often politically motivated. The WANK1 worm was a typical example of this. This worm was released on the SPAN network on October 16, 1989. The worm replaced the system banner, displaying the message shown in Figure 8.1 when a user logged in on a DEC system. Figure 8.1. Message released from worm. Other computer viruses, such as W95/Marburg, have graphical payloads. When Marburg activates, the virus loads the standard IDI_HAND (0x7F01) icon resource, which is used in case of serious error messages, and puts it on the desktop. Finally, it draws up to 256 icons at random positions on the desktop. (See Figure 8.2.) Figure 8.2. The activation routine of the W95/Marburg virus. Windows 95 will slowly redraw the desktop area when new windows are moved, causing Marburg's icons to disappear; however, the virus will draw new icons all over again. Marburg is less annoying than old DOS viruses such as Cascade, which caused the characters to "fall down" to the bottom of the screen in a cascading effect, with some little noise in the background using the PC's speaker. Other computer viruses have built-in animations that are displayed when triggered. The Hungarian DOS virus, GF6mb (HH&H), displays an impressive 3D bouncing ball. Probably one of the most infamous people in this category is the French virus writer, Spanska, author of the IDEA virus, which displays several animations, including the one shown in Figure 8.3. All viruses written by Spanska belong to the nondestructive payload category. Figure 8.3. The activation routine of Spanska's IDEA virus. Spanska's most infamous animation is displayed by W32/SKA (also known as the Happy99 worm and discussed in Chapters 3, "Malicious Code Environments" and 9, "Strategies of Computer Worms"). Some viruses are even interactive and play a game with the user. The Playgame virus is an example2. Of course, viruses are not restricted to displaying animations and messages on the screen. They can use the speaker to play music or even to attempt to speak. On modern computers, playing MP3 or WAV files is no problem, and viruses take advantage of this. Some annoying viruses (and even backdoor Trojan programs) open and close the CD-ROM's door repeatedly, turning a computer lab into a "ghost house." Finally, some computer worms even write poetry, such as the W95/Haiku3 virus, written by the Spanish virus writer, Sandman. (See Figure 8.4.) Figure 8.4. The activation routine of the W95/Haiku virus. W95/Haiku also connects to 206.132.185.167 (www.xoom.com at the time of the virus attack) and uses the GET command to download a Windows WAV file (/haiku_wav/Haiku.wav). It saves this Windows sound file as c:\haiku.wav and plays it. Haiku demonstrates that the actual replicating code does not necessarily need to carry the payload within itself, making the virus much shorter. 8.4. Somewhat Destructive Payload Other viruses are somewhat destructive. For instance, the W95/HPS virus checks the date during initialization time only, and it will activate only on a Saturday. If a noncompressed Windows bitmap file has been opened, the virus flips the picture horizontally, as shown in Figure 8.5. Figure 8.5. The activation routine of the W95/HPS virus. HPS marks these flipped images by patching the ID - DEADBABEh to the end of the bitmap header area to avoid flipping the same image again. Thus, the virus never restores these images. This is somewhat more destructive than the activation routine of the DOS virus, Flip, which flips the characters onscreen but does so only temporarily. Because noncompressed bitmap files are used frequently by Windows, HPS can cause all kinds of weird effectsyou need to look at those flipped images in a mirror to make sense of them. There are some viruses that attempt to target a single executable (most likely an antivirus program). For example, the AntiEXE virus carries a detection string in itself to detect and destroy any executable that contains the string. Other applications are never affected. AntiEXE is most likely a retro virus of an earlier kind. Retro viruses all belong to the mildly destructive class. They attack the antivirus and other security software, such as personal firewall programs, by killing them in memory and deleting them from the disk. In some cases, retro viruses simply send a "Windows shutdown" message to a selected program, forcing the application to think it needs to exit before Windows shuts down. Of course, Windows will not shut down, but the protection unloads, allowing the execution of any other known attacks that would have been prevented by the installed protection. Another example of a mildly damaging virus is WM/Wazzu.A. This macro virus was extremely common in 1996 due to the fact that even Microsoft's Web site and a couple of CDs issued by Microsoft were infected with it. Wazzu randomly scrambles three words in documents and inserts the word wazzu into sentences. Morton Swimmer of IBM Research published an entertaining paper on playing the "Where is Wazzu?" game on the Internet using search engines. Because this virus was extremely widespread, many companies published documents on their Web sites that had been mildly damaged by Wazzu. 8.5. Highly Destructive Payload The most dangerous type of computer viruses will intentionally harm your data or even the hardware in your system. Examples of this type are discussed in this section. 8.5.1. Viruses That Overwrite Data Many extremely harmful computer viruses simply format the hard drive or overwrite the disk. The Michelangelo virus was among the most infamous of this kind, and the media overreacted to this news. Michelangelo did not attempt to overwrite the entire disk, only parts of the partition (the first 256 cylinders) that system had been booted from. For this reason, it was often feasible to recover data from attacked systems easily. Over the years, most computer virus researchers have become data recovery experts because of the various kinds of corruption such viruses can cause to a system. Some viruses simply killed the Master Boot Record (MBR) of the system to make them unbootable. This effect was very simple to fix, and Norton Disk Doctor did a particularly good job on its own in many cases. Many data recovery companies charged data restoration by the amount of KBs restored from the hard disk. While it was certainly worthwhile to pay this fee in the case of a real hardware error, it was not cheap to pay for a virus-damaged disk with a corrupted MBR (a single sector). The Hungarian Filler's damage routine is worth mentioning here. Not only did the virus delete the content of the FAT sectors on DOS, it also filled those sectors with 01 (☺) characters, drawing eight similar, large smiley faces in each FAT sector. Thus when you checked to see what happened to the corrupted disk using a disk editor such as Norton DiskEdit, you would find drawings similar to the one shown in Figure 8.6. Figure 8.6. The graffito of the Filler virus in a FAT sector. The activation routine of the virus was encrypted in the virus body and has never been documented in international virus descriptions because of this. Instead, the original Hungarian name of the virus, Töltögetö, was translated into English as Filler. Some other viruses simply delete files. For example, the AntiPascal family targets Pascal programs and deletes them from the disk. Other variants of the AntiPascal virus create temporary files with hidden, system, read-only flags to fill the disk with data and make the system unbootable by manipulating the master boot record. Unfortunately, some of the successful computer worms such as W32/Witty also used this method to quickly corrupt the content of disks on the attacked hosts within minutes. 8.5.2. Data Diddlers The term data diddler was coined by Yisrael Radai. Fred Cohen also used the term to describe viruses that do not destroy data all of a sudden in a very evident form. Instead, data diddlers slowly manipulate the data, such as the content of the hard disk. This kind of corruption is very dangerous because it can easily end up in backups before anyone notices it. Because backups are often reused, all possible data might be corrupt when the virus attack is finally noticed. Among the first viruses that caused major damage was Dark_Avenger.1800.A, commonly referred to by the nickname Eddie. The virus was so dangerous that the first company I visited in Hungary as a young researcher to collect the virus sample did not even want to talk about it anymore. They simply nailed the remaining infected diskette to the wall as a bad memory of a computer virus attack. Apparently, Eddie got its name from a text that is not only stored in the virus but is randomly written all around on the hard disk, except on the FAT, causing the slow death of the system. Thus the virus intentionally avoids destroying the FAT so the system will die more slowly and the virus can happily replicate to new systems during that time. The virus wrote, "Eddie lives...somewhere in time," on randomly selected sectors of the hard disk. This text could be detected later, but too many files (including databases) would contain it by the time the virus was finally discovered. Another good example of a data diddler virus is Ripper (which gets its name from Jack the Ripper). Ripper exchanges two data word values in a randomly selected sector and writes back the result to the disk. This is a serious corruption! Furthermore, we have noticed that in some rare cases, such a data-diddler effect can be responsible for minor mutations of computer viruses on Ripper-infected systems. In most cases, a binary virus will be destroyed with this kind of random data manipulation, but in theory such a corruption can lead to a working variant, a tiny mutation that might not be detected by antivirus products using the same definition as they used to detect the original variant. The Russian virus, WordSwap, uses a similar approach and swaps two textual words when files are written to disk. 8.5.3. Viruses That Encrypt Data: The "Good," the Bad, and the Ugly The Disk Killer virus was among the firsts to utilized encryption to attack your data. Disk Killer is a boot virus and was first reported in the United States in June of 1989. The virus spread wildly in Europe a few months later. In about 48 hours after the infected system is booted, the virus calls its payload, which displays a message and scrambles the content of the hard disk using a simple XOR encryption, starting at the partition table. As a result, the virus makes the system unbootable. This is what an attacked system would see on screen: Disk Killer Version 1.00 by COMPUTER OGRE 04/01/1989 Warning !! Don't turn off the power or remove the diskette while Disk Killer is Processing! PROCESSING After the virus has finished running the encryption it displays this message: Now you can turn off the power. I wish you luck ! The encryption in the virus was weak, and therefore restoration of the disk was possible using special decryption tools4. The routine, however, contained some minor errors in the cipher code, which made recovery impossible in some cases. Another boot virus, KOH, was developed in 1993. It uses the IDEA cipher to encrypt the disk and asks the user to type in a password. Although KOH encrypts the disk similarly to Disk Killer, its goal is not to keep users' data hostage or to damage it, but to protect users' data. This is exactly why this virus generated attention and has been discussed as a so-called "good virus." Mark Ludwig popularized the idea in his book in 19955. Unfortunately, he also made this virus and many others available. Not surprisingly, dozens of variants of the KOH virus exist today6. Other viruses, such as the Slovakian One_Half that appeared in 1994, use a different technique. One_Half slowly encrypts the disk with a simple encryption method. As long as the virus is active in memory, the disk is accessible to the user because the virus decrypts the encrypted sectors when they are accessed. Finally, after the virus has encrypted half of the disk, it displays the following message: Dis is one half. Press any key to continue... One_Half's disk encryption is a kind of forced symbiosis between the system and the virus. The virus author did not want his creation to be easily removed from the disk. If someone attempts an incompetent manual repair, such as replacing the infected MBR with a clean one, the disk will remain encrypted, and the user will most likely lose his or her data. Several antivirus vendors argued that repairing virus code is sufficient, and such data corruptions should not be the focus of antivirus software. Users saw things differently, however, and antivirus software followed the practice of SAC's (Slovakian Antivirus Center's) repair tool, which decrypted the content of the encrypted sections of the disk and removed the virus from the MBR and the files at the same time. Similar attacks were also seen on Win32, although with less success. Inspired by the One_Half virus, in December of 1999, the W32/Crypto virus attempted to encrypt the content of the DLLs on the system using the Microsoft Crypto API and strong encryption. This virus, however, contained too many Windows version dependencies and other minor bugs, which caused unwanted system crashes. Of course, Trojan horse programs such as the AIDS Information Diskette in 1989 already had been used to scramble users' data (as discussed in Chapter 2, "The Fascination of Malicious Code Analysis"), in an attempt to collect a bounty from the users of compromised systems. It was expected that similar attacks would be executed by using an asymmetric cipher. This attack was discussed in a book recently as a "crypto" virus. In fact, the authors of the book claim that they had created such a virus on the Mac in 1996, but "it was a top priority" for them not to release the virus7. Such crypto viruses use the public key of the attacker to encrypt your data without giving you a chance to decrypt it. Indeed, you would need to pay the bounty to restore your data. In contrast, a simple symmetric cipher easily allows the recovery of the data on the attacked systems because the encryption algorithm (and typically the keys themselves) are stored in the virus, and thus, they can be extracted and used directly to decrypt the data without involving additional secret information held by the attacker. 8.5.4. Hardware Destroyers Depending on the chipset of the computer and its actual vendor, its Flash BIOS can be updated using software. Nowadays, most PCs use Flash BIOS to update code quickly that would normally be burnt into chips with no chance of a fix. In the early '90s, virus researchers predicted that Flash BIOS would be attacked by computer viruses. The infamous Taiwanese virus, W95/CIH, successfully killed at least an estimated 10,000 PCs in 1998 by overwriting the boot strap portion of Flash BIOS code. The virus used I/O port commands in kernel mode to access the Flash BIOS. Such port commands could also be executed in user mode, but doing so would have allowed easy protection against the virus's activation routine, and the author of the virus obviously wanted to avoid that. For instance, a kernel-mode driver could have been loaded on the system to hook the I/O ports that the virus used as an "Open Sesame!" sequence to access the Flash BIOS for write. A VxD could easily capture the request of user-mode code in this way, protecting the machine against such an attack. However, CIH executes the I/O port commands in kernel mode, which another VxD cannot capture and block. Obviously, a more challenging attack would have been Flash BIOS infection. In fact, such a proof-of-concept virus exists, written by the virus writer Qark in 1994. Fortunately, however, such a virus is fairly BIOS codespecific and hard to implement in a generic way that would support several systems. Simpler activation routines involve setting a random CMOS (complimentary metal oxide semiconductor) stored boot-up password for the PC. Viruses such as the AntiCMOS family use this trick. The virus takes advantage of the boot protection password, which the user might not be able to change without first uncovering the machine and draining the CMOS battery or, alternatively, clear the content of the CMOS using a jumper setting on newer motherboards. 8.6. DoS (Denial of Service) Attacks In the past, researchers commonly believed that targeted attacks against selected computers or organizations were not possible using computer viruses. But modern operating systems in the networked world give attackers with political motivations the potential to carry out successful attacks against targeted businesses, such as financial institutions. There have been several successful DoS attacks in the past, some caused by computer worms. Most of these attacks were not targeted against a particular organization. However, computer worms flood the network with data to such an extent that the side effects of propagation develop into a DoS attack. The W32/Slammer worm was an example of such an accidental DoS attack. The worm was small, and it could propagate itself aggressively on the network. During the outbreak, Internet devices such as routers were heavily overloaded. The result was very unhealthy Internet communications, with as high as 90% packet loss between particular locations. It was difficult to use e-mail during the attack because the network was so slow all around the world. In addition, the speed with which W32/Slammer put packets on the network resulted in ATM failures, canceled airline flights, and election interference8. On August 14, 2003, several analysts speculated that the W32/Blaster worm was responsible for the major blackout in the United States and Canada. Blaster worm was on the loose for three days by that time. Well, official reports quickly denied these claims. Indeed, it is believed that Blaster worm was not the primary cause of the blackout. However, Blaster was a potential contributor by slowing down communication systems between electricity control operation centers. Thus the network operators did not have data in time to control the electric systems to avoid further power surges. Clearly, reports indicated that the electricity control center had "computer problems," which pretty much sound like a worm infection9. Eventually, the east coast experienced power loss, including the area of New York City. In the end, Blaster worm infections happened so rapidly that vulnerable systems could not be connected to the compromised networks (unless they were protected by a personal firewall) to download the security patches because the worm hit the compromised machine almost immediately. Blaster attempted to attack the Windows Update Web site; however, the attacker failed to pick the right target, so the attack was not successful against the real site. Evidently, a successful attack against the Windows Update site would make it even harder to patch vulnerable systems because the updates would be more difficult to download. Someone could argue, however, that it was hard enough to download the update because vulnerable systems could easily get infected before the patches could be downloaded and installed on them. On July 16, 2001, the Chinese W32/CodeRed worm attempted to execute a targeted DoS attack against www.whitehouse.gov (with the IP address 198.137.240.91) by connecting continuously to the site. In response to the attack, the IP address was quickly changed. However, the worm carried another payload that targeted systems using U.S. English codepage (0x409). In such a case, the worm installed a hook routine on the TcpSockSend() function of the INFOCOMM.DLL module of Microsoft IIS. The worm's hook routine did not let an infected system access any HTML content. Instead, the worm displayed the page shown in Figure 8.7 for all Web access. Figure 8.7. The activation routine of the CodeRed worm. Probably the most infamous worm on Linux operating systems was Linux/Slapper. (Detailed technical information about the previously discussed worms, as well as many others, is available in Chapters 9 and 10, "Exploits, Vulnerabilities, and Buffer Overflow Attacks".) For the sake of completeness, though, it is interesting to mention here that Slapper was designed to build a peer-to-peer network of compromised systems to execute DDoS (distributed denial of service) attacks. This allowed the attacker to connect to one infected node and control all infected "zombie" systems connected to that node from a single location by sending commands to all at the same time. Each copy of the worm carried a command interface that the attacker could use to execute various types of DoS attacks, including several flooding techniques. Although a few unconnected attack networks were found, the largest network consisted of nearly 20,000 zombie systems waiting for the attacker's commands. Many other types of DoS attacks were developed in worms. For example, 911 attacks against the phone system are a common payload of computer viruses. (911 is the emergency services phone number in the U.S.) The Neat worm (as discussed in Chapter 3, "Malicious Code Environments"), on Microsoft WebTV systems, was an example of such an attack. The worm simply reconfigured the WebTV system to call 911 instead of the default ISP phone number. 8.7. Data Stealers: Making Money with Viruses Modern attackers are making money using computer viruses. Although professional attackers could make money by breaking into individual systems to steal credit card numbers and other valuable information, computer worm attacks can reach many more targets in much less time, thereby enhancing the chances that the attacker gets away with valuable information without a trace. 8.7.1. Phishing Attacks There are several ways to use computer worms to steal information. In the simplest cases, the attacker uses a social engineering attack (also called a simple phishing attack) to collect the information simply by asking you to disclose your credit card information and PIN number. Phishing attacks typically use spoofed e-mail and fraudulent Web sites designed to fool recipients to disclose personal information. Phishers are able to convince up to 5% of recipients to respond to them10. The [email protected]11 is an example of such a simple, but rather effective attack. The worms sends itself in e-mail messages. In its attempt to steal information, the worm displays fake dialogs purporting to be from PayPal (see Figure 8.8), which ask you to type in a credit card number and other personal information. The stolen information is stored. Then the information is subsequently encrypted and sent to the attacker. Figure 8.8. The dialog box displayed by the [email protected] worm. 8.7.2. Backdoor Features Computer worms often have built-in backdoors. An infamous example of such a worm is W32/HLLW.Qaz.A. This worm was first discovered in China in July of 2000. QAZ is a companion virus, but it also spreads itself over the network. Furthermore, the worm has a backdoor that will enable a remote user to connect to and control the computer using port 7597. QAZ enumerates through poorly protected NetBIOS shares and attempts to find a computer to infect. After the remote computer is infected, its IP address is e-mailed back to the attacker. The backdoor payload in the virus awaits connection. This enables a hacker to connect and gain access to the infected computer. According to several sources, QAZ was most likely responsible for successful attacks against Microsoft's networks, compromising a nonsecured home system that had remote connections to corporate sites, thereby allowing the attacker access to valuable information. Another famous backdoor incident was built into a variant of CodeRed, called CodeRed_II. This worm copies CMD.EXE from the Windows NT \System folder to the following folders (if they exist): • C:\Inetpub\Scripts\Root.exe • D:\Inetpub\Scripts\Root.exe • C:\Progra~1\Common~1\System\MSADC\Root.exe • D:\Progra~1\Common~1\System\MSADC\Root.exe Although CodeRed_II spreads as an in-memory injector just like the original, this variant of the worm also drops a Trojan called VirtualRoot. When executed, this Trojan modifies the following Registry key: HKLM\System\CurrentControlSet\Services\W3SVC\Parameters\Virtual Roots The Trojan adds a few new keys here and sets the user group on these to the value 217. This allows the intruder to control the Web server by sending an HTTP GET request to run scripts/root.exe on the infected Web server. After a successful attack, you can find new root accesses to C:\ and D:\ drives in the Computer Management feature of Windows, as shown in Figure 8.9. This allows the attacker full remote access to logical drives C: and D: on the infected computer through legitimate requests to the Web server. Figure 8.9. System with opened shares after a CodRed_II attack. Computer viruses used backdoor features targeting Novell NetWare servers as well. For example, the Hypervisor virus12 on DOS included a special payload to create a Supervisor-equivalent user, called Hypervisor, on Novell NetWare servers in 1995. Hypervisor waits patiently until the Supervisor of the network logs on from an infected system. At that moment, the virus will be able to add a new user, creating a Hypervisor user object and adding SUPERVISOR SECURITY_EQUALS attributes to it. The Hypervisor user will not have a password set; thus the attacker can log in to the system with Supervisor rights shortly after the virus has been introduced on the local network. Hypervisor also copies the bindery files of Novell NetWare to the SYS:LOGIN/ folder (NET$BIND.SYS, NET$BVAL.SYS on 2.xx servers and NET$OBJ.SYS, NET\$PROP.SYS on 3.xx servers). In addition, Hypervisor is a stealth virus.

Computer worms such as Nimda use a similar approach on Windows systems. Nimda adds the Guest account to the Administrator group. This gives the Guest account administrative privileges.

Another example is the [email protected] family, which spreads using a variety of techniques including mass-mailing, network share infection, and file infection. In addition, Bugbear variants support a backdoor component and a keylogger function. Using the keylogger, the worm can collect information that the user types on the system, which can include sensitive data. The worm sends the collected information to several e-mail accounts that belong to the attacker. Using the backdoor component, the attacker can connect to the compromised systems remotely. In addition, some variants of Bugbear specifically target financial institutions. The worm carries a long list of more than 1,000 domain names that belong to banks from around the world. When Bugbear determines that the default e-mail address of the local system belongs to a banking company, it will send the data collected by the keylogger, as well as cached dial-up password information to the e-mail accounts of the attacker. Using the information, the attacker hopes to connect to a financial institution's dial-up network and make financial gain.

8.8. Conclusion

It is very likely that in the future, online fraud will be increasingly common based on computer worm attacks. Not only are banks at risk, but online brokerage systems as well. Imagine the possible financial chaos a computer worm could cause by "buying and selling" online traded stocks. Major stocks often trade tens of millions of stocks a day, but still the stock price hardly changes until the end of the day. Normally, there is a balance between buyers and sellersnot so if computer worms are used to make random or targeted sales and buys in the name of thousands of infected users.

Spammers are increasingly interested to utilize computer worms, bots, and backdoors, providing financial motivation for virus writers to create them.

References

1. Thomas A. Longstaff and E. Eugene Schultz , "Beyond Preliminary Analysis of the WANK and OILZ Worms: A Case Study of Malicious Code," Computers & Security, Volume 12, Issue 1, February 1993, pp. 61-77.4.

2. Mikko Hypponen , "Virus Activation Routines," EICAR, 1995, pp. T3 1-11.

3. Peter Szor , "Poetry in Motion" Virus Bulletin, April 2000, pp. 6-8.

4. Fridrik Skulason , "Disk Killer," Virus Bulletin, January 1990, pp. 12-13.

5. Mark Ludwig , "Giant Black Book of Computer Viruses," American Eagle Publications, Inc., 1995.

6. Vesselin Bontchev , "Are 'Good' Computer Viruses Still a Bad Idea?," EICAR, 1994, pp. 25-47.

7. Dr. Adam L. Young and Dr. Moti Yung , "Malicious Cryptography: Exposing Cryptovirology," Wiley Publishing, Indianapolis, 2004, ISBN: 0764549758 (Paperback).

8. Gerald D. Hill III, "The Trend Toward Non-Real-Time Attacks," Computer Fraud & Security, November 2003, pp. 5-11.

9. Bruce Schneier , "Blaster and the August 14th Blackout," http://www.schneier.com/crypto-gram-0312.html.

10. Anti-Phishing Working Group, http://www.antiphishing.org/.

11. Stuart Taylor , "Misguided or Malevolent? New Trends In Virus Writing," Virus Bulletin, February 2004, pp. 11-12.

12. Peter Szor and Ferenc Leitold , "Attacks Against Servers," Forraskod, March/April 1995, pp. 2-3.

Chapter 9. Strategies of Computer Worms

"Worm: n., A self-replicating program able to propagate itself across network, typically having a detrimental effect."

Concise Oxford English Dictionary, Revised Tenth Edition

9.1. Introduction

This chapter discusses the generic (or at least "typical") structure of advanced computer worms and the common strategies that computer worms use to invade new target systems. Computer worms primarily replicate on networks, but they represent a subclass of computer viruses. Interestingly enough, even in security research communities, many people imply that computer worms are dramatically different from computer viruses. In fact, even within CARO (Computer Antivirus Researchers Organization), researchers do not share a common view about what exactly can be classified as a "worm." We wish to share a common view, but well, at least a few of us agree that all computer worms are ultimately viruses1. Let me explain.

The network-oriented infection strategy is indeed a primary difference between viruses and computer worms. Moreover, worms usually do not need to infect files but propagate as standalone programs. Additionally, several worms can take control of remote systems without any help from the users, usually exploiting a vulnerability or set of vulnerabilities. These usual characteristics of computer worms, however, do not always hold. Table 9.1 shows several well-known threats.

Table 9.1. Well-Known Computer Worms and Their Infection Methods

Name / DiscoveredTypeInfectionExecution Method
WM/ShareFun February 1997Microsoft Mail dependent mailerWord 6 and 7 documentsBy user
Win/RedTeam January 1998Injects outgoing mail to Eudora mailboxesInfects Windows NE filesBy user
[email protected] (Happy99 worm) January 199932-bit Windows mailer wormInfects WSOCK32.DLL (by inserting a little hook function)By user
[email protected] March 1999Word 97 mass-mailer wormInfects other Word 97 documentsBy user
[email protected]2 May 2000Visual Basic Script mass-mailer wormOverwrites other VBS files with itselfBy user
[email protected] September 200132-bit Windows mass-mailer wormInfects 32-bit PE filesExploits vulnerabilities to execute itself on target

Table 9.1 suggests that infection of file objects is a fairly common technique among early, successful computer worms. According to one of the worm definitions, a worm must be self-contained and spread whole, not depending on attaching itself to a host file. However, this definition does not mean that worms cannot act as file infector viruses in addition to network-based propagators.

Of course, many other worms, such as Morris3, Slapper4, CodeRed, Ramen, Cheese5, Sadmind6, and Blaster, do not have file infection strategies but simply infect new nodes over the network. Thus defense methods against worms must focus on the protection of the network and the network-connected node.

9.2. The Generic Structure of Computer Worms

Each computer worm has a few essential components, such as the target locator and the infection propagator modules, and a couple of other nonessential modules, such as the remote control, update interface, life-cycle manager, and payload routines.

9.2.1. Target Locator

To spread rapidly on the network, the worm needs to be able to find new targets. Most worms search your system to discover e-mail addresses and simply send copies of themselves to such addresses. This is convenient for attackers because corporations typically need to allow e-mail messages across the corporate firewalls, thereby allowing an easy penetration point for the worm.

Many worms deploy techniques to scan the network for nodes on the IP level and even "fingerprint" the remote system to check whether such a system might be vulnerable.

9.2.2. Infection Propagator

A very important component of the worm is the strategy the worm uses to transfer itself to a new node and get control on the remote system. Most worms assume that you have a certain kind of system, such as a Windows machine, and send you a worm compatible with such systems. For example, the author of the worm can use any script language, document format, and binary or in-memory injected code (or a combination of these) to attack your system. Typically, the attacker tricks the recipient into executing the worm based on social engineering techniques. However, more and more worms deploy several exploit modules to execute the worm automatically on the vulnerable remote system without the user's help. Exploitation of vulnerabilities is the subject of Chapter 10, "Exploits, Vulnerabilities, and Buffer Overflow Attacks."

Note

Some mini-worms such as W32/Witty and W32/Slammer appear to combine the target locator (network scan) and infection propagator in a single function call. However, they still support distinct features: the generation of random IP addresses and the propagation of the worm body to new targets.

9.2.3. Remote Control and Update Interface

Another important component of a worm is remote control using a communication module. Without such a module, the worm's author cannot control the worm network by sending control messages to the worm copies. Such remote control can allow the attacker to use the worm as a DDoS (distributed denial of service) tool7 on the zombie network against several unknown targets.

An update or plug-in interface is an important feature of advanced worms to update the worm's code on an already-compromised system. A common problem for the attacker is that after a system is compromised with a particular exploit, it often cannot be exploited again with the same one. Such a problem helps the attacker to avoid multiple infections of the same node, which could result in a crash. However, the intruder can find many other ways to avoid multiple infections.

The attacker is interested in changing the behavior of the worm and even sending new infection strategies to as many compromised nodes as possible. The quick introduction of new infection vectors is especially dangerous. For example, the intruder can use a single exploit during the first 24 hours of the outbreak and then introduce a set of others via the worm's update interface.

9.2.4. Life-Cycle Manager

Some worm writers prefer to run a version of a computer worm for a preset period of time. For instance, the W32/Welchia.A worm "committed suicide" in early 2004, and then the B variant of Welchia was released in late February of 2004 to run for three more months. On the other hand, many worms have bugs in their life-cycle manager component and continue to run without ever stopping. Furthermore, we often encounter variants of computer worms that were patched by others to give the worm "endless" life.

Consider the statistics collected on an individual Welchia honeypot administered by Frederic Perriot between August 2003 and February 2004, shown in Figure 9.1. The sudden drop of Welchia is related to its life-cycle manager, which triggers the worm's self-killing routine.

Figure 9.1. The suicide of Welchia worm.

The cumulative number of distinct Welchia attacking systems was around 30,000 when the worm started to kill itself when observed on a particular DSL network (see Figure 9.2).

Figure 9.2. The cumulative number of Welchia attackers.

Another optional but common component of a computer worm is the payload (activation routine). In many cases, computer worms do not contain any payload. An increasingly popular payload is a DoS attack against a particular Web site. However, a common side effect of computer worms is accidental DoS attacks as a result of overloaded networks, especially overloaded network routers8. However, other interesting side effects have also been observed, such as accidental attacks on network printers.

Computer worms also can utilize the compromised systems as a "super computer." For example, W32/Opaserv9 attempts to break a DES-like10 secret key11 by sharing the attack among the infected nodes, similarly to the SETI network. (In fact, some computer worms, such as W32/Hyd, download and install SETI to compromised systems. The W32/Bymer worm is an example of a DNETC [Distributed Network Client] installation to compromised systems.) Such attacks were first predicted in 198912.

Another interesting tendency is the planned interaction between two computer worms as a payload. Several antiworms have been released with the intention of killing other computer worms and installing patches against the vulnerabilities they exploited. Examples include Linux/Lion versus Linux/Cheese and W32/CodeRed versus W32/CodeGreen. In this chapter, I will also discuss other kinds of interactions between malicious programs.

Recently it is becoming popular to install an SMTP (Simple Mail Transfer Protocol) spam relay server as the payload of a worm. Spammers compromise systems on a large scale using worms such as W32/Bobax and then using the SMTP relay server created by the worm to spam messages from the "zombie" systems.

9.2.6. Self-Tracking

Many computer virus authors are interested in seeing how many machines the virus can infect. Alternatively, they want to allow others to track the path of the virus infections. Several viruses, such as W97M/Groov.A13, upload the IP information of the infected system to an FTP site.

Computer worms typically send the attacker an e-mail message with information about the infected computer to track their spread. The Morris worm deployed a self-tracking module that attempted to send a UDP datagram to the host at ernie.berkeley.edu after approximately every 15 infections, but this routine was bogus, and it never sent any information14. A few other examples of self-tracking are mentioned later on in this chapter.

9.3. Target Locator

An efficient target locator module is an extremely important component of computer worms. The easiest mechanism for the attacker is to collect e-mail addresses on the system on which the worm was executed and to send attachments to such targets, but there are many more sophisticated techniques to reach new targets quickly, such as random construction of IP addresses in combination with port scanning.

Modern computer worms also attack the network using several protocols. In this section, I will summarize the most important attacks and network scanning techniques.

There are many ways in which a computer worm can collect e-mail addresses for attacks. The attacker can enumerate various address books with standard APIs, including COM interfaces15. An example of this is W32/Serot16.

Files can be enumerated directly to find e-mail addresses within them. Additionally, sophisticated worms might use the NNTP (network news transfer protocol) to read newsgroups or use search engines such as Google to collect e-mail addresses using techniques similar to those that spam attackers use.

All computer environments have some form of address book to store contact information. For example, the Windows Address Book or the Outlook Address Book might contain the e-mail addresses of your friends, colleagues, and clients, or names of e-mail lists in which you participate. If a worm can query the e-mail addresses stored in such locations, it can send itself to all of them and spread with an exponential infection rate. Unfortunately, it is a rather trivial task to query the information in such address books.

The [email protected]17 worm was especially successful with this technique in March 1999. The worm depends on the Microsoft Outlook installation on the system to propagate itself in e-mail by sending an infected Word document as an attachment.

9.3.1.2 File Parsing Attacks on the Disk

Several computer worms such as W32/Magistr18 simply search for the e-mail client's files or for all files with a WAB extension and parse such files directly for e-mail addresses. This technique became popular after Microsoft introduced security features in Outlook against computer worms that spread via e-mail messages.

As you might expect, file parsingbased attacks have their own minor caveats. For example, some worms have file format dependencies. The Windows Address Book is not saved in the same format on all Windows versions. Unicode is not always supported, and the file format is different in this case. This is why such worms cannot spread to other systems in such a situation. Problems like this can be extremely disturbing during natural infection tests in lab environments. It is an example of Murphy's Law when the whole world is infected with a particular wormwhich fails to work in the lab environment.

Nevertheless, the technique seems to be efficient in the real world, and many successful worm attacks are the proof. For example, the [email protected] worm became extremely widespread in early 2004. Mydoom parsed files for e-mails with the following extensions: HTM, SHT, PHP, ASP, DBX, TBB, ADB, PL, WAB, and TXT.

Computer worms use heuristics to figure out whether a particular string is a possible e-mail address. One possible heuristic is to look for mailto: strings in HTML files and assume it is followed by an e-mail address. Occasionally, the size of the domain name is limited. For example, [email protected] might not be accepted by worms such as W32/Klez.H as a valid e-mail address, because "a.com" is too short to be good (although someone might configure a local network using such domain name). In addition, some worms target recipients with a specific language such as Hungarian and, to trick the user to execute the worm, they check the TLD (top-level domain) of e-mail addresses as suggested. For example, the Zafi.A worm sends itself to e-mail addresses that have ".hu" (Hungarian) as their TLD19.

Sircam worm20 searches for e-mail addresses in Internet Explorer's Cache directory, the user's Personal directory, and the directory that contains the Windows Address Books (referred to by HKCU\Software\Microsoft\WAB\WAB4\Wab File Name in the Registry) in files whose names begins with sho, get, or hot, or whose suffix is HTM or WAB.

9.3.1.3 NNTP-Based E-Mail Collectors

Attackers have long introduced their creations in Internet newsgroups. In 1996 the abuse of the News Net became very intense. As a result, researchers of the Dr. Solomon antivirus team decided to create a service called Virus Patrol21 to scan Usenet messages for known and possibly unknown malware that was continuously planted in such messages. Virus Patrol was introduced in December 1996.

NNTP can be used in a number of malicious ways. For example, an attacker might be able to use a news server reader to build a large local database with the e-mail addresses of millions of people. The attacker can use this database to help the initial fast propagation of the worm by running the worm on a system that hosts the database.

This is a common technique of spammers, and it is suspected that worms such as the W32/Sobig family were populated using such techniques. The newsgroup-based e-mail collector is not entirely unknown in Win32 viruses. In fact, the very first known Win32 virus that used e-mail to propagate itself used an NNTP collector. W32/Parvo22 was introduced by the infamous virus writer GriYo of the 29A group in late 1998. Not surprisingly, just like many other GriYo viruses, Parvo also used polymorphism to infect PE files, but it also became the first virus to integrate an SMTP mass-mailing engine. Parvo was years ahead of its time, written in pure Assembly resulting in a 15KB virus body.

W32/Parvo used several newsgroups to collect e-mail addresses, but apparently a minor problem limited its spread. Parvo randomly tried to connect to two possible news servers: talia.ibernet.es or diana.ibernet.es. These servers, however, were not available to everyone at the time. Thus Parvo's newsgroup-based e-mail collector was limited to work "inside the borders" of Spain.

Parvo connects on port 119/TCP (NNTP) to one of the preceding servers and starts to communicate. The attacker prepared three different e-mail messages with content that he expected to be catchy enough for the selected audiences of three different newsgroups.

Parvo's first message targets frequent readers of hacking-related newsgroups, such as alt.bio.hackers, alt.hacker, alt.hackers, alt.hackers.malicious, and so on. The second message is sent to a subset of this newsgroup list. Finally, the third message targeted visitors to erotic newsgroups, such as alt.binaries.erotica, alt.binaries.erotica.pornstar, and so on.

To find e-mail addresses in newsgroups, Parvo uses the group command to join to a group randomly and then uses the head and next commands a random number of times to pick a message randomly. Finally, it extracts the e-mail address from the header of the randomly selected message, sends itself in e-mail to the target, and repeats the process.

9.3.1.4 E-Mail Address Harvesting on the Web

Attackers also can search for e-mail addresses using search engines. This is a relatively simple task that can help the attacker gain quick access to a large number of e-mails. As I was writing this book, the first such worms appeared that utilized popular search engines such as Google, Lycos, Yahoo!, and Altavista to harvest e-mail addresses. For example, the [email protected] worm used this technique successfully, and according to Google, it caused minor DoS attacks against its servers.

9.3.1.5 E-Mail Address Harvesting via ICQ

Some computer worms, such as the polymorphic [email protected]23, harvests e-mail addresses using ICQ (I Seek You) white pages located on ICQ servers. For example, http://www.icq.com/whitepages/ allows you to make searches for contacts according to various characteristics such as name, nickname, gender, age, and country in any combinations and retrieve contact information, such as e-mail addresses, to people who meet your search criteria. Not surprisingly, computer worms can get an advantage of the information provided.

Alternatively, a computer worm can capture e-mail addresses from outgoing messages. Even if a particular e-mail address is not saved anywhere on the system, when the user sends a message to a particular address, the worm can send a message to the same address. The Happy9924 worm was the first to use this method. Happy99 sends two messages that look similar to the example shown in Figure 9.3. Note the X-Spanska: Yes in the header. This is a self-tracking method that was used by the worm's author. SMTP servers simply ignore commands that begin with the "X" prefix.

Figure 9.3. The header section of an e-mail sent by Happy99.

Date: Fri, 26 Feb 1999 09:11:40 +0100 (CET)
From: "XYZ" <[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */>
To: <[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */>
Subject: VIRUS
X-Spanska: Yes

(Message contains UU-encoded Attachment.)


The original message is shown in Figure 9.4.

Figure 9.4. The message of the user is also sent by Happy99.

From: "XYZ" <[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */>
To: <[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */>
Subject: VIRUS
Date: Fri, 26 Feb 1999 09:13:51 +0100
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3


The body of the extra mail contains a UU-encoded executable called happy99.exe. When the user executes the attached program, the worm's code is activated.

Happy99 looks for two API names in the WSOCK32.DLL export section. This DLL is the Windows Socket communication library used by many networked applications, including several popular e-mail clients. The worm patches the export address entries of the connect() and send() APIs to point to new entries at the end of the .text section (the slack space) of WSOCK32.DLL.

When the patched DLL is loaded in memory as a client library to a networked application, the worm intercepts the connect() and send() APIs. Whenever the user makes a connection, Happy99 checks the used ports. If the port turns out to be for mail or news access, a new DLL, SKA.DLL, is loaded into the process address space, which contains the worm's complete code previously saved on the disk.

When the intercepted send() API is called, the worm again checks whether this event is related to newsgroups or mail. If so, it copies some part of the original e-mail header, paying attention to MAIL FROM:, TO:, CC, BCC, and NEWSGROUPS: keywords in the header of the e-mail. Finally, it adds the X-Spanska: YES string to the mail header. Several other worms use an approach similar to Happy99's. Some of these worms inject their complete code into the WSOCK32 library.

9.3.1.7 Combined Methods

Of course, there can be many variations of e-mail address harvesting and worm propagation. For example, the Linux/Slapper worm3 is capable of harvesting e-mail addresses and providing them to the attacker based on his request via a remote-control interface. Then another worm might be created by the attacker to use the database of harvested e-mail addresses to propagate to a large number of machines very rapidlywithout requiring a large set of initial infections to harvest an efficient number of e-mail addresses. Even more likely, the attacker can use the collected e-mail addresses to spam targets.

9.3.2. Network Share Enumeration Attacks

Probably the simplest method to find other nodes on the network quickly is to enumerate the network for remote systems. Windows systems are especially vulnerable to such attacks because of their rich support for finding other machines with simple interfaces. Computer viruses such as W32/Funlove used the enumeration principle to infect files on remote targets. These attacks caused major outbreaks at large corporations around the world.

Several computer worms have minor implementation problems and become overly successful at finding networked resources, including shared network printer resources. This happens because not all worms pay attention to the type of resources they enumerate, which can lead to accidental printing on the network printers. Indeed, bogus worms print random-looking binary garbage on the printer, which is in fact the code of the worm. W32/Bugbear and W32/Wangy are examples of computer worms that accidentally target network printers with such an attack.

The success of this kind of worm usually depends on the trusted relationship between systems. However, there are additional contributors:

• Blank passwords: Many default installations of systems are vulnerable to attacks because they do not have a default password set for administrative-level access on shared resources.
• Weak passwords dictionary attacks: Weak passwords were a target of computer worms as early as 1988, starting with the Morris worm. However, password dictionary attacks on Windows systems did not become popular until 2003, with the sudden outbreak of worms like BAT/Mumu. Surprisingly, Mumu carried a relatively short password list that includes password, passwd, admin, pass, 123, 1234, 12345, 123456, and a blank password. Most likely, its success is related to the blank passwords on administrator accounts.
• Vulnerabilities related to the handling of passwords: The W32/Opaserv worm appeared in September of 2002 and became infamous for its attacks against systems that were otherwise protected with strong passwords, but that shared network resources on vulnerable Windows installations. Specifically, Opaserv exploited the vulnerability described in the MS00-072 security bulletin, which affected Microsoft Windows 95/98 and Me systems. This vulnerability, known as the share-level password vulnerability, allows access to network shares using the first character of the password, no matter how long the password is. The number of systems that share network resources on the Internet without being protected by a personal firewall is overwhelming, which allows Opaserv easy access to writeable shared resources.
• Password-capturing attacks to gain domain administrator-level rights: In Windows networks, domain administrators have the right to read and write any files on any Windows machine on the network, unless specifically forbidden. On NT-based systems, domain administrators can also remotely execute programs on the fly and execute commands that require higher privilege levels than those of a regular user on the network.

These features make remote management possible, but at the same time they open up a whole new set of security problems. Gaining domain administrator rights is not trivial. However, a worm could do this easily if given enough time. A worm could spread through traditional channels, constantly sniffing the local network segment with traditional TCP/IP sniffing techniques. After detecting the domain administrator credentials being transferred in the network segment (for example, because the administrator is logging on from a nearby workstation), it logs the domain administrator's username and password hash.

NT-based networks do not broadcast the password in plain text; they run it through a one-way hash function first. The function cannot be reversed, so the password cannot be gathered directly from the hash. Instead, the worm could execute a brute-force attack to exhaust every possible password combination. It could run every password (A, AA, AAA, AAAA, and so on) through the same one-way function and compare the result. If they match, the password has been found. Alternatively, the worm could use a dictionary attack to find passwords as well.

With a strong password, this process might take days to accomplish, but a typical NT password takes less than a week to crack on a typical Windows workstation from a single Pentium system. Assuming that the worm could communicate with other compromised nodes, it could introduce workload balancing between the compromised nodes to share the work, making the cracking process even faster.

After the worm has cracked the NT domain administrator password, it owns the network and can do anything. Specifically, it can copy itself to any other Windows machine in the network. On NT-based machines, it can even start itself automatically with high access rights. Such a worm could also change the domain administrator password and the local administrator passwords to make itself more difficult to stop.

We first projected the feasibility of such attacks on NT domains with Mikko Hypponen back in 1997. At about the same time, tools such as L0phtCrack appeared to fulfill the sniffing and breaking of password hashes on NT domains. The authors of L0phtCrack demonstrated that long passwords can be often weaker than short ones when challenged with dictionary attacks25.

In fact, the hashing algorithm of passwords on NT domains splits long passwords to seven character chunks, helping L0phtCrack crack the password more quickly. Nevertheless, computer worms with built-in network sniffing to crack passwords have not been discovered so far. Secure your passwords nowbefore it is too late! (Of course, this advice might not be funded very well when you consider a computer worm with a built-in keylogger to capture user accounts and passwords to attack other systems.)

9.3.3. Network Scanning and Target Fingerprinting

Several computer worms construct random IP addresses to attack other nodes on the network. By analyzing the scanning algorithm of the worm, someone might be able to make predictions about the worm's propagation speed on the network.

Evidently, an attacker can scan the entire Internet from a single machine, building IP addresses in a sequential manner (such as 3.1.1.1, 3.1.1.2, 3.1.1.3, and so on) and carefully ignoring invalid IP address ranges. This technique allows the attacker to build a "hit list" (database of IP addresses) to systems that might be vulnerable against a particular attack. To do that, the attacker typically fingerprints the remote systems just enough to suspect that the target may be vulnerable. In many cases, the fingerprinting is strongly related to a successful exploitation.

The hit list method is one of the theoretical backgrounds for so-called Warhol worms26. Warhol worms can infect 90% of all vulnerable systems on the entire Internet in less than 15 minutes. (It is expected that IPv6 will force computer worms to switch from traditional scanning methods to "hit list" techniques in the future.)

9.3.3.1 Scanning Using a Predefined Class Table: The Linux/Slapper Worm

Network worms can also scan for remote systems, generating random IP addresses but using a predefined table of network classes. For example, the Linux/Slapper worm uses the classes as defined in Listing 9.1 to attack possibly vulnerable Apache systems running on Linux:

Listing 9.1. The Class Definitions of the Linux/Slapper Worm

unsigned char classes[] = { 3, 4, 6, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
24, 25, 26, 28, 29, 30, 32, 33, 34, 35, 38, 40, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 61, 62, 63, 64, 65, 66, 67, 68, 80, 81, 128, 129, 130, 131, 132, 133, 134,
135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,150, 151, 152,
153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170,
171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188,
189, 190, 191, 192, 193, 194, 195, 196, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 224, 225, 226, 227, 228,
229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239 };

Note

I picked the name for Linux/Slapper worm when we discovered it in September 2002. I chose the name based on Slapper's similarity to the BSD/Scalper worm's code. The Scalper worm attacked Apache systems with the scalp exploit codehence my name selection for this creature, after we had discovered it.

The preceding classes do not have some of the class A-sized, local networks, such as 10, or many other IP address ranges, including invalid classes. The worm builds the base IP address of the target machine as shown in Listing 9.2.

Listing 9.2. The Randomized IP Address Builder Routine of Linux/Slapper

a=classes[rand()%(sizeof classes)];
b=rand();
c=0;
d=0;

The attack will start with an address such as 199.8.0.0, and the worm will scan up the entire range of network nodes. Slapper attempts to connect on port 80 (HTTP) in order to fingerprint the remote system. It does so by sending a bogus HTTP request on port 80 that is missing the Host: header (which is required in HTTP/1.1) as shown in Listing 9.3.

Listing 9.3. The Bogus GET Request of Linux/Slapper

GET / HTTP/1.1\r\n\r\n

The worm expects that Apache Web servers return an error message to this request; Apache returns the message shown in Listing 9.4 to the attacker node:

Listing 9.4. Apache Web Server's Answer

HTTP/1.1 400 Bad Request
Date: Mon, 23 Feb 2004 23:43:42 GMT
Server: Apache/1.3.19 (UNIX) (Red-Hat/Linux) mod_ssl/2.8.1
OpenSSL/0.9.6 DAV/1.0.2 PHP/4.0.4pl1 mod_perl/1.24_01
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1


Note the Server: Apache keywords in the error message. The returned data also has information about the actual version number of the Web server, which is 1.3.19 in this example.

The worm checks whether the error message is coming from an Apache server by matching the server information. Then it uses a table filled with architecture and version information numbers (shown in Listing 9.5) to see if the target is compatible with the attack.

Listing 9.5. The Architectural Structure of Slapper

struct archs {
char *os;
char *apache;
} architectures[] = {
{"Gentoo", "", 0x08086c34},
{"Debian", "1.3.26", 0x080863cc},
{"Red-Hat", "1.3.6", 0x080707ec},
{"Red-Hat", "1.3.9", 0x0808ccc4},
{"Red-Hat", "1.3.12", 0x0808f614},
{"Red-Hat", "1.3.12", 0x0809251c},
<strong>{"Red-Hat", "1.3.19", 0x0809af8c},</strong>
{"Red-Hat", "1.3.20", 0x080994d4},
{"Red-Hat", "1.3.26", 0x08161c14},
{"Red-Hat", "1.3.23", 0x0808528c},
{"Red-Hat", "1.3.22", 0x0808400c},
{"SuSE", "1.3.12", 0x0809f54c},
{"SuSE", "1.3.17", 0x08099984},
{"SuSE", "1.3.19", 0x08099ec8},
{"SuSE", "1.3.20", 0x08099da8},
{"SuSE", "1.3.23", 0x08086168},
{"SuSE", "1.3.23", 0x080861c8},
{"Mandrake", "1.3.14", 0x0809d6c4},
{"Mandrake", "1.3.19", 0x0809ea98},
{"Mandrake", "1.3.20", 0x0809e97c},
{"Mandrake", "1.3.23", 0x08086580},
{"Slackware", "1.3.26", 0x083d37fc},
{"Slackware", "1.3.26", 0x080b2100}
};

The attacker knows that the remote system runs Apache on a system that is likely to be compatible with the exploit code of the worm (assuming that the system is not patched yet). The third value is a "magic" address related to the exploit code. The magic number is explained in Chapter 10. In this example, the worm will select the 0x0809af8c address using the Red Hat and 1.3.19 architecture and version information. (See the bold line in the preceding structure.)

9.3.3.2 Randomized Scanning: The W32/Slammer Worm

So far, the Slammer worm has been responsible for the quickest worm outbreak in history. Slammer attacks UDP port 1434 (SQL server) and does not bother to check whether the IP address is valid. It simply generates completely random IP addresses and sends a packet to each target. (See Table 9.2 for an illustration.)

Table 9.2. A Sample Scan of the Slammer Worm

0.00049448186.63.210.15:1434
0.0011043373.224.212.240:1434
0.00167424156.250.31.226:1434
0.00227515163.183.53.80:1434
0.00575352142.92.63.3:1434
0.00600663205.217.177.104:1434
0.0061734116.30.92.25:1434
0.0063399171.29.72.14:1434
0.00650697162.187.243.220:1434
0.00667403145.12.18.226:1434
0.00689780196.149.3.211:1434
0.0070648643.134.57.196:1434
0.00723192246.16.168.21:1434
0.00734088149.92.155.30:1434
0.00750710184.181.180.134:1434
0.0076733279.246.126.21:1434
0.00783926138.80.13.228:1434
0.00800521217.237.10.87:1434
0.00817112236.17.200.51:1434

Slammer appears to be one of the quickest possible attacks on the Internet, but researchers predict that some worm types in the future will spread even faster. Slammer's infection was observed almost simultaneously all around the world and does not need to use any fingerprinting. It counts on the "sure shot" against vulnerable targets, which will continue the infection of other nodes as fireworks.

9.3.3.3 Combined Scanning Methods: The W32/Welchia Worm

The Welchia worm uses an IP address generator engine similar to Slapper's; however, it uses a combination of methods:

• Welchia scans class Bsized networks near the host's class-B network. It does so by scanning either the exact class Bsized network or slightly above or below, in hopes that such nearby systems also might be vulnerable to the same exploits.
• The worm uses a hit list for class Asized networks. The attacker expects that these systems will have more vulnerable targets. This method also uses a randomized scanning strategy by attacking 65,536 random IP addresses.

Before Welchia proceeds with its exploits, it checks the availability of the remote system with ICMP echo requests (pings).

9.4. Infection Propagators

This section summarizes interesting techniques that computer worms use to propagate themselves to new systems.

9.4.1. Attacking Backdoor-Compromised Systems

Although most computer worms do not intentionally attack an already compromised system, some computer worms use other backdoor interfaces to propagate themselves. The W32/Borm worm was among the first computer worms to attack a backdoor-compromised remote system. W32/Borm cannot infect any other systems than those already compromised with Back Orifice (a fairly popular backdoor among attackers). Back Orifice supports a remote command interface that uses an encrypted channel between a client and the Back Orifice server installed on the compromised system. Borm utilizes a network-scanning and fingerprinting function to locate Back Orificecompromised systems. See Figure 9.5 for illustration.

Figure 9.5. W32/Borm uses Back Orifice to propagate itself.

The worm attacks a compromised system using the following simple steps:

1. It randomly generates an IP address and actively scans using Back Orifice's BO_PING command to see whether the remote system is compromised. To initiate any meaningful communication, the worm needs to know Back Orifice's magic password, which is *!*QWTY?. The header of the communication data is encrypted with simple encryption, which is required by the Back Orifice server. Borm properly encrypts the data before it sends it to the randomly generated IP address on port 31337/UDP used by Back Orifice. If the remote node answers the BO_PING command, the worm proceeds to the next step. Otherwise, it generates other IP addresses to attack.
2. Borm sends a BO_HTTP_ENABLE command to the server.
3. In turn, this command instructs Back Orifice to create a virtual HTTP server on the compromised host. The worm instructs Back Orifice to use port 12345/TCP to establish an HTTP proxy on the compromised system.
4. Next, the worm connects and uploads itself in MIME-encoded format to the server.
5. Finally, the worm runs the uploaded executable on the server by sending a BO_PROCESS_SPAWN command. This will run the worm on the remote machine so it can start to scan for other Back Orifice systems from the newly infected node.

W32/Borm was among a flurry of computer worms that appeared in 2001. Other worms that utilized backdoor interfaces in this time frame included Nimda, which took advantage of a backdoor that was previously opened by a CodeRed II infection, and the W32/Leaves worm, which got in the wild by attacking SubSeven Trojan-compromised systems.

Borm was a creation of the Brazilian virus writer, Vecna. Several other of his methods are discussed later in this chapter.

9.4.2. Peer-to-Peer Network Attacks

Peer-to-peer attacks are an increasingly popular computer worm technique that not require advanced scanning methods on the computer network. Instead, such worms simply make a copy of themselves to a shared P2P folder on the disk. Anything that is available in a P2P download folder is searchable by other users on the P2P network.

In fact, some computer worms even create the shared folder in case the user only wants to search the P2P network for content instead of sharing similar content with others. Although this attack is similar to a Trojan installation, rather than recursive propagation, users of the P2P network will find the network-shared content easily and run the malicious code on their systems to complete the infection cycle. Some P2P worms, such as W32/Maax, will infect files in the P2P folder. The most common infection technique is the overwriting method, but prepender and even appender infections have also been seen.

P2P clients such as KaZaA, KaZaA Lite, Limewire, Morpheus, Grokster, BearShare, and Edonkey among others are common targets of malicious code. P2P networks are an increasingly popular way to exchange digital music; they are hard to regulate because they are not centralized.

9.4.3. Instant Messaging Attacks

Instant messaging attacks originated in the abuse of the mIRC /DCC Send command. This command can be used to send a file to users connected to a particular discussion channel. Normally, attackers modify a local script file, such as script.ini used by mIRC to instruct the instant messaging client to send a file to a recipient any time a new participant joins a discussion.

Modern implementations of IRC (Internet Relay Chat) worms can connect dynamically to an IRC client and send messages that trick the recipient into executing a link or an attachment. In this way, the attacker can avoid modifying any local files.

For example, the W32/Choke worm uses the MSN Messenger API to send itself to other instant messaging participants as a "shooter game"27. Although several instant messenger software programs require the user to click a button to send a file, worms can enumerate the dialog boxes and "click" the button, so the actual user does not have to click. It is also expected that computer worms will exploit buffer overflow vulnerabilities in instant messenger software. For example, certain versions of AOL Instant Messenger software allow remote execution of arbitrary code via a long argument in a game request function28.

9.4.4. E-Mail Worm Attacks and Deception Techniques

The vast majority of computer worms use e-mail to propagate themselves to other systems. It is interesting to see how attackers trick users every day, asking them to execute unknown code on their systems by sending them malicious code in e-mail. Let's face it: This is one of the greatest problems of security. How can security experts protect users from themselves?

During the last several years, an increasing number of users have been trapped in a "Matrix" of operating systems, such as Windows. Windows especially gives the illusion to millions of computer users worldwide that they are masters of their computersnot slaves to them. This illusion leads to neglected security practices. In fact, most users do not know that they need to be careful with e-mail attachments. Consider W97M/Melissa, which used the following e-mail to trick recipients into executing the worm on their machines:

"Here is that document you asked for ... don't show anyone else ;-)"

Another common method of deception is forging e-mail headers. For example, the attacker might use the e-mail address of Microsoft's support as the W32/Parvo virus does, placing [email protected] as a sender of the message. This can easily trick users into trusting an attachment and opening it without thinking. Other computers worms, such as W32/Hyd, wait until the user receives a message and quickly answers it by sending a copy of the worm back to the sender. Not surprisingly, this can be a very effective deception method.

Worms also make minor changes in the From: field to change the sender's e-mail randomly to something bogus. In practice, you might receive e-mail messages from many people, and most of the time they have nothing to do with the worm that abused their e-mail address. The bottom line is that notifying "the sender" will not necessarily help.

9.4.5. E-Mail Attachment Inserters

Some computer worms insert messages directly into the mailboxes of e-mail clients. In this way, the worm does not need to send the message; it simply relies on the e-mail client to send the mail. The earliest example of computer worms on Windows systems were of this type. An example of this is Win/Redteam, which targets outgoing mailboxes of the Eudora e-mail client.

9.4.6. SMTP Proxy Based Attacks

[email protected]29 is an example of a tricky worm that acts as an SMTP (simple mail transfer protocol) proxy. This worm appeared in February of 2002. Taripox attacks the %WINDOWS %\SYSTEM32\DRIVERS\ETC\HOSTS file to proxy mail traffic to itself. Normally, the HOSTS file has a simple definition for the localhost address as shown in Listing 9.6.

Listing 9.6. The Content of a Typical Host's Configuration File

127.0.0.1 localhost

W32/Taripox remaps the IP address of the SMTP server to the local host. The worm can listen on port 25 (SMTP) and wait for any SMTP e-mail client to connect. The outgoing e-mail message is then forwarded to the real SMTP server, but first the worm injects its own MIME-encoded attachment. The worm also tricks users with comment entries in the host file such as "# Leave this untouched," which is used for the localhost entry, and "# do not remove!," which is used as the comment for the SMTP IP address to localhost redirection entry. Figure 9.6 illustrates how Taripox works.

Figure 9.6. The W32/Taripox worm uses an SMTP proxy.

The HOSTS file is a common target for Retro worms to deny access to the Web sites of antivirus and security companies. Taripox's attack is similar to Happy99's, but it is a much simpler technique and does not require complicated modifications to binary files like WSOCK32.DLL.

9.4.7. SMTP Attacks

As Microsoft strengthened Outlook's security to protect end users better against worm attacks, computer worm authors quickly started to use more and more SMTP-based attacks.

The first such major worldwide outbreak was caused by the Sircam20 worm in July 2001 and was followed by the infamous W32/Nimda worm in September 2001. Smaller outbreaks had signaled the problem earlier, with the in-the-wild appearance of W32/ExploreZip in June 1999.

Sircam avoids relying on an e-mail program by getting the SMTP information directly from the Registry. This information consists of the following keys, which are created and used by a number of Microsoft mail applications:

• The current user's e-mail address: HKCU\Software\Microsoft\Internet Account Manager\Default Mail Account\Accounts\SMTP Email Address
• The address of the e-mail server: HKCU\Software\Microsoft\Internet Account Manager\Default Mail Account\Accounts\SMTP Server
• The user's display name: HKCU\Software\Microsoft\Internet Account Manager\Default Mail Account\Accounts\SMTP Display Name

If, for some reason, this information does not exist, Sircam will use prodigy.net.mx as the e-mail server, and the user's logon name as the e-mail address and display name. Using a hard-coded list of SMTP IP addresses is a common technique for computer worms, but usually this trick quickly overloads the particular set of servers once the worm is sufficiently widespread. Typically, such worms take off their SMTP servers with a DoS attack very quickly.

Thanks to implementation mistakes and bugs, it took a little while before SMTP worms could take their real place. Before Sircam, most worms lacked some important detail in their spreading mechanism. For instance, Magistr18 often sends clean files or files that are infected but that reference some libraries not available on the recipient's system, so they fail to penetrate the target.

To illustrate the simplicity of SMTP, consider Table 9.3.

Table 9.3. A Typical SMTP Communication Between a Client and a Server

 1.) Client Connects to Server 2.) Server sends 220 3.) HELO name.com says hi to the server 4.) Server sends 250 5.) MAIL FROM: 6.) Server sends 250 7.) RCPT TO: 8.) Server sends 250 9.) DATA 10.) Server sends 354 11.) Body of the e-mail Subject: (ENCODED ATTACHMENT FOLLOWS) dot to terminate 12.) Server sends 250 13.) QUIT - to say goodbye to the server 14.) Server sends 221

At this point, the e-mail might be dropped into a folder with a temporary name on the server in EML (Electronic Mail List) format. For example, Figure 9.7 shows an e-mail message sent by the W32/Aliz worm that was stored in the mail drop folder of Microsoft IIS Server.

Figure 9.7. An e-mail message of the W32/Aliz worm.

This snippet already reveals the Content-Type exploit in the body of the e-mail, which will be discussed in more detail in Chapter 10. Chapter 15, "Malicious Code Analysis Techniques," also shows a network-level capture of W32/Aliz as an illustration of analysis techniques.

9.4.8. SMTP Propagation on Steroids Using MX Queries

Worms such as Nimda, Klez, Sobig, and Mydoom perfected SMTP mass-mailing by utilizing automated SMTP server address resolution using MX (eMail eXchanger) record lookups via the DNS (Domain Name System). Such worms check the domain name of the e-mail addresses they have harvested and obtain a valid SMTP server for that domain. Mydoom even uses the backup SMTP server addresses, instead of the primary server IP address, to reduce the load on SMTP servers further.

Table 9.4 is a list of MX look-ups of Mydoom on a test machine. The worm immediately sends itself to systems that it can find correctly, doing so many times per minute. In Table 9.4, the first column is the time in seconds, the second column is the infected system's IP address, and the third column shows the IP address of the DNS server used to make MX lookups.

Table 9.4. DNS Queries of the Mydoom Worm

TimeWorkstation IP DNS IPQuery TypeQueried Value
5.201889192.168.0.1->192.168.0.3DNS Standard queryMX dclf.npl.co.uk
5.450294192.168.0.1->192.168.0.3DNS Standard queryMX frec.bull.fr
6.651133192.168.0.1->192.168.0.3DNS Standard queryMX csc.liv.ac.uk
18.036855192.168.0.1->192.168.0.3DNS Standard queryMX esrf.fr
19.721399192.168.0.1->192.168.0.3DNS Standard queryMX welcom.gen.nz
30.761329192.168.0.1->192.168.0.3DNS Standard queryMX t-online.de
32.213049192.168.0.1->192.168.0.3DNS Standard queryMX welcom.gen.nz
32.447161192.168.0.1->192.168.0.3DNS Standard queryMX geocities.com

9.4.9. NNTP (Network News Transfer Protocol) Attacks

Worms such as Happy99 can mail themselves to newsgroups as well as to e-mail addresses. Usenet attacks can further enhance the spreadability of computer worms. Interestingly, most computer worms will only send mail directly to e-mail addresses.

9.5. Common Worm Code Transfer and Execution Techniques

Computer worms also differ how they propagate the worm's code from one system to another. Most computer worms simply propagate their main body as an attachment in an e-mail. However, other types of worms utilize different methods, such as injected code and shellcode techniques in conjunction with exploit code, to attack another system.

9.5.1. Executable CodeBased Attacks

E-mail can be encoded in various ways, such as UU, BASE64 (MIME), and so on. However, the UU-encoded attachments are not very reliable over the Internet because UU uses some special characters whose interpretation depends on the context. Nowadays, most e-mail clients use MIME-encoded attachments by defaultand that is how most e-mail worms' SMTP client engines transfer themselves to new targets. Script e-mail worms usually send attachments encoded according to the settings of the e-mail client on the compromised system.

9.5.2. Links to Web Sites or Web Proxies

Computer worms also can send links to executables hosted elsewhere, such as a single Web site, a set of Web sites, or an FTP location. The actual message on IRC or in e-mail might not have any malicious content in it directlybut infects indirectly. One problem with this kind of attack is the possibility of an accidental DoS attack against the system that hosts the worm's code. Another potential pitfall is that the defender can easily contact Internet service providers to request they disconnect such sites, preventing further propagation of the computer worm.

Tricky worms send links with the IP address of an already compromised system. First, the worm compromises a machine and opens a crude Web server on the system. Then it sends messages to other users, using the IP address of the machine with the port on which the worm itself is listening for a GET request. In this way, the worm attack becomes peer-to-peer, as Figure 9.8 illustrates. Such computer worms might be able to bypass content filtering easily if the content-filtering rule is based on attachment filtering.

Figure 9.8. Tricky worms send links in e-mails instead of their own copy.

W32/Beagle.T used a similar method in March 2004. This variant of Beagle opens a crude Web server on TCP port 81. Then it sends a link to the recipient that triggers automated downloading with an HTML-based mail (which exploits the Microsoft Internet Explorer object tag vulnerability described in the MS03-032 bulletin) to download and execute the hosted worm executable on the target automatically.

The W32/Aplore worm was among the first worms to use this attack to propagate itself on Instant Messaging in April 2002. When the [email protected]30 worm arrives on a new system, it acts as a crude local Web server on port 8180, hosting a Web page that instructs the user to download and run a program, which is the worm body itself. The worm tricks Instant Messenger users by sending them a link that looks like the following:

FREE PORN: http://free:[email protected]:8180

9.5.3. HTML-Based Mail

The e-mail can even be HTML mail-based. Disabling HTML support in your e-mail client reduces your chance of exposure to at least some of these threats, such as VBS/Bubbleboy. This worm is described in Chapter 10.

On UNIX-like systems, commands such as rsh, rlogin, rcp, and rexec can be used directly by computer worms. Using such commands, worms can execute themselves on remote systems if the attacked system is not secured or if the password is guessed with a dictionary attack or similar method. Usually, such worms make a copy of their code directly to the remote system and execute themselves via the remote execution facilities.

On Windows systems, worms like JS/Spida can take advantage of vulnerable Microsoft SQL servers. Spida scans for remote Microsoft SQL server systems on port 1433 and tries to execute itself remotely with the following assumptions:

• The Microsoft SQL server runs in Administrative mode.
• The "sa" Microsoft SQL server account has no password set.

The worm takes advantage of the xp_cmdshell function to execute system commands to run the worm on the remote machine.

9.5.5. Code Injection Attacks

A more advanced attack requires exploitation of a target with direct code injection over the network. As traditional buffer overflows are getting more difficult to exploit, attackers are increasingly interested in exploiting server-side vulnerabilities related to a lack of input validation. For example, the Perl/Santy worm utilizes Google to find vulnerable Web sites and runs its own Perl script via a vulnerability in the phpBB bulletin board software. This worm successfully defaced tens of thousands of Web sites on December 21 of 2004. Depending on the thread model of the vulnerable target server, one of the following actions will happen:

• A new thread is created at the start of the server.
• A new thread is created upon each incoming request.

Furthermore, depending on the context of the hijacked thread, the worm

• Runs in SYSTEM context with high privileges.
• Runs in the context of a user with either high or low privileges that the worm might be able to escalate.

These preconditions are often reflected in the worm's operation. When, for example, W32/Slammer exploits a vulnerable Microsoft SQL server, the worm hijacks a thread that was executed at the start of the server. Thus the operations associated with the hijacked thread will be paralyzed because new incoming requests will not be resolved. In addition, the server process and the entire system is heavily overloaded because the worm never stops sending itself to new targets.

An example of the second type of attack is W32/CodeRed. CodeRed exploits Microsoft IIS server via a malformed GET request. When the server receives the GET request, it executes a new thread to process it. The worm hijacks that particular thread and creates 100 new threads (300 in some variants) in the vulnerable server process. This kind of computer worm needs to avoid infecting the target a second time because the worm could exploit the target multiple times, causing the target to be overloaded shortly after the initial outbreak. In addition, computer worms that counterattack each other can also benefit from this condition because they can utilize the same exploit as their opponent.

Both of these attacks are detailed in Chapter 10 from the point of view of exploitation. Figure 9.9 illustrates this.

Figure 9.9. A typical one-way code injection attack.

In some cases, the injected code creates a new user account on the target that can be used by the attacker to log in to the system remotely.

Another interesting example of a code injection attack is the [email protected] worm. This worm takes advantage of a vulnerability in Eudora 5 that can be exploited by sending a malformed boundary tag.

Lespaul is a mass-mailer worm, but just like CodeRed or Slammer, it injects its code directly into the vulnerable Eudora 5 process. The worm does not send an attachment to the recipient; instead, it propagates itself as the mail body. It can appear in the Eudora mailbox as part of an e-mail message featuring an overly long header field; however, its code is never saved into a standalone executable at any point in order to be executed.

9.5.6. Shell Code Based Attacks

Another class of computer worms utilize shell code on the target machine. The basic idea is to run a command prompt on the remote system, such as cmd.exe (on Windows) or /bin/sh (on UNIX) via the exploit code. Consider Figure 9.10 for an illustration.

Figure 9.10. A typical shell codebased attack.

The worm follows these steps:

1. It injects code into a remote process and binds a specific port to the process. The exploited process starts to listen on the port.
2. The worm attempts to connect to the listening port.
3. If the connection to the port is successful, the previously injected shellcode executes a command prompt and binds that process to the same port that the attacker is using.
4. Finally, the worm can start to send commands to the shell.

An example of such a worm is W32/Blaster.

Shellcode-based attacks are typically more common on UNIX systems than on Windows systems. A few variations exist, such as back-connecting shellcode and shellcode that reuses an existing connection.

Back-connecting shellcode immediately attempts to connect the target with the attacker by establishing a TCP connection from the target to the attacker's machine. The advantage of this method is that it allows machines behind a firewall to "connect-out" to the attacker system.

This attack requires the attacker system to listen on a particular port and wait for the shellcode to connect, as shown in Figure 9.11.

Figure 9.11. A back-connecting shellcode.

The basic difference occurs in the second step. The shellcode executes on the target and connects to the attacker. When the connection is established, the shellcode creates a shell prompt that gets its input from the attacker. The W32/Welchia worm uses this approach.

The exploiting phase might take place in a few steps. For example, Linux/Slapper exploits the target more than once to run shellcode via a heap overflow condition. Slapper, however, implements yet another shell-code technique, reusing the connection established between the attacker's machine and the target. As shown in Figure 9.12, the shellcode does not need to reconnect to the target. In Chapter 15 you can find a traced shellcode of Slapper that illustrates the reused connection better.

Figure 9.12. A connection-reusing shellcode.

9.6. Update Strategies of Computer Worms

Computer worms can be classified according to their update strategies. An early example of this is W95/Babylonia, a Windows Help and PE infector and self-mailer that was discovered on December 6, 1999.

Babylonia was posted to the alt.crackers Internet newsgroup as a Windows Help file named serialz.hlp31, which appeared to be a list of serial numbers for commercial software. This Help file was launched by many people who activated the virus on their systems. When executed, the virus creates a downloader component that looks for updates on a Web site. (Figure 9.13 illustrates this.)

Figure 9.13. The update procedure of Babylonia.

First, the downloader reads the content of a text file called virus.txt stored on the Web site. This text file lists a few filenames, such as dropper.dat, greetz.dat, ircworm.dat, and poll.dat. These files use a special plug-in file format with a header that starts with the identifier VMOD (which stands for virus module). The header of the virus modules contains an entry point of the module and, using this information, the downloader component of Babylonia downloads and executes the plug-in modules inside its own process, one by one.

• The dropper.dat module can reinstall the virus code on the system. This can be used by the attacker to update the virus with a newer release or to reinfect an already cleaned system via the downloader.
• The greetz.dat module is the payload. It modifies the c:\autoexec.bat file to display a message, shown in Listing 9.7, in January of each year.
• The ircworm.dat module is an mIRC worm installer that infects other targets via an mIRC.
• The poll.dat module is used to track the number of infected machines. When it is used, it sends messages to [email protected], with the Portuguese message "Quando o mestre chegara?" ("When will the master arrive?")

Listing 9.7. The Babylonia Worm's Message

W95/Babylonia by Vecna (c) 1999
Big thankz to sok4ever webmaster
Abracos pra galera brazuca!!!
---
Eu boto fogo na Babilonia!


Not only is Babylonia able to infect two different Windows file formats, it also infects WSOCK32.DLL, allowing it to send e-mails with an attachment whenever the user sends mail. Babylonia somewhat borrows this idea from Happy99.

The weakness of the attack is the update system based on a single Web site. After authorities pulled the site, Babylonia could not download new components.

9.6.1. Authenticated Updates on the Web or Newsgroups

Realizing the weaknesses of a single Web sitebased update system, Vecna decided to use alternated update channels and strong cryptography to authenticate the updates. The W95/Hybris worm was released in late 2000. It was an unusually large project of several top virus writers from around the world: Brazilian, Spanish, Russian, and French virus writers were all part of the large team that developed it.

Hybris uses 1,023-bit RSA signing32 to deliver its update modules to infected systems. It also uses a 128-bit hash function to protect the updates against attacks. The hash function uses XTEA (extended tiny encryption algorithm, which is a successor of TEA). XTEA is in the public domain, written by David Wheeler and Roger Needham. The RSA library for Hybris was written by the infamous Russian virus writer, Zombie. Figure 9.14 is an illustration of the Hybris attack.

Figure 9.14. The authenticated updates model of the Hybris worm.

Note the interesting selection for XTEA instead of TEA, which was previously found weak by cryptographers John Kelsey, Bruce Schneier, and David Wagner many years ago at CRYPTO 1996. In fact, TEA was used as a hash function in the security of the second version of the Microsoft Xbox. This weakness was leveraged a day later after its announcement by a team headed by Andy Green to break the security of the Xbox scheme by flipping bits in Xbox's FLASH ROM code that allowed a jump instruction to branch to RAM33.

The idea of the Hybris worm is to encrypt the updates with XTEA and sign the update files with RSA on the attacker's system. The attacker creates a secret key and a corresponding public key. He puts the public key into the virus, and the XTEA encryption/decryption keys are delivered with the modulebut are signed with a 1,023-bit RSA secret key. This is called a hybrid signing technique, which makes the process more efficient.

Instead of using a single 128-bit key, Hybris uses 8 XTEA keys, one of which is a hash computed about the plug-in and 7 other 128-bit keys that are set randomly. First, a 128-bit hash of the module is calculated using XTEA. This value will be used as one of eight 128-bit encryption keys to encrypt the entire module using a 64-bit XTEA block cipher. The block cipher applies the eight 128-bit keys (including the hash of the plug-in) to each consecutive 64-bit block of the plug-in. Each 64-bit block is encrypted with one 128-bit key. Thus the first 64-bit block is encrypted with the first key in the set, the second 64-bit block is encrypted with the second 128-bit key (the hash) until the keys wrap around: The 9th block is encrypted with the first key again, and so on.

Signing allows the worm instances to check if the update files were distributed by the virus writer. Thus the RSA algorithm is used to prevent changes to plug-ins or to create new plug-ins without specifically involving the attacker who holds the secret key. The worm uses the public key corresponding to the secret key of the attacker to validate the signed XTEA key and verifies that the hash is correct to avoid forgery attacks.

Although the updates are encrypted, the algorithm uses a symmetric key so the modules can be decrypted by anyone, in the same way as the worm decrypts them. The attacker is protected against any manipulations that could occur to update modules. Thus it is not feasible to distribute an update that could kill the worm without the secret key of the virus author unless, of course, there is some implementation error discovered that commonly occurs in cryptography.

There were up to 20 known modules (so-called Muazzins) for Hybris. However, there were more than 32 different versions of these in circulation. After encrypting and signing the module, the attacker encoded the module to send it to the alt.comp.virus newsgroup. Infected systems, which were all looking for the modules, downloaded and decrypted them using their public keys.

Although the initial update Web site was quickly disabled, the attacker had the opportunity to send out new updates in newsgroups. Infected nodes propagated the modules back to the newsgroups, so all infected nodes had a chance to get the updates. Hybris used a similar technique to the Happy99 worm's algorithm to inject its code into the WSOCK32.DLL library, propagating itself via e-mail.

The update modules included several extensions to the worms:

• A DOS EXE file infection module.
• A file infection module to attack PE files without changing their size and CRC 16/32/48 checksum. This module used compression to compress the host and filled the module with extra data, using the algorithm of the Russian virus writer, Zhengxi, to make the CRC the same as it was before the infection.
• A wrapper module to encrypt the Hybris-infected WSOCK32.DLL further.
• A Windows Help file infection module. (This module borrows code from W95/Babylonia.)
• A PE file infection module using Zombie's KME polymorphic engine.
• Two archive infection modules to infect RAR, ZIP, and ARJ archives.
• Two different plug-in modules to infect Microsoft Word documents and a third module to infect Microsoft Excel documents.
• A DoS attack module.
• An encrypted dropper generator module.
• An attacker module to infect machines via a SubSeven backdoor.
• A HATE (human-alike text engine) message module; this particular module could generate e-mail messages in the names of well-known antivirus researchers such as Eugene Kaspersky, Mikko Hypponen, and Vesselin Bontchev. My name was also on the list. The module was supposed to send e-mail messages using one of my e-mail addresses in the sender field with the subject "Uglier than Hermann Monster!" (most likely a reference to Herman Munster) with the attachment named "The Hungarian Freak!.exe."

Note:

This module was written by the Spanish virus writer, Mr. Sandman, the founder of the 29A virus writing group, who is believed to be a professional translator. Many other viruses of Mr. Sandman's are related to his interest in languages, for example Esperanto and Haiku.

• A retro attack module to block access to antivirus Web sites.
• Another e-mail message generator using a SOAP Web server to generate fortune cookie messages and send these (with Hybris) to recipients.
• A sys file infection routine to hide the infected WSOCK32.DLL on the system with stealth routines.
• An exploit module that can be used to retrieve files from vulnerable Web servers.
• Another retro attack to scan the disk and Registry for antivirus programs and delete them or corrupt their databases.
• An e-mail-based tracker module to send e-mail messages from infected nodes to a particular e-mail account.
• A few other generic message generator modules for e-mail propagation.
• A Happy 2000 module. This one overwrites the SKA.EXE file of the Happy99 worm to propagate Hybris instead. It also contains the graphical payload of the Happy99 worm.
• A Usenet module to connect to NNTP servers and download plug-ins. This module also uploads other modules to a newsgroup.
• Finally, an OpenGL-based animation that installed itself to load at boot time. This module, shown in Figure 9.15, was contributed by the French virus writer, Spanska.

Figure 9.15. The OpenGL-based hypnotizer spiral plug-in.

Listing 9.8 is an example of a plug-in module posted to the alt.comp.virus newsgroup34.

Listing 9.8. A Hybris Update in alt.comp.virus (Partial Snippet)

Date: Tue, 24 Jul 2001 20:29:51 -0700
Newsgroups: alt.comp.virus
Subject: h_2k MRKR KRnAbIvQdE?UlOhK6CrWdU#YvYnM:SrYU

TRUTUWXXPTVFVY3NXSTREYCUSPVNBLZLSQBPXXRRYMUOD7USWESFRWYBUTREMBLWKSPS
OXYVNWZG KTVHVDMTTRODVSMCZFWCQXSXVVTZVUKVKHOBTRNFYVVBLFRBXWUVRHWHPF
SE&THUFNVMHZCRHNVRVZUKXVWSBSBZRPB6NEVVYZLSVSLDLZZFZCYCSWKDLUZVYR5ZYLZ
NDOSNUKRMUYXOHTEMUKD


The body of this message contains the Happy 2000 plug-in of Hybris (only a snippet is shown in Listing 9.8). The name of the plug-in is in the Subject line as "h_2k," which is followed by the version number information of the plug-in. Hybris uses the version information to decide whether a module needs to be extracted and executed.

Several computer worms open up a port on the compromised system and implement an interface to execute arbitrary files on the compromised machine. The attacker can use this interface to update the worm's code from one version to another. For instance, the W32/Mydoom worm opens a TCP port in the range of 3127 to 3198 and waits for a connection, implementing a simple protocol. Essentially, Mydoom's code is updated similarly to a backdoor-based propagation technique described earlier in this chapter. The attacker needs to scan for systems that have a port open and can send an executable to the target that will be executed on the remote node. The first few versions of Mydoom did not implement any security mechanism for their update protocol. Not surprisingly, worms such as W32/Doomjuice, W32/Beagle, and W32/Welchia attacked Mydoom-compromised systems by taking advantage of the insecure update mechanism.

Later releases of Mydoom leave less chance for opportunistic attackers because they inspect incoming requests more carefully.

9.7. Remote Control via Signaling

Attackers often want to control their creations remotely, for example, to execute a DoS attack against a selected target or to control the propagation of the worm to new systems. The most obvious technique is based on the use of a backdoor feature built into the worm that communicates directly with a particular host. However, other known techniques centrally control the worm, such as via IRC or Windows domain mail-slots. Consider Figure 9.16, which illustrates the attacks of W32/Tendoolf.

Figure 9.16. The remote-controlled Tendoolf worm.

Some of the variants of this worm only propagate the infection to new targets whenever the attacker sends a ".spread" signal to an IRC server into a discussion channel of the worm itself. When the attacker sends the ".spread" signal, the already compromised systems look for new targets to infect. Techniques include e-mail, instant messaging, and SubSeven backdoor-compromised nodes attack. In addition, the attacker can execute a DoS attack against any target. The target of the DoS attack is unknown to the compromised systems (it is not hard-coded in the worm) until the attacker sends the command. Then the compromised nodes turn against the specified target and execute various kinds of flooding methods. Hence the name of the worm, which is Floodnet spelled backwards.

This kind of remote controlbased worm propagation is often confusing for junior antivirus researchers.

9.7.1. Peer-to-Peer Network Control

Some computer worms implement a virtual network between infected nodes to establish communication and control operations. The Linux/Slapper worm is an example of this kind of computer worm. Slapper uses the UDP protocol and port 2002 on each infected node. When the worm infects a new target, it passes the attacker's IP address to the target system. Then each node receives the IP addresses of all other nodes and keeps these in a list. Whenever a new IP address is introduced, all other nodes receive the update via this special virtual network, which uses TCP-like (stateful) features to ensure that the information arrives at the target correctly. The infected nodes select a random set of other machines to broadcast the updated information on the network. This is called a broadcast segmentation technique.

The remote control interface of Linux/Slapper is very advanced. The code is borrowed from the BSD/Scalper worm, which was based on a previous attacker tool. Just like Scalper, Slapper implements a hierarchical network structure35 that keeps track of systems the worm has infected. Slapper supports a large set of commands that implement UDP flood, TCP SYN flood, IPv6 SYN flood, and DNS standard query flood DoS attacks; it also supports the execution of arbitrary commands on the compromised nodes.

Consider Figure 9.17 for a possible illustration of a Slapper worm infection. When the first node is infected, it receives the IP address of the attacker host, followed by the list of all other nodes that are in the network, and so on.

Figure 9.17. A Slapper worm infection.

Figure 9.17 only illustrates how the infection is passed from one system to the next. Figure 9.18 is a possible way to illustrate the hierarchical relationships between the infected nodes as a P2P command network. Because more than one initial infection might be started by the attacker, there might be multiple smaller and larger P2P networks parallel to each other without any connection, similar to the configuration shown in Figure 9.18.

Figure 9.18. Slapper worm P2P network hierarchy.

9.8. Intentional and Accidental Interactions

Computer virus researchers have observed a set of interesting behaviors that are the result of intentional and accidental interactions between various kinds of malicious code. This section describes common interactions.

9.8.1. Cooperation

Some computer viruses accidentally cooperate with other malicious code. For example, computer worms might get infected with a standard file infector virus as they pass through already infected nodes. It is common to encounter multiple infections on top of in-the-wild computer worms. It is not uncommon to find three or even more different viruses on the top of a worm carrier. This can help both the network worm and the standard file infector virus in various ways.

A worm can take advantage of the infection of an unknown file infector virus. If the file infector virus is unknown to antivirus products, the computer worm body might not be detectable. For example, in some cases the worm body will be embedded deep inside the virus code, leaving little chance for the antivirus program to find it. See Figure 9.19 for an illustration.

Figure 9.19. The accidental interaction of a worm infected with a file infector virus.

In Step A, the computer becomes infected with a worm. In Step B, the worm successfully penetrates a new remote system. That computer, however, is already infected with a virus that infects exactly the same type of files in which the worm propagates itself. Thus the file infector virus attaches itself to the worm. In Step C, the multiple infection arrives at a new computer. When the worm is executed, the file infector virus runs (in most cases, it will run before the worm's code) and infect other objects.

In Step D, the combination arrives at a system protected by antivirus software that knows the file infector virus on top of the computer worm carrierbut does not know the computer worm underneath. If the antivirus can disinfect the file infector virus, it might create a file object that is not exactly the same as the original worm. For example, the binary file of the worm might get larger or smaller, and important fields in its header might also change. (Of course, an antivirus is just one possible agent that might interact with other malicious programs.)

Thus a "mutant" worm will have the chance to propagate itself further and infect a new system in Step E. In practice, no antivirus software would consider the Step E case a variant of the original worm. However, antivirus programs need to address this issue. For example, the MD5 checksums of the "mutant" worm body are clearly different, and if the antivirus or content-filtering software uses such checksums to detect the original worm, it will fail to detect the "mutant" worm.

An intentional variation of such cooperation appeared in the "symbiosis project" of GriYo. He released the mass-mailing worm, W32/Cholera, in infected form. The worm was infected with the polymorphic W32/CTX virus and became a carrier. The result was a quick succession of both W32/Cholera and W32/CTX infections worldwide. As a result, W32/CTX was reported on the Wildlist.

File infector viruses such as W32/Funlove often infect other worms and often multiple times. Such file infector viruses occasionally disappear from the top 50 charts but suddenly attack again during major computer worm outbreaks. This kind of activity was seen increasingly when the W32/Beagle worm appeared in the wild. As discussed earlier, some variants of Beagle send a password-protected attachment to recipients. Because the worm creates the archive (ZIP) file on the local system, the executable file of the worm can easily get infected by viruses such as Funlove before the executable gets packed. Thus a virus like Funlove also enjoys being password-protected36. Because several antivirus softwares had problems detecting the password-protected attachments reliably, "traveler viruses" could take advantage of this accidental cooperation.

A form of cooperation also exists with the previously mentioned W32/Borm creation, which infects BackOrifice-compromised systems. W32/Borm does not attempt to kill Back Orifice; it simply takes advantage of compromised systems to propagate. Similarly, the aforementioned Mydoom had a backdoor that was utilized by the Doomjuice worm to spread itself.

With macro and script viruses, "body snatching" attacks often occur. Two or more script or macro viruses might form a new creation as they accidentally propagate each other's code.

9.8.2. Competition

Competition between malicious codes was also experienced among computer viruses. Several viruses attack other viruses and disinfect them from the systems that they have compromised. An example of this is the Den_Zuko boot virus37, which disinfects the Brain virus. These viruses are often called "benefical viruses" or "antivirus" viruses.

Antiworm computer worms started to become more popular in 2001 with the appearance of the CodeRed worm and the counterattacking, CodeGreen. (However, antiworm worms had been experienced previously on other platforms, such as Linux.)

Because IIS could be exploited more than once, CodeGreen could easily attack CodeRed-infected systems. The worm sent a similarly malformed GET request to the remote target nodes to CodeRed, which had in front the message shown in Listing 9.9.

Listing 9.9. The Front of a CodeGreen GET Request

GET /default.ida?Code_Green_<I_like_the_colour-_-><AntiCodeRed-
CodeRedIII-IDQ_Patcher>_V1.0_beta_written_by_'Der_HexXer'-
Wuerzburg_Germany-_is_dedicated_to_my_sisterli_'Doro'.
Save_Whale_and_visit_<www.buhaboard.de>_and_www.buha-security.de


The worm also carried the following messages shown in Listing 9.10.

Listing 9.10. Other Messages of the CodeGreen Worm

HexXer's CodeGreen V1.0 beta CodeGreen has entered your system
it tried to patch your system and
to remove CodeRedII's backdoors

You may uninstall the patch via
SystemPanel/Sofware: Windows 2000 Hotfix [Q300972]

get details at "www.microsoft.com".
visit "www.buha-security.de


CodeGreen removed the CodeRed infections from systems and also removed the backdoor components of other CodeRed variants. Furthermore, it downloaded and installed patches to close the vulnerability.

Similar attacks against W32/Blaster worm were experienced when the W32/Welchia worm began its antiworm hunt against Blaster, which started the "worm wars" (as I decided to call it after Core Wars).

Another enthralling example is the W32/Sasser worm. Sasser targeted an LSASS vulnerability that was previously exploited by variants of Gaobot worm. Thus Gaobot's author was not impressed because Gaobot needed to compete with Sasser for the same targets. Consequently, the W32/Gaobot.AJS38 worm was developed with a vampire attack. I decided to call this kind of attack a "vampire" based on the Core War vampire attack. Vampire warriors can steal their enemies' souls (see Chapter 1, "Introduction to the Games of Nature," for details).

Gaobot.AJS is a vampire because it attacks Sasser when the two worms are on the same machine. Instead of simply killing Sasser, Gaobot.AJS modifies Sasser's code in a very tricky way. As a result of the modification, Sasser can still scan for new targets and even exploit them successfully. However, when Sasser connects to its shellcode on the compromised system to instruct it to download and execute a copy of Sasser via FTP, the code modifications of Gaobot.AJS will get control. In turn, Gaobot.AJS sends commands to Sasser's shellcode on the remote machine and instructs it to download a copy of Gaobot.AJS's code instead of Sasser's. Furthermore, Gaobot closes the connection to the remote machine so Sasser cannot propagate but is used as a Gaobot propagation agent in a parasitic manner.

Another gripping example is the W32/Dabber worm, which appeared right after Sasser. As mentioned, Sasser's shellcode is instructed to download a copy of Sasser via FTP. On the attacker system, Sasser implements a crude FTP server. However, this routine of Sasser had a simple buffer overflow vulnerability that could be exploited. (Indeed, worms can have their own vulnerabilities!) Dabber was released to exploit Sasser's vulnerability to propagate itself. It scans for targets that were compromised by Sasser and attempts to connect to Sasser's vulnerable "FTP server" to exploit it successfully.

It is expected that competitions between malicious programs will become more and more common in the future.

9.8.3. The Future: A Simple Worm Communication Protocol?

Although increased competition among malicious programs is likely, it also makes sense for attackers to invest in cooperating techniques. For example, computer worms could use a special protocol such as simple worm communication protocol (SWCP) to exchange information, as well as plug-ins ("genes") among different families of computer worms that support SWCP. Computer worms could swap payloads, exchange information about systems to attack, or even collect e-mail addresses and share them with the other worms that occasionally communicate using SWCP. I highly anticipate that such techniques will appear in the very near future.

Of course, communication can have other forms. For instance, viruses could "reproduce sexually"39 to cross their genomes to produce offspring, which can evolve or devolve. The closest currently known example of accidentally "sexually reproducing" computer viruses can be found in macro viruses which occasionally swap, or snatch their macros ("genes") as discussed in Chapter 3, "Malicious Code Environments,". However, specifically written binary viruses could possibly demonstrate similar behavior that would lead to further evolution in computer viruses on their own.

9.9. Wireless Mobile Worms

The SymbOS/Cabir worm40 indicates a totally new era of computer worms that will slowly become more popular as wireless smart phones replace current mobile phone systems, which have limited programming ability. The Cabir worm appeared in June 2004, and it has a number of unique features. This worm can run on Nokia 60 series phones running the Symbian operating system. The Symbian operating system is based on the EPOC. In fact, Symbian is EPOC version 6, also called EPOC32, but has a new name.

Interestingly, the Cabir worm spreads using the Bluetooth feature of wireless phones as shown in Figure 9.20.

Figure 9.20. The attacker phone is on the left, and the recipient is on the right.

The worm's code is compatible with mobile phones using ARM series processors with Symbian operating system. Normally, by default the Bluetooth communication feature is off on mobile phones. Mobile phone users might exchange some little programs, and in doing so they open up the Bluetooth communication channel to Cabir-like worms as well.

When executed, Cabir installs itself into several directories of the Symbian OS intending to make sure it will run each time the user boots the phone. Fortunately, this operation is disallowed in newer phone models. However, on older phones, worm components cannot be easily found without using custom file manager applications. Cabir does not enumerate Bluetooth devices; instead, it tries to find only the first such device and communicates with that device. The standard Bluetooth range is about 30 feet, and apparently not all Bluetooth devices like to communicate with each other. (However, researchers such as Mark Rowe are experienced with Bluetooth signal amplification and pointed out that attackers could utilize such technology to extend the Bluetooth range to about 300 feet, reliably.) In addition, researchers such as Ollie Whitehouse of @stake also demonstrated that Bluetooth devices are discoverable even in the so-called "non-discoverable" mode41. Several Bluetooth-related attack tools exist today including the most popular Bluesniff, Btscanner, PSMscan, and Redfang.

During the natural infection tests, Cabir first talked to a Bluetooth printer, which strangely acted as a "sticky" honeypot system and blocked the worm given that the printer did not support the Object Exchange (OBEX) protocol that is required to send a file. However, the worm successfully infected another phone as soon as I turned the Bluetooth printer off. Cabir is overly active in finding other phones and that can easily drain the battery of the phone similarly to natural situations when your phone is hopelessly attempting to find a provider without finding one in range.

A further problem is that you need to "hide" with mobile phones when you test replicate worms. Although the recipient needs to accept the incoming message to successfully receive the message, you do not want to infect another phone "by accident." In fact, there are several known vulnerabilities of Bluetooth systems, and some of these can be utilized to execute arbitrary code on Pocket PC devices42, while others can be used to implement phishing attacks on a number of smart phones types43.

Sure enough, in the future you can expect that worms are going to make phone calls from your mobile phone instead of you. There might be a new era of MMS- (Multimedia Messaging Service) based mass mailer worms as well as SMS- (Short Messages Services) based downloaders, porn dialers, and spammer applications, as well. Who is going to pay the bill?

References

1. Dr. Vesselin Bontchev, personal communication, 2004.

2. Nick FitzGerald, "When Love Came to Town," Virus Bulletin, June 2000, pp. 6-7.

3. Donn Seeley, "A Tour of the Worm," USENIX Conference, 1989, pp. 287-304.

4. Frederic Perriot and Peter Szor, "An Analysis of the Slapper Worm Exploit," Symantec Security Response, White Paper, April 2003, www.sarc.com/avcenter/whitepapers.html.

5. "The Cheese Worm," CERT Incident Note IN-2001-05, http://www.cert.org/incident_notes/IN-2001-05.html.

7. Alexsander Czarnowski, "Distributed DoS AttacksIs the AV Industry Ready?" Virus Bulletin Conference, 2000, pp. 133-142.

8. Ido Dubrawsky, "Effects of Worms on Internet Routing Stability," Security Focus, June 2003, http://www.securityfocus.com/infocus/1702.

9. Frederic Perriot, "Crack Addict," Virus Bulletin, December 2002, pp. 6-7, http://www.virusbtn.com/resources/viruses/indepth/opaserv.xml.

10. National Bureau of Standards, "Data Encryption Standard," FIPS Publication 46, U.S. Department of Commerce, 1977.

11. Electronic Frontier Foundation, "Cracking DES," Sebastopol, CA, 1998, ISBN: 1-56592-520-3 (Paperback).

12. Dr. Steve R. White, "Covert Distributed Processing with Computer Viruses," Advances in CryptologyCRYPTO '89, Springer-Verlag, 1990, pp. 616-619.

13. Vesselin Bontchev, "Anatomy of a Virus Epidemic," Virus Bulletin Conference, 2001, pp. 389-406.

14. Eugene H. Spafford, "The Internet Worm Program: An Analysis," 1988.

15. Katrin Tocheva, "Worming the Internet Part 2," Virus Bulletin, November 2001, pp. 12-13.

16. Peter Ferrie, "Sleep-Inducing," Virus Bulletin, April 2003, pp. 5-6.

17. Katrin Tocheva, Mikko Hypponen, and Sami Rautiainen , "Melissa," March 1999, http://www.f-secure.com/v-descs/melissa.shtml.

18. Peter Ferrie, "Magisterium Abraxas," Virus Bulletin, May 2001, pp. 6-7.

19. Gabor Szappanos and Tibor Marticsek, "Patriot Games," Virus Bulletin, July 2004, pp. 6-9.

20. Peter Ferrie and Peter Szor, "Sircamstantial Evidence," Virus Bulletin, September 2001, pp. 8-10.

21. Dmitry O. Gryaznov, "Virus Patrol: Five Years of Scanning the Usenet," Virus Bulletin Conference 2002, pp. 195-198.

22. Peter Szor, "ParvoOne Sick Puppy?" Virus Bulletin, January 1999, pp. 7-9.

23. Atli Gudmundsson and Andre Post, "[email protected]," http:[email protected]html.

24. Peter Szor, "Happy Gets Lucky?" Virus Bulletin, April 1999, pp. 6-7.

25. Stuart McClure, Joel Scambray, and George Kurtz, "Hacking Exposed: Network Security Secrets and Solutions," 3rd Edition, Osborn/McGraw-Hill, Berkeley, 2001, ISBN: 0-07-219381-6 (Paperback).

26. Vern Paxson, Stuart Staniford, and Nicholas Weaver, "How to 0wn the Internet in Your Spare Time," http://www.icir.org/vern/papers/cdc-usenix-sec02/.

27. Neal Hindocha and Eric Chien, "Malicious Threats and Vulnerabilities in Instant Messaging," Symantec Security Response, White Paper, October 2003, www.sarc.com/avcenter/whitepapers.html.

28. "Buffer Overflow in AOL Instant Messenger," http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2002-0005.

29. Sergei Shevchenko, "[email protected]," February 2002, http:[email protected].html.

30. Katrin Tocheva and Erdelyi Gergely, "Aplore," April 2002, http://www.f-secure.com/v-descs/aplore.shtml.

31. Marious van Oers, "Digital Rivers of Babylonia," Virus Bulletin, February 2000, pp. 6-7.

32. Ronald L. Rivest, Adi Shamir, and Leonard Adleman , "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, v-21, n-2, February 1978, pp. 120-126.

33. Andrew "bunnie" Huang, "Hacking the Xbox," Xenatera LLX, San Francisco, 2003, ISBN: 1-59327-029-1.

34. Nick Fitzgerald, personal communication, 2001.

35. Sen Hittel, "Modap OpenSSL Worm Analysis," Security Focus, September 16, 2002.

36. Dr. Igor Muttik, personal communication, 2004.

37. Vesselin Bontchev, "Are 'Good' Computer Viruses Still a Bad Idea?" EICAR, 1994, pp. 25-47.

38. Heather Shannon, Symantec Security Response, personal communication, 2004.

39. Edward Fredkin, "On the Soul," 2000, Draft Paper, http://www.digitalphilosophy.org/on_the_soul.htm.

40. Peter Ferrie and Peter Szor, "Cabirn Fever," Virus Bulletin, August 2004, pp. 4-5, http://pferrie.tripod.com/vb/cabir.pdf.

41. Ollie Whitehouse, "Redfang: The Bluetooth Device Hunter," 2003.

42. "WIDCOMM Bluetooth Communication Software Multiple Buffer Overflow Vulnerabilities," http://www.securityfocus.com/bin/10914/discussion.

43. "Bluetooth Information Disclosure Vulnerability," http://www.securityfocus.com/bin/9024/discussion.

Chapter 10. Exploits, Vulnerabilities, and Buffer Overflow Attacks

"Greater is our terror of the unknown."

Titus Livius

10.1. Introduction

Exploits, vulnerabilities1, and buffer overflow techniques2 have long been used by malicious hackers and virus writers. Until recently, however, these techniques were not commonplace. The CodeRed3,4 worm was a major shock to the antivirus industry because it was the first worm that spread not as a file but solely in memory by utilizing a buffer overflow in Microsoft IIS. Many antivirus companies were unable to provide protection against CodeRed, while other companies with a wider focus on security could provide solutionsto the relief of end users.

Usually new techniques are picked up and used by copycat virus writers. Thus, many similarly successful worms followed CodeRed, such as Nimda5 and Badtrans6.

This chapter covers not only such techniques as buffer overflows and input validation exploits, but also how computer viruses are using them to their advantage.

10.1.1. Definition of Blended Attack

A blended threat is often referred to as a blended attack7. Some people refer to it as a combined attack or a mixed technique. Without attempting to make a strong definition here, I will say simply that, in the context of computer viruses, this term is typically used when the virus exploits some sort of security flaw of a system or an application to invade new systems. A blended threat exploits one or more vulnerabilities as the main vector of infection and might perform additional network attacks, such as a denial of service (DoS) attack, against other systems.

10.1.2. The Threat

Security exploits, commonly used by malicious hackers, are being combined with computer viruses, resulting in very complex attacks that sometimes go beyond the general scope of antivirus software.

In general, a large gap has existed between computer security companies, such as intrusion detection and firewall vendors and antivirus companies. For example, many past popular computer security conferences did not have any papers or presentations dealing with computer viruses. Some computer security people do not seem to consider computer viruses a serious aspect of security, or they ignore the relationship between computer security and computer viruses.

When the CodeRed worm appeared, there was obvious confusion about which kind of computer security vendors could prevent, detect, and stop the worm. Some antivirus researchers argued that there was nothing they could do about CodeRed; others tried to solve the problem with various sets of security techniques, software, and detection tools to support their customers' needs.

Interestingly, such intermediate solutions were often criticized by antivirus researchers. Instead of realizing that affected customers needed such tools, some antivirus researchers suggested that there was nothing to do but to install the security patch.

Obviously, this step is very important in securing the systems. At large corporations, however, the installation of a patch on thousands of systems might not be easy to deliver, especially so in lack of centralized patch management. Furthermore, corporations might have the valid fear that new patches could introduce a new set of problems, compromising system stability.

CodeRed and blended attacks in generalis a problem that must be addressed both by antivirus vendors and by other security product vendors so that multilayered security solutions can be delivered in a combined effort to deal with blended attacks.

10.2. Background

Blended attacks began in November 1988, with the introduction of the Morris worm8. The Morris worm exploited flaws in standard applications of BSD systems:

• The worm attempted to use a buffer overflow attack against VAX-based systems running a vulnerable version of fingerd. (More details on this attack come later.) This resulted in the automatic execution of the worm on a remote VAX system. The worm was able to execute this attack from either VAX or Sun systems, but the attack was only successful against targeted VAX systems. The code was not in place to identify the remote OS version, so the same attack was used against the fingerd program of Suns running BSD. This resulted in a core dump (crash) of fingerd on targeted Sun systems.
• The Morris worm also utilized the DEBUG command of the sendmail application. This command was only available in early implementations of sendmail. The DEBUG command made it possible to execute commands on a remote system by sending an SMTP (simple mail transfer protocol) message. This command was a potential mistake in functionality and was removed in later versions of sendmail. When the DEBUG command was sent to sendmail, someone could execute commands as the recipient of the message.
• Finally, the worm tried to utilize remote shell commands to attack new machines by using rsh from various directories. It demonstrated the possibility of cracking password files. The worm attempted to crack passwords to get into new systems. This attack was feasible because the password file was accessible and readable by everyone. Although the password file was encrypted, someone could encrypt test passwords and then compare them with the encrypted ones. The worm used a small dictionary of passwords that its author believed to be common or weak. Looking at the list of passwords in the author's dictionary, I have the impression that this was not the most successful of the worm's attacks. Indeed, this is used only as a last resort, when the other attacks had failed.

The Morris worm was not without bugs. Although the worm was not deliberately destructive, it overloaded and slowed down machines so much that it was very noticeable after repeated infections.

Thirteen years later, in July 2001, CodeRed repeated a very similar set of attacks against vulnerable versions of Microsoft Internet Information Server (IIS) systems. Using a well-crafted buffer overflow technique, the worm executed copies of itself (depending on its version) on Windows 2000 systems running vulnerable versions of Microsoft IIS. The slowdown effect was similar to that of the Morris worm.

Further information on the buffer overflow attacks is made available in this chapter (without any working attack code).

10.3. Types of Vulnerabilities

10.3.1. Buffer Overflows

Buffers are data storage areas that generally hold a predefined amount of finite data. A buffer overflow occurs when a program attempts to store data in a buffer, and the data is larger than the size of the buffer.

When the data exceeds the size of the buffer, the extra data can overflow into adjacent memory locations, corrupting valid data and possibly changing the execution path and instructions. Exploiting a buffer overflow makes it possible to inject arbitrary code into the execution path. This arbitrary code could allow remote system-level access, giving unauthorized access not only to malicious hackers, but also to replicating malware.

Buffer overflows are generally broken into multiple categories, based both on ease of exploitation and historical discovery of the technique. Although there is no formal definition, buffer overflows are, by consensus, broken into three generations:

• First-generation buffer overflows involve overwriting stack memory9.
• Second-generation overflows involve heaps, function pointers, and off-by-one exploits.
• Third-generation overflows involve format string attacks10 and vulnerabilities in heap structure management. Third-generation attacks often overwrite data just like "anywhere" in memory to force a change in execution flow indirectly according to what the attacker wants11.

For simplicity, the following explanations will assume an Intel CPU architecture, but the concepts can be applied to other processor designs.

10.3.2. First-Generation Attacks

First-generation buffer overflows involve overflowing a buffer that is located on the stack.

10.3.2.1 Overflowing a Stack Buffer

Listing 10.1 declares a buffer that is 256 bytes long. However, the program attempts to fill it with 512 bytes of the letter A (0x41).

Listing 10.1. A Bogus Function

int i;
void function(void)
{
char buffer[256];       //create a buffer

for(i=0;i<512;i++)      //iterate 512 times
buffer[i]='A';  //copy the letter A
}

Figure 10.1 illustrates how the EIP (where to execute next) is modified due to the program's overflowing the small 256-byte buffer.

Figure 10.1. An overflow of a return address.

When data exceeds the size of the buffer, the data overwrites adjacent areas of data stored on the stack, including critical values such as the instruction pointer (EIP), which defines the execution path. By overwriting the return EIP, an attacker can change what the program should execute next.

10.3.2.2 Exploiting a Stack Buffer Overflow

Instead of filling the buffer full of As, a classic exploit will fill the buffer with its own malicious code. Also instead of overwriting the return EIP (where the program will execute next) with random bytes, the exploit overwrites EIP with the address to the buffer, which is now filled with malicious code. This causes the execution path to change and the program to execute injected malicious code.

Figure 10.2 demonstrates a classic stack-based (first-generation) buffer overflow, but there are variations. The exploit utilized by CodeRed was a first-generation buffer overflow that is more complex and is described later.

Figure 10.2. A classic first-generation attack.

10.3.2.3 Causes of Stack-Based Overflow Vulnerabilities

Stack-based buffer overflows are caused by programs that do not verify the length of data being copied into a buffer. This is often caused by using common functions that do not limit the amount of data copied from one location to another.

For example, strcpy is a C programming language function that copies a string from one buffer to another; however, the function does not verify that the receiving buffer is large enough, so an overflow can occur. Many such functions have safer counterparts, such as strncpy, which takes an additional count parameter, specifying the number of bytes that should be copied. On BSD systems, even safer versions, such as strlcpy, are available.

Of course, if the count is larger than the receiving buffer, an overflow can still occur. Programmers often make counting errors and utilize counts that are off by one byte. This can result in a type of second-generation overflow called off-by-one.

10.3.3. Second-Generation Attacks

10.3.3.1 Off-By-One Overflows

Programmers who attempt to use relatively safe functions such as strncat do not necessarily make their programs much more secure from overflows. In fact, the strncat function often leads to overflows because of its rather unintuitive behavior, which is inconsistent with strncpy's behavior27. Errors in counting the size of the buffer can occur, usually resulting in a single-byte overflow, or off-by-one. This is because indexes starting with "0" are not intuitive to novice programmers.

Consider the program shown in Listing 10.2. The programmer has mistakenly utilized "less than or equal to" instead of simply "less than."

Listing 10.2. Another Bogus Program

#include <stdio.h>

int i;
void vuln(char *foobar)
{
char buffer[512];

for(i=0;i<512;i++)
buffer[i]=foobar[i]
}

void main(int argc, char *argv[])
{
if (argc==2)
vuln(argv[1]);
}

In this case, the stack will appear as shown in Table 10.1.

Table 10.1. The Resulting Stack State of the Aforementioned Bogus Program

0x0012FD74buffer[512]
......
......
0x0012FF74Old EBP=0x0012FF80
Ret EIP=0x00401048

Although one is unable to overwrite EIP because the overflow is only one byte, one can overwrite the least significant byte of EBP because it is a little-endian system. For example, if the overflow byte is set to 0x00, the old EBP is set to 0x0012FF00 as shown in Figure 10.3. Depending on the code, the fake EBP might be used in different ways. Often, a mov esp, ebp will follow, causing the stack frame pointer to be in the location of buffer. Thus buffer can be constructed to hold local variables, an EBP, and more importantly, an EIP that redirects to the malicious code.

Figure 10.3. An off-by-one overflow.

The local variables of the fake stack frame are processed, and the program continues execution at the fake return EIP, which has been set to injected code located in the stack. Thus a single-byte overflow could allow arbitrary code execution.

10.3.3.2 Heap Overflows

A common misconception of programmers is that by dynamically allocating memory (utilizing heaps), they can avoid using the stack, reducing the likelihood of exploitation. Although stack overflows might be the "low-hanging fruit," utilizing heaps does not eliminate the possibility of exploitation.

10.3.3.3 The Heap

A heap is memory that has been dynamically allocated. This memory is logically separate from the memory allocated for the stack and code. Heaps are dynamically created (for instance, "new, malloc") and removed (for instance, "delete, free").

Heaps are generally used because the amount of memory needed by the program is not known ahead of time or is larger than the stack.

The memory where heaps are located generally does not contain return addresses such as the stack. Thus without the ability to overwrite saved return addresses, redirecting execution is potentially more difficult. However, that does not mean utilizing heaps makes one secure from buffer overflows and exploitation.

10.3.3.4 Vulnerable Code

Listing 10.3 shows a sample program with a heap overflow. The program dynamically allocates memory for two buffers. One buffer is filled with As. The other is taken in from the command line. If too many characters are typed on the command line, an overflow will occur.

Listing 10.3. A Heap Overflow Example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void main(int argc, char **argv)
{
char *buffer = (char *) malloc(16);
char *input = (char *) malloc(16);

strcpy(buffer,"AAAAAAAAAAAAAAA");

// Use a non-bounds checked function
strcpy(input,argv[1]);
printf("%s",buffer);
}

With a normal amount of input, memory will appear as shown in Table 10.2.

Table 10.2. Memory Layout with Normal Input

00300350InputBBBBBBBBBBBBBBBB
00300360?????????????????????
00300370BufferAAAAAAAAAAAAAAAA

However, if one inputs a large amount of data to overflow the heap, one can potentially overwrite the adjacent heap, as shown in Table 10.3.

Table 10.3. Memory Layout with Abnormal Input

00300350InputBBBBBBBBBBBBBBBB
00300360?????BBBBBBBBBBBBBBBB
00300370BufferBBBBBBBBAAAAAAAA
10.3.3.5 Exploiting the Overflow

In a stack overflow, one could overflow a buffer and change EIP. This allows an attacker to change EIP to point to exploit code, usually in the stack.

Overflowing a heap does not typically affect EIP. However, overflowing heaps can potentially overwrite data or modify pointers to data or functions. For example, on a locked-down system one might not be able to write to C:\AUTOEXEC.BAT. However, if a program with system rights had a heap buffer overflow, instead of writing to some temporary file, someone could change the pointer to the temporary filename to point instead to the string C:\AUTOEXEC.BAT, inducing the program with system rights to write to C:\AUTOEXEC.BAT. This results in a user-rights elevation.

In addition, heap buffer overflows (just like stack overflows) also can result in a denial of service, allowing an attacker to crash an application.

Consider the vulnerable program shown in Listing 10.4, which writes characters to C:\HARMLESS.TXT.

Listing 10.4. A Vulnerable Program

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void main(int argc, char **argv)
{
int i=0,ch;
FILE *f;
static char buffer[16], *szFilename;
szFilename = "C:\\harmless.txt";

ch = getchar();
while (ch != EOF)
{
buffer[i] = ch;
ch = getchar();
i++;
}
f = fopen(szFilename, "w+b");
fputs(buffer, f);
fclose(f);
}

Memory will appear as shown in Table 10.4.

Table 10.4. Memory Layout of the Program Shown in Listing 10.4

0x00300ECBargv[1]
.........
0x00407034*szFilenameC:\harmless.txt
.........
0x00407680Buffer
0x00407690szFilename0x00407034

Notice that the buffer is close to szFilename. Both variables are placed to static heap (global data section), which is typically merged into the ".data" section of a PE file on Windows. If the attacker can overflow the buffer, the attacker can also overwrite the szFilename pointer and change it from 0xx00407034 to another address. For example, changing it to 0x00300ECB, which is argv[1], allows one to change the filename to any arbitrary filename passed in on the command line.

For example, if the buffer is equal to XXXXXXXXXXXXXXXX00300ECB and argv[1] is C:\AUTOEXEC.BAT, memory appears as shown in Table 10.5.

Table 10.5. Memory Layout During Attack

0x00300ECBargv[1]C:\AUTOEXEC.BAT
.........
0x00407034 *szFilenameC:\harmless.txt
.........
0x00407680BufferXXXXXXXXXXXXXXX
0x00407690szFilename0x00300ECB

Notice that szFilename has changed and now points to argv[1], which is C:\AUTOEXEC.BAT. Although heap overflows might be more difficult to exploit than the average stack overflow, the increased difficulty does not stop dedicated, intelligent attackers from exploiting them. The exploit code of Linux/Slapper worm (discussed in this chapter) makes this very clear.

10.3.3.6 Function Pointers

Another second-generation overflow involves function pointers. A function pointer occurs mainly when callbacks occur. If in memory, a function pointer follows a buffer, there is the possibility to overwrite the function pointer if the buffer is unchecked. Listing 10.5 is a simple example of such code.

Listing 10.5. An Example Application Using Function Pointers

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int CallBack(const char *szTemp)
{
printf("CallBack(%s)\n", szTemp);
return 0;
}

void main(int argc, char **argv)
{
static char buffer[16];
static int (*funcptr)(const char *szTemp);

funcptr = (int (*)(const char *szTemp))CallBack;
strcpy(buffer, argv[1]); // unchecked buffer

(int)(*funcptr)(argv[2]);
}

Table 10.6 shows what memory looks like when this code is executed.

00401005CallBack()
004013B0system()...
.........
004255D8buffer????????
004255DCfuncptr00401005

For the exploit, one passes in the string ABCDEFGHIJKLMNOP004013B0 as argv[1], and the program will call system() instead of CallBack(). In this case, usage of a NULL (0x00) byte renders this example inert. Avoiding a NULL byte is left to the reader, to avoid exact exploit code (see Table 10.7).

00401005CallBack()
004013B0system()...
.........
004255D8bufferABCDEFGHIJKLMNOP
004255EEfuncptr004013B0

This demonstrates another nonstack overflow that allows the attacker to run any command by spawning system() with arbitrary arguments.

10.3.4. Third-Generation Attacks

10.3.4.1 Format String Attacks

Format string vulnerabilities occur due to sloppy coding by software engineers. A variety of C language functions allow printing of characters to files, buffers, and the screen. Not only do these functions place values on the screen, they also can format them. Table 10.8 shows common ANSI format functions:

Table 10.8. ANSI Format Functions

 printf Print formatted output to the standard output stream wprintf Wide-character version of printf fprintf Print formatted data to a stream (usually a file) fwprintf Wide-character version of fprintf sprintf Write formatted data to a string swprintf Wide-character version of sprintf vprintf Write formatted output using a pointer to a list of arguments vwprintf Wide-character version of vprintf vfprintf Write formatted output to a stream using a pointer to a list of arguments vwfprintf Wide-character version of vfprintf

The formatting ability of these functions allows programmers to control how their output is written. For example, a program could print the same value in both decimal and hexadecimal.

Listing 10.6. An Application Using Format Strings

#include <stdio.h>
void main(void)
{
int foo =1234;
printf("Foo is equal to: %d (decimal), %X (hexadecimal)", foo, foo);
}

The program shown in Listing 10.6 would display the following:

Foo is equal to: 1234 (decimal), 4D2 (hexadecimal)

The percent sign (%) is an escape character signifying that the next character(s) represent the form the value should be displayed. Percent-d (%d), for example, means to "display the value in decimal format," and %X means to "display the value in hexadecimal with uppercase letters." These are known as format specifiers.

The format function family specification requires the format control and then an optional number of arguments to be formatted (as shown in Figure 10.4).

Figure 10.4. The format function specification.

Sloppy programmers, however, often do not follow this specification. The program in Listing 10.7 will display Hello World! on the screen, but it does not strictly follow the specification. The commented line demonstrates the proper way the program should be written.

Listing 10.7. A Bogus Program Using Incorrect Formatting Syntax

#include <stdio.h>
void main(void)
{
char buffer[13]="Hello World!";
printf(buffer);                         // using argument as format control!
// printf("%s",buffer); this is the proper way
}

Such sloppy programming allows an attacker the potential to control the stack and inject and execute arbitrary code. The program in Listing 10.8 takes in one command-line parameter and writes the parameter back to the screen. Notice that the printf statement is used incorrectly by using the argument directly instead of a format control.

Listing 10.8. Another Sloppy Program Subject to Code Injection Attacks

int vuln(char buffer[256])
{
int nReturn=0
printf(buffer);                         //print out command line
// printf("%s",buffer);                 // correct-way
return(nReturn);
}

void main(int argc,char *argv[])
{
char buffer[256]="";                    // allocate buffer
if (argc == 2)
{
strncpy(buffer,argv[1],255);    // copy command line
}
vuln(buffer);                           // pass buffer to bad function
}

This program copies the first parameter on the command line into a buffer, and then the buffer is passed to the vulnerable function. Because the buffer is being used as the format control, however, instead of feeding in a simple string, one can attempt to feed in a format specifier.

For example, running this program with the argument %X will return some value on the stack instead of %X:

C:\>myprog.exe %X
401064


The program interprets the input as the format specifier and returns the value on the stack that should be the passed-in argument. To understand why this is the case, examine the stack just after printf is called. Normally, if one uses a format control and the proper number of arguments, the stack will look similar to what is shown in Figure 10.5.

Figure 10.5. The state of the stack with the proper number of arguments.

However, by using printf() incorrectly, the stack will look different, as shown in Figure 10.6.

Figure 10.6. Incorrect use of print().

In this case, the program believes that the stack space after the format control is the first argument. When %X is fed to our sample program, printf displays the return address of the previous function call instead of the expected missing argument. In this example we display arbitrary memory, but the key to exploitation from the point of view of a malicious hacker or virus writer would be the ability to write to memory for the purpose of injecting arbitrary code.

The format function families allow for the %n format specifier, which will store (write to memory) the total number of bytes written. For example, the following line will store 6 in nBytesWritten because foobar consists of six characters:

printf("foobar%n",&nBytesWritten);

Consider the following:

C:\>myprog.exe foobar%n

When executed, instead of displaying the value on the stack after the format control, the program attempts to write to that location. So instead of displaying 401064 as demonstrated previously, the program attempts to write the number 6 (the total number of characters in foobar) to the address 401064. The result is an application error message, as shown in Figure 10.7.

Figure 10.7. Myprog.exe crashes and proves exploitability.

This does demonstrate, however, that one can write to memory. With this ability, one would actually wish to overwrite a return pointer (as in buffer overflows), redirecting the execution path to injected code. Examining the stack more fully, the stack appears as shown in Figure 10.8.

Figure 10.8. The stack state of myprog.exe before exploitation.

Knowing how the stack appears, consider the following exploit string:

C:>myprog.exe AAAA%x%x%n

This results in the exploited stack state, as shown in Figure 10.9.

Figure 10.9. An example exploitation of stack using tricky format strings.

The first format specifier, %x, is considered the return address (0x401064); the next format specifier, %x, is 0x12FE84. Finally, %n will attempt to write to the address specified by the next DWORD on the stack, which is 0x41414141 (AAAA). This allows an attacker to write to arbitrary memory addresses.

Instead of writing to address 0x41414141, an exploit would attempt to write to a memory location that contains a saved return address (as in buffer overflows). In this example, 0x12FE7C is where a return address is stored. By overwriting the address in 0x12FE7C, one can redirect the execution path. So instead of using a string of As in the exploit string, one would replace them with the value 0x12FE7C, as shown in Figure 10.10.

Figure 10.10. Exploitation of a return address.

The return address should be overwritten by the address that contains the exploit code. In this case, that would be at 0x12FE84, which is where the input buffer is located. Fortunately, the format specifiers can include how many bytes to write using the syntax, %.<bytestowrite>x. Consider the following exploit string:

<exploitcode>%x%x%x%x%x%x%x%x%x%.622404x%.622400x%n\x7C\xFEx12

This will cause printf to write the value 0x12FE84 (622404+622400=0x12FE84) to 0x12FE7C if the exploit code is two bytes long. This overwrites a saved return address, causing the execution path to proceed to 0x12FE84, which is where an attacker would place exploit code, as shown in Figure 10.11.

Figure 10.11. Exploitation of a return address to exploit the buffer.

For the sake of completeness, I need to mention that most library routines disallow the use of large values as a length specifier, and others will crash with large values behind a "%" specifier12. Thus the usual attacks write a full DWORD (32-bit value) in four overlapping writes similar to the following sequence:

      000000CC (1^st write)
000001BB   (2^nd write)
000002AA     (3^rd write)
000003CC       (4^th write)
CCAABBCC (complete DWORD)


By not following the specification exactly, programmers can allow an attack to overwrite values in memory and execute arbitrary code. While the improper usage of the format function family is widespread, finding vulnerable programs is also relatively easy.

Most popular applications have been tested by security researchers for such vulnerabilities. Nevertheless, new applications are developed constantly and, unfortunately, developers continue to use the format functions improperlyleaving them vulnerable.

10.3.4.2 Heap Management

Heap management implementations vary widely. For example, the GNU C malloc is different from the System V routine. However, malloc implementations generally store management information within the heap area itself. This information includes such data as the size of memory blocks and is usually stored right before or after the actual data.

Thus by overflowing heap data, one can modify values within a management information structure (or control block). Depending on the memory management functions' actions (for instance, malloc and free) and the specific implementation, one can cause the memory management function to write arbitrary data at arbitrary memory addresses when it utilizes the overwritten control block.

10.3.4.3 Input Validation

Input validation exploits take advantage of programs that do not properly validate user-supplied data. For example, a Web page form that asks for an e-mail address and other personal information should validate that the e-mail address is in the proper form and does not contain special escape or reserved characters.

Many applications, such as Web servers and e-mail clients, do not properly validate input. This allows hackers to inject specially crafted input that causes the application to perform in an unexpected manner.

Although there are many types of input validation vulnerabilities, URL canonicalization and MIME header parsing are specifically discussed here due to their widespread usage in recent blended attacks.

10.3.4.4 URL Encoding and Canonicalization

Canonicalization is the process of converting something that can exist in multiple different but equally valid forms into a single, common, standard form. For example, C:\test\foo.txt and C:\test\bar\..\foo.txt are different full pathnames that represent the same file. Canonicalization of URLs occurs in a similar manner, where http://doman.tld/user/foo.gif and http://domain.tld/user/bar/../foo.gif would represent the same image file.

A URL canonicalization vulnerability results when a security decision is based on the URL and not all of the URL representations are taken into account. For example, a Web server might allow access only to the /user directory and subdirectories by examining the URL for the string /user immediately after the domain name. For example, the URL http://domain.tld/user/../../autoexec.bat would pass the security check but actually provide access to the root directory.

After more widespread awareness of URL canonicalization issues due to such an issue in Microsoft Internet Information Server, many applications added security checks for the double-dot string (..) in the URL. However, canonicalization attacks were still possible due to encoding.

For example, Microsoft IIS supports UTF-8 encoding such that %2F represents a forward slash (/). UTF-8 translates US-ASCII characters (7 bits) into a single octet (8 bits) and other characters into multioctets. Translation occurs as follows:

0-7 bits	0xxxxxxx
8-11 bits	110xxxxx 10xxxxxx
12-16 bits	1110xxxx 10xxxxxx 10xxxxxx
17-21 bits	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx


For example, a slash (0x2F, 0101111) would be 00101111 in UTF-8. A character with the hexadecimal representation of 0x10A (binary 100001010) has 9 bits and thus a UTF-8 representation of 11000100 10001010.

Although standard encoding did not defeat the aforementioned input validation security checks, Microsoft IIS still provides decoding, if one encodes a 7-bit character using the 811 bit rule format.

For example, a slash (0x2F, 101111) in 811 bit UTF-8 encoding would be 11000000 10101111 (hexadecimal 0xC0 0xAF). Thus, instead of using the URL

http://domain.tld/user/../../autoexec.bat

one could substitute the slash with the improper UTF-8 representation:

http://domain.tld/user/..%co%af../autoexec.bat.

The input validation allows this URL because it does not recognize the UTF-8-encoded forward slash, which gives access outside the Web root directory. Microsoft fixed this vulnerability with Security Bulletin MS00-78.

In addition, Microsoft IIS performs UTF-8 decoding on two separate occasions. This allows characters to be double-encoded. For example, a backslash (0x5C) can be represented as %5C. However, one also can encode the percent sign (0x25) itself. Thus, %5C can be encoded as %255c. On the first decoding pass, %255c is decoded to %5c, and on the second decoding pass, %5C is decoded to a backslash.

Thus, a URL such as http://domain.tld/user/..%5c../autoexec.bat will not pass the input validation check, but http://domain.tld/user/..%255c../autoexec.bat would pass, allowing access outside the Web root directory. Microsoft fixed this vulnerability with Security Bulletin MS01-26.

The inability of Web servers to provide proper input validation can lead to attacks. For example, in IIS, one can utilize the encoding vulnerabilities to break out of the Web root directory and execute CMD.EXE from the Windows system directory, allowing remote execution. W32/Nimda utilized such an attack to copy itself to the remote Web server and then execute itself.

When Internet Explorer parses a file, it can contain embedded MIME-encoded files. Handling of these files occurs by examining a header, which defines the MIME type. Using a look-up table, these MIME types are associated with a local application. For example, the MIME type audio/basic is generally associated with Windows Media Player. Thus, MIME-encoded files designated as audio/basic are passed to Windows Media Player.

MIME types are defined by a Content-Type header. In addition to the associated application, each type has a variety of associated settings, including the iconindicating whether to show the extension and whether to pass the file automatically to the associated application when the file is being downloaded.

When an HTML e-mail is received with Microsoft Outlook and some other e-mail clients, code within Internet Explorer actually renders the e-mail. If the e-mail contains a MIME-embedded file, Internet Explorer parses the e-mail and attempts to handle embedded MIME files. Vulnerable versions of Internet Explorer would check whether the application should be opened automatically (passed to the associated application without prompting) by examining the Content-Type header. For example, audio/x-wav files are automatically passed to Windows Media Player for playing.

A bug exists in vulnerable versions of Internet Explorer, however, where files are passed to the incorrect application. For example, a MIME header might appear as follows:

Content-Type: audio/x-wav;
name="foobar.exe"
Content-Transfer-Encoding: base64
Content-ID: <CID>


In this case, Internet Explorer determines that the file should be passed automatically to the associated application (no prompting) because the content type is audio/x-wav. However, when determining what the associated application is, instead of utilizing the Content-Type header (and the file header itself), Internet Explorer incorrectly relies on a default association, which will be made according to the extension. In this case, the extension is .EXE, and the file is passed to the operating system for executioninstead of passing the audio file to an associated application to be played.

This bug allows for the automatic execution of arbitrary code. Several Win32 mass-mailers send themselves via an e-mail with a MIME-encoded malicious executable and a malformed header, and the executable silently executes unbeknownst to the user. This occurs whenever Internet Explorer parses the mail and can happen when a user simply reads or previews e-mail. Such e-mail worms can spread themselves without any user actually executing or detaching a file.

Any properly associated MIME file type that has not set the "Confirm open after download" flag can be utilized for this exploit. Thus a definitive list is unavailable because developers can register their own MIME types.

Such an exploit was utilized by both W32/Badtrans and W32/Klez, allowing them to execute themselves upon a user's reading or previewing an infected e-mail.

10.3.4.6 Application Rights Verification

Although improper input validation can give applications increased access, as with URL canonicalization, other models simply give applications increased rights due to improper designation of code as "safe." Such a design is employed by ActiveX. As a result, numerous blended attacks have also used ActiveX control rights verification exploits13.

10.3.4.7 Safe-for-Scripting ActiveX Controls

By design, ActiveX controls are generally scriptable. They expose a set of methods and properties that can potentially be invoked in an unforeseen and malicious manner, often via Internet Explorer.

The security framework for ActiveX controls requires developers to determine whether their ActiveX control could potentially be used in a malicious manner. If a developer determines a control is safe, the control can be marked "safe for scripting."

Microsoft notes that ActiveX controls that have any of the following characteristics must not be marked safe for scripting:

• Accessing information about the local computer or user
• Exposing private information on the local computer or network
• Modifying or destroying information on the local computer or network
• Faulting of the control and potentially crashing the browser
• Consuming excessive time or resources, such as memory
• Executing potentially damaging system calls, including executing files
• Using the control in a deceptive manner and causing unexpected results

However, despite these simple guidelines, some ActiveX controls with these characteristics have been marked safe for scriptingand used maliciously.

For example, VBS/Bubbleboy used the Scriptlet.Typelib14 ActiveX control to write out a file to the Windows Startup directory. The Scriplet.Typelib contained properties to define the path and contents of the file. Because this ActiveX control was incorrectly marked safe for scripting, one could invoke a method to write a local file via a remote Web page or HTML e-mail without triggering any ActiveX warning dialog.

ActiveX controls that have been marked safe for scripting can easily be determined by examining the Registry. If the safe-for-scripting CLSID key exists under the Implemented Categories key for the ActiveX control, the ActiveX control is marked safe for scripting.

For example, the Scriptlet.Typelib control has a class ID of {06290BD5-48AA-11D2-8432-006008C3FBFC} , and the safe-for-scripting CLSID is { 7DD95801-9882-11CF-9FA0-00AA006C42C4} . In the Registry, an unpatched system would contain the key

HKCR/CLSID/{06290BD5-48AA-11D2-8432-006008C3FBFC}/Implemented Categories /{7DD95801-9882-11CF-9FA0-00AA006C42C4}

This allows any remote Web page or incoming HTML e-mail to create malicious files on the local system. Clearly, leaving such security decisions to the developer is far from foolproof.

10.3.4.8 System Modification

After malicious software has gained access to the system, the system is often modified to disable application or user rights verification. Such modifications can be as simple as eliminating a root password or modifying the kernel, allowing user rights elevation or previously unauthorized access.

For example, CodeRed creates virtual Web roots, allowing general access to the compromised Web server, and W32/Bolzano15 patches the kernel, disabling user rights verification on Windows NT systems. System modificationbased exploitation techniques, however, were also possible on older systems such as Novell NetWare environments.

10.3.4.8.1 NetWare ExecuteOnly Attribute: Consider Harmful

For many years, the Novell NetWare ExecuteOnly attribute has been recommended as a good way to secure programs against virus infections and to disallow illegal copying. Unfortunately, this attribute is not safe and, in some cases, can be harmful. A small bug in old versions of NetWare Workstation Shell (for instance, NET3.COM from 1989) made it possible for some viruses to write to ExecuteOnly files, even though these viruses had no NetWare-specific code at all16.

The same buggy shell still works with newer Novell NetWare versions, which means that the problem exists even today. When running on top of a buggy network shell, fast infectors can write into ExecuteOnly files on any NetWare version. In addition, there are other ways to exploit the same vulnerability on any version of the NetWare shell.

This happens because the ExecuteOnly attribute is not a NetWare right but a plain NetWare attribute. The workstation shell is supposed to handle it, but in an unsecured DOS workstation, this is impossible. Because there is no write protection at the server side for ExecuteOnly files, any program capable of subverting the shell checks can not only read and copy ExecuteOnly files, but can also write to them.

Because ExecuteOnly files cannot normally be read by any program or even by the Administrator, an infected ExecuteOnly file cannot be detected by standard virus scanners. This makes recovery from such an infection a major problem.

10.3.4.8.2 Execute, Only?

Under Novell NetWare it is possible to mark program files as ExecuteOnly. Only the supervisor has the rights to do this. Because these files are not available for read by anyone, including the supervisor, this is supposed to make them unreachable for any kind of access except execution or deletion (by the supervisor only). Unfortunately, this method is not perfect, and the problem is fundamental.

Before the workstation can access files from the server, it must load the NetWare client software into memory. In the case of a DOS workstation, the client software consists of two TSR programs. The client must first start IPX (low-level communication protocol driver) and then NETx, the Workstation Shell. NETx adds several new subfunctions to the DOS INT 21h handler.

When IPX and NETx are running, the user of the workstation can execute LOGIN to connect to the server. When the client requests a file, the shell checks the location of the file. There are no special functionalities in NETx for requests for local files because it redirects all requests that need replies from the server. Thus all program execution requests, EXEC (INT 21h/AH=4Bh) calls, are redirected to the file server if the path of the executable is mapped from the server.

Every EXEC function (implicitly) must read the program code into the workstation memory first. To do this, it must be able to open the file for reading. ExecuteOnly files are not available to open by any user (including the supervisor), so the shell has to have a "backdoor." The workstation shell whispers, "I am the shell, and I want to execute this file; give me access to this file." And its open/read/close requests for this specific ExecuteOnly file succeed.

Note

I am not going to document here how the backdoor works. That's because I do not like to give ideas to virus writers.

In the buggy network shell, the operation continues with a simple "open file" function call:

1AA3:62EF B8003D        MOV     AX,3D00         ; Open for READ
1AA3:62F2 CD21          INT     21

Because the shell needs to open the file, the vulnerable code uses an Open function (3Dh) of INT 21h interrupt. Unfortunately, every resident program can monitor this function call since it goes through via the interrupt vector table. Thus the interrupt call is visible to fast infectors (viruses that hook INT 21h and wait for the Open subfunction). Without any efforts, this vulnerability allows viruses to infect files marked ExecuteOnly when executed on a workstation running the bogus shell.

To demonstrate this, I executed a test using the Burglar.1150.A virus, a typical fast infector. (Burglar was in the wild at the time of this experience.) I loaded the preceding vulnerable Workstation Shell version and Novell NetWare 4.10. I also created several test files in one directory by using supervisor rights and flagged them as ExecuteOnly.

I executed the virus on a workstation where I was logged in with normal user rights (no supervisor rights), but I did have directory modification rights to the test directory. The virus could infect ExecuteOnly files perfectly as long as I had directory modification rights on the test directory. That means there is no write protection on ExecuteOnly files at the server side on a file granularity level. If a workstation is able to write to an ExecuteOnly file, the NetWare server will allow it!

For another simple test, I executed the DOS internal command ECHO and redirected its output to an ExecuteOnly file. The command was the following:

ECHO X >> test.com

To my surprise, I was able to modify ExecuteOnly files with this method as well.

Now, let's see what Novell did to fix the vulnerability. Apparently, there is a minor, but crucial change to how the shell opens a file. The shell calls its own INT 21h entry point directlynot through the interrupt vector table. This means that the call is no longer visible to other TSR programs (including viruses). The shell's code is fixed the following way:

1AB3:501D B8003D        MOV     AX,3D00
1AB3:5020 9C  &nbs