Crawling - A unique spreading technique

freeon
DoomRiderz #1
2007


This article is the basis for talking about a possibly overlooked but effective spreading technique. We often think of spreading our worms/viruses based on whatever a user uses to communicate with other humans. This usually takes the form of email, IRC or any instant messaging application, p2p, or exploit scanning. Each one of these techniques is usually enough to propagate our worm to another host.

The main problem with each of these techniques is finding users to spread to. If we spread using email we usually carry a built-in SMTP engine or use Outlook, and then search the user's contacts folder or scan the user's computer for emails and HTML files, looking for contacts to spread to. We try these techniques because we can assume that the person we are spreading to already knows the infected host. If they know each other there might be a level of trust, a trust big enough for the recipient to open our virus/worm and execute it.

So if we take spreading a step further and break out of the host computer, we can try something that most search engines and bots already use: crawling. Crawling the maze of the internet brings many advantages, but the most important is that we are now targeting the masses instead of one host per run. The obvious downside of this method is that you lose the trust between the host and its possible victims. A user may be less inclined to open an email with an executable attachment if it comes from an unknown sender, but after all these years we are still able to find an idiot out there willing to run it.

Another, and probably the most important, downside is that you lose total control. By which I mean your worm will grow and spread at a much faster rate. When I say a faster rate, I mean that in under a 4 hour period I was able to find over 400,000 email addresses. Keep in mind, of course, that with this many possibilities not everyone will open the email and probably not all the emails will get sent. But a single crawling engine in a single worm seeing over 400,000 email addresses to spread to within 4 hours is still a very impressive feat. Obviously, spreading to each and every email found, with multiple worms out there each doing the same thing, will have heavy repercussions, one of which is heavy network traffic at an alarming rate. And by an alarming rate I mean that if you write one that has no control and doesn't limit itself, you are going to be looking at an epidemic.
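
If you want a feel for what limiting looks like, here is a minimal sketch of a throttled fetch loop. It is not part of the engine at the end of this article; the names (MaxQueue, DelayMs) and the seed URL are just placeholders for illustration.

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

class ThrottledFetch
{
    const int MaxQueue = 50;    //never hold more than 50 pending sites
    const int DelayMs = 5000;   //wait 5 seconds between requests

    static void Main()
    {
        Queue<string> queue = new Queue<string>();
        queue.Enqueue("http://www.example.com/");   //placeholder seed

        while (queue.Count > 0)
        {
            string url = queue.Dequeue();
            try
            {
                using (WebClient wc = new WebClient())
                {
                    string html = wc.DownloadString(url);
                    Console.WriteLine("Fetched " + url + " (" + html.Length + " bytes)");
                    //parse html for links here, and only enqueue while queue.Count < MaxQueue
                }
            }
            catch (WebException)
            {
                //skip hosts that are down or unreachable
            }

            Thread.Sleep(DelayMs);  //throttle so a single engine can't flood the network
        }
    }
}

The engine below takes the same basic precaution by capping its queue at 50 hosts per run.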

Something else to note besides looking for email is obviously scanning a host for exploits. Much like worms of the past, such as Code Red, we can apply the same techniques to the sites we find while crawling, and since the current hype in network security seems to be finding web exploits in server-side scripting languages like PHP/ASP/Java, we can take advantage of these and use file upload or remote code execution exploits to propagate and run automatically.

A couple of paragraphs above I mentioned that within a 4 hour period I was able to gather more than 400,000 emails, which is great. But let us focus on finding more hosts than just the ones we crawl. We know that every email address contains what? A host! Take any address, say user@somehost.com, and we can easily parse it to get the host information and run a routine of URL scanning on those hosts as well. So now we have our list of crawled sites plus all the unique hosts pulled from the addresses.
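
As a rough sketch of that idea (the address here is just a placeholder, not one I harvested), pulling the host out of an address and turning it into something crawlable takes only a few lines:

using System;

class EmailToHost
{
    static void Main()
    {
        string email = "someone@somehost.com";  //placeholder address

        int at = email.IndexOf('@');
        if (at > 0 && at < email.Length - 1)
        {
            //everything after the @ is the host part of the address
            string host = email.Substring(at + 1);

            //queue both the bare host and the www. variant for crawling
            Console.WriteLine("http://" + host + "/");
            Console.WriteLine("http://www." + host + "/");
        }
    }
}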

What else can we look for when we are crawling besides emails? The answer is obvious. Every one of us has been to a forum, and on these forum profiles there is usually a place to enter our ICQ, MSN and AIM nicknames. It's a nice feature and one that can be taken full advantage of. When a user enters their nickname in one of these fields, a link usually appears on the forum, and each of these links makes a call like

aim:goim?screenname=<screenName>&message=Hello+Are+you+there?

If the user on our infected host does use AIM, we can obviously add our newfound user as a contact and then spread as normal. This method will be less effective than scanning for emails, but with the number of forums and places that use IM links it might still yield an impressive amount. Remember, the whole goal of crawling is to operate on a massive level.
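
Pulling the screen name back out of one of those links is trivial. A minimal sketch (the link and contact name are made up for the example):

using System;

class AimLinkParse
{
    static void Main()
    {
        string link = "aim:goim?screenname=somebuddy&message=Hello+Are+you+there?";

        //everything after the first '?' is the query string
        int q = link.IndexOf('?');
        if (q >= 0)
        {
            foreach (string pair in link.Substring(q + 1).Split('&'))
            {
                //split each pair into name and value, keep only screenname
                string[] kv = pair.Split(new char[] { '=' }, 2);
                if (kv.Length == 2 && kv[0] == "screenname")
                    Console.WriteLine("Found AIM contact: " + Uri.UnescapeDataString(kv[1]));
            }
        }
    }
}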

OK, so now we have opened our eyes a little to the possibilities of what crawling can do for a worm. I must warn you: you may think this will be a good thing for your virus/worm and go off and build a crawler engine, which is your decision, but you will unleash a monster, a monster you might not yet see. So please remember that just because you have power doesn't mean you should use it.

With all that said, below is a crawling engine I built. It's in no way perfect and probably has some issues. It has no spreading routine, only because I did not want it to be used by some slack-jawed newb. It's written in C#, so hopefully you can use it as an example of a crawling engine to follow along with this article. It is pretty cool to see the results.

Special shouts to

Anyone I forgot, I'm sorry, it's 2am so I'm tired.. :)

To build the code, create a Windows Forms project, drop a WebBrowser control named web and a Timer named time onto the form, and hook up the Form1_Load, web_DocumentCompleted and time_Tick handlers below to their events.

using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Threading;
using System.Net;
using System.IO;
using Microsoft.Win32;
using System.Text.RegularExpressions;

namespace Saga
{
    public partial class Form1 : Form
    {
        private string site;
        private Hashtable hashish = new Hashtable();
        private Hashtable oldhashish = new Hashtable();
        private ArrayList arrEmails = new ArrayList();
        private string k = "";
               
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //starting page we set this to some random site
            //then turn script error suppression on, set the visibility off
            //then navigate to the site and enable our timer.
            //once the page has been downloaded it will trigger web_DocumentCompleted event
            //the event will then parse the page looking for any hyperlinks or email addresses
            //the timer portion will load a new site in our queue each time and then remove it
            //from the list so we don't hit the same site more than once per run...
           
            string homepage = "http://vx.netlux.org/doomriderz/links.html";
            web.ScriptErrorsSuppressed = true;
            web.Visible = false;    //hide the browser control as described above
            web.Navigate(homepage);

            time.Enabled = true;
        }
       
        private void web_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            //check if the web document is null, this usually happens when we
            //don't have any content loaded up yet or it timed out...
            if (web.Document != null)
            {
                //loop through each html element looking for links + emails
                HtmlElementCollection htmlCol = web.Document.All;
                for (int i = 0; i < htmlCol.Count; i++)
                {
                    //retrieve all elements then check if they are links <a
                    HtmlElement htmlElement = htmlCol[i];
                    if (htmlElement.TagName.ToUpper() == "A")
                    {
                        //retrieve the url attribute of the link <a href="http://www.doomriderz.com">doomriderz</a>
                        string link = htmlElement.GetAttribute("href");
                        if (link != "")
                        {
                            //if the link doesn't contain mailto and contains http or www. then go to below...
                            //else run email check cause it might be a email...
                            if ((!link.Contains("mailto:")) && (link.Contains("http") || link.Contains("www.")))
                            {
                                //we don't want to crawl every page on the site
                                //just the one we found. I know u say what? why not? well
                                //the fact is we can find plenty of links on one page there
                                //is no need to get greedy...

                                //use TryCreate so a relative or malformed href doesn't throw
                                Uri uri;
                                if (!Uri.TryCreate(link, UriKind.Absolute, out uri))
                                    continue;

                                //if the host isn't in the old list and uri.Host isn't empty
                                if ((!oldhashish.Contains(uri.Host)) && uri.Host != "")
                                {
                                    //add all the links to our hashtable
                                    //this will act as our queue...
                                    if (!hashish.Contains(uri.Host))
                                    {
                                        if (hashish.Count < 50)
                                        {
                                            //add the site to the queue
                                            Console.WriteLine("Adding " + uri.Host + "|" + link);
                                            hashish.Add(uri.Host, link);
                                        }
                                    }
                                }
                            }
                            else
                            {
                                //replace the mailto portion
                                link = link.Replace("mailto:", "");
                                //clean up the address
                                string email = ExtractAddr(link);
                                if (email != "")
                                {
                                    //if we don't already have the email then
                                    //then we revalidate and then add the email to our list.
                                    if (!arrEmails.Contains(email))
                                    {
                                        string strValGex =  @"^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}" +
                                                            @"\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+" +
                                                            @"\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";
                                        Regex regVal = new Regex(strValGex);
                                        if (regVal.IsMatch(email))
                                        {
                                            if (!arrEmails.Contains(email))
                                            {
                                                //add the email to our email array list
                                                Console.WriteLine("Found email:" + email);
                                                arrEmails.Add(email);
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

        private void time_Tick(object sender, EventArgs e)
        {
            if (hashish.Count > 0)
            {
                IDictionaryEnumerator en = hashish.GetEnumerator();
                while (en.MoveNext())
                {
                    if (k == "")
                    {
                        k += en.Value.ToString();
                        Console.WriteLine("Site is " + k);
                    }
                    else if(k != en.Value.ToString())
                    {
                        if (!oldhashish.Contains(en.Key))
                        {
                            hashish.Remove(en.Key);
                            oldhashish.Add(en.Key, en.Value);
                            k = en.Value.ToString();
                            web.Navigate(k);
                            break;
                        }
                        Console.WriteLine("Site is " + k);
                    }
                    Console.WriteLine("----------------------------------");
                    Console.WriteLine("Current Site:" + web.Url.ToString());
                    Console.WriteLine("Current key:" + en.Key.ToString());
                    Console.WriteLine("Current value:" + en.Value.ToString());
                   
                }
            }
            else
            {
                //we didn't find any links so we need to navigate
                //to somewhere there is a lot of links...
                //a busy page with lots of outbound links works well as a reseed point
                web.Navigate("http://en.wikipedia.org/wiki/Iran");
            }
           
            Console.WriteLine("-------------- Stats -------------------------------");
            Console.WriteLine("Total number of links to follow:" + hashish.Count);
            Console.WriteLine("Total followed so far:" + oldhashish.Count);
            Console.WriteLine("Total Emails found so far:" + arrEmails.Count);
            Console.WriteLine("----------------------------------------------------");
        }

        public string ExtractAddr(string InputData)
        {
            /*** genetix's email extractor ***/
            string tmpExtractAddr = null;
            int AtPos, p1, p2, n = 0;
            string tmp = null;
            AtPos = (InputData.IndexOf("@", 0) + 1);
            p1 = 1;
            p2 = InputData.Length;
            tmpExtractAddr = "";
            if (AtPos == 0)
                return tmpExtractAddr;

            for (n = (AtPos - 1); n >= 1; n--)
            {
                tmp = InputData.Substring(n - 1, 1);
                if ((tmp == " ") | (tmp == "<") | (tmp == "(") | (tmp == ":") | (tmp == ",") | (tmp == "["))
                {
                    p1 = n + 1;
                    break;
                }
            }

            for (n = (AtPos + 1); n <= InputData.Length; n++)
            {
                tmp = InputData.Substring(n - 1, 1);
                if ((tmp == " ") | (tmp == ">") | (tmp == ")") | (tmp == ":") | (tmp == ",") | (tmp == "]"))
                {
                    p2 = n - 1;
                    break;
                }
            }

            //strip out any html and do some extra clean up
            string email = InputData.Substring(p1 - 1, (p2 - p1) + 1);
            email = Regex.Replace(email, @"<(.|\n)*?>", string.Empty);
            email = email.Replace("&nbsp;", "");
            email = email.Replace(" ", "");
            email = email.Replace(@"""", "");

            return email;
        }

    }
}
 