An Introduction To Malware
Anduamlak Berhanu
Rekik Kassahun
Introduction
• Malware is a general term for all types of malicious software, which in the context
of computer security means: Software which is used with the aim of attempting to
breach a computer system’s security policy with respect to Confidentiality, Integrity
or Availability.
• The computer system whose security policy the malware attempts to breach is usually
known as the target for the malware.
• A program P which would be classified as malware if initiated by a user with no
special privileges could be quite acceptable if executed by a system administrator
with extensive privileges on the target system.
• Malware is commonly divided into a number of classes, depending on the way in
which it is introduced into the target system and the sort of policy breach which it
is intended to cause
Terminologies
• Virus
Malware which spreads from one computer to another by embedding copies of
itself into files, which by some means or another are transported to the target.
The transport of the files to the target (the vector) may be initiated by the virus itself or rely on an
unsuspecting human user.
• Worm
Malware which spreads from one computer to another by transmitting copies of
itself via a network which connects the computers, without the use of infected files.
• Trojan horse
Malware which is embedded in a piece of software which has an apparently useful
effect. The useful effect is often known as the overt effect, as it is made apparent to
the receiver, while the effect of the malware, known as the covert effect, is kept
hidden from the receiver.
Terminologies
• Logic Bomb
Malware which is triggered by some external event, for example the creation or
deletion of a specific data item such as a file or a database entry.
• Rabbit
Malware which uses up all of a particular class of resource, such as message
buffers, file space or process control blocks, on a computer system.
• Backdoor
Malware which, once it reaches the target, allows the initiator to gain access to the
target without going through any of the normal login and authentication
procedures.
Techniques/Strategies
• A virus consists of two parts:
Insertion code: Code to insert a copy of the virus into one or
more files (victim files) on the target. Every virus contains this part.
Payload: Code to perform the malicious activity associated with the virus. This part is optional.
Techniques/Strategies
• Execution Strategy
A way of forcing the computer to execute the various parts of the virus and of the infected program.
• Disguise Strategy
The virus should not be directly visible, so the designer will attempt to disguise it by including nonsense code or by
encryption.
Code Placement
• The PE file header consists of an MS-DOS stub, the PE signature, the COFF
File Header, and an Optional Header.
• The MS-DOS Stub is a valid application that runs under MS-DOS and
is placed at the front of the .EXE image.
• The linker places a default stub here, which prints out the message
“This program cannot be run in DOS mode” when the
image is run in MS-DOS.
• At location 0x3c, the stub has the file offset to the Portable
Executable (PE) signature. This information enables Windows NT to
properly execute the image file, even though it has a DOS Stub.
PE Header
• After the MS-DOS stub, at the file offset specified at offset 0x3c, there is a
4-byte signature identifying the file as a PE format image file.
• This format is used in Win32, Posix on Windows NT,
and for some device drivers in Windows NT.
• Currently, this signature is “PE\0\0” (the letters “P” and “E” followed by
two null bytes).
• At the beginning of an object file, or immediately after the signature of an
image file, there is a standard COFF header of the following format.
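The lookup described above can be sketched in Python. This is a minimal illustration and not a full PE parser: the function names and the synthetic demo image are assumptions made for the example, and the check for the "MZ" marker at the start of the MS-DOS stub is an extra sanity check which the slides do not mention.

```python
import struct

PE_POINTER_OFFSET = 0x3C      # location in the MS-DOS stub holding the signature offset
PE_SIGNATURE = b"PE\0\0"      # "P", "E" and two null bytes
COFF_FIELDS = ("Machine", "NumberOfSections", "TimeDateStamp",
               "PointerToSymbolTable", "NumberOfSymbols",
               "SizeOfOptionalHeader", "Characteristics")


def find_pe_signature(image: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if len(image) < PE_POINTER_OFFSET + 4 or image[:2] != b"MZ":
        raise ValueError("not an MS-DOS executable image")
    # The 4-byte little-endian value at 0x3c is the file offset of the signature.
    (pe_offset,) = struct.unpack_from("<I", image, PE_POINTER_OFFSET)
    if image[pe_offset:pe_offset + 4] != PE_SIGNATURE:
        raise ValueError("PE signature not found")
    return pe_offset


def read_coff_header(image: bytes) -> dict:
    """Decode the 20-byte COFF File Header that follows the signature."""
    pe_offset = find_pe_signature(image)
    values = struct.unpack_from("<HHIIIHH", image, pe_offset + 4)
    return dict(zip(COFF_FIELDS, values))


def make_demo_image() -> bytes:
    """Build a tiny synthetic header for demonstration only (hypothetical values)."""
    stub = bytearray(0x80)
    stub[0:2] = b"MZ"
    struct.pack_into("<I", stub, PE_POINTER_OFFSET, 0x80)
    coff = struct.pack("<HHIIIHH", 0x014C, 2, 0, 0, 0, 0xE0, 0x0102)
    return bytes(stub) + PE_SIGNATURE + coff


if __name__ == "__main__":
    header = read_coff_header(make_demo_image())
    print(header["NumberOfSections"], hex(header["SizeOfOptionalHeader"]))
```

The synthetic image contains only the fields the sketch reads; a real image file would of course continue with the Optional Header and the Section Table.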
Optional Header
• Every image file has an Optional Header that provides information to the loader.
• This header is optional in the sense that some files (specifically, object files) do
not have it. For image files, this header is required. An object file may have an
optional header, but generally this header has no function in an object file except
to increase size.
• The Optional Header is of variable length and is divided into three parts.
• The first eight fields of the Optional Header are standard fields, defined for every
implementation of COFF. These fields contain general information useful for
loading and running an executable file.
• The next twenty-one fields are an extension to the COFF Optional Header format
and contain additional information needed by the linker and loader in Windows.
• The third part of the Optional Header is a set of Data Directories, which give the
positions (relative to the start of the file) and sizes of a number of important
tables, such as the relocation table, debug table, import address table, and the
attribute certificate table.
Optional Header Cont..
• Except for the certificate table, these are loaded into memory as part of the image
to be executed. The certificate table contains certificates which can be used to
verify the authenticity of the file or various parts of its contents; typically each
certificate contains a hash of all or part of the file, digitally signed by its originator
– a so-called Authenticode PE Image Hash.
Section table
• Each row of the Section Table, in effect, is a section header. This table immediately
follows the optional header, if any. This positioning is required because the file
header does not contain a direct pointer to the section table; the location of the
section table is determined by calculating the location of the first byte after the
headers. Make sure to use the size of the optional header as specified in the file
header.
• The number of entries in the Section Table is given by the NumberOfSections field
in the file header. Entries in the Section Table are numbered starting from one.
The code and data memory section entries are in the order chosen by the linker.
• In an image file, the virtual addresses for sections must be assigned by the linker
such that they are in ascending order and adjacent, and they must be a multiple
of the Section Align value in the optional header.
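The calculation of the Section Table position can be sketched as follows. This is an illustrative sketch only: the function names and the synthetic image are assumptions, and the field offsets used (NumberOfSections at byte 2 and SizeOfOptionalHeader at byte 16 of the 20-byte COFF header, 40-byte section headers) follow the standard COFF layout.

```python
import struct

COFF_HEADER_SIZE = 20
SECTION_HEADER_SIZE = 40


def section_table_offset(image: bytes) -> int:
    """First byte after the headers: signature + COFF header + optional header."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    coff_offset = pe_offset + 4
    # Use the optional header size stated in the file header, not a fixed constant.
    (opt_size,) = struct.unpack_from("<H", image, coff_offset + 16)
    return coff_offset + COFF_HEADER_SIZE + opt_size


def list_sections(image: bytes) -> list:
    """Return (name, virtual_address, virtual_size, raw_pointer, raw_size) per section."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    (count,) = struct.unpack_from("<H", image, pe_offset + 4 + 2)
    base = section_table_offset(image)
    sections = []
    for i in range(count):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", image, base + i * SECTION_HEADER_SIZE)
        text = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((text, vaddr, vsize, raw_ptr, raw_size))
    return sections


def make_demo_image() -> bytes:
    """Synthetic image with two section headers (hypothetical values, demo only)."""
    image = bytearray(0x40)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)
    image += b"PE\0\0"
    opt_size = 8  # unrealistically small optional header, enough for the demo
    image += struct.pack("<HHIIIHH", 0x014C, 2, 0, 0, 0, opt_size, 0)
    image += bytes(opt_size)
    image += struct.pack("<8sIIIIIIHHI", b".text", 0x200, 0x1000, 0x200, 0x400,
                         0, 0, 0, 0, 0x60000020)
    image += struct.pack("<8sIIIIIIHHI", b".data", 0x100, 0x2000, 0x200, 0x600,
                         0, 0, 0, 0, 0xC0000040)
    return bytes(image)


if __name__ == "__main__":
    for section in list_sections(make_demo_image()):
        print(section)
```

Reading SizeOfOptionalHeader from the file header, rather than assuming a fixed size, is exactly the precaution the slide asks for.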
Virus Manipulation
• Since each section in a PE file is always allocated an integral number of sectors on the
disc, expanding a section with viral code will not necessarily change the size of the file: the extra code
can be fitted into the “waste space” at the end of the disc sector.
• If there is no single section with enough waste space, the malicious code can be
divided among several sections.
Executing the Virus Code
• The simplest way of ensuring that the virus code is executed is to change the
AddressOfEntryPoint field in the Optional Header, so that it points to the start of
the virus code.
• The original code is executed after the virus code so that the executable appears
to have the usual effect and the user does not get suspicious.
• But changing the AddressOfEntryPoint field is such an obvious idea that most
antivirus systems check whether the beginning of the code is in a section which
should not contain executable code or contains known patterns from a database
of viral code.
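The first of these antivirus checks can be sketched as follows. This is a simplified illustration from the defender's side, assuming the section headers have already been parsed into the hypothetical `Section` tuples used here; `IMAGE_SCN_MEM_EXECUTE` is the standard PE section flag marking a section as executable.

```python
from typing import List, NamedTuple, Optional

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section flag: may be executed as code


class Section(NamedTuple):
    name: str
    virtual_address: int
    virtual_size: int
    characteristics: int


def section_containing(sections: List[Section], rva: int) -> Optional[Section]:
    """Return the section whose address range contains rva, if any."""
    for section in sections:
        if section.virtual_address <= rva < section.virtual_address + section.virtual_size:
            return section
    return None


def entry_point_warnings(sections: List[Section], entry_rva: int) -> List[str]:
    """Report why an AddressOfEntryPoint value looks suspicious (empty list if not)."""
    section = section_containing(sections, entry_rva)
    if section is None:
        return ["entry point lies outside every section"]
    if not section.characteristics & IMAGE_SCN_MEM_EXECUTE:
        return [f"entry point is in non-executable section {section.name}"]
    return []


if __name__ == "__main__":
    demo = [Section(".text", 0x1000, 0x800, 0x60000020),
            Section(".data", 0x2000, 0x400, 0xC0000040)]
    print(entry_point_warnings(demo, 0x1010))
    print(entry_point_warnings(demo, 0x2010))
```

A real scanner would combine this with the second check from the slide, matching the code at the entry point against a database of known viral patterns.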
Entry Point Obscuring (EPO)
• Insert a JUMP instruction somewhere in the executable’s code, to cause a jump to
the start of the virus code.
• Change an existing CALL instruction to call the virus code.
• Finding a genuine CALL instruction to change is not an easy job in many architectures, since the byte value of the opcode may also occur as data. The viral code for inserting the
virus in the victim file therefore checks whether the address after the 0xe8 “opcode” points into the
import section, in which case it really is a CALL instruction.
• Change the content of the import table, which contains addresses of all imported functions, so that one of
the entries in the table is replaced by the address of the start of the virus code.
• Detection of EPO vira is a challenge, as the inserted or modified JUMP or CALL instructions can in
principle be placed anywhere within the code.
• Searching through the file and checking all the JUMP and CALL instructions to see whether they
activate viral code can be a slow process.
• Another effective way to detect the presence of the virus is to emulate the execution of the
program and see whether it would actually cause any damaging effects.
• This detection is slow and can be fooled.
• This has led to the development of antivirus systems
which rely on detection of malicious behavior rather than recognition of signatures.
• A variant of the EPO approach is for the actual viral code to be kept in a library file
(a shared library or a DLL) which the infected executable will call.
Disguising the Virus
• Since signature-based antivirus systems attempt to find viral code by looking for
characteristic byte sequences in the executable, virus designers have adopted various
techniques for disguising such sequences. The two dominant techniques are Encryption of
the viral code and Polymorphism.
• Encryption
• Encryption of the viral code with different encryption keys will produce different cipher texts, thus ensuring
that a signature scanner cannot recognize the virus. However, the cipher text needs to be decrypted before the
virus can be executed; the code for the decryption algorithm cannot itself be encrypted, and will need to be
disguised using another technique, such as polymorphism.
• The first attempts to encrypt vira used very simple encryption algorithms, such as bitwise XOR (Exclusive
Or). More modern encrypted vira use stream ciphers or symmetric (secret-key) block ciphers. Whatever technique is used, the key
must be somewhere within the virus, and careful analysis of the decryption algorithm will reveal where this is.
• Polymorphism
• A polymorphic virus is deliberately designed to have a large number of variants of its code, all with the same basic
functionality. This is ensured by including different combinations of instructions which do not have any net
effect.
• A further approach is code transposition: to swap round the order of instructions (or
whole blocks of instructions) and insert extra jump instructions in order to achieve the
original flow of control.
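The effect of simple XOR encryption on signature scanning, described under Encryption above, can be illustrated on a harmless byte string. This is only a sketch: the strings, the single-byte key and the function names are assumptions chosen for the example. It shows both halves of the argument, namely that each key gives a different cipher text which no longer contains the signature, and that an analyst can still recover a weak key.

```python
from typing import Optional


def xor_bytes(data: bytes, key: int) -> bytes:
    """Bitwise XOR of every byte with a single-byte key (encrypts and decrypts)."""
    return bytes(b ^ key for b in data)


def recover_key(ciphertext: bytes, signature: bytes) -> Optional[int]:
    """Try all 256 single-byte keys and return the first one that reveals the signature."""
    for key in range(256):
        if signature in xor_bytes(ciphertext, key):
            return key
    return None


if __name__ == "__main__":
    body = b"HARMLESS-DEMO-PATTERN"   # stands in for the code to be recognized
    signature = b"DEMO-PATTERN"       # byte sequence a scanner would look for
    for key in (0x21, 0x5A, 0xC3):
        encrypted = xor_bytes(body, key)
        print(hex(key), signature in encrypted, recover_key(encrypted, signature) == key)
```

With a stream cipher or block cipher, exhaustive search over the key is no longer practical, which is why the slides point to analysis of the decryption algorithm and, later, to emulation.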
Other Types of Virus
Malware may be based on any kind of software, not just ordinary executable (.exe)
files and linked libraries.
1. Interpreted scripting languages, particularly Perl and Visual Basic.
2. Interpreted document handling languages such as PostScript and PDF.
3. Macro languages used in document handling programs such as MS Word or
Excel. The actual macro language is usually some form of Basic.
4. Multimedia files, such as the RIFF files used to supply animated cursors and icons.
• A particular danger with these is that many ordinary users are completely unaware
that there is a possibility of executing malicious code due to, say, opening a
PostScript document or using an attractively animated cursor.
• Vira based on these vectors are therefore easily spread, for example via e-mail. On
the positive side, this “user unawareness” means that few designers of such vira
bother to encrypt them or disguise them in any way.
Other Types of Virus
• A classic example is the Melissa e-mail virus of 1999, which used Word macros. If the
infected Word 2000 document was opened, it caused a copy to be sent to up to 50 other
users via MS Outlook, using the local user’s address book as a source of addresses.
• A more modern example is the family of trojan horses which exploited the Microsoft
animated cursor vulnerability (2006). By passing an apparently innocent animated cursor in
an ANI file to an unsuspecting user via a malicious web page or HTML e-mail message, the
attacker was able to perform remote code execution with the privileges of the logged-in
user. The vulnerability was in fact a buffer overflow vulnerability based on the fact that
the lengths of RIFF chunks (the logical blocks of a multimedia file) were not checked.
This made it possible, by sending a malformed chunk, to create a buffer overflow in the
stack, overwriting the return address for the LoadAniIcon function which should load the
animated cursor. In this way, the normal function return was replaced by a jump to viral
code hidden in the ANI file.
Worms
• Worms are pieces of software which reproduce themselves on hosts in a network
without explicitly infecting files.
• A worm typically consists of three parts:
1. Searcher: Code used to identify potential targets, i.e. other hosts which it can try
to infect.
2. Propagator: Code used to transfer the worm to the targets.
3. Payload: Code to be executed on the target. As in the case of vira, the payload is
optional, and it may or may not have a damaging effect on the target. Some
worms are just designed to investigate how worms can be spread, or actually
have a useful function.
• One of the very first worms was invented at Xerox Palo Alto Research Center in the
early 1980s in order to distribute parts of large calculations among workstations at
which nobody was currently working.
• On the other hand, even a worm without a payload may have a malicious effect, since
the task of spreading the worm may use a lot of network resources and cause Denial
of Service.
An example of this was the W32/Slammer worm of 2003.
• Malicious payloads which have been observed in worms include:
1. Exploiting the targets in order to cause a Distributed DoS attack on a chosen system.
Example: Apache/mod_ssl (2002)
2. Website defacement on the targets, which are chosen to be web servers.
Example: Perl.Santy (2004), which overwrote all files with extensions .asp, .htm, .jsp,
.php, .phtm and .shtm on the server, so they all produced the text “This site is defaced!!!
NeverEverNoSanity WebWorm generation xx”.
3. Installation of a keylogger to track the user’s input, typically in order to pick up
passwords, PIN codes, credit card numbers or other confidential information, and to
transmit these to a site chosen by the initiator of the worm. Malware which does this
sort of thing is often known as spyware.
4. Installation of a backdoor, providing the initiator with access to the target host. The
backdoor can be used to produce breaches of confidentiality similar to spyware.
5. To replace user files with executables which ensure propagation of the worm or
possibly just produce some kind of display on the screen. Example: LoveLetter (2000),
which amongst other things overwrote files with a large number of different extensions
(.js, .jse, .css, .wsh, .sct, .hca, .jpg, .jpeg, .mp2 and .mp3) with Visual Basic
scripts which, if executed, would re-execute the worm code.
Searching for Targets
• The search for new targets can be based on information found locally on the host
which the worm is currently visiting, or it may be based on a more or less
systematic search of the network.
• Local information can be found in configuration files of various sorts, as these
often contain addresses of other hosts to be contacted for various purposes.
• Worms which spread via e-mail look in personal e-mail address books or search
through text files which might contain e-mail addresses (typically files with file
extensions .txt, .html, .xml or even .php).
• Searching through the network is usually based on port scanning, since
propagation of the worm depends on the presence of a suitable open port which
can be contacted.
Propagating the Worm
• Once some suitable potential targets have been discovered, the worm will try to
use its chosen propagation technique to send itself to these new hosts and get its
code executed on them.
• The transmission of the worm is typically automatic, whereas its activation on
the target host may involve a human user on that host. Some examples are:
• The e-mail worm LoveLetter (2000) included the malicious executable of the worm as
a mail attachment. If the user opened this attachment, which contained a Visual Basic
script disguised as a .txt file, the worm would be activated on the user’s system.
• Secure communication between computers is often ensured at user level via the use
of SSH. However, this can be set up in a way which allows users to login without
repeating their password on hosts where they have already correctly logged in once.
This vulnerability can be exploited by a worm to “log in” on this group of hosts and
execute itself.
• The CodeRed worm (2001) exploited a buffer overflow vulnerability in the idq.dll
library used in Microsoft’s IIS server, which enabled the worm to get control over
the thread which the server started up to handle an incoming HTTP GET request.
Essentially, the vulnerability allowed the worm to insert code into the thread, a
technique generally known as Code Injection.
Propagating the Worm
A request giving this effect is shown in Figure. The long sequence of N’s in the request ensures that
the worm code bytes (%u9090...%u000a) are placed in the stack in such a position that the return
address for the current routine is overwritten with the value 0x7801cbd3.
• Either of the transmission and activation steps may of course be unsuccessful. For
example, with an e-mail worm, the e-mail containing the worm may be refused by the
destination mail server (failure of the sending step), or the user may refuse to activate
the attachment which will execute the worm code (failure of the execution step).
• Similarly, the CodeRed worm may successfully reach a Web server which does not
have the vulnerability on which it depends for being executed on the target. And so
on.
Botnets
• Botnets illustrate the specialized use of a Worm or Trojan horse to set up a private
communication infrastructure which can be used for malicious purposes. The aim of the actual
botnet is to control a large number of computers, which is done by installing a backdoor in each
of them.
• The individual computers in the botnet then technically speaking become zombies since they are
under remote control, but are in this context usually referred to simply as bots.
• The bots can be given orders by a controller, often known as the botmaster, to perform various
tasks, such as sending spam mail, adware, or spyware, performing DDoS attacks or just
searching for further potential targets to be enrolled in the botnet.
• In many cases, the botmaster offers such facilities as a service to anyone who is willing to pay
for it. Botnets with large numbers of bots can obtain higher prices than smaller botnets. There
have been press reports of some very large botnets, such as one with 1.5 million bots controlled
from Holland, and one with 10 000 bots in Norway; both of these were closed by the police.
Botnets
• The activities associated with a botnet typically fall into four phases:
1. Searching: Search to find target hosts which look suitable for attack, typically because they
appear to have a known vulnerability or easily obtainable e-mail addresses which can be
attacked by an e-mail worm or Trojan horse.
2. Installation: The backdoor code is propagated to the targets, where an attempt is
made to install the code or persuade the user to do so, so that the targets become bots.
3. Sign-on: The bots connect to the master server and become ready to receive Command
and Control (C&C) traffic.
4. C&C: The bots receive commands from the master server and generate traffic directed
towards further targets.
Botnets
• Usually the master server is a semi-public IRC (Internet Relay Chat) server. Seen from the point of view
of the botmaster, it is important that the server should not officially be controlled by him/her, since this
could lead to the botmaster being identified.
• Since running a botnet is at least potentially a criminal act, the botmaster does not want this to
happen. Indeed, the botmaster will usually hide behind several proxies in order to anonymise his
activities and avoid identification.
• On the other hand, to avoid detection of the actual Botnet, the server is not usually a well-known
public server either, as most of these are carefully monitored for botnet activity. The new bot
automatically attempts to connect to the server and to join a predetermined IRC channel. This
channel is used by the botmaster to issue commands to his bots.
• From a security point of view, it is at least as important to detect the master server, identify the
control channel and (if possible) determine the identity of the botmaster as it is to detect the individual bots, since without these
elements the botnet is non-functional.
• Detection of the master server is most reliably done during the Sign-on and C&C phases of botnet
operation, since the Searching and Installation phases can be performed by the bots themselves and
(after the initial command from the controller) do not necessarily involve the master server at all.
Botnets
• Most master servers nowadays are rogue (separate) IRC servers, which are bots which have been
instructed to install and host an IRC server. To avoid detection, many of them use nonstandard
IRC ports, are protected by passwords and have hidden IRC channels.
• Typical signs of such a rogue IRC server:
• A high invisible to visible user ratio.
• A high user to channel ratio.
• A server display name which does not match the IP address.
• Suspicious nicks (botspeak for user IDs), topics and channel names.
• A suspicious DNS name used to find the server(s).
• Suspicious Address Resource Records (ARRs) associated with DNS name (see RFC1035).
• Connected hosts which exhibit suspicious behavior, such as the sudden bursts of
activity associated with mass spamming or DDoS attacks.
Botnets
• Monitoring of the DNS is often a good place to start when looking for the master server.
• Rules to indicate suspicious activity are
• Repetitive A-queries to the DNS often come from a servant bot
• MX-queries to the DNS often indicate a spam bot
• in-addr.arpa queries to the DNS often indicate a server.
• The names being looked up just look suspicious.
• Hostnames have a 3-level structure: hostname.subdomain.top level domain
• Unfortunately, even if a particular DNS entry looks suspiciously as though it is being used by the botnet, it is not entirely
simple to close this entry, since many botnets are organized to take precautions against this.
• For example, if the master server is “up”, but its name cannot be resolved, then bots connected to it will be instructed to
update the DNS.
• Correspondingly, if the name can be resolved, but the master server is “down”, then the DNS is changed to point to one
or more alternative servers
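The DNS monitoring rules listed above can be sketched as a simple classifier over a query log. This is a rough sketch under stated assumptions: the log format (client address, query type, name), the repeat threshold of 5 and the label strings are all hypothetical, and the rule about names which "just look suspicious" is left out because it needs human judgment or a separate list.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Query = Tuple[str, str, str]  # (client address, query type, queried name)


def flag_dns_clients(queries: Iterable[Query], repeat_threshold: int = 5) -> Dict[str, Set[str]]:
    """Apply the three query-type rules and return the labels raised per client."""
    a_counts: Counter = Counter()
    flags: Dict[str, Set[str]] = {}
    for client, qtype, name in queries:
        name = name.rstrip(".").lower()
        if qtype == "A":
            # Rule 1: the same A-query repeated many times by one client.
            a_counts[(client, name)] += 1
            if a_counts[(client, name)] >= repeat_threshold:
                flags.setdefault(client, set()).add("possible servant bot")
        elif qtype == "MX":
            # Rule 2: ordinary workstations rarely need to look up mail exchangers.
            flags.setdefault(client, set()).add("possible spam bot")
        elif name.endswith(".in-addr.arpa"):
            # Rule 3: reverse lookups are typical of a server.
            flags.setdefault(client, set()).add("possible server")
    return flags


if __name__ == "__main__":
    log = [("10.0.0.5", "A", "cc.example.net")] * 6
    log += [("10.0.0.7", "MX", "mail.example.org"),
            ("10.0.0.9", "PTR", "4.3.2.1.in-addr.arpa."),
            ("10.0.0.8", "A", "www.example.com")]
    for client, labels in sorted(flag_dns_clients(log).items()):
        print(client, sorted(labels))
```

The labels are only indications, in line with the word "often" in the rules; legitimate mail servers, for instance, also issue MX-queries.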
Botnets
• A recent development in botnet technology is the use of protocols other than IRC as the
basis for the botnet.
• An example is the Nugache botnet (2006), which uses peer-to-peer (P2P) technology with
encryption to build up the network and to spread C&C traffic, and which does not use the DNS.
This approach makes it extremely difficult for defenders to find the master server (if one can
speak of a master in a P2P system at all).
• If the master server(s) cannot be (or at least have not yet been) found, then the last
line of defense against the activity of the botnet is to block as much of the botnet traffic
as possible at the network level.
• This can, for example, be done by fixing rate limits for network flows which use uncommon
protocols and ports, and by using both ingress and egress filters on each sub-net, so as to filter
off typical botnet command and control (C&C) traffic which the botmaster uses to control his
bots.
Malware Detection
• Signature scanning is still the basis of most malware detection systems.
• Malware signatures are values that indicate the presence of malicious code. Simply
speaking, when an antivirus program scans your computer, it calculates the signature for a file
(for example a hash), then compares that signature to a list of known bad signatures.
• A signature may also be a characteristic byte pattern which allows some variation, rather than an exact hash. This allows the scanner to deal with a certain amount of polymorphism in the malware. Scanners
can be made more efficient by restricting the area which they search through in order to find a
match.
• For example, a particular virus may be known always to place itself in a particular section of
an executable file, and it is then a waste of effort to search through other parts of the file.
• Scanning has the advantage over other methods that it can be performed not only on
files in the hosts, but also to a certain extent on the traffic passing through the network.
• This makes it possible in principle for ISPs and local network managers to detect and
remove (some) malware before it reaches and damages any hosts.
• Similarly, the system on the host can scan all incoming mail and web pages before actually
storing them on the host. This “on access” approach to malware detection is very common in
commercial antivirus products.
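The two kinds of signature check described above can be sketched as follows. This is a minimal illustration, not how any particular antivirus product works: the choice of SHA-256, the demo pattern and the function names are assumptions. The `start` and `end` arguments show how the search can be restricted to one area of the file.

```python
import hashlib
import re
from typing import Optional, Set


def hash_matches(data: bytes, bad_hashes: Set[str]) -> bool:
    """Exact match: compare the file's hash with a list of known bad hashes."""
    return hashlib.sha256(data).hexdigest() in bad_hashes


def find_pattern(data: bytes, pattern: "re.Pattern[bytes]",
                 start: int = 0, end: Optional[int] = None) -> int:
    """Search only data[start:end] for a byte pattern; return its offset or -1."""
    end = len(data) if end is None else end
    match = pattern.search(data, start, end)
    return match.start() if match else -1


if __name__ == "__main__":
    # Hypothetical signature: "DEMO", any two bytes, then "SIG".
    demo_signature = re.compile(rb"DEMO.{2}SIG", re.DOTALL)
    sample = bytes(16) + b"DEMO\x01\x02SIG" + bytes(8)
    known_bad = {hashlib.sha256(sample).hexdigest()}
    print(hash_matches(sample, known_bad))
    print(find_pattern(sample, demo_signature))
    print(find_pattern(sample, demo_signature, 0, 16))
```

The wildcard bytes in the pattern are what let the scanner tolerate small variations between variants, whereas the hash changes completely if a single byte differs.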
Detection by Emulation
• Detection of polymorphic or encrypted malware in general requires a more advanced technique than signature
scanning.
• The usual approach is to emulate the execution of the code under strictly controlled conditions. In the case of encrypted vira, this is often
known as Generic Decryption (GD), as it uses the virus’ own decryption algorithm to decrypt the virus and reveal
the true code.
• Emulation has two basic problems:
• It is very slow (maybe 100-1000 times slower than direct execution on the CPU).
• It is not always 100% accurate, since the CPU to be emulated is not always sufficiently documented. Many
CPUs contain undocumented instructions (or undocumented features of well-known instructions) which can
potentially be exploited by virus designers.
• Furthermore, although detection of a malicious effect during emulation is a clear sign that the software being
investigated is malware, failure to detect any malicious effect is not a guarantee that the software is “clean”.
• A program which could always decide whether a piece of software is “clean” cannot be built: constructing such a program would be equivalent to constructing a program which could solve the halting
problem, i.e. decide whether or not execution of a given piece of software will halt at some stage or continue for
ever. It is a fundamental result of computer science that the halting problem cannot be solved.
• It is therefore an open question how long the emulation should be allowed to continue before the software
being investigated is declared malware-free.
Detection by Static Program Analysis
• One promising technique for dealing with polymorphic vira is the use of static program analysis to build up
a control flow graph (CFG) for the executable being checked.
• A CFG is a graph whose nodes correspond to the basic blocks of the program, where a basic block is a
sequence of instructions with at most one control flow instruction (i.e. a call, a possibly conditional jump
etc.), which, if present, is the last instruction in the block, and where the edges correspond to possible paths
between the basic blocks.
• Even if groups of instructions with no effect are inserted into the code, the basic flow of control in the
program is maintained, so the CFGs for the original virus and for the polymorphic variant should have the
same form.
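The idea can be sketched on a toy instruction list. This is a simplified sketch and not the analysis method referred to in the slides: the three-field instruction format, the opcode names and the harmless counting loop are all assumptions. The program is split into basic blocks at labels and after control flow instructions, and the edges between blocks are then compared for an original and a padded variant.

```python
from typing import List, Optional, Set, Tuple

Instr = Tuple[Optional[str], str, Optional[str]]  # (label, opcode, operand)
JUMPS = {"jmp", "jz"}  # toy control flow opcodes; "ret" ends a block with no successor


def build_cfg(program: List[Instr]) -> Tuple[List[List[Instr]], Set[Tuple[int, int]]]:
    """Split a program into basic blocks and return (blocks, edges between block numbers)."""
    leaders = {0}
    label_index = {}
    for i, (label, op, _) in enumerate(program):
        if label is not None:              # a labelled instruction starts a block
            label_index[label] = i
            leaders.add(i)
        if (op in JUMPS or op == "ret") and i + 1 < len(program):
            leaders.add(i + 1)             # so does the instruction after a jump
    starts = sorted(leaders)
    blocks, block_of = [], {}
    for number, start in enumerate(starts):
        end = starts[number + 1] if number + 1 < len(starts) else len(program)
        blocks.append(program[start:end])
        for i in range(start, end):
            block_of[i] = number
    edges = set()
    for number, block in enumerate(blocks):
        _, op, operand = block[-1]
        if op in JUMPS:
            edges.add((number, block_of[label_index[operand]]))
        if op not in ("jmp", "ret") and number + 1 < len(blocks):
            edges.add((number, number + 1))  # fall-through to the next block
    return blocks, edges


ORIGINAL: List[Instr] = [
    (None, "mov", "eax, 0"), ("loop", "add", "eax, 1"), (None, "cmp", "eax, 10"),
    (None, "jz", "done"), (None, "jmp", "loop"), ("done", "ret", None),
]
# Same program with instructions that have no net effect inserted.
VARIANT: List[Instr] = [
    (None, "mov", "eax, 0"), (None, "nop", None), ("loop", "add", "eax, 1"),
    (None, "push", "ebx"), (None, "pop", "ebx"), (None, "cmp", "eax, 10"),
    (None, "jz", "done"), (None, "nop", None), (None, "jmp", "loop"),
    ("done", "ret", None),
]

if __name__ == "__main__":
    print(sorted(build_cfg(ORIGINAL)[1]))
    print(build_cfg(ORIGINAL)[1] == build_cfg(VARIANT)[1])
```

The blocks of the variant are longer, but the edge sets are identical. Variants produced by code transposition, which add extra jumps, would need further normalization before their graphs could be compared in this way.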
Detection by Static Program Analysis
• A disadvantage is that the method is currently very slow. On a computer with an Athlon 1GHz CPU and
1GB of RAM, analysis of all variants of the Hare virus to build up the CFGs and to annotate them to indicate
“empty code” took 10 seconds of CPU time.
• To build up the annotated CFG for a fairly large non-malicious executable (QuickTimePlayer.exe, size approx.
1MB) took about 800 seconds of CPU time.
• However, the method was extremely effective at recognizing viral code, even when it appeared in quite
obscure variants.
• False positive and false negative rates of 0% were reported for the examples tested. It must be expected
that improvements in the technique will make it suitable for practical use in real-time detection of viral code.
Behavioral Methods of Detection
• All the methods which we have discussed up to now rely on handling the code of the possible malware.
• A completely different approach is represented by methods which do not look at the code, but which monitor in real time
the behavior caused by the pieces of software running in the system.
• At the host level, this can for example be done by adding code stubs (code used to stand in for some other programming
functionality) to the request handler for operating system calls, so that every call is checked, and suspicious activities or
patterns of activity cause an alarm.
• This is basically very similar to what is done in a host-based intrusion detection system (HIDS), and behavioral malware
detection may indeed be incorporated in a HIDS.
• There are two approaches
• Misuse detection: Systems which follow this approach build up a model of known patterns of misuse. Any pattern of
behavior described by the model is classified as suspicious.
• Anomaly detection: Systems which follow this approach build up a model of the normal behavior of the system. Any
pattern of behavior not described by the model is classified as suspicious.
• Individual activities which might typically be considered interesting to monitor include:
• Attempts to format disc drives or perform other irreversible disc operations.
• Attempts to open or delete files.
• Attempts to modify executable files, scripts or macros.
• Attempts to modify configuration files and the contents of the registry
• Attempts to modify the configuration of e-mail clients or IM clients, so they send executable material.
• Attempts to open network connections.
Behavioral Methods of Detection
• Even if the individual events are not especially suspicious, combinations of them may well be,
and so behavioral detection systems build up signatures describing characteristic sequences of
such events.
• Depending on whether the malware detection system uses the anomaly detection or misuse
detection approach, these sequences may be found from:
• Statistical observations, defining what is “normal behavior” in a statistical sense;
• Models describing the permitted behavior of the system, for example as a set of traces
(event sequences) which the system may exhibit, or as a Finite State Automaton or Push-
down Automaton.
• Models describing possible modes of misbehavior of the system;
• Heuristics.
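The two approaches can be contrasted in a small sketch over event traces. This is an illustrative sketch only: the event names, the window length of 3 and the function names are assumptions. Anomaly detection is modelled as a set of short event sequences seen during normal operation, and misuse detection as a list of known bad sequences.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: Sequence[str], n: int = 3) -> Set[Window]:
    """All contiguous event sequences of length n in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def train_normal_model(traces: Iterable[Sequence[str]], n: int = 3) -> Set[Window]:
    """Anomaly detection model: the set of windows observed in normal behavior."""
    model: Set[Window] = set()
    for trace in traces:
        model |= windows(trace, n)
    return model


def anomalies(trace: Sequence[str], model: Set[Window], n: int = 3) -> List[Window]:
    """Windows in the trace which the model of normal behavior does not describe."""
    return sorted(windows(trace, n) - model)


def misuse_hits(trace: Sequence[str], patterns: Iterable[Window]) -> List[Window]:
    """Misuse detection: known bad sequences which occur in the trace."""
    return [p for p in patterns if p in windows(trace, len(p))]


if __name__ == "__main__":
    normal = [["open_file", "read_file", "close_file"] * 2]
    model = train_normal_model(normal)
    observed = ["open_file", "read_file", "close_file",
                "modify_executable", "open_socket"]
    print(anomalies(observed, model))
    print(misuse_hits(observed, [("modify_executable", "open_socket")]))
```

The anomaly detector flags anything it has not seen before, including new but legitimate behavior, while the misuse detector only reacts to sequences already in its list.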
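Digital Immune System
• A digital immune system is modelled on the biological immune system, which works by distinguishing “self” from foreign material. In a computer system, “self” is the set of software which is present under normal circumstances, when
there is no malware about.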
• The functionality of immune cells such as the T-cells is emulated by a recognizer which attempts to recognize
patterns of abnormal behavior which have previously been seen.
• If a known pattern is recognized, the system attempts to neutralize the virus concerned. If abnormal
behavior which has not been seen before is observed, the recognizer communicates with a central system,
where the new behavior is analyzed and countermeasures for neutralizing it are determined.