
VxClass – Automatic classification of malware and trojans into families
zynamics VxClass allows the automated unpacking and classification of malware into families.

Based on the same ideas and algorithms that made zynamics BinDiff great, zynamics VxClass can structurally compare executables and thus ignore byte-level changes such as instruction reordering or string obfuscation. Small changes in the code or changed compiler settings will not fool zynamics VxClass.

It's easy: upload a piece of malware, and zynamics VxClass will first remove the executable crypters from it. Our automated unpacker handles most packers. zynamics VxClass then analyzes the uploaded executable, compares it to the database of stored malware, and provides a simple similarity metric that tells you whether the program is related to a piece of known malware.

Please note that there are no current plans to resume sales for zynamics VxClass.
Use Cases
  • Filter unknown malware samples for analysis by sorting out items you have already analyzed
  • Find out if that security incident you are investigating is correlated to a previous one
  • Help malware analysts avoid duplicate work by sharing results
  • Automatically remove most packers and crypters from the malware you are analyzing
  • Generate AV-signatures for malware clusters to increase endpoint security


To learn more about VxClass, please download the VxClass Manual (*.zip)
Screenshots
VxClass screenshot 1

Screenshot 1: Upload a suspicious file

VxClass screenshot 2

Screenshot 2: Wait for the unpacking and classification to finish

VxClass screenshot 3

Screenshot 3: View the results in the family tree


Detailed Description

Over the last few years the malware problem has changed enormously. Approximately 8 years ago the main problem consisted of small programs (for the most part written in assembler/machine language) that infected other executable files and had no higher-level task except spreading. Most modern malware, by contrast, is written in a high-level language and has a clear goal such as botnet construction or the theft of PINs, TANs, or passwords.

Moreover, malware authors have professionalized: it is now entirely normal for malware to pass through several versions, each of which repairs the errors and problems of the previous one. Because it is written in “normal” languages, the distribution of bytes in malware no longer differs significantly from that of a “normal” application.

The spread of source code for so-called “bots” has furthermore produced a flood of bot variations, whose source code is amended only until they bypass the existing anti-virus signatures. In general, the trend is to amend existing malware source code until the usual byte-signature-based anti-virus programs become ineffective.
The complete analysis of a new malware variation is complex and can easily mean several days' work for a highly qualified “Reverse Engineer”.

Since a major part of the recently detected malware is only a variation of already known malware, it would be useful to be able to assign “new” malware automatically to its “relatives” and to re-use already existing analysis results.

Our structural comparison algorithms compare executable files on a “structural” level instead of on a byte-level: An executable file is regarded as a “directed graph” and not as a sequence of bytes. For the comparison of two executable files these graphs are compared.

This is very resilient to byte-level changes:
We have successfully compared the mobile worms Commwarrior.A and Commwarrior.C, although the first is “normal” ARM code whereas the second has been compiled for ARM/Thumb mode. On the byte level almost no similarity exists between the two worms, but our comparison algorithms demonstrated that more than 60% of the functions in Commwarrior.A have direct counterparts in Commwarrior.C. To see an example of some “structurally equal” functions of Commwarrior.A and Commwarrior.C follow this link.
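
The idea of comparing functions by structure rather than by bytes can be illustrated with a small Python sketch. It is not the VxClass algorithm: the fingerprint used here (number of basic blocks, edges, and calls per function) and the dictionary layout of a control-flow graph are assumptions made purely for illustration. The result is the fraction of functions in one sample that have a structurally identical counterpart in the other, similar in spirit to the 60% figure above.

```python
from collections import Counter

def fingerprint(cfg):
    """Return (blocks, edges, calls) for one function.

    cfg is a hypothetical layout: {"blocks": {block_id: [successor_ids]}, "calls": int}
    """
    blocks = cfg["blocks"]
    edges = sum(len(successors) for successors in blocks.values())
    return (len(blocks), edges, cfg["calls"])

def matched_fraction(funcs_a, funcs_b):
    """Fraction of functions in A with a same-fingerprint counterpart in B."""
    if not funcs_a:
        return 0.0
    available = Counter(fingerprint(f) for f in funcs_b)
    matched = 0
    for f in funcs_a:
        fp = fingerprint(f)
        if available[fp] > 0:
            available[fp] -= 1  # each function in B is matched at most once
            matched += 1
    return matched / len(funcs_a)

# Made-up functions: a loop, a branch, and a leaf.
loop = {"blocks": {0: [1], 1: [1, 2], 2: []}, "calls": 1}
branch = {"blocks": {0: [1, 2], 1: [3], 2: [3], 3: []}, "calls": 0}
leaf = {"blocks": {0: []}, "calls": 0}

print(matched_fraction([loop, branch, leaf], [loop, branch]))  # 2 of 3 functions match
```

Because the fingerprint depends only on the shape of the graph, recompiling the same function with a different instruction encoding leaves it unchanged as long as the control flow stays the same.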

By means of this “useful similarity measure” we can now apply algorithms of other disciplines. We use algorithms from Bioinformatics to generate family trees from a matrix of similarity values.
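
As a rough illustration of how a tree can be derived from a similarity matrix, the sketch below uses average-linkage clustering (in the style of UPGMA, a textbook method from phylogenetics). The text above does not name the algorithms actually used, so this choice is an assumption, as are the function name and the sample values.

```python
def build_tree(names, sim):
    """Average-linkage agglomerative clustering on a similarity matrix.

    names: list of sample labels
    sim:   square, symmetric matrix of similarities in [0, 1]
    Returns nested tuples (left, right, similarity) with labels at the leaves.
    """
    # Each cluster is (subtree, list of member indices).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def average_similarity(a, b):
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(sim[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_similarity(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        merged = (
            (clusters[x][0], clusters[y][0], round(s, 3)),
            clusters[x][1] + clusters[y][1],
        )
        # Replace the two most similar clusters with their merged parent.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]

# Made-up similarity values for three samples.
names = ["a", "b", "c"]
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
print(build_tree(names, sim))
```

The similarity stored at each inner node plays the role of the values annotated on the edges of a family tree: closely related samples merge early with a high score, distant ones merge late with a low score.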

We can automatically group large collections of “malware” into “families” and assign new malware to already existing families. If a new program is a member of a family already considered “malicious”, this program can likewise be classified as malicious without further analysis.
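
Assigning a new sample to an existing family could look roughly like the following sketch. The function name, the data layout, the sample identifiers, the scores, and the default cutoff of 0.5 are all assumptions chosen for illustration, not a description of the product's internals.

```python
def assign_family(similarities, families, threshold=0.5):
    """Return the best-matching family name, or None if no member is similar enough.

    similarities: dict mapping known sample id -> similarity to the new sample (0..1)
    families:     dict mapping family name -> list of known sample ids
    threshold:    hypothetical cutoff; a match must score strictly above it
    """
    best_family, best_score = None, threshold
    for family, members in families.items():
        # Score a family by its most similar member.
        score = max((similarities.get(m, 0.0) for m in members), default=0.0)
        if score > best_score:
            best_family, best_score = family, score
    return best_family

# Hypothetical sample ids and scores.
families = {"PadoBot": ["s1", "s2"], "GoBot": ["s3"]}
print(assign_family({"s1": 0.82, "s2": 0.40, "s3": 0.10}, families))  # PadoBot
print(assign_family({"s1": 0.30, "s3": 0.20}, families))              # None
```

A sample that returns None would be treated as unknown and queued for manual analysis, while a match lets the analyst reuse the existing results for that family.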
Case Study

We received a collection of bots from the administrators of the honeynets at RWTH Aachen. As a first test, we automatically analyzed and classified approximately 200 of them.

Our procedure was as follows: at first we had only the MD5 checksums and the executable files, and ran our analysis on them. The result was this graph.

Files that exhibit a mutual similarity of more than 50% were assigned to the same family. The next step was to have the files named by an anti-virus program (ClamAV). We then replaced the MD5 sums in the tree with these names. The result was this graph.

The graph enables us to draw interesting conclusions:
  1. We could clearly assign several bots to a family even though ClamAV did not identify them.
  2. Many “distinct” bots show a strong similarity to other bots and should actually be assigned to one single family (e.g. Trojan.GoBot and Trojan.Downloader.Delf as well as Worm.Korgo.Y and Worm.Padobot.I). This seems to be due to problems in the naming-process.
  3. Basically, all bots are representatives of two big families (GoBot, PadoBot) and three small families (Sasser, PoeBot, Crypt-8), plus some “pairs”.

Sections of generated family trees:

For the complete classification of the 200 bot samples from the RWTH honeynet, we refer to the above-mentioned links. Some examples follow.

First, consider the “blind” classification.

VxClass screenshot 7
The respective similarities are listed on the edges of the tree.

In the next section we have added the names as generated by ClamAV. We see that these files are members of the PadoBot family. Moreover, we can automatically recognize that the program with the MD5 sum “02aa4422480900647b211e391e1977b0”, which was not identified by ClamAV, is a member of this family as well.

VxClass screenshot 8
To learn more about the technology behind VxClass or how to license and use it, please contact zynamics-info@google.com.