5. System Performance

Model generation and deployment place different time and space demands on the system.

5.1 Training

For the data mining algorithm to generate models quickly, all calculations must be done in memory. Run on the full data set, the algorithm consumed more than a gigabyte of RAM. By splitting the data into smaller pieces, we were able to run the algorithm entirely in memory with no loss in accuracy.
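The splitting step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the model is built from additive per-class feature counts; this section does not state how the pieces are combined, and the names `chunks`, `count_chunk`, and `count_in_pieces` are hypothetical. Under that assumption, counting piece by piece and summing gives exactly the same totals as counting everything at once, which is one way splitting can cost nothing in accuracy.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical representation: each example is (list of features, class label).
Example = Tuple[List[str], str]


def chunks(examples: List[Example], size: int) -> Iterator[List[Example]]:
    """Yield consecutive pieces of at most `size` examples."""
    for start in range(0, len(examples), size):
        yield examples[start:start + size]


def count_chunk(chunk: Iterable[Example]) -> Counter:
    """Count (label, feature) occurrences within a single piece."""
    counts: Counter = Counter()
    for features, label in chunk:
        for feature in features:
            counts[(label, feature)] += 1
    return counts


def count_in_pieces(examples: List[Example], size: int) -> Counter:
    """Process one piece at a time and merge the per-piece counts."""
    total: Counter = Counter()
    for chunk in chunks(examples, size):
        total.update(count_chunk(chunk))  # Counter.update adds counts
    return total
```

In this sketch the examples already sit in a list; in practice each piece would be read from disk so that only one piece is resident at a time. The merged totals themselves still have to fit in memory.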

The training of a classifier took 2 hours 59 minutes and 49 seconds running on a Pentium III 600 Linux machine with 1GB of RAM. The classifier took on average 2 minutes and 28 seconds for each of the 4,301 binaries in the data set.

5.2 During Deployment

We are currently working to keep the system accurate on machines with less memory. At this point in development, only systems with 1GB or more of memory can use our models. The system resources needed to use a model are equivalent to those needed to train one, so on a Pentium III 600 Linux box with 1GB of RAM, analysis would take on average 2 minutes 28 seconds per attachment.
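To put the per-attachment figure in perspective, the following short calculation converts it into throughput for a single machine that processes attachments one at a time. Only the 2 minutes 28 seconds figure comes from the text; the function names and the sequential-processing assumption are ours.

```python
# Average analysis time per attachment reported above: 2 minutes 28 seconds.
SECONDS_PER_ATTACHMENT = 2 * 60 + 28


def attachments_per_hour(seconds_per_attachment: float) -> float:
    """Attachments one machine can analyze per hour, one at a time."""
    return 3600 / seconds_per_attachment


def backlog_hours(num_attachments: int, seconds_per_attachment: float) -> float:
    """Hours needed to work through a queue of attachments sequentially."""
    return num_attachments * seconds_per_attachment / 3600


if __name__ == "__main__":
    rate = attachments_per_hour(SECONDS_PER_ATTACHMENT)
    print(f"{rate:.1f} attachments per hour")
    print(f"{backlog_hours(100, SECONDS_PER_ATTACHMENT):.1f} hours for 100 attachments")
```

At this rate one machine handles roughly 24 attachments per hour.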

Our ongoing work aims to make the model small enough to be loaded into a computer with 128MB of RAM without losing more than 5% in accuracy. If this is accomplished, the CPU time and memory required will be notably reduced.

There are other ways to make the system perform its analysis faster, such as sharing the load over several computers. We are not currently exploring these options, but they are open problems that the community should examine. We are primarily concerned with reducing the space requirements of the algorithm without sacrificing a significant amount of accuracy.

Erez Zadok
2001-05-14