Next: 3. Cost Models Up: Toward Cost-Sensitive Modeling for Previous: 1. Introduction


2. Cost Factors and Metrics

In order to build cost-sensitive ID models, we must first understand the relevant cost factors and the metrics used to define them. Borrowing ideas from the related fields of credit card and cellular phone fraud detection, we identify the following major cost factors related to intrusion detection: damage cost, response cost, and operational cost. Damage cost (DCost) characterizes the amount of damage to a target resource by an attack when intrusion detection is unavailable or ineffective. Response cost (RCost) is the cost of acting upon an alarm or log entry that indicates a potential intrusion. Operational cost (OpCost) is the cost of processing the stream of events being monitored by an IDS and analyzing the activities using intrusion detection models. We will discuss these factors in greater detail in Section 2.2.

Cost-sensitive models can only be constructed and evaluated when cost metrics are given. The issues involved in the measurement of cost factors have been studied by the computer risk analysis and security assessment communities. The literature suggests that attempts to fully quantify all factors involved in cost modeling usually generate misleading results because not all factors can be reduced to discrete dollars (or some other common unit of measurement) and probabilities [2,4,7,8,11]. It is recommended that qualitative analysis be used to measure the relative magnitudes of cost factors. It should also be noted that cost metrics are often site-specific because each organization has its own security policies, information assets, and risk factors [19].

2.1 Attack Taxonomy

An attack taxonomy is essential in producing meaningful cost metrics. The taxonomy groups intrusions into different types so that cost measurement can be performed for categories of similar attacks. Intrusions can be categorized and analyzed from different perspectives. Lindqvist and Jonsson introduced the concept of the dimension of an intrusion and used several dimensions to classify intrusions [14]. The intrusion results dimension categorizes attacks according to their effects (e.g., whether or not denial-of-service is accomplished). It can therefore be used to assess the damage cost and response cost. The intrusion techniques dimension categorizes attacks based on their methods (e.g., resource or bandwidth consumption). It therefore affects the operational cost and the response cost. Also, the intrusion target dimension categorizes attacks according to the resource being targeted and affects both damage and response costs.

Table 1: An Attack Taxonomy for DARPA Data

| Main Category (by results) | Description | Sub-Category (by techniques) | Description | Cost |
| 1. ROOT | illegal root access is obtained. | 1.1 local | by first logging in as a legitimate user on a local system, e.g., buffer overflow on local system programs such as eject. | DCost=100, RCost=40 |
| | | 1.2 remote | from a remote host, e.g., buffer overflow of some daemon running suid root. | DCost=100, RCost=60 |
| 2. R2L | illegal user access is obtained from outside. | 2.1 single | a single event, e.g., guessing passwords. | DCost=50, RCost=20 |
| | | 2.2 multiple | multiple events, hosts, or days, e.g., the multihop attack. | DCost=50, RCost=40 |
| 3. DOS | Denial-of-Service of target is accomplished. | 3.1 crashing | using a single malicious event (or a few packets) to crash a system, e.g., the teardrop attack. | DCost=30, RCost=10 |
| | | 3.2 consumption | using a large number of events to exhaust network bandwidth or system resources, e.g., synflood. | DCost=30, RCost=15 |
| 4. PROBE | information about the target is gathered. | 4.1 simple | many probes within a short period of time, e.g., fast port scan. | DCost=2, RCost=5 |
| | | 4.2 stealth | probe events are distributed sparsely across a long time window, e.g., slow port scan. | DCost=2, RCost=7 |

Our attack taxonomy, shown in Table 1, categorizes intrusions that occur in the DARPA Intrusion Detection Evaluation dataset, which was collected in a simulated military environment by MIT Lincoln Lab [15]. In this dataset, each event to be monitored is a network connection, and the resources being attacked are mainly the network services (e.g., http, smtp, etc.) and system programs on a particular host in the network. We first use the taxonomy to categorize the intrusions occurring in the dataset into ROOT, R2L, DOS, and PROBE, based on their intrusion results. Within each of these categories, the attacks are then further partitioned by the techniques used to execute the intrusion. Sub-categories are ordered by increasing complexity of the attack method. Attacks of each sub-category can be further partitioned according to the attack targets. For simplicity, the intrusion target dimension is not shown.
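Table 1 can be read as a lookup keyed by sub-category number. The following Python sketch encodes it that way. The cost values are copied from Table 1, but the names `AttackClass`, `TAXONOMY`, and `lookup` are illustrative assumptions and do not come from the original work.

```python
from typing import NamedTuple


class AttackClass(NamedTuple):
    category: str   # main category (by results)
    technique: str  # sub-category (by techniques)
    dcost: int      # damage cost listed in Table 1
    rcost: int      # response cost listed in Table 1


# Keys are the sub-category numbers used in Table 1.
TAXONOMY = {
    "1.1": AttackClass("ROOT", "local", 100, 40),
    "1.2": AttackClass("ROOT", "remote", 100, 60),
    "2.1": AttackClass("R2L", "single", 50, 20),
    "2.2": AttackClass("R2L", "multiple", 50, 40),
    "3.1": AttackClass("DOS", "crashing", 30, 10),
    "3.2": AttackClass("DOS", "consumption", 30, 15),
    "4.1": AttackClass("PROBE", "simple", 2, 5),
    "4.2": AttackClass("PROBE", "stealth", 2, 7),
}


def lookup(sub_category: str) -> AttackClass:
    """Return the Table 1 entry for a sub-category such as "3.2"."""
    return TAXONOMY[sub_category]
```

For instance, `lookup("3.2")` returns the DOS consumption entry. The intrusion target dimension is left out here, as it is in Table 1.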

2.2 Cost Factors

When measuring cost factors, we only consider individual attacks detectable by IDSs. For example, a coordinated attack that involves port-scanning a network, gaining user-level access to the network illegally, and finally acquiring root access, would normally be detected and responded to by an IDS as three separate attacks because most IDSs are designed to respond quickly to events occurring in real-time. It is therefore reasonable to measure the attacks individually. As part of our future work, we will study the cost-sensitive aspects of intrusion detection for coordinated attacks.

2.2.1 Damage Cost

There are several factors that determine the damage cost of an attack. Northcutt uses criticality and lethality to quantify the damage that may be incurred by some intrusive behavior [19].

Criticality measures the importance, or value, of the target of an attack. This measure can be evaluated according to a resource's functional role in an organization or its relative cost of replacement, unavailability, and disclosure [8]. Similar to Northcutt's analysis, we assign 5 points for firewalls, routers, or DNS servers, 4 points for mail or Web servers, 2 points for UNIX workstations, and 1 point for Windows or DOS workstations.

Lethality measures the degree of damage that could potentially be caused by some attack. For example, a more lethal attack that helped an intruder gain root access would have a higher damage cost than if the attack gave the intruder local user access. Other damage may include the discovery of knowledge about network infrastructure or preventing the offering of some critical service. For each main attack category in Table 1, we define a relative lethality scale and use it as the base damage cost, or $base_{D}$. By assigning damage cost according to the criticality of the target, we are using the intrusion target dimension. Using these metrics, we can define the damage cost of an attack targeted at some resource as $criticality \times base_{D}$. For example, a DOS attack ($base_{D} = 30$) targeted at a firewall (criticality 5) has DCost = 150, while the same attack targeted at a UNIX workstation (criticality 2) has DCost = 60.

In addition to criticality and lethality, we define the progress of an attack to be a measure of how successful an attack is in achieving its goals. For example, a Denial-of-Service (DOS) attack via resource or bandwidth consumption (e.g., SYN flooding) may not incur damage cost until it has progressed to the point where the performance of the resource under attack is starting to suffer. The progress measure can be used as an estimate of the percentage of the maximum damage cost that should be accounted for. That is, the actual cost is $progress \times criticality \times base_{D}$. However, in deciding whether or not to respond to an attack, it is necessary to compare the maximum possible damage cost with the response cost. This requires that we assume a worst-case scenario in which $progress = 1.0$.
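The damage cost formula can be sketched in a few lines of Python. The criticality points and base damage costs below are taken from the text and Table 1. The dictionary keys and the function name `damage_cost` are assumptions made for illustration only.

```python
# Criticality points from Section 2.2.1 (key names are illustrative).
CRITICALITY = {
    "firewall": 5, "router": 5, "dns_server": 5,
    "mail_server": 4, "web_server": 4,
    "unix_workstation": 2,
    "windows_workstation": 1, "dos_workstation": 1,
}

# Base damage cost (base_D) per main attack category, from Table 1.
BASE_D = {"ROOT": 100, "R2L": 50, "DOS": 30, "PROBE": 2}


def damage_cost(category: str, target: str, progress: float = 1.0) -> float:
    """Return progress * criticality * base_D.

    The default progress of 1.0 is the worst-case assumption used when
    deciding whether to respond.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    return progress * CRITICALITY[target] * BASE_D[category]


print(damage_cost("DOS", "firewall"))           # 150.0
print(damage_cost("DOS", "unix_workstation"))   # 60.0
print(damage_cost("DOS", "firewall", 0.5))      # 75.0
```

The first two calls reproduce the worked figures in the text (150 and 60). The third shows how a partially progressed attack is discounted.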

2.2.2 Response Cost

Response cost depends primarily on the type of response mechanisms being used. This is usually determined by an IDS's capabilities, site-specific policies, attack type, and the target resource [3]. Responses may be either automated or manual, and manual responses will clearly have a higher response cost.

Responses to intrusions that may be automated include the following: termination of the offending connection or session (either killing a process or resetting a network connection), implementation of a packet-filtering rule, rebooting the targeted system, or recording the session for evidence gathering purposes and further investigation [1,19]. In addition to these responses, a notification may be sent to the administrator of the offending machine via e-mail in case that machine was itself compromised. A more advanced response which has not been successfully employed to date could involve the coordination of response mechanisms in disparate locations to halt intrusive behavior closer to its source.

Additional manual responses to an intrusion may involve further investigation (perhaps to eliminate action against false positives), identification, containment, eradication, and recovery [19]. The cost of manual response includes the labor cost of the response team, the user of the target, and any other personnel that participate in response. It also includes any downtime needed for repairing and patching the targeted system to prevent future damage.

We estimate the relative complexities of typical responses to each attack type in Table 1 in order to define the relative base response cost, or $base_{R}$. Attacks with simpler techniques (i.e., sub-categories x.1 in our taxonomy) generally have lower response costs than more complex attacks (i.e., sub-categories x.2), which require more complex mechanisms for effective response.
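The comparison between maximum damage cost and response cost described in Section 2.2.1 can be sketched as follows. The cost values come from Table 1. The rule used here, respond when the worst-case damage cost is at least the base response cost, is an assumption made for illustration, as is the function name `worth_responding`. This section does not define the actual decision rule.

```python
# Base response cost (base_R) per sub-category, from Table 1.
BASE_R = {
    "1.1": 40, "1.2": 60,
    "2.1": 20, "2.2": 40,
    "3.1": 10, "3.2": 15,
    "4.1": 5, "4.2": 7,
}

# Base damage cost (base_D) per main category number, from Table 1.
BASE_D = {"1": 100, "2": 50, "3": 30, "4": 2}


def worth_responding(sub_category: str, criticality: int) -> bool:
    """Compare worst-case damage (progress = 1.0) with base_R.

    ASSUMPTION: responding is considered worthwhile when the maximum
    damage cost is greater than or equal to the response cost.
    """
    main_category = sub_category.split(".")[0]
    max_damage = 1.0 * criticality * BASE_D[main_category]
    return max_damage >= BASE_R[sub_category]


# The simpler x.1 technique is cheaper to respond to than x.2 in Table 1.
assert all(BASE_R[f"{c}.1"] < BASE_R[f"{c}.2"] for c in BASE_D)
```

Under this assumed rule, a stealth probe (4.2) against a Windows workstation (criticality 1) gives a maximum damage of 2 against a response cost of 7, so `worth_responding("4.2", 1)` is `False`. The same probe against a firewall (criticality 5) gives 10 against 7 and returns `True`.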

2.2.3 Operational Cost

The main cost inherent in the operation of an IDS is the amount of time and computing resources needed to extract and test features from the raw data stream that is being monitored. We associate OpCost with time because a real-time IDS must detect an attack while it is in progress and generate an alarm as quickly as possible so that damage can be minimized. A slower IDS which uses features with higher computational costs should therefore be penalized. Even if a computing resource has a "sunk cost" (e.g., a dedicated IDS box has been purchased in a single payment), we still assign some cost to the expenditure of its resources as they are used. If a resource is used by one task, it may not be used by another task at the same time. The cost of computing resources is therefore an important factor in prioritization and decision making.

Some features cost more to gather than others. However, costlier features are often more informative for detecting intrusions. For example, features that examine events across a larger time window have more information available and are often used for "correlation analysis" [1] in order to detect extended or coordinated attacks such as slow host or network scans [3]. Computation of these features is costly because of their need to store and analyze larger amounts of data.

Based on our experience in extracting and constructing predictive features from network audit data, we classify features into three relative levels, based on their computational costs:

We can assign relative magnitudes to these features according to their computational costs. For example, level 1 features may cost 1 or 5, level 2 features may cost 10, and level 3 features may cost 100. These estimates have been verified empirically using a prototype system, built in coordination with Network Flight Recorder [18], for evaluating our ID models in real time.
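As a rough illustration of how these relative magnitudes might be used, the Python sketch below totals the cost of the features that a detection model consults. The level costs follow the example figures above, taking 1 for level 1. The feature names, the `FEATURE_LEVEL` mapping, and the choice to sum per-feature costs are all assumptions made for illustration, not definitions from the text.

```python
# Example relative costs per feature level (the text allows 1 or 5 for
# level 1; this sketch uses 1).
LEVEL_COST = {1: 1, 2: 10, 3: 100}

# HYPOTHETICAL feature names and level assignments, for illustration only.
FEATURE_LEVEL = {
    "feature_a": 1,
    "feature_b": 2,
    "feature_c": 3,
}


def operational_cost(features) -> int:
    """Sum the level cost of each distinct feature a model uses.

    ASSUMPTION: a feature is computed once per event, so duplicates in
    `features` are counted a single time.
    """
    return sum(LEVEL_COST[FEATURE_LEVEL[name]] for name in set(features))


print(operational_cost(["feature_a", "feature_b"]))               # 11
print(operational_cost(["feature_a", "feature_b", "feature_c"]))  # 111
```

With these figures, adding a single level 3 feature dominates the total, which is why a model that can decide using only cheaper features has a much lower OpCost.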

Erez Zadok