

Testing Intrusion Detection Systems:
Methods and Tools
Michael Borgwardt


Motivation
- Growing importance of security issues in computing
- Completely secure system may not be feasible
- New security model:
- prevention
- detection
- investigation
- analysis
- Intrusion detection systems (IDSs) are becoming important applications
- The more important an application, the more important its testing methods
Methods and tools are needed to
- Test IDS functionality (development)
- Find weaknesses in an IDS (customization, complementary mechanisms)
- Compare different IDSs (requirements)


A Concrete Approach
Two papers, published in IEEE software magazines, written by
Nicholas J. Puketza, Mandy Chung, Biswanath Mukherjee, Ronald A. Olsson and others:
- "A Methodology for Testing Intrusion Detection Systems"
(IEEE Trans. Software Eng., Oct. 1996, pp. 719-729)
- "A Software Platform for Testing Intrusion Detection Systems",
describing the results of a research project funded by the
National Security Agency (NSA) INFOSEC University Research Program.


Testing Approach
- simulation of computer users (both normal and intruders)
- Test case = simulated user session


Tested Aspects
Three main aspects are tested:
- Detection range
- Resource usage
- Stress resilience
All three are necessary (though not sufficient) for an IDS to work properly, but
priorities may vary.


Selection of Test Cases
- Collect as much intrusion data as possible
- Partition the set of intrusions into classes
- Create representative subset: "equivalence partitioning"
- Ideally: equivalence classes with respect to the IDS's detection ability
(generally not possible)
- Classification strategies:
- intrusion technique
- taxonomy of exploited system vulnerabilities
- signatures
- Select test cases, especially site-specific
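
A minimal sketch of the equivalence-partitioning step, assuming intrusions are classified by technique. The intrusion names and class labels are hypothetical placeholders, not data from the papers, and "pick the first member" is only one possible selection rule.

```python
from collections import defaultdict

# Hypothetical intrusion records: (name, technique class).
INTRUSIONS = [
    ("guess-root-password", "password guessing"),
    ("guess-user-password", "password guessing"),
    ("copy-passwd-file", "password file transmission"),
    ("overflow-daemon", "buffer overrun"),
    ("overflow-setuid-tool", "buffer overrun"),
]


def partition(intrusions):
    """Group intrusion names into classes keyed by technique."""
    classes = defaultdict(list)
    for name, technique in intrusions:
        classes[technique].append(name)
    return dict(classes)


def representatives(classes):
    """Pick one test case per class (here simply the first member)."""
    return {technique: members[0] for technique, members in classes.items()}


classes = partition(INTRUSIONS)
test_cases = representatives(classes)
```

Five collected intrusions collapse into three test cases here; a site-specific selection would replace the "first member" rule with local priorities.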


Concurrent Attacks
- Intrusive behaviour can be distributed among several separate channels
- Activities on each channel alone seem less suspicious
- Types of concurrent or distributed attack:
- One source, many targets
- Many sources, one target
- Many sources, many targets
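
A small sketch of how one intrusive session might be spread over several channels (the "many sources, one target" case). The round-robin rule and all step and channel names are assumptions made for illustration only.

```python
def distribute(steps, channels):
    """Assign attack steps to channels round-robin.

    Each entry keeps its global position so the overall order can
    still be enforced when the channels run concurrently.
    """
    plan = {channel: [] for channel in channels}
    for position, step in enumerate(steps):
        channel = channels[position % len(channels)]
        plan[channel].append((position, step))
    return plan


# Hypothetical attack steps and source sessions.
steps = ["probe", "login", "read file", "send file", "logout"]
plan = distribute(steps, ["source-1", "source-2"])
```

Each channel now carries only part of the activity, which is why it looks less suspicious when examined alone.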


Limitations of the Methodology
- The IDS is tested only against known attacks
(new attacks often similar to known ones)
- May not be possible to identify the set of all possible intrusions


Software platform
- Tcl (Tool command language)
- expect tool (control of interactive programs):
"spawn": create new process
"expect" waits for specific string patterns from the process
"send" sends string input to the process

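The three primitives can be illustrated with a rough Python analogue. This is not the expect tool or its Tcl API: the `Session` class, the fake login program and `run_demo_session` are all hypothetical, and `expect` here matches literal strings only.

```python
import subprocess
import sys


class Session:
    """Tiny stand-in for expect's spawn/expect/send (not the real tool)."""

    def __init__(self, argv):
        # "spawn": start the program with unbuffered pipes for stdin/stdout.
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0
        )
        self.buffer = ""

    def expect(self, pattern):
        # "expect": read output byte by byte until the literal pattern appears.
        while pattern not in self.buffer:
            chunk = self.proc.stdout.read(1)
            if not chunk:
                raise EOFError(f"process ended before {pattern!r} appeared")
            self.buffer += chunk.decode("ascii", errors="replace")
        end = self.buffer.index(pattern) + len(pattern)
        seen, self.buffer = self.buffer[:end], self.buffer[end:]
        return seen

    def send(self, text):
        # "send": write input to the process as if a user had typed it.
        self.proc.stdin.write(text.encode("ascii"))

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


# Hypothetical interactive program standing in for a login prompt.
FAKE_LOGIN = (
    "import sys\n"
    "sys.stdout.write('login: ')\n"
    "sys.stdout.flush()\n"
    "name = sys.stdin.readline().strip()\n"
    "sys.stdout.write('Welcome, ' + name + '\\n')\n"
    "sys.stdout.flush()\n"
)


def run_demo_session():
    """Simulated user session: wait for the prompt, answer, read the reply."""
    session = Session([sys.executable, "-c", FAKE_LOGIN])
    session.expect("login: ")
    session.send("guest\n")
    greeting = session.expect("Welcome, guest")
    session.close()
    return greeting
```

A test case in the sense of the methodology would be a longer script of this shape, with the fake program replaced by a real interactive service.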


Extensions of the expect tool:
- synchronization and communication,
needed for distributed scripts; guarantees reproducible results (overall event order
otherwise non-deterministic)
- record-and-replay function
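
Why synchronization makes distributed scripts reproducible can be sketched with threads standing in for separate scripts. The `Sequencer` class and the step numbering are assumptions for illustration; the actual extensions to expect are not shown in the source.

```python
import threading


class Sequencer:
    """Lets concurrent scripts perform numbered steps in one fixed global order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next = 0
        self.log = []

    def step(self, number, script, action):
        with self._cond:
            # Block until every lower-numbered step has been performed.
            self._cond.wait_for(lambda: self._next == number)
            self.log.append((script, action))
            self._next += 1
            self._cond.notify_all()


def run_script(sequencer, name, steps):
    for number, action in steps:
        sequencer.step(number, name, action)


# Hypothetical scripts; the numbers define the intended global event order.
scripts = {
    "A": [(0, "login"), (2, "read file")],
    "B": [(1, "login"), (3, "send file")],
}

sequencer = Sequencer()
threads = [
    threading.Thread(target=run_script, args=(sequencer, name, steps))
    for name, steps in scripts.items()
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Without the sequencer, the interleaving of A and B would depend on scheduling; with it, `sequencer.log` is the same on every run.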


Tool Limitations
- Not all user activity can be simulated, especially not GUI usage
- The actions that are directly related to the attack should be simulated
- Anomaly-detection based IDSs may not be properly tested.


Test Scenarios
- Basic Detection Test: environment free of unrelated activity
- Normal User Test: run non-intrusive scripts to check for false positives
- Resource Usage Test: run various (intrusive or non-intrusive) scripts, observe how many resources
the IDS uses (e.g. disk space for logs).
- Stress Tests:
- Smokescreen Noise: intersperse intrusive commands with non-intrusive ones
- Background Noise: run intrusion scripts parallel to several
non-intrusive "noise" scripts, or one that produces a lot of activity.
- High Volume Sessions, Intensity: (not really different from the Background Noise Test)
- Load: run additional programs that take away CPU
cycles from the IDS and/or give the IDS a lower scheduling priority
In all cases: compare IDS output with that from the Basic Detection Test.
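
A sketch of the Smokescreen Noise idea and of the final comparison step. The function names, the `gap` parameter and the command strings are hypothetical; how alerts are extracted from a real IDS is not covered here.

```python
def smokescreen(intrusive, noise, gap=2):
    """Place up to `gap` noise commands before each intrusive command."""
    session = []
    noise_iter = iter(noise)
    for command in intrusive:
        for _ in range(gap):
            filler = next(noise_iter, None)
            if filler is not None:
                session.append(filler)
        session.append(command)
    return session


def missed(baseline_alerts, stressed_alerts):
    """Alerts raised in the Basic Detection Test but absent under stress."""
    return sorted(set(baseline_alerts) - set(stressed_alerts))


# Hypothetical commands and alert identifiers.
session = smokescreen(["cat /etc/passwd", "ftp out"], ["ls", "date", "pwd"])
lost = missed({"passwd-read", "file-transfer"}, {"passwd-read"})
```

An empty result from `missed` would mean the stress test did not degrade detection relative to the baseline.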


Actual Experiments
- Tested System: UC Davis' Network Security Monitor (NSM)
monitors network traffic, recognizes individual computer-to-computer
connections and assigns numerical warning values
ranging from 0 to 10
- Platform: Sun SPARCstation 2,
Connected to the CS LAN segment at UCD.


Results
- Some types of intrusive behaviour were not detected at first
- NSM could be configured to detect them
- Load Stress Test: loads above 4 caused the IDS to miss
network packets
- NSM itself reported this, and showed the
percentage of missed data: about 10% at load 10, 40% at load 14
- Concurrent Intrusion Test Cases:
| Intrusion type | Max. warning value (sequential) | Max. warning value (concurrent) |
| Transmission of password file | 7.472 | 7.472 |
| Password cracking | 3.160 | 3.160 |
| Password guessing | 8.722 | 7.785 |
| Exploiting buffer overrun | 7.472 | 4.972 |
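
The effect of concurrency can be read off the table by subtracting the two columns. The values below are copied from the table; the function name is hypothetical.

```python
# Max. warning values from the table: (sequential, concurrent).
RESULTS = {
    "Transmission of password file": (7.472, 7.472),
    "Password cracking": (3.160, 3.160),
    "Password guessing": (8.722, 7.785),
    "Exploiting buffer overrun": (7.472, 4.972),
}


def warning_drop(results):
    """Sequential minus concurrent warning value, per intrusion type."""
    return {
        name: round(sequential - concurrent, 3)
        for name, (sequential, concurrent) in results.items()
    }


drops = warning_drop(RESULTS)
```

Only two of the four intrusion types score lower when run concurrently: password guessing drops by 0.937 and the buffer overrun exploit by 2.5.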


Future Plans
- Develop suite of intrusion test cases
- Eventually: a comprehensive "benchmark suite"


Evaluation of the Paper


Strengths
- Lots of background explanations (motivation for ID, approaches), easy to understand
- Basic introduction to the field
- Practically no prior knowledge necessary


Shortcomings
- Practical side of research focused on NSM, limited view
- Some important issues were left out
- Network-based vs. host-based
- Different protocols
- Low-level DoS or surveillance attacks, installing backdoors, removing traces
- Encrypted sessions


Real-World Aspects
Configuration & Administration
IDS's own resilience to attacks
Automatic response to attacks
Network infrastructure
Cost


Analysis
- Very little detail on how to analyze IDS reaction or output
- No mention of detection percentage vs. false alarms, ROC


Other Projects


DARPA Intrusion Detection Evaluation
- Cooperation of 3 parties:
- Defense Advanced Research Projects Agency (DARPA): sponsors many ID development projects
- Air Force Research Laboratory (AFRL): manages many of the resulting programs
- MIT Lincoln Laboratory
- http://www.ll.mit.edu/IST/ideval/
- R. Durst, T. Champion, B. Witten, E. Miller, L. Spagnuolo, "Testing and Evaluating
Computer Intrusion Detection Systems" (Communications of the ACM, July 1999, Vol. 42, No. 7, pp. 53-61)


MIT Research Approach
- Methodology and tools for non-realtime IDS performance evaluation
- Suite of test sessions including various services is recorded
(audit data from Sun BSM and IP traffic from tcpdump)
- Logs are sent to IDS developers who feed them to their systems
- Results are sent back for analysis
- User and network anomalies were included
- Participants have to provide CPU time the IDS used to analyze data
- Also uses expect scripts
- 300 attacks from 4 groups: surveillance, denial of service, user to root, remote to local


Research Applied
- AFRL developed a test architecture and process for DARPA
- Results are published in semiannual reviews
- In addition: real-time evaluation scenario for selected participants
(four-hour subset of the Lincoln Laboratory testbed)
- "traffic generators" play back prerecorded background traffic
- Modified Linux kernel assigns different IP source addresses to individual processes,
simulating large array of real computers


ROC Charts
"receiver operating characteristic", crucial for evaluating IDSs



Results
- Signature-based methods are good at reducing false alarm rates
- Network-based IDSs can't detect local user-to-root attacks
- Careful (slow) surveillance attacks are difficult to detect
- "new" attacks were generally missed
- String-matching network monitors have high false alarm rates and miss many attacks
- Only results of the 1998 evaluation are available; it is considered a "learning experience"
- Only Unix systems and Cisco routers were used


IBM Research
- Testing done at IBM Zürich Research Laboratory (general IDS research, maintenance of vulnerability database), report dated July 1999
- Some usability testing
- Detection/false alarm testing, both in closed lab and open network
- Load measurement
- http://www.ossir.org/ftp/supports/99/debar/index1.html


Very Theoretical
- Roy Maxion of Carnegie Mellon University, "Measuring Intrusion-Detection Systems",
presented at a workshop (RAID-98)
- Pretty useless, spends too much time on semantic questions ("What does it mean to measure"),
giving definitions and unrelated examples

Commercial Test
- DataComm Magazine test, conducted in August 1998
- Much more practical, test of specific products, detailed results
- Usability issues (ease of implementation, amount of administration, quality of technical support)
- Cost of ownership
- no sophisticated methodology; 10 popular attacks were chosen and implemented in some variants:
ping sweep, port scan, dig, syn flood, ICMP flood, UDP flood, ping of death, land,
winnuke, teardrop 2
- stress testing was done: with 20% network utilization everything worked well, at 40%
detection rates dropped to near 0
- http://www.data.com/lab_tests/intrusion.html