Anti-Intrusion Detection System (IDS) tactics were one of the original key features of my whisker web scanner. The goal of any anti-IDS tactic is to mutate a request so much that the ID systems will get confused, but the web server will still be able to understand it, hence the subtitle "just how bad can we ruin a good thing?"
This paper is aimed at explaining the thought process and implementation behind various anti-IDS tactics whisker uses to avoid web scan detection. While I specifically have ID systems in mind, this also applies to monitors, sniffers, log parsers and anything else trying to interpret web traffic and/or requests. The methods, analysis and theories presented within this document can also be applied to other protocols and concepts--however, HTTP is my focus due to the implementation of whisker.
It is important to understand the components of an HTTP request. As defined by RFC 1945:
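[ GET(1) /cgi-bin/some.cgi(2) HTTP/1.0(3) ](4)
where (1) is the method, (2) is the URI, (3) is the protocol version, and (4) is the request as a whole.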
Throughout this paper I will reference two types of ID systems--raw and smart:
Smart: Implement logic that understands the target protocol (in our case, HTTP). They will parse the request and perform (optimized) signature matching based on known rules pertaining to the protocol. They will attempt to behave like a real web server would behave, at the expense of additional code and slowness. Example IDS: RealSecure.
Raw: Also referred to as 'packet grep' style ID systems, they typically just scan the unprocessed raw data for key strings. The benefit of this method is pure speed. I use the term 'raw' not in a derogatory manner, but rather to identify that these ID systems usually deal with the raw data directly, rather than interpreting the protocols they are monitoring. In all honesty, I personally prefer this type of IDS approach. Example ID systems of this type would be Dragon and Snort.
Both types have pros and cons, and can be fooled in different ways. The goal is to obfuscate the request enough to keep the signature from matching. A signature is generically considered to be a condition or string (in our case--HTTP) that is present in packets traversing the network. Essentially it comes down to the IDS matching a signature (such as the string "/cgi-bin/phf") against network traffic. If there is a match, the ID system will flag it as an attack. Often, the longer the signature is, the less likely the string is to occur in legitimate traffic. However, some ID systems check for very small strings--often as short as "/phf". The smaller you make a signature, the greater the chance of a false positive. For example, the previously mentioned "/phf" signature will match on the following request:
GET /phfiles/phonefiles.txt HTTP/1.0
So the ID vendors should be cautious when shortening a signature. They can assume the "/cgi-bin/" part, but this causes problems if the CGI is not located in /cgi-bin. If they leave this out, however, the signature becomes more prone to false alarms. So it's a balancing act. Many vendors have decided to keep the "/cgi-bin/" style notation*, with the notable exception of Snort. But either way, there are still problems.
(*from what I can tell, anyway; many ID systems are closed source)
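The trade-off between short and long signatures can be illustrated with a minimal Python sketch of 'packet grep' style matching. The helper name `matches` and the two sample requests are illustrative assumptions; real ID systems are considerably more elaborate.

```python
# Minimal sketch of raw ("packet grep") signature matching.
# The helper name and sample requests are illustrative assumptions.

def matches(signature: str, request: str) -> bool:
    """Return True if the signature occurs anywhere in the raw request."""
    return signature in request

attack = "GET /cgi-bin/phf HTTP/1.0"
benign = "GET /phfiles/phonefiles.txt HTTP/1.0"

# Print, for each signature, whether it fires on the attack and on the
# benign request; a hit on the benign request is a false positive.
for sig in ("/cgi-bin/phf", "/phf"):
    print(sig, matches(sig, attack), matches(sig, benign))
```

The short signature fires on both requests, while the long one fires only on the attack but would miss a phf installed outside /cgi-bin.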
On initial testing of whisker, many ID systems were failing because they assumed that requests would use the GET method--they were looking for signatures that included the method, in the style of "GET /cgi-bin/phf".
The trick here is that whisker didn't use the GET method; it used HEAD by default. Whisker was sending HEAD requests for the same URIs, and the ID systems were all missing the scans, flat out. Accurate coding of the signature would not include the request method. Granted, the attacker may have to use GET later to actually exploit the CGI; however, it is often possible to still use HEAD and POST in the exploitation, depending on how the CGI was coded. On some platforms, the method is even ignored, so it becomes a moot point.
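As a small sketch of why the method should stay out of the signature, the following Python compares a method-bound signature with a method-free one against a HEAD scan. The request line and the helper name `flagged` are assumptions made up for this example.

```python
# Sketch: a signature that hard-codes the method misses a HEAD scan.
# The request line and helper name are illustrative assumptions.

def flagged(signature: str, request: str) -> bool:
    """Plain substring match, as a raw ID system might perform."""
    return signature in request

head_scan = "HEAD /cgi-bin/phf HTTP/1.0"

print(flagged("GET /cgi-bin/phf", head_scan))  # method-bound signature
print(flagged("/cgi-bin/phf", head_scan))      # method-free signature
```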
The classic trick with URL encoding is to replace the URI with its escaped equivalent. The HTTP protocol specifies that arbitrary binary characters can be passed within the URI by using %xx notation, where 'xx' is the hex value of the character. In theory, the raw ID systems would fall prey to this, since the signature "cgi-bin" does not match the string "%63%67%69%2d%62%69%6e". Also, in theory, the smart ID systems would be able to plow past this, since they would decode the string much as a web server does before actually checking for a signature. In reality, nowadays all worthwhile ID systems decode encoded URIs, so this tactic is becoming obsolete. This was implemented in whisker v1.0+ as the -I option, and as the -I 1 option in v1.3.
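The encoding step and its countermeasure can be sketched in a few lines of Python. The helper `encode_all` is a hypothetical name and assumes ASCII input; `unquote` is the standard library's percent-decoder and stands in for whatever decoding an IDS performs.

```python
from urllib.parse import unquote

def encode_all(text: str) -> str:
    """Escape every character as %xx (lowercase hex); ASCII input assumed."""
    return "".join("%{:02x}".format(ord(ch)) for ch in text)

encoded = encode_all("cgi-bin")
print(encoded)

# A raw substring match fails on the encoded form...
print("cgi-bin" in encoded)
# ...but succeeds once the data is decoded before matching.
print("cgi-bin" in unquote(encoded))
```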
In an effort to break up a string, the classic double slash method replaced every single '/' with '//'. This resulted in checks for "/cgi-bin/some.cgi" not matching "//cgi-bin//some.cgi". However, most ID systems (smart and raw) are aware of this trick and all derivatives of the trick using multiple (3+) slashes. Smart ID systems tend to correctly interpret this (by logically combining all slashes into one); raw ID systems vary by emulating smart ID systems (combining them), or just reporting multiple slashes and moving along. This method is basically obsolete and not implemented in whisker, in favor of self-referencing directories (see below).
Another classic trick is to break apart a signature such as "/cgi-bin/some.cgi" by using reverse traversal directory tricks:
GET /cgi-bin/blahblah/../some.cgi HTTP/1.0
which equates to "/cgi-bin/some.cgi" once the directory traversal has been accounted for. However, like URI encoding, this trick is old and well known. Most smart ID systems account for this (it's a core feature of what makes them 'smart'), and raw ID systems usually alert on the fact that the request contains "/../". For all intents and purposes, this tactic is becoming obsolete as well. It has not been implemented in whisker, in favor of self-referencing directories.
A newer trick in the 'directory games' category is the self-referencing directory. While '..' means the parent directory, '.' means the current directory. So "c:\temp\.\.\.\.\.\" is equivalent to "c:\temp\" ("/tmp/./././././" being "/tmp/" for you Unix folk). In an effort to stop the raw ID systems from matching signatures like "/cgi-bin/phf", we can change the string to "/./cgi-bin/./phf". That means raw ID systems have three options:
Theoretically the smart ID systems should handle this situation. In reality I found that when using both URI encoding and self-referencing directories (-E in whisker v1.0, -I 12 in whisker v1.3), none of the major (top 8; a mix of smart and raw) ID systems would catch a scan. However, after publishing whisker v1.0, many IDS vendors caught onto this fact, and have since modified to suit. So we need new tactics. Moving on...
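The three 'directory games' (multiple slashes, reverse traversal and self-referencing directories) all collapse under the same kind of path normalization. Below is a minimal sketch, assuming the IDS has already isolated and decoded the URI path; `normalize_path` is a hypothetical helper, not code from any particular product, and it drops trailing slashes for simplicity.

```python
def normalize_path(path: str) -> str:
    """Collapse '//', '/./' and '/dir/../' the way a web server logically would."""
    kept = []
    for segment in path.split("/"):
        if segment in ("", "."):
            # Empty segments come from repeated slashes; '.' is the current dir.
            continue
        if segment == "..":
            # Step back one directory, but never above the root.
            if kept:
                kept.pop()
            continue
        kept.append(segment)
    return "/" + "/".join(kept)

for sample in ("//cgi-bin//some.cgi",
               "/cgi-bin/blahblah/../some.cgi",
               "/./cgi-bin/./phf"):
    print(sample, "->", normalize_path(sample))
```

Matching signatures against the normalized path rather than the raw bytes is what lets an IDS see through these variations; the cost is the extra parsing work on every request.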
The premature request ending tactic is specifically aimed at the smart ID systems. In an effort to save precious time and processing power (remember, the faster you scan packets, the more traffic you can view in real-time), smart ID systems may choose to implement an agreeable approach to detecting a scan: check only the request, and throw away extra client-submitted data. A typical request looks like:
GET /some.file HTTP/1.0\r\n
There is no point in a smart IDS scanning the headers (although some do, which means they're using hybrid smart/raw tactics to balance speed with efficiency). The ID system can stop looking after the "HTTP/1.0\r\n". But they must be careful if they do. Imagine the following submission:
GET /%20HTTP/1.0%0d%0aHeader:%20/../../cgi-bin/some.cgi HTTP/1.0\r\n\r\n
This translates to:
GET / HTTP/1.0\r\nHeader: /../../cgi-bin/some.cgi HTTP/1.0\r\n\r\n
Or, if you will:
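GET / HTTP/1.0\r\n
Header: /../../cgi-bin/some.cgi HTTP/1.0\r\n
\r\n
Which is a valid request! Assuming the IDS decodes the encoding first, it will stop scanning at our fake 'premature' ending, rather than the real one. The proper approach is to extract the portion of the request you wish to examine first, and only then decode the encoded characters.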
This tactic is available in whisker v1.3 as -I 3.
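The difference between the two processing orders can be sketched as follows. Both function names are hypothetical, and the parsing is deliberately simplistic (it assumes a single space between the three parts of the request line); the point is only the order of decoding versus extraction.

```python
from urllib.parse import unquote

# The request from the text, as it would appear on the wire.
RAW = "GET /%20HTTP/1.0%0d%0aHeader:%20/../../cgi-bin/some.cgi HTTP/1.0\r\n\r\n"

def uri_decode_first(raw: str) -> str:
    """Flawed order: decode everything, then cut at the first CRLF."""
    request_line = unquote(raw).split("\r\n", 1)[0]
    return request_line.split(" ")[1]

def uri_extract_first(raw: str) -> str:
    """Safer order: cut at the real CRLF, isolate the URI, then decode it."""
    request_line = raw.split("\r\n", 1)[0]
    _method, uri, _version = request_line.split(" ")
    return unquote(uri)

print(repr(uri_decode_first(RAW)))   # what a decode-first IDS inspects
print(repr(uri_extract_first(RAW)))  # what an extract-first IDS inspects
```

The decode-first variant ends up inspecting only "/", while the extract-first variant still sees the CGI path hidden behind the fake request ending.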
Going further into the design of smart ID systems, you have the issue of parameters, which are submitted with dynamic content. Parameters to a page typically follow a '?' after the name of the requested file.
Obviously the data in the parameters need not be scanned (if you're only looking for particular file requests). Again, in an effort to save time and processing power, a smart IDS can stop processing once the '?' is reached, which indicates the rest of the data are parameters. Well, like the premature request end tactic, we can fake this anomaly as well:
GET /index.htm%3fparam=/../cgi-bin/some.cgi HTTP/1.0
This translates to:
GET /index.htm?param=/../cgi-bin/some.cgi HTTP/1.0
Again, this is a valid request. The proper method of parsing is similar to the method I mentioned earlier--extract the portion you wish to examine before decoding the encoded characters. This tactic is implemented in whisker v1.3 as -I 5.
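The same ordering problem applies to the parameter separator. A minimal sketch, with hypothetical function names, of an IDS that throws away everything after the '?':

```python
from urllib.parse import unquote

# The URI portion of the request from the text.
URI = "/index.htm%3fparam=/../cgi-bin/some.cgi"

def path_decode_first(uri: str) -> str:
    """Flawed order: decode, then discard everything after the first '?'."""
    return unquote(uri).split("?", 1)[0]

def path_split_first(uri: str) -> str:
    """Safer order: cut at a literal '?' in the raw URI, then decode."""
    return unquote(uri.split("?", 1)[0])

print(path_decode_first(URI))
print(path_split_first(URI))
```

With decoding done first, the encoded '%3f' becomes a separator and the IDS keeps only "/index.htm"; with the split done first, the whole string (including the CGI path) remains available for signature matching.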
As I mentioned, a smart ID system could feasibly extract the URI of a request, possibly chop off the parameters, and then scan only within the leftover string. According to the HTTP RFC, a v1.0 request looks like:
Method <space> URI <space> HTTP/ Version CRLF CRLF
The key is that HTTP calls for spaces to separate the three components, and that the components appear in the specified order. This means it's easy to extract specific portions of the request--you merely need to use the spaces as separators, and adjust accordingly.
Interestingly enough, Apache 1.3.6 and newer (and perhaps earlier versions; I have not traced the history of this 'feature') allow you to specify a slightly different syntax:
Method <tab> URI <tab> HTTP/ Version CRLF CRLF
This will ruin any processing dependent on the 'assumed' RFC format of a request. Even more specifically, there are ID systems that implement minimal signatures that depend on the trailing space for matching. For example, matching "/phf" could lead to many false positives, but "/phf " (notice the trailing space) helps assure that the final requested page is closer to the actual 'phf', and not just starting with the letters 'phf'. Also keep in mind HTTP v0.9 syntax, which is simply:
GET <space> URI CRLF
This means that ID systems depending on having three parameters may be confused by v0.9 requests; however, v0.9 only provides the GET method, and returns no headers--making automatic processing by CGI scanners much more difficult.
Whisker v1.3 currently handles the tab separation (-I 6). Whisker does not currently use any sort of v0.9 requesting by default; however, you can code a script to implement this fairly easily.
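To see why tab separation upsets space-based parsing, consider the following sketch. Both splitter names are hypothetical; `split_lenient` merely models a parser that accepts any whitespace, and is not a claim about how Apache actually parses requests.

```python
def split_strict(request_line: str) -> list:
    """Split on single spaces only, following a literal reading of the RFC."""
    return request_line.split(" ")

def split_lenient(request_line: str) -> list:
    """Split on any run of whitespace, so tabs work as separators too."""
    return request_line.split()

tabbed = "GET\t/cgi-bin/phf\tHTTP/1.0"

print(len(split_strict(tabbed)))   # number of parts a space-only parser finds
print(split_lenient(tabbed))       # parts found when tabs count as separators
print("/phf " in tabbed)           # trailing-space signature against the tabbed form
```

A space-only parser cannot isolate the URI at all, and a signature that relies on the trailing space no longer matches because the character after "phf" is a tab.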
An optimization of some raw ID systems is to only look within the first xx bytes of the request. Generally this works well, since the first line of the request needs to contain the URI. However, we can exploit this by submitting a request along the lines of:
GET /rfprfp<lots of characters>rfprfp/../cgi-bin/some.cgi HTTP/1.0
The key is to include enough characters to move the rest of the submitted request outside the scope of the ID systems' scan limit. However, this tactic is very noisy in the web server logs, especially when you are submitting 1-2K worth of random characters per request. Whisker, by default, will submit 1-2K of random characters when the -I 4 option is specified. The actual amount submitted is controlled by the XXIDSMode4Limit variable.
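A rough model of the scan-limit optimization and how padding defeats it is given below. The limit of 512 bytes is an arbitrary assumption, as is the fixed 'A' padding (whisker submits random characters); `limited_match` is a hypothetical helper.

```python
SCAN_LIMIT = 512  # hypothetical number of leading bytes the IDS inspects

def limited_match(signature: str, request: str, limit: int = SCAN_LIMIT) -> bool:
    """Look for the signature only within the first `limit` characters."""
    return signature in request[:limit]

signature = "/cgi-bin/some.cgi"
plain = "GET /cgi-bin/some.cgi HTTP/1.0"

# Pad the URI with a long bogus directory name, then step back out of it.
padding = "rfprfp" + "A" * 1024 + "rfprfp"
padded = "GET /" + padding + "/../cgi-bin/some.cgi HTTP/1.0"

print(limited_match(signature, plain))
print(limited_match(signature, padded))
print(limited_match(signature, padded, limit=len(padded)))
```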
Everyone has heard the story that Microsoft separates directories using '\' simply because Unix uses '/'. However, if you notice in the HTTP RFC, the syntax calls for '/'. That means Microsoft, with all their ingenuity, lost the battle and must silently convert from '/' to '\' internally in IIS (as well as all other DOS/Windows based web servers). Interestingly enough, we can still use '\' in our requests, since they are still valid as directory separators--this means on DOS/Windows platforms, we could use requests such as "/cgi-bin\some.cgi", which will not match a typical "/cgi-bin/some.cgi" signature. Note that the first character of a URI must still be a '/', and not a '\'. This is tactic -I 8.
Many C string libraries use the NULL character to denote the end of a string. While I doubt most ID systems use these libraries (they are typically too slow for these high-speed applications), the practice of using NULLs to denote the end of strings is still quite common. We can use this to our advantage with the following type of request:
GET%00 /cgi-bin/some.cgi HTTP/1.0
The theoretical flow of this tactic goes:
Again, this is on the assumption that ID systems tend to handle the full request at once; this is a reasonable assumption, since detailed parsing incurs too much overhead for an effective IDS.
NOTE: Apache will not process any request that contains '%00' or '%2f'. However, this method has been found to work with IIS. All others are untested. Remember, the web server still has to see it as a valid request for it to be usable.
Use the -I 0 option to invoke this tactic in whisker.
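One way this could play out is sketched below, under the assumption that the IDS decodes the request and then treats the result as a NULL-terminated string. The helper `c_string_view` is a hypothetical stand-in for C string semantics; it is not taken from any real IDS.

```python
from urllib.parse import unquote_to_bytes

def c_string_view(data: bytes) -> bytes:
    """Return what a NULL-terminated string routine would see of the buffer."""
    return data.split(b"\x00", 1)[0]

request = b"GET%00 /cgi-bin/some.cgi HTTP/1.0"

decoded = unquote_to_bytes(request)   # '%00' becomes a real NULL byte
visible = c_string_view(decoded)      # everything after the NULL is lost

print(visible)
print(b"/cgi-bin/some.cgi" in visible)
print(b"/cgi-bin/some.cgi" in decoded)
```

Matching against the full decoded buffer, using its known length rather than a terminator, still finds the signature.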
The DOS/Windows filesystem has a unique characteristic that Unix doesn't: filenames are case insensitive. This means requests for "index.htm", "INDEX.HTM" and "Index.Htm" are all the same. In our case, the signature "/cgi-bin/some.cgi" does not literally match "/CGI-BIN/SOME.CGI". In an optimal environment we should mix the case randomly throughout; however, whisker v1.3 implements this simply by capitalizing all characters when the -I 7 option is used.
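On the detection side, both this tactic and the backslash tactic described earlier can be countered by canonicalizing the URI before matching. A minimal sketch, assuming the monitored server is a DOS/Windows platform (lowercasing would be wrong for a case-sensitive Unix target); `canonicalize_windows` is a hypothetical helper name.

```python
def canonicalize_windows(uri: str) -> str:
    """Fold case and treat backslashes as directory separators."""
    return uri.replace("\\", "/").lower()

signature = "/cgi-bin/some.cgi"

for uri in ("/CGI-BIN/SOME.CGI", "/cgi-bin\\some.cgi", "/Cgi-Bin\\Some.Cgi"):
    # Compare a literal match with a match on the canonical form.
    print(uri, signature in uri, signature in canonicalize_windows(uri))
```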
Session splicing is the only network-level anti-ID system tactic in whisker at the moment. Many raw ID systems, as well as some smart ones, only scan for a particular signature within the current packet--signatures are not split up and checked across multiple packets. Whisker exploits this by sending parts of the request in different packets. Note that this is not fragmentation; it is just multiple packets for the data. For example, the request "GET / HTTP/1.0" may be split across multiple packets to be "GE", "T ", "/", " H", "T", "TP", "/1", ".0". The current implementation in whisker (invoked with -I 9) will result in 1-3 characters in each packet, depending on your system and network speed.
The proper defense to this tactic is session reassembly; however, to reassemble a session, you must understand the protocol and its definition of a 'session'. Therefore, by implementing session reassembly, you have incurred a large overhead in interpreting the protocol.
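The effect of splicing, and of reassembly as a defense, can be modelled on payloads alone. This sketch ignores TCP entirely and simply treats each list element as the data portion of one packet; the fixed chunk size of 3 and all function names are assumptions for illustration.

```python
def splice(data: bytes, size: int) -> list:
    """Cut the request into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def per_packet_match(signature: bytes, packets: list) -> bool:
    """Check each packet payload on its own, as a per-packet IDS would."""
    return any(signature in payload for payload in packets)

def reassembled_match(signature: bytes, packets: list) -> bool:
    """Join the payloads in order first, then check the whole stream."""
    return signature in b"".join(packets)

request = b"GET /cgi-bin/some.cgi HTTP/1.0\r\n\r\n"
packets = splice(request, 3)

print(packets[:4])
print(per_packet_match(b"/cgi-bin/", packets))
print(reassembled_match(b"/cgi-bin/", packets))
```

No single payload is long enough to contain the signature, so only the reassembled stream produces a match.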
That basically completes the overview of anti-IDS tactics used in whisker. Starting in version 1.3 you can use the -I command to invoke the anti-IDS features. Multiple tactics can be used together by specifying multiple types, such as:
whisker.pl -h www.server.com -I 124
This will invoke tactics 1, 2 and 4 in conjunction with each other. Note that particular combinations may not work well together and have not been tested--use your best judgement.
Whisker is available for download from www.wiretrip.net/rfp/
Current version is 1.3 (12/24/99).