I got thoroughly confused when I found a statement in a SANS whitepaper that Snort can do anomaly-based detection. To me, anomaly-based detection means that the software can detect behavior that deviates profoundly from normal behavior, where that deviation could not have been defined algorithmically in advance. Naturally, I immediately started googling for more information, since I had recently been reading some surveys of research on anomaly-based detection. It is still a relatively unexplored area, which means it is not much used in real-world scenarios.
After a bit of googling, I found the following section in the Snort manual:
2.2.3.4 Anomaly Detection
TCP protocol anomalies, such as data on SYN packets, data received outside the TCP window, etc are configured via the detect_anomalies option to the TCP configuration. Some of these anomalies are detected on a per-target basis. For example, a few operating systems allow data in TCP SYN packets, while others do not.
It turns out that the anomalies Snort detects are ones that can be codified algorithmically in advance (e.g. a TCP segment with the SYN bit set that also carries data). So, in conclusion, there is no learning algorithm in the standard Snort code.
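To make the distinction concrete, here is a minimal sketch of what such a codified check looks like. It is not Snort's code: the TcpSegment type, the function names, and the target_allows_syn_data parameter are all hypothetical, made up for illustration. The only fact taken from the TCP header layout is that SYN is the 0x02 flag bit. Note that the rule is written down completely before any traffic is seen; nothing is learned.

```python
from dataclasses import dataclass

SYN = 0x02  # SYN bit in the TCP flags field


@dataclass
class TcpSegment:
    """Hypothetical, stripped-down view of a TCP segment."""
    flags: int
    payload: bytes


def syn_with_data(segment: TcpSegment) -> bool:
    """True when the SYN bit is set and the segment carries payload bytes."""
    return bool(segment.flags & SYN) and len(segment.payload) > 0


def is_anomalous(segment: TcpSegment, target_allows_syn_data: bool = False) -> bool:
    """Apply the fixed rule, with a per-target exception.

    target_allows_syn_data is an assumed stand-in for the manual's
    "per-target basis": some operating systems accept data on SYN.
    """
    return syn_with_data(segment) and not target_allows_syn_data


if __name__ == "__main__":
    plain_syn = TcpSegment(flags=SYN, payload=b"")
    odd_syn = TcpSegment(flags=SYN, payload=b"GET / HTTP/1.1")
    print(is_anomalous(plain_syn))                             # rule does not fire
    print(is_anomalous(odd_syn))                               # rule fires
    print(is_anomalous(odd_syn, target_allows_syn_data=True))  # target tolerates it
```

Every condition here is a hand-written predicate, which is exactly why I would not call it anomaly-based detection in the research sense.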
That said, I found a now-defunct research project that experimented with anomaly-based detection in Snort. Looking into the implementation, it turns out that the authors created a Snort plugin that logged different features into text log files. Those log files were then processed using R. In essence, this is a good approach for experimentation, but not for production use.
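For illustration only, here is a rough sketch of that kind of offline workflow: features are read from a text log, a baseline is learned from them, and later values are compared against it. This is not the project's code (they used R, and I have not reproduced their method). The log format of one "timestamp value" pair per line, the function names, and the simple z-score threshold are all my own assumptions.

```python
import statistics


def parse_feature_log(text: str) -> list[float]:
    """Read one feature value per line from an assumed 'timestamp value' format."""
    values = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            values.append(float(parts[1]))
    return values


def learn_baseline(values: list[float]) -> tuple[float, float]:
    """Learn 'normal' from observed data: mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def flag_outliers(values, baseline, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return [v for v in values if v != mean]
    return [v for v in values if abs(v - mean) / stdev > threshold]


if __name__ == "__main__":
    # Made-up training data: e.g. connections per minute during normal operation.
    training_log = "\n".join(f"t{i} {v}" for i, v in enumerate([10, 12, 11, 9, 10, 11, 10, 12]))
    baseline = learn_baseline(parse_feature_log(training_log))
    print(flag_outliers([11.0, 10.0, 50.0], baseline))
```

Unlike a fixed rule, the threshold here comes from the data itself. The catch is that everything happens after the fact on log files, which is fine for experiments but far from inline detection.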