Everything about nothing

Saturday, February 28, 2015

Short Tip: Renaming log files to include date...

I had a bunch of log files in the format logfilename.N.gz, but I wanted to rename them to logfilename.YYYYMMDD.gz, where YYYYMMDD is the date when the file was last modified. I did it using the following for loop:

for i in logfilename.*.gz
do
    # Quote the expansions so unusual file names do not get word-split
    mv -i "$i" "logfilename.$(date -r "$i" +%Y%m%d).gz"
done

The -r option to the date(1) command tells it to use the last modification time (mtime) of the file given as the option's argument. Note that it is also possible to use the stat(1) command instead of date(1).
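The stat(1) variant mentioned above could look roughly like the sketch below. It assumes GNU coreutils (stat -c, date -d @EPOCH and mv -n are GNU extensions), and the function names mtime_stamp and rename_by_mtime are made up for illustration. It uses mv -n instead of mv -i so that it never prompts when run from a script.

```shell
#!/bin/sh
# Sketch only: assumes GNU stat, GNU date and GNU mv.

# Print the last modification time of file $1 as YYYYMMDD.
mtime_stamp() {
    # stat -c %Y prints mtime as seconds since the epoch;
    # date -d @SECONDS turns that into a formatted date.
    date -d "@$(stat -c %Y -- "$1")" +%Y%m%d
}

# Rename every PREFIX.N.gz in the current directory to PREFIX.YYYYMMDD.gz.
rename_by_mtime() {
    prefix=$1
    for f in "$prefix".*.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # -n: do not overwrite an existing target
        mv -n -- "$f" "$prefix.$(mtime_stamp "$f").gz"
    done
}
```

Calling rename_by_mtime logfilename in the log directory does the same job as the loop above. Two files modified on the same day map to the same target name, so the second one is left untouched.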

Anomaly detection in Snort

I just got thoroughly confused when I found a statement in a SANS whitepaper that Snort can do anomaly-based detection. For me, anomaly-based detection means that the software is capable of detecting something that deviates profoundly from normal behavior and, additionally, that it wasn't possible to define this deviating behavior algorithmically in advance. Naturally, I immediately started to google for more information, since I had lately been reading some surveys of research on anomaly-based detection. This is still a relatively unexplored area, which means it is not used much in real-world scenarios.

After a bit of googling I found the following section in the Snort manual:

2.2.3.4 Anomaly Detection

TCP protocol anomalies, such as data on SYN packets, data received outside the TCP window, etc are configured via the detect_anomalies option to the TCP configuration. Some of these anomalies are detected on a per-target basis. For example, a few operating systems allow data in TCP SYN packets, while others do not.

It turns out that the anomalies Snort detects are ones that can be algorithmically codified (e.g. a TCP segment that has the SYN bit set and also carries data). So, in conclusion, there is no learning algorithm in the standard Snort code.

That said, I found a now-defunct research project that experimented with anomaly-based detection in Snort. Looking into the implementation, it turns out that the authors created a Snort plugin that logged different features into text log files. Those log files were then processed using R. In essence, this is a good approach for experimentation but not for production use.

Friday, January 9, 2015

Getting free disk space in Linux

While working on a script to keep full Zimbra backups for as many days in the past as possible, I was trying to automatically remove old backups based on the amount of free space. Basically, the idea was to remove directory by directory until free space reached some threshold. Finding out the free space on a disk is easy: use the df(1) command. It looks like this:

$ df -k /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 56267084 39311864 16938836 70% /

The problem is that some postprocessing is necessary to obtain the desired value, i.e. the 4th or 5th column (Available or Use%). The cut(1) command is a bit problematic in this case because, in general, you cannot expect the output to be so nicely formatted, nor is the layout fixed. For example, the first column is automatically resized based on the width of the widest device node. That in turn means the amount of whitespace varies, and you are forced to use something other than cut(1). Probably the most appropriate tool is awk(1), since it properly parses fields separated by a variable amount of whitespace. In addition, you need to get rid of the first line. That can be done using head(1)/tail(1), but it is more efficient to use awk(1) itself. So, you end up with the following construct:

$ df -k / | awk 'NR==2 {print $4}'
16938836
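For use in a script, the same pipeline can be wrapped in a small function. This is only a sketch: the name free_kb is made up, and the -P flag is my addition. -P asks for the POSIX output format, which keeps every file system on a single line; some df implementations otherwise wrap a long device name onto a line of its own, and then NR==2 no longer carries the numbers.

```shell
#!/bin/sh
# Sketch only: free_kb is a hypothetical helper name.

# Print the space available on the file system holding path $1, in KB.
free_kb() {
    # -P: POSIX format, one line per file system
    # -k: report sizes in 1024-byte units
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

With the sample output above, free_kb / would print the Available column for the root file system.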

But, for some reason, I wasn't satisfied with this solution, because I thought I was using tools too complex for something that should be simpler. So, I started to search for another way to obtain the free space of a partition. It turned out that the stat(1) command can do that, but it's rarely used for that purpose. It is normally used to find out data about files or directories, not file systems. Yet there is an option, -f, that tells stat(1) we are querying a file system, and there is also an option, --format, which accepts format sequences in the style of the date(1) command. So, to get the free space on the root file system you can use it as follows:

$ stat -f --format "%f" /
4238805

Without the --format option, the stat(1) command prints all the data about the file system it can find:

$ stat -f /
File: "/"
ID: b8a4e1f0a2aefb22 Namelen: 255 Type: ext2/ext3
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 14066771 Free: 4238805 Available: 4234709
Inodes: Total: 3588096 Free: 2151591

This makes it in some way analogous to the df(1) command. But we are getting values in blocks instead of kilobytes! You can get the block size using the %S format sequence, but that's it. So, some additional trickery is needed. One solution is to output an arithmetic expression and evaluate it using the bc(1) command, like this:

$ stat -f --format "%f * %S" / | bc
17362145280

Alternatively, it is also possible to use the shell's arithmetic evaluation, like this:

$ echo $((`stat -f --format "%f * %S" /`))
17362145280
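One detail worth keeping in mind: %f counts all free blocks, including those reserved for root, while df(1) reports the blocks available to unprivileged users, which GNU stat exposes as %a. In the stat -f output above, Available is 4234709 blocks of 4096 bytes, i.e. 17345368064 bytes or 16938836 KB, which is exactly the df value shown earlier. A sketch of a helper that follows df in this respect, assuming GNU stat (avail_bytes is a made-up name):

```shell
#!/bin/sh
# Sketch only: assumes GNU stat; avail_bytes is a hypothetical helper name.

# Print the bytes available to unprivileged users on the
# file system holding path $1.
avail_bytes() {
    # %a = free blocks available to non-superuser
    # %S = fundamental block size, the unit those blocks are counted in
    # stat prints "BLOCKS * SIZE" and the shell evaluates the product.
    echo $(( $(stat -f --format '%a * %S' "$1") ))
}
```

Dividing the result by 1024 gives a number that can be compared directly with the df -k output.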

But in both cases we are starting two processes. In the first case the processes are stat(1) and bc(1), and in the second case they are a new subshell (for the backticks) and stat(1). Note that this is the same number of processes as in the solution with awk(1). But in the case of awk(1) we are starting two more complex tools, one of which, df(1), is more targeted at displaying values to a user than at being used in scripts. One additional advantage of the awk(1) method might be portability, i.e. the df(1)/awk(1) combination is probably more common than the stat(1)/bc(1) combination.

Anyway, the performance difference probably isn't big, but there is clearly another way to do it, and it was interesting to pursue an alternative.
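To come back to the original goal, the pruning loop itself could be sketched as follows. This is not the actual backup script: the names prune_backups and free_kb are made up, and the sketch assumes that each backup lives in its own subdirectory whose name sorts chronologically (e.g. 20150109), so that the shell glob visits the oldest backup first.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, and an assumed directory
# layout of ROOT/YYYYMMDD/ with one backup per subdirectory.

# Print the space available on the file system holding path $1, in KB.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# prune_backups ROOT MIN_FREE_KB
# Remove subdirectories of ROOT, oldest name first, until at least
# MIN_FREE_KB kilobytes are available on ROOT's file system.
prune_backups() {
    root=$1
    min_free_kb=$2
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue            # the glob matched nothing
        free=$(free_kb "$root")
        # Refuse to delete anything if df gave us no usable number.
        case $free in ''|*[!0-9]*) return 1 ;; esac
        [ "$free" -ge "$min_free_kb" ] && break
        echo "removing $dir"
        rm -rf -- "$dir"
    done
}
```

For example, prune_backups /path/to/backups 10485760 would delete the oldest backups until about 10 GB are available; the path and the threshold here are placeholders. The free space is checked again before every removal, so the loop stops as soon as the threshold is met.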