Sunday, August 12, 2012

Implementing OSSEC log reader for Linux audit logs...

After writing log readers for mod_security and regex, I was asked in a private mail whether I could implement a log reader for Linux audit logs, so I decided to try. At first I was thinking about implementing something more general, but then I decided to keep it simple and not to overdesign it. I'll return to that more general type of log reader in the conclusion section.

Format of Linux audit logs

Log records in the Linux audit files can consist of one or more log lines. For example, here is a record that consists of three lines:
type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=2 entries=0
type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=10 entries=0
type=SYSCALL msg=audit(1344674083.473:7422): arch=c000003e syscall=56 success=yes exit=5246 a0=60000011 a1=0 a2=0 a3=0 items=0 ppid=5239 pid=5245 auid=5056 uid=5056 gid=1000 euid=0 suid=0 fsuid=0 egid=1000 sgid=1000 fsgid=1000 tty=pts7 ses=5 comm="chrome-sandbox" exe="/opt/google/chrome/chrome-sandbox" subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 key=(null)
and here is one that consists of two lines:
type=AVC msg=audit(1344110282.960:999): avc: denied { write } for pid=4690 comm="plugin-containe" name=".pulse-cookie" dev="dm-3" ino=2883770 scontext=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file
type=SYSCALL msg=audit(1344110282.960:999): arch=c000003e syscall=2 success=no exit=-13 a0=7fc060e29240 a1=80142 a2=180 a3=394f64947c items=0 ppid=4463 pid=4690 auid=5056 uid=5056 gid=1000 euid=5056 suid=5056 fsuid=5056 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=5 comm="plugin-containe" exe="/usr/lib64/xulrunner-2/plugin-container" subj=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 key=(null)
In all those cases, the lines of a record share the same ID (I'll call it the log record ID from now on), which is the number after the timestamp: 7422 in the first example and 999 in the second.
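To make the ID extraction concrete, here is a minimal C sketch that pulls the log record ID out of the msg=audit(TIMESTAMP:ID) field. The function name audit_record_id is hypothetical (it is not taken from the patch), and it assumes the ID is always the decimal number between the colon and the closing parenthesis:

```c
#include <stdlib.h>
#include <string.h>

/* Extract the log record ID from an audit line such as
 * "type=SYSCALL msg=audit(1344110282.960:999): arch=...".
 * Returns 1 and stores the ID in *id on success, 0 if the line has no
 * well-formed "msg=audit(TIMESTAMP:ID)" field. Hypothetical helper. */
static int audit_record_id(const char *line, unsigned long *id)
{
    static const char marker[] = "msg=audit(";
    const char *p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += sizeof marker - 1;             /* skip "msg=audit(" */

    /* The ID sits between the colon after the timestamp and ')'. */
    const char *colon = strchr(p, ':');
    const char *close = strchr(p, ')');
    if (colon == NULL || close == NULL || colon > close)
        return 0;

    char *end;
    unsigned long value = strtoul(colon + 1, &end, 10);
    if (end == colon + 1 || end != close)
        return 0;                       /* no digits, or junk before ')' */

    *id = value;
    return 1;
}
```

For the SYSCALL line of the second example it yields 999, and for every line of the first example it yields 7422.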

Design

At least in principle, these logs are easy to parse: we just extract that field (the first number after the colon). What makes it complicated is that, as far as I know, there is no guarantee that the lines of different records won't be interleaved (e.g. first the first lines of two records, then their second lines). Furthermore, the code that reads log files may be called while a log line, or a record, is only partially written! Finally, we don't know in advance how many lines each record will have!
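The partial-line problem can be handled by keeping whatever follows the last newline until the next call. Below is a minimal sketch of that idea; struct partial_line, next_complete_line and the 4096-byte limit are hypothetical names and choices made for illustration, not OSSEC code:

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 4096

/* Per-file buffer for a line that has only been partially written. */
struct partial_line {
    char   buf[LINE_MAX_LEN];
    size_t len;
};

/* Return the next complete line (without '\n') or NULL if only a partial
 * line, or nothing, is available right now. The returned pointer is
 * pl->buf and stays valid only until the next call. Characters beyond
 * LINE_MAX_LEN - 1 are silently dropped. */
static const char *next_complete_line(FILE *fp, struct partial_line *pl)
{
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            pl->buf[pl->len] = '\0';
            pl->len = 0;               /* next line starts from scratch */
            return pl->buf;
        }
        if (pl->len < LINE_MAX_LEN - 1)
            pl->buf[pl->len++] = (char)c;
    }
    clearerr(fp);  /* reset EOF so data appended later is seen next time */
    return NULL;
}
```

Because the buffer lives in a per-file structure rather than on the stack, a line that is cut in half between two logcollector calls is reassembled on the following call.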

So, after we get a new log record ID, we have to wait a bit to see whether more lines with that ID arrive. Once that time passes without receiving anything, we pass on what we have so far. Again, there is a choice here: we can restart the wait whenever a new line for the record arrives, or we can count from the record's first line. In other words, we can have a timeout or a window.

Finally, no matter how we implement the waiting time, there is an additional quirk in the way log files are read. Namely, we cannot set up a timeout mechanism that will call us after some time. Instead, we are called by the logcollector.c module at regular intervals defined in the configuration file (2 s by default). So we implement the waiting time not in time units but as a number of those calls. If you say the window is 1, this means that when we find a new log record ID, we wait until we are called once more, and then we join the lines and send the record. If the interval is 2 s, that means a waiting time of 2 s. If the window is 2, then we send the record after the second call, i.e. after 4 s.

A timeout, on the other hand, works a bit differently. If the timeout is 1, then at the moment we find a new log record ID we start a timer initialized to 1. At the next call we first decrement all timers, and then, if we see that same log record ID again, we reinitialize its timer. Finally, we send all the records whose timers have expired.
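The difference between a window and a timeout, both counted in logcollector calls, can be modelled with a single countdown per record. The sketch below is a hypothetical model of the rules just described (all names are invented for illustration): the timer starts at the configured value when a record ID first appears, is decremented at the start of every later call, is reset only in timeout mode when another line with the same ID arrives, and the record is sent when it reaches zero.

```c
#include <stdbool.h>
#include <stddef.h>

enum wait_mode { WAIT_TIMEOUT, WAIT_WINDOW };

struct pending_timer {
    int ticks_left;   /* logcollector calls left before the record is sent */
};

/* Start of every call: age the pending record by one call. */
static void timer_tick(struct pending_timer *t)
{
    if (t->ticks_left > 0)
        t->ticks_left--;
}

/* A line with this record's ID was read during the current call.
 * A timeout restarts the countdown; a window keeps counting from the
 * record's first line. */
static void timer_seen(struct pending_timer *t, enum wait_mode mode, int limit)
{
    if (mode == WAIT_TIMEOUT)
        t->ticks_left = limit;
}

static bool timer_expired(const struct pending_timer *t)
{
    return t->ticks_left == 0;
}

/* Simulate one record: seen[k] tells whether a line for it arrives in
 * call k (call 0 is when the ID first appears). Returns the call index in
 * which the record is sent, or -1 if it is still pending after ncalls. */
static int flush_call(enum wait_mode mode, int limit,
                      const bool *seen, size_t ncalls)
{
    struct pending_timer t = { limit };   /* first line starts the timer */
    for (size_t k = 1; k < ncalls; k++) {
        timer_tick(&t);
        if (seen[k])
            timer_seen(&t, mode, limit);
        if (timer_expired(&t))
            return (int)k;
    }
    return -1;
}
```

With a window of 2 and no further lines, flush_call returns 2, i.e. the record is sent on the second call (4 s with a 2 s interval). With a timeout of 1 and another line arriving in call 1, it returns 2 instead of 1 because the timer was reset.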

Note that here we are introducing runtime per-log-reader data (timers, saved log lines), which is different from configuration per-log-reader data (which comes exclusively from the configuration files)! This will be reflected later in the implementation.
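One way to reflect this split in C is to keep the two kinds of data in separate structures, with the runtime state only pointing at the read-only configuration. This is a sketch under that assumption; the struct and function names are hypothetical and do not come from the patch:

```c
#include <stdlib.h>

/* Configuration data: filled once from <log_format timeout=... window=...>
 * and never modified afterwards. */
struct audit_reader_config {
    int timeout;   /* calls to wait after a record's last line, 0 if unused */
    int window;    /* calls to wait after a record's first line, 0 if unused */
};

/* Saved lines plus a timer; left opaque in this sketch. */
struct audit_pending_record;

/* Runtime data: state that must survive between logcollector calls. */
struct audit_reader_state {
    const struct audit_reader_config *config;  /* shared, read-only */
    struct audit_pending_record *pending;      /* records being assembled */
    size_t npending;
};

static struct audit_reader_state *
audit_reader_new(const struct audit_reader_config *cfg)
{
    struct audit_reader_state *st = malloc(sizeof *st);
    if (st != NULL) {
        st->config = cfg;
        st->pending = NULL;    /* nothing collected yet */
        st->npending = 0;
    }
    return st;
}
```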

Finally, since this is a Linux-specific feature, it is completely disabled when the source is compiled on (or for) Windows!

Configuration

So, this is how to configure this log reader in OSSEC's configuration files. To mark a log file as an audit log, use the following <log_format> element:
<log_format timeout="T" window="W">linux_auditd</log_format>
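Both timeout and window are counted in logcollector calls (T and W must be numbers); the interval between calls is set by the logcollector.loop_timeout variable (defined in internal_options.conf), so the actual waiting time is T (or W) multiplied by that interval. You have to specify exactly one of them: it is an error to define both, or neither!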
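A minimal sketch of that validation rule, assuming the attribute values arrive as strings (or NULL when an attribute is absent). The function names are hypothetical, and rejecting 0 and negative values is my assumption rather than something stated above:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a positive decimal number of calls; returns -1 on any error. */
static long parse_calls(const char *s)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v <= 0 || v > INT_MAX)
        return -1;
    return v;
}

/* timeout_attr / window_attr are the attribute values, NULL when absent.
 * Exactly one must be given. On success stores the values (0 for the
 * unused one) and returns 0; otherwise returns -1. */
static int audit_config_check(const char *timeout_attr, const char *window_attr,
                              int *timeout, int *window)
{
    if ((timeout_attr == NULL) == (window_attr == NULL))
        return -1;                        /* both or neither specified */

    long v = parse_calls(timeout_attr != NULL ? timeout_attr : window_attr);
    if (v < 0)
        return -1;                        /* not a positive number */

    *timeout = timeout_attr != NULL ? (int)v : 0;
    *window  = window_attr  != NULL ? (int)v : 0;
    return 0;
}
```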

Implementation notes

To keep log lines until the timeout or window expires, I'm using a doubly linked list. It is inefficient, but for the time being it will do. More specifically, I'm using OSSEC's list implementation in shared/list_op.c.
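For readers unfamiliar with the pattern, here is a self-contained doubly linked list of pending records. It is a stand-in written for illustration, not the API of shared/list_op.c; the node layout, the names and joining lines with a single space are all assumptions. The linear rec_find is the kind of inefficiency mentioned above, while the prev/next pointers let an expired record be unlinked in constant time while walking the list.

```c
#include <stdlib.h>
#include <string.h>

struct rec_node {
    unsigned long id;              /* log record ID */
    int ticks_left;                /* countdown in logcollector calls */
    char *lines;                   /* lines joined so far, NULL if none */
    struct rec_node *prev, *next;
};

struct rec_list { struct rec_node *head, *tail; };

/* Linear search: O(n) per line, fine for a handful of pending records. */
static struct rec_node *rec_find(struct rec_list *l, unsigned long id)
{
    for (struct rec_node *n = l->head; n != NULL; n = n->next)
        if (n->id == id)
            return n;
    return NULL;
}

/* Append a new pending record at the tail. */
static struct rec_node *rec_append(struct rec_list *l, unsigned long id, int ticks)
{
    struct rec_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->id = id;
    n->ticks_left = ticks;
    n->lines = NULL;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    return n;
}

/* Join another line to the record, separated by a single space. */
static int rec_add_line(struct rec_node *n, const char *line)
{
    size_t old = n->lines != NULL ? strlen(n->lines) : 0;
    size_t add = strlen(line);
    char *p = realloc(n->lines, old + add + 2);   /* separator + NUL */
    if (p == NULL)
        return -1;
    if (old > 0)
        p[old++] = ' ';
    memcpy(p + old, line, add + 1);
    n->lines = p;
    return 0;
}

/* O(1) removal, e.g. when the record's timer has expired. */
static void rec_unlink(struct rec_list *l, struct rec_node *n)
{
    if (n->prev != NULL) n->prev->next = n->next; else l->head = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

static void rec_free(struct rec_node *n)
{
    free(n->lines);
    free(n);
}
```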
For testing purposes, I also added code that is enabled by defining BUILD_TEST_BINARY. In that case a read_linux_audit binary is compiled, which accepts the log file that should be read. To emulate how the log grows, the binary first opens a new temporary file, then reads a random number of lines from the original file, writes them to the temporary file, and calls the read_linux_audit function, after which it pauses. This is repeated until all the input from the original log file is exhausted.
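The log-growth emulation can be sketched as follows. copy_random_lines and emulate_growth are hypothetical names, read_step stands for one invocation of the reader on the growing file, and sleep() is the POSIX call from <unistd.h>:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Copy between 1 and max_lines (must be > 0) lines from src to dst and
 * flush dst so a reader opened on the same file sees them. Returns the
 * number of lines copied, 0 once src is exhausted. */
static int copy_random_lines(FILE *src, FILE *dst, int max_lines)
{
    int want = 1 + rand() % max_lines;
    int copied = 0;
    char buf[8192];

    while (copied < want && fgets(buf, sizeof buf, src) != NULL) {
        fputs(buf, dst);
        size_t len = strlen(buf);
        /* a chunk ends a line if it ends in '\n' or is the file's last line */
        if ((len > 0 && buf[len - 1] == '\n') || feof(src))
            copied++;
    }
    fflush(dst);
    return copied;
}

/* Grow dst chunk by chunk, running the reader after each chunk and then
 * pausing, until the original log is exhausted. */
static void emulate_growth(FILE *src, FILE *dst, int max_lines,
                           unsigned pause_seconds,
                           void (*read_step)(void *ctx), void *ctx)
{
    while (copy_random_lines(src, dst, max_lines) > 0) {
        read_step(ctx);                 /* one "logcollector call" */
        if (pause_seconds > 0)
            sleep(pause_seconds);
    }
}
```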
To build the test binary, first build everything. Then go to the logcollector directory and run 'make test' there. You'll then have the read_linux_audit binary.
The testing went as follows:
  1. Run read_linux_audit on a sample audit.log and redirect the output to a temporary file. Count the number of lines in the temporary file; it has to be smaller than in the original file.
  2. Using the simple shell pipe "cut -f2 -d: audit.log | cut -f1 -d\) | sort | uniq | wc -l", I got the number of unique log record IDs. There was a difference between this number and the one from the previous step.
  3. Search for the difference (using diff, for example) and analyze why it happened. :D
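Step 2 can also be done in C. This hypothetical helper mirrors the cut | sort | uniq | wc -l pipeline: it takes the number between the first colon and the closing parenthesis of every line, sorts the IDs and counts distinct values. Skipping lines without such a field is my assumption about how malformed lines should be treated:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Count distinct log record IDs in an audit log.
 * Returns -1 on allocation failure. */
static long count_distinct_ids(FILE *fp)
{
    char line[8192];
    unsigned long *ids = NULL;
    size_t n = 0, cap = 0;
    int at_line_start = 1;

    while (fgets(line, sizeof line, fp) != NULL) {
        int starts_line = at_line_start;
        at_line_start = strchr(line, '\n') != NULL;
        if (!starts_line)
            continue;                  /* tail of an over-long line */

        char *colon = strchr(line, ':');   /* colon after the timestamp */
        if (colon == NULL)
            continue;
        char *end;
        unsigned long id = strtoul(colon + 1, &end, 10);
        if (end == colon + 1 || *end != ')')
            continue;

        if (n == cap) {                    /* grow the ID array */
            size_t ncap = cap != 0 ? cap * 2 : 64;
            unsigned long *tmp = realloc(ids, ncap * sizeof *tmp);
            if (tmp == NULL) {
                free(ids);
                return -1;
            }
            ids = tmp;
            cap = ncap;
        }
        ids[n++] = id;
    }

    long distinct = 0;
    if (n > 0) {
        qsort(ids, n, sizeof *ids, cmp_ulong);
        distinct = 1;
        for (size_t i = 1; i < n; i++)
            if (ids[i] != ids[i - 1])
                distinct++;
    }
    free(ids);
    return distinct;
}
```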

Conclusion

As I said in the introduction, I was thinking about implementing a more general reader. The idea was that you give the reader a regular expression, which is executed against every line. All the lines for which it returns the same value are treated as part of a single record and are thus concatenated. Now that this reader is finished, I suppose writing that more general one wouldn't be a problem.
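The regex-based idea can be sketched with the POSIX <regex.h> API: compile the user's expression once, take its first capture group as the record key, and treat lines with equal keys as one record. This only illustrates the idea under those assumptions; the helper names and the 64-byte key limit are hypothetical:

```c
#include <regex.h>
#include <string.h>

/* Copy the first capture group of re, matched against line, into key
 * (keysz must be > 0). Returns 1 on a match, 0 if the regex does not
 * match or group 1 did not participate. Over-long keys are truncated. */
static int regex_record_key(const regex_t *re, const char *line,
                            char *key, size_t keysz)
{
    regmatch_t m[2];
    if (regexec(re, line, 2, m, 0) != 0 || m[1].rm_so < 0)
        return 0;
    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= keysz)
        len = keysz - 1;
    memcpy(key, line + m[1].rm_so, len);
    key[len] = '\0';
    return 1;
}

/* Two lines belong to the same record when both yield the same key.
 * Keys longer than 63 characters are compared only by their prefix. */
static int same_record(const regex_t *re, const char *a, const char *b)
{
    char ka[64], kb[64];
    return regex_record_key(re, a, ka, sizeof ka)
        && regex_record_key(re, b, kb, sizeof kb)
        && strcmp(ka, kb) == 0;
}
```

For audit logs, the expression msg=audit\([0-9.]+:([0-9]+)\) compiled with REG_EXTENDED yields the log record ID as the key, so the specialized reader would become one configuration of the general one.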

I also fixed a few small bugs in the code I sent previously, and the new patch can be found here.


About Me

scientist, consultant, security specialist, networking guy, system administrator, philosopher ;)