Everything about nothing: ossec

Showing posts with label ossec. Show all posts

Friday, December 28, 2012

OSSEC error 'remote_commands'...

While upgrading one of the agents from ossec version 2.6 to 2.7, I was testing agent configuration and I got the following error message:

ossec-logcollector(2301): ERROR: Definition not found for: 'logcollector.remote_commands'.

It didn't appear before, and more importantly, I haven't had a slightest idea what's the problem! So, I decided to dig a bit further to find out. BTW, I removed timestamp column from the log entry as it is not important here.

So, what I found is that this is a new configuration variable introduced in 2.7 version of OSSEC. It is expected to be defined in internal_options.conf file. The reason I got it is that my internal_options.conf was from 2.6.

This variable is a boolean flag (accepted values are 0 and 1) and its purpose is to allow administrator to control whether the agent will accept commands from the manger, or not. This value is used when configuration is loaded, here. If it is set to 0 then any command configurations will be ignored, e.g. the ones like the following one:

<command>
    <name>host-deny</name>
    <executable>host-deny.sh</executable>
    <expect>srcip</expect>
    <timeout_allowed>yes</timeout_allowed>
</command>

For each ignored configuration entry, there will be appropriate notification message in the log file, something like the following message:

Remote commands are not accepted from the manager. Ignoring it on the agent.conf

Tuesday, October 30, 2012

Installing ossec client on CentOS 6...

Ok, I did this already, but I managed to forget it. Still, it isn't strange, after all, it's not that you are adding new machines every day. Anyway, here are the steps that are need in order to install OSSEC client on a CentOS machine, more specifically CentOS 6. I decided to write this post if someone also needs these instructions, but certainly for me so that next time I have to do it I don't have to think a lot. Note that I like to install RPM packages because it is easier to update them instead compiling from source, and also someone else is worrying about new releases. Additionally, it's not so good to install development environment on production machines that don't need it, for security reasons. Ok, here we go.

First, make sure that you have EPEL repository added. The easiest way to do this is using the following command (note, bold is what you type, the rest is what you get from the machine):

# rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
Retrieving http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-7.noarch.rpm
warning: /var/tmp/rpm-tmp.7IMdWB: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing... ##################################### [100%]
1:epel-release ##################################### [100%]

Second, fetch necessary packages. I didn't want to install Atomicorp's repository, so I only fetched ossec packages using wget. ossec-hids and ossec-hids-client are what you need. Select the newest versions you can find. Next, install them using yum command:

# yum localinstall ossec-hids-client-2.6-15.el6.art.x86_64.rpm ossec-hids-2.6-15.el6.art.x86_64.rpm

I assumed that yum is executed in the same directory where you placed downloaded packages. Also, if you downloaded some other versions, change names appropriately.

Open ossec's configuration file, /var/ossec/etc/ossec-agent.conf, and change the line that has <server-ip></server-ip> element. It has to point to your server's IP address. You can also add files to be monitored in addition to the existing ones, or remove some of the existing ones if they are not used on the machine you are installing ossec client.

Now, go to the OSSEC server and run there agent management tool. It is probably in /var/ossec/bin:

# ./manage_agents

****************************************
* OSSEC HIDS v2.5-SNP-100907 Agent manager.     *
* The following options are available: *
****************************************
   (I)mport key from the server (I).
   (Q)uit.
Choose your action: I or Q: A

- Adding a new agent (use '\q' to return to the main menu).
Please provide the following:
   * A name for the new agent: centos6.domain.local
   * The IP Address of the new agent: 192.168.10.41
   * An ID for the new agent[030]: <just press ENTER>
Agent information:
   ID:030
   Name:centos6.domain.local
   IP Address:192.168.10.41

Confirm adding it?(y/n): y
Agent added.

Note that the tool doesn't display all the options you have on your disposal. Next what you need to do is to extract a key that you'll import into the client. This is also done using manage_clients tool, so either start it again, or in case you didn't exit after you added an agent just continue:

****************************************
* OSSEC HIDS v2.5-SNP-100907 Agent manager.     *
* The following options are available: *
****************************************
   (I)mport key from the server (I).
   (Q)uit.
Choose your action: I or Q: e

Available agents:
   ID: 002, Name: somehost, IP: 10.0.10.1
   ID: 030, Name: centos6.domain.local, IP: 192.168.10.41
Provide the ID of the agent to extract the key (or '\q' to quit): 030
Agent key information for '030' is:
<here a very long string will be printed>
** Press ENTER to return to the main menu.

Again, option to export the key isn't listed in the help message! Anyway, copy the very long string that is printed (agent's key) and you can quit from the tool and logout from the OSSEC server.

Go now to ossec client, change directory to /var/ossec/bin and run manage_client tool:

# ./manage_client

****************************************
* OSSEC HIDS v2.6 Agent manager.     *
* The following options are available: *
****************************************
   (I)mport key from the server (I).
   (Q)uit.
Choose your action: I or Q: I

* Provide the Key generated by the server.
* The best approach is to cut and paste it.
*** OBS: Do not include spaces or new lines.

Paste it here (or '\q' to quit):
<very long string copied here!>

Agent information:
   ID:030
   Name:centos6.domain.local
   IP Address:192.168.10.41

Confirm adding it?(y/n): y
Added.

Finally, restart ossec client:

# /etc/init.d/ossec-hids restart
Shutting down ossec-hids: [ OK ]
Starting ossec-hids: [ OK ]

You should see you new client in OSSEC's Web interface which should confirm that it is running OK.

Friday, August 17, 2012

How to communicate with OSSEC deamons via Unix socket...

OSSEC daemons, when started, open Unix sockets for a local communication. For example, ossec-execd opens the following socket:

/var/ossec/var/queue/alerts/execq

On which, it waits for commands. If you try to send it message using echo, or in some similar way, you'll receive an error message:

$ echo 1 message > /opt/ossec/var/queue/alerts/execq
bash: /opt/ossec/var/queue/alerts/execq: No such device or address

because, it's not a pipe. But, it is possible to "manually" send it command using socat utility. socat is very capable utility with, equivalently complex syntax. In this case you should run it like follows:

$ socat - UNIX-CLIENT:/opt/ossec/var/queue/alerts/execq

What we are saying in this case is that we want socat to relay messages between stdin (first address, minus) and unix socket in which it is a client (i.e. the socket already has to be opened/created).

Now, whatever you type, will go to ossec-execd. This can be monitored either in ossec's logs, or if we start ossec-execd in debug mode (without forking), in the terminal.

Sunday, August 12, 2012

About active responses in OSSEC

I already wrote about OSSEC's active response feature, and I said that I'm going to write a bit more after I study a bit more thorougly how it works. After I did analysis of log collecting feature of OSSEC, I decided to finally look at this too. So, here is the essey about what I found out by analysing code of OSSEC 2.6 with respect to active response. This essay will intergrate also the previous post, so, you don't have to read that previous one if you didn't already. Since active responses are bound to rules - triggering rule triggers active response - I'll touch also on rules, but not in a lot of details. Only as much is necessary to explaing active response.
I'll start with the purpose of active response, then I'll continue to configuration of active response, and I'll finish with implementation details and conclusions.

Purpose of active response

The idea behind active response in OSSEC's is to activelly block potential offenders and in that way to allow system to defend itself instead of passively monitoring what's happening. In some way this classifies OSSEC also as an IPS. As with all other IPSes, the feature is great but can bite you in case you are not careful, or, if you are using it without realising how it actually works. In the current implementation of OSSEC, active response is restricted to act on offending IP addresses and users.

Configuring active response

Active response configuration consists of two parts. In the first part you define all the commands, and in the second part - active response - you define how and when they are called. When writing configuration you should first list all of you commands, and then active responses that reference them. The reason is the way how the code was written, i.e., you will get undefined command error in case you don't list commands before they are used.
Note that final component, and important, connection between rules and active responses. Namely, rules are those that get triggered, and if they are triggered then active responses bound to them will be activated. In this post, we'll use manual method of activation, but I'll also describe how active responses are connected to rules.
It is interesting, and important, that when the configuration files are analyzed, the code writes all enabled active responses into the file shared/ar.conf, which is placed underneath the main OSSEC directory (by default it is /var/ossec). We'll come to exact format and purpose of that file later.
The following text is based on the analysis of config/active-response.[ch], analysisd/active-response.[ch] files, util/agent_control.c and some others.

Defining commands

All the commands that will be run as the part of the active response should be defined using <command> element placed directly beneath <ossec_config> top-level element. The structure should look something like this:

<ossec_config>
...

<command>
</command>
...

<command>
</command>
...
</ossec_config>

Each <command> element can have the following subelements:

name - This will be identifier by which the command will be referenced in active response definitions, i.e. the second part. Obviously, this is a mandatory subelement. You shouldn't use dashes here because the way parsing is done throughout OSSEC and that will most likely confuse it! It is also advasible to avoid using spaces for the same reasons!
expect - Mandatory to specify. The value can be either user or srcip. Due to the use of OS_Regex, and the way it calls OSRegex_Compile, those two strings are case insensitive, i.e. you can capitalize them as you wish, the meaning is the same.
executable - Mandatory element, defines the exact name of the executable. The path where the executable is placed is predefined during compilation time. Default value is active-response/bin.
timeout_allowed - Allowed values are yes and no. Basically, this tells OSSEC if the effect of the command (after it is called) should be reversed, i.e., it's effect have to be cleared by new invocation. This is optional and default value is no. To signal to a script if it is called for the first time, or as a part of a reversal process, the first argument to the script will be add or delete.

Defining active responses

The second part of the configuration defines active responses themselves. Active responses are configured (again) in OSSEC's configuration file using <active-response> elements that are placed directly within <ossec_config> top-level element, like this:

<ossec_config>
...

<active-response>
</active-response>
...

<active-response>
</active-response>
...
</ossec_config>

Under each <active-response> element the following subelements can be used to more specifically define active response:

command - string value that references one <command> element. This is mandatory to specify.
location - one of the following values: AS, analysisd, analysis-server, server, local, defined-agent, all, or any. AS, analysisd, analysis-server, server are synonyms, and so are all and any. You must specify this.
agent_id - in case the value of the location is defined-agent, then this defines which agent and is mandatory to specify.
rules_id - comma separated list of rule IDs to which this active response should be bound. This is an optional element without a default value.
rules_group - name of a rule group to which this active response will be bound.
level - Integer between 0 and 20, inclusive. If some rule has level equal to or grater than this value than it is bound to active response.
timeout - Numerical value without constraints.
disabled - Can have values yes or no.
repeated_offenders - This element and its value are ignored.

Note one interesting thing. Namely, there is no ID bound to a specific active response. This means that the relationship between command and active response is 1:1. It seems somehow meaningless, beacuse, it is not possible to have some command (i.e. one <command> element) bind to multiple active responses commands each with a different set of parameters. Or, I'm missing something important?
When OSSEC agent starts it reads and parses <command> and <active-response> elements and writes shared/ar.conf file (actually, when any component of ossec starts if the previous paragraph is true). In that file each line defines one active response. Lines have the following printf like format:

%s - %s - %d

The first string is name of the active response and of the command (becaues active response doesn't have it's own name, as I explained). The second string is the executable. Finally, the third is a timeout value if it is defined. If not, 0 will be written in that field.

Manually triggering active responses - execution flow

To manually test how active response works you can use the utility agent_control whose source can be found in util/ top-level directory of OSSEC source. Note that only responses that require IP address can be activated. This tool should be run on a manager node and it actually connects to local ossec-remoted instance and sends it the appropriate commands that will trigger requested active response.
To trigger active reponse, using the utility, you have to specify three things:

Offending IP address (option -b).
ID of an agent where active response will be triggered (option -u).
Which active response to run (option -f).

For example, to tell an agent whose ID is 003 that it should run firewall-drop600 active response and pass it IP adress 192.168.1.1, you would invoke agent_control in the following way:

agent_control -u 003 -f firewall-drop600 -b 192.168.1.1

What will happen than is that agent_control will connect to the local ossec-remoted instance and send it the following command:

(msg_to_agent) [] NNS 002 firewall-drop600 - 192.168.1.1 (from_the_server) (no_rule_id)

The interesting thing here is that there is possibility to run active response on all nodes but this functionality is not enabled in agent_control utility. It boils down not to specify agent ID, in which case all agents are assumed and the previous command would look now like this:

(msg_to_agent) [] ANN (null) firewall-drop600 - 192.168.1.1 (from_the_server) (no_rule_id)

ossec-remoted accepts message from agent_control utility and based on that message sends a message over the network to agent. This message is encrypted, but in our case (running firewall-drop600 with IP address 192.168.1.1 as an argument) it will have the following format:

#!-execd firewall-drop600 - 192.168.1.1 (from_the_server) (no_rule_id)

The three characters at the beginning (#!-) are message header, and "execd " is a command. More commands can be seen in headers/rc.h file. Note that there is no agent ID in the message. That is because this information is used only by ossec-remoted so that it knows to whom it should send the message. So, no matter if you sent the message to some specific agent, or all of them, the message will look the same.
The message is received and preparsed by ossec-clientd in function receive_msg (in file client-agent/receiver.c) and then sent to ossec-execd (in top-level directory os_execd) via local message queue starting with a letter f, i.e. message header and command are stripped off by ossec-clientd.
It is interesting that ossec-execd, when started, reads standard configuration file (in XML) and, based on active response configuration, creates shared/ar.conf (unless changed during compilation). Then, it enters main loop, and there when active response is triggered for the first time, it reads shared/ar.conf instead of main XML configuration file. It has also one additional interesting behavior. Namely, if unknown active response is triggered it will first reread shared/ar.conf and if the active response isn't found then it will signal error. This, I suppose, can be used to dynamically update a set of active responses.
But, back to our example. ossec-execd receives message and assumes that everything to the first space is the command name. Using that token it then searches for a command. There are two interesting thing here:

The list of commands is taken from shared/ar.conf file that is dynamically created, not from ossec.conf or some similar.
If the command isn't found, then the shared/ar.conf is read again, and the search is peformed again. Only now, if there is no command, an error is reported and processing of a message is stopped. Note that no return message is sent to manager!

When the command is found, standard execve(2) structures are created, but twice. One for the first execution and the other for timeout value. Only later it is checked if timeout is supported/requested by command, and if it is not then memory is released. The arguments are copied from the message (spaces are used to separate arguments) with additional argument inserted first. Namely, add is inserted if the command is called first time, while delete is inserted in case it is called as a consequence of a timeout.
Finally, before executing command, one more check is performed. Namely, if the command was run, and timeout is hanging waiting to trigger, then command will not be called again, nor new timeout value will be called. The message from the manager will be ignored!

Automatic triggering of active responses

An active response is tirggered when the rule it is bound to is triggered. A single rule can have multiple active responses attached to it. Note that there are three ways in which you can bind rule and active response:

Using level element in active response. Namely, if the rule has attached level that is greater than or equal to the level of active response then the active response is attached to the rule.
If a rule ID is listed in the rules_id element of the active response, then the active response is attached to the rule.
If the group name to which rule belongs is listed in the rules_group element of active response, than they are bound together. The check is done very simply, namely it only checks that the value specified in rules_group occurs within group name. For example, if group name is "syslog,pam" then rules_group value "syslog" will match, as well as "pam", but "apache" will not. Note that even the value "syslog,pam" will match!

Rules and active responses are bound together (the code in analysisd/rules.c) when the ossec-analysisd is started.

Some implementation details

Configuration reading and parsing is implemented in the config/active-response.c file.
The data contained in each <active-response> element is kept in a structure active_response defined in src/config/active-response.h file. The elements of that structure quite closely mimic the configuration layout, i.e. this structure is defined as follows:

/** Active response data **/
typedef struct _ar{
int timeout;
int location;
int level;
char *name;
char *command;
char *agent_id;
char *rules_id;
char *rules_group;
ar_command *ar_cmd;
}active_response;

That's it, not much. :)

Conclusion

Ok, I think I managed to document active response in OSSEC at least better then the majority of the existing documents. Tell me what you think. :)
Next, let me say that I'm impressed with XML parsers shipped with OSSEC. It seems to be very featerfull even though it is written in C, which is usually not used for XML processing.
One of the interesting things of OSSEC is that it reads the same configuration files multiple times. For example, in ossec-analysisd deamon, the configuration file is first read by active response initialization code (active-response.c) and then by the deamon itself. The reason why is this so interesting is that, during parsing of element <active-response> the file shared/ar.conf is written each time. In other words, the operation isn't idempotent. This is also problematic from the performance reasons standpoint, but probably less so.
While reading the code I sumbled on several interesting things, not directly related to the topic of this post. But since they are important, I'll mention them here and maybe I devote some future post to some of them:

Top level directory addagent is misleading. It actually contains binary manage-agents which is much more encompassing then that. So, it should be renamed.
In the top-level directory util/ there are some tools that should be moved or integrated with manage-agents utility.

Implementing OSSEC log reader for Linux audit logs...

After writing log readers for mod_security and regex, I was asked in a private mail if I could implement log reader for Linux audit logs, so I decided to try. Basically, first I was thinking about implementing something more general but then I decided to keep it simple and not to overdesign it. In the conclusion section I'll return to this more complex type of log reader.

Format of linux audit logs

Log records in the Linux audit files can consist of one or more log lines. For example, here is a record that consists of three lines:

type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=2 entries=0
type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=10 entries=0
type=SYSCALL msg=audit(1344674083.473:7422): arch=c000003e syscall=56 success=yes exit=5246 a0=60000011 a1=0 a2=0 a3=0 items=0 ppid=5239 pid=5245 auid=5056 uid=5056 gid=1000 euid=0 suid=0 fsuid=0 egid=1000 sgid=1000 fsgid=1000 tty=pts7 ses=5 comm="chrome-sandbox" exe="/opt/google/chrome/chrome-sandbox" subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 key=(null)

and here is the one that consist of two lines:

type=AVC msg=audit(1344110282.960:999): avc: denied { write } for pid=4690 comm="plugin-containe" name=".pulse-cookie" dev="dm-3" ino=2883770 scontext=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file
type=SYSCALL msg=audit(1344110282.960:999): arch=c000003e syscall=2 success=no exit=-13 a0=7fc060e29240 a1=80142 a2=180 a3=394f64947c items=0 ppid=4463 pid=4690 auid=5056 uid=5056 gid=1000 euid=5056 suid=5056 fsuid=5056 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=5 comm="plugin-containe" exe="/usr/lib64/xulrunner-2/plugin-container" subj=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 key=(null)

In all those cases, log records share the same ID (I'll call it a log record ID from now on) which is a number after a timestamp (I placed it in bold in the previous two examples).

Design

At least in principle, it is easy to parse those logs. We just extract that field (first number after colon). What makes it complicated is that it isn't garantueed (at least I'm not aware) that log records will not be mixed (i.e. first lines 1 of two records, then their second lines). Furthermore, the code for reading log files can be called when partial log lines, or records, are written! Finally, we don't know how many lines in each record there will be!

So, after we get a new log record ID, we have to wait a bit to see if we are going to receive another one. After that time passes, without receiving anything, we pass what we have up to now. Again, there is a choice here! We can reset timeout if we receive something new, or we can count from the first line. Basically, we can have timeout or window.

Finally, no matter how we implement wait time, there is additional quirck in the way reading log files works. Namely, we can not set timeout mechanism that will call us after some time. We are called by logcollector.c module in regular intervals defined in the configuration file (defaulting to 2 s). So, we are going to implement timeout, not in time units, but in a number of those calls. So, if you say window is 1, this means that when we find new log record ID, we'll wait that we are called once more, and then we'll join and send a record. If the interval is defined to be 2s, that means waiting time of 2s. If the window is 2, then we'll send record after the second call, or after 4s. Timeout, on the other hand, will function a bit differently. If we say that timeout is 1, that means that at the moment we find a new log record ID we start timer (initialized to 1). At the next call we first decrement all timers, then if we get that same log record ID again, we'll reinitialize timer. Finally, we send all the records that have expired timers.

Note that we are here introducing runtime per-log reader data (timers, saved logs), which is different than configuration per-log reader data (that comes exclusively from configuration files)! This will be reflected later in the implementation.

Finally, since this is a Linux specific feature, it is completly disabled if the source is compiled on (or for) windows!

Configuration

So, this is how to configure this log reader in the OSSEC's configuration files. To say that some log file is audit type, you'll use the following <log_format> element:

<log_format timeout="T" window="W">linux_auditd</log_format>

Both timeout and window are specified in time units (T and W must be numbers) defined by logcollector.loop_timeout variable (defined in internal_options.conf). You have to specify one of them. It is error to define both, or none!

Implementation notes

For keeping logs until timeout or window expires, I'm using doubly linked list. It is inefficient, but for the time being it will do. More specifically, I'm using OSSEC's list implemenation in shared/list_op.c.
For testing purposes, I also added code that is enabled by defining BUILD_TEST_BINARY. In that case read_linux_audit is compiled which accepts log file that should be read. To emulate how the log grows, binary first opens a new temporary file, then reads a random number of lines from original file, writes it to the temporary file, and calls read_linux_audit function, after which, it pauses. This is repeated until all the input from the original log file is exhausted.
To build test binary, first build evertyhing. Then, go to logcollector directory and run there 'make test'. You'll then have binary read_linux_audit.
The testing was as follows:

Run read_linux_audit on a sample audit.log and redirect output to some temporary file. Count a number of lines in a temporary file, it has to be smaller then the original file.
Using simple shell pipe "cut -f2 -d: audit.log | cut -f1 -d\) | sort | uniq | wc -l" I got how many uniqe lines. There was a difference between this and previous step.
Search for a difference (using diff for example) and analyze why it happened. :D

Conclusion

As I said in the introduction section, I was thinking about implementing a more general reader. Namely, the idea was that you give the reader regular expression, and this regular expression is executed against every line. All the lines that have the same return value are treated as a part of a single record and thus are concatenated. Probably when this reader is finished, I suppose writing that more complex one wouldn't be a problem.

I also fixed few small bugs in the code I sent previously, and, the new patch can be found here.

Friday, August 10, 2012

Adding new log types to OSSEC...

I'm using OSSEC for log monitoring, and while it is a great tool there are some drawbacks. One is that (not so) occasionally Java stack traces are dumped into syslog and they are not properly handled/parsed by OSSEC. The other drawback is that I want OSSEC to monitor mod_security logs, which are multi line logs with variable number of lines belonging to each HTTP request/response processed by mod_security. So, I wanted to modify OSSEC in order to allow it to handle those cases, too. To be able to modify OSSEC I started with the analysis of its source (version 2.6) and based on the analysis, I wrote this text. The goal of the analysis was to get acquainted on how OSSEC handles log files, document this knowledge, and to propose solution that would handle Java stack traces and mod_security logs.

Configuring log files monitoring

First, ehere are some global parameters that control global log monitoring behavior of each OSSEC agent, or server. More specifically, logcollector.loop_timeout in internaloptions.conf defines how much time (in seconds) OSSEC will wait before checking if there are additions to all log files that it monitors. Default value is 2s, with minimal allowed value of 1 second and maximum 120s.
This is actually the most important parameter. There are two additional ones, logcollector.open_attempts with allowed values between 2 and 998, and logcollector.debug with allowed values 0, 1 and 2.

Supported log types in OSSEC

If I correctly understand it, OSSEC treats log files as consisting of records. Each record being independent from the previous, or any later ones. Rules, then, act on records. In the majority of cases the record is identical to a single line of a log file. This is, I suppose, legacy from syslog in which each line was (and still is) separate log entry. In the mean time OSSEC was extended so that it supports more different log files:

multiline log (read_multiline). The problem with this log format is that it expects that each record consists of a fixed number of lines.
snort full log (read_snortfull). This is the closest one to what I need with respect to mod_security, i.e. this one reads several lines, combines them, and sends them to log analyzer.
some others I won't analyze now.

The length of the record is restricted to 6144 characters (constant OS_MAXSTR defined in headers/defs.h). Everything above that length will be cut off and discarded, with an error message logged.

Configuring log files

So, there are different log types and to tell ossec for some file which type it is, you use <logformat> element associated with each monitored file. Each monitored file is defined in element <localfile> directly beneath <ossec_conf> element in the ossec configuration file, for example:

<ossec_config>
   ...
   <localfile>
      <log_format>syslog</log_format>
      <location>path_to_log_file</location>
   </localfile>
   ...
</ossec_config>

The configuration file, along with all of its elements, is read by configuration loader placed in src/config subdirectory. The localfile element is processed in function Read_Localfile (file localfile-config.c). For each <localfile> one logreader structure (defined in localfile-config.h) is allocated and initialized. This structure has the following content:

typedef struct _logreader
{
    unsigned int size;
    int ign;

    #ifdef WIN32
    HANDLE h;
    int fd;
    #else
    ino_t fd;
    #endif
    /* ffile - format file is only used when
     * the file has format string to retrieve
     * the date,
     */
    char *ffile;
    char *file;
    char *logformat;
    char *djb_program_name;
    char *command;
        char *alias;
    void (*read)(int i, int *rc, int drop_it);
    FILE *fp;
}logreader;

Analyzing localfile-config.c file I came to the following conclusions:

Underneath <localfile> element the following elements are accepted/valid: <location&gt, <command>, <log_format>, <frequency>, and <alias>.
<location> defines the file, with full path, that is being monitored.
<log_format> tells OSSEC in which format are records stored. The following are hardcoded values in the Read_Local.c file: syslog, generic, snort-full, snort-fast, apache, iis, squid, nmapq, mysql_log, mssql_log, postgresql_log, djb_multilog, syslog-pipe, command, full_command, multi-line, and eventlog. What's interesting is that some of those (marked in bold) don't appear later in log collector!
the syntax of accepted multi-line log_format is:
<log_format>multiline[ ]*:[ ]*[0-9]*[ ]*</log_format>
note that it is not an error, i.e. you can avoid writing any number, but that number defines how many lines should be concatenated before being sent to analyzer module, so it is important you don't skip it. The number part is made available in logformat field of logreader structure. Furthermore, when check is made to determine which type of logformat is some logff structure, to detemine it is a multiline the first character is checked if it is a digit. If it is, then it is a multiline format. This is actually a hack because in the original design it wasn't planned to have parameters to certain log types!
<command> is used to run some external process, collect its output, and send it to central log. Each line outputed by the command will be treated as a separate log record. Empty lines are removed, i.e. ignored. Note that command is passed to shell using popen library call. So, you can place shell commands there too.
in the code there is a reference to a full_command log type format, but it is not supported in the configuration file. The difference is that this form reads command's output as a single log record.
the value of element <frequency> is stored into logreader.ign field. But the use of this field is strange because it is overwritten with number of failed attempts to open log file. I would assume that this parameter would, somehow, allow rate limiting.
When defining the value of <location> element certain variables can be used. In case of Windows operating systems, that means % variables (e.g. %SYSTEM% and similar). In case of Unix/Linux glob patterns are allowed (these are not allowed on Windows). Also, strftime time format specifiers can be used and in that case strftime function will be called with time format specifier and current local time to produce final log's file name.
<alias> defines an alias for a log file. It is used in read_command.c and read_fullcommand.c files, only. Probably to be substituted instead of command in the output sent to log collector (for readability purposes, instead of the whole complex command line you see just its short version).
The function pointer should point to a function returning nothing (void). But, all the read functions return void * pointer and this return value is forced during assignment of a function to this field. Finally, this return value is never used.
The return code (*rc parameter) of read functions is also used in a strange way and only once it is different than 0.

Reading and processing log files

Actual monitoring of log files is done by logcollector module (contained in the src/logcollector subdirectory). If you look at process list you'll see it under the name ossec-logcollector.
C's main() function of Logcollector module obtains array of logreader structures from config module and calls main function LogCollectorStart(). LogCollectorStart() references logreader structures through logff array pointer.
logreader structures are then initialized.
Functions to read different type of files are placed in separate files prefixed with string read_. Each file has one globally visible entry function that has the following prototype:

void *read_something(int pos, int *rc, int drop_it);

The function return value isn't used, and most (but not all!) functions just return NULL. pos is index into structure that defines which particular file should be checked by the function. Basically, it indexes array logff. So, logff[pos] is the log file that has to be processed. rc is output parameter that contains return code. Finally, drop_it is a flag that tells reader function to drop record (drop_it != 0) or to process it as usual (drop_it == 0).
So, the conclusion is that I should/would create the following log readers:

regex reader that uses regex to define start of the block, so everything between one line that matches given regex is treated as a log record until the first line that matches the regex again. The variation of the theme is to have separate regex for the end of the block.
modsec_audit reader. A separate log reader that would combine the multiline output of modsec into a single line/record understood by ossec. In particular, I'll read only audit log of mod_security, there is also debug log which I'll ignore for the time being.

Plan

So, as I said, the goal was to solve Java and mod_security log problems. While studying the source I decided to implement two log readers. I'll leave Java for now, since I think it is better solved with modifying syslog (concatenating next line if it doesn't start at column 1). So, one reader that will process mod_security's audit log files and another one that will use regex to search for start and end of each record. To do so, obviously those two have to be implemented, but also appropriate configurations has to be defined.
Since I definitely decided to go with attributes instead of hacks (as multiline is) I also decided to convert multiline to use attribute to define number of lines that has to be concatenated.
So, new multiline definition in the configuration file will be:

<log_format lines="N">multiline</log_format>

And the attribute lines is mandatory, it defines how many lines in the log file makes one log record! On the other hand, mod_security for now will use very simple configuration style:

<log_format>modsec_audit</log_format>

Finally, regex based log type will use the following type:

<log_format start_regex="" end_regex="">regex</log_format>

And if end_regex isn't specified, then the value of start_regex will be assumed to be also end_regex. start_regex attribute is mandatory.

Well, basically, that's for the plan.

Implementation

I first implemented code that reads and parses configuration. In order to do that I had to change files src/config/localfile-config.[ch] and src/error_messages/error_messages.h files. Note that I introduced a new field void *data into logreader structure who's purpose is to keep private data of each log reader. In that way I'm not cluttering structure with lots of attributes used only sporadically. Alternatively, I could use union and place everything there, but for the time being this will do.

Then I modified src/logcollector/read_multiline.c to take its parameter from a new place in logreader structure, i.e. from private *data pointer. The small problem with how I did it is that it is dependent on 32/64-bit architecture as I'm directly manipulating with pointers, which is actually one big no-no. :) But, for the prototype it will do.

Next, I copied read_multiline.c into read_modsec_audit.c and modified it to work with mod_security audit logs. Note that mod_security, for each request and response (that are treated as a single record) creates a series of lines and blocks. Block are separated by blank lines, but everything is kept between the following lines:

-- 62a78a12-A--

...

-- 62a78a12-Z--

Blank lines before and after those that mark beginning and an end are ignored. Random looking number (62a78a12) is an identifier that binds multiple blocks together in log files. In my implementation I assume there is no interleaving of those!

Finally, I implemented read_regex.c. This one is the same as the read_modsec_audit.c but instead of fixed delimiters for start and the end or a record, delimiters are provided via configuration file in a form of regular expressions. Note that it would be possible to make read_modsec_audit a special case of read_regex via appropriate use of regular expressions in the configuration file.

When modules were finished I just had to integrate them into logcollector.[ch] (and modify how multiline is detected). And the implementation was finished.

All the changes are provided in new_log_types-v00.patch.

Some conclusions and ideas for a future work

There are few shortcomings of this implementation. First, it is implemented on Linux and only slightly tested even on that platform. So, it is very likely buggy and non-portable across different platforms. Next, there is certainly a non-portable part with respect to 32-bit vs. 64-bit pointers. I marked that part in the code. Finally, security review has to be done, after all, this is security sensitive application!

It seems to me that multiline module doesn't seem to work right, i.e. there are some corner cases when it misbehaves. Namely, there is a check that a single line doesn't exceed maximum line size, but there is no check if more than one line exceeds that threshold. And, probably, if it exceeds, then the reading will get out of sync, i.e. wrong lines will be grouped together.

For regexp module written as a part of this treatise probably additional attributes should be included, like flags so that you can say, e.g., if the match should be case sensitive or not. Or, that you can remember matches from start_regex (using () operator) and reuse them in end_regex regular expression.
For the end, I don't want to criticize too much, but the build system of OSSEC isn't what you would call: flexible. This isn't a problem for production environment, but for development it is since, to test a single change, you have to rebuild all the source. Also, OSSEC assumes you run it from the installation directory, which is owned by a root. Again, this is a problem for a development, more specifically testing. I think there is a lot of room for the improvement.

Thursday, January 19, 2012

OSSEC active response...

This is still work in progress (I need to add more about configuration part). But since OSSEC is so badly documented and I don't know when this will be finished I'm publishing it now.

Prompted by the problem caused by the OSSEC active response, I decided to try to debug why an error is occurring in logs and fine tune it. But in order to do that I had first to understand how it works. There is a section in OSSEC manual about active response, but what wasn't immediately clear to me is who runs active response and where this is configured. But soon I found out that the active response is run on the client via agent (actually this part wasn't problematic) but that active response is configured on the server and the server is the one that instructs agents to run active response.

On server you'll find active response configuration in $OSSEC/etc/ossec.conf. In that file you have several parts of the configuration:

Set of <command> blocks. Each one defines a command for active response and the arguments it expects. Note that those commands have to exist on agents.
Set of <active-response> blocks that define circumstances under which there will be active response and which command will be executed.

Scripts for active response

Scripts for active response receive five arguments. The first argument is either add or delete. If add is specified, given IP address should be added to a ban list, in case delete is specified address should be removed from the ban list. The second argument is a user name. The third argument is offending IP address. Fourth argument is Unix timestamp (microsecond resolution) when the script was called. Final argument is rule number that triggered active response.

firewall-drop.sh script

This script ships with OSSEC and it adds or removes IP addresses from firewall. It is a relatively simple shell script that accepts command line arguments as specified in the introduction of this section and installs IP address in the ban list (or removes it from there, depending on the command line arguments). The script could be run on Linux, FreeBSD, NetBSD, Solaris and AIX. The following description is specific to Linux behavior even though some things will be common across different platforms.

This script logs its activity into active-responses.log file (in my case in /var/ossec/logs directory). For each invocation one line is emitted into that file. Here is an example of one log entry:

Thu Jan 19 13:52:29 CET 2012 /var/ossec/active-response/bin/firewall-drop.sh add - 193.41.36.141 1326977549.1358625 3301

First group of fields is timestamp when the log entry was generated. Next is a full path and name of the script itself. Finally, all five arguments given to script are also recorded.

Ocasionally, you'll also see error messages like the following one:

Thu Jan 19 06:17:19 CET 2012 Unable to run (iptables returning != 1): 1 - /var/ossec/active-response/bin/firewall-drop.sh delete - 208.115.236.82 1326949601.551727 3301

This log entry is a bit misleading. What happened is that iptables command returned exit code 1 (judging from the log entry it could be interpreted as if it returned something else and 1 was expected, but that's not true). What is important to note is that you'll usually see multiple log entries like the previous one grouped together and the only thing that will differ between them is the number 1 (shown in bold above). What basically happens is that in case of an error returned by iptables command the script tries to run it five times, so, you'll usually see five records and each of those records is numbered.

There are two places where this error might occur. The first one is when the IP address is removed from INPUT chain, while the other is when it is removed from FORWARD chain.

The only reason this error can occur is because someone or something already removed IP address (or added it). But, this should not happen. Still, it happened to em but I don't know the reason for that.

Looking into this script it was clear that it could be improved from the logging perspective. Currently, if you manually run this command from the command line it will write part of error messages to stdout and some other to log file.

Manual control of scripts

Scripts for active response can be started from the server using agent_control tool. Note that the help message of this tool isn't updated to reflect real arguments (bug?) so I had to look into source to infer how to call it. Let me give you several examples of its use.

To list all available agent use this command in the following way:

# ./agent_control -l
   ID: 000, Name: agent0.somedomain.local (server), IP: 127.0.0.1, Active/Local
   ID: 001, Name: agent1.somedomain.local, IP: 192.168.1.2, Active
   ID: 002, Name: agent2.somedomain.local, IP: 192.168.1.3, Active
   ID: 003, Name: agent3.somedomain.local, IP: 192.168.1.4, Disconnected

The output of the command is a list of known agents, either active or non-active. In case you want only active agents use -lc option instead.

Next, if you want to find out some information about a certain agent, you can query it in the following way:

# ./agent_control -i 002

OSSEC HIDS agent_control. Agent information:
   Agent ID:   002
   Agent Name: agent2.somedomain.local
   IP address: 192.168.1.3
   Status:     Active

   Operating system:    Linux agent2.somedomain.local 2.6.32-131.17.1.el6.x86_64..
   Client version:      OSSEC HIDS v2.5-SNP-100907
   Last keep alive:     Thu Jan 19 13:26:01 2012

   Syscheck last started at: Thu Jan 19 12:10:16 2012
   Rootcheck last started at: Thu Jan 19 06:33:06 2012

To activate active response on a certain agent use the following form of the agent_control command:

./agent_control -b 1.1.1.1 -f firewall-drop600 -u 002

Here, the IP address to be blocked is argument of the -b option (in this case 1.1.1.1). There could be more responses available (defined in ossec.conf on server) and the option -f selects which one to run. To see which responses are available use the option -L, like this:

# ./agent_control -L

OSSEC HIDS agent_control. Available active responses:

Response name: host-deny600, command: host-deny.sh
Response name: firewall-drop600, command: firewall-drop.sh

Agent on which this command should initiate active response is specified via ID given as the parameter to option -u. Note that, if you look into help output of the agent_control, this option is not listed, at least not in the version 2.6.0. There is a bit of inconsistency here, as some commands use agent ID as a parameter, while this one requires separate parameter. It would be more uniform if all command would instead use -u option.

Note that when you manually initiate active response then fourth argument to the script will be '(from_the_server)' and the fifth argument will be '(no_rule_id)'.

Also, the rule that was added can not be removed with agent_control, you have to wait for it to timeout when it will be automatically removed.

Tuesday, January 17, 2012

Interesting problem with OSSEC, active response and mail delivery...

We had a problem that manifested itself in such a way that mail messages didn't come from certain domains, or more specifically from certain mail servers. Furthermore, no clue was given in the mail log to know what went wrong and to make things worse, logs from the remote mail server were inaccessible to see there what actually happened. Finally, the worse thing was that this happened sporadically. It turned out that this was consequence of a circumstances and a bug with ossec active response. This post explains what happened.

We changed DNS domain several months ago, let me call the new domain newdomain.hr, and the old one olddomain.hr. DNS was reconfigured so that it correctly handled requests for a new domain, but we had to leave old domain because of some Web server. The the old domain was changed so that when someone asked which is a mail exchanger for olddomain.hr it would receive response: mail.newdomain.hr. Finally, domain olddomain.hr was removed from the mail server. This was the first error, and now I think that it is better either to leave old domain on mail server or to not return any response! Actually, if you want to get rid of the old domain, it is the best to remove it from the mail server and that DNS server doesn't return any response for a mail exchanger of a given domain. If you know how mail works, you'll know that by changing MX record for old domain from mail.olddomain.hr to mail.newdomain.hr didn't change anything!

Anyway, that's the part concerning mail. Now, about OSSEC. It has a possibility of active response, i.e. to block offending IP addresses for a certain amount of time, 10 minutes by default. One class of offending IP addresses are those that try to deliver mail messages which require mail server to be open relay. Since mail server is properly configured it rejects those messages with a message 'Relay denied'. After mail server rejects such delivery attempt OSSEC kicks in and blocks offending IP address for 10 minutes.

This, by itself didn't have to be a problem because blocking rules are automatically removed after 10 minutes. But, there is a bug in the removal script that manifested itself in the logs like follows (found on agent in /var/ossec/logs/active-responses.log):

Unable to run (iptables returning != 1): 1 - /var/ossec/active-response/bin/firewall-drop.sh delete - 203.83.62.99 1326738019.2370422 3301

For some reason removal of IP address from block list wasn't successful and that basically meant that the source mail host is blocked indefinitely!

Majority of mail servers that to generate such 'Relay denied' messages are truly spammers and if some of them were indefinitely blocked that was actually good. But, this particular source mail server that was blocked is very popular one with many users serving many different domains, so now when some other user tried to send an email that was legal and had correct address, IPtables blocked access and the mail couldn't be delivered. There was nothing in the logs of destination mail server. Also, sending user didn't receive any response message since mail was being temporary put on hold on the source server.

This particular problem was solved by completely removing the old domain. Now, source mail servers won't even try to deliver mails for the old domain and thus OSSEC won't block legitimate servers. Furthermore, the sending users will get notification immediately about non-existent mail address.

Tuesday, December 20, 2011

Problem with inactive agent in OSSEC Web Interface

I was just debugging OSSEC Web interface. Namely, it incorrectly showed that one host was not responding event though there were log entries that showed otherwise. The problem was that this particular host was transferred to another network, and thus, its address was changed.

I figured out that the list of available agents within Web interface is generated from a files found in /var/ossec/queue/agent-info directory. There, you'll find one file per agent. The file name itself consists of agent name and IP address separated by a single dash. In order to display if an agent is connected or not the PHP code from Web interface (which itself is placed in /usr/share/ossec-wui directory) obtains time stamp of a file belonging to a particular client and if this time stamp is younger that 20 minutes, it proclaims agent OK, otherwise, it shows it as inaccessible.

In this case it turned out that the old agent wasn't removed using manage_client tool (selecting option R, for remove). So, the old information remained, which wasn't updated and thus the Web interface reported inactive agent.

Everything about nothing