Everything about nothing: C++

Showing posts with label C++. Show all posts

Tuesday, January 1, 2013

Deprecated functions in ffmpeg library

Well, I have some code that uses some old FFMPEG library, and now, as I updated my laptop to Fedora 18, it turns out that those functions are gone for good. I found some resources about how to port old code (here, here and here), but since it wasn't what I needed I decided to write my own version. So, here we go.

url_open()

This function has been changed to avio_open. There is also url_close which is renamed to avio_close. This information I found here.

av_new_stream()

This function is still supported as of FFMPEG 1.0.1 but it is marked as deprecated. It will be replaced with avformat_new_stream(). Suppose that the old code was:

AVStream *st = av_new_stream(oc, i);

the modified code should be:

AVStream *st = avformat_new_stream(oc, NULL);
st->id = i

Be careful to check first that st isn't NULL!

dump_format()

This function was renamed to av_dump_format().

av_write_header()

Replaced with avformat_write_header() that accepts two arguments instead of one. Pass NULL as the second argument to get identical behavior to the old function.

av_codec_open()

This one is replaced with av_codec_open2(). The replacement function accepts three arguments instead of two, but put NULL as a third argument to get the same behavior as the old function.

avcodec_encode_audio()

Replaced with avcodec_encode_audio2().

av_set_parameters()

I couldn't fine the replacement for this one. First, I've found that this function doesn't have replacement. But it was when it was still available in FFMPEG, even though deprecated. Then, they removed it, and thus it has to have replacement. In certain places I found that they only disabled it, on others that its parameters have to be passed to avformat_write_header. In the end, I gave up because I didn't need working version of that part of the code for now. Since in my case avformat_alloc_context() is called and then av_set_parameters(), last what I looked at was to call avformat_alloc_output_context2() instead of avformat_alloc_context(). But the change is not trivial so I skipped it.

SampleFormat

This enum has been renamed AVSampleFormat.

URL_WRONLY

This constant has been replaced with AVIO_FLAG_WRITE.

SAMPLE_FMT_U8, SAMPLE_FMT_S16, SAMPLE_FMT_S32, etc.

Those are prefixed now with AV_, so use AV_SAMPLE_FMT_U8, AV_SAMPLE_FMT_S16, etc.

Thursday, December 27, 2012

UDP Lite...

Many people know for TCP and UDP, at least those that work in the field of networking or are learning computer networks in some course. But, the truth is that there are others too, e.g. SCTP, DCCP, UDP Lite. And all of those are actually implemented in Linux kernel. What I'm going to do is describe each one of those in the following few posts and give examples of their use. In this post, I'm going to talk about UDP Lite. I'll assume that you know UDP and also that you know how to use socket API to write UDP application.

UDP Lite is specified in RFC3828: The Lightweight User Datagram Protocol (UDP-Lite) . The basic motivation for the introduction of a new variant of UDP is that certain applications (primarily multimedia ones) want to receive packets even if they are damaged. The reason is that codecs used can recover and mask errors. UDP itself has a checksum field that covers the whole packet and if there is an error in the packet, it is silently dropped. It should be noted that this checksum is quite weak actually and doesn't catch a lot of errors, but nevertheless it is problematic for such applications. So, UDP lite changes standard UDP behavior in that it allows only part of the packet to be covered with a checksum. And, because it is now different protocol, new protocol ID is assigned to it, i.e. 136.

So, how to use UDP Lite in you applications? Actually, very easy. First, when creating socket you have to specify that you want UDP Lite, and not (default) UDP:

s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);

Next, you need to define what part of the packet will be protected by a checksum. This is achieved with socket options, i.e. setsockopt(2) system call. Here is the function that will set how many octets of the packet has to be protected:

void setoption(int sockfd, int option, int value)
{
if (setsockopt(sockfd, IPPROTO_UDPLITE, option,
(void *)&value, sizeof(value)) == -1) {
perror("setsockopt");
exit(1);
}
}

It receives socket handle (sockfd) created with socket function, option that should be set (option) and the option's value (value). There are two options, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV. Option UDPLITE_SEND_CSCOV sets the number of protected octets for outgoing packets, and UDPLITE_RECV_CSCOV sets at least how many octets have to be protected in the inbound packets in order to be passed to the application.

You can also obtain values using the following function:

int getoption(int sockfd, int option)
{
int cov;
socklen_t len = sizeof(int);
if (getsockopt(sockfd, IPPROTO_UDPLITE, option,
(void *)&cov, &len) == -1) {
perror("getsockopt");
exit(1);
}
return cov;
}

This function accepts socket (sockfd) and option it should retrieve (i.e. UDPLITE_SEND_CSCOV or UDPLITE_RECV_CSCOV) and returns the option's value. Note that the two constants, UDPLITE_SEND_CSCOV or UDPLITE_RECV_CSCOV, should be explicitly defined in your source because it is possible that glibc doesn't (yet) define them.

I wrote fully functional client and server applications you can download and test. To compile them you don't need any special options. So that should be easy. Only change you'll probably need is the IP address that clients sends packets to. This is a constant SERVER_IPADDR which contains server's IP address hex encoded. For example, IP address 127.0.0.1 is 0x7f000001.

Finally, I have to say that UDP Lite will probably have problems traversing NATs. For example, I tried it on my ADSL connection and it didn't pass through the NAT. What I did is that I just started client with IP address of one of my servers on the Internet, and on that server I sniffed packets. Nothing came to the server. This will probably be a big problem for the adoption of UDP Lite, but the time will tell...

You can read more about this subject on the Wikipedia page and in the Linux manual page udplite(7).

Tuesday, December 25, 2012

Controlling which congestion control algorithm is used in Linux

Linux kernel has a quite advanced networking stack, and that's also true for congestion control. It is a very advanced implementation who's primary characteristics are modular structure and flexibility. All the specific congestion control algorithms are separated into loadable modules. The following congestion control mechanisms are available in the mainline kernel tree:

Default, system wide, congestion control algorithm is Cubic. You can check that by inspecting the content of the file /proc/sys/net/ipv4/tcp_congestion_control:

$ cat /proc/sys/net/ipv4/tcp_congestion_control
cubic

So, to change system-wide default you only have to write a name of congestion control algorithm to the same file. For example, to change it to reno you would do it this way:

# echo reno > /proc/sys/net/ipv4/tcp_congestion_control
# cat /proc/sys/net/ipv4/tcp_congestion_control
reno

Note that, to change the value, you have to be the root user. As the root you can specify any available congestion algorithm you wish. In the case the algorithm you specified isn't loaded into the kernel, via standard kernel module mechanism, it will be automatically loaded. To see what congestion control algorithms are currently loaded take a look into the content of the file /proc/sys/net/ipv4/tcp_available_congestion_control:

$ cat /proc/sys/net/ipv4/tcp_available_congestion_control
vegas lp reno cubic

It is also possible to change congestion control algorithm on a per-socket basis using setsockopt(2) system call. Here is the essential part of the code to do that:

...
int s, ns, optlen;
char optval[TCP_CA_NAME_MAX];
...
s = socket(AF_INET, SOCK_STREAM, 0);
...
ns = accept(s, ...);
...
strcpy(optval, "reno");
optlen = strlen(optval);
if (setsockopt(ns, IPPROTO_TCP, TCP_CONGESTION, optval, optlen) < 0) {
perror("setsockopt");
return 1;
}

In this fragment we are setting congestion control algorithm to reno. Note that that the constant TCP_CA_NAME_MAX (value 16) isn't defined in system include files so they have to be explicitly defined in your sources.

When you are using this way of defining congestion control algorithm, you should be aware of few things:

You can change congestion control algorithm as an ordinary user.
If you are not root user, then you are only allowed to use congestion control algorithms specified in the file /proc/sys/net/ipv4/tcp_allowed_congestion_control. For all the other you'll receive error message.
No congestion control algorithm is bound to socket until it is in the connected state.

You can also obtain current control congestion algorithm using the following snippet of the code:

optlen = TCP_CA_NAME_MAX;
if (getsockopt(ns, IPPROTO_TCP, TCP_CONGESTION, optval, &optlen) < 0) {
perror("getsockopt");
return 1;
}

Here you can download a code you can compile and run. To compile it just run gcc on it without any special options. This code will start server (it will listen on port 10000). Connect to it using telnet (telnet localhost 10000) in another terminal and the moment you do that you'll see that the example code printed default congestion control algorithm and then it changed it to reno. It will then close connection.

Instead of the conclusion I'll warn you that this congestion control algorithm manipulation isn't portable to other systems and if you use this in your code you are bound to Linux kernel.

Sunday, August 12, 2012

Implementing OSSEC log reader for Linux audit logs...

After writing log readers for mod_security and regex, I was asked in a private mail if I could implement log reader for Linux audit logs, so I decided to try. Basically, first I was thinking about implementing something more general but then I decided to keep it simple and not to overdesign it. In the conclusion section I'll return to this more complex type of log reader.

Format of linux audit logs

Log records in the Linux audit files can consist of one or more log lines. For example, here is a record that consists of three lines:

type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=2 entries=0
type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=10 entries=0
type=SYSCALL msg=audit(1344674083.473:7422): arch=c000003e syscall=56 success=yes exit=5246 a0=60000011 a1=0 a2=0 a3=0 items=0 ppid=5239 pid=5245 auid=5056 uid=5056 gid=1000 euid=0 suid=0 fsuid=0 egid=1000 sgid=1000 fsgid=1000 tty=pts7 ses=5 comm="chrome-sandbox" exe="/opt/google/chrome/chrome-sandbox" subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 key=(null)

and here is the one that consist of two lines:

type=AVC msg=audit(1344110282.960:999): avc: denied { write } for pid=4690 comm="plugin-containe" name=".pulse-cookie" dev="dm-3" ino=2883770 scontext=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file
type=SYSCALL msg=audit(1344110282.960:999): arch=c000003e syscall=2 success=no exit=-13 a0=7fc060e29240 a1=80142 a2=180 a3=394f64947c items=0 ppid=4463 pid=4690 auid=5056 uid=5056 gid=1000 euid=5056 suid=5056 fsuid=5056 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=5 comm="plugin-containe" exe="/usr/lib64/xulrunner-2/plugin-container" subj=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 key=(null)

In all those cases, log records share the same ID (I'll call it a log record ID from now on) which is a number after a timestamp (I placed it in bold in the previous two examples).

Design

At least in principle, it is easy to parse those logs. We just extract that field (first number after colon). What makes it complicated is that it isn't garantueed (at least I'm not aware) that log records will not be mixed (i.e. first lines 1 of two records, then their second lines). Furthermore, the code for reading log files can be called when partial log lines, or records, are written! Finally, we don't know how many lines in each record there will be!

So, after we get a new log record ID, we have to wait a bit to see if we are going to receive another one. After that time passes, without receiving anything, we pass what we have up to now. Again, there is a choice here! We can reset timeout if we receive something new, or we can count from the first line. Basically, we can have timeout or window.

Finally, no matter how we implement wait time, there is additional quirck in the way reading log files works. Namely, we can not set timeout mechanism that will call us after some time. We are called by logcollector.c module in regular intervals defined in the configuration file (defaulting to 2 s). So, we are going to implement timeout, not in time units, but in a number of those calls. So, if you say window is 1, this means that when we find new log record ID, we'll wait that we are called once more, and then we'll join and send a record. If the interval is defined to be 2s, that means waiting time of 2s. If the window is 2, then we'll send record after the second call, or after 4s. Timeout, on the other hand, will function a bit differently. If we say that timeout is 1, that means that at the moment we find a new log record ID we start timer (initialized to 1). At the next call we first decrement all timers, then if we get that same log record ID again, we'll reinitialize timer. Finally, we send all the records that have expired timers.

Note that we are here introducing runtime per-log reader data (timers, saved logs), which is different than configuration per-log reader data (that comes exclusively from configuration files)! This will be reflected later in the implementation.

Finally, since this is a Linux specific feature, it is completly disabled if the source is compiled on (or for) windows!

Configuration

So, this is how to configure this log reader in the OSSEC's configuration files. To say that some log file is audit type, you'll use the following <log_format> element:

<log_format timeout="T" window="W">linux_auditd</log_format>

Both timeout and window are specified in time units (T and W must be numbers) defined by logcollector.loop_timeout variable (defined in internal_options.conf). You have to specify one of them. It is error to define both, or none!

Implementation notes

For keeping logs until timeout or window expires, I'm using doubly linked list. It is inefficient, but for the time being it will do. More specifically, I'm using OSSEC's list implemenation in shared/list_op.c.
For testing purposes, I also added code that is enabled by defining BUILD_TEST_BINARY. In that case read_linux_audit is compiled which accepts log file that should be read. To emulate how the log grows, binary first opens a new temporary file, then reads a random number of lines from original file, writes it to the temporary file, and calls read_linux_audit function, after which, it pauses. This is repeated until all the input from the original log file is exhausted.
To build test binary, first build evertyhing. Then, go to logcollector directory and run there 'make test'. You'll then have binary read_linux_audit.
The testing was as follows:

Run read_linux_audit on a sample audit.log and redirect output to some temporary file. Count a number of lines in a temporary file, it has to be smaller then the original file.
Using simple shell pipe "cut -f2 -d: audit.log | cut -f1 -d\) | sort | uniq | wc -l" I got how many uniqe lines. There was a difference between this and previous step.
Search for a difference (using diff for example) and analyze why it happened. :D

Conclusion

As I said in the introduction section, I was thinking about implementing a more general reader. Namely, the idea was that you give the reader regular expression, and this regular expression is executed against every line. All the lines that have the same return value are treated as a part of a single record and thus are concatenated. Probably when this reader is finished, I suppose writing that more complex one wouldn't be a problem.

I also fixed few small bugs in the code I sent previously, and, the new patch can be found here.

Friday, August 10, 2012

Adding new log types to OSSEC...

I'm using OSSEC for log monitoring, and while it is a great tool there are some drawbacks. One is that (not so) occasionally Java stack traces are dumped into syslog and they are not properly handled/parsed by OSSEC. The other drawback is that I want OSSEC to monitor mod_security logs, which are multi line logs with variable number of lines belonging to each HTTP request/response processed by mod_security. So, I wanted to modify OSSEC in order to allow it to handle those cases, too. To be able to modify OSSEC I started with the analysis of its source (version 2.6) and based on the analysis, I wrote this text. The goal of the analysis was to get acquainted on how OSSEC handles log files, document this knowledge, and to propose solution that would handle Java stack traces and mod_security logs.

Configuring log files monitoring

First, ehere are some global parameters that control global log monitoring behavior of each OSSEC agent, or server. More specifically, logcollector.loop_timeout in internaloptions.conf defines how much time (in seconds) OSSEC will wait before checking if there are additions to all log files that it monitors. Default value is 2s, with minimal allowed value of 1 second and maximum 120s.
This is actually the most important parameter. There are two additional ones, logcollector.open_attempts with allowed values between 2 and 998, and logcollector.debug with allowed values 0, 1 and 2.

Supported log types in OSSEC

If I correctly understand it, OSSEC treats log files as consisting of records. Each record being independent from the previous, or any later ones. Rules, then, act on records. In the majority of cases the record is identical to a single line of a log file. This is, I suppose, legacy from syslog in which each line was (and still is) separate log entry. In the mean time OSSEC was extended so that it supports more different log files:

multiline log (read_multiline). The problem with this log format is that it expects that each record consists of a fixed number of lines.
snort full log (read_snortfull). This is the closest one to what I need with respect to mod_security, i.e. this one reads several lines, combines them, and sends them to log analyzer.
some others I won't analyze now.

The length of the record is restricted to 6144 characters (constant OS_MAXSTR defined in headers/defs.h). Everything above that length will be cut off and discarded, with an error message logged.

Configuring log files

So, there are different log types and to tell ossec for some file which type it is, you use <logformat> element associated with each monitored file. Each monitored file is defined in element <localfile> directly beneath <ossec_conf> element in the ossec configuration file, for example:

<ossec_config>
   ...
   <localfile>
      <log_format>syslog</log_format>
      <location>path_to_log_file</location>
   </localfile>
   ...
</ossec_config>

The configuration file, along with all of its elements, is read by configuration loader placed in src/config subdirectory. The localfile element is processed in function Read_Localfile (file localfile-config.c). For each <localfile> one logreader structure (defined in localfile-config.h) is allocated and initialized. This structure has the following content:

typedef struct _logreader
{
    unsigned int size;
    int ign;

    #ifdef WIN32
    HANDLE h;
    int fd;
    #else
    ino_t fd;
    #endif
    /* ffile - format file is only used when
     * the file has format string to retrieve
     * the date,
     */
    char *ffile;
    char *file;
    char *logformat;
    char *djb_program_name;
    char *command;
        char *alias;
    void (*read)(int i, int *rc, int drop_it);
    FILE *fp;
}logreader;

Analyzing localfile-config.c file I came to the following conclusions:

Underneath <localfile> element the following elements are accepted/valid: <location&gt, <command>, <log_format>, <frequency>, and <alias>.
<location> defines the file, with full path, that is being monitored.
<log_format> tells OSSEC in which format are records stored. The following are hardcoded values in the Read_Local.c file: syslog, generic, snort-full, snort-fast, apache, iis, squid, nmapq, mysql_log, mssql_log, postgresql_log, djb_multilog, syslog-pipe, command, full_command, multi-line, and eventlog. What's interesting is that some of those (marked in bold) don't appear later in log collector!
the syntax of accepted multi-line log_format is:
<log_format>multiline[ ]*:[ ]*[0-9]*[ ]*</log_format>
note that it is not an error, i.e. you can avoid writing any number, but that number defines how many lines should be concatenated before being sent to analyzer module, so it is important you don't skip it. The number part is made available in logformat field of logreader structure. Furthermore, when check is made to determine which type of logformat is some logff structure, to detemine it is a multiline the first character is checked if it is a digit. If it is, then it is a multiline format. This is actually a hack because in the original design it wasn't planned to have parameters to certain log types!
<command> is used to run some external process, collect its output, and send it to central log. Each line outputed by the command will be treated as a separate log record. Empty lines are removed, i.e. ignored. Note that command is passed to shell using popen library call. So, you can place shell commands there too.
in the code there is a reference to a full_command log type format, but it is not supported in the configuration file. The difference is that this form reads command's output as a single log record.
the value of element <frequency> is stored into logreader.ign field. But the use of this field is strange because it is overwritten with number of failed attempts to open log file. I would assume that this parameter would, somehow, allow rate limiting.
When defining the value of <location> element certain variables can be used. In case of Windows operating systems, that means % variables (e.g. %SYSTEM% and similar). In case of Unix/Linux glob patterns are allowed (these are not allowed on Windows). Also, strftime time format specifiers can be used and in that case strftime function will be called with time format specifier and current local time to produce final log's file name.
<alias> defines an alias for a log file. It is used in read_command.c and read_fullcommand.c files, only. Probably to be substituted instead of command in the output sent to log collector (for readability purposes, instead of the whole complex command line you see just its short version).
The function pointer should point to a function returning nothing (void). But, all the read functions return void * pointer and this return value is forced during assignment of a function to this field. Finally, this return value is never used.
The return code (*rc parameter) of read functions is also used in a strange way and only once it is different than 0.

Reading and processing log files

Actual monitoring of log files is done by logcollector module (contained in the src/logcollector subdirectory). If you look at process list you'll see it under the name ossec-logcollector.
C's main() function of Logcollector module obtains array of logreader structures from config module and calls main function LogCollectorStart(). LogCollectorStart() references logreader structures through logff array pointer.
logreader structures are then initialized.
Functions to read different type of files are placed in separate files prefixed with string read_. Each file has one globally visible entry function that has the following prototype:

void *read_something(int pos, int *rc, int drop_it);

The function return value isn't used, and most (but not all!) functions just return NULL. pos is index into structure that defines which particular file should be checked by the function. Basically, it indexes array logff. So, logff[pos] is the log file that has to be processed. rc is output parameter that contains return code. Finally, drop_it is a flag that tells reader function to drop record (drop_it != 0) or to process it as usual (drop_it == 0).
So, the conclusion is that I should/would create the following log readers:

regex reader that uses regex to define start of the block, so everything between one line that matches given regex is treated as a log record until the first line that matches the regex again. The variation of the theme is to have separate regex for the end of the block.
modsec_audit reader. A separate log reader that would combine the multiline output of modsec into a single line/record understood by ossec. In particular, I'll read only audit log of mod_security, there is also debug log which I'll ignore for the time being.

Plan

So, as I said, the goal was to solve Java and mod_security log problems. While studying the source I decided to implement two log readers. I'll leave Java for now, since I think it is better solved with modifying syslog (concatenating next line if it doesn't start at column 1). So, one reader that will process mod_security's audit log files and another one that will use regex to search for start and end of each record. To do so, obviously those two have to be implemented, but also appropriate configurations has to be defined.
Since I definitely decided to go with attributes instead of hacks (as multiline is) I also decided to convert multiline to use attribute to define number of lines that has to be concatenated.
So, new multiline definition in the configuration file will be:

<log_format lines="N">multiline</log_format>

And the attribute lines is mandatory, it defines how many lines in the log file makes one log record! On the other hand, mod_security for now will use very simple configuration style:

<log_format>modsec_audit</log_format>

Finally, regex based log type will use the following type:

<log_format start_regex="" end_regex="">regex</log_format>

And if end_regex isn't specified, then the value of start_regex will be assumed to be also end_regex. start_regex attribute is mandatory.

Well, basically, that's for the plan.

Implementation

I first implemented code that reads and parses configuration. In order to do that I had to change files src/config/localfile-config.[ch] and src/error_messages/error_messages.h files. Note that I introduced a new field void *data into logreader structure who's purpose is to keep private data of each log reader. In that way I'm not cluttering structure with lots of attributes used only sporadically. Alternatively, I could use union and place everything there, but for the time being this will do.

Then I modified src/logcollector/read_multiline.c to take its parameter from a new place in logreader structure, i.e. from private *data pointer. The small problem with how I did it is that it is dependent on 32/64-bit architecture as I'm directly manipulating with pointers, which is actually one big no-no. :) But, for the prototype it will do.

Next, I copied read_multiline.c into read_modsec_audit.c and modified it to work with mod_security audit logs. Note that mod_security, for each request and response (that are treated as a single record) creates a series of lines and blocks. Block are separated by blank lines, but everything is kept between the following lines:

-- 62a78a12-A--

...

-- 62a78a12-Z--

Blank lines before and after those that mark beginning and an end are ignored. Random looking number (62a78a12) is an identifier that binds multiple blocks together in log files. In my implementation I assume there is no interleaving of those!

Finally, I implemented read_regex.c. This one is the same as the read_modsec_audit.c but instead of fixed delimiters for start and the end or a record, delimiters are provided via configuration file in a form of regular expressions. Note that it would be possible to make read_modsec_audit a special case of read_regex via appropriate use of regular expressions in the configuration file.

When modules were finished I just had to integrate them into logcollector.[ch] (and modify how multiline is detected). And the implementation was finished.

All the changes are provided in new_log_types-v00.patch.

Some conclusions and ideas for a future work

There are few shortcomings of this implementation. First, it is implemented on Linux and only slightly tested even on that platform. So, it is very likely buggy and non-portable across different platforms. Next, there is certainly a non-portable part with respect to 32-bit vs. 64-bit pointers. I marked that part in the code. Finally, security review has to be done, after all, this is security sensitive application!

It seems to me that multiline module doesn't seem to work right, i.e. there are some corner cases when it misbehaves. Namely, there is a check that a single line doesn't exceed maximum line size, but there is no check if more than one line exceeds that threshold. And, probably, if it exceeds, then the reading will get out of sync, i.e. wrong lines will be grouped together.

For regexp module written as a part of this treatise probably additional attributes should be included, like flags so that you can say, e.g., if the match should be case sensitive or not. Or, that you can remember matches from start_regex (using () operator) and reuse them in end_regex regular expression.
For the end, I don't want to criticize too much, but the build system of OSSEC isn't what you would call: flexible. This isn't a problem for production environment, but for development it is since, to test a single change, you have to rebuild all the source. Also, OSSEC assumes you run it from the installation directory, which is owned by a root. Again, this is a problem for a development, more specifically testing. I think there is a lot of room for the improvement.

Thursday, January 19, 2012

Problem with avformat_open_input()

I lost too much time because of a misleading error message reported by avformat_open_input function! I used the following code that exhibit the error:

AVFormatContext *pFormatCtx = NULL;
...
ret = avformat_open_input(&pFormatCtx, argv[1], NULL, NULL);
if (ret < 0)
print_error_and_exit(ret, "avformat_open_input()");

print_error_and_exit() is a helper function that uses av_strerror() to give textual representation of the error code stored in ret. Running this code produced the following output:

$ ./a.out infile.wav outfile.wav
avformat_open_input(): No such file or directory

But the infile.wav was there! Using strace I found that open system call wasn't called and so it was certainly an internal error. This was frustrating! The reason I started to write this code was to find out why I'm getting another error in VoIP tool simulator, but I was stuck on something even more basic: Not being able to open a wav file. Googling around I finally found the following link which explained me a real cause of the error, i.e. I forgot to call av_register_all() initialization function.

This again shows how important good error reporting is, in both libraries and application programs!

I had also another problem with the previous code. It was segfaulting within avformat_open_input(). At first, I thought that I found a bug within a library but then I realized I forgot to initialize pFormatCtx to NULL. Namely, allocating a dynamic variable on stack didn't zero it so it was non-NULL and avformat_open_input() misbehaved. Mea culpa! :)

Thursday, October 13, 2011

Dennis Ritchie died...

Well, another great figure of computing has prematurely left us, according to reports on the Internet. This one isn't so well known like Steve Jobs, but his work certainly matches the one done by Jobs, and in my humble opinion, even exceeds it. His "problem", sort to speak, is that he did everything in the core area of computer science, not in the consumer part, and he did majority of his work during the years when most people even didn't know that computers exists.

The guy is Dennis Ritchie, and he invented C programming language and also took important part in the development of Unix operating system. His influence was and is great. For example, Android smart phones today all run on top of Linux, which itself started as a Unix derivative. MS DOS was a very poor copy of Unix, and it was evident that it tried to copy Unix. Windows NT in part was also influenced by Unix. Not to mention MacOS X which, in its core, is Unix! And today's biggest businesses run their core services on Unix machines, not Windows.

The C programming language was, and still is, extremely influential. First, majority of today's operating systems are written in C, and all the other languages have ability to link with libraries written in C. There are numerous applications and libraries written in C. C is, in essence, lowest common denominator. Furthermore, we have today many languages which directly or indirectly borrow features from C. For a start C++ started as an extension to C. Which itself influenced many other object-oriented programming languages. C's influence can be traced also in all other non-OO languages.

All in all, I'm very sad that he passed away. RIP Dennis Ritchie.

Everything about nothing