Everything about nothing: software

Showing posts with label software. Show all posts

Tuesday, November 1, 2016

A bit about RSS feed readers on Linux

I'm monitoring lot of sites using RSS so having a good RSS feed reader is mandatory for me. Once upon a time, I used Liferea but since I have a lots of RSS feeds with lots of posts I want to keep around, turned out that Liferea wasn't designed with scalability in mind. So, I decided to find another one. Web based readers are out of question, because I prefer desktop applications. Not to mention that locally I have lot of disk storage that I don't have to pay, while storage in the cloud I would have to pay due to my heavy use of it.

After a search I settled on QuiteRSS. In the process I tried RSSOwl but I wasn't able to start it due to different XULRunner version on my Fedora. Besides, it turns out the last version of RSSOwl was released in December 2013, and isn't maintained any more. QuiteRSS was very good, but it turned out that the bug in Webkit started to annoy me. So, I started to explore RSS feed readers again. Note that I have the following requirements:

No Web application! I want desktop RSS feed reader with GUI interface. It would be nice, though, that I can synchronize it with a reader on a mobile phone!
I have a large number of feeds and keep a lot of new (that is unread :D) posts around. So, scalability is of paramount importance.
And last, but not least, nice looking and usable GUI.

This brought me to three candidates: QuiteRSS, FeedReader and RSSGuard. I'll describe each of them in a bit more details below. But before that, note that this is a live post, i.e. I'll still try all the mentioned readers and update it with new experience. Also, I would like to hear you comments/sugestions, so if you have any, please leave a comment.

QuiteRSS

QuiteRSS is quite good and I'm using it all the time. There is a homepage and GitHub development page. It has the ability to tag posts, mark them as a read, etc.

It is interesting to look at QuiteRSS GitHub page. From there, the following conclusions can be inferred:

QuiteRSS is quite popular, 33 watches, 180 stars and 28 forks.
QuiteRSS is basically in maintenance mode since there is no substantial activity since 2014. From 2012 to 2014 development was very intensive.
There are 212 open issues and 719 closed ones. I think that there are a lot of open issues but more thorough statistics has to be performed to know for certain.

The problems are the following ones, from the most important to the least important ones:

You have to disable JavaScript because QuiteRss often freezes on some feeds while loading. It still freezes with some RSS feeds and if that happens some history is lost (read feeds, marked/tagged feds, etc).
If you accidentally click on a link to PDF file, QuiteRSS freezes!
Once I mistakenly selected the option "Mark all news read" which is irreversible. There is no confirmation dialog for such cases.
Some posts on GitHub are in Russian. That's a problem because not everyone is speaking Russion. ;)
It depends on Qt4 and Webkit4 that are not maintained any more.

FeedReader

FeedReader is a interesting because it has two components, daemon and a front end. This is uniqe to other readers that bundle those functions together into a single binary. You can read more about this reader on its homepage, and there is also GitHub development page. Looking at the Web page, it has a lots of features but I'm using only a few, if any at this stage. Take this into account while reading this review. Looking at the GitHub page of FeedReader, the following conclusions can be inferred:

FeedReader is somehow less popular than QuiteRss. It has 26 watches (against 33 for QuiteRss), 152 stars (against 180) and 6 forks (against 28).
FeedReader is in active development, and all the activity is concentrated in 2016 with some additional in 2015.
There are 27 open issues and 197 closed ones. This is better ratio than for QuiteRss, but again more research has to be done!

First problem I had was while removing feeds. It was painful because it doesn't allow selection of multiple feeds or feed groups at once.

The next problem was that only two level hierarchy supported, while in QuiteRSS I have three level. So, importing OPML file with multiple levels will result in transforming everything into two layers.

While removing certain feed folders, some of them kept coming back! Maybe the problem was that I right-clicked on a feed and selected delete but it was necessary to first left-click and then right-click. Who will know...

RSS Guard

RSS Guard, as all the other feed readers mentioned above, has its GitHub development page. As for the homepage, it uses Wiki on GitHub. Again, by looking into GitHub page, the following conclusions can be made:

RSSGuard has 6 watches, 21 stars and 6 forks. This makes it the lowest ranked by popularity of the three RSS readers reviewed here.
RSSGuard is in development since 2013 with evenly spread development efforts. This probably means it isn't going to be finished soon.
It has 11 open issues reported and 51 closed. Which isn't that bad.

So, some shortcomings from the personal experience. It is a bit non-intuitive. It took me some time to realize that in order to import OPML file, first I have to create account. Another non-intuitive task was the process of importing itself. When you select OPML file and all the feeds appear, you click OK, but then you have to click Close. First time I clicked twice OK and got all the feeds imported twice!

~~It support multilevel feed organization, but it is not possible to fold certain feed groups, i.e. they are always unfolded!~~ I finally realized that it is possible to fold a folder, you just need to click twice in order to fold/unfold it. But, this isn't something particularly intuitive, nor visible. Namely, if the folder is folded there is no indication nor there is indication that that the folder can be folded.

When I click "Update all items" button in a toolbar, I expected that all feeds will be updated. But for some reason, that didn't happen.

Conclusion

Comparing development of each of the proposed readers, it turns out that each one of them basically depends on a single developer and has its own pros and cons. In the end, I think that despite its shortcomings, QuiteRSS is still the best feed reader closely followed by FeedReader. If development activity of FeedReader continues with the same intensity, expect that it will become the best RSS among the three.

ChangeLog

20161101 - Initial version

Tuesday, November 26, 2013

Lecture about agile software development

On November 22nd and 25th, 2013, I gave a lecture about agile software development to a group of employees of Croatian Telecom. This lecture was part of their internal education in which they were educated about different IT and business technologies. The goal of the lecture was not to teach them specifics of agile software development but to give them some kind of an overview of agile development, to contrast it to a more traditional methods, more specifically to waterfall development model, and to compare them mutually. With that in mind, I ended up with a presentation that has the following parts:

Introduction and motivation
Why is software development hard?
Traditional methodologies
Principles of Agile software development.
Some specific agile software development methodologies.
Problems encountered when introducing agile methods.
Experiences gained during introduction of agile methods from some companies.
History of agile methods.
Conclusion.

The presentation is in Croatian, and available here. In case there is demand, I'll translate it to English. I plan to enhance and improve the presentation (there is lot of place to do so). If I upload newer version I'll expand this post. But also, filename will be changed (it includes date of a release). Accompanying this presentations are references I deemed the most interesting for those employes. They can be downloaded as a single ZIP file. Note that those references are in English!

I'm using some pictures from the Internet in the presentation and I hope that I didn't broke any licence or copyright agreements. The same holds for papers in ZIP file which are given for purely educational purposes. In case I didn't do right something, please notify me.

Sunday, August 12, 2012

Implementing OSSEC log reader for Linux audit logs...

After writing log readers for mod_security and regex, I was asked in a private mail if I could implement log reader for Linux audit logs, so I decided to try. Basically, first I was thinking about implementing something more general but then I decided to keep it simple and not to overdesign it. In the conclusion section I'll return to this more complex type of log reader.

Format of linux audit logs

Log records in the Linux audit files can consist of one or more log lines. For example, here is a record that consists of three lines:

type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=2 entries=0
type=NETFILTER_CFG msg=audit(1344674083.473:7422): table=filter family=10 entries=0
type=SYSCALL msg=audit(1344674083.473:7422): arch=c000003e syscall=56 success=yes exit=5246 a0=60000011 a1=0 a2=0 a3=0 items=0 ppid=5239 pid=5245 auid=5056 uid=5056 gid=1000 euid=0 suid=0 fsuid=0 egid=1000 sgid=1000 fsgid=1000 tty=pts7 ses=5 comm="chrome-sandbox" exe="/opt/google/chrome/chrome-sandbox" subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 key=(null)

and here is the one that consist of two lines:

type=AVC msg=audit(1344110282.960:999): avc: denied { write } for pid=4690 comm="plugin-containe" name=".pulse-cookie" dev="dm-3" ino=2883770 scontext=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file
type=SYSCALL msg=audit(1344110282.960:999): arch=c000003e syscall=2 success=no exit=-13 a0=7fc060e29240 a1=80142 a2=180 a3=394f64947c items=0 ppid=4463 pid=4690 auid=5056 uid=5056 gid=1000 euid=5056 suid=5056 fsuid=5056 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=5 comm="plugin-containe" exe="/usr/lib64/xulrunner-2/plugin-container" subj=unconfined_u:unconfined_r:mozilla_plugin_t:s0-s0:c0.c1023 key=(null)

In all those cases, log records share the same ID (I'll call it a log record ID from now on) which is a number after a timestamp (I placed it in bold in the previous two examples).

Design

At least in principle, it is easy to parse those logs. We just extract that field (first number after colon). What makes it complicated is that it isn't garantueed (at least I'm not aware) that log records will not be mixed (i.e. first lines 1 of two records, then their second lines). Furthermore, the code for reading log files can be called when partial log lines, or records, are written! Finally, we don't know how many lines in each record there will be!

So, after we get a new log record ID, we have to wait a bit to see if we are going to receive another one. After that time passes, without receiving anything, we pass what we have up to now. Again, there is a choice here! We can reset timeout if we receive something new, or we can count from the first line. Basically, we can have timeout or window.

Finally, no matter how we implement wait time, there is additional quirck in the way reading log files works. Namely, we can not set timeout mechanism that will call us after some time. We are called by logcollector.c module in regular intervals defined in the configuration file (defaulting to 2 s). So, we are going to implement timeout, not in time units, but in a number of those calls. So, if you say window is 1, this means that when we find new log record ID, we'll wait that we are called once more, and then we'll join and send a record. If the interval is defined to be 2s, that means waiting time of 2s. If the window is 2, then we'll send record after the second call, or after 4s. Timeout, on the other hand, will function a bit differently. If we say that timeout is 1, that means that at the moment we find a new log record ID we start timer (initialized to 1). At the next call we first decrement all timers, then if we get that same log record ID again, we'll reinitialize timer. Finally, we send all the records that have expired timers.

Note that we are here introducing runtime per-log reader data (timers, saved logs), which is different than configuration per-log reader data (that comes exclusively from configuration files)! This will be reflected later in the implementation.

Finally, since this is a Linux specific feature, it is completly disabled if the source is compiled on (or for) windows!

Configuration

So, this is how to configure this log reader in the OSSEC's configuration files. To say that some log file is audit type, you'll use the following <log_format> element:

<log_format timeout="T" window="W">linux_auditd</log_format>

Both timeout and window are specified in time units (T and W must be numbers) defined by logcollector.loop_timeout variable (defined in internal_options.conf). You have to specify one of them. It is error to define both, or none!

Implementation notes

For keeping logs until timeout or window expires, I'm using doubly linked list. It is inefficient, but for the time being it will do. More specifically, I'm using OSSEC's list implemenation in shared/list_op.c.
For testing purposes, I also added code that is enabled by defining BUILD_TEST_BINARY. In that case read_linux_audit is compiled which accepts log file that should be read. To emulate how the log grows, binary first opens a new temporary file, then reads a random number of lines from original file, writes it to the temporary file, and calls read_linux_audit function, after which, it pauses. This is repeated until all the input from the original log file is exhausted.
To build test binary, first build evertyhing. Then, go to logcollector directory and run there 'make test'. You'll then have binary read_linux_audit.
The testing was as follows:

Run read_linux_audit on a sample audit.log and redirect output to some temporary file. Count a number of lines in a temporary file, it has to be smaller then the original file.
Using simple shell pipe "cut -f2 -d: audit.log | cut -f1 -d\) | sort | uniq | wc -l" I got how many uniqe lines. There was a difference between this and previous step.
Search for a difference (using diff for example) and analyze why it happened. :D

Conclusion

As I said in the introduction section, I was thinking about implementing a more general reader. Namely, the idea was that you give the reader regular expression, and this regular expression is executed against every line. All the lines that have the same return value are treated as a part of a single record and thus are concatenated. Probably when this reader is finished, I suppose writing that more complex one wouldn't be a problem.

I also fixed few small bugs in the code I sent previously, and, the new patch can be found here.

Friday, August 10, 2012

Adding new log types to OSSEC...

I'm using OSSEC for log monitoring, and while it is a great tool there are some drawbacks. One is that (not so) occasionally Java stack traces are dumped into syslog and they are not properly handled/parsed by OSSEC. The other drawback is that I want OSSEC to monitor mod_security logs, which are multi line logs with variable number of lines belonging to each HTTP request/response processed by mod_security. So, I wanted to modify OSSEC in order to allow it to handle those cases, too. To be able to modify OSSEC I started with the analysis of its source (version 2.6) and based on the analysis, I wrote this text. The goal of the analysis was to get acquainted on how OSSEC handles log files, document this knowledge, and to propose solution that would handle Java stack traces and mod_security logs.

Configuring log files monitoring

First, ehere are some global parameters that control global log monitoring behavior of each OSSEC agent, or server. More specifically, logcollector.loop_timeout in internaloptions.conf defines how much time (in seconds) OSSEC will wait before checking if there are additions to all log files that it monitors. Default value is 2s, with minimal allowed value of 1 second and maximum 120s.
This is actually the most important parameter. There are two additional ones, logcollector.open_attempts with allowed values between 2 and 998, and logcollector.debug with allowed values 0, 1 and 2.

Supported log types in OSSEC

If I correctly understand it, OSSEC treats log files as consisting of records. Each record being independent from the previous, or any later ones. Rules, then, act on records. In the majority of cases the record is identical to a single line of a log file. This is, I suppose, legacy from syslog in which each line was (and still is) separate log entry. In the mean time OSSEC was extended so that it supports more different log files:

multiline log (read_multiline). The problem with this log format is that it expects that each record consists of a fixed number of lines.
snort full log (read_snortfull). This is the closest one to what I need with respect to mod_security, i.e. this one reads several lines, combines them, and sends them to log analyzer.
some others I won't analyze now.

The length of the record is restricted to 6144 characters (constant OS_MAXSTR defined in headers/defs.h). Everything above that length will be cut off and discarded, with an error message logged.

Configuring log files

So, there are different log types and to tell ossec for some file which type it is, you use <logformat> element associated with each monitored file. Each monitored file is defined in element <localfile> directly beneath <ossec_conf> element in the ossec configuration file, for example:

<ossec_config>
   ...
   <localfile>
      <log_format>syslog</log_format>
      <location>path_to_log_file</location>
   </localfile>
   ...
</ossec_config>

The configuration file, along with all of its elements, is read by configuration loader placed in src/config subdirectory. The localfile element is processed in function Read_Localfile (file localfile-config.c). For each <localfile> one logreader structure (defined in localfile-config.h) is allocated and initialized. This structure has the following content:

typedef struct _logreader
{
    unsigned int size;
    int ign;

    #ifdef WIN32
    HANDLE h;
    int fd;
    #else
    ino_t fd;
    #endif
    /* ffile - format file is only used when
     * the file has format string to retrieve
     * the date,
     */
    char *ffile;
    char *file;
    char *logformat;
    char *djb_program_name;
    char *command;
        char *alias;
    void (*read)(int i, int *rc, int drop_it);
    FILE *fp;
}logreader;

Analyzing localfile-config.c file I came to the following conclusions:

Underneath <localfile> element the following elements are accepted/valid: <location&gt, <command>, <log_format>, <frequency>, and <alias>.
<location> defines the file, with full path, that is being monitored.
<log_format> tells OSSEC in which format are records stored. The following are hardcoded values in the Read_Local.c file: syslog, generic, snort-full, snort-fast, apache, iis, squid, nmapq, mysql_log, mssql_log, postgresql_log, djb_multilog, syslog-pipe, command, full_command, multi-line, and eventlog. What's interesting is that some of those (marked in bold) don't appear later in log collector!
the syntax of accepted multi-line log_format is:
<log_format>multiline[ ]*:[ ]*[0-9]*[ ]*</log_format>
note that it is not an error, i.e. you can avoid writing any number, but that number defines how many lines should be concatenated before being sent to analyzer module, so it is important you don't skip it. The number part is made available in logformat field of logreader structure. Furthermore, when check is made to determine which type of logformat is some logff structure, to detemine it is a multiline the first character is checked if it is a digit. If it is, then it is a multiline format. This is actually a hack because in the original design it wasn't planned to have parameters to certain log types!
<command> is used to run some external process, collect its output, and send it to central log. Each line outputed by the command will be treated as a separate log record. Empty lines are removed, i.e. ignored. Note that command is passed to shell using popen library call. So, you can place shell commands there too.
in the code there is a reference to a full_command log type format, but it is not supported in the configuration file. The difference is that this form reads command's output as a single log record.
the value of element <frequency> is stored into logreader.ign field. But the use of this field is strange because it is overwritten with number of failed attempts to open log file. I would assume that this parameter would, somehow, allow rate limiting.
When defining the value of <location> element certain variables can be used. In case of Windows operating systems, that means % variables (e.g. %SYSTEM% and similar). In case of Unix/Linux glob patterns are allowed (these are not allowed on Windows). Also, strftime time format specifiers can be used and in that case strftime function will be called with time format specifier and current local time to produce final log's file name.
<alias> defines an alias for a log file. It is used in read_command.c and read_fullcommand.c files, only. Probably to be substituted instead of command in the output sent to log collector (for readability purposes, instead of the whole complex command line you see just its short version).
The function pointer should point to a function returning nothing (void). But, all the read functions return void * pointer and this return value is forced during assignment of a function to this field. Finally, this return value is never used.
The return code (*rc parameter) of read functions is also used in a strange way and only once it is different than 0.

Reading and processing log files

Actual monitoring of log files is done by logcollector module (contained in the src/logcollector subdirectory). If you look at process list you'll see it under the name ossec-logcollector.
C's main() function of Logcollector module obtains array of logreader structures from config module and calls main function LogCollectorStart(). LogCollectorStart() references logreader structures through logff array pointer.
logreader structures are then initialized.
Functions to read different type of files are placed in separate files prefixed with string read_. Each file has one globally visible entry function that has the following prototype:

void *read_something(int pos, int *rc, int drop_it);

The function return value isn't used, and most (but not all!) functions just return NULL. pos is index into structure that defines which particular file should be checked by the function. Basically, it indexes array logff. So, logff[pos] is the log file that has to be processed. rc is output parameter that contains return code. Finally, drop_it is a flag that tells reader function to drop record (drop_it != 0) or to process it as usual (drop_it == 0).
So, the conclusion is that I should/would create the following log readers:

regex reader that uses regex to define start of the block, so everything between one line that matches given regex is treated as a log record until the first line that matches the regex again. The variation of the theme is to have separate regex for the end of the block.
modsec_audit reader. A separate log reader that would combine the multiline output of modsec into a single line/record understood by ossec. In particular, I'll read only audit log of mod_security, there is also debug log which I'll ignore for the time being.

Plan

So, as I said, the goal was to solve Java and mod_security log problems. While studying the source I decided to implement two log readers. I'll leave Java for now, since I think it is better solved with modifying syslog (concatenating next line if it doesn't start at column 1). So, one reader that will process mod_security's audit log files and another one that will use regex to search for start and end of each record. To do so, obviously those two have to be implemented, but also appropriate configurations has to be defined.
Since I definitely decided to go with attributes instead of hacks (as multiline is) I also decided to convert multiline to use attribute to define number of lines that has to be concatenated.
So, new multiline definition in the configuration file will be:

<log_format lines="N">multiline</log_format>

And the attribute lines is mandatory, it defines how many lines in the log file makes one log record! On the other hand, mod_security for now will use very simple configuration style:

<log_format>modsec_audit</log_format>

Finally, regex based log type will use the following type:

<log_format start_regex="" end_regex="">regex</log_format>

And if end_regex isn't specified, then the value of start_regex will be assumed to be also end_regex. start_regex attribute is mandatory.

Well, basically, that's for the plan.

Implementation

I first implemented code that reads and parses configuration. In order to do that I had to change files src/config/localfile-config.[ch] and src/error_messages/error_messages.h files. Note that I introduced a new field void *data into logreader structure who's purpose is to keep private data of each log reader. In that way I'm not cluttering structure with lots of attributes used only sporadically. Alternatively, I could use union and place everything there, but for the time being this will do.

Then I modified src/logcollector/read_multiline.c to take its parameter from a new place in logreader structure, i.e. from private *data pointer. The small problem with how I did it is that it is dependent on 32/64-bit architecture as I'm directly manipulating with pointers, which is actually one big no-no. :) But, for the prototype it will do.

Next, I copied read_multiline.c into read_modsec_audit.c and modified it to work with mod_security audit logs. Note that mod_security, for each request and response (that are treated as a single record) creates a series of lines and blocks. Block are separated by blank lines, but everything is kept between the following lines:

-- 62a78a12-A--

...

-- 62a78a12-Z--

Blank lines before and after those that mark beginning and an end are ignored. Random looking number (62a78a12) is an identifier that binds multiple blocks together in log files. In my implementation I assume there is no interleaving of those!

Finally, I implemented read_regex.c. This one is the same as the read_modsec_audit.c but instead of fixed delimiters for start and the end or a record, delimiters are provided via configuration file in a form of regular expressions. Note that it would be possible to make read_modsec_audit a special case of read_regex via appropriate use of regular expressions in the configuration file.

When modules were finished I just had to integrate them into logcollector.[ch] (and modify how multiline is detected). And the implementation was finished.

All the changes are provided in new_log_types-v00.patch.

Some conclusions and ideas for a future work

There are few shortcomings of this implementation. First, it is implemented on Linux and only slightly tested even on that platform. So, it is very likely buggy and non-portable across different platforms. Next, there is certainly a non-portable part with respect to 32-bit vs. 64-bit pointers. I marked that part in the code. Finally, security review has to be done, after all, this is security sensitive application!

It seems to me that multiline module doesn't seem to work right, i.e. there are some corner cases when it misbehaves. Namely, there is a check that a single line doesn't exceed maximum line size, but there is no check if more than one line exceeds that threshold. And, probably, if it exceeds, then the reading will get out of sync, i.e. wrong lines will be grouped together.

For regexp module written as a part of this treatise probably additional attributes should be included, like flags so that you can say, e.g., if the match should be case sensitive or not. Or, that you can remember matches from start_regex (using () operator) and reuse them in end_regex regular expression.
For the end, I don't want to criticize too much, but the build system of OSSEC isn't what you would call: flexible. This isn't a problem for production environment, but for development it is since, to test a single change, you have to rebuild all the source. Also, OSSEC assumes you run it from the installation directory, which is owned by a root. Again, this is a problem for a development, more specifically testing. I think there is a lot of room for the improvement.

Friday, December 9, 2011

Evolution & Thunderbird

I can not describe how much Evolution annoys me! I'm using it for years, maybe 10 or so by now, and I can say with quite a bit of a confidence that it's full of bugs, at least on Fedora, and there was no release in past few years that didn't have some quirks that made me go mad! And today, it happened that it didn't want to create a meeting within a Google calendar with a usual unhelpful error message about failed authentication. To make things even more weird, it did show my available calendars on Google, and that requires authentication so it should work! After some searching on the Internet I found workaround that includes removing calendars from Evolution and re-adding them back again. This, by itself, made me closer to look for an alternative and to switch. Anyway, I started to remove all Google calendars from Evolution, but then, removing some didn't work!? The ones that were turned off because I didn't provide password for them. Even restarting Evolution didn't help. What helped in the end was that I changed username and afterward removal was successful!

The reason I'm using Evolution for so many years was that it had integrated calendar with mail client, todo lists and memos. I need at least calendar function along with a mail client. I'm already using Thunderbird but as a secondary mail client for some unimportant mail accounts and I know that it progressed quite nicely, and more importantly, it has Calendar extension. So, I started contemplating about switching mail client NOW! Well, everything OK except that I have huge mail archives stored in Evolution and I have to import them into Thunderbird. It turns out there is no migration wizard and that it has to be done manually. Then, it turned out that Thunderbird uses Mbox format, while Evolution uses Maildir format (it also used Mailbox until a year or so ago).

In essence, Mbox uses one file for all mail messages, while maildir uses one file per message. Maildir has many advantages over Mailbox and thus has become preffered way of storing mail messages on a file system. One reason I want maildir is that I'm doing backups and when one mail message is stored in a mbox (which by itself is huge file containing many messages) backup process will copy the whole file again.

Anyway, judging from the information I found on the Internet this is a long requested feature in Thunderbird. Thunderbird supports pluggable mail stores since version 3.1, but maildir format is not planned before version 11. In the end, I decided to wait a bit more and then to switch to Thunderbird.

Update

I forgot to mention one more bug. When I try to save existing calendar into ical file the evolution simply crashes, and this is 100% repeatable. I discovered this when I decided to clean up all the old calendars from evolution, but first I wanted to save them, just in case.

Sunday, January 28, 2007

OSS support for Croatian language

I was just looking at the Asterisk open source PBX and one of the features of that software is possibility of integration with Festival. Festival is a text to speech synthesis software, freely available on the Internet! And it's quite good piece of software. Using only Festival, or Festival in combination with some other application, like Asterisk, interesting services could emerge.

Now, we come to the point! I searched for possibility to use Croatian language in that application. And guess what, there is no application that supports it. There are quite few application for speech synthesis and none of them, you guessed, has support for Croatian! Actually, there is possibility of adding Croatian to those software using generic support but it's far from usefull.

So, this made me think a bit! What the hell is Croatian Ministry of Sciences and whatever else doing!? Shouldn't at least they care about this aspect of development? Shouldn't they try to invest some money in development of such software? Shouldn't they put out some tender searching for interested parties that would develop such software? Also, the license of that software should be such that afterwards this software could be used in both, open source and commercial applications, e.g. some BSD style license. And not only there is a problem with software for speech synthesis. There's no OCR capable software, syntax and grammar checking are also not well supported, if supported at all, and to talk about voice recognition is to much!

Speaking of syntax checking, thanks to enthusiasts there is some support in open source office applications, but much remains to be done and I believe that investment in that respect would help, but would help to Croatian language – and I believe that's important to the Government and also to the aforementioned Ministry.

Everything about nothing