Sunday, January 28, 2007

OSS support for Croatian language

I was just looking at the Asterisk open source PBX, and one of the features of that software is the possibility of integration with Festival. Festival is text-to-speech synthesis software, freely available on the Internet! And it's quite a good piece of software. Using only Festival, or Festival in combination with some other application like Asterisk, interesting services could emerge.

Now we come to the point! I searched for a way to use the Croatian language in that application. And guess what, there is no application that supports it. There are quite a few applications for speech synthesis, and none of them, you guessed it, has support for Croatian! Actually, it is possible to add Croatian to such software using generic support, but the result is far from useful.

So, this made me think a bit! What the hell is the Croatian Ministry of Science and whatever else doing!? Shouldn't they, at least, care about this aspect of development? Shouldn't they try to invest some money in the development of such software? Shouldn't they put out a tender looking for interested parties that would develop it? Also, the license of that software should allow it to be used afterwards in both open source and commercial applications, e.g. some BSD-style license. And the problem is not limited to speech synthesis. There's no OCR-capable software, syntax and grammar checking are not well supported either, if supported at all, and to even talk about voice recognition is too much!

Speaking of syntax checking, thanks to enthusiasts there is some support in open source office applications, but much remains to be done. I believe that investment in that respect would help, and above all it would help the Croatian language, which I believe is important to the Government and also to the aforementioned Ministry.

3 comments:

Anonymous said...

Hello,

I totally agree with you about speech synthesis and the Croatian language, the Ministry of Science :), etc. I've done some research on it, and it turns out that some organizations (like the association for the blind) use the open source Czech speech synthesis for Croatian. It doesn't sound so bad, but rather funny... imagine a Czech speaking Croatian :) The good thing about this engine is that it's rule driven, so a configuration for Croatian could be made with less effort than with some other engines. As a postgraduate student at FER I pointed this out to my mentor and (again the Balkan syndrome of pulling you back into the swamp) there was no interest.

Tom

Stjepan Groš (sgros) said...

Wow, first comment... Though I'm almost a month late with my reaction...

Anyway, I personally disagree with the term "Balkan". The point is that the current situation is a consequence of the old system and the war. And it will take time until things are set straight, i.e. until capable people take the appropriate positions.

As for the speech synthesis system and all the related stuff: I tried to contact the Ministry via some professors at the Faculty, but there has been no response. Probably there never will be...

Now, I have an explanation for why there is no interest, as you mention: it takes a lot of time and energy to get things moving. So we come to a choice between tilting at windmills and trying to do something more achievable.

Anonymous said...

This is great info to know.
