Tuesday, November 1, 2016

Short Tip: Find files with non-printable ASCII characters

I have a directory full of different files obtained from the Internet, and it turned out that some of their names contain UTF-8 (non-ASCII) characters, because of which indexing didn't work. So, I had to find all files whose names contain such characters. The solution I found was the following:
LC_ALL=C find . -name '*[! -~]*'
This command prints all file names that contain such characters, with the characters themselves shown as question marks. A few facts about this command:
  1. The assignment (LC_ALL=C) switches to the C locale only for the duration of the find(1) command. The effect is that find(1) does not interpret multibyte UTF-8 sequences but processes file names strictly byte by byte.
  2. find(1) then searches for file names that contain at least one byte outside the printable ASCII range. To see this, take a look at the glob pattern. The first and last stars mean that the bracket expression can match anywhere within the file name. The bracket expression itself specifies the class of characters outside (the exclamation mark negates the range) the range from space (ASCII code 32) to tilde (ASCII code 126).
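To make the pattern concrete, here is a minimal sketch that builds a scratch directory and runs the same command on it. The file names and the use of mktemp(1) are made up for illustration; they are not part of the original setup.

```shell
#!/bin/sh
# Scratch directory with hypothetical file names, for illustration only.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt"                      # printable ASCII only, must not match
touch "$tmpdir/$(printf 'na\303\257ve.txt')"   # contains the UTF-8 bytes 0xC3 0xAF
touch "$tmpdir/$(printf 'tab\there.txt')"      # contains a control character (tab)

# Count the names with at least one byte outside the range space..tilde.
# tr strips the padding that some wc implementations add.
matches=$(LC_ALL=C find "$tmpdir" -type f -name '*[! -~]*' | wc -l | tr -d ' ')
echo "matching files: $matches"

rm -rf "$tmpdir"
```

Note that the pattern catches control characters such as tab as well, not only non-ASCII bytes, because both fall outside the space-to-tilde range.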
The output of the find(1) command will include question marks in places where a byte has a value below 32 or above 126. In order to see which Unicode character is in a particular place, you can pipe the output to, e.g., the cat(1) command, like this:
LC_ALL=C find . -name '*[! -~]*' | cat
This works because find(1) substitutes question marks only when it writes directly to a terminal (at least GNU find behaves this way); when its output goes to a pipe, the file names are written unchanged. cat(1) doesn't interpret anything either: it simply copies the bytes to the terminal, which is still set up for UTF-8 (LC_ALL was changed only for find) and therefore renders the multibyte sequences properly.
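If you would rather see the raw byte values than rely on how the terminal renders them, one option is to pipe the names through od(1). This is only a sketch under the same assumptions as before: the scratch directory and the file name are hypothetical.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
touch "$tmpdir/$(printf 'na\303\257ve.txt')"   # hypothetical name; \303\257 is UTF-8 for "ï"

# -exec basename prints only the file name; od -c then shows one byte per
# column, with non-ASCII bytes written as three-digit octal numbers.
dump=$(LC_ALL=C find "$tmpdir" -type f -name '*[! -~]*' -exec basename {} \; | LC_ALL=C od -An -c)
printf '%s\n' "$dump"

rm -rf "$tmpdir"
```

The two octal columns in the middle of the dump (303 257) are the two bytes of the single character "ï", which is exactly what the byte-per-byte matching in the C locale sees.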

A bit about RSS feed readers on Linux

I'm monitoring a lot of sites using RSS, so having a good RSS feed reader is mandatory for me. Once upon a time I used Liferea, but since I have lots of RSS feeds with lots of posts I want to keep around, it turned out that Liferea wasn't designed with scalability in mind. So, I decided to find another one. Web-based readers are out of the question, because I prefer desktop applications. Not to mention that locally I have a lot of disk storage that I don't have to pay for, while I would have to pay for storage in the cloud due to my heavy use of it.

After a search I settled on QuiteRSS. In the process I tried RSSOwl, but I wasn't able to start it due to a different XULRunner version on my Fedora. Besides, it turns out the last version of RSSOwl was released in December 2013 and it isn't maintained any more. QuiteRSS was very good, but a bug in WebKit started to annoy me. So, I started to explore RSS feed readers again. Note that I have the following requirements:
  • No Web application! I want a desktop RSS feed reader with a GUI. It would be nice, though, if I could synchronize it with a reader on a mobile phone!
  • I have a large number of feeds and keep a lot of new (that is unread :D) posts around. So, scalability is of paramount importance.
  • And last, but not least, a nice-looking and usable GUI.
This brought me to three candidates: QuiteRSS, FeedReader and RSS Guard. I'll describe each of them in a bit more detail below. But before that, note that this is a live post, i.e. I'll keep trying all the mentioned readers and update it with new experience. Also, I would like to hear your comments/suggestions, so if you have any, please leave a comment.

QuiteRSS

QuiteRSS is quite good and I'm using it all the time. There is a homepage and a GitHub development page. It has the ability to tag posts, mark them as read, etc.

It is interesting to look at the QuiteRSS GitHub page. From there, the following conclusions can be drawn:
  1. QuiteRSS is quite popular: 33 watchers, 180 stars and 28 forks.
  2. QuiteRSS is basically in maintenance mode, since there has been no substantial activity since 2014. From 2012 to 2014 development was very intensive.
  3. There are 212 open issues and 719 closed ones. I think that is a lot of open issues, but a more thorough analysis would be needed to know for certain.
The problems are the following, from the most important to the least important:
  • You have to disable JavaScript because QuiteRSS often freezes on some feeds while loading. It still freezes with some RSS feeds, and if that happens some history is lost (read feeds, marked/tagged feeds, etc).
  • If you accidentally click on a link to a PDF file, QuiteRSS freezes!
  • Once I mistakenly selected the option "Mark all news read", which is irreversible. There is no confirmation dialog for such cases.
  • Some posts on GitHub are in Russian. That's a problem because not everyone speaks Russian. ;)
  • It depends on Qt4 and Webkit4, which are not maintained any more.

FeedReader

FeedReader is interesting because it has two components, a daemon and a front end. This makes it unlike the other readers, which bundle those functions together into a single binary. You can read more about this reader on its homepage, and there is also a GitHub development page. Judging by the Web page, it has lots of features, but I'm using only a few, if any, at this stage. Take this into account while reading this review. Looking at the GitHub page of FeedReader, the following conclusions can be drawn:
  1. FeedReader is somewhat less popular than QuiteRSS. It has 26 watchers (against 33 for QuiteRSS), 152 stars (against 180) and 6 forks (against 28).
  2. FeedReader is in active development, and almost all the activity is concentrated in 2016, with some additional activity in 2015.
  3. There are 27 open issues and 197 closed ones. This is a better ratio than for QuiteRSS, but again more research has to be done!
The first problem I had was while removing feeds. It was painful because FeedReader doesn't allow selection of multiple feeds or feed groups at once.

The next problem was that only a two-level hierarchy is supported, while in QuiteRSS I have three levels. So, importing an OPML file with more levels results in everything being flattened into two levels.

While I was removing certain feed folders, some of them kept coming back! Maybe the problem was that I right-clicked on a feed and selected delete, while it was necessary to first left-click and then right-click. Who knows...

RSS Guard

RSS Guard, like all the other feed readers mentioned above, has its GitHub development page. As for the homepage, it uses the wiki on GitHub. Again, looking at the GitHub page, the following conclusions can be drawn:
  1. RSS Guard has 6 watchers, 21 stars and 6 forks. This makes it the lowest ranked by popularity of the three RSS readers reviewed here.
  2. RSS Guard has been in development since 2013, with evenly spread development effort. This probably means it isn't going to be finished soon.
  3. It has 11 open issues and 51 closed ones, which isn't that bad.
So, some shortcomings from personal experience. It is a bit non-intuitive. It took me some time to realize that in order to import an OPML file, I first have to create an account. Another non-intuitive task was the import process itself. When you select an OPML file and all the feeds appear, you click OK, but then you have to click Close. The first time, I clicked OK twice and got all the feeds imported twice!

It supports multilevel feed organization, but at first it seemed impossible to fold feed groups, i.e. they were always unfolded! I finally realized that it is possible to fold a folder; you just need to click twice in order to fold/unfold it. But this isn't particularly intuitive, nor visible. Namely, there is no indication that a folder is folded, nor that it can be folded.

When I clicked the "Update all items" button in the toolbar, I expected all feeds to be updated. But for some reason, that didn't happen.

Conclusion

Comparing the development of each of these readers, it turns out that each of them basically depends on a single developer and has its own pros and cons. In the end, I think that despite its shortcomings, QuiteRSS is still the best feed reader, closely followed by FeedReader. If the development activity of FeedReader continues with the same intensity, I expect it will become the best RSS reader among the three.

ChangeLog

  • 20161101 - Initial version

