Showing posts with label shell. Show all posts

Tuesday, November 1, 2016

Short Tip: Find files with non-printable ASCII characters

I have a directory full of files obtained from the Internet, and it turned out that some of them have UTF-8 characters in their names, because of which indexing didn't work. So I had to find all files whose names contain such characters. The solution I found was the following one:
LC_ALL=C find . -name '*[! -~]*'
This command prints all file names that contain non-ASCII characters, with those characters represented as question marks. A few facts about this command:
  1. The assignment (LC_ALL=C) switches to the C locale only for the duration of the find(1) command. The effect is that find(1) does not interpret multibyte UTF-8 characters but treats its input strictly byte by byte.
  2. find(1) then searches for file names that contain at least one character outside the printable ASCII range. To see this, take a look at the glob pattern. The first and last stars mean that the bracket expression can match anywhere within the file name. The bracket expression itself matches a single character outside (the exclamation mark negates the range) the range from space (ASCII code 32) to tilde (ASCII code 126).
When find(1) writes to a terminal, its output contains a question mark in place of every byte with a value below 32 or above 126. To see which Unicode character is in a particular place, you can pipe the output to, e.g., the cat(1) command, like this:
LC_ALL=C find . -name '*[! -~]*' | cat
This works because find(1), at least the GNU version, replaces unprintable bytes with question marks only when it writes directly to a terminal. When its output goes into a pipe, the file names are passed on byte for byte. The LC_ALL assignment applies only to find(1), not to cat(1), but cat doesn't interpret anything anyway: it copies the bytes unchanged, and the terminal, which is still set up for UTF-8, decodes the multibyte sequences and displays the actual characters.
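For use in scripts, the same search can be wrapped in a small function. This is only a minimal sketch: the name find_non_ascii_names is made up for illustration, and the starting directory defaults to the current one. The output of such a function is usually piped or captured rather than written to a terminal, so the names should come out as raw bytes.

```shell
# Hypothetical helper (the name is not from any standard tool):
# print every path under the given directory (default: .) whose file
# name contains at least one byte outside the printable ASCII range.
find_non_ascii_names() {
    # LC_ALL=C applies only to this find invocation and makes the
    # bracket expression match single bytes instead of UTF-8 characters.
    LC_ALL=C find "${1:-.}" -name '*[! -~]*' -print
}

# Usage: find_non_ascii_names ~/Downloads | cat
```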

Saturday, February 28, 2015

Short Tip: Renaming log files to include date...

I had a bunch of log files named logfilename.N.gz, but I wanted to rename them to logfilename.YYYYMMDD.gz, where YYYYMMDD is the date when the file was last modified. I did it using the following for loop:
for i in logfilename.*.gz
do
    mv -i -- "$i" "logfilename.$(date -r "$i" +%Y%m%d).gz"
done
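The -r option tells the date(1) command to use the last modification time (mtime) of the file given as the option's argument instead of the current time. This is how GNU date behaves; other implementations may interpret -r differently. Note that it is also possible to use the stat(1) command instead of date(1).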
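A stat(1) based variant could look like the following sketch. It assumes GNU coreutils, where stat -c %y prints the modification time starting with YYYY-MM-DD. The function names mtime_stamp and rename_logs_by_mtime are invented for this example.

```shell
# Assumption: GNU stat, where `-c %y` prints the mtime in the form
# "2015-02-28 13:37:00.000000000 +0100".
mtime_stamp() {
    # Keep the first 10 characters (YYYY-MM-DD) and drop the dashes.
    stat -c %y -- "$1" | cut -c1-10 | tr -d '-'
}

# Rename every logfilename.N.gz in the current directory.
rename_logs_by_mtime() {
    for f in logfilename.*.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv -i -- "$f" "logfilename.$(mtime_stamp "$f").gz"
    done
}
```

Files that were last modified on the same day map to the same target name; mv -i asks before overwriting in that case.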

Friday, June 27, 2014

Detecting which directory is changing...

Suppose that you have a directory with a lot of subdirectories. One of those subdirectories is changing in size, while all the others stay the same size. The question is, how do you detect which subdirectory it is?

This happened to me while I was downloading mail archives from IETF. The lftp client that I'm using shows only the file it is currently downloading, not the directory it is in, i.e. the output looks something like this:
lftp ftp.ietf.org:/> mirror ietf-mail-archive
`2010-04.mail' at 518040 (50%) 120.1K/s eta:4s [Receiving data]
                             
Searching for the file by name won't work, because a file with this particular name exists in almost every directory.

The solution I used was the following shell command:
$ ( du -sk *; sleep 5; du -sk * ) | sort | uniq -u
36204 mmusic
36848 mmusic
This command has to be executed inside the ietf-mail-archive directory. It works as follows:
  1. The first 'du -sk *' command lists the sizes of all directories.
  2. Then it sleeps for five seconds (sleep 5), giving the directory that is currently changing time to change its size.
  3. The second 'du -sk *' command lists all the directory sizes again.
  4. The parentheses around all three commands make them execute within a single subshell, so that the pipe receives the output of both du commands.
  5. Then we sort the output. The two identical lines of each directory that didn't change end up next to each other, while the two lines of the directory that changed differ from each other.
  6. Finally, uniq -u prints only the lines that are not repeated, so only the directory that changed reaches the output, once with its old size and once with its new size.
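The steps above can be wrapped in a function with an adjustable delay. This is only a sketch: the name changed_dirs and the optional delay argument are assumptions made for illustration, and the pipeline itself is the one shown earlier.

```shell
# Hypothetical helper: report entries of the current directory whose
# disk usage changes within the given number of seconds (default 5).
changed_dirs() {
    (
        delay=${1:-5}
        du -sk -- *          # first snapshot
        sleep "$delay"       # give the active directory time to grow
        du -sk -- *          # second snapshot
    ) | sort | uniq -u       # keep only lines that occur exactly once
}

# Usage: cd ietf-mail-archive && changed_dirs 10
```

A longer delay helps when the download is slow and the size in kilobytes might not change within five seconds.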

Thursday, March 8, 2012

Using ffmpeg tool to recode audio files...

I'm using the ffmpeg library in a simulation project to support VoIP traffic. What bothered me was that for certain input files in mp3 format I kept getting an error message about a format error, after which the simulator would crash with a segmentation fault. The output codec is g726. So, to determine whether this is a bug in my code or something is really wrong with the mp3 file, I decided to use the ffmpeg command line tool.

It turned out that you have to experiment quite a bit with this command to achieve what you want, which in my case was to recode an mp3 file into a wav file encoded with g726.

First, I tried with the following command:
ffmpeg -i test.mp3 test.g726
That command specifies that ffmpeg should use test.mp3 as the input file and the output should be stored in test.g726 file. But, it stopped with the following error:
[NULL @ 0x2014c60] Unable to find a suitable output format for 'test.g726'
The problem in this case is that ffmpeg tries to deduce which codec to use based on the extension of the output file. In my case this extension didn't mean anything; it was only an indicator to me of which codec is used. So I used the -acodec option to force the use of g726:
ffmpeg -i test.mp3 -acodec g726 test.g726
Still no luck, the following error message was reported:
[NULL @ 0x21aec60] Unable to find a suitable output format for 'test.g726'
But that gave me a clue. The problem is that the output file format isn't recognized! I wanted to place the g726 output into a WAV file, so either changing the extension from g726 to wav or using the -f option will do:
ffmpeg -i test.mp3 -acodec g726 test.wav
Now I got another error message. It's good, it means I'm progressing. The error message is:
[g726 @ 0x1dad1e0] Bitrate - Samplerate combination is invalid
This time the problem is that the mp3 input file has a 44 kHz sampling rate, which is not supported by g726. So, using the -ar option, I specified a sampling rate supported by g726, i.e. 8 kHz:
ffmpeg -i test.mp3 -acodec g726 -ar 8k test.wav
Then, a new message appeared:
[g726 @ 0x1f511e0] Only mono is supported
OK, this is easy too: the MP3 file is in stereo, while g726 supports only mono. So I have to reduce the output to a single channel. Digging a bit through the manual gave the answer: I should use the -ac option. By default, the number of output channels is the same as the number of input channels. The -ac option changes it, in this case to 1, i.e. to mono output:
ffmpeg -i test.mp3 -acodec g726 -ar 8k -ac 1 test.wav
This solved the previous error, but now I got a new one:
[g726 @ 0xb6b1e0] Unsupported number of bits 8
Well, this is a confusing error message. It's obvious that ffmpeg wants to use 8 bits per sample, but the question is where. Also, when you look at the diagnostic information, you'll notice that everything is 16 bits! But then I realized that it uses the sampling frequency and the default bit rate to calculate the bits per sample, which turns out to be 8 bits. So increasing the sampling frequency solves the problem:
ffmpeg -i test.mp3 -acodec g726 -ar 16k -ac 1 test.wav
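The arithmetic behind this error can be reproduced in the shell. G.726 itself defines 2 to 5 bits per sample (16, 24, 32 or 40 kbit/s at 8 kHz). The 64 kbit/s default audio bit rate used below is an assumption inferred from the error message (8 bits at 8 kHz), not something stated in the ffmpeg output, and the function name is made up for this sketch.

```shell
# bits per sample = bit rate (bit/s) / sampling rate (Hz)
g726_bits_per_sample() {
    echo $(( $1 / $2 ))
}

# Assumed default bit rate of 64000 bit/s (inferred, not confirmed).
g726_bits_per_sample 64000 8000    # prints 8, outside the 2..5 range
g726_bits_per_sample 64000 16000   # prints 4, inside the 2..5 range
```

By the same arithmetic, keeping 8 kHz and lowering the bit rate to 32 kbit/s would also give 4 bits per sample.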
Finally, I came back to where I started: checking whether there is an error in the input file. It turns out there is not, or it is suppressed by ffmpeg. So the problem is obviously in my code.

About Me

scientist, consultant, security specialist, networking guy, system administrator, philosopher ;)
