Everything about nothing: scripts


Thursday, August 22, 2019

List directory sorted by length of names in it

So, for whatever reason, I wanted ls to list a directory sorted by the length of the names in it, rather than by any of the orderings ls itself supports. After a bit of trial and error, I ended up with the following pipeline:

for i in *; do echo "${#i} $i"; done | sort -n | cut -f2- -d" " | xargs -d \\n ls -Uld

Let's break this command into pieces and describe what each one does.

The first compound command, starting with for and ending at the first pipe character, outputs the length of each name, followed by a space and then the name itself. If you run it in some directory, you'll get output similar to this:

1 a
4 name
7 testing
2 ab

This gives us something to sort on (the number, i.e. the length), while keeping the name, since we need it later.

The next command in the pipeline, sort -n, sorts this output numerically so that the shortest name comes first, followed by longer ones, ending with the longest name, i.e. we get:

1 a
2 ab
4 name
7 testing

Now that the names are sorted, we no longer need the length, so the next command in the pipeline, cut -f2- -d" ", removes it by keeping everything from the second space-delimited field onward. The output after cut looks like this:

a
ab
name
testing

If there are no spaces in the names, it's easy: just hand this list over to the ls command as arguments. The command then looks like this:

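ls -Uld $(for i in *; do echo "${#i} $i"; done | sort -n | cut -f2- -d" ")

Note the $( before for and the matching ) at the end of the command line: the output of the whole pipeline becomes the argument list of ls. (Backticks nested inside backticks don't work unless the inner ones are escaped, which is one more reason to prefer $(...) here.) The options U, l and d cause ls not to sort anything (U), to produce long output (l) and not to list the contents of directories (d).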

But if there are spaces in the names, this fails horribly, as many other things do when they encounter spaces in names. So the trick used in this case was to employ the xargs command, which collects standard input and runs a command with a certain number of arguments collected from it. The xargs command is:

xargs -d \\n ls -Uld

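With the -d option (a GNU xargs extension) we tell xargs that the arguments are delimited by newlines instead of blanks, which is the default. The rest of the line, ls -Uld, is taken as-is: xargs appends the arguments read from standard input and runs the command. The doubled backslash is needed because the shell removes one level of backslashes, so xargs actually receives \n.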

And that's it!

By the way, I also tried, unsuccessfully, to collect the names into an array by reading them with a while loop (and the read command). The problem is that a while loop at the end of a pipeline runs in a subshell, so any variable set inside it is lost once the loop finishes, and I didn't manage to pass the array out of the loop.
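For the record, the usual way around this in bash is to feed the loop through process substitution (< <(...)) instead of a pipe, so that the loop itself runs in the current shell. A minimal sketch, assuming bash; sorted_by_length is a hypothetical helper name, and it computes the length with ${#name}, which, unlike echo | wc -c, doesn't count the trailing newline.

```shell
#!/usr/bin/env bash
# sorted_by_length: hypothetical helper; prints the names in the current
# directory, one per line, shortest first.
sorted_by_length() {
    local name
    for name in *; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$name" ] || [ -L "$name" ] || continue
        # ${#name} is the length of the name in characters.
        printf '%s %s\n' "${#name}" "$name"
    done | sort -n | cut -d' ' -f2-
}

# Process substitution runs only sorted_by_length in a subshell; the while
# loop stays in the current shell, so the array survives after the loop.
names=()
while IFS= read -r name; do
    names+=("$name")
done < <(sorted_by_length)

# One ls call with every name as a separate argument, spaces and all;
# "--" stops ls from treating names that begin with "-" as options.
if [ "${#names[@]}" -gt 0 ]; then
    ls -Uld -- "${names[@]}"
fi
```

Names containing newlines would still be split, as with the xargs version.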

Wednesday, August 30, 2017

Difference between command substitution and 'while read' in bash

I just changed one of my scripts that, in principle, looked like this:

for i in `find . -type d`
do
# do some processing on the found directory
done

The new format I use is:

find . -type d | while IFS= read -r i
do
# do some processing on the found directory
done

While both versions will work in general, the second variant is better for the following reasons:

It's faster, or at least it starts producing results sooner. In the first case, find has to finish before the processing of directories starts. This isn't noticeable for small directory hierarchies, but it becomes very noticeable for large ones. In the second case, find outputs results while the while loop picks them up and processes them in parallel.
If directory names contain embedded spaces, the second version works, while the first one doesn't: the shell splits the output of the command substitution on whitespace, so a name like "my dir" arrives as two separate words.
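The while-read version can be pushed one step further. A minimal sketch, assuming bash (for read -d '') and a find(1) that supports -print0 (GNU and BSD find both do); process_dirs is a hypothetical name and the printf stands in for the real processing. Separating names with NUL bytes instead of newlines also copes with names that contain newlines, which no line-based loop can handle.

```shell
#!/usr/bin/env bash
# process_dirs: hypothetical example; walks all directories under $1
# (default: .) and handles each one as soon as find reports it.
process_dirs() {
    local top=${1:-.} dir
    # -print0 ends each name with a NUL byte; read -d '' splits on NUL,
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal.
    find "$top" -type d -print0 |
    while IFS= read -r -d '' dir; do
        # do some processing on the found directory
        printf 'found: %s\n' "$dir"
    done
}

# Usage: process_dirs /some/path
```

As with any pipeline, the loop runs in a subshell, so variables set inside it are not visible after it ends.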

Maybe there are some other advantages (or disadvantages) of the second version, but none I can remember at the moment. If you know any, please write it in the comments!

Friday, January 9, 2015

Getting free disk space in Linux

While working on a script that keeps full Zimbra backups for as many days back as possible, I wanted to remove old backups automatically, based on the amount of free space. The idea was to remove backup directories one by one until free space reached some threshold. Finding out the free space on a disk is easy with the df(1) command. It looks like this:

$ df -k /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda1       56267084 39311864  16938836  70% /

The problem is that some post-processing is necessary to obtain the desired value, i.e. the 4th column (Available). The cut(1) command is a bit problematic here, because in general you can't expect the output to be this nicely formatted, nor to have a fixed layout. For example, the columns are automatically resized based on the width of the widest device name in the first column. That in turn means the number of whitespace characters varies, and you are forced to use something other than cut(1). Probably the most appropriate tool is awk(1), since it properly parses fields separated by a variable number of whitespace characters. In addition, you need to get rid of the first line. That can be done with head(1)/tail(1), but it is more efficient to let awk(1) do it as well. So you end up with the following construct:

$ df -k / | awk 'NR==2 {print $4}'
16938836
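A slightly more robust variant of the same construct, as a sketch; free_kb is a hypothetical function name. The -P option asks df(1) for the POSIX output format, which prints each file system on exactly one line (some df versions, older GNU coreutils among them, otherwise put a long device name on a line by itself, which pushes the value off line 2), and -k fixes the unit at 1024 bytes.

```shell
#!/bin/sh
# free_kb: hypothetical helper; prints the space available (in KiB) on the
# file system that holds the given path (default: /).
free_kb() {
    # -P: one line per file system; -k: 1024-byte units.
    # Line 1 is the header, field 4 of line 2 is "Available".
    df -Pk -- "${1:-/}" | awk 'NR == 2 { print $4 }'
}

free_kb /
```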

But for some reason I wasn't satisfied with this solution, because I felt I was using overly complex tools for something that should be simpler. So I started looking for another way to obtain the free space on a partition. It turned out that the stat(1) command can do it, although it's rarely used for that purpose: it is normally used to get information about files or directories, not file systems. Yet there is an option, -f, that tells stat(1) to query the file system instead, and there is also a --format option, which accepts format sequences similar in style to those of date(1). So, to get the free space on the root file system, you can use it as follows:

$ stat -f --format "%f" /
4238805

Without the --format option, stat(1) prints everything it can find out about the file system:

$ stat -f /
File: "/"
ID: b8a4e1f0a2aefb22 Namelen: 255 Type: ext2/ext3
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 14066771 Free: 4238805 Available: 4234709
Inodes: Total: 3588096 Free: 2151591

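This makes it somewhat analogous to the df(1) command. But the values are in blocks instead of kilobytes! You can get the block size with the %S format sequence, but stat(1) won't do the multiplication for you, so some additional trickery is needed. Note also that %f counts all free blocks, including those reserved for root, while %a counts only the blocks available to ordinary users, which is what df(1) reports as Available (4234709 blocks of 4 KiB are exactly the 16938836 KiB shown earlier). One solution is to print an arithmetic expression and evaluate it with the bc(1) command, like this: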

$ stat -f --format "%f * %S" / | bc
17362145280

Alternatively, it is also possible to use shell's arithmetic evaluation like this:

$ echo $((`stat -f --format "%f * %S" /`))
17362145280

But in both cases we start two processes: in the first case stat(1) and bc(1), and in the second case a new subshell (for the backticks) and stat(1). Note that this is the same number as in the solution with awk(1). But with awk(1) we start two more complex tools, one of which, df(1), is aimed more at displaying values to a user than at being used in scripts. One advantage of the awk(1) method might be portability: I think the df(1)/awk(1) combination is more common than the stat(1)/bc(1) combination. In particular, -f and --format as used here are GNU coreutils options; on BSD systems, stat -f takes a format string instead.

Anyway, the performance difference probably isn't big, but there is clearly another way to do it, and it was interesting to explore it.
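To tie this back to the original goal, removing old backups until enough space is free, here is a minimal sketch built on stat(1) and shell arithmetic. It is not the author's script: avail_bytes and prune_backups are hypothetical names, and it assumes GNU stat, that every backup is a subdirectory of one root directory, and that directory names sort chronologically (e.g. 2015-01-09). It uses %a rather than %f, so the number matches what df(1) calls Available; a script running as root could argue for %f.

```shell
#!/usr/bin/env bash
# avail_bytes PATH: bytes available to non-root users on the file system
# holding PATH. stat prints e.g. "4234709 * 4096" (%a = available blocks,
# %S = fundamental block size) and $(( )) evaluates it.
avail_bytes() {
    echo $(( $(stat -f --format '%a * %S' -- "$1") ))
}

# prune_backups ROOT MIN_BYTES: delete the oldest subdirectories of ROOT
# until at least MIN_BYTES are available. Never deletes the newest backup;
# returns 1 if it cannot free enough space.
prune_backups() {
    local root=$1 min=$2
    local -a dirs
    while (( $(avail_bytes "$root") < min )); do
        dirs=("$root"/*/)                   # glob results are sorted: oldest first
        [ -d "${dirs[0]}" ] || return 1     # no backups at all
        (( ${#dirs[@]} > 1 )) || return 1   # keep the last remaining backup
        echo "removing ${dirs[0]}" >&2
        rm -rf -- "${dirs[0]}"
    done
}
```

For example, prune_backups /backup/zimbra $((50 * 1024 * 1024 * 1024)) would delete old backups until 50 GiB are available (/backup/zimbra is only an example path).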

Friday, June 27, 2014

Detecting which directory is changing...

Suppose you have a directory with a lot of subdirectories. One of those subdirectories is changing in size, while all the others stay the same size. The question is: how do you detect which subdirectory it is?

This happened to me while I was downloading mail archives from the IETF. The lftp client, which I use, shows only the file it is currently downloading, not the directory that file is in, i.e. the output looks something like this:

lftp ftp.ietf.org:/> mirror ietf-mail-archive
`2010-04.mail' at 518040 (50%) 120.1K/s eta:4s [Receiving data]

Searching for the given file won't work, because a file with this name exists in almost every directory.

The solution I used was the following shell command:

$ ( du -sk *; sleep 5; du -sk * ) | sort | uniq -u
36204 mmusic
36848 mmusic

This command has to be executed inside ietf-mail-archive directory. It works as follows:

The first du -sk * command lists the sizes of all directories.
Then it sleeps for five seconds (sleep 5), giving the directory that is currently changing time to change its size.
The second du -sk * command lists all the directory sizes again.
The parentheses around the three commands make them run within a single subshell, so the pipe receives the output of both du commands.
Then we sort the output. Directories that didn't change produce two identical lines, which end up one right after the other, while the two lines for the changing directory differ.
Finally, uniq -u prints only the lines that are not repeated, so only the directory that changed ends up in the output.
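The same idea can be packaged as a small function. This is a sketch rather than the exact command used above: changing_dirs is a hypothetical name, the interval is a parameter instead of a fixed 5 seconds, and */ limits the arguments to directories (du then prints the names with a trailing slash).

```shell
#!/usr/bin/env bash
# changing_dirs [INTERVAL]: hypothetical helper wrapping du/sleep/du/uniq.
# Run it inside the parent directory. It prints the "size<TAB>name" lines
# that differ between two du runs taken INTERVAL seconds apart (default 5);
# a changing directory shows up twice, with its old and its new size.
changing_dirs() {
    local interval=${1:-5}
    # "*/" matches only directories; "--" protects names starting with "-".
    ( du -sk -- */; sleep "$interval"; du -sk -- */ ) | sort | uniq -u
}

# Example: cd ietf-mail-archive && changing_dirs 10
```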

Thursday, July 26, 2012

Searching for packet capturing and interface manipulation library for Python...

I needed a script that would monitor network traffic and capture and process only DHCP traffic. It turned out I couldn't find such a script, so I decided to write one (more about that script in another post). For the language, I decided to use Python. That was the easy part. Next, I had to decide which libraries to use to capture network traffic, decode DHCP requests and responses, and manipulate IP addresses on interfaces.

I started with network traffic capturing. The pcap library is the standard library for network capture, so it was natural for me to look for a Python interface to it. I found several such interfaces: pcap, pylibpcap, pypcap, and pcapy. There is also an interface specifically for Python 3, py3kcap. While searching for a pcap interface, three other Python libraries popped up: libdnet (which also has an older project page), dpkt and scapy.

But not all libraries are equal, nor do they serve the same purpose. libdnet allows sending packets and manipulating the kernel's routing tables, firewall and ARP cache. So, besides Ethernet and IP, it doesn't offer much in terms of supported protocols. dpkt, on the other hand, is made for exactly this purpose: it supports easy creation and parsing of different TCP/IP protocols. Finally, Scapy is a Swiss army knife of network manipulation. It offers a shell in which one can manipulate packets, but it can also be used from other scripts. Unfortunately, while browsing the source of Scapy I realized that it uses the os.popen interface to call external programs. That alone was enough for me to eliminate Scapy from further consideration.

The next elimination criterion was the availability of packages for CentOS and Fedora. I try to stick to prepackaged software as much as possible, and a quick search (yum search) showed that both CentOS 6 and Fedora 17 have packages for pcapy and dpkt (named python-dpkt). For some reason, libdnet is packaged, but its Python interface isn't. I found a Bugzilla entry about it, but without any answer!

So I settled on pcapy and dpkt. The only missing piece of the puzzle now was how to manipulate interface addresses. I came across netifaces, which lets you obtain information about interfaces, and also a post about doing this on Windows. But everything I found was about obtaining IP addresses, not changing them. In the end, I gave up and decided to try libdnet, even though I'll have to compile it from source. Either that, or I'll use raw sockets and ioctls, which are accessible from Python through the standard library.

Finally, as a curiosity, I'll mention that there is a Python interface to iptables, python-iptables, which is also packaged for Fedora.

Sunday, February 5, 2012

Error: cannot open tty-output

I wrote a script that lets a user select the test or production environment when connecting to a server and then, based on the selection, configures environment variables appropriately. To make the script a bit more user friendly, I used the dialog utility. All was well until a user logged in and then switched to another, non-root(!), user with the su command, at which point the following error was reported:

cannot open tty-output

Well, a quick run of strace revealed the problem. The initial user owns the pseudo-terminal (e.g. /dev/pts/3), and when he or she switches to another user the ownership doesn't change, so opening the terminal device fails. What made dialog open the pseudo-terminal in the first place was the --stdout switch; without it, there is no error about tty-output. But then there is another problem, which is the reason this option was used in the first place. I used dialog like this:

ANS=`dialog ... --stdout`

to capture the output of the command in a variable and test it in an if statement. The problem was that without the --stdout option there was nothing to capture: the answer went to stderr, so the invoking shell couldn't place it in the variable ANS. That is, dialog uses stdout to draw on the terminal and stderr to output the user's response!

One solution would have been a temporary file, but I was reluctant to use one. My first idea was to use a pipe and the read statement, something like this:

dialog ... | read ANS

and in that way avoid using --stdout. But the problem is that the components of a pipeline (the commands on the left and right of the pipe character) are executed in subshells, so the variable set by read isn't visible to the parent shell. In other words, this is a dead end. Besides, read would still receive dialog's stdout (the screen drawing) rather than the answer, which goes to stderr!

In the end, I modified dialog invocation in the following way:

ANS=`dialog ... 2>&1 > /dev/tty`

Here is what happens: the parent shell starts a subshell to execute everything between the backticks, but first it redirects the subshell's stdout to a pipe, so that it can later place the output in ANS. In the subshell, 2>&1 makes stderr a duplicate of stdout, which at that point isn't the terminal but the pipe to the parent shell. In this way dialog's answer goes to the parent shell, i.e. the one that started the subshell to execute the backtick command. Then > /dev/tty redirects stdout to /dev/tty, which refers to the process's controlling terminal and is always possible to open, read and write, whoever owns the underlying /dev/pts device; that allows dialog to draw on the terminal.
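The order of the two redirections is what makes this work, since the shell processes redirections from left to right. A small demonstration, assuming bash; fake_dialog is a hypothetical stand-in for dialog(1) that, like dialog, draws on stdout and prints the answer on stderr, so the example runs without dialog installed and even without a terminal (the "screen" then goes to /dev/null).

```shell
#!/usr/bin/env bash
# fake_dialog: hypothetical stand-in for dialog(1).
fake_dialog() {
    echo "[menu drawn here]"    # the "screen" goes to stdout
    echo "production" >&2       # the user's answer goes to stderr
}

# Draw on the terminal if we have one, otherwise discard the "screen".
if { : >/dev/tty; } 2>/dev/null; then ui=/dev/tty; else ui=/dev/null; fi

# Left to right: 2>&1 points stderr at the current stdout (the pipe read
# by $( )), then >"$ui" moves only stdout to the terminal.
ANS=$(fake_dialog 2>&1 >"$ui")
echo "captured: $ANS"           # captured: production

# Reversed order: stdout moves first, stderr follows it, nothing is captured.
WRONG=$(fake_dialog >"$ui" 2>&1)
echo "captured: '$WRONG'"       # captured: ''
```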

Tuesday, January 31, 2012

arpwatch on multiple interfaces

I regularly run arpwatch on all the servers I install, in order to track MAC address changes and notice potential MAC spoofing. But the problem is that on CentOS 6.2 the startup script shipped with arpwatch (package arpwatch-2.1a15-14.el6.x86_64) doesn't support multiple interfaces. More specifically, I can tell arpwatch which interface to listen on by modifying the OPTIONS variable in the /etc/sysconfig/arpwatch file and inserting the -i <interface> option, but I'm still restricted to a single interface. It is possible to specify multiple -i options, but arpwatch still listens on only one interface: I checked the source (version 2.1a15), and the last -i option takes effect while the previous ones are ignored.

So I modified the startup script so that it accepts an INTERFACES variable in the /etc/sysconfig/arpwatch configuration file and starts arpwatch on each specified interface. If this variable isn't defined, the script behaves as before. For example, to start arpwatch on interfaces eth0 and eth1, add the following line to /etc/sysconfig/arpwatch:

INTERFACES="eth0 eth1"

The basic idea behind this change is to start arpwatch multiple times, once per specified interface. Also, I give each instance a different database (arp.dat) so that multiple instances don't overwrite each other's data.
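The modified script itself isn't reproduced here, but the core idea can be sketched. This is a minimal sketch under stated assumptions: arpwatch_commands is a hypothetical helper that only prints the command lines instead of starting daemons, DATADIR and the arp-<interface>.dat naming are made up for illustration (the real data directory depends on the package), and only arpwatch's -i (interface) and -f (data file) options are used.

```shell
#!/usr/bin/env bash
# arpwatch_commands: hypothetical helper that PRINTS the arpwatch command
# lines an init script would run, one instance per interface in $INTERFACES.
# DATADIR (default /var/arpwatch, an assumption) holds one data file per
# interface so the instances don't overwrite each other's data.
arpwatch_commands() {
    local datadir=${DATADIR:-/var/arpwatch} iface
    if [ -z "${INTERFACES:-}" ]; then
        # INTERFACES not set: old behaviour, a single instance using OPTIONS.
        echo arpwatch ${OPTIONS:-}
        return 0
    fi
    # $INTERFACES and $OPTIONS are deliberately unquoted so they split into
    # words ("eth0 eth1" -> eth0, eth1).
    for iface in $INTERFACES; do
        # -i comes after $OPTIONS, so it wins even if OPTIONS has its own -i
        # (arpwatch honours only the last -i).
        echo arpwatch ${OPTIONS:-} -i "$iface" -f "$datadir/arp-$iface.dat"
    done
}
```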

Note that the script is a bit rough around the edges: it behaves properly during startup, but not on shutdown. Also, the path to the data files is hard-coded. I'll improve this script in due course, when I find more time or when it turns out to be necessary. :)

[20120203] Update: There was an error in the script because of which the database files were placed in the wrong directory and, as a consequence, arpwatch couldn't write its database when exiting. The script is now updated and works; furthermore, I tested stopping arpwatch using the script, and that works too.

Thursday, January 26, 2012

How to detect your script is started using su...

I wrote a script that had a problem when started via the su command. It is a script in /etc/profile.d, so it runs whenever a new login shell starts. I'll write about that problem in another post; here I'll concentrate on how to detect the su command.

But before continuing, let me clarify that the title is a bit of a misnomer. The goal is to detect whether the current environment is the result of switching the user ID after login, but since this is almost always done with the su command, I think the title is justified. There is also one more "problem": every process currently running, whatever its user ID, ultimately descends from a process running as user ID 0. But let's not go that far into philosophy. :)

I started by thinking (hoping) that the id command could identify the originating, i.e. real, user. But that is not possible: su sets both the real and the effective user ID to the new user, and the two differ only while a setuid program is running. So another approach has to be used. There are three possibilities, each with its own advantages and shortcomings.
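One heuristic follows from the strace finding in the "cannot open tty-output" post above: su does not change the owner of the pseudo-terminal, so if the terminal belongs to someone other than the current user, the user ID was most likely switched after login. A minimal sketch, assuming bash and GNU stat (-c %U); started_via_su is a hypothetical name, and sudo -s, or a terminal multiplexer started by another user, would produce the same signal.

```shell
#!/usr/bin/env bash
# started_via_su [TTY]: hypothetical heuristic.
# Exit status 0: TTY (default: our terminal) is owned by a different user
#                than the current one, i.e. probably su/sudo after login.
#             1: TTY is owned by the current user.
#             2: no terminal, or its owner can't be determined.
started_via_su() {
    local dev=${1:-} owner
    if [ -z "$dev" ]; then
        dev=$(tty) || return 2          # tty fails when stdin isn't a terminal
    fi
    owner=$(stat -c %U -- "$dev" 2>/dev/null) || return 2
    [ "$owner" != "$(id -un)" ]
}

# Usage in a profile.d script:
# if started_via_su; then echo "user switched after login"; fi
```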
