Everything about nothing: socket


Sunday, March 23, 2014

Two or more applications listening on the same (IP, port) socket

It might happen that you need two applications to listen on the same (IP, port) UDP socket, with the idea that the applications know how to tell apart the packets intended for each of them. If this is the case, you'll have to do something special because the kernel doesn't allow two or more applications to bind in such a way. As a side note, starting with kernel 3.9 there is a socket option, SO_REUSEPORT, that allows multiple applications to bind to a single port (under certain preconditions), but it doesn't work the way I described here.

One solution is to have some kind of demultiplexer application, i.e. one that binds to the given port, receives packets and then forwards each of them to the appropriate application. This will work, but it wasn't appropriate for my situation. So, the solution is that one application binds to the given port, and the other uses a PF_PACKET socket with an appropriate filter so that it also receives the UDP packets of interest. I hope you realize that this works only for UDP, and not for TCP or other connection-oriented protocols!

So, what you have to do is:

Open an appropriate socket with the socket() system call.
Bind it to an interface using the bind() system call.
Attach a filter to the socket using setsockopt().
Receive packets.

If you want an example of how this is done, take a look into busybox, more specifically its udhcpc client.

Now, there are two problems with this approach that you need to be aware of. The first is that if you send via this socket, you bypass the routing code in the kernel! In other words, your packets might go out in the wrong direction. How this can be solved, and whether it needs solving at all, depends on the specific scenario you are trying to achieve.

The second problem is that if there is no application listening on the given port, the kernel will send an ICMP port unreachable error message for each received UDP packet. I found a lot of questions on the Internet about this issue, but without any real answer. So, I took a look at where this error message is generated, and whether there is anything that might prevent it from being sent.

UDP packets are received in the function __udp4_lib_rcv() which, in case there is no application listening on the given port, sends an ICMP destination port unreachable message. As it turns out, the only case in which this message is not sent is when the destination is a multicast or broadcast address. So, your options are, from the most to the least preferred:

Be certain that you always have an application listening on the given port.
Use iptables to block the ICMP error messages (be careful not to block too much!).
Have the application on the other end ignore those error messages.

Thursday, December 27, 2012

UDP Lite...

Many people know about TCP and UDP, at least those who work in the field of networking or are learning computer networks in some course. But the truth is that there are others too, e.g. SCTP, DCCP and UDP Lite, and all of those are actually implemented in the Linux kernel. What I'm going to do is describe each of them in the following few posts and give examples of their use. In this post, I'm going to talk about UDP Lite. I'll assume that you know UDP and also that you know how to use the socket API to write a UDP application.

UDP Lite is specified in RFC 3828: The Lightweight User Datagram Protocol (UDP-Lite). The basic motivation for introducing a new variant of UDP is that certain applications (primarily multimedia ones) want to receive packets even if they are damaged, because the codecs they use can recover from and mask errors. UDP itself has a checksum field that covers the whole packet, and if there is an error in the packet, it is silently dropped. It should be noted that this checksum is actually quite weak and doesn't catch a lot of errors, but it is nevertheless problematic for such applications. So, UDP Lite changes the standard UDP behavior in that it allows only a part of the packet to be covered by the checksum. And, because it is a different protocol, a new protocol ID is assigned to it, i.e. 136.

So, how do you use UDP Lite in your applications? Actually, it is very easy. First, when creating the socket you have to specify that you want UDP Lite, and not the (default) UDP:

s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);

Next, you need to define what part of the packet will be protected by the checksum. This is achieved with socket options, i.e. the setsockopt(2) system call. Here is a function that sets how many octets of the packet have to be protected:

void setoption(int sockfd, int option, int value)
{
    if (setsockopt(sockfd, IPPROTO_UDPLITE, option,
                   (void *)&value, sizeof(value)) == -1) {
        perror("setsockopt");
        exit(1);
    }
}

It receives the socket handle (sockfd) created with the socket function, the option that should be set (option) and the option's value (value). There are two options, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV. UDPLITE_SEND_CSCOV sets the number of protected octets for outgoing packets, and UDPLITE_RECV_CSCOV sets the minimum number of octets that have to be protected in an inbound packet in order for it to be passed to the application.

You can also obtain values using the following function:

int getoption(int sockfd, int option)
{
    int cov;
    socklen_t len = sizeof(int);

    if (getsockopt(sockfd, IPPROTO_UDPLITE, option,
                   (void *)&cov, &len) == -1) {
        perror("getsockopt");
        exit(1);
    }
    return cov;
}

This function accepts the socket (sockfd) and the option it should retrieve (i.e. UDPLITE_SEND_CSCOV or UDPLITE_RECV_CSCOV) and returns the option's value. Note that the two constants, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV, should be explicitly defined in your source because it is possible that glibc doesn't (yet) define them.
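
As a sketch of what defining those constants yourself might look like: the values 10 and 11 below are the ones listed in udplite(7), and the helper name set_udplite_coverage is made up for this illustration. Unlike setoption() above, it returns an error instead of exiting.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback definitions for headers that lack them (values as in udplite(7)). */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE 136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV 10
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11
#endif

/* Set both coverage values on an existing UDP Lite socket.
 * Returns 0 on success and -1 on error (errno is set by setsockopt). */
int set_udplite_coverage(int sockfd, int send_cov, int recv_cov)
{
    if (setsockopt(sockfd, IPPROTO_UDPLITE, UDPLITE_SEND_CSCOV,
                   &send_cov, sizeof(send_cov)) == -1)
        return -1;
    if (setsockopt(sockfd, IPPROTO_UDPLITE, UDPLITE_RECV_CSCOV,
                   &recv_cov, sizeof(recv_cov)) == -1)
        return -1;
    return 0;
}
```

The coverage is counted from the start of the 8-octet UDP Lite header, so a call such as set_udplite_coverage(s, 20, 20) would protect the header plus the first 12 octets of payload.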

I wrote fully functional client and server applications that you can download and test. You don't need any special options to compile them, so that should be easy. The only change you'll probably need is the IP address that the client sends packets to. This is the constant SERVER_IPADDR, which contains the server's IP address encoded in hex. For example, the IP address 127.0.0.1 is 0x7f000001.
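
To illustrate the hex encoding, here is a small sketch. It assumes that the constant is kept in host byte order and converted with htonl() before use; the helper names and the port number in the usage note are made up for this example and are not taken from the downloadable client.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build an IPv4 socket address from a hex-encoded address in host byte
 * order (e.g. 0x7f000001) and a port number. */
void fill_server_addr(struct sockaddr_in *sa, uint32_t ipaddr, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_addr.s_addr = htonl(ipaddr);  /* convert to network byte order */
    sa->sin_port = htons(port);
}

/* Render the address in dotted-quad form; returns buf, or NULL on error. */
const char *addr_to_text(const struct sockaddr_in *sa, char *buf, socklen_t len)
{
    return inet_ntop(AF_INET, &sa->sin_addr, buf, len);
}
```

With fill_server_addr(&sa, 0x7f000001, 10000), addr_to_text() yields the string 127.0.0.1: each pair of hex digits is one octet of the address (0x7f = 127, 0x00 = 0, 0x00 = 0, 0x01 = 1).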

Finally, I have to say that UDP Lite will probably have problems traversing NATs. For example, I tried it on my ADSL connection and it didn't pass through the NAT. What I did was start the client with the IP address of one of my servers on the Internet, and sniff packets on that server. Nothing reached the server. This will probably be a big problem for the adoption of UDP Lite, but time will tell...

You can read more about this subject on the Wikipedia page and in the Linux manual page udplite(7).

Tuesday, December 25, 2012

Controlling which congestion control algorithm is used in Linux

The Linux kernel has a quite advanced networking stack, and that's also true for congestion control. It is a very advanced implementation whose primary characteristics are modular structure and flexibility. All the specific congestion control algorithms are separated into loadable modules. The following congestion control mechanisms are available in the mainline kernel tree:

The default, system-wide congestion control algorithm is Cubic. You can check that by inspecting the content of the file /proc/sys/net/ipv4/tcp_congestion_control:

$ cat /proc/sys/net/ipv4/tcp_congestion_control
cubic

So, to change the system-wide default you only have to write the name of a congestion control algorithm to the same file. For example, to change it to reno you would do it this way:

# echo reno > /proc/sys/net/ipv4/tcp_congestion_control
# cat /proc/sys/net/ipv4/tcp_congestion_control
reno

Note that you have to be the root user to change the value. As root you can specify any available congestion control algorithm you wish. If the algorithm you specified isn't loaded into the kernel, it will be loaded automatically via the standard kernel module mechanism. To see which congestion control algorithms are currently loaded, take a look at the content of the file /proc/sys/net/ipv4/tcp_available_congestion_control:

$ cat /proc/sys/net/ipv4/tcp_available_congestion_control
vegas lp reno cubic
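
If you want to do the same check from a program, something along these lines should work. This is only a sketch, and the helper names read_first_line and list_contains are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf, without the trailing newline.
 * Returns 0 on success and -1 on failure. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (len == 0 || fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Return 1 if name appears as a whole word in a space-separated list,
 * and 0 otherwise. */
int list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    if (n == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');

        if (starts && ends)
            return 1;
        p++;  /* keep searching after a partial match such as "bic" in "cubic" */
    }
    return 0;
}
```

Reading /proc/sys/net/ipv4/tcp_available_congestion_control with read_first_line() and passing the result to list_contains() tells you whether, say, vegas is already loaded.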

It is also possible to change the congestion control algorithm on a per-socket basis using the setsockopt(2) system call. Here is the essential part of the code to do that:

...
int s, ns;
socklen_t optlen;   /* socklen_t, because getsockopt() expects a socklen_t pointer */
char optval[TCP_CA_NAME_MAX];
...
s = socket(AF_INET, SOCK_STREAM, 0);
...
ns = accept(s, ...);
...
strcpy(optval, "reno");
optlen = strlen(optval);
if (setsockopt(ns, IPPROTO_TCP, TCP_CONGESTION, optval, optlen) < 0) {
    perror("setsockopt");
    return 1;
}

In this fragment we are setting the congestion control algorithm to reno. Note that the constant TCP_CA_NAME_MAX (value 16) may not be defined in your system include files, in which case it has to be explicitly defined in your sources.

When you define the congestion control algorithm this way, you should be aware of a few things:

You can change the congestion control algorithm as an ordinary user.
If you are not the root user, you are only allowed to use the congestion control algorithms listed in the file /proc/sys/net/ipv4/tcp_allowed_congestion_control. For all the others you'll receive an error.
No congestion control algorithm is bound to the socket until it is in the connected state.

You can also obtain the current congestion control algorithm using the following snippet of code:

/* optlen has to be a socklen_t; on input it holds the size of optval */
optlen = TCP_CA_NAME_MAX;
if (getsockopt(ns, IPPROTO_TCP, TCP_CONGESTION, optval, &optlen) < 0) {
    perror("getsockopt");
    return 1;
}
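
The two fragments can be wrapped into a pair of small helpers, roughly like this. It is only a sketch: the function names are made up, and the fallback definition of TCP_CA_NAME_MAX assumes the value 16 mentioned above.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16
#endif

/* Select the congestion control algorithm for one TCP socket.
 * Returns 0 on success and -1 on error. */
int set_congestion(int fd, const char *name)
{
    size_t n = strlen(name);

    /* Reject names that cannot fit into the kernel's name buffer. */
    if (n == 0 || n >= TCP_CA_NAME_MAX)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, (socklen_t)n);
}

/* Copy the name of the socket's current algorithm into buf as a
 * NUL-terminated string. Returns 0 on success and -1 on error. */
int get_congestion(int fd, char *buf, socklen_t len)
{
    socklen_t optlen;

    if (len == 0)
        return -1;
    memset(buf, 0, len);  /* guarantees termination of the result */
    optlen = len - 1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen) < 0)
        return -1;
    return 0;
}
```

After accept() you would call set_congestion(ns, "reno") and then get_congestion(ns, name, sizeof(name)) with a buffer of TCP_CA_NAME_MAX bytes to verify the change.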

Here you can download code that you can compile and run. To compile it, just run gcc on it without any special options. The program starts a server that listens on port 10000. Connect to it using telnet (telnet localhost 10000) from another terminal, and the moment you do that you'll see the example code print the default congestion control algorithm and then change it to reno. It will then close the connection.

Instead of a conclusion, I'll warn you that this congestion control algorithm manipulation isn't portable to other systems, and if you use it in your code you are bound to the Linux kernel.

Friday, August 17, 2012

How to communicate with OSSEC daemons via Unix socket...

OSSEC daemons, when started, open Unix sockets for local communication. For example, ossec-execd opens the following socket, on which it waits for commands:

/opt/ossec/var/queue/alerts/execq

If you try to send it a message using echo, or in some similar way, you'll receive an error message:

$ echo 1 message > /opt/ossec/var/queue/alerts/execq
bash: /opt/ossec/var/queue/alerts/execq: No such device or address

This is because it's not a pipe. But it is possible to "manually" send it a command using the socat utility. socat is a very capable utility with an equally complex syntax. In this case you should run it as follows:

$ socat - UNIX-CLIENT:/opt/ossec/var/queue/alerts/execq

What we are saying in this case is that we want socat to relay messages between stdin (the first address, the minus sign) and a Unix socket on which it acts as a client (i.e. the socket must already have been created by someone else).

Now, whatever you type will go to ossec-execd. This can be monitored either in OSSEC's logs or, if we start ossec-execd in debug mode (without forking), in the terminal.
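
The same client role can be played by a few lines of C. The sketch below is my own illustration with a made-up function name. Since I haven't stated whether execq is a stream or a datagram socket, it tries a stream socket first and falls back to a datagram one, which is also how socat's documentation describes UNIX-CLIENT.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to an existing Unix socket as a client.
 * Returns a connected descriptor, or -1 if neither socket type works. */
int unix_client_connect(const char *path)
{
    static const int types[] = { SOCK_STREAM, SOCK_DGRAM };
    struct sockaddr_un addr;
    size_t i;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;  /* the path does not fit into sockaddr_un */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    for (i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        int fd = socket(AF_UNIX, types[i], 0);

        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;
        close(fd);  /* wrong type or no listener: try the next type */
    }
    return -1;
}
```

On success, whatever you write() to the returned descriptor is delivered to the daemon, just like the lines typed into socat.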

Sunday, February 12, 2012

Who's listening on an interface...

While I was trying to find out whether arpwatch honors multiple -i options and listens on multiple interfaces, I ran into the problem of detecting which interface exactly it listens on. To determine that, I started arpwatch in the following way:

arpwatch -i wlan0 -i em1

but that didn't tell me which interface the command had bound to. Using the -d option (debug) didn't help either. So, the first attempt was to look at the files opened by the command. That can be done using the lsof command, or by looking directly into arpwatch's proc directory. So, I found out the PID of the command (use ps for that; in this particular case it was 23833) and then I went into the directory /proc/PID/fd. In there, I saw the following content:

# cd /proc/23833/fd
# ls -l
total 0
lrwx------. 1 root root 64 Feb 12 12:32 0 -> socket:[27121497]
lrwx------. 1 root root 64 Feb 12 12:32 1 -> socket:[27121498]

This wasn't so useful! Actually, it tells me that arpwatch closed its stdin and stdout descriptors and opened only the sockets it needs to get ARP frames. The lsof command also didn't show anything direct:

# lsof -p 23833 -n
COMMAND    PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
arpwatch 23833 root cwd    DIR              253,2     4096   821770 /var/lib/arpwatch
arpwatch 23833 root rtd    DIR              253,2     4096        2 /
arpwatch 23833 root txt    REG              253,2    34144 2672471 /usr/sbin/arpwatch
arpwatch 23833 root mem    REG              253,2   168512 2228280 /lib64/ld-2.14.90.so
arpwatch 23833 root mem    REG              253,2 2068608 2228301 /lib64/libc-2.14.90.so
arpwatch 23833 root mem    REG              253,2   235944 2675058 /usr/lib64/libpcap.so.1.1.1
arpwatch 23833 root mem    REG                0,6          27121497 socket:[27121497] (stat: No such file or directory)
arpwatch 23833 root    0u pack           27121497      0t0      ALL type=SOCK_RAW
arpwatch 23833 root    1u unix 0xffff8802e3024ac0      0t0 27121498 socket

It did show that there is a raw socket in use, but nothing more than that. So, the next step was to find out how to list all raw sockets. netstat can list all open and listening sockets and, looking into its man page, it turns out that the option --raw or -w is used to show raw sockets, i.e.

# netstat -w
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State

Well, this wasn't very useful either, but by default netstat doesn't show listening sockets, so I repeated the command with the -l option added:

# netstat -w -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
raw 0 0 *:icmp *:* 7

So, this is definitely not what I'm looking for. This RAW socket listens for ICMP messages, and arpwatch definitely isn't capturing those. The man page also says that netstat takes information about raw sockets from the /proc/net/raw file, so I looked into its content:

# cat /proc/net/raw
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
1: 00000000:0001 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 47105 2 ffff880308e18340 0

Also not useful! There was an inode listed (47105), but how do you find information about a particular inode? I looked throughout the /proc file system but didn't find anything. I also checked the lsof manual but wasn't able to find anything useful (though I didn't read the manual from start to finish, I just searched for the word inode!).

Then I remembered that there is the ss command, which is specific to Linux and is used to provide information about sockets! So, I looked into its man page, which says that the option -0 (or -f link) is used to show PACKET sockets, so I tried:

# ss -f link
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port

Again nothing, but it quickly occurred to me that it doesn't show listening sockets by default, so I tried with -l (and -n to avoid any name resolution):

# ss -f link -ln
Netid State      Recv-Q Send-Q     Local Address:Port       Peer Address:Port
p_raw UNCONN     0      0                      *:em1                    *
p_dgr UNCONN     0      0                [34958]:wlan0                  *

Woohoo, I was onto something, finally! I see a raw socket bound to the em1 interface (note that I started arpwatch with the intention that it listens on the wlan0 and em1 interfaces!). Only, I still don't know who exactly is using it. I only see that the other socket is of datagram type, meaning network layer, and probably not used by arpwatch. The man page for ss helped again: it says to use the -p option to find out which process owns a socket, so I tried:

# ss -f link -lnp
Netid State      Recv-Q Send-Q     Local Address:Port       Peer Address:Port
p_raw UNCONN     0      0                      *:em1                    *      users:(("arpwatch",23833,0))
p_dgr UNCONN     0      0                [34958]:wlan0                  *      users:(("wpa_supplicant",1425,9))

Wow!! That was it! I found out that arpwatch is listening on only a single interface, and later I confirmed it by looking into the source! I also saw that the other socket is used by wpa_supplicant, i.e. for wireless network management purposes.

One final thing bothered me. Where does ss take this information from? That's easy to find out: use strace! :) So, using strace I found out that ss reads the /proc/net/packet file:

# cat /proc/net/packet
sk               RefCnt Type Proto Iface R Rmem   User   Inode
ffff880308cec000 3      3    0003   2     1 0      0      27121497
ffff880004358800 3      2    888e   7     1 0      0      25292685
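
With this file at hand, the inode printed in /proc/PID/fd (socket:[27121497]) can be matched against the Inode column by a program. The following is a minimal sketch that assumes the column layout shown above; the structure and the function names are made up for this example.

```c
#include <stdio.h>

/* The columns of /proc/net/packet that matter for this lookup. */
struct packet_sock_info {
    int type;             /* Type column: 3 = SOCK_RAW, 2 = SOCK_DGRAM */
    unsigned int proto;   /* Proto column, printed in hex              */
    int ifindex;          /* Iface column: interface index             */
    unsigned long inode;  /* Inode column                              */
};

/* Extract the inode from a /proc/PID/fd link target like "socket:[27121497]".
 * Returns 0 on success and -1 if the target is not a socket link. */
int parse_socket_link(const char *target, unsigned long *inode)
{
    return sscanf(target, "socket:[%lu]", inode) == 1 ? 0 : -1;
}

/* Parse one data line of /proc/net/packet.
 * Returns 0 on success and -1 for the header or a malformed line. */
int parse_packet_line(const char *line, struct packet_sock_info *out)
{
    int n = sscanf(line, "%*s %*d %d %x %d %*d %*d %*d %lu",
                   &out->type, &out->proto, &out->ifindex, &out->inode);

    return n == 4 ? 0 : -1;
}

/* Scan an open /proc/net/packet stream for a socket inode.
 * Returns the interface index, or -1 if the inode is not listed. */
int ifindex_for_inode(FILE *fp, unsigned long inode)
{
    char line[256];
    struct packet_sock_info info;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (parse_packet_line(line, &info) == 0 && info.inode == inode)
            return info.ifindex;
    }
    return -1;
}
```

The interface index returned by ifindex_for_inode() can then be turned into a name such as em1 with if_indextoname(3).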

Maybe I would have got there earlier if I had looked more closely at the files available in /proc/net when /proc/net/raw turned out to be the wrong file! But it doesn't matter, this search was fun and educational. :)
