Thursday, February 14, 2013

Hotspot JIT output disassembly on Fedora 18

Well, I was thrilled when I saw that it is possible to output the assembly code produced by Hotspot. The problem is that this isn't enabled by default, at least not on Fedora 18: you have to compile the disassembler plugin (hsdis) before you can try it. To make things worse, the plugin's build process assumes that you don't have binutils installed, so it tries to compile that too. In the end, I managed to get it working, and here is how.

First, you need to download OpenJDK's source. Note that there is a source package in Fedora's binary repository, but it contains only the source of the Java API packages. So, you have to download the real source, either from or the appropriate SRPM. In both cases, be careful to download the source that matches the OpenJDK you have installed on your machine.

Next, unpack the source and go to the directory openjdk/hotspot/src/share/tools/hsdis. Open the file hsdis.c and replace the following line:
#include <sysdep.h>
with the following lines:
#include <string.h>
#include <errno.h>
Now, compile the source using the following command:
gcc -o -DLIBARCH_amd64 -DLIBARCH="amd64" \
       -DLIB_EXT=".so" -m64 -fPIC -O hsdis.c -shared \
       -ldl -lopcodes
The compilation will fail unless you have the binutils-devel package installed, so install it first. If the compilation succeeds, you'll have file. It's a dynamic library. Note that I'm using the 64-bit AMD/Intel architecture. If you are using the 32-bit version, replace amd64 with i386 and -m64 with -m32. For any other architecture, you'll have to find out the right names yourself.
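The architecture substitution described above can be sketched as a small shell helper. This is only an illustration: the function name hsdis_build_flags is hypothetical, and it covers just the two x86 cases mentioned in the text, keyed on the machine names that uname -m typically reports.

```shell
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the LIBARCH name and the gcc word-size flag used in the build.
hsdis_build_flags() {
    case "$1" in
        x86_64|amd64)
            echo "amd64 -m64" ;;
        i386|i486|i586|i686)
            echo "i386 -m32" ;;
        *)
            # Other architectures need names that are not covered here.
            echo "unsupported machine type: $1" >&2
            return 1 ;;
    esac
}
```

Calling hsdis_build_flags "$(uname -m)" prints the pair to substitute into the gcc command; anything else falls through to the error branch, because the right names would have to be looked up.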

Now you'll need a Java class to run that will produce assembly output. The main point to keep in mind is that the code has to do enough work to trigger the JIT compiler; otherwise, you won't get any assembly output. I used the following simple class:
import java.math.BigInteger;

class Multiply {
    public static void main(String[] args) {
        BigInteger a = BigInteger.ONE;

        // Loop enough times for the JIT compiler to kick in.
        for (int i = 0; i < 10000; i++) {
            a = a.multiply(BigInteger.valueOf(2));
        }
    }
}
After compiling it, run it using the following command:
LD_LIBRARY_PATH=. java -XX:+UnlockDiagnosticVMOptions \
    -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel \
    Multiply
Note that I'm using LD_LIBRARY_PATH to tell the JVM where the disassembler (hsdis) is; in my case everything is in the current directory. I also specified that I want Intel assembly syntax; the default is AT&T.

Tuesday, February 5, 2013

Fun when mail server receives SERVFAIL instead of NXDOMAIN...

Ok, my log files were flooded with error messages like this one:
Feb 5 11:01:35 mail named[994]: error (host unreachable) resolving '':
In essence, the name server for this domain ( is unreachable from the DNS server used by the mail server. Trying to manually query the server, I get:
$ host -t ns
Host not found: 2(SERVFAIL)
Note the status: it's SERVFAIL. The result is that the mail server thinks it is a temporary error and retries later, with the same result. Trying this on another host (one that uses a different DNS server), I get:
$ host -t ns
Host not found: 3(NXDOMAIN)
Well, this time it tells me that there is no such domain. An error like this would tell the mail server to give up and return an error response.
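The difference between the two statuses is what decides the mail server's behavior, and it can be sketched as a tiny classifier over the output of host. This is a minimal sketch, not how any particular mail server implements it; the function name classify_host_status and the labels it prints are made up for illustration.

```shell
# Hypothetical helper: classify the output of `host` the way a mail
# server would treat the lookup failure.
classify_host_status() {
    case "$1" in
        # SERVFAIL: the resolver could not get an answer; worth retrying.
        *"(SERVFAIL)"*) echo "temporary" ;;
        # NXDOMAIN: the domain does not exist; retrying is pointless.
        *"(NXDOMAIN)"*) echo "permanent" ;;
        *)              echo "unknown" ;;
    esac
}
```

It could be used as classify_host_status "$(host -t ns "$domain" 2>&1)", where $domain is a placeholder for the name being checked.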

So, why is there a discrepancy between the two? Using tcpdump in the first case (i.e. when we get SERVFAIL), the following requests/responses are exchanged (slightly edited for readability):
192.168.x.y.51892 > 39504 [1au] NS? (42) > 192.168.x.y.51892: 39504 4/0/5 NS, NS, NS, NS (194)
192.168.x.y.10749 > 40104 [1au] NS? (42)
192.168.x.y.63081 > 56636 [1au] NS? (42) > 192.168.x.y.63081: 56636 4/0/5 NS, NS, NS, NS (194)
192.168.x.y.31948 > 27220 [1au] NS? (42)
So, let me interpret this trace. The first query goes to IP address and asks for the name servers of a domain Doing a reverse DNS query, we get:
# host domain name pointer
So, it's some hosting provider. Now, what we get in response is that the name servers for that domain are through Ok, our DNS server chose with IP address Then, it queried it for domain. This time the query timed out. So, our DNS server decided to query again for the name servers. It again received the same list, and then it again queried ns4, which didn't answer the query.

Let us manually try some other server. So, querying we get:
# host -t ns
Using domain server:
Aliases: name server name server name server name server
This is the same response we saw in the first part of the trace. Trying ns2:
$ host -t ns
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
Well, ns2 doesn't respond. Neither does ns3 nor, as was already obvious, ns4. What does the trace look like (again, slightly edited):
12:20:22.532877 IP 192.168.x.y.34364 > 31539+ NS? (31)
12:20:27.532864 IP 192.168.x.y.34364 > 31539+ NS? (31)
12:20:32.533445 IP 192.168.x.y.45322 > 5827+ NS? (31)
12:20:52.733389 IP > 192.168.x.y.34364: 31539 ServFail 0/0/0 (31)
12:20:52.733410 IP 192.168.x.y > ICMP 192.168.x.y udp port 34364 unreachable, length 67
12:20:52.734042 IP > 192.168.x.y.45322: 5827 ServFail 0/0/0 (31)
12:20:52.734053 IP 192.168.x.y > ICMP 192.168.x.y udp port 45322 unreachable, length 67
Now, we have an interesting situation here. First, the DNS server for takes a significant time to answer, and by the time it answers our local DNS server isn't listening any more (hence those ICMP error messages). But in the end the local DNS server correctly concludes that something's wrong with the name servers for that domain.

The final piece of the puzzle comes from querying a com name server for domain:
$ host -t ns
Using domain server:
Aliases: has no NS record
Obviously, this domain was removed. It is clear now that this domain existed for some time, and the DNS server that produces the error cached the IP address of its name server. If it were to query the com name servers again, it would receive an NXDOMAIN error and properly notify the mail server.

To see the entries currently cached by a BIND name server, use the following command:
rndc dumpdb
Then look for the file cache_dump.db in /var/named/data (or in /var/named/chroot/var/named/data if you are running BIND in a chroot). It is a text file that you can inspect with a text editor, less, or something similar. In my case it contained the following lines:
; glue          172669  NS
                        172669  NS
                        172669  NS
                        172669  NS
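To pull just the cached NS entries for one name out of the dump, something like the following awk-based helper can be used. It is a rough sketch that assumes the zone-file-like layout shown above (owner name in the first column, continuation lines starting with whitespace, comment lines starting with a semicolon); the function name cached_ns is hypothetical.

```shell
# Hypothetical helper: print the NS targets cached for one owner name.
#   $1 = path to the cache dump file
#   $2 = fully qualified owner name, including the trailing dot
cached_ns() {
    awk -v want="$2" '
        # Skip comment lines such as "; glue".
        /^;/ { next }
        # A line that does not start with whitespace names a new owner.
        /^[^ \t]/ { owner = tolower($1) }
        # For the owner we want, print the last field of every NS line.
        owner == tolower(want) {
            for (i = 1; i <= NF; i++)
                if ($i == "NS") { print $NF; break }
        }
    ' "$1"
}
```

For example, cached_ns /var/named/data/cache_dump.db "$name" lists the name servers the cache still holds for the name stored in the placeholder variable $name.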

To flush a single entry, use the following command:
rndc flushname internal
This removes from caches in internal view (I configured views so that the server behaves differently depending on who asks). Yet, this didn't help. Then I tried to flush everything in the internal view using:
rndc flush internal
But this, while it helped, didn't actually solve the problem. Namely, looking into the packet trace, it turns out that the BIND server receives from that the given domain doesn't exist, and from somewhere it pulls the old IP address!

So, I finally decided to look into the log, and there were a lot of messages like the following:
error (FORMERR) resolving '':
along with the one that triggered all this. Ok, I also tried upgrading BIND, but no luck: still SERVFAIL errors.
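Since the log contains more than one kind of resolver error, it helps to see how often each one occurs. The following is a rough sketch that tallies the "error (...) resolving" messages by reason; it assumes the message format shown above, and the function name count_resolve_errors is made up for this example.

```shell
# Hypothetical helper: count resolver errors per reason.
#   $1 = path to a log file containing named messages
# Prints one line per reason: "<count><TAB><reason>".
count_resolve_errors() {
    # Keep only the text inside the parentheses after "error".
    sed -n 's/.*error (\([^)]*\)) resolving.*/\1/p' "$1" |
        sort | uniq -c |
        # Turn the "  <count> <reason>" output of uniq into "<count>\t<reason>".
        awk '{ n = $1; $1 = ""; sub(/^ +/, ""); print n "\t" $0 }'
}
```

Running it on the system log shows at a glance whether the host unreachable or the FORMERR messages dominate.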

Now it's time for the heavy artillery, i.e. Wireshark. So I saved a packet trace and loaded it into Wireshark. Guess what! Wireshark crashed on requests sent by the local DNS server!?

Ok, after a bit more fiddling I realised that some com name servers do know about this domain. But note the difference between the output of the host command (which I used previously) and the nslookup command:
$ nslookup -type=ns

Non-authoritative answer:
*** Can't find No answer

Authoritative answers can be found from: nameserver = nameserver = nameserver = nameserver = internet address = internet address = internet address = internet address =
Ok, if this com server tells me that this domain doesn't exist, why is it then pointing me to those name servers? The dig command is a bit more informative:
$ dig @

; <<>> DiG 9.9.2-P1-RedHat-9.9.2-6.P1.fc18 <<>> @
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64382
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 5
;; WARNING: recursion requested but not available

; EDNS: version: 0, flags:; udp: 4096
; IN A

;; AUTHORITY SECTION: 172800 IN NS 172800 IN NS 172800 IN NS 172800 IN NS

;; ADDITIONAL SECTION: 172800 IN A 172800 IN A 172800 IN A 172800 IN A
;; Query time: 149 msec
;; WHEN: Tue Feb  5 13:49:22 2013
;; MSG SIZE  rcvd: 194
What a mess?!

Then, I googled a bit to find out why BIND is returning SERVFAIL instead of NXDOMAIN. This is something interesting that I found:
  • BIND could have returned SERVFAIL instead of NXDOMAIN responses for nonexistent resource records from the unsigned child zone if the parent zone was signed. (BZ#643012)
Trying to look up that bug in Red Hat's Bugzilla gives me a big red square which tells me that I'm not allowed to see it (despite being logged in), so it's obviously some security issue!?

Looking at how different BIND versions behave, I get the following results:
  • bind-9.3.6-20.P1.el5_8.6 and bind-9.9.2-6.P1.fc18.x86_64 return NXDOMAIN.
  • bind-9.8.2-0.10.rc1.el6_3.6.x86_64 and bind-9.8.2-0.10.rc1.el6_3.5.x86_64 return SERVFAIL.
Could it be somehow related to DNSSEC?

Ok, let me conclude. The problem is that the name servers for domain aren't correctly configured, while the domain itself is still registered with the com name servers. This triggers different behavior in different BIND versions. So, there are two possibilities from here:
  1. Somehow persuade BIND to return NXDOMAIN instead of SERVFAIL.
  2. Find what is causing queries for this domain in the first place.
Stay tuned... :)
