Everything about nothing: performance

Saturday, August 23, 2014

Memory access latencies

Once, I saw a table in which all the memory latencies are scaled in such a way that CPU cycle is defined to be 1 second, and then L1 cache latency is several seconds, L2 cache even more, and so on up to SCSI commands timeout and system reboot. This was very interesting because I have much better developed sense for seconds and higher time units that for nanoseconds, microseconds, etc. Few days ago I remembered that table and I wanted to see it again, but couldn't find it. This was from some book I couldn't remember the name. So, I started to google for it, and finally, after an hour or so of googling, I managed to find this picture. It turns out that this was from the book Systems performance written by Brendan Gregg. So, I decided to replicate it here for a future reference:

Table 2.2: Example Time Scale of System Latencies
Event	Latency	Scaled
1 CPU Cycle	0.3 ns	1 s
Level 1 cache access	0.9 ns	3 s
Level 2 cache access	2.8 ns	9 s
Level 3 cache access	12.9 ns	43 s
Main memory access (DRAM, from CPU)	120 ns	6 min
Solid-state disk I/O (flash memory)	50 - 150 us	2-6 days
Rotational disk I/O	1-10 ms	1-12 months
Internet: San Francisco to New York	40 ms	4 years
Internet: San Francisco to United Kingdom	81 ms	8 years
Internet: San Francisco to Australia	183 ms	19 years
TCP packet retransmit	1-3 s	105-317 years
OS virtualization system reboot	4 s	423 years
SCSI command timeout	30 s	3 millennia
Hardware (HW) virtualization system reboot	40 s	4 millennia
Physical system reboot	5 min	32 millennia

It's actually impressive how fast CPU is with respect to other components. It is also very good argument for multitasking, i.e. assigning CPU to some other task while waiting for, e.g. disk, or something from the network.

One additional impressive thing is written below the table in the book. Namely, if you multiply CPU cycle with speed of light (c) you can see that the light can travel only 0.5m while CPU does one instruction. That's really impressive. :)

That's it for this post. For the end, while I was searching for this table, I stumbled on some additional interesting links:

Thursday, February 14, 2013

Hotspot JIT output disassembly on Fedora 18

Well, I was very thrilled when I saw that it is possible to output assembly code produced by Hotspot. But, the problem is that this isn't enabled by default, at least not on Fedora 18. It is necessary to compile decompiler plugin before you can try this. To make things worse, this compilation process assumes that you don't have binutils already installed so it tries to compile that too. In the end, I managed to get that working and here is how.

First, you need to download OpenJDK's source. Note that there is source in Fedora's binary repository but this is only the source of Jaba API packages. So, you have to download the real source, either from java.net or approriate SRPM. In both cases be careful to download source that matches OpenJDK you have installed on your machine.

Next, unpack the source and go to the directory openjdk/hotspot/src/share/tools/hsdis. Now, open hsdis.c file and replace the following line:

#include <sysdep.h>

with the following lines:

#include <string.h>
#include <errno.h>

Now, compile the source using the following command:

gcc -o hsdis-amd64.so -DLIBARCH_amd64 -DLIBARCH="amd64" \
-DLIB_EXT=".so" -m64 -fPIC -O hsdis.c -shared \
-ldl -lopcodes

The compilation will fail unless you have binutils-devel package installed. So, take care about that. In case the compilation was successful you'll have hsdis-amd64.so file. It's a dynamic library. Note that I'm using 64 bit AMD/Intel architecture. If you are using 32 bit version replace amd64 with i386 and -m64 with -m32. In case of some other architecture you'll have to find out yourself what's the name.

Now, you'll need some Java class that you'll run and that will produce assembly output. The main point you should have in mind is that the code has to be such to provoke JIT to be started. Otherwise, you'll don't get any assembly output. I used the following simple class file:

import java.math.BigInteger;

class Multiply
{
    public static void main(String[] args)
    {
        BigInteger a = BigInteger.ONE;

        for (int i = 0; i < 10000; i++)
            a = a.multiply(BigInteger.valueOf(2));
        System.out.println(a);
    }
}

After compiling it, run it using the following command:

LD_LIBRARY_PATH=. java -XX:+UnlockDiagnosticVMOptions \
-XX:+PrintAssembly -XX:PrintAssemblyOptions=intel \
Multiply

Note that I'm using LD_LIBRARY_PATH to tell JIT where disassembler (hsdis) is. In my case everything is in the current directory. Note that in the previous command I specified that I want Intel assembly syntax. The default one is AT&T.

Everything about nothing

Saturday, August 23, 2014

Memory access latencies

Thursday, February 14, 2013

Hotspot JIT output disassembly on Fedora 18

About Me

Blog Archive