About the Java direct buffer pool
While looking into one of our metrics dashboards, two of the graphs caught our attention. Their titles were Direct buffer and Mapped buffer. What do these graphs measure, and why should we monitor them?
Right off the bat we can see that they are measuring memory, presumably how much memory the JVM has allocated for direct buffers and mapped buffers.
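These graphs are typically fed by the JVM's own buffer-pool statistics, which we can also read directly through the standard java.lang.management API. A minimal sketch (class name and output format are our own):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class BufferPoolMetrics {
    public static void main(String[] args) {
        // Allocate a direct buffer so the "direct" pool is not empty.
        ByteBuffer ignored = ByteBuffer.allocateDirect(1024 * 1024);

        // The JVM exposes one MXBean per buffer pool, including "direct" and "mapped".
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
```

Monitoring agents usually scrape exactly these values, which is likely what our dashboard graphs show.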
After some investigation, we found that a direct buffer is a type of byte buffer. Java has two kinds of byte buffers, direct and non-direct, both of which can be allocated through java.nio.ByteBuffer. The main difference between them is where their memory lives: non-direct buffers (implemented by HeapByteBuffer) are created in the JVM heap, while direct buffers are allocated in native memory, essentially a chunk of native memory shared between the OS and the JVM.
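For example, a minimal sketch showing the two allocation paths side by side:

```java
import java.nio.ByteBuffer;

public class BufferKinds {
    public static void main(String[] args) {
        // Non-direct buffer: backed by a byte[] on the JVM heap.
        ByteBuffer heap = ByteBuffer.allocate(4096);
        // Direct buffer: backed by a chunk of native memory.
        ByteBuffer direct = ByteBuffer.allocateDirect(4096);

        System.out.println(heap.isDirect());    // false
        System.out.println(direct.isDirect());  // true
        System.out.println(heap.hasArray());    // true: has a heap backing array
        System.out.println(direct.hasArray());  // false: no heap array behind it
    }
}
```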
Why do we need buffers in native memory?
This is the only way the JVM can communicate with the outside world, the OS. How so? To read or write, the OS needs to execute instructions on memory areas that are contiguous sequences of bytes, and a normal Java byte[] allocated in the heap does not offer this guarantee. Even if it did, the JVM does not guarantee that the memory allocated for the heap itself is contiguous.
2.5.3. Heap
The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.
[…] The memory for the heap does not need to be contiguous. […] - JVM specification
If we used such a byte array for I/O, we would need to copy it from the heap to native memory before every I/O operation, which is not very efficient. This is the problem direct buffers were created to solve: they act as an interface between Java and the OS I/O subsystems, allowing the OS to write data as it receives it from a socket or disk, and Java to read it directly.
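A sketch of this in practice: reading a file through a direct buffer with an NIO channel, so the OS can fill the buffer's native memory without an intermediate heap copy (file name and buffer size here are arbitrary):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectRead {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("direct-demo", ".txt");
        Files.write(file, "hello direct buffers".getBytes(StandardCharsets.UTF_8));

        ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The OS writes straight into the native memory behind the buffer.
            channel.read(buffer);
        }
        buffer.flip(); // switch from writing into the buffer to reading from it

        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```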
What about the Mapped buffer?
A mapped buffer is just another type of direct buffer, one that represents a memory-mapped region of a file, i.e. a section of a file that was loaded at a particular location in native memory, giving us much faster subsequent reads. Instead of loading the file data from disk every time, it is already available in memory.
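Mapped buffers are created through FileChannel.map, which returns a MappedByteBuffer. A minimal sketch (file name and content are arbitrary):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".txt");
        Files.write(file, "mapped file content".getBytes(StandardCharsets.UTF_8));

        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the whole file into native memory; subsequent reads hit
            // memory pages instead of going back to disk.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println(mapped.isDirect()); // true: a mapped buffer is direct

            byte[] bytes = new byte[mapped.remaining()];
            mapped.get(bytes);
            System.out.println(new String(bytes, StandardCharsets.UTF_8));
        }
        Files.delete(file);
    }
}
```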
Why don’t we use this everywhere?
Allocating and deallocating these buffers is much more expensive than for non-direct buffers, so their use is recommended for large, long-lived buffers that need to be accessed by the OS, where they can bring substantial performance gains. Another option to avoid constant allocation and deallocation is to allocate one big chunk of memory and hand out smaller slices of it that can be reused when no longer needed. This approach can pose problems of its own, as the backing byte buffer can become fragmented and cannot be compacted.
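The slicing approach above can be sketched with ByteBuffer.slice(): allocate one large direct buffer up front and carve smaller, independent views out of it (the sizes here are arbitrary):

```java
import java.nio.ByteBuffer;

public class SlicedPool {
    public static void main(String[] args) {
        // One big native allocation up front instead of many small ones.
        ByteBuffer pool = ByteBuffer.allocateDirect(4096);

        // Carve out two 1 KiB slices; each shares the pool's native memory
        // but has its own independent position and limit.
        pool.position(0).limit(1024);
        ByteBuffer first = pool.slice();

        pool.position(1024).limit(2048);
        ByteBuffer second = pool.slice();

        System.out.println(first.capacity());   // 1024
        System.out.println(second.capacity());  // 1024
        System.out.println(first.isDirect());   // true: still native memory
    }
}
```

Real pools (Netty's pooled allocator, for example) layer free-lists and reference counting on top of this basic idea to manage the fragmentation problem.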
These buffers are allocated outside the Java heap, meaning they reside outside normal garbage-collected memory. However, the JVM still keeps track of them. Each time we allocate a direct buffer, the JVM creates a DirectByteBuffer instance in the heap to represent it, tracked through a phantom reference. Since the life cycle of this heap object is managed by the JVM, it can be collected by the GC when there are no more references to it, at which point the associated native memory can be released.
The issue is that it is hard to predict when a direct buffer will be collected and the underlying native memory returned to the OS. Direct buffer objects clean up their native memory automatically, but they do so as part of Java heap GC, so they do not respond to pressure on the native heap. We can explicitly request the cleanup by calling ((DirectBuffer) buffer).cleaner().clean(), although this relies on the internal sun.nio.ch.DirectBuffer API and is not recommended, because it causes performance degradation.