Optimizing Java Class Metadata in Project Valhalla
Introduction
While digging into Project Valhalla, I encountered a subtle part of the JVM that turned out to be a prime opportunity for optimization. This exploration led to an approach that shaves unused memory from Java class instance metadata.
In this post, I’ll walk through some of the behind-the-scenes details of one of Project Valhalla’s core features: value classes, and the concept of flattening (or inlining, the term I will use here). Whether you’re a JVM developer or just deeply curious about how Java works under the hood, this technical deep dive is for you.
Note: The features and code snippets in this post reflect the current state of Project Valhalla development (notably JEP 401) and may change as the project continues to take shape. The work behind the optimizations discussed here is detailed in https://github.com/openjdk/valhalla/pull/1966.
Background
InstanceKlass and Inlining
When discussing Java class representations inside the JVM, the term Klass is used rather than “class”.
An InstanceKlass is the JVM’s internal representation of a Java class, containing all the information needed for the class at runtime. With Project Valhalla, one key addition is an array of InlineLayoutInfo, which holds metadata about fields that have been inlined. In simple terms, inlining a field means embedding the contents of one class directly inside another, rather than storing a reference or pointer. This eliminates a level of indirection: instead of looking up an object via a reference, the data lives “inlined” within the containing object itself. Among other effects, this improves cache locality, which can in turn benefit performance.
Each object in the JVM has an associated object header. This header contains metadata such as a pointer to the Klass that defines the object’s type. When a field is inlined, however, its payload is stored without an object header. To preserve access to the Klass for inlined fields, relevant metadata, including the Klass pointer, is stored in an array within the InstanceKlass of the containing class. To clarify: whether a field is inlined is a property of the InstanceKlass that contains the field (the “container”), not of the InstanceKlass being stored (the “payload”). This distinction matters because both the container and the payload are represented as InstanceKlass objects in the JVM, which makes them easy to confuse.
src/hotspot/share/oops/instanceKlass.hpp
class InlineLayoutInfo : public MetaspaceObj {
  InlineKlass* _klass;
  LayoutKind _kind;
  int _null_marker_offset;
  // ...
};

class InstanceKlass: public Klass {
  // ...
  Array<InlineLayoutInfo>* _inline_layout_info_array;
  // ...
};
When Fields Can Be Inlined
For a field to be considered for inlining, its type must be declared as a value class using the value keyword in Java (a preview feature in Project Valhalla). This makes the field eligible for inlining by the JVM, but inlining is not guaranteed and depends on several runtime and layout-specific factors. To reiterate, whether a field is inlined is a property of the containing class, not of the field’s type. The field’s type only determines whether inlining is possible.
FruitBasket Example
In the short snippet below, FruitBasket contains two fields of type Banana. Banana is defined as a value record (and therefore a value class). Inlining is decided by the JVM, and if the fields b1 and/or b2 are inlined, that’s a property of the FruitBasket class, not Banana itself.
value record Banana(int ripeness) {}
class FruitBasket {
Banana b1 = new Banana(1);
Banana b2 = new Banana(2);
}
Possible layouts:
               Not Inlined                   |     Inlined
                                             |
+-------------+                              | +-------------+
| FruitBasket |                              | | FruitBasket |
+-------------+                              | +-------------+
| 84a304b840  | --------------> +--------+   | | 1           |
| 83b2428a44  | -->+--------+   | Banana |   | | 2           |
+-------------+    | Banana |   +--------+   | +-------------+
                   +--------+   |   1    |   |
                   |   2    |   +--------+   |
                   +--------+                |
- In the not inlined case, FruitBasket has reference fields pointing to separate heap objects of type Banana.
- In the inlined case, the ripeness values are embedded directly inside the FruitBasket object. This removes the references and the indirection they imply, along with the object headers and per-object memory overhead of the separate Banana objects.
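C++ lets the programmer pick between these two layouts explicitly, which makes it a convenient way to picture the difference. The sketch below is only an analogy: the struct and function names are hypothetical, and it does not model Java object headers, compressed references, or how the JVM actually decides on a layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ analogy for the two layouts; this is not JVM code.
struct Banana {
    int ripeness;
};

// "Not inlined": the basket stores pointers to separately allocated Bananas.
struct FruitBasketByReference {
    Banana* b1;
    Banana* b2;
};

// "Inlined": the Banana payloads are embedded directly in the basket.
struct FruitBasketInlined {
    Banana b1;
    Banana b2;
};

// Reading a ripeness value through a reference costs one extra pointer hop.
inline int ripeness_sum(const FruitBasketByReference& basket) {
    assert(basket.b1 != nullptr && basket.b2 != nullptr);
    return basket.b1->ripeness + basket.b2->ripeness;
}

// With the inlined layout the values are read straight out of the basket.
inline int ripeness_sum(const FruitBasketInlined& basket) {
    return basket.b1.ripeness + basket.b2.ripeness;
}
```

In the inlined struct, the basket is exactly as large as its two payloads, whereas the by-reference struct holds two pointers and the Banana data lives elsewhere.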
The InlineLayoutInfo Array
The _inline_layout_info_array is allocated during classfile parsing, when the InstanceKlass is created, with one entry for each field in the class. All entries start out uninitialized and are only populated for fields that are eligible for inlining. The array uses the same indexing as the class’s field list, which eliminates the need to translate between a field index and an array index. Since this array is hot, we want to avoid anything that could negatively impact performance, such as an extra index conversion.
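To make the indexing scheme concrete, here is a minimal sketch of a container that keeps one slot per declared field and uses the field index directly as the array index. All type and member names are hypothetical stand-ins, not the HotSpot declarations, and unlike the real array the slots here are value-initialized to null to keep the example safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the HotSpot types.
struct InlineKlassModel {
    const char* name;
};

struct InlineLayoutInfoModel {
    // Stays null for fields that were never populated.
    const InlineKlassModel* klass = nullptr;
};

struct ContainerKlassModel {
    // One slot per declared field, in field-list order.
    std::vector<InlineLayoutInfoModel> inline_layout_info;

    explicit ContainerKlassModel(std::size_t field_count)
        : inline_layout_info(field_count) {}

    // The field index doubles as the array index, so no translation step is needed.
    const InlineKlassModel* inline_klass_at(std::size_t field_index) const {
        assert(field_index < inline_layout_info.size());
        return inline_layout_info[field_index].klass;
    }
};
```

The cost of this simplicity is visible in the constructor: the array always has as many slots as the class has fields, whether or not they are ever filled in.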
Problem
Currently, _inline_layout_info_array is allocated and stored for all InstanceKlass instances, regardless of whether it is ever required. This approach leads to unnecessary allocations and wasted memory in many cases.
Let’s examine the possible scenarios:
1. No inlineable fields: The InstanceKlass contains no value class fields, so no fields can ever be inlined.
2. Inlineable fields, but none inlined: The InstanceKlass has one or more value class fields that could be inlined, but none actually are.
3. At least one field is inlined: The InstanceKlass has one or more value class fields, and at least one field is inlined.
Of these, only scenario 3 requires maintaining the _inline_layout_info_array, since it may need to be accessed for future operations related to inlined fields. For scenarios 1 and 2, retaining this array serves no useful purpose and consumes extra memory. Given that each InlineLayoutInfo object is 16 bytes, the memory overhead is 16 * (number of fields) bytes per class, which can add up quickly across applications with thousands of loaded classes.
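The 16-byte figure follows from the three members shown earlier: a pointer, a layout kind, and an int. The sketch below models that with a hypothetical struct and computes the per-class and aggregate overhead. It assumes a 64-bit platform and a 4-byte representation for the layout kind; the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model mirroring the three members: a pointer, a kind, an int.
struct InlineLayoutInfoModel {
    void*        klass;               // stands in for InlineKlass*
    std::int32_t kind;                // stands in for LayoutKind (4 bytes assumed)
    std::int32_t null_marker_offset;
};

// Bytes spent on entries for one class with `field_count` fields.
constexpr std::size_t layout_info_bytes(std::size_t field_count) {
    return sizeof(InlineLayoutInfoModel) * field_count;
}

// Total across `class_count` classes that each declare `fields_per_class` fields.
inline std::size_t total_layout_info_bytes(std::size_t fields_per_class,
                                           std::size_t class_count) {
    // Guard against overflow in the multiplication below.
    assert(class_count == 0 ||
           layout_info_bytes(fields_per_class) <= SIZE_MAX / class_count);
    return layout_info_bytes(fields_per_class) * class_count;
}
```

With 16-byte entries, total_layout_info_bytes(10, 5000) comes to 800,000 bytes. Both inputs are invented figures, chosen only to show how the per-field cost scales.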
Solution
This unnecessary memory usage motivates a more efficient strategy for managing _inline_layout_info_array. Instead of allocating the array for every InstanceKlass unconditionally, we can reduce memory overhead by creating and maintaining it only when inlining actually takes place.
On-Demand Allocation
A straightforward way to address scenario 1 is to avoid allocating the array when there are no fields that could ever be inlined. The approach I chose here is to allocate the array on-demand, only at the point where an entry needs to be initialized.
src/hotspot/share/classfile/classFileParser.cpp
void ClassFileParser::set_inline_layout_info_klass(int field_index, InlineKlass* ik, TRAPS) {
  // ...
  // The array of InlineLayoutInfo is allocated on-demand. This way the array is
  // never allocated for an InstanceKlass which has no need for this information.
  if (_inline_layout_info_array == nullptr) {
    _inline_layout_info_array = MetadataFactory::new_array<InlineLayoutInfo>(_loader_data,
                                                                             java_fields_count(),
                                                                             CHECK);
  }
  // Set the Klass for the field's index
  _inline_layout_info_array->adr_at(field_index)->set_klass(ik);
}
Although simple, this approach fully addresses scenario 1, where no fields can ever be inlined. It does not, however, solve scenario 2: classes that contain inlineable fields but where the JVM ultimately chooses not to inline any of them. We’ll look at that case next.
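Stripped of HotSpot specifics, the pattern is ordinary lazy allocation. The following standalone sketch mirrors the shape of the snippet above, using std::unique_ptr and std::vector in place of Metaspace arrays. ParserModel and its members are hypothetical names, and error handling via TRAPS/CHECK is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified model of the parser-side logic; not HotSpot code.
struct InlineLayoutInfoModel {
    const void* klass = nullptr;
};

class ParserModel {
public:
    explicit ParserModel(std::size_t java_fields_count)
        : _java_fields_count(java_fields_count) {}

    // Allocate the array on first use only, then record the klass for the field.
    void set_inline_layout_info_klass(std::size_t field_index, const void* klass) {
        assert(field_index < _java_fields_count);
        if (_inline_layout_info_array == nullptr) {
            _inline_layout_info_array =
                std::make_unique<std::vector<InlineLayoutInfoModel>>(_java_fields_count);
        }
        (*_inline_layout_info_array)[field_index].klass = klass;
    }

    bool has_array() const { return _inline_layout_info_array != nullptr; }

    const void* klass_at(std::size_t field_index) const {
        assert(has_array() && field_index < _java_fields_count);
        return (*_inline_layout_info_array)[field_index].klass;
    }

private:
    std::size_t _java_fields_count;
    std::unique_ptr<std::vector<InlineLayoutInfoModel>> _inline_layout_info_array;
};
```

A ParserModel that never sees a value class field never calls the setter, so has_array() stays false and nothing is allocated, which corresponds to scenario 1.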
Deallocate When No Fields Are Inlined
In scenario 2, fields are eligible for inlining, but we still need to determine whether any fields were actually inlined. The approach I went with is to inspect the layout selected for each field, and if any field uses a layout other than the default LayoutKind::REFERENCE, that field has been inlined and we should keep the array. Otherwise, there are no inlined fields and we deallocate the array since we know for certain that no inline layout metadata will be needed at runtime.
src/hotspot/share/classfile/fieldLayoutBuilder.cpp
void FieldLayoutBuilder::regular_field_sorting() {
  for (FieldInfo fieldinfo : *_field_info) {
    // ... (enclosing switch elided)
      case T_OBJECT:
      case T_ARRAY:
      {
        LayoutKind lk = field_layout_selection(fieldinfo, _inline_layout_info_array, true);
        if (lk == LayoutKind::REFERENCE) {
          // ...
        } else {
          // Any layout other than REFERENCE means the field is inlined
          _has_inlined_fields = true;
          // ...
        }
      }
  }
}
src/hotspot/share/classfile/classFileParser.cpp
FieldLayoutBuilder lb(/* ... */);
lb.build_layout();

// If it turned out that we didn't inline any of the fields, we deallocate
// the array of InlineLayoutInfo since it isn't needed, and so it isn't
// transferred to the allocated InstanceKlass.
if (_inline_layout_info_array != nullptr && !_layout_info->_has_inlined_fields) {
  MetadataFactory::free_array<InlineLayoutInfo>(_loader_data, _inline_layout_info_array);
  _inline_layout_info_array = nullptr;
}
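The two snippets above cooperate: the layout pass records whether any field ended up with a non-reference layout, and the parser then drops the array if none did. Here is a self-contained sketch of that handshake. The names are hypothetical, and FLAT is a placeholder standing in for whichever non-reference layout kinds the JVM defines.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model; FLAT is a placeholder for any non-reference layout kind.
enum class LayoutKindModel { REFERENCE, FLAT };

struct InlineLayoutInfoModel {
    LayoutKindModel kind = LayoutKindModel::REFERENCE;
};

using LayoutInfoArray = std::vector<InlineLayoutInfoModel>;

struct LayoutBuilderModel {
    bool has_inlined_fields = false;

    // Records the layout chosen for one field and remembers whether it was inlined.
    void record(LayoutInfoArray& entries, std::size_t field_index, LayoutKindModel kind) {
        assert(field_index < entries.size());
        entries[field_index].kind = kind;
        if (kind != LayoutKindModel::REFERENCE) {
            has_inlined_fields = true;
        }
    }
};

// After layout: free the array if nothing was inlined, so it is never handed on.
inline void release_if_unused(std::unique_ptr<LayoutInfoArray>& array,
                              const LayoutBuilderModel& builder) {
    if (array != nullptr && !builder.has_inlined_fields) {
        array.reset();
    }
}
```

Tracking a single boolean during the pass avoids a second scan over the array afterwards; the flag is already known by the time the decision has to be made.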
Practical Memory Improvements
Each InlineLayoutInfo is 16 bytes, and in the baseline every InstanceKlass carries an array with one entry per field. For classes that do not require this array, we therefore save 16 * (number of fields) bytes per class. We can observe these savings in practice using Java’s Native Memory Tracking (NMT).
The _inline_layout_info_array is allocated in Metaspace, so the relevant figures are in the Metadata section of the NMT report. Even in the simplest case, running java --version, we see savings of about 85 KB. The savings are most noticeable for programs that use few value classes, since most of their classes then need no array at all. Here we compare a baseline build (without the optimizations) to a target build (with the optimizations).
$ baseline/jdk/bin/java -XX:NativeMemoryTracking=summary \
-XX:+UnlockDiagnosticVMOptions \
-XX:+PrintNMTStatistics \
--enable-preview --version | grep "Metadata:" -A 2
( Metadata: )
( reserved=67108864, committed=7995392)
( used=7953120)
$ target/jdk/bin/java -XX:NativeMemoryTracking=summary \
-XX:+UnlockDiagnosticVMOptions \
-XX:+PrintNMTStatistics \
--enable-preview --version | grep "Metadata:" -A 2
( Metadata: )
( reserved=67108864, committed=7929856)
( used=7865848)
# 7953120 - 7865848 = 87272 bytes = 85.2 KB
To see the impact per class, we can add ad-hoc prints showing the total bytes allocated for InlineLayoutInfo arrays versus the bytes saved by not allocating them. In this example (java --version), we see that the savings are substantial. Actual reductions in real-world applications may vary depending on the number of value classes used.
Bytes allocated for InlineLayoutInfo arrays:
864
Bytes saved by not allocating InlineLayoutInfo arrays:
77008
Summary:
77008 / (77008 + 864) = 98.9% reduction
The memory savings reported by NMT are slightly larger than those from the ad-hoc prints. This difference likely results from Metaspace allocation granularity and alignment requirements, which can cause the JVM to allocate more memory than strictly needed.
Opportunities For Further Optimization
The current optimization prevents unnecessary allocations of _inline_layout_info_array, but there is room for additional memory savings. When the array is kept, it is still sized to hold an entry for every field in the class, regardless of how many fields are inlined. In classes with many fields of which only a few are inlined, memory is therefore wasted on unused entries.
Ideally, the array would allocate space only for the fields that are actually inlined. The trade-off is that memory savings must be balanced against the need for fast access in performance-critical (“hot”) paths. Whether this approach is feasible depends on how and where the data is accessed in the JVM.
Below are two potential areas for further optimization, which would require careful evaluation to ensure correctness and performance.
Repurposing Unused Padding in ResolvedFieldEntry
One target for improvement is the Interpreter, where field metadata is resolved and stored in ResolvedFieldEntry objects. Each ResolvedFieldEntry already contains information such as the field’s holder class. Currently, looking up the class of an inlined field requires accessing the corresponding index in the container’s _inline_layout_info_array. On 64-bit platforms, ResolvedFieldEntry contains unused padding that could potentially be repurposed to store an index into a sparsely allocated _inline_layout_info_array, reducing the need for repeated lookups.
class ResolvedFieldEntry {
  friend class VMStructs;

  InstanceKlass* _field_holder; // Field holder klass
  int _field_offset;            // Field offset in bytes
  u2 _field_index;              // Index into field information in holder InstanceKlass
  u2 _cpool_index;              // Constant pool index
  u1 _tos_state;                // TOS state
  u1 _flags;                    // [000|has_null_marker|is_null_free_inline_type|is_flat|is_final|is_volatile]
  u1 _get_code, _put_code;      // Get and Put bytecodes for the field
#ifdef _LP64
  u4 _padding;
#endif
};
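To get a feel for what repurposing the padding could look like, the sketch below models the entry with fixed-width types and carves a 16-bit index out of the 4 padding bytes. This is speculative: the member inline_layout_index, the sentinel value, and the helper functions are all assumptions made for illustration, and the model always includes the padding, as on a 64-bit build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical copies of the entry layout using fixed-width types; not the HotSpot class.
struct EntryWithPadding {
    void*         field_holder;
    std::int32_t  field_offset;
    std::uint16_t field_index;
    std::uint16_t cpool_index;
    std::uint8_t  tos_state;
    std::uint8_t  flags;
    std::uint8_t  get_code;
    std::uint8_t  put_code;
    std::uint32_t padding;              // unused
};

struct EntryWithInlineIndex {
    void*         field_holder;
    std::int32_t  field_offset;
    std::uint16_t field_index;
    std::uint16_t cpool_index;
    std::uint8_t  tos_state;
    std::uint8_t  flags;
    std::uint8_t  get_code;
    std::uint8_t  put_code;
    std::uint16_t inline_layout_index;  // hypothetical: index into a sparse array
    std::uint16_t remaining_padding;
};

// The extra index must not make the entry any larger.
static_assert(sizeof(EntryWithPadding) == sizeof(EntryWithInlineIndex),
              "index must fit in the existing padding");

// Sentinel for "this field is not inlined"; name and value are assumptions.
constexpr std::uint16_t NO_INLINE_LAYOUT_INDEX = UINT16_MAX;

inline bool has_inline_layout(const EntryWithInlineIndex& entry) {
    return entry.inline_layout_index != NO_INLINE_LAYOUT_INDEX;
}

inline void set_inline_layout_index(EntryWithInlineIndex& entry, std::size_t index) {
    assert(index < NO_INLINE_LAYOUT_INDEX);  // must fit and must not collide with the sentinel
    entry.inline_layout_index = static_cast<std::uint16_t>(index);
}
```

A u2-sized index matches the width of _field_index, so it can address at least as many inlined fields as a class can declare, minus the one value reserved here as a sentinel.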
Computing the Inlined Field Index On-Demand
Another option is to compute the index of an inlined field on demand while keeping performance acceptable. To access the Klass of an inlined field, we reference the container’s _inline_layout_info_array. The index for an inlined field is computed by scanning a compressed array of field metadata in the container Klass, iterating through entries until we find one whose name and signature match the target field.
While iterating, we can add a counter that is incremented for each inlined field encountered. When the target field is found, the counter gives its inlined field index, which is used to access a sparsely allocated _inline_layout_info_array containing entries only for the inlined fields. This approach reduces memory waste while maintaining efficient lookups, since array entries exist only for inlined fields present in the class. There may be additional subtleties to consider, but this approach is an interesting avenue for further investigation.
// Iterate over only the Java fields
class JavaFieldStream : public FieldStreamBase {
  // ...
  // Performs either a linear search or binary search through the stream
  // looking for a matching name/signature combo
  bool lookup(const Symbol* name, const Symbol* signature);
};

bool InstanceKlass::find_local_field(Symbol* name, Symbol* sig, fieldDescriptor* fd) const {
  JavaFieldStream fs(this);
  if (fs.lookup(name, sig)) {
    assert(fs.name() == name, "name must match");
    assert(fs.signature() == sig, "signature must match");
    fd->reinitialize(const_cast<InstanceKlass*>(this), fs.to_FieldInfo());
    return true;
  }
  return false;
}
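The counting idea can be sketched independently of the real field stream. The model below uses a plain vector of field records and a linear scan; the real stream is a compressed encoding, and the existing lookup may use binary search, in which case a running counter would not work as-is. FieldModel, sparse_inline_index, and NO_SPARSE_INDEX are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified field record; not the HotSpot field encoding.
struct FieldModel {
    std::string name;
    std::string signature;
    bool        is_inlined;
};

// Returned when the field is missing or is not inlined.
constexpr int NO_SPARSE_INDEX = -1;

// Linear scan that returns the target's position among the inlined fields only,
// i.e. its index into a sparse array holding one entry per inlined field.
inline int sparse_inline_index(const std::vector<FieldModel>& fields,
                               const std::string& name,
                               const std::string& signature) {
    int inlined_seen = 0;
    for (const FieldModel& field : fields) {
        if (field.name == name && field.signature == signature) {
            return field.is_inlined ? inlined_seen : NO_SPARSE_INDEX;
        }
        if (field.is_inlined) {
            ++inlined_seen;
        }
    }
    return NO_SPARSE_INDEX;
}

// Fetches the sparse entry for a field that is known to be inlined.
inline const std::string& inline_klass_name(const std::vector<FieldModel>& fields,
                                            const std::vector<std::string>& sparse_klass_names,
                                            const std::string& name,
                                            const std::string& signature) {
    const int index = sparse_inline_index(fields, name, signature);
    assert(index >= 0 && static_cast<std::size_t>(index) < sparse_klass_names.size());
    return sparse_klass_names[static_cast<std::size_t>(index)];
}
```

For a class with four fields of which the second and fourth are inlined, the sparse array would hold two entries instead of four, and the scan would map those fields to indices 0 and 1.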
Closing Thoughts
Knowing how far to push optimizations can be tricky. Further optimizations could eventually require sacrificing performance, making the trade-offs harder to quantify. The approach presented here for allocating and deallocating _inline_layout_info_array strikes a practical balance between memory efficiency and implementation complexity.