Status: PARTIALLY IMPLEMENTED - A runtime global inline cache (4096 entries in RuntimeCode.java) is implemented using callsite IDs allocated at compile time. The per-call-site static field approach and INVOKEDYNAMIC variant described below remain unimplemented design alternatives.
PerlOnJava compiles Perl code to JVM bytecode using the ASM library. When you write:
```perl
$object->method($arg)
```

PerlOnJava currently generates bytecode equivalent to:
```java
// Current implementation (slow path always)
RuntimeCode.call(object, method, __SUB__, args[], context);
```

This calls RuntimeCode.call(), which:

- Resolves the method through InheritanceResolver.findMethodInHierarchy()
- Creates a RuntimeArray for @_
- Invokes the MethodHandle

Every single call pays the full method resolution cost (~100ns).
Store the resolved method in a static field of the generated class, along with the blessId it was resolved for. On subsequent calls, check if the object has the same blessId - if so, invoke the cached method directly.
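The guard-and-fill discipline can be modeled in plain Java. This is an illustrative sketch, not PerlOnJava code: `callSite`, `resolve`, and `resolveCount` are hypothetical stand-ins for the generated guard, the inheritance resolver, and a resolution counter.

```java
// Plain-Java model of a monomorphic inline cache for one call site.
public class MonomorphicCacheDemo {
    // Per-call-site cache: the blessId we resolved for, and the resolved target.
    private static int cachedBlessId = 0;
    private static Runnable cachedMethod = null;
    static int resolveCount = 0; // counts slow-path resolutions

    // Hypothetical stand-in for InheritanceResolver.findMethodInHierarchy().
    static Runnable resolve(int blessId) {
        resolveCount++;
        return () -> {};
    }

    static void callSite(int blessId) {
        if (blessId != 0 && blessId == cachedBlessId && cachedMethod != null) {
            cachedMethod.run();                 // FAST PATH: cache hit
        } else {
            Runnable m = resolve(blessId);      // SLOW PATH: resolve, then cache
            cachedMethod = m;                   // store method before blessId
            cachedBlessId = blessId;
            m.run();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) callSite(7);  // same class every time
        System.out.println(resolveCount);            // resolved only once
    }
}
```

A thousand calls through the same class trigger exactly one slow-path resolution; every later call takes the fast path.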
For each method call site, PerlOnJava would generate additional static fields and guard logic:
```java
// Generated class (conceptual)
public class org/perlonjava/anon123 {
    // NEW: Cache fields for each method call site
    private static int cache_site_1_blessId = 0;
    private static MethodHandle cache_site_1_method = null;

    public RuntimeList apply(RuntimeArray args, int ctx) {
        // Original code: $o->add(1)
        RuntimeArray a = new RuntimeArray();
        a.elements.add(object);                          // $self
        a.elements.add(arg);

        // NEW: Inline cache check
        int objectBlessId = object.blessedId();          // ~3ns
        if (objectBlessId != 0 &&                        // Unblessed objects skip the cache
            objectBlessId == cache_site_1_blessId &&     // Cache hit check ~2ns
            cache_site_1_method != null) {
            // FAST PATH: Direct method invocation (~10ns)
            return (RuntimeList) cache_site_1_method.invoke(a, ctx);
        } else {
            // SLOW PATH: Resolve and cache (~150ns first time, then fast)
            RuntimeScalar resolvedMethod =
                InheritanceResolver.findMethodInHierarchy("add", ...);
            cache_site_1_method = ((RuntimeCode) resolvedMethod.value).methodHandle;
            cache_site_1_blessId = objectBlessId;
            return RuntimeCode.apply(resolvedMethod, a, ctx);
        }
    }
}
```

Modify handleArrowOperator() to emit the cache:
```java
// In Dereference.java:528-680
static void handleArrowOperator(EmitterVisitor emitterVisitor, BinaryOperatorNode node) {
    MethodVisitor mv = emitterVisitor.ctx.mv;
    JavaClassInfo classInfo = emitterVisitor.ctx.javaClassInfo;

    // NEW: Allocate cache fields for this call site
    int cacheId = classInfo.nextCacheId++;
    String blessIdField = "cache_" + cacheId + "_blessId";
    String methodField = "cache_" + cacheId + "_method";
    // Add static fields to the class (done during class creation)
    classInfo.addCacheFields(blessIdField, methodField);

    // Generate object and method name evaluation (existing code)
    object.accept(scalarVisitor);
    method.accept(scalarVisitor);

    // NEW: Emit inline cache guard
    // Stack: [object, method, ...]
    // Get object.blessedId()
    mv.visitInsn(Opcodes.DUP);                    // Duplicate object
    mv.visitMethodInsn(Opcodes.INVOKESTATIC,
        "org/perlonjava/runtimetypes/RuntimeScalar",
        "blessedId",
        "(Lorg/perlonjava/runtimetypes/RuntimeScalar;)I",
        false);
    // Stack: [object, method, ..., blessId]

    // Load cached blessId
    mv.visitFieldInsn(Opcodes.GETSTATIC,
        classInfo.javaClassName,
        blessIdField,
        "I");
    // Stack: [object, method, ..., blessId, cachedBlessId]

    // Compare: a mismatch branches to the slow path. An unblessed object
    // (blessId == 0) can only match an empty cache, and the null check
    // below sends that case to the slow path as well.
    Label slowPath = new Label();
    mv.visitJumpInsn(Opcodes.IF_ICMPNE, slowPath);

    // Load cached method and null-check it
    mv.visitFieldInsn(Opcodes.GETSTATIC,
        classInfo.javaClassName,
        methodField,
        "Ljava/lang/invoke/MethodHandle;");
    mv.visitInsn(Opcodes.DUP);
    Label methodIsNull = new Label();
    mv.visitJumpInsn(Opcodes.IFNULL, methodIsNull);  // Not cached -> slow path

    // FAST PATH: Direct invocation (cached MethodHandle is on the stack)
    // ... emit direct MethodHandle.invoke(args, ctx) ...
    Label end = new Label();
    mv.visitJumpInsn(Opcodes.GOTO, end);

    // SLOW PATH: Full resolution
    mv.visitLabel(methodIsNull);
    mv.visitInsn(Opcodes.POP);   // Discard the leftover null handle so both
    mv.visitLabel(slowPath);     // slow-path entries have the same stack shape
    // ... emit RuntimeCode.call() ...
    // ... update cache fields ...

    mv.visitLabel(end);
}
```

PerlOnJava generates one class per script/eval. Static fields are allocated once and accessed in ~2ns via GETSTATIC.
RuntimeCode already stores MethodHandle methodHandle for each compiled subroutine. We just cache a reference to it.
NameNormalizer.getBlessId() assigns each blessed class a unique integer ID that never changes. Perfect for caching.
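To illustrate why stable integer IDs are cache-friendly, here is a hypothetical reimplementation of a name-to-ID mapping in the spirit of NameNormalizer.getBlessId (the real implementation may differ):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: each class name maps to one unique int, forever.
public class BlessIdRegistry {
    private static final AtomicInteger nextId = new AtomicInteger(1); // 0 = unblessed
    private static final ConcurrentHashMap<String, Integer> ids = new ConcurrentHashMap<>();

    static int getBlessId(String className) {
        // computeIfAbsent assigns each class name exactly one id
        return ids.computeIfAbsent(className, k -> nextId.getAndIncrement());
    }

    public static void main(String[] args) {
        int a  = getBlessId("Foo");
        int b  = getBlessId("Bar");
        int a2 = getBlessId("Foo");
        System.out.println(a == a2);  // stable: same name, same id
        System.out.println(a != b);   // unique: different names, different ids
    }
}
```

Because the ID never changes for a given class, a single int comparison is a sound cache guard; no string comparison or hash lookup is needed on the fast path.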
PerlOnJava uses ASM for bytecode generation. Adding fields and conditional jumps is straightforward:
- ClassWriter.visitField() - adds static fields
- MethodVisitor.visitJumpInsn() - emits guards
- MethodVisitor.visitFieldInsn() - accesses the cache
InheritanceResolver.invalidateCache() is already called when @ISA changes. We can extend it to clear inline caches:
```java
// In InheritanceResolver.java
public static void invalidateCache() {
    methodCache.clear();
    linearizedClassesCache.clear();
    // NEW: Invalidate inline caches
    InlineCacheRegistry.clearAll();
}
```

Total per method call: ~350ns:
- Method resolution: 100ns (InheritanceResolver.findMethodInHierarchy)
- Array creation: 50ns
- MethodHandle invoke: 200ns
FIRST CALL (cache miss):
- blessId extraction: 3ns
- Cache check: 2ns (miss)
- Slow path (same as before): 350ns
Total: 355ns
SUBSEQUENT CALLS (cache hit - 99.9% of calls):
- blessId extraction: 3ns
- Cache check: 2ns (hit)
- Direct MethodHandle invoke: 140ns
Total: 145ns
Speedup: 350ns → 145ns = 2.4x faster!
Problem: What if the same call site sees different classes?
Solution: Start with monomorphic cache (1 class). If we see a different blessId, update the cache (replacing the old one). For truly polymorphic sites (rare), the cache will thrash but won't break - it just falls back to slow path.
Future: Implement megamorphic cache (small array of blessId→method pairs).
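A megamorphic cache along those lines can be sketched in plain Java. All names here are hypothetical; the real generated code would inline this logic per call site:

```java
// Hypothetical 2-way polymorphic inline cache for one call site.
public class PolymorphicCacheDemo {
    static final int WAYS = 2;
    static int[] ids = new int[WAYS];             // cached blessIds
    static Runnable[] targets = new Runnable[WAYS];
    static int next = 0;                          // round-robin victim slot
    static int resolveCount = 0;

    // Stand-in for full method resolution.
    static Runnable resolve(int blessId) {
        resolveCount++;
        return () -> {};
    }

    static void callSite(int blessId) {
        for (int i = 0; i < WAYS; i++) {
            if (targets[i] != null && ids[i] == blessId) {
                targets[i].run();                 // hit in one of the cached ways
                return;
            }
        }
        Runnable m = resolve(blessId);            // miss: resolve, evict round-robin
        targets[next] = m;
        ids[next] = blessId;
        next = (next + 1) % WAYS;
        m.run();
    }

    public static void main(String[] args) {
        // Two classes alternating at the same site: both fit in the cache.
        for (int i = 0; i < 100; i++) { callSite(1); callSite(2); }
        System.out.println(resolveCount);         // resolved once per class
    }
}
```

With a monomorphic cache this alternating pattern would thrash (200 resolutions); the 2-way cache resolves each class once.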
Problem: What if someone redefines a method at runtime?
Solution: InheritanceResolver.invalidateCache() already handles this. Extend it to zero out all inline cache fields:
```java
public class InlineCacheRegistry {
    private static final List<InlineCacheInvalidator> invalidators = new ArrayList<>();

    public static void register(InlineCacheInvalidator invalidator) {
        invalidators.add(invalidator);
    }

    public static void clearAll() {
        for (InlineCacheInvalidator inv : invalidators) {
            inv.clear();
        }
    }
}

// Generated class
public class org/perlonjava/anon123 {
    static {
        InlineCacheRegistry.register(() -> {
            cache_site_1_blessId = 0;
            cache_site_1_method = null;
        });
    }
}
```

Problem: Each call site adds 2 static fields (8 bytes each).
Solution:
- Only cache hot call sites (e.g., inside loops)
- One script might have 100 call sites = 1.6KB overhead
- This is negligible compared to the ~100KB of bytecode per script
Problem: Multiple threads accessing/updating cache?
Solution:
- Reads of an int or a reference are atomic, so a reader never sees a torn value
- Racing writes could, however, pair one class's blessId with another class's cached method; writing the method field before the blessId field closes that window
- For cross-thread visibility and ordering, declare the fields volatile:

```java
private static volatile int cache_site_1_blessId = 0;
private static volatile MethodHandle cache_site_1_method = null;
```
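A sketch of the publication order, assuming volatile fields. The field names mirror the conceptual generated class; `publish` and `hello` are hypothetical helpers:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch: publish the method handle BEFORE the blessId so a reader that
// observes the new blessId also observes the matching handle.
public class CachePublication {
    private static volatile MethodHandle cache_site_1_method = null;
    private static volatile int cache_site_1_blessId = 0;

    static String hello() { return "hit"; }

    static void publish(int blessId, MethodHandle resolved) {
        // Store order matters: handle first, then blessId. Volatile writes
        // cannot be reordered past each other, so the guard below is safe.
        cache_site_1_method = resolved;
        cache_site_1_blessId = blessId;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle h = MethodHandles.lookup().findStatic(
            CachePublication.class, "hello", MethodType.methodType(String.class));
        publish(42, h);
        // Reader side: check the blessId guard first, then use the handle.
        if (cache_site_1_blessId == 42 && cache_site_1_method != null) {
            System.out.println((String) cache_site_1_method.invokeExact());
        }
    }
}
```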
For maximum performance, use Java 7's INVOKEDYNAMIC:
```java
// Bootstrap method
public static CallSite bootstrap(
        MethodHandles.Lookup lookup,
        String name,
        MethodType type,
        int blessId,
        String methodName
) {
    // Resolve method for this blessId
    RuntimeScalar method = InheritanceResolver.findMethodInHierarchy(
        methodName, NameNormalizer.getBlessStr(blessId), null, 0);
    MethodHandle target = ((RuntimeCode) method.value).methodHandle;

    // Create a SwitchPoint for invalidation
    SwitchPoint switchPoint = new SwitchPoint();
    MethodHandle guarded = switchPoint.guardWithTest(
        target,
        // Fallback to re-resolve if invalidated
        MethodHandles.insertArguments(
            RELINK_METHOD, 0, lookup, name, type, blessId, methodName)
    );
    return new MutableCallSite(guarded);
}

// Generated bytecode for $obj->add(1)
INVOKEDYNAMIC add(
    Lorg/perlonjava/runtime/RuntimeScalar;
    Lorg/perlonjava/runtime/RuntimeScalar;
)Lorg/perlonjava/runtime/RuntimeList; [
    org/perlonjava/runtime/InlineCacheBootstrap.bootstrap(
        Ljava/lang/invoke/MethodHandles$Lookup;
        Ljava/lang/String;
        Ljava/lang/invoke/MethodType;
        I                  // blessId
        Ljava/lang/String; // method name
    )Ljava/lang/invoke/CallSite;,
    0,     // initial blessId (resolved at runtime)
    "add"  // method name
]
```

Advantages:
- JVM optimizes INVOKEDYNAMIC heavily
- Built-in guard check optimization
- SwitchPoint enables efficient invalidation
Disadvantages:
- More complex to implement
- Requires Java 7+ (moot, since PerlOnJava already requires Java 21)
- Harder to debug
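The SwitchPoint mechanism itself is standard JDK API and can be demonstrated standalone; `fast` and `slow` below are placeholders for the resolved target and the re-resolution fallback:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Minimal demo of SwitchPoint-guarded dispatch and bulk invalidation.
public class SwitchPointDemo {
    static String fast() { return "cached"; }       // resolved target
    static String slow() { return "re-resolved"; }  // fallback after invalidation

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType mt = MethodType.methodType(String.class);
        MethodHandle target   = lookup.findStatic(SwitchPointDemo.class, "fast", mt);
        MethodHandle fallback = lookup.findStatic(SwitchPointDemo.class, "slow", mt);

        SwitchPoint sp = new SwitchPoint();
        MethodHandle guarded = sp.guardWithTest(target, fallback);

        System.out.println((String) guarded.invokeExact()); // hits the target

        // e.g. @ISA changed: invalidate every call site guarded by sp at once
        SwitchPoint.invalidateAll(new SwitchPoint[]{ sp });

        System.out.println((String) guarded.invokeExact()); // now takes the fallback
    }
}
```

Invalidation is a one-way, all-sites operation, which is exactly the semantics @ISA and method redefinition need: the JVM deoptimizes every guarded call site, and the fallback re-resolves.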
- test/inline_cache_basic.t: Verify cache hits work
- test/inline_cache_invalidation.t: Test @ISA changes
- test/inline_cache_polymorphic.t: Multiple classes at same site
- test/inline_cache_threading.t: Concurrent access
```perl
# bench_inline_cache.pl
use Benchmark;

package Foo;
sub new { bless { x => 1 }, shift }
sub add { $_[0]{x} += $_[1] }

package main;  # timethis() is imported into main
my $o = Foo->new();
timethis(10_000_000, sub { $o->add(1) });

# Expected:
#   Before: ~350ns/call = 2.86M calls/sec
#   After:  ~145ns/call = 6.90M calls/sec
```

- Add JavaClassInfo.nextCacheId counter
- Add JavaClassInfo.addCacheFields() method
- Create InlineCacheRegistry for invalidation
- Modify Dereference.handleArrowOperator() to emit cache checks
- Emit monomorphic inline cache (1 class per site)
- Hook into InheritanceResolver.invalidateCache()
- Write correctness tests
- Benchmark with bench_method.pl
- Verify no regressions in existing tests
- Implement megamorphic cache (2-4 classes per site)
- Use INVOKEDYNAMIC for even better performance
- Profile-guided cache placement (only cache hot sites)
Conservative estimate (static field cache):
- Monomorphic call sites: 2.4x speedup
- bench_method.pl: 119 → 240 iter/sec

Optimistic estimate (INVOKEDYNAMIC):
- Monomorphic call sites: 3.0x speedup
- bench_method.pl: 119 → 300 iter/sec
Combined with Phase 2 (hash access optimization), this gets us to the 340 iter/sec target!
Inline caching is highly feasible in PerlOnJava because:
- ✅ Static fields for caching (2ns access)
- ✅ MethodHandle infrastructure already exists
- ✅ blessId provides stable identity
- ✅ ASM library supports bytecode generation
- ✅ Invalidation hooks already in place
The implementation builds on existing infrastructure and requires no API changes - purely a bytecode generation optimization.