Status: PARTIALLY IMPLEMENTED - A runtime global inline cache (4096 entries in RuntimeCode.java) is implemented using callsite IDs allocated at compile time. The per-call-site static field approach and INVOKEDYNAMIC variant described below remain unimplemented design alternatives.
PerlOnJava compiles Perl code to JVM bytecode using the ASM library. When you write:
```perl
$object->method($arg)
```

PerlOnJava currently generates bytecode equivalent to:
```java
// Current implementation (slow path always)
RuntimeCode.call(object, method, __SUB__, args[], context);
```

This calls RuntimeCode.call(), which:

- Resolves the method through InheritanceResolver.findMethodInHierarchy()
- Creates a RuntimeArray for @_
- Invokes the MethodHandle

Every single call pays the full method resolution cost (~100ns).
Store the resolved method in a static field of the generated class, along with the blessId it was resolved for. On subsequent calls, check if the object has the same blessId - if so, invoke the cached method directly.
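The guard-and-fill discipline can be modeled in plain Java. This is an illustrative sketch, not PerlOnJava code: `callSite`, `resolve`, and `resolveCount` are hypothetical stand-ins for the generated guard, the inheritance resolver, and a resolution counter.

```java
// Plain-Java model of a monomorphic inline cache for one call site.
public class MonomorphicCacheDemo {
    // Per-call-site cache: the blessId we resolved for, and the resolved target.
    private static int cachedBlessId = 0;
    private static Runnable cachedMethod = null;
    static int resolveCount = 0; // counts slow-path resolutions

    // Hypothetical stand-in for InheritanceResolver.findMethodInHierarchy().
    static Runnable resolve(int blessId) {
        resolveCount++;
        return () -> {};
    }

    static void callSite(int blessId) {
        if (blessId != 0 && blessId == cachedBlessId && cachedMethod != null) {
            cachedMethod.run();                 // FAST PATH: cache hit
        } else {
            Runnable m = resolve(blessId);      // SLOW PATH: resolve, then cache
            cachedMethod = m;                   // store method before blessId
            cachedBlessId = blessId;
            m.run();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) callSite(7);  // same class every time
        System.out.println(resolveCount);            // resolved only once
    }
}
```

A thousand calls through the same class trigger exactly one slow-path resolution; every later call takes the fast path.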
For each method call site, PerlOnJava would generate additional static fields and guard logic:
```java
// Generated class (conceptual)
public class org/perlonjava/anon123 {
    // NEW: Cache fields for each method call site
    private static int cache_site_1_blessId = 0;
    private static MethodHandle cache_site_1_method = null;

    public RuntimeList apply(RuntimeArray args, int ctx) {
        // Original code: $o->add(1)
        RuntimeArray a = new RuntimeArray();
        a.elements.add(object);                          // $self
        a.elements.add(arg);

        // NEW: Inline cache check
        int objectBlessId = object.blessedId();          // ~3ns
        if (objectBlessId != 0 &&                        // Unblessed objects skip the cache
            objectBlessId == cache_site_1_blessId &&     // Cache hit check ~2ns
            cache_site_1_method != null) {
            // FAST PATH: Direct method invocation (~10ns)
            return (RuntimeList) cache_site_1_method.invoke(a, ctx);
        } else {
            // SLOW PATH: Resolve and cache (~150ns first time, then fast)
            RuntimeScalar resolvedMethod =
                InheritanceResolver.findMethodInHierarchy("add", ...);
            cache_site_1_method = ((RuntimeCode) resolvedMethod.value).methodHandle;
            cache_site_1_blessId = objectBlessId;
            return RuntimeCode.apply(resolvedMethod, a, ctx);
        }
    }
}
```

Modify handleArrowOperator() to emit the cache:
```java
// In Dereference.java:528-680
static void handleArrowOperator(EmitterVisitor emitterVisitor, BinaryOperatorNode node) {
    MethodVisitor mv = emitterVisitor.ctx.mv;
    JavaClassInfo classInfo = emitterVisitor.ctx.javaClassInfo;

    // NEW: Allocate cache fields for this call site
    int cacheId = classInfo.nextCacheId++;
    String blessIdField = "cache_" + cacheId + "_blessId";
    String methodField = "cache_" + cacheId + "_method";
    // Add static fields to the class (done during class creation)
    classInfo.addCacheFields(blessIdField, methodField);

    // Generate object and method name evaluation (existing code)
    object.accept(scalarVisitor);
    method.accept(scalarVisitor);

    // NEW: Emit inline cache guard
    // Stack: [object, method, ...]
    // Get object.blessedId()
    mv.visitInsn(Opcodes.DUP);                    // Duplicate object
    mv.visitMethodInsn(Opcodes.INVOKESTATIC,
        "org/perlonjava/runtimetypes/RuntimeScalar",
        "blessedId",
        "(Lorg/perlonjava/runtimetypes/RuntimeScalar;)I",
        false);
    // Stack: [object, method, ..., blessId]

    // Load cached blessId
    mv.visitFieldInsn(Opcodes.GETSTATIC,
        classInfo.javaClassName,
        blessIdField,
        "I");
    // Stack: [object, method, ..., blessId, cachedBlessId]

    // Compare: a mismatch branches to the slow path. An unblessed object
    // (blessId == 0) can only match an empty cache, and the null check
    // below sends that case to the slow path as well.
    Label slowPath = new Label();
    mv.visitJumpInsn(Opcodes.IF_ICMPNE, slowPath);

    // Load cached method and null-check it
    mv.visitFieldInsn(Opcodes.GETSTATIC,
        classInfo.javaClassName,
        methodField,
        "Ljava/lang/invoke/MethodHandle;");
    mv.visitInsn(Opcodes.DUP);
    Label methodIsNull = new Label();
    mv.visitJumpInsn(Opcodes.IFNULL, methodIsNull);  // Not cached -> slow path

    // FAST PATH: Direct invocation (cached MethodHandle is on the stack)
    // ... emit direct MethodHandle.invoke(args, ctx) ...
    Label end = new Label();
    mv.visitJumpInsn(Opcodes.GOTO, end);

    // SLOW PATH: Full resolution
    mv.visitLabel(methodIsNull);
    mv.visitInsn(Opcodes.POP);   // Discard the leftover null handle so both
    mv.visitLabel(slowPath);     // slow-path entries have the same stack shape
    // ... emit RuntimeCode.call() ...
    // ... update cache fields ...

    mv.visitLabel(end);
}
```

PerlOnJava generates one class per script/eval. Static fields are allocated once and accessed in ~2ns via GETSTATIC.
RuntimeCode already stores MethodHandle methodHandle for each compiled subroutine. We just cache a reference to it.
NameNormalizer.getBlessId() assigns each blessed class a unique integer ID that never changes. Perfect for caching.
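To illustrate why stable integer IDs are cache-friendly, here is a hypothetical reimplementation of a name-to-ID mapping in the spirit of NameNormalizer.getBlessId (the real implementation may differ):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: each class name maps to one unique int, forever.
public class BlessIdRegistry {
    private static final AtomicInteger nextId = new AtomicInteger(1); // 0 = unblessed
    private static final ConcurrentHashMap<String, Integer> ids = new ConcurrentHashMap<>();

    static int getBlessId(String className) {
        // computeIfAbsent assigns each class name exactly one id
        return ids.computeIfAbsent(className, k -> nextId.getAndIncrement());
    }

    public static void main(String[] args) {
        int a  = getBlessId("Foo");
        int b  = getBlessId("Bar");
        int a2 = getBlessId("Foo");
        System.out.println(a == a2);  // stable: same name, same id
        System.out.println(a != b);   // unique: different names, different ids
    }
}
```

Because the ID never changes for a given class, a single int comparison is a sound cache guard; no string comparison or hash lookup is needed on the fast path.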
PerlOnJava uses ASM for bytecode generation. Adding fields and conditional jumps is straightforward:
- ClassWriter.visitField() - adds static fields
- MethodVisitor.visitJumpInsn() - emits guards
- MethodVisitor.visitFieldInsn() - accesses the cache
InheritanceResolver.invalidateCache() is already called when @ISA changes. We can extend it to clear inline caches:
```java
// In InheritanceResolver.java
public static void invalidateCache() {
    methodCache.clear();
    linearizedClassesCache.clear();
    // NEW: Invalidate inline caches
    InlineCacheRegistry.clearAll();
}
```

Total per method call: ~350ns:
- Method resolution: 100ns (InheritanceResolver.findMethodInHierarchy)
- Array creation: 50ns
- MethodHandle invoke: 200ns
FIRST CALL (cache miss):
- blessId extraction: 3ns
- Cache check: 2ns (miss)
- Slow path (same as before): 350ns
Total: 355ns
SUBSEQUENT CALLS (cache hit - 99.9% of calls):
- blessId extraction: 3ns
- Cache check: 2ns (hit)
- Direct MethodHandle invoke: 140ns
Total: 145ns
Speedup: 350ns → 145ns = 2.4x faster!
Problem: What if the same call site sees different classes?
Solution: Start with monomorphic cache (1 class). If we see a different blessId, update the cache (replacing the old one). For truly polymorphic sites (rare), the cache will thrash but won't break - it just falls back to slow path.
Future: Implement megamorphic cache (small array of blessId→method pairs).
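A megamorphic cache along those lines can be sketched in plain Java. All names here are hypothetical; the real generated code would inline this logic per call site:

```java
// Hypothetical 2-way polymorphic inline cache for one call site.
public class PolymorphicCacheDemo {
    static final int WAYS = 2;
    static int[] ids = new int[WAYS];             // cached blessIds
    static Runnable[] targets = new Runnable[WAYS];
    static int next = 0;                          // round-robin victim slot
    static int resolveCount = 0;

    // Stand-in for full method resolution.
    static Runnable resolve(int blessId) {
        resolveCount++;
        return () -> {};
    }

    static void callSite(int blessId) {
        for (int i = 0; i < WAYS; i++) {
            if (targets[i] != null && ids[i] == blessId) {
                targets[i].run();                 // hit in one of the cached ways
                return;
            }
        }
        Runnable m = resolve(blessId);            // miss: resolve, evict round-robin
        targets[next] = m;
        ids[next] = blessId;
        next = (next + 1) % WAYS;
        m.run();
    }

    public static void main(String[] args) {
        // Two classes alternating at the same site: both fit in the cache.
        for (int i = 0; i < 100; i++) { callSite(1); callSite(2); }
        System.out.println(resolveCount);         // resolved once per class
    }
}
```

With a monomorphic cache this alternating pattern would thrash (200 resolutions); the 2-way cache resolves each class once.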
Problem: What if someone redefines a method at runtime?
Solution: InheritanceResolver.invalidateCache() already handles this. Extend it to zero out all inline cache fields:
```java
public class InlineCacheRegistry {
    private static final List<InlineCacheInvalidator> invalidators = new ArrayList<>();

    public static void register(InlineCacheInvalidator invalidator) {
        invalidators.add(invalidator);
    }

    public static void clearAll() {
        for (InlineCacheInvalidator inv : invalidators) {
            inv.clear();
        }
    }
}

// Generated class
public class org/perlonjava/anon123 {
    static {
        InlineCacheRegistry.register(() -> {
            cache_site_1_blessId = 0;
            cache_site_1_method = null;
        });
    }
}
```

Problem: Each call site adds 2 static fields (8 bytes each).
Solution:
- Only cache hot call sites (e.g., inside loops)
- One script might have 100 call sites = 1.6KB overhead
- This is negligible compared to the ~100KB of bytecode per script
Problem: Multiple threads accessing/updating cache?
Solution:
- Reads of an int or a reference are atomic, so a reader never sees a torn value
- Racing writes could, however, pair one class's blessId with another class's cached method; writing the method field before the blessId field closes that window
- For cross-thread visibility and ordering, declare the fields volatile:

```java
private static volatile int cache_site_1_blessId = 0;
private static volatile MethodHandle cache_site_1_method = null;
```
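A sketch of the publication order, assuming volatile fields. The field names mirror the conceptual generated class; `publish` and `hello` are hypothetical helpers:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch: publish the method handle BEFORE the blessId so a reader that
// observes the new blessId also observes the matching handle.
public class CachePublication {
    private static volatile MethodHandle cache_site_1_method = null;
    private static volatile int cache_site_1_blessId = 0;

    static String hello() { return "hit"; }

    static void publish(int blessId, MethodHandle resolved) {
        // Store order matters: handle first, then blessId. Volatile writes
        // cannot be reordered past each other, so the guard below is safe.
        cache_site_1_method = resolved;
        cache_site_1_blessId = blessId;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle h = MethodHandles.lookup().findStatic(
            CachePublication.class, "hello", MethodType.methodType(String.class));
        publish(42, h);
        // Reader side: check the blessId guard first, then use the handle.
        if (cache_site_1_blessId == 42 && cache_site_1_method != null) {
            System.out.println((String) cache_site_1_method.invokeExact());
        }
    }
}
```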
For maximum performance, use Java 7's INVOKEDYNAMIC:
```java
// Bootstrap method
public static CallSite bootstrap(
        MethodHandles.Lookup lookup,
        String name,
        MethodType type,
        int blessId,
        String methodName
) {
    // Resolve method for this blessId
    RuntimeScalar method = InheritanceResolver.findMethodInHierarchy(
        methodName, NameNormalizer.getBlessStr(blessId), null, 0);
    MethodHandle target = ((RuntimeCode) method.value).methodHandle;

    // Create a SwitchPoint for invalidation
    SwitchPoint switchPoint = new SwitchPoint();
    MethodHandle guarded = switchPoint.guardWithTest(
        target,
        // Fallback to re-resolve if invalidated
        MethodHandles.insertArguments(
            RELINK_METHOD, 0, lookup, name, type, blessId, methodName)
    );
    return new MutableCallSite(guarded);
}

// Generated bytecode for $obj->add(1)
INVOKEDYNAMIC add(
    Lorg/perlonjava/runtime/RuntimeScalar;
    Lorg/perlonjava/runtime/RuntimeScalar;
)Lorg/perlonjava/runtime/RuntimeList; [
    org/perlonjava/runtime/InlineCacheBootstrap.bootstrap(
        Ljava/lang/invoke/MethodHandles$Lookup;
        Ljava/lang/String;
        Ljava/lang/invoke/MethodType;
        I                  // blessId
        Ljava/lang/String; // method name
    )Ljava/lang/invoke/CallSite;,
    0,     // initial blessId (resolved at runtime)
    "add"  // method name
]
```

Advantages:
- JVM optimizes INVOKEDYNAMIC heavily
- Built-in guard check optimization
- SwitchPoint enables efficient invalidation
Disadvantages:
- More complex to implement
- Requires Java 7+ (moot, since PerlOnJava already requires Java 21)
- Harder to debug
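The SwitchPoint mechanism itself is standard JDK API and can be demonstrated standalone; `fast` and `slow` below are placeholders for the resolved target and the re-resolution fallback:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Minimal demo of SwitchPoint-guarded dispatch and bulk invalidation.
public class SwitchPointDemo {
    static String fast() { return "cached"; }       // resolved target
    static String slow() { return "re-resolved"; }  // fallback after invalidation

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType mt = MethodType.methodType(String.class);
        MethodHandle target   = lookup.findStatic(SwitchPointDemo.class, "fast", mt);
        MethodHandle fallback = lookup.findStatic(SwitchPointDemo.class, "slow", mt);

        SwitchPoint sp = new SwitchPoint();
        MethodHandle guarded = sp.guardWithTest(target, fallback);

        System.out.println((String) guarded.invokeExact()); // hits the target

        // e.g. @ISA changed: invalidate every call site guarded by sp at once
        SwitchPoint.invalidateAll(new SwitchPoint[]{ sp });

        System.out.println((String) guarded.invokeExact()); // now takes the fallback
    }
}
```

Invalidation is a one-way, all-sites operation, which is exactly the semantics @ISA and method redefinition need: the JVM deoptimizes every guarded call site, and the fallback re-resolves.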
- test/inline_cache_basic.t: Verify cache hits work
- test/inline_cache_invalidation.t: Test @ISA changes
- test/inline_cache_polymorphic.t: Multiple classes at same site
- test/inline_cache_threading.t: Concurrent access
```perl
# bench_inline_cache.pl
use Benchmark;

package Foo;
sub new { bless { x => 1 }, shift }
sub add { $_[0]{x} += $_[1] }

package main;  # timethis() is imported into main
my $o = Foo->new();
timethis(10_000_000, sub { $o->add(1) });

# Expected:
#   Before: ~350ns/call = 2.86M calls/sec
#   After:  ~145ns/call = 6.90M calls/sec
```

- Add JavaClassInfo.nextCacheId counter
- Add JavaClassInfo.addCacheFields() method
- Create InlineCacheRegistry for invalidation
- Modify Dereference.handleArrowOperator() to emit cache checks
- Emit monomorphic inline cache (1 class per site)
- Hook into InheritanceResolver.invalidateCache()
- Write correctness tests
- Benchmark with bench_method.pl
- Verify no regressions in existing tests
- Implement megamorphic cache (2-4 classes per site)
- Use INVOKEDYNAMIC for even better performance
- Profile-guided cache placement (only cache hot sites)
Conservative estimate (static field cache):
- Monomorphic call sites: 2.4x speedup
- bench_method.pl: 119 → 240 iter/sec

Optimistic estimate (INVOKEDYNAMIC):
- Monomorphic call sites: 3.0x speedup
- bench_method.pl: 119 → 300 iter/sec
Combined with Phase 2 (hash access optimization), this gets us to the 340 iter/sec target!
Inline caching is highly feasible in PerlOnJava because:
- ✅ Static fields for caching (2ns access)
- ✅ MethodHandle infrastructure already exists
- ✅ blessId provides stable identity
- ✅ ASM library supports bytecode generation
- ✅ Invalidation hooks already in place
The implementation builds on existing infrastructure and requires no API changes - purely a bytecode generation optimization.