Skip to content

Latest commit

 

History

History
390 lines (299 loc) · 12.2 KB

File metadata and controls

390 lines (299 loc) · 12.2 KB

Inline Caching in PerlOnJava: Technical Implementation Guide

Status: PARTIALLY IMPLEMENTED - A runtime global inline cache (4096 entries in RuntimeCode.java) is implemented using callsite IDs allocated at compile time. The per-call-site static field approach and INVOKEDYNAMIC variant described below remain unimplemented design alternatives.

How It Works

PerlOnJava compiles Perl code to JVM bytecode using the ASM library. When you write:

$object->method($arg)

PerlOnJava currently generates bytecode equivalent to:

// Current implementation (slow path always)
RuntimeCode.call(object, method, __SUB__, args[], context);

This calls RuntimeCode.call() which:

  1. Resolves the method through InheritanceResolver.findMethodInHierarchy()
  2. Creates a RuntimeArray for @_
  3. Invokes the MethodHandle

Every single call pays the full method resolution cost (~100ns).

Inline Caching Solution

Concept

Store the resolved method in a static field of the generated class, along with the blessId it was resolved for. On subsequent calls, check if the object has the same blessId - if so, invoke the cached method directly.

Generated Bytecode

For each method call site, PerlOnJava would generate additional static fields and guard logic:

// Generated class (conceptual)
public class org/perlonjava/anon123 {
    // NEW: Cache fields for each method call site
    private static int cache_site_1_blessId = 0;
    private static MethodHandle cache_site_1_method = null;

    public RuntimeList apply(RuntimeArray args, int ctx) {
        // Original code: $o->add(1)

        // NEW: Inline cache check
        int objectBlessId = object.blessedId();  // ~3ns

        if (objectBlessId != 0 &&                // Quick null check
            objectBlessId == cache_site_1_blessId &&  // Cache hit check ~2ns
            cache_site_1_method != null) {

            // FAST PATH: Direct method invocation (~10ns)
            RuntimeArray a = new RuntimeArray();
            a.elements.add(object);  // $self
            a.elements.add(arg);
            return (RuntimeList) cache_site_1_method.invoke(a, ctx);

        } else {
            // SLOW PATH: Resolve and cache (~150ns first time, then fast)
            RuntimeScalar resolvedMethod =
                InheritanceResolver.findMethodInHierarchy("add", ...);

            cache_site_1_blessId = objectBlessId;
            cache_site_1_method = ((RuntimeCode)resolvedMethod.value).methodHandle;

            return RuntimeCode.apply(resolvedMethod, a, ctx);
        }
    }
}

Implementation in Dereference.java

Modify handleArrowOperator() to emit the cache:

// In Dereference.java:528-680
static void handleArrowOperator(EmitterVisitor emitterVisitor, BinaryOperatorNode node) {
    MethodVisitor mv = emitterVisitor.ctx.mv;
    JavaClassInfo classInfo = emitterVisitor.ctx.javaClassInfo;

    // NEW: Allocate cache fields for this call site
    int cacheId = classInfo.nextCacheId++;
    String blessIdField = "cache_" + cacheId + "_blessId";
    String methodField = "cache_" + cacheId + "_method";

    // Add static fields to the class (done during class creation)
    classInfo.addCacheFields(blessIdField, methodField);

    // Generate object and method name evaluation (existing code)
    object.accept(scalarVisitor);
    method.accept(scalarVisitor);

    // NEW: Emit inline cache guard
    // Stack: [object, method, ...]

    // Get object.blessedId()
    mv.visitInsn(Opcodes.DUP);  // Duplicate object
    mv.visitMethodInsn(Opcodes.INVOKESTATIC,
        "org/perlonjava/runtimetypes/RuntimeScalar",
        "blessedId",
        "(Lorg/perlonjava/runtimetypes/RuntimeScalar;)I",
        false);
    // Stack: [object, method, ..., blessId]

    // Load cached blessId
    mv.visitFieldInsn(Opcodes.GETSTATIC,
        classInfo.javaClassName,
        blessIdField,
        "I");
    // Stack: [object, method, ..., blessId, cachedBlessId]

    // Compare: if (blessId == cachedBlessId && blessId != 0)
    Label slowPath = new Label();
    Label fastPath = new Label();

    mv.visitJumpInsn(Opcodes.IF_ICMPNE, slowPath);  // Mismatch -> slow path

    // Load cached method
    mv.visitFieldInsn(Opcodes.GETSTATIC,
        classInfo.javaClassName,
        methodField,
        "Ljava/lang/invoke/MethodHandle;");

    mv.visitInsn(Opcodes.DUP);
    Label methodIsNull = new Label();
    mv.visitJumpInsn(Opcodes.IFNULL, methodIsNull);  // Not cached -> slow path

    // FAST PATH: Direct invocation
    mv.visitLabel(fastPath);
    // ... emit direct MethodHandle.invoke(args, ctx) ...
    Label end = new Label();
    mv.visitJumpInsn(Opcodes.GOTO, end);

    // SLOW PATH: Full resolution
    mv.visitLabel(slowPath);
    mv.visitLabel(methodIsNull);
    // ... emit RuntimeCode.call() ...
    // ... update cache fields ...

    mv.visitLabel(end);
}

Why This Works in PerlOnJava

1. Static Fields Are Cheap

PerlOnJava generates one class per script/eval. Static fields are allocated once and accessed in ~2ns via GETSTATIC.

2. MethodHandle Infrastructure Exists

RuntimeCode already stores MethodHandle methodHandle for each compiled subroutine. We just cache a reference to it.

3. blessId Is Stable

NameNormalizer.getBlessId() assigns each blessed class a unique integer ID that never changes. Perfect for caching.

4. ASM Library Support

PerlOnJava uses ASM for bytecode generation. Adding fields and conditional jumps is straightforward:

  • ClassWriter.visitField() - adds static fields
  • MethodVisitor.visitJumpInsn() - emits guards
  • MethodVisitor.visitFieldInsn() - accesses cache

5. Cache Invalidation Hook Exists

InheritanceResolver.invalidateCache() is already called when @ISA changes. We can extend it to clear inline caches:

// In InheritanceResolver.java
public static void invalidateCache() {
    methodCache.clear();
    linearizedClassesCache.clear();

    // NEW: Invalidate inline caches
    InlineCacheRegistry.clearAll();
}

Performance Impact

Current Path (119 iter/sec)

Total per method call: ~350ns
- Method resolution: 100ns (InheritanceResolver.findMethodInHierarchy)
- Array creation: 50ns
- MethodHandle invoke: 200ns

With Inline Cache (Expected: 240 iter/sec)

FIRST CALL (cache miss):
- blessId extraction: 3ns
- Cache check: 2ns (miss)
- Slow path (same as before): 350ns
Total: 355ns

SUBSEQUENT CALLS (cache hit - 99.9% of calls):
- blessId extraction: 3ns
- Cache check: 2ns (hit)
- Direct MethodHandle invoke: 140ns
Total: 145ns

Speedup: 350ns → 145ns = 2.4x faster!

Challenges & Solutions

Challenge 1: Polymorphic Call Sites

Problem: What if the same call site sees different classes?

Solution: Start with monomorphic cache (1 class). If we see a different blessId, update the cache (replacing the old one). For truly polymorphic sites (rare), the cache will thrash but won't break - it just falls back to slow path.

Future: Implement megamorphic cache (small array of blessId→method pairs).

Challenge 2: Method Redefinition

Problem: What if someone redefines a method at runtime?

Solution: InheritanceResolver.invalidateCache() already handles this. Extend it to zero out all inline cache fields:

public class InlineCacheRegistry {
    private static List<InlineCacheInvalidator> invalidators = new ArrayList<>();

    public static void register(InlineCacheInvalidator invalidator) {
        invalidators.add(invalidator);
    }

    public static void clearAll() {
        for (InlineCacheInvalidator inv : invalidators) {
            inv.clear();
        }
    }
}

// Generated class
public class org/perlonjava/anon123 {
    static {
        InlineCacheRegistry.register(() -> {
            cache_site_1_blessId = 0;
            cache_site_1_method = null;
        });
    }
}

Challenge 3: Memory Overhead

Problem: Each call site adds 2 static fields (8 bytes each).

Solution:

  • Only cache hot call sites (e.g., inside loops)
  • One script might have 100 call sites = 1.6KB overhead
  • This is negligible compared to the ~100KB of bytecode per script

Challenge 4: Thread Safety

Problem: Multiple threads accessing/updating cache?

Solution:

  • Reads are always safe (primitive int + volatile MethodHandle)
  • Writes might race, but worst case: we cache a different method (still correct!)
  • For true thread safety, use volatile fields:
    private static volatile int cache_site_1_blessId = 0;

Alternative: INVOKEDYNAMIC

For maximum performance, use Java 7's INVOKEDYNAMIC:

// Bootstrap method
public static CallSite bootstrap(
    MethodHandles.Lookup lookup,
    String name,
    MethodType type,
    int blessId,
    String methodName
) {
    // Resolve method for this blessId
    RuntimeScalar method = InheritanceResolver.findMethodInHierarchy(
        methodName, NameNormalizer.getBlessStr(blessId), null, 0);

    MethodHandle target = ((RuntimeCode)method.value).methodHandle;

    // Create a SwitchPoint for invalidation
    SwitchPoint switchPoint = new SwitchPoint();
    MethodHandle guarded = switchPoint.guardWithTest(
        target,
        // Fallback to re-resolve if invalidated
        MethodHandles.insertArguments(
            RELINK_METHOD, 0, lookup, name, type, blessId, methodName)
    );

    return new MutableCallSite(guarded);
}

// Generated bytecode for $obj->add(1)
INVOKEDYNAMIC add(
    Lorg/perlonjava/runtime/RuntimeScalar;
    Lorg/perlonjava/runtime/RuntimeScalar;
)Lorg/perlonjava/runtime/RuntimeList; [
    org/perlonjava/runtime/InlineCacheBootstrap.bootstrap(
        Ljava/lang/invoke/MethodHandles$Lookup;
        Ljava/lang/String;
        Ljava/lang/invoke/MethodType;
        I           // blessId
        Ljava/lang/String;  // method name
    )Ljava/lang/invoke/CallSite;,
    0,          // initial blessId (resolved at runtime)
    "add"       // method name
]

Advantages:

  • JVM optimizes INVOKEDYNAMIC heavily
  • Built-in guard check optimization
  • SwitchPoint enables efficient invalidation

Disadvantages:

  • More complex to implement
  • Requires Java 7+ (PerlOnJava already requires Java 21)
  • Harder to debug

Validation

Correctness Tests

  1. test/inline_cache_basic.t: Verify cache hits work
  2. test/inline_cache_invalidation.t: Test @ISA changes
  3. test/inline_cache_polymorphic.t: Multiple classes at same site
  4. test/inline_cache_threading.t: Concurrent access

Performance Measurement

# bench_inline_cache.pl
use Benchmark;

package Foo;
sub new { bless { x => 1 }, shift }
sub add { $_[0]{x} += $_[1] }

my $o = Foo->new();
timethis(10_000_000, sub { $o->add(1) });

# Expected:
# Before: ~350ns/call = 2.86M calls/sec
# After:  ~145ns/call = 6.90M calls/sec

Implementation Roadmap

Phase 1: Infrastructure (Week 1)

  • Add JavaClassInfo.nextCacheId counter
  • Add JavaClassInfo.addCacheFields() method
  • Create InlineCacheRegistry for invalidation

Phase 2: Simple Inline Cache (Week 1-2)

  • Modify Dereference.handleArrowOperator() to emit cache checks
  • Emit monomorphic inline cache (1 class per site)
  • Hook into InheritanceResolver.invalidateCache()

Phase 3: Validation (Week 2)

  • Write correctness tests
  • Benchmark with bench_method.pl
  • Verify no regressions in existing tests

Phase 4: Optimization (Optional)

  • Implement megamorphic cache (2-4 classes per site)
  • Use INVOKEDYNAMIC for even better performance
  • Profile-guided cache placement (only cache hot sites)

Expected Results

Conservative estimate (static field cache):

  • Monomorphic call sites: 2.4x speedup
  • bench_method.pl: 119 → 240 iter/sec

Optimistic estimate (INVOKEDYNAMIC):

  • Monomorphic call sites: 3.0x speedup
  • bench_method.pl: 119 → 300 iter/sec

Combined with Phase 2 (hash access optimization), this gets us to the 340 iter/sec target!

Conclusion

Inline caching is highly feasible in PerlOnJava because:

  1. ✅ Static fields for caching (2ns access)
  2. ✅ MethodHandle infrastructure already exists
  3. ✅ blessId provides stable identity
  4. ✅ ASM library supports bytecode generation
  5. ✅ Invalidation hooks already in place

The implementation builds on existing infrastructure and requires no API changes - purely a bytecode generation optimization.