PerlOnJava currently compiles all Perl code to JVM bytecode via ASM, which has significant overhead for two use cases:
- Small eval strings - Compilation overhead (~50ms) dominates execution time for one-liners
- Very large code executed once - Compilation time and memory for class metadata are wasted
The current compilation pipeline:
Lexer → Parser → AST → EmitterVisitor → ASM bytecode → ClassLoader → MethodHandle
Fixed costs include ClassWriter initialization, ASM frame computation (expensive), and class loading. This makes compilation inefficient for small/one-time code despite an LRU eval cache (100 entries).
Solution: Add a VERY FAST interpreter that shares the same internal API and runtime, allowing seamless mixing of compiled and interpreted code.
DECISION: Switch-based dispatch selected
Empirical benchmarking showed switch-based dispatch is 2.25-4.63x faster than function-array dispatch:
- Pure dispatch: 0.13 ns/opcode vs 0.58 ns/opcode (4.63x faster)
- Realistic workload: 599M ops/sec vs 266M ops/sec (2.25x faster)
- Memory: 100.7 bytes/exec vs 151.0 bytes/exec (function-array allocates ~50% more)
See dev/design/interpreter_benchmarks.md for detailed results.
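The two dispatch styles can be contrasted in miniature. The sketch below is illustrative only: `DispatchSketch`, `Handler`, and the toy opcodes are hypothetical, not PerlOnJava code or the real benchmark harness (see interpreter_benchmarks.md for the actual methodology). It shows why the switch loop wins: the JIT lowers a dense switch to a tableswitch jump table, while the function array pays a polymorphic call per opcode.

```java
// Hypothetical sketch of the two dispatch styles that were benchmarked.
public class DispatchSketch {
    interface Handler { long apply(long acc); }

    // Function-array dispatch: one polymorphic call per opcode.
    static long runArray(byte[] ops, Handler[] table) {
        long acc = 0;
        for (byte op : ops) acc = table[op].apply(acc);
        return acc;
    }

    // Switch-based dispatch: a dense switch is JIT-compiled to a tableswitch.
    static long runSwitch(byte[] ops) {
        long acc = 0;
        for (byte op : ops) {
            switch (op) {
                case 0 -> acc += 1;
                case 1 -> acc += 2;
                default -> acc += 3;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] ops = new byte[1_000_000];
        for (int i = 0; i < ops.length; i++) ops[i] = (byte) (i % 3);
        Handler[] table = { a -> a + 1, a -> a + 2, a -> a + 3 };
        long t0 = System.nanoTime();
        long r1 = runSwitch(ops);
        long t1 = System.nanoTime();
        long r2 = runArray(ops, table);
        long t2 = System.nanoTime();
        System.out.println("results match: " + (r1 == r2));
        System.out.printf("switch: %d ns, function-array: %d ns%n", t1 - t0, t2 - t1);
    }
}
```

Note that a toy loop like this is not a substitute for proper warmup and a harness such as JMH; it only illustrates the shape of the two loops being compared.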
CRITICAL: The compiler uses register architecture (not stack-based) because Perl's complex control flow (GOTO, last/next/redo, die/eval) would corrupt a stack. The interpreter MUST use the same architecture for correctness.
Analysis of ./jperl --disassemble output reveals the compiler:
- Uses register architecture - JVM local variables as registers (no expression stack)
- Uses RuntimeBase objects everywhere - no primitive unboxing in operations
- Calls static operator methods - e.g. MathOperators.add(RuntimeScalar, RuntimeScalar)
- Shares global variables - via GlobalVariable maps
- Uses RuntimeScalarCache - for constant values
- Has specialized unboxed operations - rare optimizations like add(RuntimeScalar, int)
The interpreter must do EXACTLY the same:
// Pure register architecture (like compiler's JVM locals)
RuntimeBase[] registers = new RuntimeBase[code.maxRegisters];
case Opcodes.ADD_SCALAR:
// 3-address code: rd = rs1 + rs2
int rd = bytecode[pc++] & 0xFF; // destination register
int rs1 = bytecode[pc++] & 0xFF; // source register 1
int rs2 = bytecode[pc++] & 0xFF; // source register 2
// Call the SAME method the compiler uses
registers[rd] = MathOperators.add(
(RuntimeScalar) registers[rs1],
(RuntimeScalar) registers[rs2]
);
break;
case Opcodes.ADD_SCALAR_INT:
// Specialized unboxed operation (rare optimization)
// 2-address code: rd = rs + immediate_int
int rd = bytecode[pc++] & 0xFF;
int rs = bytecode[pc++] & 0xFF;
int immediate = readInt(bytecode, pc);
pc += 4;
// Call specialized unboxed method
registers[rd] = MathOperators.add(
(RuntimeScalar) registers[rs],
immediate // primitive int, not RuntimeScalar
);
break;
case Opcodes.LOAD_GLOBAL_SCALAR:
int destReg = bytecode[pc++] & 0xFF;
int nameIdx = bytecode[pc++] & 0xFF;
String name = code.stringPool[nameIdx];
registers[destReg] = GlobalVariable.getGlobalScalar(name); // Shared with compiled code
break;
case Opcodes.ARRAY_GET:
int resultReg = bytecode[pc++] & 0xFF;
int arrayReg = bytecode[pc++] & 0xFF;
int indexReg = bytecode[pc++] & 0xFF;
RuntimeArray array = (RuntimeArray) registers[arrayReg];
RuntimeScalar index = (RuntimeScalar) registers[indexReg];
registers[resultReg] = array.get(index.getInt()); // Use existing RuntimeArray API
break;
case Opcodes.GOTO:
// Jump doesn't corrupt register state (unlike stack-based)!
int offset = readInt(bytecode, pc);
pc = offset; // All registers stay valid
break;
- Pure Register Architecture - NO expression stack (matches compiler for control flow correctness)
- No Primitive Unboxing - everything stays as RuntimeBase objects (except rare specialized ops)
- Reuse Operator Implementations - call org.perlonjava.operators.* methods directly
- Share Global State - same GlobalVariable maps for $::x, @::arr, %::hash
- RuntimeCode Compatibility - interpreted code looks exactly like compiled code to callers
- Specialized Unboxed Ops - support rare optimizations like add(RuntimeScalar, int) when beneficial
- Custom bytecode format (NOT JVM bytecode) - optimized for Perl semantics
- Switch-based dispatch - JVM JIT optimizes to tableswitch (O(1) jump table)
- Pure register machine - NO expression stack (matches compiler for control flow correctness)
- 3-address code - rd = rs1 op rs2 (explicit register operands)
- Zero compilation overhead - fast AST-to-bytecode translation (~microseconds)
- Runtime object registers - RuntimeScalar/Array/Hash/List (no primitives, except rare specialized ops)
Why register architecture: Perl's complex control flow (GOTO, last/next/redo, die/eval) requires registers that persist across jumps. Stack-based architecture would corrupt state on non-local jumps.
Expected performance: 2-5x slower than compiled code in steady-state, but 10-50x faster for small eval strings due to eliminated compilation overhead.
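The hazard can be shown in miniature. The two toy routines below (`JumpHazardSketch` is hypothetical, not PerlOnJava code) model the same computation interrupted by a non-local jump: the stack machine strands partial operands, while the register machine keeps every value at a fixed, still-addressable slot.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of why the interpreter must be register-based.
public class JumpHazardSketch {
    // Stack machine computing 1 + 2: PUSH 1, PUSH 2, ADD.
    // If a non-local jump (Perl goto/last/next) fires between the pushes
    // and the ADD, the operands are stranded and stack depth no longer
    // matches what the jump target expects.
    static int strandedDepthAfterJump() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);        // operand 1
        stack.push(2);        // operand 2
        // <- jump taken here: ADD never runs
        return stack.size();  // 2 values stranded
    }

    // Register machine, same computation, same jump point: r0 and r1 keep
    // their values at fixed slots, so code after the jump can still use them.
    static int registerStateAfterJump() {
        int[] regs = new int[4];
        regs[0] = 1;          // r0 = 1
        regs[1] = 2;          // r1 = 2
        // <- jump taken here: register file is unaffected
        return regs[0] + regs[1];
    }

    public static void main(String[] args) {
        System.out.println("stranded stack depth: " + strandedDepthAfterJump());
        System.out.println("registers after jump: " + registerStateAfterJump());
    }
}
```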
New package under existing src tree:
src/main/java/org/perlonjava/interpreter/
├── InterpretedCode.java # Bytecode container (extends RuntimeCode)
├── BytecodeInterpreter.java # Execution engine (switch-based dispatch)
├── BytecodeCompiler.java # AST → bytecode translator (implements Visitor)
└── Opcodes.java # Opcode definitions (byte constants)
CRITICAL: Subclass of RuntimeCode for perfect compatibility
/**
* Interpreted bytecode that IS-A RuntimeCode (subclass).
* Can be stored in global variables, passed as code refs, captured in closures.
* COMPLETELY INDISTINGUISHABLE from compiled RuntimeCode to the system.
*/
public class InterpretedCode extends RuntimeCode {
// Bytecode and metadata
byte[] bytecode; // Instruction opcodes (compact)
Object[] constants; // Constant pool (RuntimeBase objects)
String[] stringPool; // String constants (variable names, etc.)
int maxRegisters; // Number of registers needed
RuntimeBase[] capturedVars; // Closure support (captured from outer scope)
// Constructor
public InterpretedCode(byte[] bytecode, Object[] constants, String[] stringPool,
int maxRegisters, RuntimeBase[] capturedVars) {
this.bytecode = bytecode;
this.constants = constants;
this.stringPool = stringPool;
this.maxRegisters = maxRegisters;
this.capturedVars = capturedVars;
}
/**
* Override RuntimeCode.apply() to dispatch to interpreter.
* This is the ONLY difference from compiled code - execution engine.
* API signature is IDENTICAL.
*/
@Override
public RuntimeList apply(RuntimeArray args, int contextType) {
// Dispatch to interpreter (not compiled bytecode)
return BytecodeInterpreter.execute(this, args, contextType);
}
/**
* Override RuntimeCode.call() for method call support.
* API signature is IDENTICAL to compiled code.
*/
@Override
public RuntimeList call(RuntimeScalar invocant, String method, RuntimeBase[] args, int context) {
// Dispatch to interpreter with method call context
return BytecodeInterpreter.executeMethod(this, invocant, method, args, context);
}
/**
* Closures: Capture variables from outer scope.
* Works EXACTLY like compiled closures.
*/
public void captureVariables(RuntimeBase[] vars) {
this.capturedVars = vars;
}
}
Key insight: InterpretedCode IS-A RuntimeCode (not "implements same interface"). This means:
- Can be assigned to a RuntimeScalar holding a code ref
- Can be stored in a $::func global
- Can be passed to/from compiled code
- Can be captured in closures (both directions)
- Can be used in @ISA, method dispatch, overload, etc.
- No code in PerlOnJava can tell the difference (except profiling/debugging)
Closure example:
# Outer compiled code
my $x = 10;
my $closure = sub { $x + $_[0] }; # Interpreted (small eval) - captures $x
# Inner interpreted code can access $x from outer compiled scope
say $closure->(5); # 15
# Reverse: compiled code captures interpreted closure
my $interpreted = eval 'sub { shift + 1 }'; # InterpretedCode
my $compiled = sub { $interpreted->($_[0]) + 10 }; # Compiled, captures InterpretedCode
say $compiled->(5); # 16
Both directions work seamlessly because InterpretedCode extends RuntimeCode.
Register-machine instructions (3-address code format) that mirror compiler operations:
public class Opcodes {
// Control flow
public static final byte NOP = 0;
public static final byte RETURN = 1; // return rd (register)
public static final byte GOTO = 2; // pc = offset (absolute)
public static final byte GOTO_IF_FALSE = 3; // if (!rs) pc = offset
// Register operations
public static final byte MOVE = 10; // rd = rs (register copy)
public static final byte LOAD_CONST = 11; // rd = constants[index]
public static final byte LOAD_INT = 12; // rd = RuntimeScalarCache.getScalarInt(immediate)
public static final byte LOAD_STRING = 13; // rd = new RuntimeScalar(stringPool[index])
// Variable access (uses same GlobalVariable maps as compiler)
public static final byte LOAD_GLOBAL_SCALAR = 20; // rd = GlobalVariable.getGlobalScalar(name)
public static final byte STORE_GLOBAL_SCALAR = 21; // GlobalVariable.getGlobalScalar(name).set(rs)
public static final byte LOAD_GLOBAL_ARRAY = 22; // rd = GlobalVariable.getGlobalArray(name)
public static final byte STORE_GLOBAL_ARRAY = 23;
public static final byte LOAD_GLOBAL_HASH = 24; // rd = GlobalVariable.getGlobalHash(name)
public static final byte STORE_GLOBAL_HASH = 25;
// Operators (call org.perlonjava.operators.* methods) - 3-address format
public static final byte ADD_SCALAR = 30; // rd = MathOperators.add(rs1, rs2)
public static final byte SUB_SCALAR = 31; // rd = MathOperators.subtract(rs1, rs2)
public static final byte MUL_SCALAR = 32; // rd = MathOperators.multiply(rs1, rs2)
public static final byte DIV_SCALAR = 33; // rd = MathOperators.divide(rs1, rs2)
public static final byte CONCAT = 34; // rd = StringOperators.concat(rs1, rs2)
public static final byte COMPARE_NUM = 35; // rd = CompareOperators.compareNum(rs1, rs2)
// Specialized unboxed operations (rare optimizations)
public static final byte ADD_SCALAR_INT = 40; // rd = MathOperators.add(rs, immediate_int)
public static final byte CMP_SCALAR_INT = 41; // rd = CompareOperators.compareNum(rs, immediate_int)
// Array operations (use RuntimeArray API) - 3-address format
public static final byte ARRAY_GET = 50; // rd = array_reg.get(index_reg)
public static final byte ARRAY_SET = 51; // array_reg.set(index_reg, value_reg)
public static final byte ARRAY_PUSH = 52; // array_reg.push(value_reg)
public static final byte ARRAY_SIZE = 53; // rd = new RuntimeScalar(array_reg.size())
// Hash operations (use RuntimeHash API) - 3-address format
public static final byte HASH_GET = 60; // rd = hash_reg.get(key_reg)
public static final byte HASH_SET = 61; // hash_reg.put(key_reg, value_reg)
public static final byte HASH_EXISTS = 62; // rd = hash_reg.exists(key_reg)
public static final byte HASH_DELETE = 63; // rd = hash_reg.delete(key_reg)
// Subroutine calls (RuntimeCode.apply)
public static final byte CALL_SUB = 70; // rd = RuntimeCode.apply(coderef_reg, args_reg, context)
public static final byte CALL_METHOD = 71; // rd = RuntimeCode.call(obj_reg, method, args_reg, context)
public static final byte CALL_BUILTIN = 72; // rd = BuiltinRegistry.call(builtin_id, args_reg, context)
// Context operations
public static final byte LIST_TO_SCALAR = 80; // rd = list_reg.scalar()
public static final byte SCALAR_TO_LIST = 81; // rd = new RuntimeList(scalar_reg)
}
Encoding format:
- Most opcodes: [opcode] [rd] [rs1] [rs2] (4 bytes for 3-address ops)
- Load immediate: [opcode] [rd] [imm32] (6 bytes for 32-bit immediate)
- Jump: [opcode] [offset32] (5 bytes for absolute offset)
- Conditional jump: [opcode] [rs] [offset32] (6 bytes)
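As a concrete illustration of this encoding, the sketch below hand-assembles `r2 = r0 + r1; return r2` and decodes it with a tiny throwaway disassembler. `EncodingSketch` and its `disassemble` helper are hypothetical; only the opcode values (ADD_SCALAR = 30, RETURN = 1) and the byte layout come from the tables above.

```java
// Hypothetical sketch: hand-assembling and decoding the byte format above.
public class EncodingSketch {
    static final byte RETURN = 1;      // [opcode] [rd]            (2 bytes)
    static final byte ADD_SCALAR = 30; // [opcode] [rd] [rs1] [rs2] (4 bytes)

    // Minimal disassembler for the two opcodes used in this example.
    static String disassemble(byte[] code) {
        StringBuilder sb = new StringBuilder();
        int pc = 0;
        while (pc < code.length) {
            byte op = code[pc++];
            switch (op) {
                case ADD_SCALAR -> sb.append(String.format("ADD_SCALAR r%d, r%d, r%d%n",
                        code[pc++] & 0xFF, code[pc++] & 0xFF, code[pc++] & 0xFF));
                case RETURN -> sb.append(String.format("RETURN r%d%n", code[pc++] & 0xFF));
                default -> throw new IllegalStateException("unknown opcode " + op);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Perl: $c = $a + $b; return $c   ->   r2 = r0 + r1; return r2
        byte[] code = { ADD_SCALAR, 2, 0, 1, RETURN, 2 };
        System.out.print(disassemble(code));
    }
}
```

Six bytes total for the two instructions, versus the ClassWriter/frame-computation/class-loading costs of the ASM path for the same one-liner.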
Pure register-based dispatch calling existing operator methods:
public class BytecodeInterpreter {
public static RuntimeList execute(InterpretedCode code, RuntimeArray args, int contextType) {
// Pure register file (NOT stack-based - matches compiler for control flow correctness)
RuntimeBase[] registers = new RuntimeBase[code.maxRegisters];
// Initialize special registers
registers[0] = code; // $this (for closures)
registers[1] = args; // @_
registers[2] = RuntimeScalarCache.getScalarInt(contextType); // wantarray
// Copy captured variables (closure support)
if (code.capturedVars != null) {
System.arraycopy(code.capturedVars, 0, registers, 3, code.capturedVars.length);
}
int pc = 0;
byte[] bytecode = code.bytecode;
// Main dispatch loop - JVM optimizes to tableswitch
while (pc < bytecode.length) {
byte opcode = bytecode[pc++];
switch (opcode) {
case Opcodes.MOVE:
// Register-to-register copy: rd = rs
int dest = bytecode[pc++] & 0xFF;
int src = bytecode[pc++] & 0xFF;
registers[dest] = registers[src];
break;
case Opcodes.LOAD_INT:
// Load immediate integer: rd = RuntimeScalarCache.getScalarInt(imm)
int destReg = bytecode[pc++] & 0xFF;
int value = readInt(bytecode, pc);
pc += 4;
// Uses SAME cache as compiled code
registers[destReg] = RuntimeScalarCache.getScalarInt(value);
break;
case Opcodes.LOAD_GLOBAL_SCALAR:
// Load global scalar: rd = GlobalVariable.getGlobalScalar(name)
int rd = bytecode[pc++] & 0xFF;
int nameIndex = bytecode[pc++] & 0xFF;
String name = code.stringPool[nameIndex];
// Uses SAME GlobalVariable as compiled code
registers[rd] = GlobalVariable.getGlobalScalar(name);
break;
case Opcodes.STORE_GLOBAL_SCALAR:
// Store global scalar: GlobalVariable.getGlobalScalar(name).set(rs)
int nameIdx = bytecode[pc++] & 0xFF;
int srcReg = bytecode[pc++] & 0xFF;
String varName = code.stringPool[nameIdx];
GlobalVariable.getGlobalScalar(varName).set((RuntimeScalar) registers[srcReg]);
break;
case Opcodes.ADD_SCALAR:
// 3-address addition: rd = rs1 + rs2
int rdAdd = bytecode[pc++] & 0xFF;
int rs1Add = bytecode[pc++] & 0xFF;
int rs2Add = bytecode[pc++] & 0xFF;
// Calls SAME method as compiled code
registers[rdAdd] = MathOperators.add(
(RuntimeScalar) registers[rs1Add],
(RuntimeScalar) registers[rs2Add]
);
break;
case Opcodes.ADD_SCALAR_INT:
// Specialized unboxed operation: rd = rs + immediate_int
int rdAddInt = bytecode[pc++] & 0xFF;
int rsAddInt = bytecode[pc++] & 0xFF;
int immediate = readInt(bytecode, pc);
pc += 4;
// Calls specialized unboxed method (rare optimization)
registers[rdAddInt] = MathOperators.add(
(RuntimeScalar) registers[rsAddInt],
immediate // primitive int, not RuntimeScalar
);
break;
case Opcodes.CONCAT:
// String concatenation: rd = rs1 . rs2
int rdConcat = bytecode[pc++] & 0xFF;
int rs1Concat = bytecode[pc++] & 0xFF;
int rs2Concat = bytecode[pc++] & 0xFF;
registers[rdConcat] = StringOperators.concat(
(RuntimeScalar) registers[rs1Concat],
(RuntimeScalar) registers[rs2Concat]
);
break;
case Opcodes.ARRAY_GET:
// Array element access: rd = array_reg[index_reg]
int rdArray = bytecode[pc++] & 0xFF;
int arrayReg = bytecode[pc++] & 0xFF;
int indexReg = bytecode[pc++] & 0xFF;
RuntimeArray arr = (RuntimeArray) registers[arrayReg];
RuntimeScalar idx = (RuntimeScalar) registers[indexReg];
// Uses RuntimeArray API directly
registers[rdArray] = arr.get(idx.getInt());
break;
case Opcodes.HASH_GET:
// Hash element access: rd = hash_reg{key_reg}
int rdHash = bytecode[pc++] & 0xFF;
int hashReg = bytecode[pc++] & 0xFF;
int keyReg = bytecode[pc++] & 0xFF;
RuntimeHash hash = (RuntimeHash) registers[hashReg];
RuntimeScalar key = (RuntimeScalar) registers[keyReg];
// Uses RuntimeHash API directly
registers[rdHash] = hash.get(key);
break;
case Opcodes.CALL_SUB:
// Subroutine call: rd = coderef_reg->(args_reg)
int rdCall = bytecode[pc++] & 0xFF;
int coderefReg = bytecode[pc++] & 0xFF;
int argsReg = bytecode[pc++] & 0xFF;
int callContext = bytecode[pc++] & 0xFF;
RuntimeScalar codeRef = (RuntimeScalar) registers[coderefReg];
RuntimeArray callArgs = (RuntimeArray) registers[argsReg];
// RuntimeCode.apply works for both compiled AND interpreted code
RuntimeList result = RuntimeCode.apply(codeRef, "", callArgs, callContext);
registers[rdCall] = result;
break;
case Opcodes.GOTO:
// Unconditional jump: pc = offset
// Registers persist across jump (unlike stack-based!)
int offset = readInt(bytecode, pc);
pc = offset;
break;
case Opcodes.GOTO_IF_FALSE:
// Conditional jump: if (!rs) pc = offset
int condReg = bytecode[pc++] & 0xFF;
int target = readInt(bytecode, pc);
pc += 4;
RuntimeScalar cond = (RuntimeScalar) registers[condReg];
if (!cond.getBoolean()) {
pc = target; // Jump - all registers stay valid!
}
break;
case Opcodes.RETURN:
// Return from subroutine: return rd
int retReg = bytecode[pc++] & 0xFF;
RuntimeBase retVal = registers[retReg];
if (retVal instanceof RuntimeList) {
return (RuntimeList) retVal;
} else if (retVal instanceof RuntimeScalar) {
return new RuntimeList((RuntimeScalar) retVal);
}
return new RuntimeList();
// ... more opcodes
}
}
return new RuntimeList();
}
private static int readInt(byte[] bytecode, int pc) {
return ((bytecode[pc] & 0xFF) << 24) |
((bytecode[pc+1] & 0xFF) << 16) |
((bytecode[pc+2] & 0xFF) << 8) |
(bytecode[pc+3] & 0xFF);
}
}
Key features:
- Pure register file - no expression stack, no stack pointer management
- 3-address code - all operands explicit in bytecode
- Control flow safe - GOTO/last/next don't corrupt state
- Operator reuse - calls the same org.perlonjava.operators.* methods as the compiler
- Specialized unboxing - rare optimizations like add(RuntimeScalar, int) when beneficial
public class BytecodeCompiler implements Visitor {
ByteArrayOutputStream bytecode;
List<RuntimeBase> constants; // RuntimeBase objects
List<String> stringPool; // Variable names
Map<String, Integer> registerMap; // Variable name → register index
int nextRegister = 0; // Next available register
public InterpretedCode compile(Node ast, EmitterContext ctx) {
this.registerMap = buildRegisterMap(ctx.symbolTable);
this.nextRegister = registerMap.size() + 3; // After special regs: $this, @_, wantarray
ast.accept(this); // Traverse AST
// Emit return with register 0 (default return value)
emit(Opcodes.RETURN);
emit(0);
return buildInterpretedCode();
}
@Override
public void visit(BinaryOperatorNode node) {
// Allocate registers for operands and result
int rs1 = allocateTempRegister();
int rs2 = allocateTempRegister();
int rd = allocateTempRegister();
// Compile left operand → rs1
node.left.accept(this);
emit(Opcodes.MOVE);
emit(rs1);
emit(getLastResultRegister());
// Compile right operand → rs2
node.right.accept(this);
emit(Opcodes.MOVE);
emit(rs2);
emit(getLastResultRegister());
// Emit 3-address opcode: rd = rs1 op rs2
switch (node.operator) {
case "+" -> {
// Check if right operand is constant int for optimization
if (node.right instanceof NumberNode && ((NumberNode) node.right).isInteger()) {
// Specialized unboxed operation
emit(Opcodes.ADD_SCALAR_INT);
emit(rd);
emit(rs1);
emitInt(((NumberNode) node.right).intValue());
} else {
// General operation
emit(Opcodes.ADD_SCALAR); // → MathOperators.add(rs1, rs2)
emit(rd);
emit(rs1);
emit(rs2);
}
}
case "." -> {
emit(Opcodes.CONCAT); // → StringOperators.concat(rs1, rs2)
emit(rd);
emit(rs1);
emit(rs2);
}
case "<=>" -> {
emit(Opcodes.COMPARE_NUM); // → CompareOperators.compareNum(rs1, rs2)
emit(rd);
emit(rs1);
emit(rs2);
}
// ... more operators
}
setLastResultRegister(rd);
freeTempRegister(rs1);
freeTempRegister(rs2);
}
@Override
public void visit(IdentifierNode node) {
int rd = allocateTempRegister();
if (registerMap.containsKey(node.name)) {
// Lexical variable - register is already allocated
emit(Opcodes.MOVE);
emit(rd);
emit(registerMap.get(node.name));
} else {
// Global variable - uses same GlobalVariable map as compiled code
emit(Opcodes.LOAD_GLOBAL_SCALAR);
emit(rd);
emit(getStringPoolIndex(node.name));
}
setLastResultRegister(rd);
}
@Override
public void visit(ArrayIndexNode node) {
int arrayReg = allocateTempRegister();
int indexReg = allocateTempRegister();
int rd = allocateTempRegister();
// Compile array expression → arrayReg
node.array.accept(this);
emit(Opcodes.MOVE);
emit(arrayReg);
emit(getLastResultRegister());
// Compile index expression → indexReg
node.index.accept(this);
emit(Opcodes.MOVE);
emit(indexReg);
emit(getLastResultRegister());
// Emit array access: rd = array[index]
emit(Opcodes.ARRAY_GET); // → RuntimeArray.get()
emit(rd);
emit(arrayReg);
emit(indexReg);
setLastResultRegister(rd);
freeTempRegister(arrayReg);
freeTempRegister(indexReg);
}
@Override
public void visit(GotoNode node) {
// Emit GOTO - registers persist across jump (unlike stack-based!)
emit(Opcodes.GOTO);
emitInt(getLabelOffset(node.label));
}
private int allocateTempRegister() {
return nextRegister++;
}
private void freeTempRegister(int reg) {
// Simple allocator - could be optimized with register reuse
}
private void emit(byte opcode) {
bytecode.write(opcode);
}
private void emitInt(int value) {
bytecode.write((value >> 24) & 0xFF);
bytecode.write((value >> 16) & 0xFF);
bytecode.write((value >> 8) & 0xFF);
bytecode.write(value & 0xFF);
}
}
Key features:
- 3-address code generation - all operands explicit in bytecode
- Register allocation - maps variables to registers
- Specialized unboxing - detects constant integers and uses optimized opcodes
- Control flow safe - GOTO generates absolute offsets, registers persist
InterpretedCode extends RuntimeCode, so it's indistinguishable from compiled code:
// User code - doesn't know if $coderef is compiled or interpreted
my $coderef = sub { $_[0] + 1 }; # Could be either!
say $coderef->(41); # Works the same
# Global code refs - shared between compiled and interpreted
$::my_func = sub { ... }; # Interpreted closure
some_compiled_function($::my_func); # Passes interpreted code to compiled code
# Eval creates interpreted code (fast)
my $result = eval '$x + 1'; # InterpretedCode, not compiled
API Compatibility: The internal API is IDENTICAL:
- RuntimeCode.apply(RuntimeArray @_, int wantarray) → RuntimeList
- Can be stored in $scalar, @array, %hash
- Can capture variables from outer scope (closures)
- Can be passed to/from compiled code
Interpreter reuses ALL existing runtime (no duplication):
- Global Variables - GlobalVariable.getGlobalScalar(), getGlobalArray(), getGlobalHash()
- Runtime Types - RuntimeScalar, RuntimeArray, RuntimeHash, RuntimeList, RuntimeCode
- Operators - org.perlonjava.operators.MathOperators, StringOperators, CompareOperators, etc.
- Caches - RuntimeScalarCache.getScalarInt(), getScalarByteString()
- Symbol Tables - ScopedSymbolTable (for lexical scopes)
- Context - RuntimeContextType (VOID=0, SCALAR=1, LIST=2, RUNTIME=3)
- Dynamic Variables - DynamicVariableManager (for the local operator)
No code duplication - compiled and interpreted code share identical runtime semantics.
Add decision logic in compilation pipeline:
boolean shouldInterpret(Node ast, String source) {
// Small eval strings - avoid compilation overhead
if (source.length() < 100) {
return true; // Interpreter wins: 10-50x faster
}
// Very large one-time code - avoid compilation time/memory
int estimatedSize = estimateSize(ast);
if (estimatedSize > 50000 && isOneTimeExecution()) {
return true;
}
// Hot path (loop, repeated eval) - compile for speed
if (isInEvalLoop() || isHotPath()) {
return false;
}
// Default: compile (existing behavior)
return false;
}Result: Switch-based dispatch is 2.25-4.63x faster than function-array dispatch.
- InterpretedCode extends RuntimeCode (subclass, not interface)
  - Implements apply(RuntimeArray, int) → calls BytecodeInterpreter.execute()
  - Implements call(RuntimeScalar, String, RuntimeBase[], int) → calls BytecodeInterpreter.executeMethod()
  - Supports closure variable capture via the capturedVars array
  - Completely indistinguishable from compiled RuntimeCode to all system code
- Opcodes.java - define 50-100 opcodes in 3-address format
  - Variable access (register moves, global scalar/array/hash)
  - Operators (call org.perlonjava.operators.* with 3-address encoding)
  - Array/hash operations (use RuntimeArray/RuntimeHash API)
  - Control flow (GOTO, GOTO_IF_FALSE, RETURN with absolute offsets)
  - Subroutine calls (RuntimeCode.apply - works for compiled AND interpreted)
  - Specialized unboxed ops (rare optimizations like add(RuntimeScalar, int))
- BytecodeInterpreter.java - switch-based execution with pure register file
  - RuntimeBase[] registers (NOT a stack - no sp management)
  - 3-address code execution (rd = rs1 op rs2)
  - Calls existing operator methods from org.perlonjava.operators.*
  - Shares GlobalVariable maps with compiled code
  - Returns RuntimeList
  - Handles GOTO/control flow correctly (registers persist across jumps)
- BytecodeCompiler.java - AST → bytecode translator
  - Visitor pattern (like EmitterVisitor)
  - Generates 3-address code with register allocation
  - Builds constant pool (RuntimeBase objects)
  - Builds string pool (variable names)
  - Detects constant integers for specialized unboxed operations
  - Generates absolute offsets for GOTO (not relative)
- Unit tests
  - Basic operations (arithmetic, string concat)
  - Variable access (lexical registers, global variables)
  - Arrays and hashes
  - Control flow (GOTO, conditionals, loops)
  - Mixed compiled/interpreted calls (both directions)
  - Closures (both directions: compiled capturing interpreted, interpreted capturing compiled)
- Modify compilation pipeline - Add interpreter decision heuristics
- Integration tests - Compiled code calling interpreted, vice versa
- Closure tests - Interpreted closures captured by compiled code
- Global variable tests - Shared state between compiled and interpreted
- Inline caching for polymorphic operations (if needed)
- Fast paths for common operations (if profiling shows hotspots)
- Reduce allocation where possible (object pooling, reuse)
- Profile and optimize based on real workload data
- Expand opcode set - Cover all Perl operators
- Special cases - regex, tie, overload, etc.
- Comprehensive test suite - Perl 5 compatibility tests
- Edge cases - closures, eval, die/catch, control flow
- Micro-benchmarks - Verify performance targets
- Compare with compilation - Measure speedup for small evals
- Document performance - Update docs with benchmarks
- Final optimizations - Based on profiling data
Total Timeline: 6-9 weeks
| Use Case | Interpreter | Compiler | Result |
|---|---|---|---|
| Small eval (< 50 chars) | 1-5ms | 50ms | 10-50x faster ✓ |
| Large code (one-time) | 10-20ms | 200ms+ | 10-20x faster ✓ |
| Steady-state execution | 200-600M ops/sec | 2M-5M ops/sec | 2-5x slower (acceptable) |
- /Users/fglock/projects/PerlOnJava/src/main/java/org/perlonjava/runtime/RuntimeCode.java - InterpretedCode extends this
- /Users/fglock/projects/PerlOnJava/src/main/java/org/perlonjava/scriptengine/PerlLanguageProvider.java - add interpreter decision heuristics
- /Users/fglock/projects/PerlOnJava/src/main/java/org/perlonjava/operators/*.java - operator implementations the interpreter will call
- /Users/fglock/projects/PerlOnJava/src/main/java/org/perlonjava/astvisitor/EmitterVisitor.java - Visitor pattern for AST traversal
- /Users/fglock/projects/PerlOnJava/src/main/java/org/perlonjava/runtime/GlobalVariable.java - global variable access
- /Users/fglock/projects/PerlOnJava/src/main/java/org/perlonjava/runtime/RuntimeScalarCache.java - cached constants
All new files under src/main/java/org/perlonjava/interpreter/:
- InterpretedCode.java (extends RuntimeCode)
- BytecodeInterpreter.java (switch-based dispatch)
- BytecodeCompiler.java (AST → bytecode translator)
- Opcodes.java (opcode constants)
- Phase 0 complete - Switch-based dispatch proven 2.25-4.63x faster than function-array
- Register architecture confirmed - Required for control flow correctness (GOTO/last/next/redo)
- Small eval strings (< 100 chars) execute 10-50x faster than compilation
- Large one-time code (> 50KB) executes 10-20x faster than compilation
- InterpretedCode extends RuntimeCode (IS-A subclass, perfect compatibility)
- Compiled and interpreted code can call each other seamlessly (both directions)
- Global variables shared between compiled and interpreted code (same GlobalVariable maps)
- Interpreted closures work exactly like compiled closures (capture works both directions)
- All existing unit tests pass with interpreter enabled
- Interpreter uses RuntimeBase objects in registers (no primitives except specialized ops)
- Interpreter calls org.perlonjava.operators.* methods (100% code reuse)
- Specialized unboxed operations used where beneficial (e.g., add(RuntimeScalar, int))
- Control flow (GOTO/last/next/redo) works correctly (registers persist across jumps)
- Memory overhead < 1KB per InterpretedCode instance
This plan: dev/design/interpreter.md
Benchmarking results: dev/design/interpreter_benchmarks.md
Add to README.md:
- Experimental feature - interpreter is new and under active development
- Performance characteristics - when interpreter is used vs compiler
- Opt-out mechanism - environment variable to disable interpreter (for testing)
## Interpreter Mode (Experimental)
PerlOnJava includes an experimental bytecode interpreter for:
- Small eval strings (< 100 characters)
- Very large code executed once (> 50KB)
The interpreter is 10-50x faster than compilation for small code,
but 2-5x slower than compiled code in steady-state execution.
Compiled and interpreted code can be mixed seamlessly:
- Same RuntimeScalar/Array/Hash/List objects
- Shared global variables ($::x, @::arr, %::hash)
- Interpreted closures work like compiled closures
- No API differences
To disable the interpreter (use compiler for all code):
```bash
export PERLONJAVA_DISABLE_INTERPRETER=1
```
- JIT compilation: hot interpreted code promoted to compiled code
- Adaptive optimization: profile-guided opcode selection
- Tiered compilation: interpreter → baseline compiler → optimizing compiler
- Specialized opcodes: regex-specific, hash-specific operations