PerlOnJava Concurrency: Unified Design

Version: 1.1
Date: 2026-03-26
Status: Design Proposal (Technically Reviewed)
Supersedes: threads.md, fork.md, multiplicity.md (consolidates all three)

Executive Summary

This document presents a unified, realistic plan for concurrency in PerlOnJava. It replaces the separate threads.md, fork.md, and multiplicity.md documents with a single plan grounded in what JRuby, Jython, and TruffleRuby have actually shipped — not theoretical designs.

Key insight from research: Every successful JVM language implementation chose one primary concurrency model and made it work well, rather than trying to faithfully replicate every native-language concurrency mechanism.

Recommended Strategy

JRuby model (proven): Map Perl threads directly to JVM threads; make the runtime thread-safe; accept that fork() cannot work.
ThreadLocal-based runtime context instead of classloader isolation (Jython learned this the hard way — ThreadLocal is simpler and faster).
Incremental de-static-ification of global state, starting with the smallest subsystems.
No GIL/global lock — follow JRuby's lead: make core data structures safe, document what isn't safe, let users synchronize where needed.

Key Definitions

threads (ithreads): Perl's official mechanism for parallel execution within a single process. Each Perl thread runs in its own cloned interpreter (not sharing memory by default). All variables are deep-copied at thread creation time. Shared variables require explicit :shared attribute. This provides "fork-like semantics without forking" — isolation by default. Note: The use of interpreter-based threads is officially "discouraged" in Perl 5 documentation, but it remains the only officially supported threading model.

fork(): Process-level parallelism by creating a child process with copied memory. Uses copy-on-write semantics for efficiency. Cannot be implemented on the JVM — JRuby confirmed this over 20 years of attempts. The JVM's GC can crash between fork and exec, and JVM state cannot be safely split across processes.

multiplicity: Running multiple Perl interpreter instances within the same process. In Perl 5, this is the internal mechanism (perl_clone()) that enables ithreads. In PerlOnJava, this is achieved through PerlRuntime instances. Use cases include web server request isolation (mod_perl model) and JSR-223 embedding.

1. Lessons from Other JVM Language Projects

1.1 JRuby (most relevant — 20 years of production use)

Threading model:

Ruby threads map 1:1 to JVM threads (real OS threads)
No Global VM Lock — true parallel execution
The JRuby runtime itself is thread-safe for structural operations: defining methods, modifying constants, modifying global variables
Core mutable data structures (String, Array, Hash) are not thread-safe; users must synchronize explicitly
volatile writes for method tables, constant tables, class variables, global variables
Non-volatile: instance variables, local variables in closures

Fork handling:

Kernel#fork raises NotImplementedError on JRuby
JRuby tried POSIX fork+exec via JNA for system()/backticks but abandoned it as too dangerous (JVM GC can crash between fork and exec)
Recommendation: use Process.spawn, system(), or threads instead
This is universally accepted by the JRuby community

Key lesson: Don't waste time on fork emulation. Focus on making threads work well. The ecosystem adapts.

1.2 Jython (cautionary tale)

Threading model:

Python threads map 1:1 to JVM threads
No GIL (unlike CPython) — true parallel execution
dict, list, set are thread-safe (no corruption, but updates can be lost)
All Python attribute get/set are volatile (backed by ConcurrentHashMap)
Sequential consistency for Python code

ThreadState problem:

Jython used ThreadLocal to store execution context (ThreadState)
They later recognized this was a design mistake that "slows things down and unnecessarily limits what a given thread can do"
Planned refactoring to remove ThreadState but never completed it (project stalled before this happened)

Key lesson: ThreadLocal for runtime context is a pragmatic starting point but has performance costs. Plan for eventual migration to explicit parameter passing — but don't let that block the initial implementation.

1.3 TruffleRuby / GraalVM

Threading model:

True parallel threads via GraalVM's managed threading
Each Context is single-threaded; multi-threading requires multiple contexts or explicit opt-in via allowAllAccess
Polyglot contexts provide natural isolation

Key lesson: Context-per-thread is the cleanest isolation model. PerlOnJava's eventual PerlRuntime instances serve the same role.

1.4 Perl 5 ithreads (what we're implementing)

How it works in CPython Perl:

Each thread gets a cloned copy of the entire interpreter
All variables are copied by default (deep clone at thread creation)
Shared variables require explicit :shared attribute
The threads::shared module mediates all cross-thread access
CLONE method called on objects during thread creation
Heavy — thread creation is expensive (full interpreter clone)

What mod_perl 2.0 does:

Pre-creates a pool of Perl interpreter instances
Each Apache worker thread gets its own interpreter
Same pattern as the multiplicity.md "runtime pool" idea

2. PerlOnJava Global State Inventory

The following mutable static state must be addressed before any concurrency:

Class	Mutable Static Fields	Size	Risk
`GlobalVariable`	`globalVariables`, `globalArrays`, `globalHashes`, `globalCodeRefs`, `globalIORefs`, `globalFormatRefs`, `pinnedCodeRefs`, `stashAliases`, `globAliases`, `globalGlobs`, `isSubs`, `packageExistsCache`, `globalClassLoader`	~13 maps + 1 classloader	Critical — all Perl variables live here
`RuntimeIO`	`stdout`, `stderr`, `stdin`	3 fields	High — I/O misdirection under concurrency
`CallerStack`	`callerStack` (List)	1 list	High — stack traces cross threads
`SpecialBlock`	`endBlocks`, `initBlocks`, `checkBlocks`	3 arrays	Medium — phase blocks interfere
`DynamicVariableManager`	`variableStack` (Deque)	1 deque	High — `local` scoping is per-execution-context
`RuntimeScalar`	`dynamicStateStack` (Stack)	1 stack	High — dynamic state per thread
`RuntimeCode`	`evalBeginIds`, `methodHandleCache`, `evalRuntimeContext` (ThreadLocal), `argsStack` (ThreadLocal)	2 maps + 2 ThreadLocals	Medium — some already ThreadLocal
`PerlLanguageProvider`	`globalInitialized`	1 boolean	Low — initialization guard
`InheritanceResolver`	`methodCache`, `linearizedClassesCache`, `overloadContextCache`, `isaStateCache`, `packageMRO`	5 maps	Medium — stale under concurrent class modification
`RuntimeCode` (additional)	`anonSubs`, `interpretedSubs`, `evalContext`, `methodHandleCache`	4 maps	Medium — compiler/eval caches
`RuntimeRegex`	regex state, special vars	state	Medium — `$1`, `$2` etc. must be per-thread

Already ThreadLocal: RuntimeCode.evalRuntimeContext, RuntimeCode.argsStack
Pure functions (safe): RuntimeScalarCache, most operator methods, Base64Util
Read-only after init (safe): Configuration, ParserTables

3. Architecture Decision: ThreadLocal Runtime Context

Why not classloader isolation?

The multiplicity.md document proposed classloader-based isolation first. After reviewing JRuby and Jython's experience, this is not recommended as the primary approach:

Type barriers make inter-runtime communication impossible without reflection
Memory overhead — duplicate class loading for every runtime instance
Debugging nightmare — class identity issues, serialization failures
JRuby abandoned this pattern for user-facing isolation (only used internally for OSGi compatibility)
Jython never used it — went straight to ThreadLocal

Recommended: ThreadLocal + PerlRuntime instances

public final class PerlRuntime {
    private static final ThreadLocal<PerlRuntime> CURRENT = new ThreadLocal<>();

    // Per-runtime mutable state (moved from static fields)
    final VariableStorage variables;    // was GlobalVariable statics
    final IOSystem io;                  // was RuntimeIO statics
    final CallerStackManager callStack; // was CallerStack statics
    final DynamicScopeManager dynamicScope; // was DynamicVariableManager + RuntimeScalar statics
    final SpecialBlockManager blocks;   // was SpecialBlock statics
    final RegexState regexState;        // $1, $2, match state

    public static PerlRuntime current() {
        return CURRENT.get();
    }

    public static void setCurrent(PerlRuntime rt) {
        CURRENT.set(rt);
    }
}

Migration path: Replace GlobalVariable.getGlobalVariable("main::x") with PerlRuntime.current().variables.getGlobalVariable("main::x"). This can be done incrementally, subsystem by subsystem.

4. Concurrency Models to Support

4.1 Primary: `threads` (ithreads) — Full Implementation

This is the only concurrency model that Perl 5 officially supports via CPAN. It maps naturally to JVM threads.

use threads;
use threads::shared;

my $counter :shared = 0;

my @threads;
for (1..4) {
    push @threads, threads->create(sub {
        lock($counter);
        $counter++;
    });
}
$_->join() for @threads;
print "Counter: $counter\n";  # 4

Implementation:

public static RuntimeScalar threadsCreate(RuntimeScalar codeRef, RuntimeArray args) {
    PerlRuntime parent = PerlRuntime.current();

    // Clone runtime state for new thread (deep copy by default)
    PerlRuntime child = parent.clone(CloneMode.THREAD);

    // Copy shared variable references (not values) to child
    child.linkSharedVariables(parent);

    Thread javaThread = Thread.ofVirtual().start(() -> {
        PerlRuntime.setCurrent(child);
        try {
            RuntimeCode.apply(codeRef, args, RuntimeContextType.VOID);
        } finally {
            PerlRuntime.setCurrent(null);
        }
    });

    return new RuntimeScalar(new PerlThread(javaThread, child));
}

Shared variables use a wrapper that synchronizes access:

public class SharedScalar extends RuntimeScalar {
    private final ReentrantLock lock = new ReentrantLock();

    @Override
    public RuntimeScalar set(RuntimeScalar value) {
        lock.lock();
        try {
            return super.set(value);
        } finally {
            lock.unlock();
        }
    }

    @Override
    public String toString() {
        lock.lock();
        try {
            return super.toString();
        } finally {
            lock.unlock();
        }
    }
}

Virtual Threads (Java 21+): Since PerlOnJava requires Java 22+, we can use virtual threads (Thread.ofVirtual()) for lightweight thread creation. This is a significant advantage over native Perl's heavy interpreter-cloning ithreads.

Virtual Thread Pinning Caveat: In Java 21-24, virtual threads become "pinned" to their carrier OS thread when executing inside a synchronized block or native method. This means blocking I/O inside synchronized blocks will block the carrier thread rather than unmounting the virtual thread. JEP 491 ("Synchronize Virtual Threads without Pinning") addresses this and will be available in future Java releases. For now, SharedScalar/SharedArray/SharedHash use ReentrantLock instead of synchronized — this is critical because ReentrantLock uses LockSupport.park() which properly unmounts virtual threads.

4.2 Secondary: `fork()` — Stub with Documentation

Following JRuby's proven approach:

public static RuntimeScalar fork() {
    // Set $! to explain why
    WarnDie.warn("fork() is not available on the JVM. " +
                 "Use threads, system(), or ProcessBuilder instead.");
    GlobalVariable.getGlobalVariable("main::!").set("Function not implemented");
    return RuntimeScalarCache.scalarUndef;  // returns undef
}

Rationale: JRuby has shipped NotImplementedError for fork for 20 years. The community adapted. Most fork usage falls into patterns that have alternatives:

Perl Pattern	JVM Alternative
`fork + exec`	`system()`, `ProcessBuilder`
`fork` for parallelism	`threads`
`fork` for daemon	`Thread` + process management
`fork` in test harness	Thread-based test runner
`open(PIPE, "-\|")`	`ProcessBuilder` with piped streams

4.3 Web Server Concurrency — Runtime Pool

For JSR-223 and web server use cases (the most immediately useful):

public class PerlRuntimePool {
    private final BlockingQueue<PerlRuntime> pool;
    private final PerlRuntime template;
    private final int maxPoolSize;
    private final int minPoolSize;
    private final long maxIdleTimeMs;

    public PerlRuntimePool(int size) {
        this(size, size, 0);  // Default: fixed pool, no idle timeout
    }

    public PerlRuntimePool(int minSize, int maxSize, long maxIdleTimeMs) {
        this.minPoolSize = minSize;
        this.maxPoolSize = maxSize;
        this.maxIdleTimeMs = maxIdleTimeMs;
        this.template = PerlRuntime.createDefault();
        this.pool = new ArrayBlockingQueue<>(maxSize);
        for (int i = 0; i < minSize; i++) {
            pool.add(template.clone(CloneMode.INDEPENDENT));
        }
    }

    public PerlRuntime acquire() throws InterruptedException {
        return pool.take();
    }

    public void release(PerlRuntime rt) {
        rt.reset();  // Clear request-scoped state
        pool.offer(rt);
    }
}

Configuration options (exposed via system properties or API):

minPoolSize: Minimum runtimes to keep warm (default: 1)
maxPoolSize: Maximum concurrent runtimes (default: CPU cores × 2)
maxIdleTimeMs: Time before idle runtimes are destroyed (0 = never)

5. Thread Creation: What Gets Cloned

When threads->create() is called, a new PerlRuntime is created by deep-cloning the parent. This section details the cloning semantics for each category of runtime state — this is the most technically challenging part of the implementation.

5.1 Closure Capture Cloning

PerlOnJava has two closure representations, both of which capture lexical variables by Java object reference:

JVM backend: Closures are compiled JVM classes where captured lexicals become constructor parameters stored as instance fields on the generated class object (RuntimeCode.codeObject). The RuntimeCode.methodHandle is bound to this specific object instance.

Interpreter backend: Closures store captures in an explicit array:

// InterpretedCode.java
public final RuntimeBase[] capturedVars; // Closure support (captured from outer scope)

Cloning rules:

What	Clone?	Rationale
Compiled JVM bytecode / MethodHandle	No — share	Immutable code, safe to share across threads
`InterpretedCode.bytecode` / `constants` / `stringPool`	No — share	Immutable arrays, safe to share
`RuntimeCode.codeObject` (JVM backend)	Yes — deep copy	Holds mutable captured variable references
`InterpretedCode.capturedVars` (interpreter)	Yes — deep copy	Holds mutable captured variable references
Each captured `RuntimeScalar` / `RuntimeArray` / `RuntimeHash`	Yes — deep copy	New thread gets independent values (ithreads semantics)
`RuntimeCode.stateVariable` / `stateArray` / `stateHash`	Yes — deep copy	`state` variables are per-closure-instance
Variables marked `:shared`	No — share reference	Both threads get a reference to the same `SharedScalar` wrapper

Reference fixup challenge: After deep-copying all closures, captured variables that point to globals must be redirected to the child's copies of those globals, not the parent's originals. This requires a two-pass clone:

Pass 1: Deep-copy all VariableStorage (globals) and all RuntimeCode objects, building an identity map: parentObject → childObject.
Pass 2: Walk all closure captures in the child. For each capture that appears in the identity map, replace it with the child's copy.

This is the same approach Perl 5 uses internally (perl_clone() + pointer fixup table in sv.c).

5.2 CLONE Method and External Resources

Perl 5 calls CLONE($package) on every package that defines it whenever a new thread is created. This is the mechanism by which modules handle thread safety.

DBI example (Perl 5 behavior):

package DBI;
sub CLONE {
    # Called in the NEW thread's context after cloning
    # Invalidate all cached database handles — they point to
    # the parent's JDBC Connection and are unusable here
    # Child must call DBI->connect() again
}

Implementation plan:

After cloning PerlRuntime, walk all packages in the child's symbol table.
For each package that has a CLONE subroutine defined, call it with the package name as argument.
The call happens in the child thread's context (after PerlRuntime.setCurrent(child)).

External resource handling rules:

Resource Type	Action in Child	Rationale
JDBC `Connection` (via DBI)	Invalidate → undef	Cannot share connections across threads
File handles (`RuntimeIO`)	Clone with independent position	Perl 5 duplicates file descriptors
Sockets	Invalidate → undef	OS socket FDs can't be shared
Java objects (generic)	Shallow copy (reference)	User's responsibility via CLONE
Tied variables	Clone the tie object	CLONE on tie class handles specifics

Safety net: During clone, any RuntimeScalar whose value wraps a Java object that implements java.io.Closeable or java.sql.Connection should be set to undef in the child by default. The module's CLONE method can then reconnect if needed. This prevents silent use of stale connections.

5.3 Method Cache Invalidation

The InheritanceResolver maintains several static caches that are critical for performance but dangerous under concurrency:

// InheritanceResolver.java — all static, all mutable
static final Map<String, List<String>> linearizedClassesCache;  // MRO linearization
private static final Map<String, RuntimeScalar> methodCache;     // method resolution
private static final Map<Integer, OverloadContext> overloadContextCache;
private static final Map<String, List<String>> isaStateCache;    // @ISA change detection

Additionally, RuntimeCode has:

private static final Map<Class<?>, MethodHandle> methodHandleCache;  // MethodHandle lookup

Thread creation policy:

Cache	Action at Thread Creation	Rationale
`methodCache`	Clear in child	Child may modify `@ISA` independently
`linearizedClassesCache`	Clear in child	Depends on `@ISA` which is cloned
`overloadContextCache`	Clear in child	Overload resolution depends on stash state
`isaStateCache`	Clear in child	Must detect changes relative to child's `@ISA`
`methodHandleCache`	Share (read-only)	Maps `Class<?>` → `MethodHandle`; immutable after creation

Clearing is always correct because caches are just performance optimizations. The child pays a cold-cache penalty on first method resolution but is guaranteed correct behavior.

Long-term: Once these caches are moved into PerlRuntime (Phase 5), each runtime has its own caches and no cross-thread invalidation is needed.

JRuby parallel: JRuby uses volatile writes for method table modifications and call-site invalidation on structural changes. We follow the same principle: structural changes (method definition, @ISA modification) invalidate caches.

5.4 Thread Creation Checklist

Complete sequence for threads->create():

Step	What	How
1	Clone `VariableStorage`	Deep-copy all global scalars/arrays/hashes/code refs
2	Clone closure captures	Deep-copy `capturedVars`/`codeObject` in every `RuntimeCode`
3	Fixup capture references	Redirect cloned captures to child's globals (identity map)
4	Handle `:shared` vars	Don't clone; wrap in `SharedScalar`/`SharedArray`/`SharedHash`
5	Invalidate external resources	Set `Closeable`/`Connection`-backed scalars to undef
6	Clear method caches	Clear `methodCache`, `linearizedClassesCache`, `overloadContextCache`, `isaStateCache`
7	Clone I/O state	Fresh `stdout`/`stderr`/`stdin` per `IOSystem`
8	Clone execution context	Fresh `CallerStack`, `DynamicScope`, `SpecialBlocks`, `RegexState`
9	Start JVM thread	`Thread.ofVirtual().start(...)` with `PerlRuntime.setCurrent(child)`
10	Call `CLONE($pkg)`	Walk all packages in child, call CLONE where defined
11	Execute user code	Apply the code reference with provided arguments

Steps 2-3 (closure capture cloning with reference fixup) are the most technically challenging part. The identity-map approach from Perl 5's perl_clone() is the proven solution.

6. Implementation Phases

Phase 0: Thread-Safety Audit (2 weeks)

(See Section 5 for detailed thread-creation semantics that inform all phases.)

Before any new features, make the existing single-threaded runtime safe for the case where multiple JSR-223 eval() calls might overlap:

Add synchronized to GlobalVariable map operations
Make RuntimeIO.stdout/stderr/stdin volatile
Add synchronized to CallerStack operations
Document what is and isn't thread-safe

Deliverable: Existing functionality works under a global lock. No new APIs.

Phase 1: PerlRuntime Shell (4 weeks)

Create the PerlRuntime class as a thin shell around existing statics:

Create PerlRuntime with ThreadLocal<PerlRuntime> CURRENT
Add PerlRuntime.current() accessor
Keep all existing static fields — just route through PerlRuntime.current()
Wire up PerlLanguageProvider to create and set a default PerlRuntime

Key constraint: No behavior change. Every existing test must still pass. The PerlRuntime is just a new entry point that delegates to existing statics.

Deliverable: PerlRuntime.current() works; JSR-223 can create isolated contexts (even if they still share state internally).

Phase 2: De-static-ify I/O (3 weeks)

Move RuntimeIO.stdout/stderr/stdin into PerlRuntime:

// Before: RuntimeIO.stdout
// After:  PerlRuntime.current().io.stdout

public class IOSystem {
    public RuntimeIO stdout;
    public RuntimeIO stderr;
    public RuntimeIO stdin;
}

This is the smallest subsystem and provides immediate value for JSR-223: each ScriptEngine can redirect I/O independently.

Deliverable: JSR-223 ScriptContext.getWriter() works correctly per-context.

Phase 3: De-static-ify CallerStack + DynamicScope (4 weeks)

Move CallerStack.callerStack and DynamicVariableManager.variableStack and RuntimeScalar.dynamicStateStack into PerlRuntime:

public class ExecutionContext {
    final List<Object> callerStack = new ArrayList<>();
    final Deque<DynamicState> dynamicScope = new ArrayDeque<>();
    final Stack<RuntimeScalar> dynamicStateStack = new Stack<>();
}

Deliverable: Call stacks and local variables are per-runtime.

Phase 4: De-static-ify SpecialBlocks + RegexState (2 weeks)

Move SpecialBlock.endBlocks/initBlocks/checkBlocks and regex match state ($1, $2, $&, etc.) into PerlRuntime.

Regex State Isolation Detail:

The following regex-related variables must be per-runtime (not per-thread-global):

Variable	Description	Storage Location
`$1`, `$2`, ... `$N`	Capture groups	`PerlRuntime.regexState.captures`
`$&`	Entire matched string	`PerlRuntime.regexState.lastMatch`
`$``	String before match	`PerlRuntime.regexState.priorMatch`
`$'`	String after match	`PerlRuntime.regexState.postMatch`
`$+`	Last bracket matched	`PerlRuntime.regexState.lastBracket`
`@-`	Match start positions	`PerlRuntime.regexState.matchStarts`
`@+`	Match end positions	`PerlRuntime.regexState.matchEnds`

public class RegexState {
    final RuntimeArray captures = new RuntimeArray();  // $1, $2, etc.
    RuntimeScalar lastMatch;     // $&
    RuntimeScalar priorMatch;    // $`
    RuntimeScalar postMatch;     // $'
    RuntimeScalar lastBracket;   // $+
    final RuntimeArray matchStarts = new RuntimeArray();  // @-
    final RuntimeArray matchEnds = new RuntimeArray();    // @+
}

Implementation: Modify RuntimeRegex to store results in PerlRuntime.current().regexState instead of static fields. The ScalarSpecialVariable class for capture variables must look up values from the current runtime's regex state.

Deliverable: Regex captures and END blocks don't leak between runtimes.

Phase 5: De-static-ify GlobalVariable (8 weeks)

This is the largest and most invasive change. Move all 13+ maps from GlobalVariable into PerlRuntime.variables:

public class VariableStorage {
    final Map<String, RuntimeScalar> globalVariables = new HashMap<>();
    final Map<String, RuntimeArray> globalArrays = new HashMap<>();
    final Map<String, RuntimeHash> globalHashes = new HashMap<>();
    final Map<String, RuntimeScalar> globalCodeRefs = new HashMap<>();
    final Map<String, RuntimeGlob> globalIORefs = new HashMap<>();
    final Map<String, RuntimeFormat> globalFormatRefs = new HashMap<>();
    // ... etc
}

Strategy: Use IDE "Find Usages" on each static field. Replace with PerlRuntime.current().variables.xxx. Compile, test, repeat.

This is pure mechanical refactoring but touches hundreds of call sites.

Deliverable: Full runtime isolation. Multiple PerlRuntime instances can run concurrently without interference.

Phase 6: threads Module (4-6 weeks)

With runtime isolation complete, implement the threads Perl module. See Section 5 for the full thread-creation cloning protocol.

Core API:

threads->create(\&sub, @args) — clone runtime (Section 5.4) + start JVM thread
threads->join() — wait for completion, return value
threads->detach() — fire and forget
threads->list() — list running threads
threads->self() — current thread object
threads->tid() — thread ID (unique integer, main thread = 0)
threads->yield() — hint to scheduler (maps to Thread.yield())
threads->exit($status) — exit current thread (see below)

threads::shared API: 9. share($var) / :shared attribute — SharedScalar/SharedArray/SharedHash wrappers 10. lock($var) — mutex on shared variables 11. cond_wait()/cond_signal()/cond_broadcast() — condition variables 12. is_shared($var) — check if variable is shared

Internal callbacks: 13. CLONE($package) callback — called in child after cloning (Section 5.2) 14. Closure capture cloning with reference fixup (Section 5.1) 15. Method cache invalidation in child (Section 5.3)

threads->exit() Semantics:

In Perl 5 ithreads, exit() in a non-main thread exits only that thread, not the entire process. This must be implemented carefully:

public static void threadsExit(int status) {
    PerlThread current = PerlThread.current();
    if (current.isMainThread()) {
        // Main thread: exit the process
        System.exit(status);
    } else {
        // Worker thread: exit only this thread
        current.setExitStatus(status);
        throw new ThreadExitException(status);
    }
}

The ThreadExitException should be caught by the thread wrapper created in threads->create() and should NOT propagate to the main thread or cause "Perl exited with active threads" warnings.

Virtual Threads note: Use Thread.ofVirtual() by default. Perl's ithreads are already "heavy" (full interpreter clone), so making the actual JVM thread lightweight is a net win.

Timeline Note: This phase is estimated at 4 weeks but may take 6 weeks due to the complexity of closure capture cloning with reference fixup (the most technically challenging part of the entire implementation).

Phase 7: Runtime Pool + JSR-223 Thread Safety (3 weeks)

Implement PerlRuntimePool for web server use cases
Declare THREADING parameter as "THREAD-ISOLATED" in ScriptEngineFactory
Update JSR-223 to use pool automatically
Benchmark: target 1000+ concurrent requests without interference

7. What We Explicitly Don't Implement

Feature	Reason	Alternative
True `fork()`	JVM cannot split processes (JRuby proved this)	`system()`, `threads`, `ProcessBuilder`
Classloader isolation	Type barriers, memory overhead, debugging pain (Jython lesson)	ThreadLocal + PerlRuntime instances
Copy-on-write cloning	JVM has no COW memory; deep copy is the only option	Full clone at thread creation (like Perl 5 ithreads)
`fork()` in test harnesses	Many CPAN test suites use fork	Thread-based test runner (`jprove`)
`open(PIPE, "-\|")`	Requires fork	`ProcessBuilder` wrapper

8. Performance Considerations

ThreadLocal Overhead

ThreadLocal access is ~2-5ns per call on modern JVMs. With an estimated 10-50 ThreadLocal lookups per Perl statement, overhead is 20-250ns per statement. Perl statement execution is typically 100ns-10µs, so overhead is 0.25-25% — acceptable for the concurrency it enables.

Optimization: Hot paths (like GlobalVariable.getGlobalVariable) can cache the PerlRuntime reference in a local variable at method entry.

Thread Creation Cost

Perl 5 ithreads clone the entire interpreter (~100ms for a complex program). PerlOnJava thread creation involves:

Deep-copying VariableStorage maps (~1-10ms depending on variable count)
Deep-copying closure captures + reference fixup (~0.5-5ms)
Clearing method caches in child (~0.1ms)
Calling CLONE on all packages (~0.1-1ms)
Creating a virtual thread (~0.1ms)
Linking shared variables (~0.1ms)

Expected: 2-15ms per thread creation — still significantly faster than Perl 5.

Memory Per Runtime

Each PerlRuntime holds copies of all global variables. For a typical program with ~1000 global variables, each runtime adds ~100KB-1MB of memory. This is comparable to Perl 5 interpreter clones.

9. Testing Strategy

Unit Tests

Runtime isolation: create 2 runtimes, modify globals in one, verify other is unchanged
SharedVariable: concurrent increment from N threads, verify final count
CallerStack isolation: nested calls in separate runtimes don't interfere
Closure capture isolation: modify captured lexical in child, verify parent is unchanged
Method cache isolation: modify @ISA in child, verify parent's method resolution unchanged

Integration Tests

threads->create() with arguments and return values
threads::shared with lock/cond_wait/cond_signal
JSR-223 concurrent eval from multiple Java threads
Existing test suite passes unchanged (regression)
CLONE method called correctly: DBI handle invalidation in child
Closure with :shared variable: both threads see same value
Nested closures: captured-of-captured variables cloned correctly

Stress Tests

100 concurrent threads modifying shared variables
1000 concurrent JSR-223 eval calls
Thread creation/destruction churn

Perl 5 Test Compatibility

Run perl5_t/t/op/threads.t (adapted for PerlOnJava)
Run representative CPAN module tests that use threads

10. Success Criteria

✅ Multiple PerlRuntime instances run concurrently without interference
✅ threads->create() and threads->join() work for basic patterns
✅ threads::shared variables are correctly synchronized
✅ JSR-223 can handle concurrent eval() calls
✅ fork() returns undef with clear error message (not crash)
✅ No regression in existing single-threaded functionality
✅ Thread creation < 10ms for typical programs
✅ < 5% performance overhead for single-threaded programs

11. Timeline Summary

Phase	Duration	Deliverable	Risk
0: Thread-Safety Audit	2 weeks	Global lock safety	Low
1: PerlRuntime Shell	4 weeks	ThreadLocal routing	Low
2: De-static I/O	3 weeks	JSR-223 I/O isolation	Low
3: De-static CallerStack + DynamicScope	4 weeks	Per-runtime call stacks	Low
4: De-static SpecialBlocks + Regex	2 weeks	Per-runtime regex/blocks	Low
5: De-static GlobalVariable	8-12 weeks	Full runtime isolation	High
6: threads Module	4-6 weeks	Perl `use threads` works	Medium
7: Runtime Pool + JSR-223	3 weeks	Production web server	Low
Total	30-38 weeks

Risk Notes:

Phase 5 is the critical path bottleneck. The estimate of 8 weeks is optimistic; it touches "hundreds of call sites" and requires careful testing. Budget 10-12 weeks.
Phase 6 closure capture cloning with reference fixup is the most technically challenging part of the implementation. May require additional time for edge cases.
Phases 0-2 provide immediate value for JSR-223 embedding use cases and should be prioritized.
Phase 6 is only possible after Phase 5 is complete.

12. Open Questions

Virtual threads vs platform threads: Should threads->create() default to virtual threads? Virtual threads are cheap but have caveats with synchronized blocks (pinning). Test both and choose based on benchmarks. Update: JEP 491 will resolve the pinning issue in future Java releases. Recommend defaulting to virtual threads for I/O-bound Perl code.
Clone depth: Should thread creation clone @INC, %INC, loaded modules? Perl 5 clones everything. We could optimize by sharing read-only state.
Signal handling: $SIG{INT} etc. — should signals be per-runtime or process-wide? Perl 5 delivers to the main thread.
Reference fixup performance: The two-pass clone (deep copy + identity map fixup) may be expensive for programs with many closures. Profile Perl 5's perl_clone() for comparison benchmarks.
Nested closures: A closure that captures a variable which is itself captured from an outer scope creates chains. The fixup pass must follow these chains transitively. Verify with test cases.
Runtime Pool Configuration: For web server scenarios, should expose configurable limits: maxPoolSize, minPoolSize, maxIdleTime.

Resolved Questions

CLONE method priority: Yes, implement in Phase 6. Required for DBI, Storable, and any module that holds external resources. (See Section 5.2)
Method cache handling: Clear all caches in child at thread creation. Move to per-runtime in Phase 5. (See Section 5.3)
DBI handles in child: Invalidate automatically for Closeable/Connection objects; module's CLONE method reconnects if needed. (See Section 5.2)
exit() semantics: Resolved in Phase 6 design. exit() in a non-main thread exits only that thread via ThreadExitException. (See Section 6)

External References

JRuby Concurrency Wiki
Jython Concurrency Guide
perlthrtut — Perl threading tutorial
threads — Perl threads module documentation
threads::shared — Perl shared variable documentation
JEP 444: Virtual Threads — Java virtual threads specification
JEP 491: Synchronize Virtual Threads without Pinning — Future fix for synchronized pinning
Oracle Virtual Threads Guide — Java 21 virtual threads documentation

Technical Review Summary

Review Date: 2026-03-26
Status: APPROVED with minor recommendations

Key Findings

Strategy is sound: The JRuby-inspired ThreadLocal + PerlRuntime model is correct.
Global state inventory verified: All static mutable state correctly identified.
Perl 5 compatibility achievable: ithreads semantics can be faithfully implemented.
Virtual threads recommended: Use Thread.ofVirtual() with ReentrantLock (not synchronized).
fork() approach proven: JRuby's 20-year track record validates returning undef.

Performance Expectations

Thread creation: 2-15ms (5-50x faster than Perl 5)
Single-threaded overhead: 0.25-25% from ThreadLocal routing (acceptable)
Memory per runtime: ~100KB-1MB (comparable to Perl 5)

Recommendations Incorporated

Added threads->exit() semantics (Section 6)
Added threads->yield() and threads->tid() to API (Section 6)
Added regex state isolation detail (Section 4, Phase 4)
Updated virtual thread pinning caveat with JEP 491 reference
Updated timeline with risk assessment
Moved resolved questions out of open questions

FilesExpand file tree

concurrency.md

Latest commit

History

concurrency.md

File metadata and controls

PerlOnJava Concurrency: Unified Design

Executive Summary

Recommended Strategy

Key Definitions

1. Lessons from Other JVM Language Projects

1.1 JRuby (most relevant — 20 years of production use)

1.2 Jython (cautionary tale)

1.3 TruffleRuby / GraalVM

1.4 Perl 5 ithreads (what we're implementing)

2. PerlOnJava Global State Inventory

3. Architecture Decision: ThreadLocal Runtime Context

Why not classloader isolation?

Recommended: ThreadLocal + PerlRuntime instances

4. Concurrency Models to Support

4.1 Primary: threads (ithreads) — Full Implementation

4.2 Secondary: fork() — Stub with Documentation

4.3 Web Server Concurrency — Runtime Pool

5. Thread Creation: What Gets Cloned

5.1 Closure Capture Cloning

5.2 CLONE Method and External Resources

5.3 Method Cache Invalidation

5.4 Thread Creation Checklist

6. Implementation Phases

Phase 0: Thread-Safety Audit (2 weeks)

Phase 1: PerlRuntime Shell (4 weeks)

Phase 2: De-static-ify I/O (3 weeks)

Phase 3: De-static-ify CallerStack + DynamicScope (4 weeks)

Phase 4: De-static-ify SpecialBlocks + RegexState (2 weeks)

Phase 5: De-static-ify GlobalVariable (8 weeks)

Phase 6: threads Module (4-6 weeks)

Phase 7: Runtime Pool + JSR-223 Thread Safety (3 weeks)

7. What We Explicitly Don't Implement

8. Performance Considerations

ThreadLocal Overhead

Thread Creation Cost

Memory Per Runtime

9. Testing Strategy

Unit Tests

Integration Tests

Stress Tests

Perl 5 Test Compatibility

10. Success Criteria

11. Timeline Summary

12. Open Questions

Resolved Questions

Related Documents

External References

Technical Review Summary

Key Findings

Performance Expectations

Recommendations Incorporated

4.1 Primary: `threads` (ithreads) — Full Implementation

4.2 Secondary: `fork()` — Stub with Documentation