Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,4 @@ cache/

package.sh

site/
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@

## System Requirements

*neworder* requires python 3.12 or above and runs on 64-bit linux, OSX and Windows platforms. To take advantage of the optional parallel execution functionality, you will also need an MPI implementation, such as [open-mpi](https://www.open-mpi.org/) or [mpich](https://www.mpich.org/). Note that MPI packages are not currently available for Windows via pip Windows users must install an MPI runtime separately (e.g. [MS-MPI](https://learn.microsoft.com/en-us/message-passing-interface/microsoft-mpi)) and use the `parallel-native` extra.
*neworder* requires python 3.12 or above and runs on 64-bit linux, OSX and Windows platforms. To take advantage of the optional parallel execution functionality, you will also need an MPI implementation, such as [open-mpi](https://www.open-mpi.org/) or [mpich](https://www.mpich.org/). Note that MPI packages are not currently available for Windows via pip - Windows users must install an MPI runtime separately (e.g. [MS-MPI](https://learn.microsoft.com/en-us/message-passing-interface/microsoft-mpi)) and use the `parallel-native` extra.

## Installation

Expand Down
76 changes: 59 additions & 17 deletions docs/tips.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,37 +29,33 @@ If necessary, you can supply your own seeding strategy, for instance if you requ
!!! note "Seeder function signature"
The seeder function must take no arguments and return an `int`. When the function is called by the neworder runtime, the "rank" (in MPI parlance) of each process is available to it. For serial execution, the rank will always be zero.

!!! warning "Resetting the random streams"
`model.mc.reset()` re-invokes the seeder. For non-deterministic seeders this produces a new seed, so the reset stream will differ from the original.

```python
import neworder
def hybrid_seeder() -> int:
return (neworder.mpi.RANK % 2) + 12345
```

or, as a lambda:

```python
hybrid_seeder: Callable[[], int] = lambda r: (r % 2) + 12345
return (neworder.mpi.RANK % 2) + 12345
```

which returns the same seed for all odd-ranked processes and a different seed for the even-ranked ones. You can define your seeder inline when you instantiate the `Model`, e.g.

```python
class MyModel(neworder.Model):
def __init__(self, timeline: neworder.Timeline) -> None:
super().__init__(timeline, lambda: (neworder.mpi.RANK % 2) + 12345)
def __init__(self, timeline: neworder.Timeline) -> None:
super().__init__(timeline, lambda: (neworder.mpi.RANK % 2) + 12345)
...
```

If there was a requirement for multiple processes to all have the same nondeterministic stream, you could implement a seeding strategy like so:

```python
def nondeterministic_identical_stream() -> int:
# only process 0 gets a seed
seed = neworder.MonteCarlo.nondeterministic_stream(0) if neworder.mpi.RANK == 0 else None
# then broadcasts it to the other processes
seed = neworder.mpi.COMM.bcast(seed, root=0)
return seed

# only process 0 gets a seed
seed = neworder.MonteCarlo.nondeterministic_stream() if neworder.mpi.RANK == 0 else None
# then broadcasts it to the other processes
seed = neworder.mpi.COMM.bcast(seed, root=0)
return seed
```

## Identical Streams
Expand All @@ -69,6 +65,52 @@ def nondeterministic_identical_stream() -> int:

The "option" example relies on parallel processes with identical random streams to reduce noise when computing differences for sensitivity analysis. It implements a `check` step that compares the internal states of the random stream in each process and fails if any are different (see the example code).

## Ultimate Reproducibility

The `MonteCarlo` engine is a sequential stream: every draw advances its internal state, so the value you get depends on how many draws have been taken before it. This couples reproducibility to draw order - if agents are added, removed, or processed in a different sequence, the stream diverges.

`SplitMix64` eliminates this coupling. Each variate is computed by hashing a set of **integer keys** (e.g. person ID, process ID, timestep) together with the seed. There is no state to advance, so:

- the draw for person *i* is the same whether you compute the full population or just person *i* in isolation,
- draws can be computed in any order, on any thread, without coordination.

### Basic usage

Construct within your model class, passing a seeder (the same callables used by `MonteCarlo`):

```python
self.rng = neworder.SplitMix64(neworder.MonteCarlo.deterministic_identical_stream)
```

Call `uarray` with any mix of scalar integers (used as context, adding no output dimension) and **1-D** integer arrays (each adding one output dimension). Multi-dimensional arrays are not supported and will raise a `TypeError`.

```python
# 1-D: one variate per person
draws = self.rng.uarray(person_ids, process_id, self.timeline.index)

# 2-D: one variate per (person, draw_index) pair
draws = self.rng.uarray(person_ids, process_id, self.timeline.index, draw_indices)
```

String keys (e.g. the name of a stochastic process) can be converted to stable integers with `SplitMix64.hash64`:

```python
draws = self.rng.uarray(person_ids, neworder.SplitMix64.hash64("mortality"), self.timeline.index)
```

### Repeated calls with the same arguments

Because `SplitMix64` has no state, two calls with identical arguments return identical values. To get independent draws across repeated calls, either:

- use a non-deterministic seeder (this is called each time `uarray` is called), or
- construct with `use_counter=True`, which mixes an auto-incrementing counter into each call. In this case calling `reset()` rewinds the counter and will then replay the same sequence.

!!! warning "Multithreaded use with `use_counter=True`"
The counter is a plain member variable. If multiple threads call `uarray()` concurrently on the same `SplitMix64` instance, they will race on the increment and the results will be nondeterministic. Either give each thread its own `SplitMix64` instance, or avoid `use_counter=True` in multithreaded contexts and instead incorporate a thread-specific *deterministic* scalar (e.g. task identifier or loop index) as a key argument to `uarray()`.

!!! note "When to use `SplitMix64` vs `MonteCarlo`"
Use `MonteCarlo` for general-purpose sampling (non-uniform distributions, arrival times, categorical transitions). Prefer `SplitMix64` for uniform draws that must be **stable under sub-sampling or reordering** - for example when agents enter or leave the population mid-run, or when stochastic processes execute in a non-deterministic order.

## External Sources of Randomness

Other libraries, such as *numpy*, contain a much broader selection of random number functionality than *neworder* does, and it makes no sense to reimplement such functionality. If you are using a specific seeding strategy within neworder, and are also using an external random generator, it is important to ensure they are also following the same strategy, otherwise reproducibility may be compromised.
Expand All @@ -82,14 +124,14 @@ self.nprand = np.random.Generator(np.random.MT19937(ext_seed))
x = self.nprand.normal(size=5)
```

If you've chosen a deterministic seedng strategy, then `ext_seed` will be reproducible, and if you've chosen an independent strategy, then `ext_seed` will be different for each process, thus propagating your chosen seeding strategy to the external generator.
If you've chosen a deterministic seeding strategy, then `ext_seed` will be reproducible, and if you've chosen an independent strategy, then `ext_seed` will be different for each process, thus propagating your chosen seeding strategy to the external generator.

!!! note "Seeding external generators"
Wherever possible, explicitly seed any external random generators using *neworder*'s MonteCarlo engine. This will effectively propagate your seeding strategy to the external generator.

### Using neworder's random generator with numpy

It is now possible to use the RNG from the neworder model's Monte-Carlo engine as a `numpy` generator. In this way all of numpy's functionality is available with neworder's RNG. To achieve this use the adapter function `as_np`. Similarly to the example above, in your model constructor create the numpy generator:
It is now possible to use the the neworder model's Monte-Carlo engine as a `numpy` generator. In this way all of numpy's functionality is available with neworder's `MonteCarlo` RNG. To achieve this use the adapter function `as_np`. Similarly to the example above, in your model constructor create the numpy generator, then:

```py
self.nprand = no.as_np(self.mc)
Expand Down
1 change: 0 additions & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@ markdown_extensions:

extra_javascript:
- js/config.js
- https://polyfill.io/v3/polyfill.min.js?features=es6
- https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js

plugins:
Expand Down
2 changes: 2 additions & 0 deletions neworder/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
MonteCarlo,
NoTimeline,
NumericTimeline,
SplitMix64,
Timeline,
checked,
df,
Expand Down Expand Up @@ -35,6 +36,7 @@
"NoTimeline",
"NumericTimeline",
"Space",
"SplitMix64",
"StateGrid",
"Timeline",
"as_np",
Expand Down
30 changes: 30 additions & 0 deletions neworder/__init__.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ __all__: list[str] = [
"NoTimeline",
"NumericTimeline",
"Space",
"SplitMix64",
"StateGrid",
"Timeline",
"as_np",
Expand All @@ -43,6 +44,35 @@ __all__: list[str] = [
"verbose",
]

class SplitMix64:
"""
A hash-based sampler that produces U[0,1) variates deterministically from integer keys.
"""
def __init__(self, seeder: collections.abc.Callable[[], int], *, use_counter: bool = False) -> None:
"""
Constructs a SplitMix64 with a seeder callable and an optional call counter.
"""
def counter(self) -> int:
"""
The current call counter. Incremented by each uarray() call when use_counter=True.
"""
def reset(self) -> None:
"""
Resets the call counter to zero.
"""
def uarray(
self, *args: int | collections.abc.Sequence[int] | numpy.typing.NDArray[numpy.int64]
) -> numpy.typing.NDArray[numpy.float64]:
"""
Returns a float64 array of U[0,1) values hashed from the supplied integer keys.
"""
@staticmethod
def hash64(s: str) -> int:
"""
Returns a deterministic 64-bit integer hash of a string.
"""
def __repr__(self) -> str: ...

class LinearTimeline(Timeline):
"""

Expand Down
3 changes: 0 additions & 3 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -82,9 +82,6 @@ minversion = "6.0"
testpaths = [
"test"
]
filterwarnings = [
"ignore:neworder installed in serial mode:RuntimeWarning",
]

[tool.ruff]
line-length = 120
Expand Down
15 changes: 13 additions & 2 deletions src/Module.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,12 @@
#include "Model.h"
#include "MonteCarlo.h"
#include "NPArray.h"
#include "SplitMix64.h"
#include "Timeline.h"

#include "NewOrder.h"

#include <memory>
#include <pybind11/native_enum.h>

using namespace py::literals;
Expand Down Expand Up @@ -54,7 +56,6 @@ void init_env(py::object mpi) {
// if something other than module not found has occurred, fail
if (!pyerror.matches(PyExc_ModuleNotFoundError))
throw;
no::warn("neworder installed in serial mode. If necessary, enable MPI with: pip install neworder[parallel]");
}

mpi.attr("COMM") = comm;
Expand Down Expand Up @@ -99,7 +100,6 @@ PYBIND11_MODULE(_neworder_core, m)
.def_property_readonly("start", &no::Timeline::start, timeline_start_docstr)
.def_property_readonly("end", &no::Timeline::end, timeline_end_docstr)
.def_property_readonly("index", &no::Timeline::index, timeline_index_docstr)
//.def_property_readonly("nsteps", &no::Timeline::nsteps, timeline_nsteps_docstr)
.def_property_readonly("dt", &no::Timeline::dt, timeline_dt_docstr)
.def_property_readonly("at_end", &no::Timeline::at_end, timeline_at_end_docstr)
.def("__repr__", &no::Timeline::repr, timeline_repr_docstr);
Expand Down Expand Up @@ -169,6 +169,17 @@ PYBIND11_MODULE(_neworder_core, m)
.export_values()
.finalize();

// Hash-based deterministic sampler
py::class_<no::SplitMix64>(m, "SplitMix64", sms_docstr)
.def(py::init([](const py::function& seeder, bool use_counter) {
return std::make_unique<no::SplitMix64>([seeder]() { return seeder().cast<int64_t>(); }, use_counter);
}), "seeder"_a, py::kw_only(), "use_counter"_a = false, sms_init_docstr)
.def("counter", &no::SplitMix64::counter, sms_counter_docstr)
.def("reset", &no::SplitMix64::reset, sms_reset_docstr)
.def("uarray", &no::SplitMix64::uarray, sms_uarray_docstr)
.def_static("hash64", &no::SplitMix64::hash64, "s"_a, sms_hash64_docstr)
.def("__repr__", &no::SplitMix64::repr, sms_repr_docstr);

// statistical utils
m.def_submodule("stats", stats_docstr)
.def("logistic", no::logistic, stats_logistic_docstr, "x"_a, "x0"_a = 0.0, "k"_a = 1.0)
Expand Down
92 changes: 87 additions & 5 deletions src/Module_docstr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ const char* lineartimeline_init_docstr = R"""(

const char* lineartimeline_init_open_docstr = R"""(
Constructs an open-ended timeline give a start value and a step size. NB the model will run until the Model.halt() method is explicitly called
(from inside the step() method). Note also that nsteps() will return -1 for timelines constructed this way
(from inside the step() method).
)""";

const char* numerictimeline_docstr = R"""(
Expand Down Expand Up @@ -88,10 +88,6 @@ const char* timeline_dt_docstr = R"""(
Returns the step size size of the timeline
)""";

const char* timeline_nsteps_docstr = R"""(
Returns the number of steps in the timeline (or -1 if open-ended)
)""";

const char* timeline_at_end_docstr = R"""(
Returns True if the current step is the end of the timeline
)""";
Expand All @@ -100,6 +96,92 @@ const char* timeline_repr_docstr = R"""(
Prints a human-readable representation of the timeline object
)""";

// SplitMix64

const char* sms_docstr = R"""(
A hash-based sampler that produces U[0,1) variates deterministically from integer keys.

Uses the SplitMix64 finalizer (Stafford Variant 13) to hash an arbitrary sequence of
integer arguments into a float64 value. Unlike a stream-based PRNG, the output at any
index depends only on the seed and the input keys, so draws are independent of call order.
When use_counter=False (the default), instances are safe to share across threads.
When use_counter=True, the internal counter is updated atomically, but concurrent callers
will interleave counter values non-deterministically — use one instance per thread if
reproducible counter sequencing is required.

uarray() accepts any number of positional arguments, each either a scalar int or a 1-D
integer array. All scalar args (regardless of position) are premixed into a shared context
hash (the "salt") before any array elements are processed. Array args each contribute one
dimension to the output (outer-product semantics), and are hashed on top of the salt.
)""";

const char* sms_init_docstr = R"""(
Constructs a SplitMix64 with a seeder callable and an optional call counter.

The seeder is called immediately to set the initial seed and again on each reset().
When use_counter=True, a monotonically increasing counter is mixed into the hash before
any user-supplied arguments, guaranteeing that successive uarray() calls with identical
arguments produce independent draws.

Args:
seeder: A zero-argument callable returning an integer seed.
use_counter: (keyword-only) If True, advance an internal counter on each uarray() call (default False).
)""";

const char* sms_counter_docstr = R"""(
The current call counter (uint64). Incremented by each uarray() call when use_counter=True;
always 0 otherwise. Reset to 0 by reset().
)""";

const char* sms_reset_docstr = R"""(
Resets the call counter to zero. The seeder is called fresh on each uarray() call,
so reset() only affects the counter.
)""";

const char* sms_hash64_docstr = R"""(
Returns a deterministic 64-bit integer hash of a string.

Uses FNV-1a to accumulate the string bytes, then applies the SplitMix64 finalizer
to diffuse the bits. The result is stable across platforms and Python versions and
can be passed directly as a scalar key to uarray().

Args:
s: The string to hash.

Returns:
A signed 64-bit integer.
)""";

const char* sms_uarray_docstr = R"""(
Returns a float64 array of U[0,1) values hashed from the supplied integer keys.

Each positional argument is either a scalar int or a 1-D integer array:
- Scalar args (in argument order) are premixed into a shared salt before any array
elements are processed. They do not add an output dimension.
- Array args (in argument order) are each folded into the hash on top of the salt,
each adding one output dimension (outer-product semantics).

The value at any output index depends only on the seed, the call counter (if enabled),
and the corresponding input key values - not on position within the array or which other
keys are present. This makes draws safe to use under sub-sampling and reordering.

Args:
*args: One or more scalar ints or 1-D integer arrays.

Returns:
ndarray[float64] with shape (len(arr0), len(arr1), ...) for the array args in order.
A 0-d array is returned when all args are scalars.

Raises:
ValueError: If no arguments are supplied.
TypeError: If any argument is not a scalar int or a 1-D integer array.
)""";

const char* sms_repr_docstr = R"""(
Returns a human-readable representation of the SplitMix64. Shows the current counter
value when use_counter=True; the seed is not displayed.
)""";

// MonteCarlo

const char* mc_docstr = R"""(
Expand Down
Loading
Loading