FORtran-based Binary-io Interface Toolkit (FORBIT)

FORBIT is a lightweight Python package for reading and writing Fortran direct-access unformatted binary files as NumPy arrays. It is designed for no-header binary files whose records are written by Fortran with ACCESS='DIRECT' and FORM='UNFORMATTED'. The Python interface keeps track of the current Fortran record number, reads or writes one fixed-size record at a time, and returns each record as a numpy.ndarray.

FORBIT brings direct-access binary I/O to NumPy ndarrays while keeping a workflow familiar to Fortran users. Unlike scipy.io.FortranFile, FORBIT supports fixed-length record-oriented direct-access workflows commonly used in atmospheric/ocean science, CFD, and HPC codes.

This library intentionally avoids providing high-level abstractions. Instead, users can freely create and manage wrapper routines suited to their own applications. The library is intended to provide a foundation for a wide variety of analyses. For example, users may construct their own abstractions using custom-defined classes, or convert the outputs into xarray objects for downstream analysis.

Features

Read and write Fortran direct-access unformatted binary files
Handle no-header fixed-record binary files with explicit Fortran-style workflows
Return data as numpy.ndarray
Support arrays of any dimension
Support single/double-precision floating-point data and 2-, 4-, and 8-byte integer data
Support explicit endian selection through Fortran's CONVERT specifier
Keep the current record number internally and update it after each read or write

Requirements

FORBIT is implemented with NumPy, Cython, and Fortran. The package metadata declares support for CPython on POSIX/Linux with Python 3.9 to 3.13. However, it may also be possible to build from the source code manually in other environments.

Runtime dependency:

NumPy

Build dependencies:

setuptools
wheel
Cython
NumPy
A Fortran compiler (default: gfortran)
A C compiler (default: gcc)
make

The Fortran/C compilers can be configured in src/Makefile. The Fortran compiler needs to support the CONVERT specifier in the OPEN statement.

FORBIT assumes that the RECL specifier of the Fortran OPEN statement is given in bytes. Some Fortran compilers use a different unit for direct-access record lengths, so care is required when changing compilers or compiler options. For example, the Intel Fortran compilers (ifort or ifx) require the -assume byterecl compiler option.

Compile and Path

Install from PyPI

$ pip install forbit

Build from Source

$ git clone https://github.com/koseiohara/forbit.git
$ cd forbit/src/
$ make
$ make install PREFIX=ANY_PATH

When building manually, make install copies the generated shared library to the directory specified by PREFIX. Add that directory (shown as YOUR_PATH below) to both PYTHONPATH and LD_LIBRARY_PATH before importing the module.

$ export PYTHONPATH="${YOUR_PATH}:${PYTHONPATH}" 
$ export LD_LIBRARY_PATH="${YOUR_PATH}:${LD_LIBRARY_PATH}"

Examples

### Read sample.grd ###
import numpy as np
import forbit

raw_binary_file = "sample.grd"
nx = 3
ny = 4
nz = 2
shape   = [nz,ny,nx]
kind    = 4                 ## Kind Parameter
endian  = "little_endian"   ## Endian of the Target File
record  = 1                 ## Initial Record
recstep = 1                 ## Record Increment

if (kind == 4):
    arr_type = np.float32
elif (kind == 8):
    arr_type = np.float64

arr = np.empty(shape, dtype=arr_type)

file = forbit.open(raw_binary_file,
                   action ="read" ,
                   shape  =shape  ,
                   kind   =kind   ,
                   record =record ,
                   recstep=recstep,
                   endian =endian )

nt = 10     ## Number of Timesteps
# record 1 -> 10
for t in range(nt):
    print(f"Record: {file.get_record()}")

    arr[:,:,:] = file.read()
    ## Add your own processing here

    print(f"{arr[:,:,:]}\n")

record = 16
nt = 5      ## Number of Timesteps
file.reset_record(newRecord=record)
# record 16 -> 20
for t in range(nt):
    print(f"Record: {file.get_record()}")

    arr[:,:,:] = file.read()
    ## Add your own processing here

    print(f"{arr[:,:,:]}\n")

file.close()

### Write to sample.grd ###
import numpy as np
import forbit

raw_binary_file = "sample.grd"
nx = 3
ny = 4
nz = 2
shape   = [nz,ny,nx]
kind    = 4                 ## Kind Parameter
endian  = "little_endian"   ## Endian of the Target File
record  = 1                 ## Initial Record
recstep = 1                 ## Record Increment

if (kind == 4):
    arr_type = np.float32
elif (kind == 8):
    arr_type = np.float64

arr = np.empty(shape, dtype=arr_type)

file = forbit.open(raw_binary_file,
                   action ="write",
                   shape  =shape  ,
                   kind   =kind   ,
                   record =record ,
                   recstep=recstep,
                   endian =endian )

rng = np.random.default_rng()       ## Random Generator
nt = 20     ## Number of Timesteps
# record 1 -> 20
for t in range(nt):
    print(f"Record: {file.get_record()}")

    arr[:,:,:] = rng.random(shape)
    file.write(arr[:,:,:])

    print(f"{arr[:,:,:]}\n")

file.close()
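
Because FORBIT deliberately stops at read() and write(), small wrappers are left to the user. The sketch below shows one possible shape for such a wrapper: a generator that pairs each record's data with the record number it came from. The name iter_records is hypothetical, and the function relies only on the get_record() and read() methods used in the examples above. _FakeReader is a stand-in so that the snippet runs without a data file; it is not part of FORBIT.

```python
def iter_records(reader, count):
    """Yield (record_number, data) for `count` successive reads.

    `reader` is assumed to behave like the object returned by forbit.open():
    get_record() reports the record that the next read() will return.
    """
    for _ in range(count):
        record = reader.get_record()
        yield record, reader.read()


class _FakeReader:
    """Stand-in for a FORBIT file object (illustration only)."""

    def __init__(self, record=1, recstep=1):
        self._record = record
        self._recstep = recstep

    def get_record(self):
        return self._record

    def read(self):
        data = [float(self._record)]    # dummy payload instead of an ndarray
        self._record += self._recstep
        return data


pairs = list(iter_records(_FakeReader(record=1, recstep=12), count=3))
records = [record for record, _ in pairs]
print(records)    # [1, 13, 25]
```

With a real file, a call such as iter_records(file, nt) could replace the explicit range(nt) loops in the read example.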

Benchmark

The benchmark scripts used for the measurements below and their results are available under benchmark/ on GitHub.

Test Condition

| Item | Value |
| --- | --- |
| Array Shape | `[50,150,300]` |
| Data Type | float32 |
| Record Size | 9 MB |
| Number of Records | 5000 |
| Record Step (skip test) | 3 |
| Storage Type | HDD |
| Fortran Compiler | GNU Fortran (GCC) 15.1.0 |
| C Compiler | gcc (GCC) 15.1.0 |

Compared Implementations

FORBIT was compared against minimal NumPy implementations producing byte-identical binary input/output.

Contiguous Record Write

FORBIT:

fp.write(arr)

NumPy:

arr.tofile(fp)

Sparse Direct-Access Write (recstep=3)

FORBIT:

fp.write(arr)

NumPy:

fp.seek((record - 1) * recl)
arr.tofile(fp)

Contiguous Record Read

FORBIT:

arr = fp.read()

NumPy:

work_arr = np.fromfile(fp, dtype=np.float32, count=nz*ny*nx)
arr[...] = work_arr.reshape([nz,ny,nx])

Sparse Direct-Access Read (recstep=3)

FORBIT:

arr = fp.read()

NumPy:

fp.seek((record-1)*recl)
work_arr = np.fromfile(fp, dtype=np.float32, count=nz*ny*nx)
arr[...] = work_arr.reshape([nz,ny,nx])
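
For readers who want to try the NumPy side on its own, the fragments above can be assembled into a small round trip. This is a sketch rather than the benchmark script itself: the tiny array shape, the temporary file, and the dictionaries used to compare results are assumptions made for illustration, and the data are written in the machine's native byte order.

```python
import os
import tempfile

import numpy as np

nz, ny, nx = 2, 4, 3
recl = 4 * nz * ny * nx          # bytes per float32 record
recstep = 3                      # sparse access: records 1, 4, 7
nt = 3

path = os.path.join(tempfile.mkdtemp(), "sample.grd")

# Sparse direct-access write: seek to the start of each record, then dump it.
written = {}
with open(path, "wb") as fp:
    record = 1
    for t in range(nt):
        arr = np.full((nz, ny, nx), t, dtype=np.float32)
        fp.seek((record - 1) * recl)
        arr.tofile(fp)
        written[record] = arr
        record += recstep

# Sparse direct-access read: same offsets, then restore the C-order shape.
read_back = {}
with open(path, "rb") as fp:
    record = 1
    for t in range(nt):
        fp.seek((record - 1) * recl)
        work_arr = np.fromfile(fp, dtype=np.float32, count=nz * ny * nx)
        read_back[record] = work_arr.reshape([nz, ny, nx])
        record += recstep

file_size = os.path.getsize(path)
print(sorted(read_back), file_size // recl)
```

The file ends after record 7, so its size is seven record lengths even though only three records were written explicitly.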

Results

Write

| Benchmark | NumPy `tofile()` | FORBIT `write()` |
| --- | --- | --- |
| Contiguous record write | 0.0833 - 0.0837 s/record | 0.0777 - 0.0780 s/record |
| Sparse direct-access write | 0.0838 - 0.0939 s/record | 0.0780 - 0.0788 s/record |

Read

| Benchmark | NumPy `fromfile()` | FORBIT `read()` |
| --- | --- | --- |
| Contiguous record read | 0.0772 - 0.0774 s/record | 0.0765 - 0.0772 s/record |
| Sparse direct-access read | 0.216 - 0.217 s/record | 0.205 - 0.206 s/record |

API

`forbit.open()`

file = forbit.open(filename, action, shape, kind, record, recstep, endian, recl=None, dtype="real")

Open a Fortran direct-access unformatted binary file.

Parameters

filename
type=str
File name of a no-header binary file.
action
type=str
File access mode. Accepted values are case-insensitive:
- "read"
- "write"
- "readwrite"
The value is passed to the Fortran OPEN statement as the ACTION specifier.
shape
type=ndarray
Other array-like types, such as list and tuple, may also be accepted.
Shape of one Fortran direct-access record as it should appear on the Python side.
Examples:
- For a 1D record: [nx]
- For a 2D record returned as (ny, nx): [ny, nx]
- For a 3D record returned as (nz, ny, nx): [nz, ny, nx]
The shape is given in normal C-order, not F-order. All dimensions must be positive integers.
kind
type=int
Byte size per element.
Accepted values:
- 2: returned/written as numpy.int16 (dtype=int only)
- 4: returned/written as numpy.float32 or numpy.int32
- 8: returned/written as numpy.float64 or numpy.int64
This parameter describes the precision stored in the binary file. When writing, input arrays are converted to the selected data type before being passed to the Fortran write routine.
record
type=int
Initial Fortran direct-access record number. Record numbers are 1-based, as in Fortran.
recstep
type=int
Increment added to the internal record number after each read() or write() call.
Examples:
- recstep=1: read/write consecutive records (e.g., 1, 2, 3, ...)
- recstep=0: keep using the same record number (e.g., 1, 1, 1, ...)
- recstep=12: jump by 12 records after each access (e.g., 1, 13, 25, ...)
endian
type=str
Endian conversion mode passed to Fortran's CONVERT specifier.
Accepted values are case-insensitive:
- "little_endian"
- "big_endian"
- "native"
recl
type=int
Record length in bytes, passed to Fortran's RECL specifier. If omitted, the total size of one array, recl=kind*product(shape), is used. The value must be equal to or greater than that total size.
dtype
type=str
Data type of the returned/written array: "real" (or "float") for floating-point data, "integer" (or "int") for integer data.
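
The default record length described for recl is simple arithmetic, and it can be useful to check it against a file's size before opening the file. The helpers below are a sketch of that arithmetic only; default_recl and count_records are hypothetical names, not part of FORBIT, and the numbers are the ones listed in the Benchmark section.

```python
import math


def default_recl(shape, kind):
    """Bytes in one record when recl is omitted: kind * product(shape)."""
    return kind * math.prod(shape)


def count_records(file_size, recl):
    """Number of whole records in a no-header file of `file_size` bytes."""
    if file_size % recl != 0:
        raise ValueError("file size is not a multiple of recl")
    return file_size // recl


recl = default_recl([50, 150, 300], kind=4)
print(recl)                                 # 9000000
print(count_records(5000 * recl, recl))     # 5000
```

For an existing file, os.path.getsize(filename) would supply the file_size argument.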

`close()`

file.close()

The file is also closed by the object's destructor, but explicit close() is recommended.

`read()`

arr = file.read()

Read the current record and return it as a NumPy array. The returned array has the shape given by the shape argument and a data type determined by the kind and dtype arguments. Note that the output array is in C-order. After reading, the internal record number is incremented by recstep.

`write()`

file.write(arr)

Write arr to the current record. The input array must have the same size as the shape specified when opening the file. Before writing, FORBIT converts the array to a C-contiguous NumPy array whose data type is determined by the kind and dtype arguments. After writing, the internal record number is incremented by recstep. Note that the input array must be in C-order.
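
To see what a conversion to a C-contiguous array of a given precision means in practice, the following plain-NumPy sketch may help. It assumes a file opened with kind=4 and dtype="real", and it uses np.ascontiguousarray as an illustration; the README does not state which NumPy call FORBIT uses internally.

```python
import numpy as np

# A float64 array seen through a transpose: shape (2, 4, 3), not C-contiguous.
src = np.arange(24, dtype=np.float64).reshape(3, 4, 2).T

# Convert to the memory layout and precision of a kind=4, dtype="real" record.
converted = np.ascontiguousarray(src, dtype=np.float32)

print(src.flags["C_CONTIGUOUS"], converted.flags["C_CONTIGUOUS"])   # False True
print(converted.dtype, converted.shape)                             # float32 (2, 4, 3)
```

The conversion keeps the logical indexing of the array, but values that cannot be represented in the target precision are rounded, so double-precision input loses precision when written with kind=4.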

`get_record()`

record = file.get_record()

Return the current internal record number.

`reset_record(newRecord=None, increment=None)`

file.reset_record(newRecord=10)
file.reset_record(increment=3)

Change the internal record number.
If newRecord is provided, the internal record number is set to that value. If newRecord is not provided and increment is provided, the internal record number is increased by increment. If both arguments are provided, newRecord takes priority.
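
The bookkeeping rules above can be mirrored in a few lines of pure Python. The class below is only a model of the documented behaviour, not FORBIT's implementation: RecordCounter and its advance() method are hypothetical names, with advance() standing in for what read() and write() do to the counter after each access.

```python
class RecordCounter:
    """Pure-Python model of the documented record bookkeeping (not FORBIT code)."""

    def __init__(self, record=1, recstep=1):
        self.record = record
        self.recstep = recstep

    def get_record(self):
        return self.record

    def advance(self):
        """Counter update applied after each read() or write()."""
        self.record += self.recstep

    def reset_record(self, newRecord=None, increment=None):
        if newRecord is not None:
            self.record = newRecord          # newRecord takes priority
        elif increment is not None:
            self.record += increment


counter = RecordCounter(record=1, recstep=1)
for _ in range(10):                          # ten accesses: records 1..10
    counter.advance()
after_loop = counter.get_record()            # 11

counter.reset_record(increment=3)
after_increment = counter.get_record()       # 14

counter.reset_record(newRecord=16, increment=3)
after_both = counter.get_record()            # 16
print(after_loop, after_increment, after_both)
```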

File Format

FORBIT assumes that the file is a Fortran direct-access unformatted file with fixed-size records.
The file is opened in Fortran with settings equivalent to:

open(newunit=unit, file=..., action=..., form='unformatted', access='direct', recl=..., convert=...)

Dimension Ordering

FORBIT's public Python API uses NumPy-style shapes.
For example, if a Fortran program writes a 3D array as (nx, ny, nz), the corresponding Python shape should normally be written as:

shape = [nz, ny, nx]

This convention makes the returned NumPy array natural to index as:

arr[0:nz,0:ny,0:nx]
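
The reversed shape follows from memory layout: Fortran stores arrays column-major and NumPy defaults to row-major (C-order), so the same bytes read with the dimensions reversed give the same elements. The sketch below checks this with NumPy alone; the array fortran_view merely emulates a Fortran array a(nx, ny, nz) and is an assumption made for illustration.

```python
import numpy as np

nx, ny, nz = 3, 4, 2

# Emulate a Fortran array a(nx, ny, nz): the first index varies fastest in memory.
fortran_view = np.arange(nx * ny * nz, dtype=np.float32).reshape((nx, ny, nz), order="F")

# The bytes of one record, in the order Fortran would write them.
raw = fortran_view.tobytes(order="F")

# Reading those bytes in C-order with the reversed shape.
arr = np.frombuffer(raw, dtype=np.float32).reshape([nz, ny, nx])

# Fortran a(i+1, j+1, k+1) corresponds to arr[k, j, i].
i, j, k = 2, 1, 1
print(fortran_view[i, j, k] == arr[k, j, i])       # True
print(np.array_equal(arr, fortran_view.T))         # True
```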

Kind parameter and Dtype

| `kind` | File Precision | Returned dtype |
| --- | --- | --- |
| `2` | 2-byte integer (int16) | `numpy.int16` |
| `4` | 4-byte integer (int32) or single precision (real32) | `numpy.int32`/`numpy.float32` |
| `8` | 8-byte integer (int64) or double precision (real64) | `numpy.int64`/`numpy.float64` |

Only integer and floating-point data are supported by the public API. Complex, logical, character, and quadruple-precision records are not supported by the current implementation.
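
When the same file also has to be inspected with plain NumPy (for example with numpy.fromfile or numpy.memmap), the table above translates into a NumPy dtype string. The helper below is a sketch of that translation, including byte order; numpy_dtype is a hypothetical name and is not part of FORBIT, and lower-casing the dtype argument is a choice made here for convenience.

```python
import numpy as np

_BYTE_ORDER = {"little_endian": "<", "big_endian": ">", "native": "="}
_TYPE_CODE = {"real": "f", "float": "f", "integer": "i", "int": "i"}
_SUPPORTED = {("i", 2), ("i", 4), ("i", 8), ("f", 4), ("f", 8)}


def numpy_dtype(kind, dtype="real", endian="native"):
    """Build the NumPy dtype matching one element stored in the file."""
    code = _TYPE_CODE[dtype.lower()]
    if (code, kind) not in _SUPPORTED:
        raise ValueError(f"unsupported combination: kind={kind}, dtype={dtype!r}")
    return np.dtype(f"{_BYTE_ORDER[endian.lower()]}{code}{kind}")


print(numpy_dtype(4) == np.float32)                            # True
print(numpy_dtype(8, "int", "big_endian") == np.dtype(">i8"))  # True
```

The combination kind=2 with a floating-point dtype raises ValueError, matching the table, which lists kind 2 for integers only.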

Record Handling

The current record number is stored inside the forbit object. Each call to read() or write() uses the current record number and then updates it as follows:

record = record + recstep

Fortran direct-access record numbers are positive and 1-based. The low-level Fortran routines stop with an error if a non-positive record number is used.

Alternatives

scipy.io.FortranFile

Sequential unformatted Fortran binary I/O with record markers. FORBIT instead targets lightweight direct-access binary workflows.

numpy.memmap

Low-level memory-mapped access to raw binary arrays. FORBIT adds explicit record-oriented direct-access I/O.

xgrads

xgrads provides higher-level GrADS/xarray workflows with metadata handling. FORBIT instead focuses on lightweight direct-access binary I/O with explicit Fortran-style workflows for NumPy arrays.
