A Go implementation for converting GeoJSON format files to GeoParquet format. This library simplifies working with geospatial data by providing both a command-line interface and a Go library for GeoJSON to GeoParquet conversion.
GeoParquet is a standardized way to describe geospatial data in a columnar format that provides efficient storage and query performance. This tool streamlines the process of converting GeoJSON data to GeoParquet format by:
- Parsing standard GeoJSON files with full specification compliance
- Converting to efficient GeoParquet format with optimized columnar storage
- Inferring property types automatically from GeoJSON data structures
- Preserving all geospatial metadata including coordinate reference systems and properties
- Handling complex geometries and feature collections
- Providing both CLI and library interfaces for different use cases
- ✅ GeoJSON Parsing: Full GeoJSON specification compliant file parsing
- ✅ GeoParquet Conversion: Efficient columnar format output with WKB geometry encoding
- ✅ Basic Property Support: Handles the `name` property from GeoJSON features
- ✅ Geometry Support: Complete support for all GeoJSON geometry types
- ✅ Feature Collections: Handle complex multi-feature datasets
- ✅ CLI & Library: Both command-line tool and Go library interfaces
- ✅ Cross-platform: Works on Linux, macOS, and Windows
- ✅ GeoParquet 1.1.0: Compliant with GeoParquet specification v1.1.0
- Go 1.24.7 or later
- Nix 2.25.4 or later (optional but recommended)
- PowerShell v7.5.1 or later (for building)
- Clone the repository:
git clone https://github.com/beyondcivic/gogeo.git
cd gogeo
- Build the application:
go build -o gogeo .

Alternatively, build inside the Nix environment:
- Clone the repository:
git clone https://github.com/beyondcivic/gogeo.git
cd gogeo
- Prepare the environment using Nix flakes:
nix develop
- Build the application:
./build.ps1

Or install directly with go install:
go install github.com/beyondcivic/gogeo@latest

The gogeo tool provides commands for converting GeoJSON files to GeoParquet:
# Convert GeoJSON to GeoParquet
gogeo generate data.geojson -o data.geoparquet
# Show version information
gogeo version

To use gogeo as a Go library:

package main

import (
	"fmt"
	"log"

	"github.com/beyondcivic/gogeo/pkg/gogeo"
)

func main() {
	// Convert GeoJSON to GeoParquet
	featureCollection, err := gogeo.Generate("data.geojson", "data.geoparquet")
	if err != nil {
		log.Fatalf("Error converting data: %v", err)
	}
	fmt.Printf("Converted %d features to GeoParquet\n", len(featureCollection.Features))
}

Convert a GeoJSON file to efficient GeoParquet format with WKB geometry encoding.
gogeo generate [GEOJSON_FILE] [OPTIONS]
Options:
-o, --output: Output file path (default: [filename]_parsed.geoparquet)
Examples:
# Basic conversion
gogeo generate locations.geojson
# With custom output path
gogeo generate locations.geojson -o my-locations.geoparquet
Environment Variables:
GOGEO_OUTPUT_PATH: Default output path for generated files
Display version, build information, and system details.
gogeo version
The tool converts GeoJSON data to GeoParquet format, which provides:
- Columnar Storage: Efficient storage and query performance
- WKB Geometry Encoding: Well-Known Binary format for geometry data
- Compression: Built-in Zstd compression for reduced file sizes
- Interoperability: Wide support across geospatial tools and libraries
- GeoParquet Metadata: Embedded geo metadata following GeoParquet 1.1.0 specification
The current implementation focuses on core functionality with:
| Feature | Status | Notes |
|---|---|---|
| Geometry Conversion | ✅ Complete | All GeoJSON geometry types supported |
| Name Property | ✅ Complete | Extracts and stores the name property |
| Additional Properties | Limited | Type inference implemented but schema limited |
| Complex Schemas | 🚧 In Progress | Future enhancement planned |
| GeoJSON Element | GeoParquet Representation | Description |
|---|---|---|
| `Point` | WKB geometry column | Single coordinate point |
| `LineString` | WKB geometry column | Connected line segments |
| `Polygon` | WKB geometry column | Closed area with optional holes |
| `MultiPoint` | WKB geometry column | Collection of points |
| `MultiLineString` | WKB geometry column | Collection of line strings |
| `MultiPolygon` | WKB geometry column | Collection of polygons |
| `GeometryCollection` | WKB geometry column | Mixed geometry types |
| `properties.name` | Optional string column | Feature name attribute |
# Convert a simple GeoJSON file
$ gogeo generate locations.geojson -o locations.geoparquet
Generating GeoParquet file for 'locations.geojson'...
✓ GeoParquet file generated successfully and saved to: locations.geoparquet

Given a GeoJSON file with multiple features, the tool will create a GeoParquet file with:
- All geometries encoded as WKB (Well-Known Binary) in a single geometry column
- Feature names extracted to an optional `name` column
- GeoParquet metadata embedded following v1.1.0 specification
- Zstd compression applied for efficient storage
Sample GeoJSON:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [1.0, 2.0] },
      "properties": { "name": "Location A" }
    }
  ]
}

Resulting GeoParquet schema:
- geometry: BYTE_ARRAY (WKB-encoded geometry)
- name: BYTE_ARRAY OPTIONAL (feature name)
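The mapping from the sample GeoJSON to an optional name value can be sketched with the standard library alone. This is only an illustration of the idea: gogeo parses with orb/geojson, and the types `feature` and `featureCollection` and the helpers `nameOf` and `parseNames` below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// feature and featureCollection are simplified, hypothetical stand-ins
// for the types that a full GeoJSON library provides.
type feature struct {
	Type       string          `json:"type"`
	Geometry   json.RawMessage `json:"geometry"`
	Properties map[string]any  `json:"properties"`
}

type featureCollection struct {
	Type     string    `json:"type"`
	Features []feature `json:"features"`
}

const sample = `{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [1.0, 2.0] },
      "properties": { "name": "Location A" }
    }
  ]
}`

// nameOf returns the "name" property as an optional string:
// nil when the property is missing or is not a string.
func nameOf(f feature) *string {
	if s, ok := f.Properties["name"].(string); ok {
		return &s
	}
	return nil
}

// parseNames decodes a FeatureCollection and returns one optional
// name per feature, in input order.
func parseNames(data []byte) ([]*string, error) {
	var fc featureCollection
	if err := json.Unmarshal(data, &fc); err != nil {
		return nil, err
	}
	names := make([]*string, len(fc.Features))
	for i, f := range fc.Features {
		names[i] = nameOf(f)
	}
	return names, nil
}

func main() {
	names, err := parseNames([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}
	for i, n := range names {
		if n == nil {
			fmt.Printf("feature %d: no name\n", i)
			continue
		}
		fmt.Printf("feature %d: %s\n", i, *n)
	}
}
```

Using a pointer for the name mirrors the OPTIONAL column in the schema: a nil pointer corresponds to a missing value, not an empty string.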
Current limitations:
- Property Support: Currently only extracts the `name` property from GeoJSON features
- Schema Flexibility: Uses a fixed schema structure (`GeoParquetRecord`)
- Complex Properties: Nested objects and arrays are not yet supported
Planned enhancements:
- Dynamic Schema Generation: Support for arbitrary GeoJSON properties
- Advanced Type Inference: Better handling of mixed-type properties
- Complex Property Support: Nested objects and array properties
- CRS Support: Coordinate reference system handling beyond EPSG:4326
- Performance Optimizations: Streaming processing for large files
`Generate` converts a GeoJSON file to GeoParquet format with WKB geometry encoding.
Parameters:
- geojsonPath: Path to the input .geojson file
- outputPath: Path for the output .geoparquet file
Returns:
- *geojson.FeatureCollection: Parsed feature collection structure
- error: Any error that occurred during processing
Validates the output path for GeoParquet file generation.
Checks if a file is a valid GeoJSON file based on file extension.
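An extension-based check like the one described above can be sketched in a few lines. The function name `hasGeoJSONExtension` is hypothetical, and treating the match as case-insensitive and limited to `.geojson` is an assumption, not gogeo's documented behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hasGeoJSONExtension is a hypothetical helper that reports whether
// the path ends in ".geojson". The comparison ignores case
// (an assumption) and does not inspect the file contents.
func hasGeoJSONExtension(path string) bool {
	return strings.EqualFold(filepath.Ext(path), ".geojson")
}

func main() {
	for _, p := range []string{"data.geojson", "DATA.GeoJSON", "data.parquet"} {
		fmt.Printf("%s: %v\n", p, hasGeoJSONExtension(p))
	}
}
```

A check of this kind only guards against obvious mistakes; a file with the right extension can still fail to parse.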
Represents a single record in the output GeoParquet file:
type GeoParquetRecord struct {
	Geometry []byte  `parquet:"geometry"`      // WKB-encoded geometry
	Name     *string `parquet:"name,optional"` // Optional name property
}

GeoParquet metadata structure following v1.1.0 specification:
type GeoParquet struct {
	Version       string                      `json:"version"`
	PrimaryColumn string                      `json:"primary_column"`
	Columns       map[string]GeoParquetColumn `json:"columns"`
}

Files generated by gogeo are fully compliant with the GeoParquet specification v1.1.0. This can be verified using validation tools like gpq:
$ gpq validate ./test_simple.geoparquet
Summary: Passed 20 checks.
✓ file must include a "geo" metadata key
✓ metadata must be a JSON object
✓ metadata must include a "version" string
✓ metadata must include a "primary_column" string
✓ metadata must include a "columns" object
✓ column metadata must include the "primary_column" name
✓ column metadata must include a valid "encoding" string
✓ column metadata must include a "geometry_types" list
✓ optional "crs" must be null or a PROJJSON object
✓ optional "orientation" must be a valid string
✓ optional "edges" must be a valid string
✓ optional "bbox" must be an array of 4 or 6 numbers
✓ optional "epoch" must be a number
✓ geometry columns must not be grouped
✓ geometry columns must be stored using the BYTE_ARRAY parquet type
✓ geometry columns must be required or optional, not repeated
✓ all geometry values match the "encoding" metadata
✓ all geometry types must be included in the "geometry_types" metadata (if not empty)
✓ all polygon geometries must follow the "orientation" metadata (if present)
✓ all geometries must fall within the "bbox" metadata (if present)

This validation confirms that gogeo correctly implements:
- Proper GeoParquet metadata structure
- Compliant geometry encoding (WKB)
- Valid column definitions and types
- Specification-adherent file structure
The library is organized into several key components:
- Parsing: GeoJSON file parsing using `github.com/paulmach/orb/geojson`
- Conversion: GeoJSON to GeoParquet format conversion with WKB encoding
- Type Inference: Property type detection (string, int, float, bool, null)
- Metadata Generation: GeoParquet 1.1.0 compliant metadata creation
- Utilities: File validation and path handling
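The type inference component listed above distinguishes string, int, float, bool, and null. A minimal sketch of that classification, assuming values decoded with encoding/json, is shown below. The function name `inferType` and the rule that whole-number floats count as int are assumptions, not gogeo's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"math"
)

// inferType (hypothetical) classifies a value decoded by encoding/json.
// JSON numbers arrive as float64, so whole numbers are reported as
// "int"; this means a literal 2.0 is also classified as "int".
func inferType(v any) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case string:
		return "string"
	case bool:
		return "bool"
	case float64:
		if x == math.Trunc(x) && !math.IsInf(x, 0) {
			return "int"
		}
		return "float"
	default:
		// Nested objects and arrays fall through to here.
		return "unsupported"
	}
}

func main() {
	raw := `{"name":"A","count":3,"ratio":0.5,"open":true,"note":null,"tags":[1]}`
	var props map[string]any
	if err := json.Unmarshal([]byte(raw), &props); err != nil {
		log.Fatal(err)
	}
	// Iterate in a fixed key order so the output is deterministic.
	for _, k := range []string{"name", "count", "ratio", "open", "note", "tags"} {
		fmt.Printf("%s: %s\n", k, inferType(props[k]))
	}
}
```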
- Cobra-based CLI with subcommands for each major function
- Comprehensive help system with detailed usage examples
- Flexible output options and error handling
- Environment variable support for configuration
Key external libraries used:
- `github.com/parquet-go/parquet-go`: Parquet file format handling
- `github.com/paulmach/orb`: Geospatial geometry processing and WKB encoding
- `github.com/spf13/cobra`: Command-line interface framework
- `github.com/spf13/viper`: Configuration management
The library implements GeoParquet specification v1.1.0:
- Metadata Key: Uses `geo` metadata key as specified
- Geometry Encoding: WKB (Well-Known Binary) encoding for all geometries
- Primary Column: Default geometry column named `geometry`
- Schema Validation: Ensures GeoParquet-compliant file structure

Conversion steps:
- GeoJSON Parsing: Uses `orb/geojson` for standards-compliant parsing
- Geometry Conversion: Converts geometries to WKB using `orb/encoding/wkb`
- Property Extraction: Currently extracts `name` property with optional handling
- Metadata Creation: Generates GeoParquet metadata with geometry type analysis
- Parquet Writing: Uses `parquet-go` with Zstd compression
The library uses a custom AppError type for structured error reporting:
type AppError struct {
	Message string // User-friendly error message
	Value   any    // Underlying error or additional context
}

To contribute:
- Fork the repository
- Create a feature branch: `git checkout -b feature/new-feature`
- Make your changes and add tests
- Ensure all tests pass: `go test ./...`
- Commit your changes: `git commit -am 'Add new feature'`
- Push to the branch: `git push origin feature/new-feature`
- Submit a pull request
Run the test suite:
go test ./...
Run tests with coverage:
go test -cover ./...

Use Nix flakes to set up the build environment:
nix develop
Check the build arguments in build.ps1:
# Build static binary with version information
$env:CGO_ENABLED = "1"
$env:GOOS = "linux"
$env:GOARCH = "amd64"
Then run:
./build.ps1
Or build manually:
go build -o gogeo .

This project is licensed under the MIT License; see the LICENSE file for details.