TableScanner is a production-grade microservice for querying tabular data from KBase SQLite databases. It provides a comprehensive DataTables Viewer-compatible API with advanced query capabilities, type-aware filtering, and performance optimizations.
- Data Access: Query SQLite databases from KBase objects and handles
- Multi-Database Support: Access objects containing multiple pangenomes (v2.1)
- Local Uploads: Upload local SQLite files (
.db,.sqlite) for temporary access - User-Driven Auth: Secure access where each user provides their own KBase token
- Type-Aware Filtering: Automatic numeric conversion for proper filtering results
- Advanced Operators: Support for
eq,ne,gt,gte,lt,lte,like,ilike,in,not_in,between,is_null,is_not_null - Aggregations:
GROUP BYsupport withcount,sum,avg,min,max,stddev,variance,distinct_count - Table Statistics: Rich column statistics including null counts, distinct counts, min/max/mean, and sample values
- Full-Text Search: FTS5 support with automatic virtual table creation
- Automatic Operations: Lifecycle management for connection pooling, query caching, and automatic disk cleanup
docker compose up --build -dThe service will be available at http://localhost:8000. API documentation is available at /docs.
cp .env.example .env
# Edit .env and set local development parameters
./scripts/dev.sh| Environment | URL |
|---|---|
| AppDev | https://appdev.kbase.us/services/berdl_table_scanner |
| Production | https://kbase.us/services/berdl_table_scanner |
| Local | http://localhost:8000 |
Each user must provide their own KBase authentication token. The service prioritizes user-provided tokens over shared service tokens.
- Header (Recommended):
Authorization: Bearer <token> - Cookie:
kbase_session=<token>(Used by DataTables Viewer) - Legacy Fallback:
KB_SERVICE_AUTH_TOKENin.envis for local testing only
Upload a SQLite file to receive a temporary handle.
curl -X POST "https://appdev.kbase.us/services/berdl_table_scanner/upload" \
-F "file=@/path/to/my_data.db"
# Returns: {"handle": "local:sha256hash", ...}Works with KBase UPA or the local handle returned above.
curl -H "Authorization: Bearer $KB_TOKEN" \
"https://appdev.kbase.us/services/berdl_table_scanner/object/76990/7/2/tables"Retrieve detailed column metrics and sample values.
curl -H "Authorization: Bearer $KB_TOKEN" \
"https://appdev.kbase.us/services/berdl_table_scanner/object/76990/7/2/tables/Genes/stats"Comprehensive filtering and pagination.
curl -X POST -H "Authorization: Bearer $KB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"berdl_table_id": "76990/7/2",
"table_name": "Genes",
"limit": 100,
"filters": [
{"column": "gene_length", "operator": "gt", "value": 1000}
]
}' \
"https://appdev.kbase.us/services/berdl_table_scanner/table-data"For objects containing multiple pangenomes/databases:
# List all databases in an object
curl -H "Authorization: Bearer $KB_TOKEN" \
"https://appdev.kbase.us/services/berdl_table_scanner/object/76990/7/2/databases"
# List tables in a specific database
curl -H "Authorization: Bearer $KB_TOKEN" \
"https://appdev.kbase.us/services/berdl_table_scanner/object/76990/7/2/db/pg_ecoli_k12/tables"
# Query data from a specific database
curl -H "Authorization: Bearer $KB_TOKEN" \
"https://appdev.kbase.us/services/berdl_table_scanner/object/76990/7/2/db/pg_ecoli_k12/tables/Genes/data?limit=100"- Gzip Compression: Compresses large responses (>1KB) to reduce bandwidth usage
- High-Performance JSON: Uses
orjsonfor fast JSON serialization - Parallel Metadata Fetching: Retrieves table metadata concurrently for fast listing
- Metadata Caching: Caches object types locally to minimize KBase API calls
- Connection Pooling: Reuses database connections for up to 10 minutes of inactivity
- Automatic Cleanup: Expired caches are purged on startup. Uploaded databases automatically expire after 1 hour
- Query Caching: 5-minute TTL, max 1000 entries per instance
- Atomic Renaming: Ensures file integrity during downloads and uploads
- Upload Deduplication: SHA-256 hashing prevents duplicate file storage
- API Reference - Complete API documentation with examples
- Architecture Dictionary - System design and technical overview
- Deployment Readiness - Checklist for production deployment
- Contributing Guide - Setup, testing, and contribution guidelines
# Set PYTHONPATH and run all tests
PYTHONPATH=. pytest
# Run integration tests for local upload
PYTHONPATH=. pytest tests/integration/test_local_upload.pyTableScanner/
├── app/
│ ├── main.py # FastAPI application & Lifecycle handlers
│ ├── routes.py # API endpoints & Auth logic
│ ├── models.py # Pydantic (V2) models
│ ├── config.py # Configuration (BaseSettings)
│ ├── services/
│ │ ├── data/ # Query & Connection pooling logic
│ │ └── db_helper.py # Secure handle resolution
│ └── utils/ # SQLite, KBase Client, and Cache utilities
├── docs/ # API and Architectural documentation
├── tests/ # Unit & Integration tests
├── scripts/ # Development helper scripts
└── static/ # Static assets for the viewer
MIT License