Real-time multi-chain ERC-20 token indexer with spam detection. Polls blocks from Ethereum (or any EVM chain), streams events through Kafka, discovers tokens, scores them for spam, and serves the indexed data through a REST API.
Built as a production-style backend system: event-driven architecture, idempotent writes, reorg-safe block tracking, and a clean separation between ingestion and serving layers.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Indexer Process β
β β
Ethereum RPC β ββββββββββββββββ ββββββββββββββββββββββββ β
(Alchemy) β β Block Poller ββββββββΆβ Kafka Producer β β
β² β β (ethers.js) β β topic: block-eventsβ β
β β ββββββββββββββββ ββββββββββββ¬ββββββββββββ β
β β β β
β β βββββββββΌββββββββ β
β β β Kafka β β
β β β Broker β β
β β βββββββββ¬ββββββββ β
β β β β
β β ββββββββββββββββββββ βββββββββΌββββββββ β
β β β Transfer Scanner ββββββββBlock Processorβ β
β β β (eth_getLogs) β β (Consumer) β β
βββββββββββββββββββββββ€ β βββββββββ¬ββββββββ β
β ββββββββββ¬ββββββββββ β β
β β β β
β ββββββββββΌββββββββββ β β
β β Token Discovery β β β
β β (BullMQ Workers) β β β
β ββββββββββ¬ββββββββββ β β
β β β β
β ββββββββββΌββββββββββ β β
β β Spam Scorer β β β
β ββββββββββ¬ββββββββββ β β
β β β β
ββββββββββββββΌβββββββββββββββββββββββββΌβββββββββββββββ
β β
βββββββββΌβββββββββββββββββββββββββΌββββββββ
β PostgreSQL β
β blocks β tokens β transfers β
βββββββββββββββββββββ¬βββββββββββββββββββββ
β
ββββββββββββββββββββββββββΌβββββββββββββββββββββββββ
β API Process β
β β β
β βββββββββββΌβββββββββββ β
β β Express REST API β β
β ββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
Why this architecture?
- Kafka decouples block ingestion from processing β gives backpressure handling, replay capability, and independent scaling of producers and consumers.
- BullMQ handles token discovery as one-time jobs with retry and exponential backoff, separate from the hot path of block processing.
- Separate processes for indexing and serving means the API never competes with the indexer for resources.
| Layer | Technology |
|---|---|
| Language | TypeScript, Node.js |
| Blockchain | ethers.js v6 (JSON-RPC via Alchemy) |
| Streaming | Apache Kafka (KafkaJS) |
| Job Queue | BullMQ (backed by Redis) |
| Database | PostgreSQL 16 |
| Cache | Redis 7 |
| API | Express 5 |
| Validation | Zod |
| Logging | Winston |
| API Docs | swagger-jsdoc + swagger-ui-express |
| Migrations | node-pg-migrate |
| Infrastructure | Docker Compose |
token-tracker/
βββ docker-compose.yml
βββ packages/
β βββ shared/ @token-tracker/shared
β β βββ src/
β β β βββ config.ts Environment & config loader
β β β βββ db.ts PostgreSQL connection pool
β β β βββ kafka.ts Kafka client
β β β βββ redis.ts Redis client
β β β βββ types.ts Shared interfaces
β β β βββ index.ts Barrel export
β β βββ migrations/ SQL migrations (node-pg-migrate)
β βββ indexer/ @token-tracker/indexer
β β βββ src/
β β βββ index.ts Entry point
β β βββ block-poller.ts Polls chain for new blocks
β β βββ kafka-producer.ts Publishes block events to Kafka
β β βββ block-processor.ts Consumes events, writes to DB
β β βββ provider.ts ethers.js JSON-RPC provider
β βββ api/ @token-tracker/api
β βββ src/
β βββ index.ts Entry point
β βββ app.ts Express app setup (cors, helmet, routes)
β βββ swagger.ts OpenAPI spec generation (swagger-jsdoc)
β βββ controllers/ Request handlers (tokens.ts)
β βββ middleware/ Zod validation, error handler
β βββ repositories/ Database queries (tokens.ts)
β βββ routes/ Route definitions with Swagger annotations (tokens.ts)
β βββ schema/ Zod schemas (tokens.ts)
β βββ utils/ Response helpers, error classes
Monorepo managed with npm workspaces. The shared package is consumed by both the indexer and API via @token-tracker/shared.
- Node.js >= 18
- Docker and Docker Compose
- An Alchemy RPC (free tier works)
# Clone the repository
git clone https://github.com/your-username/token-tracker.git
cd token-tracker
# Copy environment variables
cp .env.example .env
# Edit .env and add the ALCHEMY_RPC for your chain
# Start infrastructure (Postgres, Redis, Kafka, Zookeeper)
docker compose up -d
# Install dependencies
npm install
# Run database migrations
cd packages/shared && npm run migrate:up && cd ../..# Start the indexer
cd packages/indexer && npx ts-node src/index.ts
# Start the API (separate terminal)
cd packages/api && npx ts-node src/index.tsThe indexer will connect to the database, Kafka, and start polling blocks from the configured START_BLOCK. You should see output like:
Starting Token Tracker Indexer...
Database connection successful
Kafka producer connected
Block processor connected to Kafka and subscribed to block-events topic
Starting block polling from block number: 24789700
Published block 24789701 to Kafka
Stored block 24789701 in database
| Variable | Description | Default |
|---|---|---|
DATABASE_URL |
PostgreSQL connection string | postgresql://tokenscope:tokenscope@localhost:5555/tokenscope |
REDIS_URL |
Redis connection string | redis://localhost:6666 |
KAFKA_BROKERS |
Comma-separated broker list | localhost:9092 |
ALCHEMY_RPC |
Alchemy RPC | β |
CHAIN_ID |
EVM chain ID to index | 1 (Ethereum mainnet) |
START_BLOCK |
Block number to start indexing | 0 |
PORT |
API server port | 4000 |
NODE_ENV |
Environment | development |
Three tables, all scoped by chain_id for multi-chain support.
Tracks every indexed block. Blocks start as provisional and are promoted to confirmed after 64 confirmations (reorg safety window).
| Column | Type | Notes |
|---|---|---|
block_number |
BIGINT |
PK (composite) |
chain_id |
INT |
PK (composite) |
block_hash |
CHAR(66) |
0x-prefixed, 32-byte hash |
parent_hash |
CHAR(66) |
Used for reorg detection |
status |
VARCHAR(20) |
provisional or confirmed |
indexed_at |
TIMESTAMPTZ |
When the block was indexed |
Discovered ERC-20 contracts. Each token goes through a lifecycle: pending β active or rejected based on spam scoring.
| Column | Type | Notes |
|---|---|---|
contract_address |
CHAR(42) |
PK (composite) |
chain_id |
INT |
PK (composite) |
name |
VARCHAR(255) |
From on-chain name() call |
symbol |
VARCHAR(50) |
From on-chain symbol() call |
decimals |
INT |
From on-chain decimals() call |
spam_score |
INT |
0β100, higher = more likely spam |
status |
VARCHAR(20) |
pending, active, or rejected |
discovered_at_block |
BIGINT |
First block where token was seen |
created_at |
TIMESTAMPTZ |
|
updated_at |
TIMESTAMPTZ |
Every ERC-20 Transfer event log. Uniquely identified by the combination of chain, transaction, and log index.
| Column | Type | Notes |
|---|---|---|
id |
UUID |
PK, auto-generated |
chain_id |
INT |
Part of unique constraint |
token_address |
CHAR(42) |
Contract that emitted the event |
from_address |
CHAR(42) |
|
to_address |
CHAR(42) |
|
value |
NUMERIC |
Raw token value (arbitrary precision) |
tx_hash |
CHAR(66) |
Part of unique constraint |
block_number |
BIGINT |
|
log_index |
INT |
Part of unique constraint |
created_at |
TIMESTAMPTZ |
Indexes: (chain_id, token_address), block_number, from_address, to_address
Unique constraint: (chain_id, tx_hash, log_index) β prevents duplicate event ingestion
Base URL: http://localhost:4000
Interactive Swagger docs: http://localhost:4000/api/docs
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/tokens |
All tokens (paginated, cached) |
GET |
/api/tokens/:chainId |
Tokens by chain (paginated, cached) |
GET |
/api/tokens/:chainId/:address |
Single token (cached) |
GET |
/api/tokens/:chainId/:address/transfers |
Token transfers (paginated) |
GET |
/api/health-check |
Basic health check |
GET |
/api/health/ready |
Readiness check (DB + Redis) |
Pagination query params (available on paginated endpoints):
| Param | Default | Description |
|---|---|---|
page |
1 |
Page number (min: 1) |
limit |
20 |
Items per page (min: 1, max: 100) |
sort |
desc |
Sort order (asc or desc) |
Example response:
{
"data": [...],
"message": "Tokens fetched successfully",
"error": null,
"pagination": {
"page": 1,
"limit": 20,
"total": 95,
"totalPages": 5,
"hasNextPage": true
}
}Supported chain IDs: 1 (Ethereum), 137 (Polygon). Invalid chain IDs return a validation error.
GET endpoints are cached in Redis with a 60-second TTL. Subsequent requests within that window are served from cache.
- Idempotent writes everywhere.
ON CONFLICT DO NOTHINGon blocks, unique constraints on transfers. Safe to replay Kafka topics or reprocess blocks without data corruption. - Provisional block tracking. New blocks are stored as
provisionaland only confirmed after 64 blocks. If a reorg is detected viaparent_hashmismatch, affected blocks and their transfers can be rolled back. - Kafka over direct processing. The block poller doesn't write to the database. It publishes to Kafka, and a separate consumer handles persistence. This means we can independently scale consumers, replay events from any offset, and add new downstream processors without touching the poller.
- BullMQ for token discovery. Token metadata resolution (
name(),symbol(),decimals()) involves RPC calls that can fail or rate-limit. BullMQ gives us per-job retries with exponential backoff, concurrency control, and dead-letter handling β without complicating the main indexing pipeline. - No ORM. Raw parameterized SQL via
pg. Full control over queries, no magic, no abstraction leaks. Every query is visible and auditable. - Multi-chain from day one.
chain_idis baked into every table, every composite key, and every query. Adding a new chain is a config change, not a schema migration.
- Monorepo structure with npm workspaces
- Docker Compose infrastructure (Postgres, Redis, Kafka)
- Database schema and migrations
- Block poller with adaptive polling
- Kafka producer/consumer pipeline
- Block processor with idempotent writes
- Transfer scanner (
eth_getLogsintegration) - Token discovery workers (BullMQ)
- Spam detection scoring
- REST API with Zod validation and pagination
- Redis caching on API endpoints (60s TTL)
- Swagger/OpenAPI documentation
- Winston structured logging
- Reorg detection and rollback