
How to Backfill a Bitcoin Indexer: The Full Chain Without Running a Node

· 3 min read
OverBlock Team
OverBlock Engineering

Every time you deploy a new Bitcoin indexer - or migrate to a new server - you face the same problem: you need to process 880,000+ blocks from genesis to the current tip. How you do it determines whether this takes 6 hours or 6 days.

The three approaches

1. JSON-RPC polling (most common starting point)

You call getblock on a public or paid RPC endpoint in a loop. The problems:

  • Throughput ceiling: even with parallelism, you're limited by round-trip latency and rate limits
  • Cost: QuickNode charges approximately $800-1,100 to stream 880k blocks through its Streams product; per-request RPC pricing adds up even faster
  • Rate limits: public endpoints throttle aggressively; paid tiers have request quotas
  • Ordering: if you parallelize, you handle sequencing yourself

For a small number of blocks this is fine. For a full backfill from genesis, it's slow and expensive.
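
The sequencing problem with parallel polling can be sketched as a bounded window of in-flight requests that are always consumed in height order. This is a minimal sketch, not a production client: `fetch` is a hypothetical callable that you supply (for example a wrapper around `getblockhash` followed by `getblock`), and the `workers` and `window` defaults are arbitrary assumptions.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def fetch_ordered(
    fetch: Callable[[int], T],  # hypothetical: returns the block at a given height
    start: int,
    stop: int,
    workers: int = 8,
    window: int = 32,
) -> Iterator[Tuple[int, T]]:
    """Fetch heights [start, stop) concurrently, yielding them in ascending order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()  # (height, future) pairs, oldest height first
        next_height = start
        while next_height < stop or pending:
            # Keep at most `window` requests in flight to bound memory use.
            while next_height < stop and len(pending) < window:
                pending.append((next_height, pool.submit(fetch, next_height)))
                next_height += 1
            # Always wait on the lowest outstanding height, so output stays ordered
            # even when later requests finish first.
            height, future = pending.popleft()
            yield height, future.result()
```

Requests complete in any order, but the consumer only ever sees the lowest outstanding height next. A slow request stalls the output, which is one reason the window should be larger than the worker count.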

2. Self-hosted Bitcoin Core node

You provision a server, install Bitcoin Core, and wait for initial block download (IBD).

The reality:

  • IBD takes 2-4 days on fast hardware; longer on commodity hardware
  • Storage: Bitcoin's blockchain is ~650 GB and growing ~60 GB/year
  • Network: you need decent upstream bandwidth to sync efficiently
  • Cost: a dedicated server with enough storage runs $50-200/month
  • Ongoing: you maintain node upgrades, monitor disk usage, handle crashes

This makes sense if you're already running nodes for other reasons. For a one-time backfill, you're paying for infrastructure you'll tear down afterward.

3. Managed gRPC streaming

A service pre-indexes and stores all blocks, exposing them via a high-throughput gRPC stream. You connect, specify a start height, and receive ordered blocks at rates JSON-RPC can't approach.

No node. No storage management. Pay for what you use.
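
On the consumer side, it can still be worth checking that the heights you receive really are contiguous from your start height. The sketch below is provider-agnostic and assumes the stream can be treated as an iterable of `(height, raw_block)` pairs; that shape is an assumption for illustration, not any particular service's API.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical message shape: (block height, raw block bytes).
Block = Tuple[int, bytes]


def consume_in_order(stream: Iterable[Block], start_height: int) -> Iterator[Block]:
    """Yield blocks from `stream`, failing fast on any gap or out-of-order height."""
    expected = start_height
    for height, raw in stream:
        if height != expected:
            raise ValueError(f"expected height {expected}, got {height}")
        yield height, raw
        expected += 1
```

Failing fast on a gap is a deliberate choice here: silently skipping a block in a backfill is usually harder to repair later than stopping and reconnecting.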

What confirmation depth means

One consideration with streaming: reorg safety. Bitcoin can reorganize - a competing chain wins and some recent blocks are invalidated. The probability drops exponentially with block depth.
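
One way to see how quickly the risk falls off is the attacker-catch-up calculation from the Bitcoin whitepaper, ported here to Python. Note the assumption: it models a deliberate attacker holding a fraction `q` of the hash rate, not naturally occurring reorgs, so treat it as an illustration of the exponential decay rather than a measurement.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-rate share q overtakes a z-block lead.

    Port of the calculation in section 11 of the Bitcoin whitepaper.
    """
    p = 1.0 - q
    lam = z * (q / p)  # expected attacker progress while honest miners find z blocks
    total = 1.0
    for k in range(z + 1):
        # Poisson probability that the attacker has mined exactly k blocks
        poisson = math.exp(-lam)
        for i in range(1, k + 1):
            poisson *= lam / i
        # Subtract the cases where the attacker fails to catch up from z - k behind
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


if __name__ == "__main__":
    for depth in range(0, 7):
        print(depth, attacker_success_probability(0.10, depth))
```

For `q = 0.10` the whitepaper's own table gives about 0.00024 at a depth of 6, down from about 0.2 at a depth of 1.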

A streaming service that waits for ~6 confirmations before delivering blocks provides reorg safety without requiring you to implement rollback logic. You receive only blocks buried deeply enough that a reorg is very unlikely to remove them.

This 6-block delay (~60 minutes) is irrelevant for historical backfill (you're not racing) and acceptable for most near-real-time use cases.
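
If you implement the same rule yourself on top of JSON-RPC, the gating arithmetic is small. A minimal sketch, assuming the usual convention that the tip block itself has 1 confirmation; the function name is made up for illustration.

```python
def highest_deliverable_height(tip_height: int, confirmations: int = 6) -> int:
    """Highest block height that already has at least `confirmations` confirmations.

    A block at height h has (tip_height - h + 1) confirmations, so the tip
    itself counts as 1. A negative result means no block qualifies yet.
    """
    if confirmations < 1:
        raise ValueError("confirmations must be at least 1")
    return tip_height - confirmations + 1
```

With the tip at height 880,000 and the default depth of 6, the highest block you would process is 879,995.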

Checkpointing: don't skip this

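Whatever approach you use, persist the height of the last successfully processed block in the same database transaction that writes that block's data. Writing the checkpoint in a separate step risks skipping a block (or leaving a half-written one) if the process dies between the two writes. If the process crashes at block 750,000, you resume from there - not genesis.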

# Pseudocode - write the block data and the checkpoint in one transaction
def process_block(block):
    data = parse_block(block)
    with db.transaction():
        db.upsert(data)  # idempotent, so replaying a block is harmless
        db.set_checkpoint(block.height)  # inside the same transaction

Without checkpointing, a crash means reprocessing everything from the last point you can trust, which in the worst case is genesis.
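
For a concrete version of the same idea, here is a runnable sketch using SQLite from the Python standard library. The table layout and function names are assumptions made for this example; the point is only that the block row and the checkpoint row are written in one transaction, and that resuming reads the checkpoint back.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) example schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "height INTEGER PRIMARY KEY, hash TEXT NOT NULL)"
    )
    # Single-row table: id is pinned to 1 so there is exactly one checkpoint.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), height INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def process_block(conn: sqlite3.Connection, height: int, block_hash: str) -> None:
    """Store one block and advance the checkpoint atomically."""
    # `with conn` commits if the body succeeds and rolls back if it raises,
    # so the block row and the checkpoint can never disagree.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO blocks (height, hash) VALUES (?, ?)",
            (height, block_hash),
        )
        conn.execute(
            "INSERT OR REPLACE INTO checkpoint (id, height) VALUES (1, ?)",
            (height,),
        )


def resume_height(conn: sqlite3.Connection) -> int:
    """Return the next height to process: 0 on a fresh database."""
    row = conn.execute("SELECT height FROM checkpoint WHERE id = 1").fetchone()
    return 0 if row is None else row[0] + 1
```

On startup you would call `resume_height(conn)` and pass the result to your block source as the start height. Because the block write is an upsert, replaying the last block after an unclean shutdown is harmless.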

Rough time estimates

| Approach | Time to backfill 880k blocks |
| --- | --- |
| JSON-RPC polling (single-threaded) | 4-7 days |
| JSON-RPC polling (parallelized) | 12-48 hours |
| Managed gRPC streaming | 2-8 hours (depends on format and processing) |

The managed streaming number assumes your processing logic (parsing, storage writes) keeps up with ingestion speed. For pure ingestion without processing, it's faster.
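
To check whether your own pipeline can hit a given target, it helps to turn the estimates above into a sustained blocks-per-second figure. This is plain arithmetic on the article's 880k-block total, with a made-up helper name.

```python
def required_blocks_per_second(total_blocks: int, hours: float) -> float:
    """Sustained throughput needed to process `total_blocks` within `hours`."""
    return total_blocks / (hours * 3600)


if __name__ == "__main__":
    for label, hours in [("2 hours", 2), ("8 hours", 8), ("48 hours", 48), ("7 days", 7 * 24)]:
        rate = required_blocks_per_second(880_000, hours)
        print(f"{label}: {rate:.1f} blocks/s")
```

A 2-hour backfill needs roughly 122 blocks per second end to end, and an 8-hour one roughly 31, so if your parsing and database writes sustain less than that, processing rather than ingestion sets the pace.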

Practical integration

stream-app provides managed Bitcoin and Ethereum block streaming via gRPC. Activate in the dashboard, get your token, set your start height, receive blocks.

For the mechanics of the gRPC protocol and code examples in Go, Python, and Node.js, see the stream-app documentation.