
What is Solana Indexing?

Why indexing is essential for production Solana applications and how it works.

Updated March 2025 · 14 min read

The Problem with Raw Blockchain Data

The Solana blockchain is fundamentally a sequential ledger: an append-only log of transactions organized into blocks. This structure works well for consensus and immutability, but it is poorly suited to the queries that applications actually need to answer. Consider a simple question: "What is the total trading volume for this wallet over the past 30 days?" Answering this from raw blockchain data requires fetching potentially thousands of transactions, parsing each one, filtering for relevant instructions, and aggregating the results. With the network processing 2,000+ transactions per second, repeating that work for every query quickly becomes computationally prohibitive.

Indexing is the process of extracting data from the blockchain, transforming it into a structured format, and storing it in a database optimized for the queries your application needs. An indexer continuously monitors the blockchain, processes new blocks, and updates its database in real time. The result is a queryable data layer that can answer complex questions in milliseconds rather than minutes.
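As a rough illustration of the extract, transform, and store steps, the sketch below loads a few already-parsed swap records into an in-memory SQLite table and answers the 30-day volume question with a single indexed query. The record fields (`signature`, `wallet`, `block_time`, `volume_usd`), the table layout, and the sample values are assumptions made for this example, not a schema taken from any particular indexer.

```python
import sqlite3

# Hypothetical, already-parsed swap records. A real indexer would derive
# these from on-chain transactions; the field names are illustrative.
PARSED_SWAPS = [
    {"signature": "sig1", "wallet": "walletA", "block_time": 1_700_000_000, "volume_usd": 120.0},
    {"signature": "sig2", "wallet": "walletA", "block_time": 1_700_000_000 - 40 * 86_400, "volume_usd": 500.0},
    {"signature": "sig3", "wallet": "walletB", "block_time": 1_700_000_000, "volume_usd": 75.5},
    {"signature": "sig4", "wallet": "walletA", "block_time": 1_700_050_000, "volume_usd": 30.0},
]


def build_index(swaps):
    """Store parsed swaps in a table indexed for per-wallet time-range queries."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE swaps ("
        " signature TEXT PRIMARY KEY,"
        " wallet TEXT NOT NULL,"
        " block_time INTEGER NOT NULL,"
        " volume_usd REAL NOT NULL)"
    )
    db.execute("CREATE INDEX swaps_wallet_time ON swaps (wallet, block_time)")
    # INSERT OR IGNORE keeps the load idempotent if a transaction is seen twice.
    db.executemany(
        "INSERT OR IGNORE INTO swaps VALUES (:signature, :wallet, :block_time, :volume_usd)",
        swaps,
    )
    db.commit()
    return db


def wallet_volume(db, wallet, now, days=30):
    """Total volume for a wallet over the trailing `days` days (unix seconds)."""
    cutoff = now - days * 86_400
    row = db.execute(
        "SELECT COALESCE(SUM(volume_usd), 0) FROM swaps"
        " WHERE wallet = ? AND block_time >= ?",
        (wallet, cutoff),
    ).fetchone()
    return row[0]
```

Once the data is stored this way, the per-query cost depends on the number of matching rows rather than on the number of transactions that would have to be fetched and parsed from the chain.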

What Applications Need Indexing

Almost every non-trivial Solana application benefits from indexing. The question is not whether to index, but how much and what kind:

  • Wallets and portfolio trackers — transaction history, token balances over time, PnL calculation
  • DEX analytics — trading volume, liquidity depth, price history, fee revenue
  • NFT marketplaces — ownership history, sale prices, collection statistics
  • Trading bots — real-time price feeds, order book state, recent trades
  • DeFi protocols — position tracking, liquidation monitoring, yield calculation
  • Analytics dashboards — on-chain metrics, user activity, protocol health

Types of Indexing

1. Account-Based Indexing

Account-based indexing tracks the state of specific accounts over time. Every time an account's data changes, the indexer records the new state together with the slot and timestamp of the change. This is useful for tracking token balances, NFT ownership, and program state. The primary data source is the Geyser plugin's account update stream.
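One way to picture this is a versioned store that keeps each observed account state keyed by slot and can answer "what was the state at slot N?". The sketch below is a minimal in-memory version under that assumption; the class name `AccountHistory` and the choice to track only a lamport balance are illustrative, and a later update for the same slot simply overwrites the earlier one.

```python
import bisect
from collections import defaultdict


class AccountHistory:
    """In-memory history of account states, keyed by (pubkey, slot)."""

    def __init__(self):
        self._slots = defaultdict(list)   # pubkey -> sorted list of slots seen
        self._states = defaultdict(dict)  # pubkey -> {slot: lamports}

    def record(self, pubkey, slot, lamports):
        """Record the state observed for `pubkey` at `slot`."""
        if slot not in self._states[pubkey]:
            bisect.insort(self._slots[pubkey], slot)
        self._states[pubkey][slot] = lamports

    def state_at(self, pubkey, slot):
        """Return the most recent state at or before `slot`, or None."""
        slots = self._slots.get(pubkey, [])
        i = bisect.bisect_right(slots, slot)
        if i == 0:
            return None
        return self._states[pubkey][slots[i - 1]]
```

A production store would persist these versions in a database, but the lookup pattern (latest version at or before a given slot) stays the same.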

2. Transaction-Based Indexing

Transaction-based indexing processes every transaction that interacts with specific programs or accounts. The indexer parses instruction data, extracts relevant fields (swap amounts, token mints, counterparties), and stores structured records. This is the foundation for DEX analytics, trading history, and protocol-level metrics.
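The parsing step might look roughly like the following. Everything specific here is hypothetical: the program ID, the simplified transaction dictionary, and the instruction layout (a 1-byte tag followed by two little-endian u64 values) are invented for illustration, since every real program defines its own instruction encoding.

```python
import struct

# Hypothetical program ID and instruction layout, for illustration only.
SWAP_PROGRAM_ID = "SwapProgram1111111111111111111111111111111"
SWAP_TAG = 1
SWAP_LAYOUT = "<BQQ"  # tag (u8), amount_in (u64), min_amount_out (u64)
SWAP_SIZE = struct.calcsize(SWAP_LAYOUT)  # 17 bytes


def extract_swaps(tx):
    """Return structured swap records from a simplified transaction dict."""
    records = []
    for ix in tx["instructions"]:
        if ix["program_id"] != SWAP_PROGRAM_ID:
            continue  # instruction belongs to a program we do not index
        data = ix["data"]
        if len(data) != SWAP_SIZE or data[0] != SWAP_TAG:
            continue  # not a swap instruction in this assumed layout
        _, amount_in, min_out = struct.unpack(SWAP_LAYOUT, data)
        records.append(
            {
                "signature": tx["signature"],
                "trader": ix["accounts"][0],  # assumed: first account is the trader
                "amount_in": amount_in,
                "min_amount_out": min_out,
            }
        )
    return records
```

The resulting records are what gets written to the storage layer, one row per relevant instruction.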

3. Event-Based Indexing

Event-based indexing focuses on specific on-chain events emitted by programs, typically through log messages. Anchor programs, for example, emit structured events as base64-encoded log entries that indexers can decode and store. This approach is more efficient than full transaction indexing when you only care about specific program events.
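A minimal sketch of decoding such events from transaction logs follows. It relies on two Anchor conventions: events appear in log lines prefixed with `Program data: ` as base64, and the payload begins with an 8-byte discriminator taken from the SHA-256 hash of `event:<EventName>`. The event itself (`SwapEvent` with two u64 fields) is a hypothetical example, not one defined by any real program.

```python
import base64
import hashlib
import struct

LOG_PREFIX = "Program data: "


def event_discriminator(name):
    """First 8 bytes of sha256("event:<name>"), as used by Anchor events."""
    return hashlib.sha256(f"event:{name}".encode()).digest()[:8]


def parse_swap_events(log_lines):
    """Decode hypothetical SwapEvent { amount_in: u64, amount_out: u64 } entries."""
    disc = event_discriminator("SwapEvent")
    events = []
    for line in log_lines:
        if not line.startswith(LOG_PREFIX):
            continue
        try:
            raw = base64.b64decode(line[len(LOG_PREFIX):], validate=True)
        except ValueError:
            continue  # not valid base64, so not an event payload
        if len(raw) != 24 or raw[:8] != disc:
            continue  # a different event, or an unexpected size
        # Borsh encodes u64 fields as 8 little-endian bytes each.
        amount_in, amount_out = struct.unpack("<QQ", raw[8:])
        events.append({"amount_in": amount_in, "amount_out": amount_out})
    return events
```

Because only matching log lines are decoded, the indexer can skip instruction parsing entirely for programs that expose everything it needs through events.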

4. Block-Based Indexing

Block-based indexing processes entire blocks sequentially, ensuring no transaction is missed. This is the most comprehensive approach and is required for applications that need complete historical coverage. It is also the most resource-intensive, as it requires processing every transaction on the network.
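The sequential walk can be sketched as a loop over slot numbers. One detail worth handling explicitly is that not every slot produces a block, so a missing block is not necessarily an error. In the sketch below, `fetch_block` and `handle_tx` are caller-supplied placeholders (assumptions of this example), and `fetch_block` is assumed to return `None` for a skipped slot.

```python
def index_range(fetch_block, handle_tx, start_slot, end_slot):
    """Process every slot in [start_slot, end_slot] in order.

    Returns (last_slot_processed, skipped_slot_count) so the caller can
    persist the cursor and resume from last_slot_processed + 1.
    """
    skipped = 0
    for slot in range(start_slot, end_slot + 1):
        block = fetch_block(slot)
        if block is None:
            skipped += 1  # slot produced no block; nothing to index
            continue
        for tx in block["transactions"]:
            handle_tx(slot, tx)
    return end_slot, skipped
```

Persisting the returned cursor after each batch is what lets the indexer restart without missing or reprocessing slots.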

Indexing Architecture

A production indexing pipeline typically consists of three components: a data source (RPC node, gRPC stream, or Geyser plugin), a processor (parsing and transformation logic), and a storage layer (database optimized for your query patterns).

Component   | Options                                | Trade-offs
Data Source | RPC polling, WebSocket, gRPC/Geyser    | Latency vs. reliability vs. completeness
Processor   | Custom code, Substreams, Anchor events | Flexibility vs. development time
Storage     | PostgreSQL, ClickHouse, Redis, MongoDB | Query flexibility vs. performance vs. cost

Backfilling Historical Data

A new indexer starts from the current block and has no historical data. For applications requiring complete history, backfilling is necessary. Solana's full transaction history from genesis amounts to more than 200 TB of raw data. Practical backfilling strategies include using archival RPC nodes to fetch historical blocks, using pre-built historical datasets from providers like Supanode, or limiting history to a specific time window relevant to your application.

⚠️ Backfilling Solana's complete transaction history over RPC alone can take weeks, and even premium providers rate-limit these requests. For applications requiring complete historical data, consider using a managed indexer service or pre-built historical datasets.
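When backfilling over RPC for a limited window, a common pattern is to split the slot range into fixed-size chunks and process several chunks concurrently, keeping the worker count low enough to stay under the provider's rate limit. The sketch below assumes a caller-supplied `fetch_range(lo, hi)` function (hypothetical) that fetches and indexes one inclusive slot range; the chunk size and worker count are arbitrary defaults.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(start, end, size):
    """Split the inclusive range [start, end] into chunks of at most `size` slots."""
    return [(lo, min(lo + size - 1, end)) for lo in range(start, end + 1, size)]


def backfill(fetch_range, start, end, chunk_size=1000, workers=4):
    """Run fetch_range over each chunk in parallel; return {chunk: result}."""
    chunks = chunk_ranges(start, end, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `chunks`.
        results = list(pool.map(lambda c: fetch_range(*c), chunks))
    return dict(zip(chunks, results))
```

Recording which chunks have completed makes the job resumable, so a failure partway through a long backfill does not force a restart from the beginning.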

Real-Time vs. Historical Indexing

Most production applications need both real-time and historical data. Real-time indexing uses gRPC/Geyser streaming to process new transactions as they occur, typically achieving sub-second latency from transaction confirmation to database availability. Historical indexing processes past data through batch jobs, often using parallelized RPC requests to maximize throughput.

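The key challenge is ensuring consistency between real-time and historical data, particularly around forks and reorgs. Solana exposes three commitment levels (processed, confirmed, and finalized), and a block seen at the processed level can still end up on a fork that the cluster abandons. A robust indexer therefore tracks the status of each slot and handles reorgs by reverting any data that came from slots that were never finalized.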
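One simple way to structure this is to hold records from non-finalized slots in a pending area, promote them when the slot is finalized, and drop them if the slot is abandoned. The sketch below is an in-memory illustration of that idea; the class and method names are hypothetical, and how the indexer learns that a slot was finalized or abandoned depends on the data source in use.

```python
class ReorgAwareStore:
    """Keeps records from non-finalized slots separate until they are final."""

    def __init__(self):
        self.pending = {}    # slot -> records seen but not yet finalized
        self.finalized = {}  # slot -> records that can no longer be rolled back

    def add(self, slot, record):
        """Stage a record produced by a block that is not yet finalized."""
        self.pending.setdefault(slot, []).append(record)

    def finalize(self, slot):
        """Promote a slot's records once the slot is reported as finalized."""
        if slot in self.pending:
            self.finalized[slot] = self.pending.pop(slot)

    def discard(self, slot):
        """Revert a slot that ended up on an abandoned fork; return what was dropped."""
        return self.pending.pop(slot, [])
```

Queries that must never see reverted data read only from the finalized set, while latency-sensitive views can also include pending records and accept that some may later disappear.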