Welcome to Rasteret

Rasteret is an index-first access layer for cloud-native GeoTIFF collections, built to eliminate cold start overhead in ML and analysis workflows.
Every cold start re-parses satellite image metadata over HTTP—per scene, per band. Rasteret parses those headers once, caches them in Parquet, and delivers up to 20x faster reads on cold starts.

The Problem

Reading pixels from Sentinel-2, Landsat, NAIP, and similar satellite imagery first requires parsing each file's metadata. Your colleague did it last Tuesday, CI did it overnight, and PyTorch does it again every epoch when it respawns DataLoader workers. A single project can repeat millions of redundant requests before a pixel moves.
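
To see how the count can reach the millions, here is a back-of-the-envelope calculation. Every quantity is an assumed, illustrative value rather than a measurement, and the total is an upper bound that assumes each cold worker ends up opening every file and issues one header request per file.

```python
# Illustrative only: all quantities are assumptions, not measurements.
scenes = 2_000   # scenes in a hypothetical training set
bands = 4        # bands read per scene (assumed: one COG file per band)
workers = 8      # DataLoader workers, each starting with a cold cache
epochs = 25      # worker respawns, assuming one respawn per epoch

# Assume one HTTP header request per (scene, band) file per cold worker.
requests_per_cold_start = scenes * bands
total_header_requests = requests_per_cold_start * workers * epochs

print(f"{requests_per_cold_start:,} header requests per cold start")
print(f"{total_header_requests:,} header requests over one training run")
```

Under these assumptions a single training run issues 1,600,000 header requests, before counting colleagues or CI repeating the same work.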

The Solution

Rasteret calls this pattern index-first geospatial retrieval:

Control Plane

A queryable Parquet index storing scene metadata, COG header metadata, and user columns like splits/labels

Data Plane

On-demand tile reads from original GeoTIFF/COG objects with no GDAL in the critical path

This keeps metadata and experiment logic in tables while leaving imagery bytes in source COGs.
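
The split between the two planes can be sketched in plain Python. This is a minimal illustration, not Rasteret's schema or implementation: the `IndexRow` fields, the `tile_byte_range` helper, and the example URLs are all hypothetical. It relies only on the fact that a tiled TIFF records a byte offset and byte count for every tile, so a cached copy of those values is enough to compute a ranged read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRow:
    """One row of a hypothetical index table (control plane)."""
    scene_id: str
    url: str
    split: str                # user column, e.g. "train" or "val"
    tile_size: int            # square tiles, in pixels
    tiles_across: int         # number of tile columns in the image
    tile_offsets: tuple       # byte offset of each tile in the file
    tile_byte_counts: tuple   # compressed byte length of each tile


def tile_byte_range(row, px, py):
    """Map a pixel coordinate to the HTTP Range value of the tile holding it."""
    tile_col = px // row.tile_size
    tile_row = py // row.tile_size
    tile_idx = tile_row * row.tiles_across + tile_col
    start = row.tile_offsets[tile_idx]
    end = start + row.tile_byte_counts[tile_idx] - 1  # Range ends are inclusive
    return f"bytes={start}-{end}"


index = [
    IndexRow("scene-a", "https://example.com/a.tif", "train", 512, 2,
             (1000, 5000, 9000, 13000), (4000, 4000, 4000, 3500)),
    IndexRow("scene-b", "https://example.com/b.tif", "val", 512, 2,
             (1000, 4200, 8000, 12000), (3200, 3800, 4000, 4100)),
]

# Control plane: filter rows like a table query; no network involved.
train_rows = [r for r in index if r.split == "train"]

# Data plane: the only request needed is a ranged GET for the tile itself.
range_header = tile_byte_range(train_rows[0], px=600, py=100)
print(train_rows[0].url, range_header)
```

Because the offsets already sit in the table, selecting scenes and locating tiles requires no request to the image file; only the final ranged read touches the source COG.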

Key Features

Easy

Three lines from STAC search or Parquet file to a TorchGeo-compatible dataset

20x Faster

Custom I/O fetches tile chunks directly once the Collection is built, with no header parsing left to do at read time

Zero Downloads

Work with terabytes of imagery while storing only megabytes of metadata

No STAC at Training

Query once at setup, zero API calls during training

Reproducible

Same Parquet index = same records = same results

Native dtypes

uint16 stays uint16 in tensors—no unnecessary type promotion
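
Two of the features above, querying once at setup and getting the same records from the same index, follow a build-once pattern that can be sketched with the standard library. This is an illustration under stated assumptions, not Rasteret's code: `search_catalog`, `load_or_build_index`, and `fingerprint` are hypothetical helpers, and JSON stands in for Parquet so the sketch has no third-party dependencies.

```python
import hashlib
import json
import tempfile
from pathlib import Path

api_calls = 0


def search_catalog():
    """Stand-in for a remote catalog search (hypothetical; counts calls)."""
    global api_calls
    api_calls += 1
    return [
        {"scene_id": "scene-a", "split": "train"},
        {"scene_id": "scene-b", "split": "val"},
    ]


def load_or_build_index(path):
    """Query the catalog only when no saved index exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    records = search_catalog()
    path.write_text(json.dumps(records, sort_keys=True))
    return records


def fingerprint(records):
    """Stable hash of the records, suitable for logging with results."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"     # JSON keeps the sketch stdlib-only
    first = load_or_build_index(index_path)   # setup: one catalog query
    second = load_or_build_index(index_path)  # later run: served from disk

same_records = fingerprint(first) == fingerprint(second)
print(api_calls, same_records)
```

The catalog is queried once, the second load comes from disk, and the matching fingerprints confirm that both runs saw identical records.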

Architecture Overview

Rasteret integrates with your existing workflow as an opt-in accelerator. The Collection is a standard TorchGeo GeoDataset. Your samplers, DataLoader, xarray workflows, and analysis tools stay the same, while Rasteret handles the async tile I/O underneath.
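
To give a sense of what async tile I/O means in general, the sketch below overlaps many simulated tile reads with `asyncio`. It is not Rasteret's implementation: `fetch_tile` is a hypothetical stand-in that only sleeps, and the latency and concurrency limit are assumed values.

```python
import asyncio


async def fetch_tile(tile_id, latency_s=0.05):
    """Stand-in for one ranged HTTP read (hypothetical; it only sleeps)."""
    await asyncio.sleep(latency_s)
    return f"tile-{tile_id}"


async def fetch_all(tile_ids, max_in_flight=8):
    """Fetch tiles concurrently, capping the number of requests in flight."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(tile_id):
        async with semaphore:
            return await fetch_tile(tile_id)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(t) for t in tile_ids))


tiles = asyncio.run(fetch_all(range(16)))
print(len(tiles), tiles[0], tiles[-1])
```

With eight requests in flight, sixteen simulated reads complete in roughly two latency periods instead of sixteen, which is why overlapping network waits matters for remote tiles.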

What’s Included

Built-in Dataset Catalog

Rasteret ships with a growing catalog of datasets—each entry includes license metadata and a commercial_use flag:
  • Sentinel-2 Level-2A (global, free)
  • Landsat Collection 2 Level-2 (global, free)
  • NAIP (North America, free)
  • Copernicus DEM (global, 30m/90m)
  • ESRI Land Use/Land Cover (global, CC-BY-4.0)
  • AlphaEarth Foundation Embeddings (global, CC-BY-4.0)
  • And more…
List all available datasets:
rasteret datasets list

Multiple Access Patterns

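import rasteret

# Build a Collection from the "earthsearch/sentinel-2-l2a" catalog entry,
# limited to one bounding box and one date range.
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="training_data",
    bbox=(77.5, 12.9, 77.7, 13.1),  # assumed order: min_lon, min_lat, max_lon, max_lat
    date_range=("2024-01-01", "2024-06-30"),
)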

Performance

Cold-Start Comparison

Same AOIs, same scenes, same sampler, same DataLoader:
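| Scenario                      | rasterio/GDAL | Rasteret | Speedup |
|-------------------------------|---------------|----------|---------|
| Single AOI, 15 scenes         | 9.08 s        | 1.14 s   | 8x      |
| Multi-AOI, 30 scenes          | 42.05 s       | 2.25 s   | 19x     |
| Cross-CRS boundary, 12 scenes | 12.47 s       | 0.59 s   | 21x     |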
Measured on AWS t3.xlarge (4 CPU) in us-west-2. The difference comes from how headers are accessed: rasterio/GDAL re-parses IFDs over HTTP on each cold start, while Rasteret reads them from a local Parquet cache.
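
The speedup column is the ratio of the two timings, rounded to the nearest whole number. The short check below recomputes it from the published figures; only the variable names are introduced here.

```python
# (scenario, rasterio/GDAL seconds, Rasteret seconds) from the comparison above
timings = [
    ("Single AOI, 15 scenes", 9.08, 1.14),
    ("Multi-AOI, 30 scenes", 42.05, 2.25),
    ("Cross-CRS boundary, 12 scenes", 12.47, 0.59),
]

# Speedup = baseline time divided by Rasteret time.
speedups = {name: baseline / rasteret for name, baseline, rasteret in timings}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x (rounds to {round(ratio)}x)")
```

The unrounded ratios are about 7.96, 18.69, and 21.14, which round to the reported 8x, 19x, and 21x.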

Who Should Use Rasteret?

Rasteret is optimized for:
  • ML training pipelines with PyTorch/TorchGeo
  • Analysis workflows requiring repeated reads of the same imagery
  • Remote, tiled GeoTIFFs (Cloud-Optimized GeoTIFFs)
  • Large-scale experiments where cold starts dominate runtime
  • Reproducible research requiring version-controlled data splits
Rasteret works with local tiled GeoTIFFs for indexing, filtering, and sharing collections. For non-tiled TIFFs and non-TIFF formats, TorchGeo or rasterio remain the better choice.

Next Steps

Installation

Install Rasteret with pip or uv

Quickstart

Build your first Collection in 5 minutes