Welcome to Rasteret
Rasteret is an index-first access layer for cloud-native GeoTIFF collections, built to eliminate cold-start overhead in ML and analysis workflows.
Every cold start re-parses satellite image metadata over HTTP, per scene and per band. Rasteret parses those headers once, caches them in Parquet, and delivers up to 20x faster reads on cold starts.
The Problem
Sentinel-2, Landsat, NAIP, and other satellite imagery require metadata parsing before each pixel read. Your colleague did it last Tuesday, CI did it overnight, and PyTorch does it again every epoch when it respawns DataLoader workers. A single project can repeat millions of redundant requests before a pixel moves.
The Solution
Rasteret calls this pattern index-first geospatial retrieval:
Control Plane
A queryable Parquet index storing scene metadata, COG header metadata, and user columns like splits/labels
Data Plane
On-demand tile reads from original GeoTIFF/COG objects with no GDAL in the critical path
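The split between the two planes can be sketched in plain Python. This is a minimal illustration of the idea, not Rasteret's API: `TileHeader`, `FakeRemote`, and `HeaderIndex` are hypothetical names, the tile offsets are made up, and an in-memory dict stands in for the Parquet index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileHeader:
    """Hypothetical subset of COG header metadata kept in the index."""
    tile_offsets: tuple      # byte offset of each tile in the remote file
    tile_byte_counts: tuple  # compressed size of each tile in bytes


class FakeRemote:
    """Stands in for object storage and counts header parses."""

    def __init__(self):
        self.header_requests = 0

    def parse_header(self, url):
        # A real reader would issue HTTP range requests here.
        self.header_requests += 1
        return TileHeader(tile_offsets=(1024, 5120), tile_byte_counts=(4096, 4096))


class HeaderIndex:
    """Control plane: parse each header once, then answer from the cache."""

    def __init__(self, remote):
        self.remote = remote
        self.rows = {}  # assumption: a dict replaces the Parquet index

    def build(self, urls):
        for url in urls:
            if url not in self.rows:
                self.rows[url] = self.remote.parse_header(url)

    def byte_range(self, url, tile):
        """Data plane input: the inclusive byte range to fetch for one tile."""
        header = self.rows[url]
        start = header.tile_offsets[tile]
        return start, start + header.tile_byte_counts[tile] - 1


remote = FakeRemote()
index = HeaderIndex(remote)
urls = [f"s3://example-bucket/scene_{i}.tif" for i in range(3)]
index.build(urls)

# Simulate 5 cold starts that each need one tile from every scene.
ranges = [index.byte_range(u, 1) for _ in range(5) for u in urls]
print(remote.header_requests)  # 3: one parse per scene, not per cold start
print(ranges[0])               # (5120, 9215)
```

Every lookup after `build` is answered locally, so the only remote traffic left on the read path is the tile byte range itself.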
Key Features
Easy
Three lines from STAC search or Parquet file to a TorchGeo-compatible dataset
20x Faster
Custom I/O fetches tiles directly, with no header re-parsing once the Collection is built
Zero Downloads
Work with terabytes of imagery while storing only megabytes of metadata
No STAC at Training
Query once at setup, zero API calls during training
Reproducible
Same Parquet index = same records = same results
Native dtypes
uint16 stays uint16 in tensors—no unnecessary type promotion
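The reproducibility claim rests on the index being the single source of truth for which records a run sees. As a rough sketch under stated assumptions (the rows, the `split` and `label` column names, and the `select` and `fingerprint` helpers are all hypothetical, and a list of dicts stands in for the Parquet index), selecting by a stored split column yields the same records no matter how the rows arrive:

```python
import hashlib
import json

# Hypothetical index rows; a real index would be read from Parquet.
index_rows = [
    {"scene_id": "S2A_001", "split": "train", "label": 3},
    {"scene_id": "S2A_002", "split": "val", "label": 1},
    {"scene_id": "S2A_003", "split": "train", "label": 0},
]


def select(rows, split):
    """Return the records for one split in a stable order."""
    matching = (r for r in rows if r["split"] == split)
    return sorted(matching, key=lambda r: r["scene_id"])


def fingerprint(rows):
    """Hash the selected records so two runs can be compared cheaply."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


train_a = select(index_rows, "train")
train_b = select(list(reversed(index_rows)), "train")  # different row order

print([r["scene_id"] for r in train_a])              # ['S2A_001', 'S2A_003']
print(fingerprint(train_a) == fingerprint(train_b))  # True
```

Logging a fingerprint like this next to experiment results is one way to confirm that two runs trained on the same records.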
Architecture Overview
Rasteret integrates with your existing workflow as an opt-in accelerator. The Collection is a standard TorchGeo GeoDataset. Your samplers, DataLoader, xarray workflows, and analysis tools stay the same, while Rasteret handles the async tile I/O underneath.
What’s Included
Built-in Dataset Catalog
Rasteret ships with a growing catalog of datasets. Each entry includes license metadata and a `commercial_use` flag:
- Sentinel-2 Level-2A (global, free)
- Landsat Collection 2 Level-2 (global, free)
- NAIP (North America, free)
- Copernicus DEM (global, 30m/90m)
- ESRI Land Use/Land Cover (global, CC-BY-4.0)
- AlphaEarth Foundation Embeddings (global, CC-BY-4.0)
- And more…
Multiple Access Patterns
- Built-in Catalog
- Any STAC API
- GeoParquet/Parquet
Performance
Cold-Start Comparison
Same AOIs, same scenes, same sampler, same DataLoader:

| Scenario | rasterio/GDAL | Rasteret | Speedup |
|---|---|---|---|
| Single AOI, 15 scenes | 9.08 s | 1.14 s | 8x |
| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | 19x |
| Cross-CRS boundary, 12 scenes | 12.47 s | 0.59 s | 21x |
Measured on AWS t3.xlarge (4 CPU) in us-west-2. The difference comes from how headers are accessed: rasterio/GDAL re-parses IFDs over HTTP on each cold start, while Rasteret reads them from a local Parquet cache.
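The speedup column is the ratio of the two timing columns, rounded to the nearest integer. A short check using only the figures from the table:

```python
# Wall-clock times in seconds, copied from the table: (rasterio/GDAL, Rasteret).
timings = {
    "Single AOI, 15 scenes": (9.08, 1.14),
    "Multi-AOI, 30 scenes": (42.05, 2.25),
    "Cross-CRS boundary, 12 scenes": (12.47, 0.59),
}

# Speedup = baseline time / Rasteret time.
speedups = {
    name: baseline / rasteret
    for name, (baseline, rasteret) in timings.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
# Single AOI, 15 scenes: 8.0x
# Multi-AOI, 30 scenes: 18.7x
# Cross-CRS boundary, 12 scenes: 21.1x
```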
Who Should Use Rasteret?
Rasteret is optimized for:
✅ ML training pipelines with PyTorch/TorchGeo
✅ Analysis workflows requiring repeated reads of the same imagery
✅ Remote, tiled GeoTIFFs (Cloud-Optimized GeoTIFFs)
✅ Large-scale experiments where cold starts dominate runtime
✅ Reproducible research requiring version-controlled data splits
Rasteret also works with local tiled GeoTIFFs for indexing, filtering, and sharing collections. For non-tiled TIFFs and non-TIFF formats, TorchGeo or rasterio remain the better choice.
Next Steps
Installation
Install Rasteret with pip or uv
Quickstart
Build your first Collection in 5 minutes