Rasteret integrates with TorchGeo to provide a seamless path from STAC search to PyTorch DataLoader. This guide shows you how to build Collections, assign train/val/test splits, and create TorchGeo datasets.
Quick Start
Assigning Train/Val/Test Splits
For reproducible ML workflows, assign splits to your Collection using PyArrow. See examples/ml_training_with_splits.py for a complete example.
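One way to keep splits reproducible is to derive them from a hash of each scene ID rather than from a random seed. The sketch below is a minimal illustration in plain Python, not Rasteret's own method: `assign_split`, the fractions, and the scene IDs are all hypothetical.

```python
import hashlib

def assign_split(scene_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a scene ID to 'train', 'val', or 'test' deterministically."""
    # Hash the ID so the result does not depend on row order or a random seed.
    digest = hashlib.sha256(scene_id.encode("utf-8")).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"

# Hypothetical scene IDs, for illustration only.
scene_ids = [f"scene_{i:04d}" for i in range(1000)]
splits = [assign_split(s) for s in scene_ids]
```

Because the label depends only on the ID, a scene keeps its split when the Collection is rebuilt or extended. The resulting list can be attached to a PyArrow table with `table.append_column("split", pa.array(splits))`. How the Collection exposes its underlying table is not covered by this sketch.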
Creating TorchGeo Datasets
Basic Image Dataset
Multi-Spectral Dataset
If the requested bands have different native resolutions, set allow_resample=True. Otherwise, Rasteret raises an error when you request mixed-resolution bands.
Mask Dataset (Single-Band)
TorchGeo Samplers
Random Spatial Sampling
Grid Sampling
Pre-Defined Geometries
For labeled datasets, you can filter the Collection to specific geometries.
Training Loop Example
Time-Series Models
For temporal models (e.g. LSTM, Transformer), enable time-series mode.
Advanced: Custom Labels
Add a label column to your Collection using PyArrow, then pass label_field to to_torchgeo_dataset().
Data Augmentation
Use TorchGeo transforms or standard torchvision augmentations.
Handling Authentication
For datasets requiring credentials (Planetary Computer, NASA Earthdata), create a backend.
Best Practices
Dataset Size
- Small AOIs (< 1000 km²): Build a single Collection
- Large AOIs (> 10,000 km²): Consider spatial or temporal partitioning
- Global training: Use pre-built GeoParquet indexes (e.g. Source Cooperative)
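For the large-AOI case, spatial partitioning can be as simple as cutting the bounding box into fixed-size tiles and handling each tile separately. The following is a minimal sketch under stated assumptions: `tile_bbox` is a hypothetical helper, coordinates are in degrees, and the AOI values are made up. It does not call any Rasteret API.

```python
import math
from typing import Iterator, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)

def tile_bbox(bbox: BBox, tile_size: float) -> Iterator[BBox]:
    """Yield tiles of at most tile_size x tile_size that cover bbox."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    minx, miny, maxx, maxy = bbox
    # Count tiles up front so positions come from an integer index,
    # which avoids accumulating floating-point drift.
    nx = math.ceil((maxx - minx) / tile_size)
    ny = math.ceil((maxy - miny) / tile_size)
    for j in range(ny):
        for i in range(nx):
            x0 = minx + i * tile_size
            y0 = miny + j * tile_size
            # Clip the last row and column to the AOI edge.
            yield (x0, y0, min(x0 + tile_size, maxx), min(y0 + tile_size, maxy))

# Hypothetical AOI, split into 1-degree tiles.
aoi = (10.0, 45.0, 12.5, 46.0)
tiles = list(tile_bbox(aoi, 1.0))
```

Each tile can then serve as the AOI for its own Collection build. Temporal partitioning follows the same idea with date ranges in place of bounding boxes.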
Caching
Multi-Resolution Bands
Sentinel-2 bands have different resolutions:
- 10m: B02, B03, B04, B08
- 20m: B05, B06, B07, B8A, B11, B12
- 60m: B01, B09
Set allow_resample=True and choose a target_crs (e.g. a UTM zone) so that all bands are read onto a consistent grid.
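To decide ahead of time whether a band selection mixes resolutions, the band list above can be encoded as a lookup table. This is a hedged sketch: `S2_RESOLUTION_M` and `needs_resample` are hypothetical names used for illustration and are not part of Rasteret.

```python
# Native resolution in metres per Sentinel-2 band, as listed in this section.
S2_RESOLUTION_M = {
    "B02": 10, "B03": 10, "B04": 10, "B08": 10,
    "B05": 20, "B06": 20, "B07": 20, "B8A": 20, "B11": 20, "B12": 20,
    "B01": 60, "B09": 60,
}

def needs_resample(bands):
    """Return True if the requested bands span more than one native resolution."""
    unknown = [b for b in bands if b not in S2_RESOLUTION_M]
    if unknown:
        raise KeyError(f"Unknown Sentinel-2 bands: {unknown}")
    return len({S2_RESOLUTION_M[b] for b in bands}) > 1

rgb_only = needs_resample(["B04", "B03", "B02"])       # all 10m bands
rgb_plus_swir = needs_resample(["B04", "B03", "B11"])  # 10m and 20m bands
```

When the check returns True, the request is one that requires allow_resample=True.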
Dtype Handling
Rasteret returns native COG dtypes:
- Sentinel-2: uint16
- Landsat: uint16
- NAIP: uint8
For PyTorch compatibility, uint16 is promoted to int32 and uint32 to int64. Normalize in your model.
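The promotion and normalization steps can be illustrated with NumPy on a synthetic chip. This is a sketch only: the pixel values are made up, and the divisor of 10000 is an assumption that should be checked against the scale factor and offset documented for your product.

```python
import numpy as np

# Synthetic 2-band, 2x2 chip standing in for uint16 imagery.
chip_u16 = np.array(
    [[[0, 1000], [5000, 10000]],
     [[65535, 2000], [3000, 4000]]],
    dtype=np.uint16,
)

# Promote to int32, mirroring the uint16 -> int32 promotion described above.
# int32 holds the full uint16 range, so no values are clipped.
chip_i32 = chip_u16.astype(np.int32)

# ASSUMPTION: dividing by 10000 maps digital numbers to roughly [0, 1].
SCALE = 10000.0
chip_f32 = chip_i32.astype(np.float32) / SCALE
```

The same arithmetic applies to a PyTorch tensor, for example `x.float() / SCALE` at the start of the model's forward pass.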
Troubleshooting
Empty Samples
If you’re getting all-zero or NaN samples:
- Check that your AOI overlaps the Collection bounds: collection.bounds
- Verify scenes exist: len(collection.subset(split="train"))
- Inspect a sample manually.
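When inspecting a sample by hand, a small summary function makes empty or invalid reads easy to spot. The helper below is a hypothetical sketch that works on any NumPy array. It assumes you have already pulled the image array out of a sample, for instance by converting the tensor with `.numpy()`.

```python
import numpy as np

def describe_chip(chip) -> dict:
    """Summarize an image chip so all-zero or NaN reads stand out."""
    data = np.asarray(chip, dtype=np.float64)
    if data.size == 0:
        raise ValueError("chip is empty")
    nan_mask = np.isnan(data)
    valid = data[~nan_mask]  # pixels that are not NaN
    return {
        "shape": data.shape,
        "nan_fraction": float(nan_mask.mean()),
        "all_nan": bool(valid.size == 0),
        "all_zero": bool(valid.size > 0 and np.all(valid == 0)),
    }

# Synthetic examples: one empty-looking chip and one partly valid chip.
zero_report = describe_chip(np.zeros((3, 4, 4)))
mixed_report = describe_chip(np.array([[1.0, 0.0], [np.nan, 2.0]]))
```

A chip that reports `all_zero` or `all_nan` usually points back to the first two checks: the sampled window may fall outside the data footprint.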
Mixed CRS Errors
If scenes have different CRS (rare), set target_crs to reproject at read time.
Slow Data Loading
If training is I/O bound:
- Increase max_concurrent (COG fetch concurrency)
- Use num_workers in DataLoader (parallelism across batches)
- Consider prefetching scenes to local disk (outside Rasteret scope)
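Before tuning any of these settings, it can help to confirm that the loop really is waiting on data. The sketch below times how long each batch takes to arrive compared with how long the training step takes. `measure_loader_wait` is a hypothetical helper that accepts any iterable, so it works with a DataLoader as well as the stand-in list used here.

```python
import time

def measure_loader_wait(loader, step_fn, max_batches=10):
    """Return (seconds waiting for data, seconds in step_fn, batches seen)."""
    wait_s = 0.0
    step_s = 0.0
    n = 0
    it = iter(loader)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)        # time spent here is the training step
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
        n += 1
    return wait_s, step_s, n

# Stand-in loader and a no-op step, for illustration only.
demo_wait, demo_step, demo_batches = measure_loader_wait([1, 2, 3], lambda b: None)
```

If the wait time dominates, the loop is likely I/O bound and the options above are worth trying. On a GPU, kernels run asynchronously, so the step timing is only meaningful if `step_fn` synchronizes the device before returning.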
Next Steps
- Data Analysis - Explore data with xarray before training
- Filtering & Queries - Refine training data
- Custom Datasets - Register your own labeled datasets