Overview
The Dataset Catalog is Rasteret’s registry of pre-configured datasets. Each entry is a DatasetDescriptor that captures:
Identity (id, name, description)
Access method (STAC API or GeoParquet URI)
Band mapping (band codes → STAC asset keys)
Coverage metadata (spatial, temporal, licensing)
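Pick a dataset ID and pass it to rasteret.build(). Rasteret looks up the matching descriptor, queries the dataset's STAC API or GeoParquet index, and writes a Parquet index of the matching scenes to your workspace.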
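Band mapping works as a two-step lookup. A band code is first resolved to an asset key through the descriptor's band map, and that asset key is then resolved to a file in a STAC item's assets. The sketch below is not Rasteret's code. `resolve_band_hrefs`, the example band map, and the example item assets (with example.com URLs) are hypothetical stand-ins built from plain dictionaries.

```python
# Hypothetical sketch: resolve band codes to asset hrefs with plain dicts.
# Not Rasteret's implementation; names and URLs are illustrative only.

def resolve_band_hrefs(band_map, item_assets, bands):
    """Map each requested band code to the href of its STAC asset."""
    hrefs = {}
    for code in bands:
        asset_key = band_map.get(code)  # band code -> STAC asset key
        if asset_key is None:
            raise KeyError(f"unknown band code: {code}")
        if asset_key not in item_assets:
            raise KeyError(f"asset {asset_key!r} missing from item")
        hrefs[code] = item_assets[asset_key]["href"]  # asset key -> file URL
    return hrefs


band_map = {"B04": "red", "B08": "nir"}
item_assets = {
    "red": {"href": "https://example.com/scene/B04.tif"},
    "nir": {"href": "https://example.com/scene/B08.tif"},
}
print(resolve_band_hrefs(band_map, item_assets, ["B04", "B08"]))
```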
List Available Datasets
CLI
$ rasteret datasets list
ID                          Name                                       Coverage       License             Auth
aef/v1-annual               AlphaEarth Foundation Embeddings (Annual)  global         CC-BY-4.0           none
earthsearch/sentinel-2-l2a  Sentinel-2 Level-2A                        global         proprietary (free)  none
earthsearch/landsat-c2-l2   Landsat Collection 2 Level-2               global         proprietary (free)  required
earthsearch/naip            NAIP                                       north-america  proprietary (free)  required
earthsearch/cop-dem-glo-30  Copernicus DEM 30m                         global         proprietary (free)  none
earthsearch/cop-dem-glo-90  Copernicus DEM 90m                         global         proprietary (free)  none
pc/sentinel-2-l2a           Sentinel-2 Level-2A (Planetary Computer)   global         proprietary (free)  required
pc/io-lulc-annual-v02       ESRI 10m Land Use/Land Cover               global         CC-BY-4.0           required
pc/alos-dem                 ALOS World 3D 30m DEM                      global         proprietary (free)  required
pc/nasadem                  NASADEM                                    global         proprietary (free)  required
pc/esa-worldcover           ESA WorldCover                             global         CC-BY-4.0           required
pc/usda-cdl                 USDA Cropland Data Layer                   conus          proprietary (free)  required
Python
import rasteret
# List all datasets
for desc in rasteret.DatasetRegistry.list():
    print(f"{desc.id:30} {desc.name}")
# Search by keyword
sentinel = rasteret.DatasetRegistry.search("sentinel")
for desc in sentinel:
    print(desc.id, desc.description)
# Get a specific descriptor
desc = rasteret.DatasetRegistry.get("earthsearch/sentinel-2-l2a")
print(desc.band_map)  # {'B01': 'coastal', 'B02': 'blue', ...}
See src/rasteret/catalog.py:188-266 for the DatasetRegistry implementation.
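The list/search/get behavior shown above can be pictured as a dictionary keyed by dataset ID. The sketch below is a toy model and not the code in catalog.py. `Descriptor` and `ToyRegistry` are hypothetical. Several details are assumptions: matching is case-insensitive over id, name, and description; results are sorted by ID; and re-registering an ID overwrites the earlier entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical stand-in for DatasetDescriptor (identity fields only)."""
    id: str
    name: str
    description: str = ""


class ToyRegistry:
    """Illustrative registry: descriptors stored in a dict keyed by dataset ID."""

    def __init__(self):
        self._entries = {}

    def register(self, desc):
        # Assumption: registering an existing ID replaces the old entry
        self._entries[desc.id] = desc

    def get(self, dataset_id):
        return self._entries[dataset_id]  # raises KeyError for unknown IDs

    def list(self):
        return sorted(self._entries.values(), key=lambda d: d.id)

    def search(self, keyword):
        # Assumption: case-insensitive substring match on id, name, description
        kw = keyword.lower()
        return [
            d for d in self.list()
            if kw in d.id.lower() or kw in d.name.lower() or kw in d.description.lower()
        ]


reg = ToyRegistry()
reg.register(Descriptor("earthsearch/sentinel-2-l2a", "Sentinel-2 Level-2A"))
reg.register(Descriptor("pc/sentinel-2-l2a", "Sentinel-2 Level-2A (Planetary Computer)"))
reg.register(Descriptor("earthsearch/naip", "NAIP"))
print([d.id for d in reg.search("sentinel")])
# ['earthsearch/sentinel-2-l2a', 'pc/sentinel-2-l2a']
```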
Built-In Datasets
Sentinel-2 Level-2A
Free, no auth (via Element 84 Earth Search):
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2_training",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-06-30"),
)
Bands: 13 (B01-B12, SCL)
Resolution: 10m (visible/NIR), 20m (red edge/SWIR), 60m (coastal/cirrus)
Coverage: Global, 2015-present
License: Proprietary (free for all uses)
Descriptor at src/rasteret/catalog.py:412-443.
Landsat Collection 2 Level-2
Requester-pays (AWS charges apply):
collection = rasteret.build(
    "earthsearch/landsat-c2-l2",
    name="landsat_training",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-06-30"),
)
Bands: 10 (B1-B7, QA bands)
Resolution: 30m (optical), 100m (thermal)
Coverage: Global, 1982-present
License: Proprietary (free for all uses)
Auth: AWS credentials required (aws configure or environment variables)
Descriptor at src/rasteret/catalog.py:445-482.
Requester-pays buckets charge you for S3 requests. Configure AWS credentials via aws configure or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
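Before building a requester-pays collection, you can run a quick pre-flight check for the two credential sources named above. The helper `aws_credentials_available` is hypothetical and is not part of Rasteret. It checks only the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables and the shared credentials file that aws configure writes (~/.aws/credentials, overridable with AWS_SHARED_CREDENTIALS_FILE). A False result does not prove a build will fail, because instance roles, SSO, and other providers are not checked.

```python
import os
from pathlib import Path


def aws_credentials_available(environ=None, home=None):
    """Rough pre-flight check for AWS credentials (hypothetical helper).

    True if both standard environment variables are set, or if the shared
    credentials file written by `aws configure` exists. Other credential
    sources (instance roles, SSO, ...) are deliberately not checked.
    """
    environ = os.environ if environ is None else environ
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        return True
    default_file = Path(home or Path.home()) / ".aws" / "credentials"
    creds_file = Path(environ.get("AWS_SHARED_CREDENTIALS_FILE", default_file))
    return creds_file.is_file()


if not aws_credentials_available():
    print("No AWS credentials found; run `aws configure` or set the env vars.")
```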
NAIP (Aerial Imagery)
Requester-pays (1m resolution, CONUS only):
collection = rasteret.build(
    "earthsearch/naip",
    name="naip_california",
    bbox=(-122.5, 37.7, -122.3, 37.9),
    date_range=("2020-01-01", "2022-12-31"),
)
Bands: 4 (RGB + NIR, single multi-band COG)
Resolution: 1m
Coverage: Continental US, 2010-2023
License: Proprietary (free, USDA policy)
Descriptor at src/rasteret/catalog.py:484-509.
AlphaEarth Foundation Embeddings
Free, no auth (via Source Cooperative):
collection = rasteret.build(
    "aef/v1-annual",
    name="aef_embeddings",
    bbox=(11.3, -0.002, 11.5, 0.001),
    date_range=("2023-01-01", "2023-12-31"),
)
Bands: 64 (int8 foundation model embeddings)
Resolution: 10m
Coverage: Global, 2018-2023
License: CC-BY-4.0
Source: GeoParquet index (no STAC API)
Descriptor at src/rasteret/catalog.py:686-720.
Planetary Computer Datasets
All require authentication (free but needs sign-up):
# Install the Azure extra first:
# uv pip install "rasteret[azure]"
import rasteret
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
backend = rasteret.create_backend(
    credential_provider=PlanetaryComputerCredentialProvider(
        "https://planetarycomputer.microsoft.com/api/stac/v1"
    )
)
collection = rasteret.build(
    "pc/sentinel-2-l2a",
    name="pc_s2",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-06-30"),
    backend=backend,
)
Available datasets:
pc/sentinel-2-l2a — Sentinel-2 (Azure mirror)
pc/io-lulc-annual-v02 — ESRI 10m Land Cover
pc/alos-dem — ALOS World 3D 30m DEM
pc/nasadem — NASADEM 30m
pc/esa-worldcover — ESA WorldCover 10m
pc/usda-cdl — USDA Cropland Data Layer (CONUS)
See src/rasteret/catalog.py:551-682 for descriptors.
DatasetDescriptor Structure
Each entry is a DatasetDescriptor dataclass:
from rasteret.catalog import DatasetDescriptor
desc = DatasetDescriptor(
    id="earthsearch/sentinel-2-l2a",
    name="Sentinel-2 Level-2A",
    description="Multi-spectral optical imagery, 10-60m, global",
    # Access method (STAC API)
    stac_api="https://earth-search.aws.element84.com/v1",
    stac_collection="sentinel-2-l2a",
    # Band mapping: {band_code: STAC asset key}
    band_map={
        "B01": "coastal",
        "B02": "blue",
        "B03": "green",
        "B04": "red",
        "B08": "nir",
        "B12": "swir22",
        "SCL": "scl",
    },
    separate_files=True,  # Each band is a separate COG
    # Coverage metadata
    spatial_coverage="global",
    temporal_range=("2015-06-23", "present"),
    # Licensing
    license="proprietary",
    license_url="https://sentinel.esa.int/documents/...",
    commercial_use=True,
    # Authentication
    requires_auth=False,
)
See src/rasteret/catalog.py:50-185 for the full dataclass definition.
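When writing your own descriptor, it can help to sanity-check which fields belong together. The function below is a hypothetical sketch that works on a plain dict of field values. Its rules are inferred from the access patterns described on this page, not copied from catalog.py: one access method (stac_api or geoparquet_uri), a stac_collection unless static_catalog is set, an href_column for GeoParquet indexes, and some form of band mapping.

```python
def check_descriptor_fields(fields):
    """Return a list of problems in a descriptor-like dict (empty if none).

    Hypothetical checks inferred from this page, not Rasteret's validation.
    """
    problems = []
    has_stac = bool(fields.get("stac_api"))
    has_parquet = bool(fields.get("geoparquet_uri"))
    # Assumption: a descriptor uses exactly one access method
    if has_stac == has_parquet:
        problems.append("set exactly one of stac_api or geoparquet_uri")
    # Searchable STAC APIs need a collection; static catalogs traverse from root
    if has_stac and not fields.get("static_catalog") and not fields.get("stac_collection"):
        problems.append("stac_collection is required for a searchable STAC API")
    # GeoParquet indexes need a column holding each file's URL
    if has_parquet and not fields.get("href_column"):
        problems.append("href_column is required for GeoParquet-backed datasets")
    if not (fields.get("band_map") or fields.get("band_index_map")):
        problems.append("define band_map or band_index_map")
    return problems


print(check_descriptor_fields({
    "stac_api": "https://earth-search.aws.element84.com/v1",
    "stac_collection": "sentinel-2-l2a",
    "band_map": {"B04": "red", "B08": "nir"},
}))
```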
Add Your Own Datasets
Runtime Registration
Register custom datasets in your code:
from rasteret.catalog import DatasetDescriptor
import rasteret
rasteret.register(DatasetDescriptor(
    id="acme/field-survey-2024",
    name="ACME Field Survey",
    stac_api="https://acme.example.com/stac/v1",
    stac_collection="field-survey-2024",
    band_map={"RGB": "image", "NIR": "image"},
    band_index_map={"RGB": 0, "NIR": 1},  # Multi-band COG
    separate_files=False,
    license="CC-BY-4.0",
    license_url="https://acme.example.com/license",
    spatial_coverage="regional",
))
# Now available via build()
collection = rasteret.build(
    "acme/field-survey-2024",
    name="acme_training",
    bbox=(...),
    date_range=(...),
)
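With separate_files=False, several band codes share one asset key, and band_index_map selects a band inside that file. The sketch below shows that lookup with a hypothetical helper, `group_reads_by_asset`. It groups requests by asset so that each file is opened once. Indices are treated as 0-based, as the example above suggests, and bands missing from band_index_map default to index 0 (a single-band file). Both of these are assumptions, not documented Rasteret behavior.

```python
from collections import defaultdict


def group_reads_by_asset(band_map, band_index_map, bands):
    """Group requested band codes by asset key, with each band's index.

    Hypothetical helper: 0-based indices; missing indices default to 0.
    """
    grouped = defaultdict(list)
    for code in bands:
        asset_key = band_map[code]             # which file holds the band
        index = band_index_map.get(code, 0)    # which band inside that file
        grouped[asset_key].append((code, index))
    return dict(grouped)


# NAIP-style layout: four bands in one multi-band COG (illustrative values)
band_map = {"R": "image", "G": "image", "B": "image", "NIR": "image"}
band_index_map = {"R": 0, "G": 1, "B": 2, "NIR": 3}
print(group_reads_by_asset(band_map, band_index_map, ["NIR", "R"]))
# {'image': [('NIR', 3), ('R', 0)]}
```

Grouping by asset is a design choice for the sketch: for a single multi-band COG, it avoids fetching the same file's headers once per band.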
Contribute to Built-Ins
Add datasets to src/rasteret/catalog.py and open a PR:
Find the dataset’s STAC API or GeoParquet index
Write a descriptor (~20 lines):
DatasetRegistry.register(
    DatasetDescriptor(
        id="provider/dataset-name",
        name="Human-Readable Name",
        description="One-line summary",
        stac_api="https://...",
        stac_collection="collection-id",
        band_map={"B1": "asset_key", ...},
        separate_files=True,
        spatial_coverage="global",
        temporal_range=("2020-01-01", "present"),
        license="CC-BY-4.0",
        license_url="https://...",
        example_bbox=(-122.5, 37.7, -122.3, 37.9),
        example_date_range=("2024-01-01", "2024-06-01"),
    )
)
Test it with rasteret.build() using example_bbox and example_date_range
Submit a PR; once it ships in a release, every Rasteret user sees the dataset in rasteret datasets list
See the Dataset Catalog How-To for prerequisites and best practices.
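Before testing a new descriptor, a quick check of its example_bbox and example_date_range can catch swapped coordinates or reversed dates. The helper below is hypothetical. It assumes the (min_lon, min_lat, max_lon, max_lat) order used in this page's examples and ISO YYYY-MM-DD dates. Bounding boxes that cross the antimeridian are not handled.

```python
from datetime import date


def check_example_extent(bbox, date_range):
    """Sanity-check a lon/lat bbox (minx, miny, maxx, maxy) and an ISO date range.

    Hypothetical helper; antimeridian-crossing boxes are rejected.
    """
    minx, miny, maxx, maxy = bbox
    if not (-180 <= minx < maxx <= 180):
        raise ValueError(f"bad longitude range: {minx}, {maxx}")
    if not (-90 <= miny < maxy <= 90):
        raise ValueError(f"bad latitude range: {miny}, {maxy}")
    start, end = (date.fromisoformat(s) for s in date_range)
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return True


check_example_extent((-122.5, 37.7, -122.3, 37.9), ("2024-01-01", "2024-06-01"))
```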
Access Patterns
STAC-Backed Datasets
Most datasets use a STAC API:
desc = DatasetDescriptor(
    id="provider/dataset",
    stac_api="https://stac.example.com/v1",
    stac_collection="collection-id",
    band_map={"B04": "red", "B08": "nir"},
)
Build process:
Query STAC API with bbox and date_range
Fetch COG headers for matching scenes
Write Parquet index to ~/rasteret_workspace/{name}_stac/
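The first and last steps can be made concrete. A STAC API Item Search is a POST to {stac_api}/search whose JSON body carries collections, bbox, and an RFC 3339 datetime interval. The sketch below only builds that request (nothing is sent) and the workspace path quoted above. How Rasteret actually issues the query (client library, pagination, page size) is not documented here. `stac_search_request`, `workspace_dir`, the limit value, and the end-of-day handling are assumptions.

```python
import json
from pathlib import Path


def stac_search_request(stac_api, collection, bbox, date_range, limit=100):
    """Build a STAC API Item Search URL and JSON body (hypothetical; not sent)."""
    start, end = date_range
    body = {
        "collections": [collection],
        "bbox": list(bbox),
        # RFC 3339 interval; the whole end day is included (an assumption)
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        "limit": limit,  # page size; the value is illustrative
    }
    return stac_api.rstrip("/") + "/search", body


def workspace_dir(name, root="~/rasteret_workspace"):
    """Index location quoted on this page: ~/rasteret_workspace/{name}_stac/"""
    return Path(root).expanduser() / f"{name}_stac"


url, body = stac_search_request(
    "https://earth-search.aws.element84.com/v1",
    "sentinel-2-l2a",
    (77.5, 12.9, 77.7, 13.1),
    ("2024-01-01", "2024-06-30"),
)
print(url)
print(json.dumps(body))
print(workspace_dir("s2_training"))
```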
GeoParquet-Backed Datasets
Some datasets publish a GeoParquet index instead of a STAC API (e.g., AlphaEarth Foundation Embeddings):
desc = DatasetDescriptor(
    id="provider/dataset",
    geoparquet_uri="s3://bucket/index.parquet",
    column_map={"fid": "id", "geom": "geometry"},
    href_column="path",
    band_index_map={"B01": 0, "B02": 1},
)
Build process:
Load GeoParquet with PyArrow
Apply column_map to normalize schema
Construct assets struct from href_column + band_index_map
Optionally enrich with COG headers (enrich_cog=True)
Write to workspace
See src/rasteret/__init__.py:370-457 for the GeoParquet build path.
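Steps 2 and 3 above (normalize the schema, then attach an assets struct) can be pictured on plain Python rows. The sketch below uses lists of dicts as a stand-in for the PyArrow table Rasteret loads. `normalize_rows` and the exact shape of the assets entries ({"href", "band_index"}) are hypothetical, and Rasteret's real assets schema may differ. The sketch also assumes href_column names the column after renaming.

```python
def normalize_rows(rows, column_map, href_column, band_index_map):
    """Rename columns and attach a per-band assets mapping to each row.

    Pure-Python stand-in for the PyArrow path; the assets layout is hypothetical.
    """
    out = []
    for row in rows:
        # Step 2: apply column_map; columns not in the map keep their names
        renamed = {column_map.get(k, k): v for k, v in row.items()}
        # Step 3: every band points at the same multi-band file, plus its index
        href = renamed[href_column]
        renamed["assets"] = {
            band: {"href": href, "band_index": idx}
            for band, idx in band_index_map.items()
        }
        out.append(renamed)
    return out


rows = [{"fid": "tile-1", "geom": "<wkb bytes>", "path": "s3://bucket/tile-1.tif"}]
result = normalize_rows(rows, {"fid": "id", "geom": "geometry"}, "path", {"B01": 0, "B02": 1})
print(result[0]["id"], result[0]["assets"]["B02"])
```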
Cloud Configuration
Datasets can specify cloud provider settings:
desc = DatasetDescriptor(
    id="earthsearch/landsat-c2-l2",
    cloud_config={
        "provider": "aws",
        "requester_pays": True,
        "region": "us-west-2",
        "url_patterns": {
            "https://landsatlook.usgs.gov/data/": "s3://usgs-landsat/"
        },
    },
)
Rasteret auto-creates a CloudConfig and registers it keyed by dataset ID. This enables:
Requester-pays S3 requests
URL rewriting (HTTPS → S3 for faster access)
Per-dataset authentication
See src/rasteret/catalog.py:217-229 for the registration logic.
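URL rewriting with url_patterns can be read as prefix substitution: an HTTPS URL that starts with a pattern key is rewritten to the matching S3 prefix. The sketch below assumes that reading. It is not the logic in catalog.py. The helper `rewrite_url`, the longest-prefix-wins rule, and the example object path are all hypothetical.

```python
def rewrite_url(url, url_patterns):
    """Rewrite url using the longest matching prefix in url_patterns.

    Hypothetical helper; unmatched URLs are returned unchanged.
    """
    # Try longer prefixes first so a more specific pattern wins
    for prefix in sorted(url_patterns, key=len, reverse=True):
        if url.startswith(prefix):
            return url_patterns[prefix] + url[len(prefix):]
    return url


patterns = {"https://landsatlook.usgs.gov/data/": "s3://usgs-landsat/"}
print(rewrite_url("https://landsatlook.usgs.gov/data/collection02/level-2/scene_SR_B4.TIF", patterns))
# s3://usgs-landsat/collection02/level-2/scene_SR_B4.TIF
```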
Licensing and Commercial Use
Each descriptor includes licensing metadata:
desc.license # "CC-BY-4.0" or "proprietary"
desc.license_url # Link to full license text
desc.commercial_use # True/False
Filter for datasets that allow commercial use:
commercial_datasets = [
d for d in rasteret.DatasetRegistry.list()
if d.commercial_use
]
Always review the license_url before using datasets commercially. The commercial_use flag is a convenience filter, not legal advice.
Local Datasets
Register local Parquet collections to make them appear in the catalog:
# Build a collection
collection = rasteret.build_from_table(
    "./my_data.parquet",
    name="local_survey",
    enrich_cog=True,
)
# Register it
rasteret.register_local(
    dataset_id="local/survey_2024",
    path=collection.dataset.files[0],  # Path to Parquet
    name="Field Survey 2024",
    description="Drone imagery, 5cm resolution",
    persist=True,  # Save to ~/.rasteret/datasets.local.json
)
# Now available via build()
reloaded = rasteret.build("local/survey_2024", name="reload")
See src/rasteret/__init__.py:513-588 for the register_local() implementation.
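With persist=True, registrations survive between sessions because they are saved to a JSON file. The sketch below shows one way such a file could work: a JSON object keyed by dataset ID that is read, updated, and rewritten on each save. The layout and the helpers `save_local_entry` and `load_local_entries` are hypothetical. The real datasets.local.json written by register_local() may use a different schema.

```python
import json
from pathlib import Path


def load_local_entries(registry_file):
    """Read a JSON registry keyed by dataset ID (empty if the file is missing)."""
    registry_file = Path(registry_file)
    return json.loads(registry_file.read_text()) if registry_file.exists() else {}


def save_local_entry(registry_file, dataset_id, path, name, description=""):
    """Add or replace one entry, then rewrite the whole file (hypothetical layout)."""
    registry_file = Path(registry_file)
    entries = load_local_entries(registry_file)
    entries[dataset_id] = {"path": str(path), "name": name, "description": description}
    registry_file.parent.mkdir(parents=True, exist_ok=True)
    registry_file.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries
```

Keying by dataset ID means registering the same ID again replaces the earlier entry rather than duplicating it.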
Advanced: Static STAC Catalogs
For static STAC catalogs (no /search endpoint):
desc = DatasetDescriptor(
    id="provider/static-catalog",
    stac_api="https://example.com/catalog.json",  # Root catalog URL
    stac_collection=None,  # Traverse from root
    static_catalog=True,
    band_map={...},
)
Rasteret uses pystac.Catalog.from_file() and client-side filtering.
See src/rasteret/catalog.py:122-126 for the static_catalog field.
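A static catalog has no /search endpoint, so bbox and date filtering has to happen on the client after the items are loaded. The sketch below shows that filtering on plain STAC-like item dicts. `bbox_intersects`, `parse_stac_datetime`, and `filter_items` are hypothetical and not Rasteret's code. The sketch assumes each item has a bbox and a timezone-aware properties.datetime. Items that use only start_datetime/end_datetime are not handled.

```python
from datetime import datetime, timedelta, timezone


def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def parse_stac_datetime(value):
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 onward
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_items(items, bbox, date_range):
    """Keep item dicts that overlap bbox and fall inside date_range (end day inclusive)."""
    start = datetime.fromisoformat(date_range[0]).replace(tzinfo=timezone.utc)
    # Add one day so items from anywhere on the end date are kept
    end = datetime.fromisoformat(date_range[1]).replace(tzinfo=timezone.utc) + timedelta(days=1)
    return [
        item for item in items
        if bbox_intersects(item["bbox"], bbox)
        and start <= parse_stac_datetime(item["properties"]["datetime"]) < end
    ]


items = [
    {"id": "a", "bbox": [77.55, 12.95, 77.65, 13.05], "properties": {"datetime": "2024-03-01T05:00:00Z"}},
    {"id": "b", "bbox": [10.0, 10.0, 11.0, 11.0], "properties": {"datetime": "2024-03-01T05:00:00Z"}},
]
print([i["id"] for i in filter_items(items, (77.5, 12.9, 77.7, 13.1), ("2024-01-01", "2024-06-30"))])
# ['a']
```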
Next Steps
Build From STAC: use build_from_stac() for datasets not in the catalog
Build From Parquet: index GeoParquet files with build_from_table()
Custom Cloud Provider: handle auth-required datasets (Planetary Computer, Earthdata)
Contribute a Dataset: add your dataset to the built-in catalog