rasteret.as_collection()

Function Signature

rasteret.as_collection(
    table: pa.Table | pads.Dataset,
    *,
    name: str = "",
    data_source: str = "",
    description: str = "",
    start_date: datetime | None = None,
    end_date: datetime | None = None,
    require_band_metadata: bool = True,
) -> Collection

Description

Wrap a read-ready Arrow object as a Collection. This is the lightweight re-entry path for workflows where you already have a table derived from an existing Collection and want to keep using Rasteret reads without re-running ingest/enrichment. Unlike build_from_table(), this function performs no COG enrichment, normalization, or persistence. It validates the read contract and wraps the provided Arrow object as-is. Use build_from_table() for first-time external Parquet ingest.

Parameters

table

pa.Table | pads.Dataset

required

Arrow object to wrap. pyarrow.dataset.Dataset is recommended for large collections to keep scans lazy. Despite the parameter name, both table and dataset inputs are first-class.

name

str

default:""

Optional collection name.

data_source

str

default:""

Optional data source identifier. If omitted, Rasteret attempts to infer it from schema metadata or the collection column.

description

str

default:""

Optional collection description.

start_date

datetime | None

default:None

Optional temporal start to attach to the Collection object.

end_date

datetime | None

default:None

Optional temporal end to attach to the Collection object.

require_band_metadata

bool

default:True

When True (default), require at least one *_metadata column and validate that those columns are struct-typed and contain the required COG metadata fields. Set to False for metadata-only workflows that do not need band metadata.

Returns

collection

Collection

A wrapped Collection ready for get_numpy(), get_xarray(), and to_torchgeo_dataset() when the necessary band metadata columns are present.

Raises

TypeError: If the input is not a pyarrow.Table or pyarrow.dataset.Dataset.
ValueError: If required columns are missing or band metadata is invalid.
UserWarning: Emitted as a warning (not raised as an exception) if a large in-memory pyarrow.Table is provided (>2 GiB or >40% of system RAM).
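The size threshold for the UserWarning can be expressed as a small predicate. This is a minimal sketch of the stated rule only: the function name warn_if_large_table is hypothetical, and how Rasteret actually measures table size or system RAM is an assumption not covered here, so both values are passed in explicitly.

```python
import warnings

GIB = 1024**3


def warn_if_large_table(table_nbytes: int, total_ram_bytes: int) -> bool:
    """Warn when a table exceeds 2 GiB or 40% of system RAM; return whether it did."""
    too_large = table_nbytes > 2 * GIB or table_nbytes > 0.40 * total_ram_bytes
    if too_large:
        warnings.warn(
            "Large in-memory table; consider passing a pyarrow.dataset.Dataset",
            UserWarning,
            stacklevel=2,
        )
    return too_large


# A warning does not interrupt execution, so it can be recorded and inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_large_table(3 * GIB, total_ram_bytes=64 * GIB)
```

Either condition alone is enough to trigger the warning, so a 1 GiB table on a machine with 2 GiB of RAM would also be flagged under this rule.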

Usage Example

import rasteret
import pyarrow as pa
import pyarrow.dataset as pads

# Load an existing collection and filter it
base_collection = rasteret.load(
    "~/rasteret_workspace/sentinel2_records"
)

# Apply filters using PyArrow dataset API
filtered_dataset = base_collection.dataset.filter(
    (pads.field("eo:cloud_cover") < 10) &
    (pads.field("year") == 2024)
)

# Wrap the filtered dataset as a new Collection
filtered_collection = rasteret.as_collection(
    filtered_dataset,
    name="low-cloud-2024",
    data_source="sentinel-2-l2a",
)

print(f"Filtered to {len(filtered_collection)} scenes")

# Use the wrapped collection
# (aoi is assumed to be an area-of-interest geometry defined elsewhere)
ds = filtered_collection.get_xarray(
    geometries=aoi,
    bands=["B04", "B03", "B02"],
    resolution=10,
)

# Wrap a table without band metadata (metadata-only workflows)
table = base_collection.dataset.to_table(
    columns=["id", "datetime", "geometry", "assets", "scene_bbox"]
)

metadata_collection = rasteret.as_collection(
    table,
    name="metadata-only",
    require_band_metadata=False,
)

# Export to GeoDataFrame
gdf = metadata_collection.get_gdf()
print(gdf.head())

Performance Notes

For large collections (>2 GiB), prefer passing a pyarrow.dataset.Dataset instead of a pyarrow.Table to keep scans lazy and avoid loading the entire dataset into memory.

# Good: Lazy dataset
dataset = pads.dataset("/path/to/large/collection")
collection = rasteret.as_collection(dataset)

# Warning: Loads entire table into memory
table = pads.dataset("/path/to/large/collection").to_table()
collection = rasteret.as_collection(table)  # May trigger warning

Related Functions

build_from_table() - Build from external Parquet with normalization/enrichment
load() - Load a persisted Collection
Collection - Collection class reference
