Pavise Documentation
Pavise is a DataFrame validation library that describes schemas with Python's typing.Protocol and checks them through structural subtyping. It supports pandas and polars backends, each installed as an optional dependency.
Features
Type-safe DataFrame validation using Python’s Protocol and structural subtyping
Multiple backends: Support for both pandas and polars (including LazyFrame)
Runtime validation with detailed error messages
Annotated validators: Attach validators to column types using typing.Annotated
Strict mode: Optionally reject DataFrames with extra columns
Covariant type parameters: DataFrames with more columns can be used where fewer are expected
LazyFrame support: Validate polars LazyFrames with schema-level checks at construction and full validation on collect
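To illustrate the "Annotated validators" feature, here is a minimal, standard-library-only sketch of how metadata attached with typing.Annotated can be read back from a Protocol schema. It is not Pavise's implementation: InRange and collect_validators are hypothetical names invented for this example, and Pavise's own validator classes may look different.

```python
from dataclasses import dataclass
from typing import Annotated, Protocol, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class InRange:
    """Hypothetical validator marker; not part of Pavise's API."""

    low: int
    high: int

    def check(self, values):
        # True when every value lies inside the closed interval [low, high].
        return all(self.low <= v <= self.high for v in values)


class UserSchema(Protocol):
    user_id: int
    name: str
    age: Annotated[int, InRange(0, 150)]


def collect_validators(schema):
    """Map each column name to (base type, tuple of Annotated metadata)."""
    result = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for column, hint in get_type_hints(schema, include_extras=True).items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            result[column] = (base, tuple(metadata))
        else:
            result[column] = (hint, ())
    return result


validators = collect_validators(UserSchema)
```

In this sketch, validators["age"] holds the base type int together with the InRange marker, while plain columns such as name carry an empty metadata tuple. A validation library can walk a mapping like this to decide which checks to run per column.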
Quick Start
Installation:
# For pandas backend
pip install "pavise[pandas]"
# For polars backend
pip install "pavise[polars]"
# For both backends
pip install "pavise[all]"
Basic usage:
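from typing import Protocol

import pandas as pd
from pavise.pandas import DataFrame

class UserSchema(Protocol):
    user_id: int
    name: str
    age: int

# Type checking only (no runtime overhead)
def process(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    return df

# Example input; in practice this comes from a file, database, or API
raw_df = pd.DataFrame({"user_id": [1, 2], "name": ["Ann", "Bob"], "age": [31, 45]})

# Runtime validation at system boundaries
validated_df = DataFrame[UserSchema](raw_df)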
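To make the runtime validation step more concrete, the following sketch shows the general idea of checking data against a Protocol schema, including the difference between the default behaviour (extra columns allowed) and strict mode. It uses only the standard library and a plain dict of column name to values in place of a real DataFrame. The function check_columns and its message format are assumptions made for illustration, not Pavise's actual API or error output.

```python
from typing import Protocol, get_type_hints


class UserSchema(Protocol):
    user_id: int
    name: str
    age: int


def check_columns(schema, data, strict=False):
    """Return a list of problems found in `data` (column name -> list of values)."""
    expected = get_type_hints(schema)
    problems = []
    for column, expected_type in expected.items():
        if column not in data:
            problems.append(f"missing column: {column}")
        elif not all(isinstance(v, expected_type) for v in data[column]):
            problems.append(f"column {column}: expected {expected_type.__name__}")
    if strict:
        # Strict mode also rejects columns the schema does not declare.
        extra = sorted(set(data) - set(expected))
        problems.extend(f"unexpected column: {c}" for c in extra)
    return problems


raw = {
    "user_id": [1, 2],
    "name": ["Ann", "Bob"],
    "age": [31, 45],
    "email": ["ann@example.com", "bob@example.com"],
}
lenient = check_columns(UserSchema, raw)            # extra column tolerated
strict = check_columns(UserSchema, raw, strict=True)  # extra column reported
bad = check_columns(UserSchema, {"user_id": [1], "name": [2]})
```

Here lenient is empty because data with more columns than the schema declares still satisfies it, which mirrors the covariance described in the feature list. With strict=True the same data is reported for its email column.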