Getting Started =============== Installation ------------ Pavise requires Python 3.9 or later. Install it via pip with your preferred backend: For pandas backend: .. code-block:: bash pip install pavise[pandas] For polars backend: .. code-block:: bash pip install pavise[polars] For both backends: .. code-block:: bash pip install pavise[all] Basic Usage ----------- Define a Schema ~~~~~~~~~~~~~~~ Define your DataFrame schema using Python's Protocol: .. code-block:: python from typing import Protocol class UserSchema(Protocol): user_id: int name: str age: int email: str Static Type Checking ~~~~~~~~~~~~~~~~~~~~ Use the schema for static type checking with mypy, pyright, or other type checkers: .. code-block:: python from pavise.pandas import DataFrame def process_users(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]: # Type checker validates the schema # No runtime overhead return df Runtime Validation ~~~~~~~~~~~~~~~~~~ Validate DataFrames at runtime, typically at system boundaries: .. code-block:: python import pandas as pd from pavise.pandas import DataFrame from pavise.exceptions import ValidationError # Load data from external source raw_df = pd.read_csv("users.csv") # Validate at system boundary try: validated_df = DataFrame[UserSchema](raw_df) except ValidationError as e: print(f"Validation failed: {e}") If validation fails, you'll get a detailed error message from ``ValidationError``: .. code-block:: text Validation failed: Column 'age': expected int, got object Sample invalid values (showing first 3 of 10): Row 1: 'invalid' (str) Row 5: None (NoneType) Row 8: 200.5 (float) Using Validators ~~~~~~~~~~~~~~~~ Add validators using ``typing.Annotated``: .. code-block:: python from typing import Annotated from pavise.validators import Range, Regex class UserSchema(Protocol): user_id: int name: str age: Annotated[int, Range(0, 150)] email: Annotated[str, Regex(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')] # Runtime validation with validators validated_df = DataFrame[UserSchema](raw_df) Using Literal Types ~~~~~~~~~~~~~~~~~~~ Restrict column values to specific literals using ``Literal``: .. code-block:: python from typing import Literal, Protocol class OrderSchema(Protocol): order_id: int status: Literal["pending", "approved", "rejected"] priority: Literal[1, 2, 3] # Only these exact values are allowed validated_df = DataFrame[OrderSchema](raw_df) Optional Columns ~~~~~~~~~~~~~~~~ Use ``NotRequiredColumn[T]`` for columns that may not exist in the DataFrame: .. code-block:: python from typing import Optional, Protocol from pavise.pandas import DataFrame, NotRequiredColumn class UserSchema(Protocol): user_id: int name: str age: NotRequiredColumn[int] # Column can be missing email: NotRequiredColumn[Optional[str]] # Column can be missing, or contain None # Both of these are valid df1 = pd.DataFrame({"user_id": [1], "name": ["Alice"]}) # age and email missing df2 = pd.DataFrame({"user_id": [1], "name": ["Alice"], "age": [25]}) # only email missing validated_df1 = DataFrame[UserSchema](df1) # OK validated_df2 = DataFrame[UserSchema](df2) # OK Note: ``NotRequiredColumn[T]`` means the column is optional, while ``Optional[T]`` means the column can contain ``None`` values. Use ``NotRequiredColumn[Optional[T]]`` for columns that are both optional and nullable. Next Steps ---------- * Learn about :doc:`user-guide/validators` for data quality checks * Explore :doc:`user-guide/strict-mode` to reject extra columns * Check :doc:`user-guide/polars-backend` for polars DataFrame and LazyFrame support * Check :doc:`examples/index` for real-world use cases