Validators
Pavise provides validators to enforce data quality constraints beyond type checking.
Validators are attached to column types using typing.Annotated.
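To make the mechanism concrete, the following sketch shows how metadata placed in typing.Annotated can be read back at runtime using only the standard library. The Range class here is a hypothetical stand-in defined for illustration; it is not Pavise's class, and Pavise's own inspection logic may differ.

```python
from dataclasses import dataclass
from typing import Annotated, Protocol, get_args, get_type_hints


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for a validator object (not Pavise's class)."""
    low: float
    high: float


class Schema(Protocol):
    age: Annotated[int, Range(0, 150)]


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(Schema, include_extras=True)

# get_args on an Annotated type yields the base type followed by the metadata.
base_type, *validators = get_args(hints["age"])

print(base_type)   # <class 'int'>
print(validators)  # [Range(low=0, high=150)]
```

The base type and the validator objects stay separate, which is why a static type checker sees only int while a runtime library can still find the attached constraints.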
Available Validators
Range
Validates that numeric values are within a specified range.
from typing import Annotated, Protocol
from pavise.validators import Range
from pavise.pandas import DataFrame
class Schema(Protocol):
    age: Annotated[int, Range(0, 150)]
    score: Annotated[float, Range(0.0, 100.0)]
validated_df = DataFrame[Schema](df)
Error message example:
ValueError: Column 'age': values must be in range [0, 150]
Sample invalid values (showing first 3 of 5):
Row 1: 200
Row 3: -5
Row 5: 300
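As a rough illustration of what such a check involves, here is a plain-Python sketch that collects the row positions of out-of-range values. It assumes the bounds are inclusive, as the bracket notation in the message suggests; find_out_of_range is a hypothetical helper, not part of Pavise.

```python
def find_out_of_range(values, low, high):
    """Return (row, value) pairs for values outside the inclusive range [low, high]."""
    return [(row, v) for row, v in enumerate(values) if not (low <= v <= high)]


ages = [25, 200, 40, -5, 31, 300]
invalid = find_out_of_range(ages, 0, 150)
print(invalid)  # [(1, 200), (3, -5), (5, 300)]
```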
Unique
Validates that column values are unique (no duplicates).
from pavise.validators import Unique
class Schema(Protocol):
    user_id: Annotated[int, Unique()]
    email: Annotated[str, Unique()]
Error message example:
ValueError: Column 'user_id': contains duplicate values
Sample duplicate values (showing first 2):
Value 2 at rows: [1, 3]
Value 5 at rows: [2, 4]
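The grouping shown in that message (each duplicated value with the rows where it occurs) can be sketched in plain Python as follows. find_duplicates is a hypothetical helper written for illustration and says nothing about how Pavise detects duplicates internally.

```python
from collections import defaultdict


def find_duplicates(values):
    """Map each value that occurs more than once to the rows where it appears."""
    rows_by_value = defaultdict(list)
    for row, value in enumerate(values):
        rows_by_value[value].append(row)
    return {v: rows for v, rows in rows_by_value.items() if len(rows) > 1}


user_ids = [1, 2, 5, 2, 5]
print(find_duplicates(user_ids))  # {2: [1, 3], 5: [2, 4]}
```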
In
Validates that column values are within a set of allowed values.
from pavise.validators import In
class Schema(Protocol):
    status: Annotated[str, In(["pending", "approved", "rejected"])]
    priority: Annotated[int, In([1, 2, 3, 4, 5])]
Error message example:
ValueError: Column 'status': contains values not in allowed values
Sample invalid values (showing first 2 of 3):
Row 1: 'invalid'
Row 3: 'bad'
Regex
Validates that string values match a regular expression pattern.
from pavise.validators import Regex
class Schema(Protocol):
    email: Annotated[str, Regex(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')]
    phone: Annotated[str, Regex(r'^\d{3}-\d{4}-\d{4}$')]
Error message example:
ValueError: Column 'email': contains values that don't match the pattern
Sample invalid values (showing first 2 of 4):
Row 1: 'invalid'
Row 3: 'bad@'
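Before attaching a pattern to a schema, it can help to try it against a few sample strings with the standard re module. The sketch below uses the two patterns from the example above; whether Pavise applies them with match, fullmatch, or search is not stated here, so re.match is an assumption (with the ^ and $ anchors the distinction rarely matters).

```python
import re

EMAIL = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
PHONE = re.compile(r'^\d{3}-\d{4}-\d{4}$')

emails = ["alice@example.com", "invalid", "bob@test.org", "bad@"]

# Collect (row, value) pairs that do not match the email pattern.
bad_emails = [(row, v) for row, v in enumerate(emails) if EMAIL.match(v) is None]
print(bad_emails)  # [(1, 'invalid'), (3, 'bad@')]

print(PHONE.match("090-1234-5678") is not None)  # True
print(PHONE.match("090-123-4567") is not None)   # False
```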
MinLen / MaxLen
Validates string length constraints.
from pavise.validators import MinLen, MaxLen
class Schema(Protocol):
    username: Annotated[str, MinLen(3), MaxLen(20)]
    password: Annotated[str, MinLen(8)]
Error message example:
ValueError: Column 'username': contains strings shorter than minimum length
Sample invalid values (showing first 2 of 3):
Row 0: 'ab' (length: 2)
Row 2: 'x' (length: 1)
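A plain-Python sketch of the same length constraints, reporting each offending string with its length, might look like this. length_violations is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```python
def length_violations(values, min_len=None, max_len=None):
    """Return (row, value, length) for strings outside the allowed length bounds."""
    violations = []
    for row, s in enumerate(values):
        too_short = min_len is not None and len(s) < min_len
        too_long = max_len is not None and len(s) > max_len
        if too_short or too_long:
            violations.append((row, s, len(s)))
    return violations


usernames = ["ab", "charlie", "x", "dave"]
print(length_violations(usernames, min_len=3, max_len=20))
# [(0, 'ab', 2), (2, 'x', 1)]
```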
Custom
Create custom validators for business-specific logic.
import datetime
from pavise.validators import Custom
def is_positive(value) -> bool:
    return value > 0
def is_business_day(value) -> bool:
    # weekday() returns 0 for Monday through 6 for Sunday
    return value.weekday() < 5
class Schema(Protocol):
    amount: Annotated[int, Custom(is_positive, "must be positive")]
    date: Annotated[datetime.date, Custom(is_business_day, "must be a business day")]
Error message example:
ValueError: Column 'amount': must be positive
Sample invalid values (showing first 3 of 5):
Row 1: -100
Row 3: 0
Row 7: -50
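Because the predicates are ordinary functions that take one value and return a bool, they can be exercised on their own before being wrapped in Custom. The sample data below is made up for illustration.

```python
import datetime


def is_positive(value) -> bool:
    return value > 0


def is_business_day(value) -> bool:
    # weekday() returns 0 for Monday through 6 for Sunday
    return value.weekday() < 5


amounts = [50, -100, 20, 0]
failing = [(row, v) for row, v in enumerate(amounts) if not is_positive(v)]
print(failing)  # [(1, -100), (3, 0)]

print(is_business_day(datetime.date(2024, 1, 5)))  # True (a Friday)
print(is_business_day(datetime.date(2024, 1, 6)))  # False (a Saturday)
```

Note that zero fails is_positive, which matches the sample output above where a row containing 0 is reported.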
Combining Multiple Validators
You can attach multiple validators to a single column:
from typing import Annotated, Protocol
from pavise.validators import Range, Custom
def is_even(value) -> bool:
    return value % 2 == 0
class Schema(Protocol):
    score: Annotated[int, Range(0, 100), Custom(is_even, "must be even")]
All validators are checked in order, and the first failure is reported.
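One way to picture this ordering is the sketch below, which runs a list of checks in sequence and stops at the first one that any value fails. This is only one reading of the behavior described, written with a hypothetical first_failure helper; Pavise's internals may be organized differently.

```python
def first_failure(values, checks):
    """Run (predicate, message) checks in order; return the first failing message."""
    for predicate, message in checks:
        if not all(predicate(v) for v in values):
            return message
    return None  # every check passed


checks = [
    (lambda v: 0 <= v <= 100, "values must be in range [0, 100]"),
    (lambda v: v % 2 == 0, "must be even"),
]

print(first_failure([10, 101, 7], checks))  # values must be in range [0, 100]
print(first_failure([10, 7], checks))       # must be even
print(first_failure([10, 20], checks))      # None
```

In the first call the data also contains an odd value, but only the range message is returned because the range check comes first.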
Performance Considerations
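Validators run only when a DataFrame is explicitly validated by calling DataFrame[Schema](...). A DataFrame[Schema] type annotation on its own triggers no checks: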
# No validation, no performance cost
def process(df: DataFrame[Schema]) -> DataFrame[Schema]:
    return df
# Validation happens here
validated_df = DataFrame[Schema](raw_df)
# No further validation cost
result = process(validated_df)
For large DataFrames, consider validating only at system boundaries and relying on type checking for internal functions.