Schema Validation
Validate configurations at runtime using Python dataclasses. With continuous validation enabled, errors are caught immediately when you mutate the config.
Type Coercion Matrix
Sparkwheel automatically converts compatible types when coercion is enabled (default: True):
| From ↓ To → | int | float | str | bool | list | dict |
|---|---|---|---|---|---|---|
| int | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| float | ✅* | ✅ | ✅ | ❌ | ❌ | ❌ |
| str | ✅** | ✅** | ✅ | ✅*** | ❌ | ❌ |
| bool | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| list | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ |
| dict | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
* Truncates decimal part (e.g., 3.14 → 3)
** Requires valid format (e.g., "42" for int, "3.14" for float)
*** Accepts: "true", "false", "1", "0", "yes", "no" (case-insensitive)
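The matrix above can be sketched in plain Python. This is an illustrative implementation of the rules, not sparkwheel's actual code; `coerce_value` is a hypothetical helper name:

```python
def coerce_value(value, target):
    """Coerce value to target following the coercion matrix (illustrative)."""
    if isinstance(value, bool):            # check before int: bool subclasses int
        if target in (bool, int, float, str):
            return target(value)
    elif isinstance(value, int):
        if target in (int, float, str, bool):
            return target(value)
    elif isinstance(value, float):
        if target is int:
            return int(value)              # truncates: 3.14 -> 3
        if target in (float, str):
            return target(value)
    elif isinstance(value, str):
        if target is str:
            return value
        if target in (int, float):
            return target(value)           # raises ValueError on bad format
        if target is bool:
            lowered = value.strip().lower()
            if lowered in ("true", "1", "yes"):
                return True
            if lowered in ("false", "0", "no"):
                return False
            raise ValueError(f"cannot coerce {value!r} to bool")
    elif isinstance(value, (list, dict)) and target is str:
        return str(value)                  # list/dict only coerce to str
    # every ❌ cell in the matrix falls through to here
    raise TypeError(f"cannot coerce {type(value).__name__} to {target.__name__}")
```

Note the `bool` check comes first: in Python `bool` is a subclass of `int`, so testing `isinstance(value, int)` first would misroute booleans.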
Default Behavior
Type coercion is enabled by default to handle common cases like environment variables and CLI arguments (which are always strings).
Disable for Strict Validation
Set coerce=False for strict type checking; see "Disable coercion if needed" under Type Coercion below for an example.
Quick Start
Define a schema with dataclasses:
from dataclasses import dataclass
from sparkwheel import Config
@dataclass
class AppConfigSchema:
name: str
port: int
debug: bool = False
# Continuous validation - validates on every update/set!
config = Config(schema=AppConfigSchema) # (1)!
config.update("config.yaml")
# Errors caught immediately at mutation time
config.set("port", "8080") # (2)!
config.set("port", "not a number") # (3)!
# Or validate explicitly after loading
config = Config()
config.update("config.yaml")
config.validate(AppConfigSchema) # (4)!
1. ✅ Enable continuous validation - errors caught on every mutation
2. ✅ Auto-coerced to int(8080) (coercion enabled by default)
3. ❌ Raises ValidationError immediately - invalid type conversion
4. ✅ Alternative: validate explicitly after loading all config
With type coercion enabled by default, compatible types are automatically converted:
# config.yaml:
# name: "myapp"
# port: "8080" # String value
# debug: "true" # String value
config = Config(schema=AppConfigSchema, coerce=True)
config.update("config.yaml")
# ✓ port coerced to int(8080)
# ✓ debug coerced to bool(True)
If validation fails, you get clear errors:
# With coercion disabled
config = Config(schema=AppConfigSchema, coerce=False)
config.update({"port": "8080"})
# ValidationError: Validation error at 'port': Type mismatch
# Expected type: int
# Actual type: str
# Actual value: '8080'
Defining Schemas
Schemas are Python dataclasses with type hints.
Basic Types
@dataclass
class ConfigSchema:
text: str
count: int
ratio: float
enabled: bool
items: list[str]
mapping: dict[str, int]
Optional Fields
from typing import Optional
@dataclass
class ConfigSchema:
required: str
optional_with_none: Optional[int] = None
optional_with_default: int = 42
Nested Dataclasses
@dataclass
class DatabaseConfigSchema:
host: str
port: int
pool_size: int = 10
@dataclass
class AppConfigSchema:
database: DatabaseConfigSchema # Nested
secret_key: str
Corresponding YAML:
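A matching config for the nested schema above could look like this (values are illustrative):

```yaml
database:
  host: localhost
  port: 5432
  pool_size: 20
secret_key: "s3cret"
```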
Lists of Dataclasses
@dataclass
class PluginConfigSchema:
name: str
enabled: bool = True
@dataclass
class AppConfigSchema:
plugins: list[PluginConfigSchema]
Dictionaries with Dataclass Values
@dataclass
class ModelConfigSchema:
hidden_size: int
dropout: float
@dataclass
class ConfigSchema:
models: dict[str, ModelConfigSchema]
Custom Validation
Add validation logic with @validator:
from sparkwheel import validator
@dataclass
class TrainingConfigSchema:
lr: float
batch_size: int
@validator
def check_lr(self):
"""Validate learning rate."""
if not (0 < self.lr < 1):
raise ValueError(f"lr must be between 0 and 1, got {self.lr}")
@validator
def check_batch_size(self):
"""Validate batch size is power of 2."""
if self.batch_size <= 0:
raise ValueError("batch_size must be positive")
if self.batch_size & (self.batch_size - 1) != 0:
raise ValueError("batch_size must be power of 2")
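One way such validators can be driven - a minimal, self-contained sketch that assumes nothing about sparkwheel's internals, using `__post_init__` and a stand-in `validator` decorator:

```python
from dataclasses import dataclass

def validator(fn):
    """Mark a method as a validator (illustrative stand-in for sparkwheel's decorator)."""
    fn.__is_validator__ = True
    return fn

@dataclass
class TrainingConfigSchema:
    lr: float
    batch_size: int

    def __post_init__(self):
        # Run every method marked with @validator after field assignment
        for name in dir(type(self)):
            attr = getattr(type(self), name)
            if getattr(attr, "__is_validator__", False):
                attr(self)

    @validator
    def check_batch_size(self):
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.batch_size & (self.batch_size - 1) != 0:
            raise ValueError("batch_size must be a power of 2")

TrainingConfigSchema(lr=0.01, batch_size=64)   # passes
```

The bitwise check `n & (n - 1) == 0` is the standard power-of-two test: a power of two has exactly one bit set, so subtracting one flips all lower bits and the AND is zero.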
Cross-Field Validation
Validators can check relationships between fields:
@dataclass
class ConfigSchema:
start_epoch: int
end_epoch: int
warmup_epochs: int
@validator
def check_epochs(self):
"""Ensure epoch configuration is valid."""
if self.end_epoch <= self.start_epoch:
raise ValueError("end_epoch must be > start_epoch")
if self.warmup_epochs >= (self.end_epoch - self.start_epoch):
raise ValueError("warmup_epochs too large")
With Optional Fields
@dataclass
class ConfigSchema:
value: float
max_value: Optional[float] = None
@validator
def check_max(self):
"""Check value doesn't exceed max if specified."""
if self.max_value is not None and self.value > self.max_value:
raise ValueError(f"value ({self.value}) exceeds max_value ({self.max_value})")
Note: Validators run after type checking. If types are wrong, validation stops there.
Discriminated Unions
Use tagged unions for type-safe variants:
from typing import Literal, Union
@dataclass
class SGDOptimizerSchema:
type: Literal["sgd"] # Discriminator
lr: float
momentum: float = 0.9
@dataclass
class AdamOptimizerSchema:
type: Literal["adam"] # Discriminator
lr: float
beta1: float = 0.9
@dataclass
class ConfigSchema:
optimizer: Union[SGDOptimizerSchema, AdamOptimizerSchema]
YAML:
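For example, selecting the SGD variant (values illustrative):

```yaml
optimizer:
  type: sgd
  lr: 0.01
  momentum: 0.95
```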
Sparkwheel detects type as a discriminator and validates against the matching schema.
Error examples:
# Missing discriminator
{"optimizer": {"lr": 0.01}}
# ValidationError: Missing discriminator field 'type'
# Invalid value
{"optimizer": {"type": "rmsprop", "lr": 0.01}}
# ValidationError: Invalid discriminator value 'rmsprop'. Valid: 'sgd', 'adam'
# Wrong fields for type
{"optimizer": {"type": "adam", "momentum": 0.9}}
# ValidationError: Missing required field 'lr'
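The dispatch behind discriminated unions can be sketched with the standard `typing` introspection helpers. This is an illustrative reimplementation, not sparkwheel's code; `pick_variant` is a hypothetical name:

```python
from dataclasses import dataclass
from typing import Literal, Union, get_args, get_type_hints

@dataclass
class SGDOptimizerSchema:
    type: Literal["sgd"]
    lr: float
    momentum: float = 0.9

@dataclass
class AdamOptimizerSchema:
    type: Literal["adam"]
    lr: float
    beta1: float = 0.9

def pick_variant(union, data):
    """Select the union member whose Literal 'type' tag matches data['type']."""
    if "type" not in data:
        raise ValueError("Missing discriminator field 'type'")
    tags = {}
    for schema in get_args(union):
        hint = get_type_hints(schema)["type"]    # e.g. Literal["sgd"]
        tags[get_args(hint)[0]] = schema         # "sgd" -> SGDOptimizerSchema
    try:
        return tags[data["type"]]
    except KeyError:
        valid = ", ".join(repr(t) for t in tags)
        raise ValueError(f"Invalid discriminator value {data['type']!r}. Valid: {valid}")

OptimizerUnion = Union[SGDOptimizerSchema, AdamOptimizerSchema]
pick_variant(OptimizerUnion, {"type": "adam", "lr": 0.001})  # AdamOptimizerSchema
```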
Validation Options
Constructor options control coercion, strictness, missing values, and mutability.
Type Coercion
Sparkwheel automatically converts compatible types when coerce=True (default):
@dataclass
class ServerConfigSchema:
port: int
timeout: float
enabled: bool
# Coercion enabled by default
config = Config(schema=ServerConfigSchema)
config.update({
"port": "8080", # str → int
"timeout": "30.5", # str → float
"enabled": "true" # str → bool
})
print(config["port"]) # 8080 (int, not str!)
print(config["timeout"]) # 30.5 (float)
print(config["enabled"]) # True (bool)
Supported coercions:
- str → int (e.g., "42" → 42)
- str → float (e.g., "3.14" → 3.14)
- str → bool (e.g., "true" → True, "false" → False)
- int → float (e.g., 42 → 42.0)
- Recursive coercion through lists, dicts, and nested dataclasses
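Recursive coercion can be sketched with `typing.get_origin`/`get_args` to walk container hints. An illustrative helper (hypothetical name, not sparkwheel's implementation), handling lists, dicts, and scalar leaves:

```python
from typing import get_args, get_origin

def coerce_nested(value, hint):
    """Recursively coerce lists/dicts to match a typing hint (illustrative)."""
    origin = get_origin(hint)
    if origin is list:
        (item_hint,) = get_args(hint)
        return [coerce_nested(v, item_hint) for v in value]
    if origin is dict:
        key_hint, val_hint = get_args(hint)
        return {coerce_nested(k, key_hint): coerce_nested(v, val_hint)
                for k, v in value.items()}
    if hint in (int, float, bool, str) and not isinstance(value, hint):
        if hint is bool and isinstance(value, str):
            return value.strip().lower() in ("true", "1", "yes")
        return hint(value)    # leaf coercion, e.g. "8080" -> 8080
    return value

coerce_nested({"a": "1", "b": "2"}, dict[str, int])   # {'a': 1, 'b': 2}
```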
Disable coercion if needed:
config = Config(schema=ServerConfigSchema, coerce=False)
config.update({
"port": "8080" # ValidationError: expected int, got str
})
Strict vs Lenient Mode
Control whether extra fields are rejected:
@dataclass
class MySchema:
required_field: int
# Strict mode (default) - rejects extra fields
config = Config(schema=MySchema, strict=True)
config.update({
"required_field": 42,
"extra_field": "oops" # ✗ ValidationError!
})
# Lenient mode - allows extra fields
config = Config(schema=MySchema, strict=False)
config.update({
"required_field": 42,
"extra_field": "ok" # ✓ Allowed
})
Use lenient mode for:
- Development/prototyping
- Gradual schema migration
- Configs with experimental fields
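The strict-mode check itself is simple to picture: compare the config's keys against the schema's declared fields. A sketch using `dataclasses.fields` (hypothetical helper name, not sparkwheel's code):

```python
from dataclasses import dataclass, fields

@dataclass
class MySchema:
    required_field: int

def check_extra_fields(schema, data, strict=True):
    """Return extra keys; raise in strict mode (illustrative)."""
    allowed = {f.name for f in fields(schema)}
    extra = set(data) - allowed
    if strict and extra:
        raise ValueError(
            f"Unexpected fields not in schema {schema.__name__}: {sorted(extra)}"
        )
    return extra
```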
MISSING Sentinel
Support partial configs with required-but-not-yet-set values:
from sparkwheel import Config, MISSING
@dataclass
class APIConfigSchema:
api_key: str
endpoint: str
timeout: int = 30
# Partial config - api_key not set yet
config = Config(schema=APIConfigSchema, allow_missing=True)
config.update({
"api_key": MISSING,
"endpoint": "https://api.example.com"
})
# Later, fill in the missing value
import os
config.set("api_key", os.getenv("API_KEY"))
# Now validate that nothing is MISSING
config.validate(APIConfigSchema) # Uses allow_missing=False by default
Frozen Configs
Prevent modifications after initialization:
config = Config(schema=MySchema)
config.update("config.yaml")
config.freeze()
# Mutations now raise FrozenConfigError
config.set("model::lr", 0.001) # ✗ FrozenConfigError!
config.update({"new": "data"}) # ✗ FrozenConfigError!
# Read operations still work
value = config.get("model::lr")
resolved = config.resolve()
# Unfreeze if needed
config.unfreeze()
config.set("model::lr", 0.001) # ✓ Now works
With Sparkwheel Features
Validation works with references, expressions, and instantiation.
References
@dataclass
class ConfigSchema:
base_lr: float
optimizer_lr: float # Can be a reference
config = Config(schema=ConfigSchema)
config.update({
"base_lr": 0.001,
"optimizer_lr": "@base_lr" # Reference allowed
})
Expressions
@dataclass
class ConfigSchema:
batch_size: int
total_steps: int # Computed
config = Config(schema=ConfigSchema)
config.update({
"batch_size": 32,
"total_steps": "$@batch_size * 100" # Expression allowed
})
Instantiation
Special keys like _target_ are automatically ignored:
@dataclass
class OptimizerConfigSchema:
lr: float
momentum: float = 0.9
config = Config(schema=OptimizerConfigSchema)
config.update({
"_target_": "torch.optim.SGD", # Ignored by validation
"lr": 0.001,
"momentum": 0.95
})
Error Messages
Type Mismatch
# Expected int, got str
# ValidationError: Validation error at 'port': Type mismatch
# Expected type: int
# Actual type: str
# Actual value: '8080'
Missing Field
# ValidationError: Validation error at 'required_field':
# Missing required field 'required_field'
# Expected type: str
Unexpected Field
# ValidationError: Validation error at 'unexpected':
# Unexpected field 'unexpected' not in schema ConfigSchema
Nested Errors
# ValidationError: Validation error at 'database.port': Type mismatch
# Expected type: int
# Actual type: str
# Actual value: 'wrong'
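The dotted paths in nested errors come from threading the current location through a recursive walk of the config. A minimal sketch (hypothetical `validate_types` helper, simplified to plain dicts of type hints):

```python
def validate_types(data, schema_hints, path=""):
    """Walk nested dicts, reporting the dotted path of each mismatch (illustrative)."""
    errors = []
    for key, hint in schema_hints.items():
        here = f"{path}.{key}" if path else key   # build 'database.port' style paths
        value = data.get(key)
        if isinstance(hint, dict):
            errors += validate_types(value or {}, hint, here)
        elif not isinstance(value, hint):
            errors.append(
                f"Validation error at {here!r}: "
                f"expected {hint.__name__}, got {type(value).__name__}"
            )
    return errors

validate_types({"database": {"port": "wrong"}}, {"database": {"port": int}})
```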
Validation Timing
Continuous (Recommended)
# Validates on every update() and set()
config = Config(schema=MySchema)
config.update("config.yaml")
config.set("port", "8080") # Validates immediately!
Explicit
# Load without schema, validate later
config = Config()
config.update("config.yaml")
# ... maybe modify ...
config.validate(MySchema)
Standalone Function
Complete Example
from dataclasses import dataclass
from typing import Optional
from sparkwheel import Config, validator
@dataclass
class DatabaseConfigSchema:
host: str
port: int
database: str
username: str
password: str
pool_size: int = 10
timeout: int = 30
@dataclass
class APIConfigSchema:
host: str = "0.0.0.0"
port: int = 8000
workers: int = 4
@validator
def check_port(self):
if not (1024 <= self.port <= 65535):
raise ValueError(f"port must be 1024-65535, got {self.port}")
@dataclass
class AppConfigSchema:
    app_name: str
    environment: str
    api: APIConfigSchema
    database: DatabaseConfigSchema
    debug: bool = False  # fields with defaults must come after required fields
# Load and validate continuously
config = Config(schema=AppConfigSchema)
config.update("production.yaml")
# Access validated config
print(f"Starting {config['app_name']} on port {config['api::port']}")
# Freeze to prevent modifications
config.freeze()
The YAML:
app_name: "My API"
environment: production
debug: false
api:
port: 3000
workers: 8
database:
host: db.example.com
port: 5432
database: myapp
username: "$import os; os.getenv('DB_USER')"
password: "$import os; os.getenv('DB_PASSWORD')"
pool_size: 20
Next Steps
- Configuration Basics - Learn config management
- References - Link values with @
- Expressions - Compute values with $