Schema Validation

Validate configurations at runtime using Python dataclasses. With continuous validation, errors are caught immediately when you mutate the config.

Type Coercion Matrix

Sparkwheel automatically converts compatible types when coercion is enabled (default: True):

| From ↓ To → | int | float | str | bool | list | dict |
|-------------|------|-------|-----|-------|------|------|
| int         | —    | ✅    |     |       |      |      |
| float       | ✅*  | —     |     |       |      |      |
| str         | ✅** | ✅**  | —   | ✅*** |      |      |
| bool        |      |       |     | —     |      |      |
| list        |      |       |     |       | —    |      |
| dict        |      |       |     |       |      | —    |

\* Truncates decimal part (e.g., 3.14 → 3)
\** Requires valid format (e.g., "42" for int, "3.14" for float)
\*** Accepts: "true", "false", "1", "0", "yes", "no" (case-insensitive)
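The string-to-bool rule above can be sketched as a small helper. This is purely illustrative of the accepted values, not Sparkwheel's actual implementation:

```python
def coerce_bool(value: str) -> bool:
    """Accept "true"/"false", "1"/"0", "yes"/"no", case-insensitively."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise ValueError(f"cannot coerce {value!r} to bool")

print(coerce_bool("Yes"))  # True
print(coerce_bool("0"))    # False
```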

Default Behavior

Type coercion is enabled by default to handle common cases like environment variables and CLI arguments (which are always strings).

Disable for Strict Validation

Set coerce=False for strict type checking:

config = Config(schema=AppConfigSchema, coerce=False)

Quick Start

Define a schema with dataclasses:

app.py
from dataclasses import dataclass
from sparkwheel import Config

@dataclass
class AppConfigSchema:
    name: str
    port: int
    debug: bool = False

# Continuous validation - validates on every update/set!
config = Config(schema=AppConfigSchema)  # (1)!
config.update("config.yaml")

# Errors caught immediately at mutation time
config.set("port", "8080")  # (2)!
config.set("port", "not a number")  # (3)!

# Or validate explicitly after loading
config = Config()
config.update("config.yaml")
config.validate(AppConfigSchema)  # (4)!
  1. ✅ Enable continuous validation - errors caught on every mutation
  2. ✅ Auto-coerced to int(8080) (coercion enabled by default)
  3. ❌ Raises ValidationError immediately - invalid type conversion
  4. ✅ Alternative: validate explicitly after loading all config

With type coercion enabled by default, compatible types are automatically converted:

# config.yaml:
# name: "myapp"
# port: "8080"  # String value
# debug: "true" # String value

config = Config(schema=AppConfigSchema, coerce=True)
config.update("config.yaml")
# ✓ port coerced to int(8080)
# ✓ debug coerced to bool(True)

If validation fails, you get clear errors:

# With coercion disabled
config = Config(schema=AppConfigSchema, coerce=False)
config.update({"port": "8080"})
# ValidationError: Validation error at 'port': Type mismatch
#   Expected type: int
#   Actual type: str
#   Actual value: '8080'

Defining Schemas

Schemas are Python dataclasses with type hints.

Basic Types

@dataclass
class ConfigSchema:
    text: str
    count: int
    ratio: float
    enabled: bool
    items: list[str]
    mapping: dict[str, int]
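Under the hood, runtime validation of hints like these can be done with typing.get_type_hints. Here is a simplified sketch (not Sparkwheel's internals) that handles only plain types; real generic hints like list[str] would also need typing.get_origin/get_args handling:

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class SimpleSchema:  # hypothetical subset with non-generic fields only
    text: str
    count: int

def check_types(schema, data: dict) -> list[str]:
    """Return type-mismatch messages for each present field."""
    errors = []
    for name, expected in get_type_hints(schema).items():
        if name in data and not isinstance(data[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(data[name]).__name__}"
            )
    return errors

print(check_types(SimpleSchema, {"text": "hi", "count": "5"}))
# ['count: expected int, got str']
```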

Optional Fields

from typing import Optional

@dataclass
class ConfigSchema:
    required: str
    optional_with_none: Optional[int] = None
    optional_with_default: int = 42

Nested Dataclasses

@dataclass
class DatabaseConfigSchema:
    host: str
    port: int
    pool_size: int = 10

@dataclass
class AppConfigSchema:
    database: DatabaseConfigSchema  # Nested
    secret_key: str

Corresponding YAML:

database:
  host: localhost
  port: 5432
  # pool_size uses default

secret_key: my-secret
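Building nested dataclasses from a plain dict like the YAML above can be sketched with a small recursive loader. This is an illustration of the pattern, not Sparkwheel's actual loading code:

```python
from dataclasses import dataclass, fields, is_dataclass

@dataclass
class DatabaseConfigSchema:
    host: str
    port: int
    pool_size: int = 10

@dataclass
class AppConfigSchema:
    database: DatabaseConfigSchema  # Nested
    secret_key: str

def from_dict(schema, data: dict):
    """Recursively build nested dataclasses from plain dicts (sketch)."""
    kwargs = {}
    for f in fields(schema):
        if f.name in data:
            value = data[f.name]
            if is_dataclass(f.type) and isinstance(value, dict):
                value = from_dict(f.type, value)
            kwargs[f.name] = value
    return schema(**kwargs)

cfg = from_dict(AppConfigSchema, {
    "database": {"host": "localhost", "port": 5432},
    "secret_key": "my-secret",
})
print(cfg.database.pool_size)  # 10 (default applied)
```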

Lists of Dataclasses

@dataclass
class PluginConfigSchema:
    name: str
    enabled: bool = True

@dataclass
class AppConfigSchema:
    plugins: list[PluginConfigSchema]

Corresponding YAML:

plugins:
  - name: logger
    enabled: true
  - name: metrics
  - name: cache
    enabled: false

Dictionaries with Dataclass Values

@dataclass
class ModelConfigSchema:
    hidden_size: int
    dropout: float

@dataclass
class ConfigSchema:
    models: dict[str, ModelConfigSchema]

Corresponding YAML:

models:
  small:
    hidden_size: 128
    dropout: 0.1
  large:
    hidden_size: 512
    dropout: 0.2

Custom Validation

Add validation logic with @validator:

from sparkwheel import validator

@dataclass
class TrainingConfigSchema:
    lr: float
    batch_size: int

    @validator
    def check_lr(self):
        """Validate learning rate."""
        if not (0 < self.lr < 1):
            raise ValueError(f"lr must be between 0 and 1, got {self.lr}")

    @validator
    def check_batch_size(self):
        """Validate batch size is power of 2."""
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.batch_size & (self.batch_size - 1) != 0:
            raise ValueError("batch_size must be power of 2")

Cross-Field Validation

Validators can check relationships between fields:

@dataclass
class ConfigSchema:
    start_epoch: int
    end_epoch: int
    warmup_epochs: int

    @validator
    def check_epochs(self):
        """Ensure epoch configuration is valid."""
        if self.end_epoch <= self.start_epoch:
            raise ValueError("end_epoch must be > start_epoch")
        if self.warmup_epochs >= (self.end_epoch - self.start_epoch):
            raise ValueError("warmup_epochs too large")

With Optional Fields

@dataclass
class ConfigSchema:
    value: float
    max_value: Optional[float] = None

    @validator
    def check_max(self):
        """Check value doesn't exceed max if specified."""
        if self.max_value is not None and self.value > self.max_value:
            raise ValueError(f"value ({self.value}) exceeds max_value ({self.max_value})")

Note: Validators run after type checking. If types are wrong, validation stops there.
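One plausible way a decorator like @validator can work under the hood is to tag methods and run every tagged method after type checks pass. A minimal sketch of the pattern, not Sparkwheel's implementation:

```python
from dataclasses import dataclass

def validator(func):
    """Tag a method as a validator (hypothetical stand-in for sparkwheel.validator)."""
    func.__is_validator__ = True
    return func

def run_validators(instance):
    """Call every method tagged by @validator on the instance."""
    for name in dir(type(instance)):
        method = getattr(type(instance), name, None)
        if callable(method) and getattr(method, "__is_validator__", False):
            method(instance)

@dataclass
class TrainingConfigSchema:
    lr: float

    @validator
    def check_lr(self):
        if not (0 < self.lr < 1):
            raise ValueError(f"lr must be between 0 and 1, got {self.lr}")

run_validators(TrainingConfigSchema(lr=0.01))  # passes silently
```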

Discriminated Unions

Use tagged unions for type-safe variants:

from typing import Literal, Union

@dataclass
class SGDOptimizerSchema:
    type: Literal["sgd"]  # Discriminator
    lr: float
    momentum: float = 0.9

@dataclass
class AdamOptimizerSchema:
    type: Literal["adam"]  # Discriminator
    lr: float
    beta1: float = 0.9

@dataclass
class ConfigSchema:
    optimizer: Union[SGDOptimizerSchema, AdamOptimizerSchema]

YAML:

optimizer:
  type: sgd  # Selects SGDOptimizer
  lr: 0.01
  momentum: 0.95

Sparkwheel detects the type field as a discriminator and validates the config against the matching schema.

Error examples:

# Missing discriminator
{"optimizer": {"lr": 0.01}}
# ValidationError: Missing discriminator field 'type'

# Invalid value
{"optimizer": {"type": "rmsprop", "lr": 0.01}}
# ValidationError: Invalid discriminator value 'rmsprop'. Valid: 'sgd', 'adam'

# Wrong fields for type
{"optimizer": {"type": "adam", "momentum": 0.9}}
# ValidationError: Missing required field 'lr'
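Discriminator dispatch can be sketched with typing.get_args: read the tag, then find the union member whose Literal value matches. Again illustrative, not Sparkwheel's code:

```python
from dataclasses import dataclass
from typing import Literal, Union, get_args, get_type_hints

@dataclass
class SGDOptimizerSchema:
    type: Literal["sgd"]
    lr: float

@dataclass
class AdamOptimizerSchema:
    type: Literal["adam"]
    lr: float

def select_variant(union, data: dict):
    """Pick the union member whose Literal 'type' matches data['type']."""
    tag = data.get("type")
    if tag is None:
        raise ValueError("Missing discriminator field 'type'")
    for variant in get_args(union):
        if tag in get_args(get_type_hints(variant)["type"]):
            return variant
    raise ValueError(f"Invalid discriminator value {tag!r}")

OptimizerUnion = Union[SGDOptimizerSchema, AdamOptimizerSchema]
print(select_variant(OptimizerUnion, {"type": "adam", "lr": 0.001}).__name__)
# AdamOptimizerSchema
```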

Validation Options

Config options control coercion, strictness, partial configs, and freezing.

Type Coercion

Sparkwheel automatically converts compatible types when coerce=True (default):

@dataclass
class ServerConfigSchema:
    port: int
    timeout: float
    enabled: bool

# Coercion enabled by default
config = Config(schema=ServerConfigSchema)
config.update({
    "port": "8080",        # str → int
    "timeout": "30.5",     # str → float
    "enabled": "true"      # str → bool
})

print(config["port"])      # 8080 (int, not str!)
print(config["timeout"])   # 30.5 (float)
print(config["enabled"])   # True (bool)

Supported coercions:

- str → int (e.g., "42" → 42)
- str → float (e.g., "3.14" → 3.14)
- str → bool (e.g., "true" → True, "false" → False)
- int → float (e.g., 42 → 42.0)
- Recursive coercion through lists, dicts, and nested dataclasses
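A simplified sketch of such a coercion pass, including the recursive list case (this is an illustration of the rules above, not Sparkwheel's internals; it assumes one target type per list):

```python
def coerce(value, target):
    """Coerce value toward target type, recursing into lists (sketch)."""
    if isinstance(value, list):
        return [coerce(v, target) for v in value]  # element-wise recursion
    if target is bool and isinstance(value, str):
        return value.strip().lower() in {"true", "1", "yes"}
    if target in (int, float) and isinstance(value, str):
        return target(value)  # raises ValueError on bad format
    if target is float and isinstance(value, int):
        return float(value)
    if isinstance(target, type) and isinstance(value, target):
        return value  # already the right type
    raise TypeError(f"cannot coerce {value!r} to {target}")

print(coerce("8080", int), coerce("30.5", float), coerce("true", bool))
# 8080 30.5 True
```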

Disable coercion if needed:

config = Config(schema=ServerConfigSchema, coerce=False)
config.update({
    "port": "8080"  # ValidationError: expected int, got str
})

Strict vs Lenient Mode

Control whether extra fields are rejected:

@dataclass
class MySchema:
    required_field: int

# Strict mode (default) - rejects extra fields
config = Config(schema=MySchema, strict=True)
config.update({
    "required_field": 42,
    "extra_field": "oops"  # ✗ ValidationError!
})

# Lenient mode - allows extra fields
config = Config(schema=MySchema, strict=False)
config.update({
    "required_field": 42,
    "extra_field": "ok"  # ✓ Allowed
})

Use lenient mode for:

- Development/prototyping
- Gradual schema migration
- Configs with experimental fields
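The strict/lenient distinction boils down to comparing config keys against declared dataclass fields. A hedged sketch of the idea (not Sparkwheel's code):

```python
from dataclasses import dataclass, fields

@dataclass
class MySchema:
    required_field: int

def extra_fields(schema, data: dict, strict: bool = True) -> set[str]:
    """Return keys not declared on the schema; raise in strict mode."""
    known = {f.name for f in fields(schema)}
    extras = set(data) - known
    if strict and extras:
        raise ValueError(f"Unexpected fields: {sorted(extras)}")
    return extras

print(extra_fields(MySchema, {"required_field": 42, "extra_field": "ok"}, strict=False))
# {'extra_field'}
```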

MISSING Sentinel

Support partial configs with required-but-not-yet-set values:

from sparkwheel import Config, MISSING

@dataclass
class APIConfigSchema:
    api_key: str
    endpoint: str
    timeout: int = 30

# Partial config - api_key not set yet
config = Config(schema=APIConfigSchema, allow_missing=True)
config.update({
    "api_key": MISSING,
    "endpoint": "https://api.example.com"
})

# Later, fill in the missing value
import os
config.set("api_key", os.getenv("API_KEY"))

# Now validate that nothing is MISSING
config.validate(APIConfigSchema)  # Uses allow_missing=False by default
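The sentinel pattern itself is simple: a unique object marks "required but not set", and a final scan rejects any remaining occurrences. A stand-alone sketch using a stand-in sentinel (Sparkwheel exports its own MISSING) and the doc's "::" path separator:

```python
class _Missing:
    def __repr__(self):
        return "MISSING"

MISSING = _Missing()  # stand-in sentinel for illustration

def find_missing(data: dict, prefix: str = "") -> list[str]:
    """Return the paths of values that are still MISSING."""
    paths = []
    for key, value in data.items():
        path = f"{prefix}{key}"
        if value is MISSING:
            paths.append(path)
        elif isinstance(value, dict):
            paths.extend(find_missing(value, path + "::"))
    return paths

print(find_missing({"api_key": MISSING, "endpoint": "https://api.example.com"}))
# ['api_key']
```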

Frozen Configs

Prevent modifications after initialization:

config = Config(schema=MySchema)
config.update("config.yaml")
config.freeze()

# Mutations now raise FrozenConfigError
config.set("model::lr", 0.001)  # ✗ FrozenConfigError!
config.update({"new": "data"})   # ✗ FrozenConfigError!

# Read operations still work
value = config.get("model::lr")
resolved = config.resolve()

# Unfreeze if needed
config.unfreeze()
config.set("model::lr", 0.001)  # ✓ Now works

With Sparkwheel Features

Validation works with references, expressions, and instantiation.

References

@dataclass
class ConfigSchema:
    base_lr: float
    optimizer_lr: float  # Can be a reference

config = Config(schema=ConfigSchema)
config.update({
    "base_lr": 0.001,
    "optimizer_lr": "@base_lr"  # Reference allowed
})

Expressions

@dataclass
class ConfigSchema:
    batch_size: int
    total_steps: int  # Computed

config = Config(schema=ConfigSchema)
config.update({
    "batch_size": 32,
    "total_steps": "$@batch_size * 100"  # Expression allowed
})

Instantiation

Special keys like _target_ are automatically ignored:

@dataclass
class OptimizerConfigSchema:
    lr: float
    momentum: float = 0.9

config = Config(schema=OptimizerConfigSchema)
config.update({
    "_target_": "torch.optim.SGD",  # Ignored by validation
    "lr": 0.001,
    "momentum": 0.95
})

Error Messages

Type Mismatch

# Expected int, got str
# ValidationError: Validation error at 'port': Type mismatch
#   Expected type: int
#   Actual type: str
#   Actual value: '8080'

Missing Field

# ValidationError: Validation error at 'required_field':
#   Missing required field 'required_field'
#   Expected type: str

Unexpected Field

# ValidationError: Validation error at 'unexpected':
#   Unexpected field 'unexpected' not in schema ConfigSchema

Nested Errors

# ValidationError: Validation error at 'database.port': Type mismatch
#   Expected type: int
#   Actual type: str
#   Actual value: 'wrong'

Validation Timing

Continuous

# Validates on every update() and set()
config = Config(schema=MySchema)
config.update("config.yaml")
config.set("port", "8080")  # Validates immediately!

Explicit

# Load without schema, validate later
config = Config()
config.update("config.yaml")
# ... maybe modify ...
config.validate(MySchema)

Standalone Function

from sparkwheel import validate

# Validate a dict directly
validate(config_dict, AppConfigSchema)

Complete Example

from dataclasses import dataclass
from typing import Optional
from sparkwheel import Config, validator

@dataclass
class DatabaseConfigSchema:
    host: str
    port: int
    database: str
    username: str
    password: str
    pool_size: int = 10
    timeout: int = 30

@dataclass
class APIConfigSchema:
    host: str = "0.0.0.0"
    port: int = 8000
    workers: int = 4

    @validator
    def check_port(self):
        if not (1024 <= self.port <= 65535):
            raise ValueError(f"port must be 1024-65535, got {self.port}")

@dataclass
class AppConfigSchema:
    app_name: str
    environment: str
    api: APIConfigSchema
    database: DatabaseConfigSchema
    debug: bool = False  # Fields with defaults must come last

# Load and validate continuously
config = Config(schema=AppConfigSchema)
config.update("production.yaml")

# Access validated config
print(f"Starting {config['app_name']} on port {config['api::port']}")

# Freeze to prevent modifications
config.freeze()

The YAML:

app_name: "My API"
environment: production
debug: false

api:
  port: 3000
  workers: 8

database:
  host: db.example.com
  port: 5432
  database: myapp
  username: "$import os; os.getenv('DB_USER')"
  password: "$import os; os.getenv('DB_PASSWORD')"
  pool_size: 20

Next Steps