    How to Design Production-Grade Mock Data Pipelines Using Polyfactory with Dataclasses, Pydantic, Attrs, and Nested Models

    February 8, 2026


    In this tutorial, we walk through Polyfactory end to end, focusing on how to generate rich, realistic mock data directly from Python type hints. We set up the environment, then build factories for dataclasses, Pydantic models, and attrs-based classes, demonstrating customization, overrides, calculated fields, and nested object generation along the way. Each snippet shows how to control randomness, enforce constraints, and model real-world structures, so the techniques apply directly to testing, prototyping, and data-driven development workflows.

    import subprocess
    import sys

    def install_package(package):
        # Use the running interpreter so packages land in the active environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])

    packages = [
        "polyfactory",
        "pydantic",
        "email-validator",
        "faker",
        "msgspec",
        "attrs",
    ]

    for package in packages:
        try:
            install_package(package)
            print(f"✓ Installed {package}")
        except Exception as e:
            print(f"✗ Failed to install {package}: {e}")

    print("\n")

    print("=" * 80)
    print("SECTION 2: Basic Dataclass Factories")
    print("=" * 80)

    from dataclasses import dataclass
    from typing import List, Optional
    from datetime import datetime, date
    from uuid import UUID
    from polyfactory.factories import DataclassFactory

    @dataclass
    class Address:
        street: str
        city: str
        country: str
        zip_code: str

    @dataclass
    class Person:
        id: UUID
        name: str
        email: str
        age: int
        birth_date: date
        is_active: bool
        address: Address
        phone_numbers: List[str]
        bio: Optional[str] = None

    # No custom logic: every field is populated from its type hint.
    class PersonFactory(DataclassFactory[Person]):
        pass

    person = PersonFactory.build()
    print("Generated Person:")
    print(f" ID: {person.id}")
    print(f" Name: {person.name}")
    print(f" Email: {person.email}")
    print(f" Age: {person.age}")
    print(f" Address: {person.address.city}, {person.address.country}")
    print(f" Phone Numbers: {person.phone_numbers[:2]}")
    print()

    # batch(n) returns a list of n independently generated instances.
    people = PersonFactory.batch(5)
    print(f"Generated {len(people)} people:")
    for i, p in enumerate(people, 1):
        print(f" {i}. {p.name} - {p.email}")
    print("\n")

    We set up the environment and ensure all required dependencies are installed. We also introduce the core idea of using Polyfactory to generate mock data from type hints. By initializing the basic dataclass factories, we establish the foundation for all subsequent examples.
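    To make the idea of generating data from type hints concrete, the following is a toy sketch that uses only the standard library. It is not Polyfactory's implementation; the names `fake_value`, `toy_build`, `ToyAddress`, and `ToyPerson` are hypothetical, and only a handful of types are handled. It inspects a dataclass's hints, picks a random value per hint, recurses into nested dataclasses, and lets keyword overrides pin specific fields.

```python
import random
import string
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional, Union, get_args, get_origin, get_type_hints


def fake_value(tp: Any, rng: random.Random) -> Any:
    """Return a random value for a small, hand-picked set of type hints."""
    if is_dataclass(tp):
        return toy_build(tp, rng)  # nested model: recurse
    origin = get_origin(tp)
    if origin is list:
        (item_type,) = get_args(tp)
        return [fake_value(item_type, rng) for _ in range(rng.randint(1, 3))]
    if origin is Union:  # Optional[X] is Union[X, None]
        chosen = rng.choice(get_args(tp))
        return None if chosen is type(None) else fake_value(chosen, rng)
    if tp is bool:
        return rng.random() < 0.5
    if tp is int:
        return rng.randint(0, 100)
    if tp is str:
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    raise TypeError(f"unsupported type hint: {tp!r}")


def toy_build(model: Any, rng: random.Random, **overrides: Any) -> Any:
    """Instantiate `model`, generating every field that is not overridden."""
    hints = get_type_hints(model)
    generated = {
        f.name: fake_value(hints[f.name], rng)
        for f in fields(model)
        if f.name not in overrides
    }
    return model(**generated, **overrides)


@dataclass
class ToyAddress:
    city: str
    zip_code: str


@dataclass
class ToyPerson:
    name: str
    age: int
    is_active: bool
    address: ToyAddress
    phone_numbers: List[str]
    bio: Optional[str] = None


person = toy_build(ToyPerson, random.Random(0), name="Alice")
print(person)
```

    A real library covers far more types (UUIDs, dates, enums, constrained fields) and produces more realistic values, but the control flow is similar in spirit: read the hints, dispatch on the type, recurse for nested models.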

    print("=" * 80)
    print("SECTION 3: Customizing Factory Behavior")
    print("=" * 80)

    from faker import Faker
    from polyfactory.fields import Use, Ignore

    @dataclass
    class Employee:
        employee_id: str
        full_name: str
        department: str
        salary: float
        hire_date: date
        is_manager: bool
        email: str
        internal_notes: Optional[str] = None

    class EmployeeFactory(DataclassFactory[Employee]):
        __faker__ = Faker(locale="en_US")
        __random_seed__ = 42  # seed the factory's random source for repeatable output

        # A classmethod named after a field supplies that field's value.
        @classmethod
        def employee_id(cls) -> str:
            return f"EMP-{cls.__random__.randint(10000, 99999)}"

        @classmethod
        def full_name(cls) -> str:
            return cls.__faker__.name()

        @classmethod
        def department(cls) -> str:
            departments = ["Engineering", "Marketing", "Sales", "HR", "Finance"]
            return cls.__random__.choice(departments)

        @classmethod
        def salary(cls) -> float:
            return round(cls.__random__.uniform(50000, 150000), 2)

        @classmethod
        def email(cls) -> str:
            return cls.__faker__.company_email()

    employees = EmployeeFactory.batch(3)
    print("Generated Employees:")
    for emp in employees:
        print(f" {emp.employee_id}: {emp.full_name}")
        print(f" Department: {emp.department}")
        print(f" Salary: ${emp.salary:,.2f}")
        print(f" Email: {emp.email}")
        print()
    print()

    print("=" * 80)
    print("SECTION 4: Field Constraints and Calculated Fields")
    print("=" * 80)

    @dataclass
    class Product:
        product_id: str
        name: str
        description: str
        price: float
        discount_percentage: float
        stock_quantity: int
        final_price: Optional[float] = None
        sku: Optional[str] = None

    class ProductFactory(DataclassFactory[Product]):
        @classmethod
        def product_id(cls) -> str:
            return f"PROD-{cls.__random__.randint(1000, 9999)}"

        @classmethod
        def name(cls) -> str:
            adjectives = ["Premium", "Deluxe", "Classic", "Modern", "Eco"]
            nouns = ["Widget", "Gadget", "Device", "Tool", "Appliance"]
            return f"{cls.__random__.choice(adjectives)} {cls.__random__.choice(nouns)}"

        @classmethod
        def price(cls) -> float:
            return round(cls.__random__.uniform(10.0, 1000.0), 2)

        @classmethod
        def discount_percentage(cls) -> float:
            return round(cls.__random__.uniform(0, 30), 2)

        @classmethod
        def stock_quantity(cls) -> int:
            return cls.__random__.randint(0, 500)

        @classmethod
        def build(cls, **kwargs):
            instance = super().build(**kwargs)
            # Derive dependent fields only when they were left empty.
            if instance.final_price is None:
                instance.final_price = round(
                    instance.price * (1 - instance.discount_percentage / 100), 2
                )
            if instance.sku is None:
                name_part = instance.name.replace(" ", "-").upper()[:10]
                instance.sku = f"{instance.product_id}-{name_part}"
            return instance

    products = ProductFactory.batch(3)
    print("Generated Products:")
    for prod in products:
        print(f" {prod.sku}")
        print(f" Name: {prod.name}")
        print(f" Price: ${prod.price:.2f}")
        print(f" Discount: {prod.discount_percentage}%")
        print(f" Final Price: ${prod.final_price:.2f}")
        print(f" Stock: {prod.stock_quantity} units")
        print()
    print()

    We customize factory behavior by defining classmethods named after the fields they populate, backed by a seeded random source and a Faker instance for realistic values. We then derive dependent fields such as final_price and sku by overriding build(), so calculated values stay consistent with the randomly generated inputs.

    print("=" * 80)
    print("SECTION 6: Complex Nested Structures")
    print("=" * 80)

    from enum import Enum

    class OrderStatus(str, Enum):
        PENDING = "pending"
        PROCESSING = "processing"
        SHIPPED = "shipped"
        DELIVERED = "delivered"
        CANCELLED = "cancelled"

    @dataclass
    class OrderItem:
        product_name: str
        quantity: int
        unit_price: float
        total_price: Optional[float] = None

    @dataclass
    class ShippingInfo:
        carrier: str
        tracking_number: str
        estimated_delivery: date

    @dataclass
    class Order:
        order_id: str
        customer_name: str
        customer_email: str
        status: OrderStatus
        items: List[OrderItem]
        order_date: datetime
        shipping_info: Optional[ShippingInfo] = None
        total_amount: Optional[float] = None
        notes: Optional[str] = None

    class OrderItemFactory(DataclassFactory[OrderItem]):
        @classmethod
        def product_name(cls) -> str:
            products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones",
                        "Webcam", "USB Cable", "Phone Case", "Charger", "Tablet"]
            return cls.__random__.choice(products)

        @classmethod
        def quantity(cls) -> int:
            return cls.__random__.randint(1, 5)

        @classmethod
        def unit_price(cls) -> float:
            return round(cls.__random__.uniform(5.0, 500.0), 2)

        @classmethod
        def build(cls, **kwargs):
            instance = super().build(**kwargs)
            # Line total depends on the generated quantity and unit price.
            if instance.total_price is None:
                instance.total_price = round(instance.quantity * instance.unit_price, 2)
            return instance

    class ShippingInfoFactory(DataclassFactory[ShippingInfo]):
        @classmethod
        def carrier(cls) -> str:
            carriers = ["FedEx", "UPS", "DHL", "USPS"]
            return cls.__random__.choice(carriers)

        @classmethod
        def tracking_number(cls) -> str:
            return "".join(cls.__random__.choices("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", k=12))

    class OrderFactory(DataclassFactory[Order]):
        @classmethod
        def order_id(cls) -> str:
            return f"ORD-{datetime.now().year}-{cls.__random__.randint(100000, 999999)}"

        @classmethod
        def items(cls) -> List[OrderItem]:
            # Delegate to OrderItemFactory so each item gets its computed total.
            return OrderItemFactory.batch(cls.__random__.randint(1, 5))

        @classmethod
        def build(cls, **kwargs):
            instance = super().build(**kwargs)
            if instance.total_amount is None:
                instance.total_amount = round(sum(item.total_price for item in instance.items), 2)
            # Shipped and delivered orders must carry shipping details.
            if instance.shipping_info is None and instance.status in [OrderStatus.SHIPPED, OrderStatus.DELIVERED]:
                instance.shipping_info = ShippingInfoFactory.build()
            return instance

    orders = OrderFactory.batch(2)
    print("Generated Orders:")
    for order in orders:
        print(f"\n Order {order.order_id}")
        print(f" Customer: {order.customer_name} ({order.customer_email})")
        print(f" Status: {order.status.value}")
        print(f" Items ({len(order.items)}):")
        for item in order.items:
            print(f" - {item.quantity}x {item.product_name} @ ${item.unit_price:.2f} = ${item.total_price:.2f}")
        print(f" Total: ${order.total_amount:.2f}")
        if order.shipping_info:
            print(f" Shipping: {order.shipping_info.carrier} - {order.shipping_info.tracking_number}")
    print("\n")

    We build more complex domain logic by introducing calculated and dependent fields within factories. We show how we can derive values such as final prices, totals, and shipping details after object creation. This allows us to model realistic business rules directly inside our test data generators.

    print("=" * 80)
    print("SECTION 7: Attrs Integration")
    print("=" * 80)

    import attrs
    from polyfactory.factories.attrs_factory import AttrsFactory

    @attrs.define
    class BlogPost:
        title: str
        author: str
        content: str
        views: int = 0
        likes: int = 0
        published: bool = False
        published_at: Optional[datetime] = None
        tags: List[str] = attrs.field(factory=list)

    class BlogPostFactory(AttrsFactory[BlogPost]):
        @classmethod
        def title(cls) -> str:
            templates = [
                "10 Tips for {}",
                "Understanding {}",
                "The Complete Guide to {}",
                "Why {} Matters",
                "Getting Started with {}",
            ]
            topics = ["Python", "Data Science", "Machine Learning", "Web Development", "DevOps"]
            template = cls.__random__.choice(templates)
            topic = cls.__random__.choice(topics)
            return template.format(topic)

        @classmethod
        def content(cls) -> str:
            # Reuse the factory's Faker instance instead of creating a new one per call.
            return " ".join(cls.__faker__.sentences(nb=cls.__random__.randint(3, 8)))

        @classmethod
        def views(cls) -> int:
            return cls.__random__.randint(0, 10000)

        @classmethod
        def likes(cls) -> int:
            return cls.__random__.randint(0, 1000)

        @classmethod
        def tags(cls) -> List[str]:
            all_tags = ["python", "tutorial", "beginner", "advanced", "guide",
                        "tips", "best-practices", "2024"]
            return cls.__random__.sample(all_tags, k=cls.__random__.randint(2, 5))

    posts = BlogPostFactory.batch(3)
    print("Generated Blog Posts:")
    for post in posts:
        print(f"\n '{post.title}'")
        print(f" Author: {post.author}")
        print(f" Views: {post.views:,} | Likes: {post.likes:,}")
        print(f" Published: {post.published}")
        print(f" Tags: {', '.join(post.tags)}")
        print(f" Preview: {post.content[:100]}...")
    print("\n")

    print("=" * 80)
    print("SECTION 8: Building with Specific Overrides")
    print("=" * 80)

    # Keyword arguments pin specific fields; everything else is still generated.
    custom_person = PersonFactory.build(
        name="Alice Johnson",
        age=30,
        email="[email protected]"
    )
    print("Custom Person:")
    print(f" Name: {custom_person.name}")
    print(f" Age: {custom_person.age}")
    print(f" Email: {custom_person.email}")
    print(f" ID (auto-generated): {custom_person.id}")
    print()

    # The same override applies to every instance in a batch.
    vip_customers = PersonFactory.batch(
        3,
        bio="VIP Customer"
    )
    print("VIP Customers:")
    for customer in vip_customers:
        print(f" {customer.name}: {customer.bio}")
    print("\n")

    We extend Polyfactory usage beyond dataclasses to attrs-based classes, with a Pydantic model following in the coverage example. We also show how per-call overrides pin specific fields while the remaining fields stay random, so the generated data keeps matching the real application schema.

    print("=" * 80)
    print("SECTION 9: Field-Level Control with Use and Ignore")
    print("=" * 80)

    from polyfactory.fields import Use, Ignore

    @dataclass
    class Configuration:
        app_name: str
        version: str
        debug: bool
        created_at: datetime
        api_key: str
        secret_key: str

    class ConfigFactory(DataclassFactory[Configuration]):
        # Use wraps a callable; here each one returns a constant value.
        app_name = Use(lambda: "MyAwesomeApp")
        version = Use(lambda: "1.0.0")
        debug = Use(lambda: False)

        @classmethod
        def api_key(cls) -> str:
            return f"api_key_{''.join(cls.__random__.choices('0123456789abcdef', k=32))}"

        @classmethod
        def secret_key(cls) -> str:
            return f"secret_{''.join(cls.__random__.choices('0123456789abcdef', k=64))}"

    configs = ConfigFactory.batch(2)
    print("Generated Configurations:")
    for config in configs:
        print(f" App: {config.app_name} v{config.version}")
        print(f" Debug: {config.debug}")
        print(f" API Key: {config.api_key[:20]}...")
        print(f" Created: {config.created_at}")
        print()
    print()

    print("=" * 80)
    print("SECTION 10: Model Coverage Testing")
    print("=" * 80)

    from pydantic import BaseModel, ConfigDict
    from typing import Union
    from polyfactory.factories.pydantic_factory import ModelFactory

    class PaymentMethod(BaseModel):
        model_config = ConfigDict(use_enum_values=True)
        type: str
        card_number: Optional[str] = None
        bank_name: Optional[str] = None
        verified: bool = False

    class PaymentMethodFactory(ModelFactory[PaymentMethod]):
        __model__ = PaymentMethod

    # Build one instance per variant we want the tests to cover.
    payment_methods = [
        PaymentMethodFactory.build(type="card", card_number="4111111111111111"),
        PaymentMethodFactory.build(type="bank", bank_name="Chase Bank"),
        PaymentMethodFactory.build(verified=True),
    ]

    print("Payment Method Coverage:")
    for i, pm in enumerate(payment_methods, 1):
        print(f" {i}. Type: {pm.type}")
        if pm.card_number:
            print(f" Card: {pm.card_number}")
        if pm.bank_name:
            print(f" Bank: {pm.bank_name}")
        print(f" Verified: {pm.verified}")
    print("\n")

    print("=" * 80)
    print("TUTORIAL SUMMARY")
    print("=" * 80)
    print("""
    This tutorial covered:

    1. ✓ Basic Dataclass Factories - Simple mock data generation
    2. ✓ Custom Field Generators - Controlling individual field values
    3. ✓ Calculated Fields - Deriving dependent values in build() overrides
    4. ✓ Pydantic Integration - Working with validated models
    5. ✓ Complex Nested Structures - Building related objects
    6. ✓ Attrs Support - Alternative to dataclasses
    7. ✓ Build Overrides - Customizing specific instances
    8. ✓ Use and Ignore - Explicit field control
    9. ✓ Coverage Testing - Ensuring comprehensive test data

    Key Takeaways:
    - Polyfactory automatically generates mock data from type hints
    - Customize generation with classmethods and Use
    - Supports multiple libraries: dataclasses, Pydantic, attrs, msgspec
    - Derive calculated/dependent fields in a build() override or with PostGenerated
    - Override specific values while keeping others random
    - Well suited to testing, development, and prototyping

    For more information:
    - Documentation: https://polyfactory.litestar.dev/
    - GitHub: https://github.com/litestar-org/polyfactory
    """)
    print("=" * 80)

    We cover advanced usage patterns such as explicit overrides, constant field values, and coverage testing scenarios. We show how we can intentionally construct edge cases and variant instances for robust testing. This final step ties everything together by demonstrating how Polyfactory supports comprehensive and production-grade test data strategies.

    In conclusion, we demonstrated how Polyfactory lets us create comprehensive, flexible test data with minimal boilerplate while retaining fine-grained control over every field. We handled simple entities, complex nested structures, Pydantic models, and explicit field overrides within a single, consistent factory-based approach. Overall, Polyfactory helps us move faster and test more confidently, because the generated datasets follow the same schemas and business rules as production data without sacrificing clarity or maintainability.
