Graph-Based Product Configuration

A Technical Architecture for Composable Quoting, Pricing, and Production Tracking at Scale
Moving Beyond Relational Table-Driven MIS Systems · February 2026

Technical Reference Document

TABLE OF CONTENTS

1. The Problem: Why Relational MIS Systems Fail at Complexity

2. Core Concept: Products as Directed Acyclic Graphs (DAGs)

3. Data Modeling: The Product Configuration Graph

4. The Node Type System: Extensibility Without Schema Changes

5. Pricing Architecture: Function Pipelines Over Table Lookups

6. The Quote Engine: Tree Traversal and Memoization

7. Storage Strategy: Document vs. Relational Hybrid

8. Event Sourcing: Granular Production Tracking

9. CQRS: Separating Reads from Writes

10. Caching, Performance, and Scale

11. Migration Strategy: From Legacy to Graph

12. Implementation Reference: Data Structures and Pseudocode

13. Architecture Decision Records

14. Graphs vs. Relational: A Deep Side-by-Side Comparison

15. Why This Architecture Isn’t Universal: Honest Bottlenecks

16. Incremental Adoption: What to Build First and Why

17. Graph Theory, Academic Foundations, and Cross-Domain Insights

18. Bottom-Up Pricing: Complete Step-by-Step Walkthrough

19. Quantity Propagation, Waste Cascading, and Cross-Node Dependencies

20. Pitfalls, Edge Cases, and Failure Modes

21. Glossary

1. The Problem: Why Relational MIS Systems Fail at Complexity

Traditional print Management Information Systems (MIS) were designed in an era when print products were relatively uniform: business cards, letterheads, brochures, and simple packaging. The data model reflected this simplicity. A job had a substrate, a press, a run length, and a handful of finishing operations. Pricing was a series of table lookups.

This architecture fails catastrophically when products become composable. Consider a modern specialty card product: a rigid card with a PET overlay, spot UV on select areas, foil stamping in two colors, a die-cut shape, edge painting, and a custom insert. Each layer can have its own substrate, its own set of processes, and its own quantity breaks. Processes can be conditional on other processes. The die cut affects the foil area which affects the foil cost.

1.1 The Table Lookup Bottleneck

In a relational MIS, each variable in the product specification triggers a database query. The system architecture typically follows this pattern:

  1. User selects a base product. System queries the products table.

  2. User selects a substrate. System queries the substrates table, joins with product-substrate compatibility table.

  3. User selects a process (e.g., spot UV). System queries the process table, the process-rates table, the press-capability table.

  4. System calculates pricing by joining rate tables with quantity break tables with markup tables.

For a simple product with 3 variables, this might be 8-12 queries. For your card with multiple layers and processes, this becomes 50-200+ sequential database round trips. Each round trip has network latency (typically 1-5ms even on a local database), query parsing overhead, and lock contention. At 200 queries averaging 3ms each, you are waiting 600ms just for database I/O before any business logic executes.

Key Insight

The fundamental problem is not that relational databases are slow. It is that the architecture requires one query per variable, and composable products have an unbounded number of variables. The solution is not faster queries. It is fewer queries.

1.2 The Schema Rigidity Problem

Relational MIS systems encode product structure in the schema itself. There is a layers table, a processes table, a layer_processes junction table, and so on. Each table has a fixed set of columns representing the known attributes of that entity type.

When you need to add a new process type that has attributes no existing process has (say, edge painting, which has a color attribute and a coverage percentage that no other finishing process uses), you face three bad options:

  1. Add nullable columns to the processes table. This leads to increasingly sparse tables with dozens of unused columns, makes queries slower, and makes the schema incomprehensible.

  2. Create a new table for the new process type with its own schema. This fractures your data model and forces the quoting engine to know about every table.

  3. Use an Entity-Attribute-Value (EAV) pattern where attributes are stored as key-value rows. This solves the schema problem but destroys query performance and makes the data nearly impossible to reason about.

None of these options scale. The core issue is that relational schemas are designed for data that has a known, fixed structure. Composable products have a structure that is defined by the user at configuration time, not by the developer at design time.

1.3 The Coupling Problem

In most legacy MIS platforms, three concerns that should be independent are deeply entangled:

  1. Product definition: what the product is and how it is constructed.

  2. Pricing logic: what producing it costs and what to charge for it.

  3. Production tracking: where each part of the job stands on the shop floor.

When these are mixed together in the same tables and stored procedures, any change to one concern risks breaking the others. Adding a new process type means modifying the product definition schema, the pricing stored procedures, the UI forms, and the production tracking workflow. This is why the system is hard to improve: every change is a cross-cutting concern.

2. Core Concept: Products as Directed Acyclic Graphs (DAGs)

The central architectural insight of this document is that a composable product is not a row in a table. It is a graph. Specifically, it is a Directed Acyclic Graph (DAG) where:

  1. Nodes represent the components of the product: layers, processes, and assembly steps.

  2. Edges point from each parent component to the components it contains or consumes.

  3. The graph is acyclic: no component can, directly or indirectly, contain itself.

2.1 What Is a DAG and Why Not a Tree?

A tree is a special case of a DAG where each node has exactly one parent. In a product configuration, a process can apply to multiple layers (e.g., a single lamination step that bonds layer 1 and layer 2 together). This means a process node can have multiple parent nodes, which makes it a DAG, not a tree.

In practice, most product configurations look tree-like, with occasional shared nodes. The important thing is that the data model supports both without requiring structural changes.
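Acyclicity is an invariant the write path must enforce before saving a configuration. The sketch below is a minimal check, assuming the flat id-to-children map introduced in Chapter 3; `NodeMap` and `hasCycle` are illustrative names, not part of the core schema:

```typescript
// Hypothetical helper: verifies a flat node map contains no cycles.
// Classic DFS coloring: revisiting a GRAY (in-progress) node means a
// back edge, i.e. a cycle.
type NodeMap = Record<string, { children: string[] }>;

function hasCycle(nodes: NodeMap, rootId: string): boolean {
  const GRAY = 1, BLACK = 2;
  const color: Record<string, number> = {};

  function visit(id: string): boolean {
    if (color[id] === GRAY) return true;   // back edge: cycle found
    if (color[id] === BLACK) return false; // subtree already verified
    color[id] = GRAY;
    for (const child of nodes[id]?.children ?? []) {
      if (visit(child)) return true;
    }
    color[id] = BLACK;
    return false;
  }
  return visit(rootId);
}
```

A shared node (two parents pointing at the same child) passes this check; only a true cycle fails it.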

Here is a concrete example of a multi-layer specialty card modeled as a DAG:

Card (root node)
├── Layer 1: "Base Card"
│     type: substrate_layer
│     material: 350gsm Silk Art Board
│     dimensions: 88mm x 55mm
│   ├── Process: CMYK Offset Print (front)
│   │     type: print_process
│   │     colors: 4
│   │     sides: 1
│   │     coverage: 85%
│   ├── Process: CMYK Offset Print (back)
│   │     type: print_process
│   │     colors: 4
│   │     sides: 1
│   │     coverage: 60%
│   ├── Process: Spot UV
│   │     type: coating_process
│   │     area_percentage: 30%
│   │     mask_file: "logo-uv-mask.pdf"
│   └── Process: Die Cut
│         type: cutting_process
│         die_reference: "DC-4421"
│         complexity: "medium"
├── Layer 2: "Clear Overlay"
│     type: substrate_layer
│     material: 200μm Clear PET
│     dimensions: 88mm x 55mm
│   └── Process: Screen Print (white)
│         type: print_process
│         ink: "opaque white"
│         passes: 2
│         coverage: 40%
└── Assembly: Laminate
      type: assembly_process
      method: "pressure-sensitive adhesive"
      inputs: [Layer 1, Layer 2]   ← DAG: two parents

Every single node in this graph has a unique identifier, a type, a set of attributes specific to that type, and references to its children. This is the product definition. It is complete, self-describing, and arbitrarily deep.

2.2 Why This Model Is Fundamentally Different

In the relational model, the schema defines the structure. You must know in advance how many layers are possible, what process types exist, and what attributes each has. In the graph model, the schema defines only the rules for what types of nodes exist and how they can connect. The actual structure of any given product is data, not schema.

This means:

  1. New node types can be introduced without database migrations or schema changes.

  2. Configurations can be arbitrarily deep and arbitrarily composed.

  3. The structure of each product is decided by the user at configuration time, not by the developer at design time.

3. Data Modeling: The Product Configuration Graph

This chapter specifies the exact data structures that underpin the product configuration graph. These structures are designed to be stored in a document database or in a JSONB column within PostgreSQL.

3.1 The Universal Node Schema

Every node in the product configuration graph conforms to a universal schema. This is critical: the system never needs to ask 'what kind of thing is this?' to know how to traverse, store, or serialize it. The schema:

interface ConfigNode {
  id: string;        // UUID, globally unique
  type: string;      // e.g., 'substrate_layer', 'print_process'
  version: number;   // incremented on each mutation
  created_at: ISO8601;
  updated_at: ISO8601;

  // The type-specific attributes (open schema)
  attributes: Record<string, any>;

  // Ordered list of child node IDs
  children: string[];

  // Parent node IDs (supports DAG, not just tree)
  parents: string[];

  // Pricing function reference (resolved at quote time)
  pricing_function_id: string | null;

  // Production tracking metadata
  tracking: {
    status: 'pending' | 'in_progress' | 'complete' | 'on_hold';
    assigned_to: string | null;
    station: string | null;
    started_at: ISO8601 | null;
    completed_at: ISO8601 | null;
  };
}

The attributes field is the key to extensibility. It is a free-form JSON object whose expected shape is defined by the node's type. A substrate_layer node has material, dimensions, and gsm attributes. A print_process node has colors, coverage, and sides attributes. But the storage layer does not care about this distinction: it stores and retrieves the same shape regardless of content.

3.2 The Configuration Document

A complete product configuration is stored as a single document (not spread across tables). The document contains the root node and a flat map of all nodes in the graph:

interface ProductConfiguration {
  id: string;             // Config UUID
  name: string;           // Human-readable name
  version: number;
  root_node_id: string;   // Entry point for traversal
  nodes: Record<string, ConfigNode>;   // Flat map: node_id -> node
  metadata: {
    created_by: string;
    created_at: ISO8601;
    updated_at: ISO8601;
    template_id: string | null;   // If cloned from a template
    tags: string[];
  };
}

Why a Flat Map Instead of Nested Objects?

Storing nodes in a flat map with ID references (rather than nesting children directly) is essential for DAG support. If node A and node B both reference node C as a child, a nested structure would duplicate C. A flat map stores C once and lets A and B reference its ID. It also enables O(1) node lookup by ID, which is critical for the pricing engine.
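To make the flat-map layout concrete, here is a minimal sketch (node names are illustrative) of a shared node referenced by two parents, stored once:

```typescript
// Minimal flat-map layout: "node-c" is stored once and referenced by
// ID from two parents, which is exactly the DAG case a nested
// structure cannot represent without duplication.
const nodes: Record<string, { id: string; children: string[] }> = {
  "node-a": { id: "node-a", children: ["node-c"] },
  "node-b": { id: "node-b", children: ["node-c"] },
  "node-c": { id: "node-c", children: [] },
};

// O(1) lookup by ID: no traversal, no join
const shared = nodes["node-c"];

// Counting referencing parents confirms the shared-node shape
const parentCount = Object.values(nodes)
  .filter(n => n.children.includes("node-c")).length;
```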

3.3 Concrete Example: The Card as a Document

Here is the specialty card from Chapter 2 stored as an actual JSON configuration document (some sibling nodes, such as the second layer, are elided for brevity). Study this carefully; it is the canonical reference for the data model:

{
  "id": "config-001",
  "name": "Premium Layered Business Card",
  "version": 1,
  "root_node_id": "node-root",
  "nodes": {
    "node-root": {
      "id": "node-root",
      "type": "product",
      "attributes": {
        "product_family": "business_card",
        "quantity": 5000
      },
      "children": ["node-layer1", "node-layer2", "node-asm"],
      "parents": [],
      "pricing_function_id": "pf-product-aggregator"
    },
    "node-layer1": {
      "id": "node-layer1",
      "type": "substrate_layer",
      "attributes": {
        "material": "350gsm Silk Art Board",
        "width_mm": 88,
        "height_mm": 55,
        "grain_direction": "long"
      },
      "children": ["node-cmyk-f", "node-cmyk-b", "node-uv", "node-die"],
      "parents": ["node-root"],
      "pricing_function_id": "pf-substrate-cost"
    },
    "node-cmyk-f": {
      "id": "node-cmyk-f",
      "type": "print_process",
      "attributes": {
        "method": "offset",
        "colors": 4,
        "side": "front",
        "coverage_pct": 85
      },
      "children": [],
      "parents": ["node-layer1"],
      "pricing_function_id": "pf-offset-print"
    },
    "node-uv": {
      "id": "node-uv",
      "type": "coating_process",
      "attributes": {
        "coating_type": "spot_uv",
        "area_pct": 30,
        "thickness_microns": 25,
        "mask_file": "logo-uv-mask.pdf"
      },
      "children": [],
      "parents": ["node-layer1"],
      "pricing_function_id": "pf-spot-uv"
    },
    "node-asm": {
      "id": "node-asm",
      "type": "assembly_process",
      "attributes": {
        "method": "pressure_sensitive_adhesive",
        "registration_tolerance_mm": 0.5
      },
      "children": [],
      "parents": ["node-root"],
      "pricing_function_id": "pf-lamination"
    }
  }
}

Notice that the entire product, with all its layers, processes, and assembly steps, is one document. Loading it requires exactly one database read, not fifty.

4. The Node Type System: Extensibility Without Schema Changes

The node type system is what makes the architecture extensible without code deployments or database migrations. It is a registry of type definitions that describe what attributes a node of each type can have, what children it can have, and what pricing function it defaults to.

4.1 Type Definition Schema

interface NodeTypeDefinition {
  type_id: string;        // e.g., 'coating_process'
  display_name: string;   // e.g., 'Coating / Varnish'
  category: 'product' | 'layer' | 'process' | 'assembly';

  // JSON Schema for the attributes field
  attribute_schema: JSONSchema;

  // What node types can be children of this type
  allowed_child_types: string[];

  // Default pricing function (can be overridden per node)
  default_pricing_function_id: string;

  // Validation rules beyond schema
  constraints: Constraint[];

  // UI rendering hints
  ui_config: {
    icon: string;
    color: string;
    form_layout: FormField[];
  };
}

The attribute_schema field uses JSON Schema, a widely-supported standard for describing the shape of JSON data. This means validation happens at the application layer, not the database layer. When a user configures a product, the UI reads the type definition to render the correct form fields, and the backend validates the submitted attributes against the schema.

4.2 Example Type Definitions

Here are two example type definitions that illustrate how different process types can coexist without schema changes:

Spot UV Coating Type

{
  "type_id": "coating_spot_uv",
  "display_name": "Spot UV Coating",
  "category": "process",
  "attribute_schema": {
    "type": "object",
    "properties": {
      "area_pct": { "type": "number", "minimum": 1, "maximum": 100 },
      "thickness_microns": { "type": "number", "default": 25 },
      "mask_file": { "type": "string", "format": "uri" },
      "finish": { "type": "string", "enum": ["gloss", "matte"] }
    },
    "required": ["area_pct"]
  },
  "allowed_child_types": [],
  "default_pricing_function_id": "pf-spot-uv"
}

Edge Painting Type

{
  "type_id": "finishing_edge_paint",
  "display_name": "Edge Painting",
  "category": "process",
  "attribute_schema": {
    "type": "object",
    "properties": {
      "color": { "type": "string" },
      "color_code": { "type": "string", "pattern": "^PMS" },
      "edges": {
        "type": "array",
        "items": { "enum": ["top", "bottom", "left", "right"] }
      },
      "total_thickness_mm": { "type": "number" }
    },
    "required": ["color", "edges", "total_thickness_mm"]
  },
  "allowed_child_types": [],
  "default_pricing_function_id": "pf-edge-paint"
}

The key point: these two process types have completely different attributes (area percentage vs. edge selection, mask files vs. color codes), yet they coexist in the same nodes collection without any schema changes. The type system tells the UI what form to render and tells the validator what shape to expect. The storage layer is oblivious to the difference.

4.3 Adding a New Type at Runtime

When your business needs to support a new process (say, holographic foil stamping), the workflow is:

  1. Define the new type with its attribute schema, allowed child types, and default pricing function.

  2. Register the type definition in the type registry (a simple database table or document).

  3. Implement and register the pricing function for the new type.

  4. The UI automatically renders the correct form based on the type definition. No frontend deployment needed if the UI is driven by the type schema.

There are zero database migrations. Zero schema changes. Zero changes to the quoting engine core code. The engine already knows how to traverse nodes and invoke pricing functions. It simply encounters a new type with a new function, and it works.
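The registration workflow can be sketched as a small registry keyed by type_id. The `TypeRegistry` class and the holographic-foil attributes below are illustrative, not a prescribed API:

```typescript
// Hedged sketch of a runtime type registry: adding a process type is
// just a map entry, not a migration. Names are illustrative.
interface TypeDef {
  type_id: string;
  attribute_schema: { required?: string[] };
  default_pricing_function_id: string;
}

class TypeRegistry {
  private defs = new Map<string, TypeDef>();

  register(def: TypeDef): void {
    this.defs.set(def.type_id, def);   // no schema change anywhere
  }

  get(typeId: string): TypeDef {
    const def = this.defs.get(typeId);
    if (!def) throw new Error(`Unknown node type: ${typeId}`);
    return def;
  }
}

// Registering holographic foil stamping at runtime
const registry = new TypeRegistry();
registry.register({
  type_id: "finishing_holo_foil",
  attribute_schema: { required: ["foil_area_pct"] },
  default_pricing_function_id: "pf-holo-foil",
});
```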

5. Pricing Architecture: Function Pipelines Over Table Lookups

This is the chapter that directly addresses the performance problem. The pricing architecture replaces sequential database lookups with in-memory function evaluation. The result is quoting that runs in milliseconds instead of seconds, regardless of product complexity.

5.1 The Pricing Function Contract

Every pricing function in the system conforms to a single interface:

interface PricingFunction {
  id: string;
  name: string;
  evaluate(
    node: ConfigNode,
    context: PricingContext,
    children_results: PricingResult[]
  ): PricingResult;
}

interface PricingContext {
  quantity: number;
  rate_tables: Record<string, RateTable>;   // Pre-loaded
  quantity_curves: Record<string, QuantityCurve>;
  markup_rules: MarkupRule[];
  global_params: Record<string, any>;       // waste factors etc.
}

interface PricingResult {
  node_id: string;
  setup_cost: number;
  unit_cost: number;
  total_cost: number;
  waste_cost: number;
  breakdown: LineItem[];   // Human-readable breakdown
  cached: boolean;
}

The critical design decision: the PricingContext contains all rate data, pre-loaded into memory. When the quoting engine starts, it loads all relevant rate tables in a single batch query (or from a cache). From that point on, no database queries are made during pricing computation. This is why the engine is fast.

5.2 Example: The Spot UV Pricing Function

Here is a concrete pricing function implementation for spot UV coating. Notice that it reads from the pre-loaded context, not from the database:

function evaluateSpotUV(node, context, children_results) {
  const { area_pct, thickness_microns } = node.attributes;
  const qty = context.quantity;

  // Read from pre-loaded rate table (already in memory)
  const rates = context.rate_tables['spot_uv'];

  // Setup cost: fixed per run
  const setup = rates.setup_base;

  // Material cost: based on area and thickness
  const area_factor = area_pct / 100;
  const thickness_factor = thickness_microns / rates.base_thickness;
  const material_per_unit = rates.material_rate * area_factor * thickness_factor;

  // Apply quantity curve (pre-loaded)
  const qty_curve = context.quantity_curves['coating'];
  const qty_multiplier = qty_curve.evaluate(qty);

  // Waste factor from global params
  const waste_factor = context.global_params.coating_waste_pct;
  const effective_qty = qty * (1 + waste_factor / 100);

  const unit_cost = material_per_unit * qty_multiplier;
  const total = setup + (unit_cost * effective_qty);

  return {
    node_id: node.id,
    setup_cost: setup,
    unit_cost: unit_cost,
    total_cost: total,
    waste_cost: unit_cost * (effective_qty - qty),
    breakdown: [
      { label: 'UV Setup', amount: setup },
      { label: `UV Material (${area_pct}% area)`,
        amount: unit_cost * qty },
      { label: `UV Waste (${waste_factor}%)`,
        amount: unit_cost * (effective_qty - qty) },
    ],
    cached: false
  };
}

This function executes in microseconds. There are no database calls, no network round trips, no query parsing. It is pure arithmetic on pre-loaded data. Even if you have 50 such functions in a complex product, the total pricing computation takes less than a millisecond.
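The `qty_curve.evaluate(qty)` call above assumes some quantity-curve implementation. One plausible sketch is a step function over quantity breaks; the breakpoints and multipliers here are invented for illustration, not real rates:

```typescript
// Hypothetical QuantityCurve: returns the multiplier of the highest
// break at or below the requested quantity. Breaks must be sorted
// ascending by qty.
class QuantityCurve {
  constructor(private breaks: { qty: number; multiplier: number }[]) {}

  evaluate(qty: number): number {
    let current = this.breaks[0].multiplier;
    for (const b of this.breaks) {
      if (qty >= b.qty) current = b.multiplier;
      else break;   // breaks are sorted; nothing further applies
    }
    return current;
  }
}

// Illustrative coating curve: unit cost drops at volume
const coatingCurve = new QuantityCurve([
  { qty: 0,    multiplier: 1.0  },
  { qty: 1000, multiplier: 0.85 },
  { qty: 5000, multiplier: 0.70 },
]);
```

A smoother production curve might interpolate between breaks instead of stepping; the contract (`evaluate(qty): number`) stays the same either way.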

5.3 The Aggregator Pattern

Parent nodes in the graph use aggregator pricing functions that roll up their children's results. The root product node, for example, sums all children and applies markup:

function evaluateProductAggregator(node, context, children_results) {
  const subtotal = children_results.reduce(
    (sum, r) => sum + r.total_cost, 0
  );

  // Apply markup rules (also pre-loaded)
  const markup = context.markup_rules
    .filter(r => r.applies_to(node, context))
    .reduce((price, rule) => rule.apply(price), subtotal);

  return {
    node_id: node.id,
    setup_cost: children_results.reduce((s, r) => s + r.setup_cost, 0),
    unit_cost: markup / context.quantity,
    total_cost: markup,
    waste_cost: children_results.reduce((s, r) => s + r.waste_cost, 0),
    breakdown: [
      ...children_results.flatMap(r => r.breakdown),
      { label: 'Markup', amount: markup - subtotal },
    ],
    cached: false
  };
}

5.4 Dependency-Aware Pricing

Some processes depend on the results of other processes. For example, die cutting cost depends on the substrate thickness, which is an attribute of the parent layer, not the die cut node itself. The pricing context solves this:

function evaluateDieCut(node, context, children_results) {
  // Walk up to the parent layer to get material thickness
  const parent_layer = context.resolve_node(node.parents[0]);
  const thickness = parent_layer.attributes.gsm;

  // Thicker stock = more die pressure = higher cost
  const thickness_multiplier = thickness > 300 ? 1.25 : 1.0;

  // ... rest of pricing logic
}

Because the entire configuration graph is loaded into memory as a single document, resolving parent or sibling nodes is an O(1) hash map lookup, not a database join.

6. The Quote Engine: Tree Traversal and Memoization

The quote engine is the orchestrator that ties together the product graph and the pricing functions. It is surprisingly simple because the complexity has been pushed into the data model and the pricing functions.

6.1 The Core Algorithm

The engine uses a bottom-up (post-order) traversal of the product graph. It evaluates leaf nodes first (processes with no children), then their parents, then their parents' parents, up to the root:

function generateQuote(config: ProductConfiguration): Quote {
  // STEP 1: Load all rate data in a single batch
  const context = loadPricingContext(config);

  // STEP 2: Build memoization cache
  const cache = new Map<string, PricingResult>();

  // STEP 3: Recursive post-order traversal
  function evaluateNode(nodeId: string): PricingResult {
    // Check cache first (critical for DAG shared nodes)
    if (cache.has(nodeId)) return cache.get(nodeId);

    const node = config.nodes[nodeId];

    // Evaluate all children first (post-order)
    const childResults = node.children.map(
      childId => evaluateNode(childId)
    );

    // Resolve and execute this node's pricing function
    const pricingFn = resolvePricingFunction(node.pricing_function_id);
    const result = pricingFn.evaluate(node, context, childResults);

    // Cache the result
    cache.set(nodeId, result);
    return result;
  }

  // STEP 4: Start from the root
  const rootResult = evaluateNode(config.root_node_id);

  // STEP 5: Assemble the quote
  return {
    config_id: config.id,
    total: rootResult.total_cost,
    unit_price: rootResult.unit_cost,
    breakdown: rootResult.breakdown,
    all_node_results: Object.fromEntries(cache),
    generated_at: new Date().toISOString()
  };
}

6.2 Memoization and DAG Handling

The memoization cache is essential for correctness and performance in a DAG. If the lamination assembly node is a child of the root product and is also referenced by another assembly step, it would be priced twice without memoization. With memoization, the second reference hits the cache and returns instantly.

This also enables efficient 'what-if' quoting. If the user changes one attribute of one node (say, increasing the spot UV area from 30% to 50%), the engine can invalidate only the cache entries for that node and its ancestors. All other nodes retain their cached results. In practice, re-quoting after a single change evaluates only 2-5 nodes instead of the entire graph.
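The invalidation walk falls directly out of the parents links. This is a hedged sketch; the `Graph` type carries only the field the walk needs, and assumes the configuration is a valid DAG (no cycles):

```typescript
// Drop the cache entries for a changed node and every ancestor,
// walking the parents links up to the root. Siblings keep their
// cached results untouched.
type Graph = Record<string, { parents: string[] }>;

function invalidate(
  nodeId: string,
  graph: Graph,
  cache: Map<string, unknown>
): void {
  cache.delete(nodeId);
  for (const parentId of graph[nodeId]?.parents ?? []) {
    invalidate(parentId, graph, cache);   // recurse toward the root
  }
}
```

On the card example, invalidating the spot UV node clears node-uv, node-layer1, and node-root while the CMYK print results stay cached.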

6.3 Performance Analysis

Metric                         | Legacy Table-Driven MIS   | Graph-Based Engine
DB queries per quote           | 50–200+ (sequential)      | 1–3 (batch load)
Compute time (simple product)  | 200–500ms                 | < 5ms
Compute time (complex product) | 2–8 seconds               | 10–50ms
Re-quote after single change   | Full recalculation        | 2–5 node re-evaluation
Memory usage                   | Low (all on DB)           | Moderate (rate tables in RAM)
Adding new process type        | Schema migration + code   | Type registration only

The speed difference is not incremental. It is architectural. The legacy system is I/O-bound (waiting for database). The graph engine is CPU-bound (doing arithmetic), and modern CPUs do arithmetic in nanoseconds.

7. Storage Strategy: Document vs. Relational Hybrid

A common misconception is that adopting a graph-based product model means abandoning relational databases entirely. It does not. The optimal architecture uses a hybrid approach that leverages the strengths of each paradigm.

7.1 What Goes Where

Data Type                | Storage                            | Rationale
Product configurations   | Document store (JSONB or MongoDB)  | Hierarchical, schema-variable, loaded as a unit
Node type definitions    | Relational table                   | Fixed schema, queried by type_id, rarely changes
Rate tables              | Relational tables                  | Tabular data, queried by ranges, updated independently
Pricing functions        | Code registry (in-app)             | Executable logic, version-controlled in source
Quotes (generated)       | Document store                     | Snapshot of config + pricing results, immutable
Events / audit trail     | Append-only event store            | Time-series, never updated, high write throughput
Users, customers, orders | Relational tables                  | Standard CRUD data with referential integrity

7.2 PostgreSQL JSONB: The Practical Sweet Spot

For most teams, PostgreSQL with JSONB columns offers the best of both worlds. You get document storage semantics for configurations and quotes, relational storage for everything else, ACID transactions across both, and the operational simplicity of a single database engine.

The product configurations table would look like this:

CREATE TABLE product_configurations (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          TEXT NOT NULL,
    version       INTEGER NOT NULL DEFAULT 1,
    root_node_id  TEXT NOT NULL,
    config        JSONB NOT NULL,        -- The entire node graph
    metadata      JSONB DEFAULT '{}',
    created_at    TIMESTAMPTZ DEFAULT NOW(),
    updated_at    TIMESTAMPTZ DEFAULT NOW(),
    CONSTRAINT valid_config CHECK (
        config ? 'nodes' AND config->'nodes' ? root_node_id
    )
);

-- GIN index for finding configs by node type
CREATE INDEX idx_config_node_types ON product_configurations
    USING GIN ((config->'nodes'));

-- Index for finding configs by product family
CREATE INDEX idx_config_family ON product_configurations
    ((config->'nodes'->root_node_id->'attributes'->>'product_family'));

The key insight is that the JSON document is the product definition, but you can still create indexes into it for search and filtering. You get the flexibility of a document model with the query power of a relational database.

7.3 Loading Strategy: One Read, Full Graph

When the quoting engine needs to price a configuration, it performs exactly one database read:

SELECT config FROM product_configurations WHERE id = $1;

This returns the entire product graph as a single JSON document. The engine deserializes it into the in-memory ConfigNode structures and proceeds with evaluation. Compare this to the legacy approach of 50-200 sequential queries, and the performance advantage becomes obvious.

Rate tables are loaded similarly in batch:

SELECT table_name, rates FROM rate_tables
WHERE table_name = ANY($1);   -- Array of needed table names

This is typically 1-3 queries total regardless of product complexity. The rate table names needed can be determined by scanning the node types in the configuration graph.
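One way to sketch that scan follows; the type-to-table mapping is illustrative, as the real mapping would come from the type registry:

```typescript
// Derive the set of rate tables to batch-load by scanning the node
// types present in a configuration. Mapping values are illustrative.
const RATE_TABLE_BY_TYPE: Record<string, string> = {
  print_process: "offset_print",
  coating_process: "spot_uv",
  substrate_layer: "substrates",
};

function neededRateTables(
  nodes: Record<string, { type: string }>
): string[] {
  const tables = new Set<string>();   // Set: each table loaded once
  for (const node of Object.values(nodes)) {
    const table = RATE_TABLE_BY_TYPE[node.type];
    if (table) tables.add(table);
  }
  return [...tables];
}
```

The resulting array is exactly the `$1` parameter for the batch query above.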

8. Event Sourcing: Granular Production Tracking

Event sourcing is the architectural pattern that enables the granular tracking you need. Instead of updating a status field on a database row (which overwrites history), every state change is recorded as an immutable event. The current state of any object is computed by replaying its events.

8.1 The Event Schema

interface ProductionEvent {
  event_id: string;     // UUID
  event_type: string;   // e.g., 'process_started'
  timestamp: ISO8601;

  // What this event is about
  config_id: string;    // Which product configuration
  node_id: string;      // Which specific node in the graph

  // Who / what triggered this event
  actor: {
    type: 'operator' | 'machine' | 'system';
    id: string;
    name: string;
  };

  // Event-specific data
  payload: Record<string, any>;

  // Sequence number for ordering
  sequence: number;
}

8.2 Example Event Stream for a Single Node

Here is the event stream for the spot UV process on layer 1 of the card, from job creation through completion:

{ event_type: 'node_created',
  node_id: 'node-uv',
  payload: { initial_status: 'pending' } }

{ event_type: 'process_scheduled',
  node_id: 'node-uv',
  payload: { station: 'UV-Coater-3', scheduled: '2026-03-15T09:00' } }

{ event_type: 'process_started',
  node_id: 'node-uv',
  actor: { type: 'operator', id: 'emp-42', name: 'John' },
  payload: { station: 'UV-Coater-3', sheets_planned: 5200 } }

{ event_type: 'waste_recorded',
  node_id: 'node-uv',
  payload: { sheets_wasted: 85, reason: 'coating_defect' } }

{ event_type: 'process_completed',
  node_id: 'node-uv',
  actor: { type: 'operator', id: 'emp-42', name: 'John' },
  payload: { sheets_good: 5115, duration_minutes: 45 } }

Because every event is immutable and timestamped, you can answer questions that traditional status-field tracking cannot:

  1. Which operator ran each process, on which station, and when.

  2. How much waste occurred at each step, and for what reason.

  3. How long each process actually took versus its estimate.

  4. What the exact state of any job was at any point in the past.

8.3 Current State Projection

While event sourcing stores the full history, you still need fast access to current state for dashboards and queries. This is achieved through projections, which are materialized views that are updated as events arrive:

-- Materialized current state, updated by event processor
CREATE TABLE node_current_state (
    config_id            UUID NOT NULL,
    node_id              TEXT NOT NULL,
    status               TEXT NOT NULL,
    station              TEXT,
    operator_id          TEXT,
    started_at           TIMESTAMPTZ,
    completed_at         TIMESTAMPTZ,
    sheets_good          INTEGER,
    sheets_wasted        INTEGER,
    last_event_sequence  BIGINT,
    updated_at           TIMESTAMPTZ,
    PRIMARY KEY (config_id, node_id)
);

This projection gives you O(1) lookup for current status while preserving full history in the event store. The projection is derived data that can be rebuilt from events at any time. If you need a new projection (say, a per-station workload view), you can create it by replaying the existing events without changing any existing code.
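A projection rebuild is just a fold over the event stream. This hedged sketch replays a few of the Section 8.2 event types into a current-state map; the payload field names follow 8.2, the rest is illustrative:

```typescript
// Replay events into a per-node current-state projection. Handles a
// subset of event types; unknown types fall through untouched, which
// is what makes adding new projections later safe.
interface Ev {
  event_type: string;
  node_id: string;
  payload: Record<string, any>;
}

interface NodeState {
  status: string;
  sheets_wasted: number;
  sheets_good: number | null;
}

function project(events: Ev[]): Record<string, NodeState> {
  const state: Record<string, NodeState> = {};
  for (const ev of events) {
    const s = (state[ev.node_id] ??=
      { status: "pending", sheets_wasted: 0, sheets_good: null });
    switch (ev.event_type) {
      case "process_started":
        s.status = "in_progress"; break;
      case "waste_recorded":
        s.sheets_wasted += ev.payload.sheets_wasted; break;
      case "process_completed":
        s.status = "complete";
        s.sheets_good = ev.payload.sheets_good; break;
    }
  }
  return state;
}
```

Running this over the node-uv stream from 8.2 yields status "complete", 85 wasted sheets, and 5115 good sheets, matching the final event.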

9. CQRS: Separating Reads from Writes

Command Query Responsibility Segregation (CQRS) is a pattern that naturally complements event sourcing and the graph-based product model. The core idea is that the data model optimized for writes (creating and modifying configurations, recording events) is different from the data model optimized for reads (listing jobs, showing dashboards, generating reports).

9.1 The Write Side: Commands

The write side handles commands: create a configuration, update a node attribute, record a production event, generate a quote. Each command validates, mutates state, and emits events:

// Command: Update a node attribute
async function updateNodeAttribute(cmd) {
  const { config_id, node_id, attribute_key, new_value } = cmd;

  // Load the configuration document
  const config = await db.getConfig(config_id);
  const node = config.nodes[node_id];

  // Validate against the type schema
  const typeDef = typeRegistry.get(node.type);
  typeDef.validateAttribute(attribute_key, new_value);

  // Capture the previous value before mutating
  const old_value = node.attributes[attribute_key];

  // Apply the mutation
  node.attributes[attribute_key] = new_value;
  node.version++;
  node.updated_at = new Date().toISOString();
  config.version++;

  // Persist (single document write)
  await db.saveConfig(config);

  // Emit event for downstream consumers
  await eventBus.emit({
    event_type: 'node_attribute_updated',
    config_id, node_id,
    payload: { attribute_key, old_value, new_value }
  });
}

9.2 The Read Side: Query-Optimized Projections

The read side maintains denormalized views optimized for specific queries. These are updated asynchronously by consuming events from the write side:

-- Read model: Job Board View (for shop floor dashboard)
CREATE TABLE read_job_board (
    config_id             UUID,
    product_name          TEXT,
    customer_name         TEXT,
    total_nodes           INTEGER,
    nodes_complete        INTEGER,
    progress_pct          NUMERIC,
    current_bottleneck    TEXT,          -- which node is blocking?
    estimated_completion  TIMESTAMPTZ,
    priority              INTEGER,
    updated_at            TIMESTAMPTZ
);

-- Read model: Station Queue (for machine operators)
CREATE TABLE read_station_queue (
    station_id          TEXT,
    config_id           UUID,
    node_id             TEXT,
    process_type        TEXT,
    process_name        TEXT,
    scheduled_time      TIMESTAMPTZ,
    estimated_duration  INTERVAL,
    dependencies_met    BOOLEAN,
    priority            INTEGER
);

Each read model is a projection that serves a specific use case. The job board projection aggregates node status for a high-level overview. The station queue projection denormalizes for operator-facing scheduling. Neither requires joining across tables at query time because the data is already shaped for the query.
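As a deliberately simplified sketch of such a projection consumer, the hypothetical function below folds one production event into a denormalized job-board row. The row fields mirror the read_job_board table; the event shapes are assumptions for illustration, not a defined API:

```javascript
// Hypothetical projection updater: applies a single event to the
// denormalized job-board row, keeping progress_pct and the current
// bottleneck pre-computed so the dashboard query is a plain read.
function applyEventToJobBoard(row, event) {
  const next = { ...row, updated_at: event.timestamp };
  if (event.event_type === 'node_completed') {
    next.nodes_complete = row.nodes_complete + 1;
    next.progress_pct = Math.round(
      (next.nodes_complete / row.total_nodes) * 100
    );
  }
  if (event.event_type === 'node_blocked') {
    next.current_bottleneck = event.node_id;
  }
  return next;
}
```

An event consumer would call this for each event on the config's stream and write the resulting row back to read_job_board; because the fold is pure, the projection can also be rebuilt from scratch by replaying the stream.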

9.3 The Benefits for Your Use Case

CQRS solves several problems you are likely facing: the shop floor dashboard, the quoting UI, and the reporting system all need different views of the same data, and serving all of them from one normalized database forces a choice between slow queries and hand-maintained denormalization. With CQRS, each consumer gets a projection shaped for its query, while the write side remains a single source of truth.

10. Caching, Performance, and Scale

10.1 Three-Layer Caching Strategy

The system uses three levels of caching to achieve consistent sub-100ms quoting performance:

Layer 1: Rate Table Cache

Rate tables change infrequently (typically when you renegotiate supplier prices). They are loaded into an in-memory cache (Redis or application-level) with a TTL of 1 hour, and invalidated explicitly when rates change. This eliminates the most common database reads.
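A minimal in-memory sketch of this cache behavior (a stand-in for Redis; the class name and API are illustrative): TTL-based expiry plus an explicit invalidation hook for when rates change.

```javascript
// Minimal in-memory stand-in for the Layer 1 rate table cache.
// Production systems would typically use Redis, but the semantics
// are the same: entries expire after a TTL and can be invalidated
// explicitly when a supplier rate is renegotiated.
class RateTableCache {
  constructor(ttlMs = 60 * 60 * 1000) { // default TTL: 1 hour
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) return undefined; // miss or expired
    return entry.value;
  }
  set(key, value, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
  invalidate(key) {
    this.entries.delete(key); // called when rates change
  }
}
```

The `now` parameter exists only to make expiry testable; callers would normally omit it.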

Layer 2: Subtree Price Cache

When a product configuration has not changed, its pricing result is cached as a whole. The cache key is a hash of the configuration document content. Any mutation to any node changes the hash and invalidates the cache. This gives instant re-quoting for unchanged configurations.

Layer 3: Node-Level Memoization

During a single quote computation, the memoization map described in Chapter 6 prevents redundant computation of shared DAG nodes. This is per-request and lives only in the quote engine's working memory.

10.2 Scaling Patterns

Challenge / Solution / Implementation:

- High quote volume: horizontally scale the quote engine. Stateless quote service behind a load balancer; all state lives in the DB/cache.
- Large configurations (100+ nodes): parallel branch evaluation. Promise.all() on independent subtrees; merge at the aggregator.
- Many concurrent users editing configs: optimistic concurrency. Version field on configs; reject writes carrying a stale version.
- High event throughput: partitioned event store. Partition by config_id; each partition is append-only and sequential.
- Read model latency: async projection updates. Event consumers update read models; tolerate 1-5s staleness.
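The optimistic-concurrency pattern can be sketched in a few lines (the in-memory `store` stands in for the configuration database, and the function name is illustrative):

```javascript
// Optimistic concurrency: a write is accepted only if the caller read
// the version currently in the store. If another editor saved first,
// the versions diverge and the write is rejected; the client reloads
// the document and retries.
function saveConfigOptimistic(store, config, expectedVersion) {
  const current = store.get(config.id);
  if (current && current.version !== expectedVersion) {
    throw new Error(
      `Stale write: expected version ${expectedVersion}, ` +
      `store has ${current.version}`
    );
  }
  const next = { ...config, version: expectedVersion + 1 };
  store.set(config.id, next);
  return next;
}
```

In PostgreSQL the same check collapses into a single conditional UPDATE (`... WHERE id = $1 AND version = $2`), with zero rows affected signalling the stale write.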

10.3 Parallel Subtree Evaluation

One of the most powerful performance optimizations in the graph engine is parallel evaluation of independent subtrees. In the card example, Layer 1 and Layer 2 have no dependencies on each other. Their pricing subtrees can be evaluated in parallel:

async function evaluateNodeParallel(nodeId, config, context, cache) {
  if (cache.has(nodeId)) return cache.get(nodeId);
  const node = config.nodes[nodeId];

  // Evaluate independent children in parallel
  const childResults = await Promise.all(
    node.children.map(childId =>
      evaluateNodeParallel(childId, config, context, cache)
    )
  );

  const pricingFn = resolvePricingFunction(node.pricing_function_id);
  const result = pricingFn.evaluate(node, context, childResults);

  // Note: because the result is cached only after evaluation completes,
  // a shared (multi-parent) node reached concurrently from two branches
  // may be evaluated twice. The quote is still correct, since pricing is
  // deterministic; caching the in-flight promise instead of the result
  // would eliminate the redundant work.
  cache.set(nodeId, result);
  return result;
}

For a product with 10 independent layers, each with 5 processes, this evaluates all 50 process nodes concurrently rather than sequentially. One caveat: in single-threaded JavaScript, Promise.all only overlaps work that is genuinely asynchronous (rate lookups, I/O, or dispatch to worker threads); purely CPU-bound pricing functions still run one at a time on the event loop. When evaluation does involve asynchronous work or is spread across worker threads on a multi-core machine, deeply branched products can see a 5-10x speedup over serial evaluation.

11. Migration Strategy: From Legacy to Graph

Migration from a table-driven MIS to a graph-based architecture is a multi-phase effort. The recommended approach uses the Strangler Fig pattern: build the new system alongside the old one, progressively routing traffic to the new system until the old system can be retired.

11.1 Phase 1: Shadow Mode (Weeks 1-4)

Build the graph engine as a standalone service. Write adapters that convert existing product definitions from the legacy tables into graph configurations. Run both engines in parallel: the legacy system generates the actual quote, and the graph engine generates a shadow quote. Compare results to validate accuracy.
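A sketch of the comparison step, using the 1% tolerance from the migration rule of thumb later in this chapter (the function name and result shape are illustrative):

```javascript
// Shadow-mode comparison: the legacy quote is authoritative; the graph
// engine's quote is checked against it. Any relative difference beyond
// the tolerance is treated as a graph-engine bug to investigate.
function compareShadowQuote(legacyTotal, graphTotal, tolerance = 0.01) {
  const diff = Math.abs(graphTotal - legacyTotal);
  const relative = legacyTotal === 0 ? diff : diff / legacyTotal;
  return {
    match: relative <= tolerance,
    relative_diff: relative,
  };
}
```

In shadow mode, every mismatch should be logged with both quotes' full breakdowns so the offending node can be traced before any traffic is routed to the new engine.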

11.2 Phase 2: Read Path Migration (Weeks 5-8)

Switch the quoting UI to read from the graph engine while the legacy system remains the source of truth for writes. All product configuration changes still happen in the legacy system and are synced to the graph engine via an adapter. This is low-risk: if the graph engine returns an incorrect quote, the legacy system is still available.

11.3 Phase 3: Write Path Migration (Weeks 9-16)

Build the new product configuration UI that writes directly to the graph engine. Maintain a reverse sync that writes back to the legacy system for any downstream processes that still depend on it (invoicing, shipping, etc.). This phase requires the most careful coordination.

11.4 Phase 4: Legacy Retirement (Weeks 17+)

Once all read and write paths go through the graph engine, the legacy system becomes a historical archive. It can be maintained in read-only mode for auditing purposes and eventually decommissioned.

Migration Rule of Thumb

Never attempt a big-bang migration. The Strangler Fig approach lets you validate the new system against the old one at every stage, reducing risk dramatically. If the graph engine produces a quote that differs from the legacy system by more than 1%, that is a bug in the graph engine that needs to be fixed before proceeding.

12. Implementation Reference: Data Structures and Pseudocode

This chapter provides the complete pseudocode for all core operations. These are designed to be directly translatable to TypeScript, Python, or any language an LLM or developer will use for implementation.

12.1 Configuration CRUD Operations

// CREATE a new configuration from a template
function createConfiguration(templateId, customizations) {
  const template = db.getTemplate(templateId);
  const config = deepClone(template);
  config.id = generateUUID();
  config.version = 1;
  config.metadata.template_id = templateId;

  // Apply customizations (e.g., quantity, material choices)
  for (const [nodeId, attrs] of Object.entries(customizations)) {
    const node = config.nodes[nodeId];
    const typeDef = typeRegistry.get(node.type);
    for (const [key, value] of Object.entries(attrs)) {
      typeDef.validateAttribute(key, value);
      node.attributes[key] = value;
    }
  }

  db.saveConfig(config);
  return config;
}

12.2 Adding a Node to an Existing Configuration

function addNode(configId, parentNodeId, nodeType, attributes) {
  const config = db.getConfig(configId);
  const parentNode = config.nodes[parentNodeId];
  const typeDef = typeRegistry.get(nodeType);

  // Validate: can this type be a child of the parent?
  const parentTypeDef = typeRegistry.get(parentNode.type);
  if (!parentTypeDef.allowed_child_types.includes(nodeType)) {
    throw new Error(
      `${nodeType} cannot be a child of ${parentNode.type}`
    );
  }

  // Validate attributes against type schema
  typeDef.validateAttributes(attributes);

  // Create the new node
  const newNode = {
    id: generateUUID(),
    type: nodeType,
    version: 1,
    created_at: now(),
    updated_at: now(),
    attributes: attributes,
    children: [],
    parents: [parentNodeId],
    pricing_function_id: typeDef.default_pricing_function_id,
    tracking: { status: 'pending' }
  };

  // Wire it into the graph
  config.nodes[newNode.id] = newNode;
  parentNode.children.push(newNode.id);
  config.version++;

  db.saveConfig(config);

  eventBus.emit({
    event_type: 'node_added',
    config_id: configId,
    node_id: newNode.id,
    payload: { parent_id: parentNodeId, type: nodeType }
  });

  return newNode;
}

12.3 Complete Quote Generation with Caching

async function generateQuoteWithCache(configId, quantity) {
  // Check for cached quote
  const config = await db.getConfig(configId);
  const cacheKey = hashConfig(config, quantity);
  const cached = await cache.get(cacheKey);
  if (cached) return cached;

  // Determine which rate tables are needed
  const nodeTypes = new Set(
    Object.values(config.nodes).map(n => n.type)
  );
  const neededTables = typeRegistry
    .getRequiredRateTables(nodeTypes);

  // Batch-load all rate data (1-2 queries)
  const rateTables = await db.getRateTables(neededTables);
  const qtyCurves = await db.getQuantityCurves(neededTables);

  // Build the pricing context
  const context = {
    quantity,
    rate_tables: rateTables,
    quantity_curves: qtyCurves,
    markup_rules: await db.getMarkupRules(),
    global_params: await db.getGlobalParams(),
    resolve_node: (id) => config.nodes[id]
  };

  // Execute the pricing traversal
  const memo = new Map();
  const rootResult = await evaluateNodeParallel(
    config.root_node_id, config, context, memo
  );

  const quote = {
    config_id: config.id,
    config_version: config.version,
    quantity,
    total: rootResult.total_cost,
    unit_price: rootResult.unit_cost,
    setup_total: rootResult.setup_cost,
    waste_total: rootResult.waste_cost,
    breakdown: rootResult.breakdown,
    node_results: Object.fromEntries(memo),
    generated_at: new Date().toISOString()
  };

  // Cache for future requests
  await cache.set(cacheKey, quote, { ttl: 3600 });
  return quote;
}

13. Architecture Decision Records

This chapter documents the key architectural decisions and their rationales. These are invaluable for future maintainers and for providing context to any LLM assisting with implementation.

ADR-001: Document Storage for Product Configurations

Decision: Store product configurations as JSON documents (PostgreSQL JSONB) rather than normalized relational tables.

Context: Product configurations are hierarchical, schema-variable, and always loaded as a complete unit. Relational normalization fragments the data across many tables, requiring expensive joins to reconstitute.

Consequences: Single-read loading, flexible schema, no migrations for new product types. Trade-off: cannot use SQL joins to query across node attributes without extracting to GIN-indexed JSONB paths.

ADR-002: Pricing Functions as Code, Not Database Rules

Decision: Implement pricing logic as registered code functions rather than database-stored rules or stored procedures.

Context: Pricing logic requires conditional branching, mathematical operations, and access to multiple data points simultaneously. Database-stored rules (EAV patterns or stored procedures) are hard to test, hard to version control, and hard to debug.

Consequences: Pricing logic is testable with unit tests, version-controlled in Git, and executes at CPU speed in memory. Trade-off: adding a new pricing function requires a code deployment (mitigated by the type registration system which handles attribute changes without deployment).

ADR-003: Event Sourcing for Production Tracking

Decision: Use event sourcing (immutable event log) instead of mutable status fields for production tracking.

Context: Production tracking requires not just current status but full history: when did each step start, who did it, what was the waste, was it rescheduled? Mutable status fields destroy this history.

Consequences: Complete audit trail, ability to reconstruct state at any point in time, ability to build new projections from existing events. Trade-off: slightly more complex implementation, requires event consumers to maintain read-side projections.
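The "reconstruct state at any point in time" consequence is operationally just a fold over the event log. A minimal sketch (event shapes and the function name are illustrative):

```javascript
// Rebuild a node's tracking state by replaying its events in order.
// Passing an `asOf` timestamp answers "what was the state last Tuesday?"
// by simply ignoring later events.
function replayNodeState(events, nodeId, asOf = Infinity) {
  let state = { status: 'pending', waste_qty: 0 };
  for (const e of events) {
    if (e.node_id !== nodeId || e.timestamp > asOf) continue;
    switch (e.event_type) {
      case 'process_started':
        state = { ...state, status: 'in_progress', started_by: e.operator };
        break;
      case 'process_completed':
        state = { ...state, status: 'complete' };
        break;
      case 'waste_recorded':
        state = { ...state, waste_qty: state.waste_qty + e.qty };
        break;
    }
  }
  return state;
}
```

Projections are the same fold run continuously: instead of replaying on demand, a consumer applies each new event as it arrives and persists the result.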

ADR-004: CQRS for Read/Write Separation

Decision: Separate read and write data models. Writes go to the configuration store and event store. Reads come from purpose-built projections.

Context: The shop floor dashboard, quoting UI, and reporting system all need different views of the same data. Trying to serve all these from a single normalized database leads to either slow queries or complex denormalization that is hard to maintain.

Consequences: Each read use case gets an optimized data model. Trade-off: eventual consistency between write and read sides (typically 1-5 seconds), more infrastructure to operate.

14. Graphs vs. Relational: A Deep Side-by-Side Comparison

This chapter exists to bridge the conceptual gap for anyone coming from a relational database background. The graph model does not replace relational databases. It replaces the practice of using relational databases for data that is fundamentally hierarchical and schema-variable. Understanding exactly where each paradigm wins and loses is essential for making good implementation decisions.

14.1 The Core Difference: Fixed Schema vs. Data-Defined Structure

In a relational database, the schema (tables, columns, data types, foreign keys) is defined by a developer and enforced by the database engine. The structure of the data is determined before any data exists. This works brilliantly for data that is uniform: every customer has a name, an email, and an address. Every invoice has a date, a total, and a customer reference. The shape is known and stable.

In the graph/document model used in this architecture, the schema defines only the rules for what types of nodes can exist and how they connect. The actual structure of any specific product is determined by the user at configuration time. A two-layer card and a twelve-layer card use the same code, the same storage, and the same pricing engine. The difference between them is the shape of the data, not the shape of the schema.

To make this concrete, consider what happens in each paradigm when you need to model a new card that has 6 layers instead of the usual 2.

The Relational Approach

If the original schema was designed for 2 layers (common in MIS systems that have a front and back concept), you face a structural problem. The layers table might have a job_id, a side (front/back), and attributes for that side. To support 6 layers, you need to either redesign the table to support arbitrary layers (breaking all existing queries and reports that assumed 2), add layer_3 through layer_6 columns (the wide-table antipattern that leads to sparse, incomprehensible schemas), or create a separate generic_layers table that exists alongside the original layers table (fragmenting the model and forcing the application to check both).

Each approach requires a database migration, changes to every query that touches layers, changes to the UI, changes to the pricing logic, and regression testing of all existing products. This is typically weeks of development work and carries significant risk of breaking existing quotes.

The Graph Approach

In the graph model, a 6-layer card is simply a root node with 6 child nodes of type substrate_layer, each with their own children (processes). No schema change. No migration. No code change. The traversal engine already handles N children. The pricing engine already evaluates children recursively. The UI already renders children dynamically from the type definitions. You create the configuration, and it works.

This is not a theoretical advantage. It is the difference between weeks of development (relational) and minutes of configuration (graph).

14.2 Query Pattern Comparison

A fair comparison must acknowledge that relational databases are superior for certain query patterns. Here is an honest side-by-side:

Query pattern, relational vs. graph/document:

- Load a complete product config. Relational: requires 5-15 JOINs across tables; slow, complex query. Graph: single document read, O(1); massively faster.
- Price a complex product. Relational: 50-200 sequential queries for rate lookups. Graph: 1-3 batch loads, then in-memory computation; 100x faster.
- Find all jobs using material X. Relational: simple WHERE clause on an indexed column; fast and natural. Graph: requires a JSONB path query or GIN index; works but less ergonomic.
- Aggregate monthly revenue by process type. Relational: GROUP BY on relational columns; SQL shines here. Graph: requires extracting data from documents into a reporting projection.
- Add a new process type to the system. Relational: ALTER TABLE or a new table; migration required. Graph: register a new type definition; zero migration.
- Change one attribute on one process. Relational: UPDATE one row; simple. Graph: load the document, modify in memory, save; slightly more work.
- Generate a detailed cost breakdown. Relational: complex multi-table JOIN with subqueries. Graph: tree traversal of pre-computed node results; cleaner.
- Historical audit (who changed what, when). Relational: requires trigger-based audit tables or CDC. Graph: native with event sourcing; every change is already an event.

The pattern is clear: the graph model wins decisively for operations that involve loading, pricing, and manipulating product configurations (the hot path of your quoting system). The relational model wins for ad-hoc analytical queries across large datasets. This is precisely why the architecture recommends a hybrid: configurations in JSONB documents, analytics in relational projections.

14.3 The Real Comparison: It Is About Data Loading, Not Storage

The most common misunderstanding is that this is about replacing PostgreSQL with MongoDB or some other document database. It is not. You can implement this entire architecture in PostgreSQL using JSONB columns. The database engine is the same. What changes is the data access pattern.

In the relational MIS, fetching a product for quoting looks like this:

-- Legacy: 12+ queries to load one product for quoting
SELECT * FROM jobs WHERE id = $1;
SELECT * FROM job_layers WHERE job_id = $1;
SELECT * FROM layer_substrates WHERE layer_id IN (...);
SELECT * FROM layer_processes WHERE layer_id IN (...);
SELECT * FROM process_rates WHERE process_id IN (...);
SELECT * FROM process_params WHERE process_id IN (...);
SELECT * FROM quantity_breaks WHERE rate_id IN (...);
SELECT * FROM press_configs WHERE process_id IN (...);
SELECT * FROM finishing_options WHERE job_id = $1;
SELECT * FROM assembly_steps WHERE job_id = $1;
SELECT * FROM markup_rules WHERE customer_id = $2;
SELECT * FROM waste_factors WHERE process_type IN (...);
-- ... application code stitches all of this together

In the graph architecture, loading the same product looks like this:

-- Graph: 1 query to load the complete product
SELECT config FROM product_configurations WHERE id = $1;
-- Done. The entire product graph is in memory.

Both use PostgreSQL. Both use SQL. The difference is architectural: one fragments data across tables and reconstructs it with queries; the other stores it as a complete unit and loads it in one read. The graph approach treats the database as a storage engine for documents, not as a computation engine for joins.

14.4 The Myth of Normalization for Everything

Database normalization (1NF, 2NF, 3NF, BCNF) is one of the most important concepts in computer science, and it is correct for the data it was designed for: data where the structure is fixed, where individual fields are frequently queried in isolation, and where update anomalies must be prevented.

Product configurations violate all three assumptions. Their structure is variable (different products have different shapes). Individual fields are almost never queried in isolation (you never need just the UV coating area percentage without the rest of the product context). And update anomalies are managed through versioning, not normalization (you do not want a rate table change to retroactively alter a historical quote).

The graph model is not anti-normalization. It is normalization applied correctly: the data that is uniform and stable (rate tables, customers, users) stays normalized in relational tables. The data that is hierarchical and variable (product configurations) is stored in the format that matches its nature.

The Litmus Test

Ask yourself: when I load this data, do I always need the complete unit, or do I frequently need individual pieces? If you always load the complete product configuration as a whole to price it, display it, or track it, that is a document. If you frequently need to query one field across thousands of records (e.g., find all customers in California), that is a relational table. Use each tool for what it is good at.

15. Why This Architecture Isn’t Universal: Honest Bottlenecks

If graph-based product configuration is so clearly superior for composable products, why is the entire print industry not using it? The answer involves technical trade-offs, organizational realities, and market dynamics. This chapter is deliberately candid because understanding the risks is as important as understanding the benefits.

15.1 The Talent Gap

This is the single biggest obstacle. The relational model is taught in every computer science program. Every junior developer knows SQL. The concepts in this paper, including graph data modeling, event sourcing, CQRS, function-pipeline pricing, and document storage, come from the domain-driven design (DDD) community, and they are well-established in fintech, logistics, and large-scale e-commerce. But they have barely penetrated the print MIS market.

The practical consequence: hiring developers who can build and maintain this system is harder and more expensive than hiring developers who can maintain a traditional relational MIS. If you lose a key developer, finding a replacement who understands event sourcing and graph traversal is not a job-board-and-wait-a-week situation. This is a real operational risk that must be mitigated through documentation, pair programming, and the use of LLM-assisted development (which is specifically why this paper exists in this level of detail).

15.2 Event Sourcing Operational Complexity

Event sourcing is powerful, but it introduces failure modes that traditional CRUD systems do not have: events are immutable, so a bug that wrote bad events lives in the log forever and must be compensated for rather than edited away; event schemas evolve, and replay code must handle every historical version; projections must be rebuilt when their logic changes, which can take significant time at scale; and event consumers can lag, crash, or process events more than once, so they must be resumable and idempotent.

15.3 CQRS and Eventual Consistency

CQRS separates reads from writes, which means the read side can be stale. When an operator marks a process complete on the shop floor, the event is written to the event store immediately. But the dashboard projection might not be updated for 1-5 seconds (or longer under load). During that window, anyone looking at the dashboard sees the old status.

For most use cases, this delay is irrelevant. Nobody cares if the dashboard updates in 1 second or 3 seconds. But there are edge cases: two operators might claim the same queued task from a station view that has not yet absorbed the first claim, or a user who just saved a change might not see their own update on the next screen. Common mitigations include read-your-own-writes for the acting user and routing genuinely contention-sensitive checks to the write side.

These are all solvable problems with known patterns. But they add complexity that a simple relational system reading from a single source of truth does not have. You are trading simplicity for power, and you need to be honest about that trade.

15.4 Document Storage Query Limitations

When your product configuration lives in a JSON document, certain queries become harder. In a relational system, 'find all jobs that used foil stamping on paper heavier than 300gsm in the last quarter' is a straightforward SQL query with a few JOINs and WHERE clauses on indexed columns.

In the document model, this query requires reaching into the JSON structure of every configuration document to find nodes where type equals foil_process and the parent layer's gsm attribute exceeds 300. PostgreSQL can do this with JSONB path queries and GIN indexes, but the syntax is less intuitive, the query plans can be unpredictable, and performance depends heavily on how the GIN index is structured.

The mitigation (and the recommendation in this architecture) is to maintain relational projections for analytical queries. The event stream feeds a projection that extracts the relevant attributes into flat relational tables optimized for reporting. But this means maintaining two representations of the same data, which is the fundamental trade-off of CQRS.

15.5 The Debugging Surface Area

In a relational MIS, a wrong quote can be debugged by opening the database, examining the rate table, and tracing the SQL query. The path from input to output is linear and visible.

In the graph pricing engine, a wrong quote means:

  1. Load the configuration document and inspect the node graph.

  2. Identify which node's pricing result is incorrect.

  3. Examine that node's pricing function, its input attributes, and the pricing context.

  4. Check whether the rate tables were loaded correctly into the context.

  5. Check whether memoization returned a stale cached result.

  6. If the node depends on parent or sibling attributes, check the graph traversal order.

This is more steps and more abstraction layers than the relational approach. The pricing function pipeline is more powerful, but the debugging surface area is larger. This is mitigated through comprehensive logging at each evaluation step, unit tests for each pricing function, and integration tests that compare graph engine quotes against known-correct values.
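One of the mitigations mentioned above, logging at each evaluation step, can be a thin wrapper around any pricing function. A sketch (the wrapper name and trace fields are illustrative):

```javascript
// Wraps a pricing function so every evaluation appends a trace record:
// which node ran, how many child results it consumed, and what total it
// produced. The trace turns "why is this quote wrong?" into a readable
// evaluation log instead of a debugging expedition.
function withTrace(pricingFn, trace) {
  return {
    evaluate(node, context, childResults) {
      const result = pricingFn.evaluate(node, context, childResults);
      trace.push({
        node_id: node.id,
        node_type: node.type,
        child_count: childResults.length,
        total_cost: result.total_cost,
      });
      return result;
    },
  };
}
```

Because the wrapper honors the same PricingFunction interface, it can be applied to every registered function in a debug build without touching the traversal engine.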

15.6 Organizational Inertia: The Real Blocker

The biggest reason print MIS vendors do not adopt this architecture is not technical. It is organizational. Consider the position of an established MIS vendor: a relational codebase accumulated over decades, a customer base that depends on its existing behavior, a development team fluent in that schema and wary of unfamiliar patterns, and revenue that comes from incremental feature sales and maintenance contracts rather than rearchitecture. A ground-up rewrite puts all of that at risk for benefits that are hard to demonstrate in a sales demo.

This is why most MIS vendors incrementally patch their existing systems rather than rearchitect. Patching is lower risk, ships faster, and keeps the team in their comfort zone. The result is the system you are currently experiencing: functional but slow, rigid, and increasingly painful as products become more complex.

Your position is different. You are not carrying 20 years of legacy. You are building new (or willing to rebuild), and you are experiencing the pain that motivates the investment. That changes the calculus entirely.

15.7 When NOT to Use This Architecture

For intellectual honesty, here are cases where the graph model is overkill and a simple relational schema is the better choice: a small catalog of uniform products whose structure never varies; pricing that really is a flat table lookup with no conditional logic; low quote volume where a few seconds of latency is acceptable; and a team without the bandwidth to operate event stores, projections, and caches on top of a database.

16. Incremental Adoption: What to Build First and Why

The architecture described in this paper has many moving parts: a graph data model, a type system, a function-pipeline pricing engine, event sourcing, CQRS projections, and a multi-layer caching strategy. Attempting to build all of these simultaneously is a recipe for failure. This chapter provides a concrete, phased adoption plan that delivers value at each stage.

16.1 Phase 1: The Product Configuration Graph (Weeks 1-6)

Build this first because it is the foundation everything else depends on, and it delivers immediate value independent of the other components.

What you build: The ConfigNode schema, the flat-map ProductConfiguration document structure, the NodeTypeDefinition registry, and a basic CRUD API for creating, reading, and modifying configurations. Store configurations in PostgreSQL JSONB.

What you skip for now: Event sourcing, CQRS, caching. Use simple CRUD operations (load document, modify, save) with no event log.

What you validate: Can you model your most complex existing product as a configuration graph? Does it capture all the information currently spread across your legacy tables? Can your team add a new process type by registering a type definition without code changes?

Value delivered: You now have a flexible product model that can represent any product complexity. Even without the pricing engine, this is useful as a configuration tool and product catalog.

16.2 Phase 2: The Pricing Engine (Weeks 7-12)

Build this second because it delivers the most visible performance improvement and directly addresses the pain of slow quoting.

What you build: The PricingFunction interface, the PricingContext with pre-loaded rate tables, the post-order traversal engine with memoization, and concrete pricing functions for your most common process types (start with 5-8 functions covering 80% of your products).

What you skip for now: Parallel subtree evaluation (use sequential traversal first, optimize later). Advanced caching (the per-request memoization map is sufficient for now).

What you validate: Run shadow quotes. For every quote generated by the legacy system, also generate a quote from the graph engine. Compare results. Any difference greater than 1% is a bug. Track accuracy across 100+ quotes before trusting the engine for production use.

Value delivered: Quoting drops from seconds to milliseconds. Interactive product configuration with live pricing becomes possible. Users can tweak options and see price changes instantly.

16.3 Phase 3: Event Sourcing for Production Tracking (Weeks 13-20)

Build this third because it requires the product graph to exist (events reference nodes), and it adds the granular tracking capability.

What you build: The ProductionEvent schema, an append-only event store (can be a simple PostgreSQL table initially), event producers that emit events when node status changes, and a single projection: node_current_state.

What you skip for now: Multiple projections, real-time event streaming (use polling initially), complex event replay tooling.

What you validate: Can you track the status of every node in a product configuration independently? Can you answer 'when did this process start, who ran it, and how much waste was there?' from the event log?

Value delivered: Complete audit trail for production. Per-process, per-layer tracking. The ability to analyze actual vs. estimated costs at any granularity.

16.4 Phase 4: CQRS Projections and Advanced Features (Weeks 21+)

Build this last because it optimizes read patterns that only matter once you have meaningful data flowing through the system.

What you build: Purpose-built read projections (shop floor dashboard, station queue, cost analysis report). Async event consumers that maintain projections. Rate table caching in Redis. Subtree price caching.

What you validate: Dashboard latency under load. Projection consistency. Cache hit rates. Event consumer resilience (what happens when a consumer goes down and comes back up?).

Value delivered: Fast, purpose-built views for every user persona (estimators, operators, managers). Scalable read performance independent of write load.

16.5 The Golden Rule of Incremental Adoption

Build Only What Hurts Today

At each phase, you should be solving a pain you are actively feeling. If quoting is slow, Phase 2 addresses that. If you cannot track processes granularly, Phase 3 addresses that. If your dashboards are slow under load, Phase 4 addresses that. Never build infrastructure in anticipation of pain you have not yet experienced. YAGNI (You Aren’t Gonna Need It) applies to architecture patterns just as much as to code features. Event sourcing is powerful, but if your current tracking needs are simple and low-volume, a well-designed CRUD system with good logging may serve you well for a year while you focus on the configuration graph and pricing engine.

16.6 Using an LLM for Implementation

This document is deliberately written to serve as context for LLM-assisted development. When working with an LLM to implement this architecture, the most effective approach is:

  1. Share the relevant chapter with the LLM before asking it to write code. For example, if you are implementing the pricing engine, provide Chapters 5 and 6 as context. The data structures and function signatures give the LLM a concrete target.

  2. Start with the interfaces and type definitions. Have the LLM generate the TypeScript or Python interfaces from the pseudocode in this paper. These become the contract that all implementation code conforms to.

  3. Implement one pricing function at a time. Give the LLM the PricingFunction interface and the specific rate table structure for one process type. Validate the output against a known-correct quote from your legacy system before moving to the next function.

  4. Use the Architecture Decision Records (Chapter 13) as guardrails. When the LLM proposes an approach that contradicts an ADR, push back with the rationale from the ADR. This keeps the implementation aligned with the architecture.

  5. Test exhaustively at each phase boundary. Before starting Phase 2, the product configuration graph should be fully validated. Before starting Phase 3, shadow quoting should show consistent accuracy. Do not build forward on an unvalidated foundation.

17. Graph Theory, Academic Foundations, and Cross-Domain Insights

This chapter takes the theoretical concept of a Directed Acyclic Graph and shows exactly how it manifests in product configuration, with every detail you need to implement it correctly. If you have never worked with graph data structures, this chapter will take you from zero to fluent. If you have, it will show you the specific properties that matter for print product modeling and the non-obvious traps that arise.

17.1 Graph Fundamentals: Nodes, Edges, Direction, and Cycles

A graph is a data structure consisting of nodes (also called vertices) and edges (connections between nodes). In our product model, a node is any discrete component: a product, a layer, a process, an assembly step. An edge is a parent-child relationship: 'this layer belongs to this product' or 'this process is applied to this layer.'

The edges are directed, meaning they have a start and an end. The direction matters: 'product contains layer' is different from 'layer contains product.' We typically model edges as parent-to-child, where the parent 'owns' or 'contains' the child.

The graph is acyclic, meaning there are no loops. You cannot follow edges from any node and arrive back at the same node. This is a fundamental constraint because a card cannot contain itself, a process cannot be applied to itself, and a layer cannot be a sub-layer of its own sub-layer. Cycles would cause the pricing engine to loop infinitely.

17.2 Why a DAG and Not a Tree: The Shared Node Problem

A tree is a restricted graph where every node has exactly one parent (except the root, which has none). Trees are simpler to work with, but they cannot model one important real-world scenario: shared operations.

Consider a lamination process that bonds Layer 1 and Layer 2 together. In a tree, this lamination would need to be a child of one layer or the other, which misrepresents the reality that it operates on both. Or it would need to be duplicated as two separate nodes, which misrepresents the reality that it is a single physical operation with a single cost.

In a DAG, the lamination node can have two parents: Layer 1 and Layer 2. It is stored once, priced once, and tracked once, but it belongs to both layers. This is called a shared node or a multi-parent node.

Here is the distinction visualized:

TREE MODEL (wrong for shared operations):

    Product
    ├── Layer 1
    │   ├── CMYK Print
    │   └── Lamination   (DUPLICATE – costed twice!)
    └── Layer 2
        ├── Screen Print
        └── Lamination   (DUPLICATE – costed twice!)

DAG MODEL (correct):

    Product
    ├── Layer 1
    │   └── CMYK Print
    ├── Layer 2
    │   └── Screen Print
    └── Lamination   (shared node, parents: [Layer 1, Layer 2])
                     costed ONCE, tracked ONCE

The flat-map storage format from Chapter 3 handles this naturally. The lamination node is stored once in the nodes map. Layer 1, Layer 2, and the root Product all reference it by ID. There is no duplication.

17.3 Adjacency Representations: How to Store a Graph

There are several ways to represent a graph in memory and on disk. For product configurations, we use an adjacency list stored as a flat map. Here is why, and how it compares to alternatives:

Adjacency Matrix

An adjacency matrix is a 2D array where matrix[i][j] is 1 if node i connects to node j, and 0 otherwise. For a product with 50 nodes, this is a 50x50 = 2,500 cell matrix, of which maybe 80 cells are non-zero. This is extremely wasteful for sparse graphs (which product configurations always are). It also does not naturally store node attributes. Never use this for product graphs.

Nested Object (Tree Embedding)

A nested object stores children directly inside their parent: { id: 'product', children: [{ id: 'layer1', children: [...] }] }. This is intuitive but fails for DAGs because shared nodes must be duplicated. It also makes O(1) lookup by ID impossible without building a secondary index. Avoid this.

Flat Map with ID References (Our Approach)

A flat map (dictionary/hash map) keyed by node ID, where each node stores arrays of child IDs and parent IDs. This supports DAGs natively (shared nodes are stored once, referenced by ID from multiple parents), enables O(1) node lookup by ID, serializes cleanly to JSON, and supports efficient graph traversal by following ID references.

// Flat map: the canonical representation
const nodes = {
  'product': {
    id: 'product',
    children: ['layer1', 'layer2', 'lamination'],
    parents: []
  },
  'layer1': {
    id: 'layer1',
    children: ['cmyk-print'],
    parents: ['product']
  },
  'layer2': {
    id: 'layer2',
    children: ['screen-print'],
    parents: ['product']
  },
  'lamination': {
    id: 'lamination',
    children: [],
    parents: ['product'],  // referenced by product as assembly
    attributes: { inputs: ['layer1', 'layer2'] }  // knows its inputs
  },
  'cmyk-print': {
    id: 'cmyk-print',
    children: [],
    parents: ['layer1']
  },
  'screen-print': {
    id: 'screen-print',
    children: [],
    parents: ['layer2']
  }
};

17.4 Topological Sorting: The Foundation of Bottom-Up Pricing

A topological sort is an ordering of all nodes in a DAG such that for every directed edge from node A to node B, A appears before B in the ordering. In simpler terms: every parent appears before its children.

For bottom-up pricing, we need the reverse: every child appears before its parent. This is called a reverse topological sort, and it guarantees that when we evaluate a node, all of its children have already been evaluated and their results are available.

Here is the algorithm:

function reverseTopologicalSort(config) {
  const visited = new Set();
  const order = [];  // will hold nodes in reverse-topo order

  function dfs(nodeId) {
    if (visited.has(nodeId)) return;  // already processed
    visited.add(nodeId);
    const node = config.nodes[nodeId];
    // Visit all children first (go deeper before recording)
    for (const childId of node.children) {
      dfs(childId);
    }
    // After all children are recorded, record this node
    order.push(nodeId);
  }

  dfs(config.root_node_id);
  return order;  // Leaf nodes first, root last
}

// For the card example, this returns:
// ['cmyk-f', 'cmyk-b', 'spot-uv', 'die-cut',  ← leaf processes
//  'screen-print',                            ← leaf process
//  'layer1', 'layer2',                        ← layers (aggregators)
//  'lamination',                              ← assembly
//  'product']                                 ← root (last)

This ordering is critical. It means we can iterate through the array from index 0 to the end, evaluate each node in sequence, and be guaranteed that every node's children have already been evaluated. This eliminates the need for recursion in the pricing loop and makes the execution predictable and debuggable.
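A minimal sketch of that iterative loop, assuming the flat-map node shape from this chapter; `costFns` (a hypothetical map from node type to a costing function) stands in for the pricing pipeline of Chapter 5:

```javascript
// Reverse-topological order: children recorded before parents.
function reverseTopoOrder(config) {
  const visited = new Set();
  const order = [];
  (function dfs(id) {
    if (visited.has(id)) return;
    visited.add(id);
    for (const childId of config.nodes[id].children) dfs(childId);
    order.push(id);  // all children are already in `order`
  })(config.root_node_id);
  return order;
}

// Non-recursive pricing loop: one pass over the sorted array.
function priceAll(config, costFns) {
  const memo = new Map();
  for (const id of reverseTopoOrder(config)) {
    const node = config.nodes[id];
    // Every child is guaranteed to be memoized already.
    const children = node.children.map(c => memo.get(c));
    memo.set(id, costFns[node.type](node, children));
  }
  return memo;  // complete per-node breakdown, root included
}
```

Because the loop is a flat iteration, a debugger can step through it node by node in evaluation order, which is much harder with a recursive evaluator.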

17.5 Cycle Detection: Preventing Infinite Loops

If someone accidentally creates a cycle in the configuration graph (node A is a child of B, B is a child of C, C is a child of A), the topological sort and the pricing traversal will loop forever. Cycle detection must be built into the graph mutation operations (addNode, reparentNode) so that cycles are rejected before they enter the system.

The standard algorithm uses a three-color marking scheme during depth-first search:

function hasCycle(config) {
  const WHITE = 0;  // Not yet visited
  const GRAY = 1;   // Currently being visited (in the DFS stack)
  const BLACK = 2;  // Fully processed

  const color = {};
  for (const id of Object.keys(config.nodes)) {
    color[id] = WHITE;
  }

  function dfs(nodeId) {
    color[nodeId] = GRAY;  // Mark as 'currently visiting'
    const node = config.nodes[nodeId];
    for (const childId of node.children) {
      if (color[childId] === GRAY) {
        // We found a node that is already in our current
        // traversal path. This IS a cycle.
        return true;  // CYCLE DETECTED
      }
      if (color[childId] === WHITE) {
        if (dfs(childId)) return true;  // Cycle found deeper
      }
      // BLACK nodes are safe – already fully explored
    }
    color[nodeId] = BLACK;  // Fully explored, no cycle here
    return false;
  }

  // Check from every unvisited node (handles disconnected graphs)
  for (const id of Object.keys(config.nodes)) {
    if (color[id] === WHITE) {
      if (dfs(id)) return true;
    }
  }
  return false;
}

The key insight is the GRAY state. A WHITE node has not been seen. A BLACK node has been fully explored and is safe. A GRAY node is one we are currently in the middle of exploring. If we encounter a GRAY node while traversing, it means we have looped back to a node that is an ancestor of the current node in the DFS path. That is a cycle.

This check should run every time a node is added, moved, or has its children modified. It runs in O(V + E) time where V is the number of nodes and E is the number of edges. For a product with 100 nodes and 120 edges, this is sub-millisecond. There is no excuse for not running it on every mutation.
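For the specific case of adding an edge there is an even cheaper guard: adding parent → child creates a cycle exactly when the parent is already reachable from the child. A sketch of a guarded mutation built on that check (the helper names here are illustrative, not part of the architecture's API):

```javascript
// True if adding parentId -> childId would close a loop, i.e. if
// parentId is already reachable from childId via child edges.
function wouldCycle(config, parentId, childId) {
  const stack = [childId];
  const seen = new Set();
  while (stack.length > 0) {
    const id = stack.pop();
    if (id === parentId) return true;  // child reaches parent: cycle
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...config.nodes[id].children);
  }
  return false;
}

// Guarded mutation: the graph is never left in a cyclic state.
function addChildEdge(config, parentId, childId) {
  if (wouldCycle(config, parentId, childId)) {
    throw new Error(`edge ${parentId} -> ${childId} would create a cycle`);
  }
  config.nodes[parentId].children.push(childId);
  config.nodes[childId].parents.push(parentId);
}
```

Rejecting the edge before mutating means no rollback logic is needed, and the full hasCycle scan can remain a validation backstop rather than the hot-path check.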

17.6 The Diamond Dependency Problem

The diamond dependency is a pattern where two children of a node share a common descendant. It is the most common cause of double-counting in pricing and the most subtle bug in graph-based quoting systems.

THE DIAMOND:

         Product
         /     \
     Layer1   Layer2
         \     /
       Lamination   ← shared descendant

Without memoization:

  evaluate(Product)
  ├─ evaluate(Layer1)
  │  └─ evaluate(Lamination) → returns $50
  └─ evaluate(Layer2)
     └─ evaluate(Lamination) → computes AGAIN, returns $50

  Total: Layer1($X + $50) + Layer2($Y + $50) = DOUBLE-COUNTED!

With memoization:

  evaluate(Product)
  ├─ evaluate(Layer1)
  │  └─ evaluate(Lamination) → returns $50, CACHED
  └─ evaluate(Layer2)
     └─ evaluate(Lamination) → cache HIT, returns $50

  But now: do we add $50 to Layer1 OR Layer2 OR neither?

Memoization prevents the lamination from being computed twice, but it introduces a subtler question: which parent's subtotal should include the lamination cost? If both Layer1 and Layer2 include it (because it is in both their children arrays), the product aggregator will double-count it when summing Layer1 + Layer2.

There are three strategies to handle this:

Strategy A: Root-Level Assembly Collection

Move shared operations out of the layer children and into the product's children directly. Lamination is not a child of Layer1 or Layer2; it is a child of Product with an 'inputs' attribute referencing both layers. This is the approach used in the data model from Chapter 3 and is the recommended default. It eliminates the diamond entirely because the shared node has only one parent.

Strategy B: First-Encountered-Pays

The first parent to evaluate the shared node includes its cost. Subsequent parents receive a cached result with total_cost = 0. This works but makes the cost breakdown confusing: why does Layer1 show a lamination cost but Layer2 does not? The answer depends on traversal order, which is not intuitive to estimators.

Strategy C: Cost Splitting

Divide the shared node's cost equally (or proportionally) among its parents. Lamination costs $50, so Layer1 includes $25 and Layer2 includes $25. This is semantically accurate but adds complexity to the aggregator logic and makes individual line items harder to interpret.

Recommendation

Use Strategy A (root-level assembly collection) as the default. Design the node type system so that assembly operations are always children of the root product, not of individual layers. This makes the graph tree-like in practice (each node has one parent) while retaining DAG capability for the rare cases where true multi-parent relationships are needed. When you do encounter a genuine diamond, use Strategy C with explicit cost-allocation rules.
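For those genuine diamonds, Strategy C's allocation can be as simple as dividing the memoized total by the parent count. A sketch (the function name is hypothetical, and a proportional rule could replace the equal split):

```javascript
// Returns the share of a node's cost that one parent should include
// in its subtotal. Single-parent nodes are the normal case and pay
// in full; multi-parent (shared) nodes split the cost equally.
function sharedCostShare(config, memo, nodeId) {
  const parentCount = config.nodes[nodeId].parents.length;
  const total = memo.get(nodeId).total;
  return parentCount <= 1 ? total : total / parentCount;
}
```

An aggregator would then sum sharedCostShare(...) over its children instead of the raw memoized totals, so the root still sees the shared cost exactly once.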

17.7 Orphan Detection and Graph Integrity

An orphan is a node that exists in the flat map but is not reachable from the root node. This happens when a node is removed from its parent's children array but not deleted from the map, or when a reparent operation leaves a node disconnected.

Orphans waste storage and can cause confusion, but they are not dangerous (the pricing engine never reaches them because it starts from the root). However, detecting them is simple and should be done as part of configuration validation:

function findOrphans(config) {
  const reachable = new Set();

  function walk(nodeId) {
    if (reachable.has(nodeId)) return;
    reachable.add(nodeId);
    const node = config.nodes[nodeId];
    for (const childId of node.children) {
      walk(childId);
    }
  }

  walk(config.root_node_id);
  const allIds = new Set(Object.keys(config.nodes));
  const orphans = [...allIds].filter(id => !reachable.has(id));
  return orphans;  // IDs of nodes not reachable from root
}

Run this after every mutation. If orphans are found, either reconnect them (if the disconnection was accidental) or delete them (if the parent was intentionally removed).
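A sketch of the cleanup path for the deletion case (the helper name is illustrative). One detail worth noting: after deleting orphans, any parents arrays on surviving nodes that referenced them must be scrubbed as well, or the graph is left with dangling IDs.

```javascript
// Delete every node not reachable from the root, then scrub dangling
// parent references left behind on the surviving nodes.
function pruneOrphans(config) {
  const reachable = new Set();
  (function walk(id) {
    if (reachable.has(id)) return;
    reachable.add(id);
    config.nodes[id].children.forEach(walk);
  })(config.root_node_id);

  for (const id of Object.keys(config.nodes)) {
    if (!reachable.has(id)) delete config.nodes[id];
  }
  for (const id of reachable) {
    const node = config.nodes[id];
    node.parents = node.parents.filter(p => reachable.has(p));
  }
}
```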

17.8 Academic Foundations: DAGs in Manufacturing Research

The use of DAGs to model manufacturing systems is not a novel concept invented for this architecture. It has deep academic roots, most notably in the work of Denis Borenstein, whose 2000 paper in the European Journal of Operational Research established the theoretical framework for using directed acyclic graphs to represent routing flexibility in manufacturing systems.

Borenstein's key contribution was demonstrating that manufacturing operation sequences, meaning the order in which processes are applied to a part, can be rigorously represented as DAGs. His model defines three representation levels that map directly to the concerns of our product configuration architecture:

Borenstein's Three Levels

  1. The Precedence Graph of Operations. This represents the hard constraints: which operations must happen before which other operations. In our architecture, this is equivalent to the dependency edges in the configuration graph. Spot UV must happen after printing (because UV is applied over the print). Die cutting must happen after all surface treatments are complete. These precedence relationships are encoded in the parent-child structure of our DAG.

  2. The Tree of Feasible Operations. Given the precedence constraints, this enumerates all valid sequences through the manufacturing process. For a card with CMYK printing, spot UV, and die cutting, there may be multiple valid orderings: print-then-UV-then-die or print-then-die-then-UV, depending on the specific die cut shape and UV area. Borenstein showed that the DAG representation compactly encodes all these valid sequences without explicitly listing each one, which becomes critical when the number of valid sequences grows combinatorially.

  3. The Projected Route Graph (PRG) of Operations and Machines. This maps feasible operations to specific machines on the shop floor. A spot UV process might be executable on UV-Coater-1 or UV-Coater-3, each with different capabilities and costs. The PRG extends the operation DAG to include this machine-level routing flexibility.

The direct parallel to our architecture is striking. Borenstein's Level 1 is our product configuration graph (what the product is and how its components relate). His Level 2 is what our pricing engine traverses when computing costs across valid process sequences. His Level 3 is what our event sourcing system tracks when recording which specific machine and operator handled each process.

Why This Academic Foundation Matters

Borenstein proved mathematically that DAG representations are more storage-efficient and have lower search complexity than alternative methods such as Petri nets for manufacturing routing. This means the graph-based approach in this architecture is not just intuitively cleaner than the relational model; it is provably more efficient for the specific problem of representing composable manufacturing sequences. When you encounter skepticism about abandoning the relational approach, this is peer-reviewed evidence in a top-tier operations research journal that the graph model is the correct abstraction.

Borenstein also demonstrated that DAGs can be embedded within simulation models for real-time analysis, which anticipates the event sourcing and CQRS patterns in our architecture. The shop floor is not static: machines break down, priorities change, and the optimal route through the manufacturing process shifts dynamically. A DAG representation enables real-time re-routing decisions because the alternative paths are already encoded in the graph structure, not locked into a fixed sequence in a relational table.

17.9 Cross-Domain Insights: What Causal Inference DAGs Teach Us

DAGs are used extensively outside of manufacturing, most prominently in causal inference and statistical modeling. In causal inference, the nodes represent variables (education, income, health), the edges represent causal relationships (education causes higher income), and the entire framework exists to answer the question: which variables should you control for in a regression to get an unbiased estimate of a causal effect?

The graph theory underlying both uses is identical. The same algorithms for traversal, the same concepts of ancestors and descendants, the same mathematical properties of acyclicity all apply. But the semantics are different, and understanding both the overlap and the divergence strengthens your ability to work with product configuration graphs.

Concepts That Transfer Directly

Concepts That Do Not Transfer

The Dependency Path Concept

The single most useful concept that transfers from causal DAGs to product graphs is the dependency path. In causal inference, you trace paths through the graph to understand what influences what. In the pricing engine, dependency paths determine three critical behaviors:

  1. Cache invalidation scope. When you change an attribute on a node, which cached pricing results become stale? The answer is: the node itself and all of its ancestors up to the root. Its siblings and their subtrees are unaffected. This is determined by tracing the dependency path upward through the DAG.

  2. Parallel evaluation boundaries. Nodes can be evaluated in parallel if and only if they share no dependency path. Two layers with no shared child nodes have no dependency path between them and can be priced simultaneously. A shared lamination node creates a dependency path between the layers it joins, requiring sequential evaluation of that shared node.

  3. Pricing function input resolution. When a pricing function needs data from another node (e.g., die cut cost needs substrate GSM), it resolves this by walking the dependency path to the relevant ancestor. The pricing context's resolve_node function is essentially a path traversal through the graph.
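The upward walk behind resolve_node can be sketched as follows. The helper name and the breadth-first "nearest ancestor wins" policy are illustrative choices, not prescribed by the architecture:

```javascript
// Walk the dependency path upward from nodeId and return the first
// ancestor value found for attrName (breadth-first, so the nearest
// ancestor wins). Returns undefined if no ancestor carries it.
function resolveAncestorAttribute(config, nodeId, attrName) {
  const queue = [...config.nodes[nodeId].parents];
  const seen = new Set();
  while (queue.length > 0) {
    const id = queue.shift();
    if (seen.has(id)) continue;
    seen.add(id);
    const attrs = config.nodes[id].attributes || {};
    if (attrName in attrs) return attrs[attrName];
    queue.push(...config.nodes[id].parents);
  }
  return undefined;
}
```

This is how a die cut pricing function can read the substrate GSM from its parent layer without the layer knowing anything about die cutting.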

17.10 Reachability Analysis: Who Depends on Whom

Reachability is the graph theory concept of determining which nodes can be reached from a given starting node by following directed edges. In product graphs, reachability has two directions, each serving a different purpose:

Forward Reachability (Root to Leaves)

Starting from the root product node and following child edges, forward reachability answers: 'What are all the components of this product?' This is what the pricing engine uses to enumerate every node that needs to be priced. It is also what the production tracking system uses to determine the complete list of operations for a job.

function forwardReachable(config, startNodeId) {
  const reached = new Set();
  const queue = [startNodeId];
  while (queue.length > 0) {
    const nodeId = queue.shift();
    if (reached.has(nodeId)) continue;
    reached.add(nodeId);
    const node = config.nodes[nodeId];
    for (const childId of node.children) {
      queue.push(childId);
    }
  }
  return reached;
}

// Usage: all components of the product
const allParts = forwardReachable(config, config.root_node_id);

Reverse Reachability (Leaf to Root)

Starting from any node and following parent edges upward, reverse reachability answers: 'What is affected if this node changes?' This is critical for cache invalidation and change impact analysis. If the spot UV area percentage changes, reverse reachability tells you that the UV node, its parent layer, and the root product all need re-pricing. But the other layer and its processes are unaffected.

function reverseReachable(config, startNodeId) {
  const reached = new Set();
  const queue = [startNodeId];
  while (queue.length > 0) {
    const nodeId = queue.shift();
    if (reached.has(nodeId)) continue;
    reached.add(nodeId);
    const node = config.nodes[nodeId];
    for (const parentId of node.parents) {
      queue.push(parentId);
    }
  }
  return reached;
}

// Usage: what needs re-pricing if this node changes?
const invalidated = reverseReachable(config, changedNodeId);
for (const id of invalidated) { priceCache.delete(id); }

17.11 Transitive Reduction: Simplifying the Graph

Transitive reduction is a graph operation that removes redundant edges. An edge from A to C is redundant if there is already a path from A to B to C. In that case, the A-to-C edge is implied by the A-to-B and B-to-C edges and can be removed without losing any structural information.

In product graphs, transitive reduction keeps the edge set minimal without losing any dependency information:

function transitiveReduction(config) {
  // For each edge parent -> child, check if there is an alternative
  // path from parent to child through other nodes. If so, the direct
  // edge is redundant.
  for (const node of Object.values(config.nodes)) {
    const redundant = [];
    for (const childId of node.children) {
      // Check if childId is reachable from this node
      // through any OTHER child
      const otherChildren = node.children.filter(c => c !== childId);
      for (const otherId of otherChildren) {
        if (forwardReachable(config, otherId).has(childId)) {
          redundant.push(childId);
          break;
        }
      }
    }
    // Remove redundant direct edges
    node.children = node.children.filter(c => !redundant.includes(c));
  }
}

17.12 Depth, Breadth, and Complexity Metrics

Understanding the shape of your product graphs helps you predict performance and identify potential bottlenecks:

function graphMetrics(config) {
  const nodes = Object.values(config.nodes);

  function depth(nodeId, visited = new Set()) {
    if (visited.has(nodeId)) return 0;  // cycle protection
    visited.add(nodeId);
    const node = config.nodes[nodeId];
    if (node.children.length === 0) return 1;
    return 1 + Math.max(
      ...node.children.map(c => depth(c, new Set(visited)))
    );
  }

  const maxBreadth = Math.max(...nodes.map(n => n.children.length));
  const sharedNodes = nodes.filter(n => n.parents.length > 1).length;

  return {
    total_nodes: nodes.length,
    max_depth: depth(config.root_node_id),
    max_breadth: maxBreadth,
    shared_node_count: sharedNodes,
    shared_node_ratio: sharedNodes / nodes.length
  };
}

For typical print products, you should expect depths of 2-5, breadths of 2-15, total node counts of 5-50, and shared node ratios below 10%. If your metrics regularly exceed these ranges, it may indicate either genuinely complex products (which the architecture handles well) or modeling issues where products are over-decomposed into too many fine-grained nodes.

18. Bottom-Up Pricing: Complete Step-by-Step Walkthrough

This chapter walks through the pricing of the specialty card from Chapter 2, step by step, showing every computation, every cache entry, and every aggregation. By the end, you will understand exactly what the pricing engine does at each stage and why.

18.1 The Configuration

We are pricing the following product at a quantity of 5,000 cards:

Product (root) ── qty: 5000
├─ Layer 1: 350gsm Silk Art Board, 88×55mm
│  ├─ CMYK Front: offset, 4 colors, 85% coverage
│  ├─ CMYK Back: offset, 4 colors, 60% coverage
│  ├─ Spot UV: 30% area, 25μm thickness
│  └─ Die Cut: die DC-4421, medium complexity
├─ Layer 2: 200μm Clear PET, 88×55mm
│  └─ Screen Print: opaque white, 2 passes, 40% coverage
└─ Lamination: PSA, 0.5mm tolerance

18.2 Step 0: Load the Pricing Context

Before any node is evaluated, the engine loads all rate data into memory. This is the single batch-load step that replaces hundreds of individual queries:

// One batch query loads everything we need
const context = {
  quantity: 5000,
  rate_tables: {
    'substrate': {
      '350gsm_silk': { cost_per_sheet: 0.045, min_order: 500 },
      '200um_pet': { cost_per_sheet: 0.082, min_order: 200 }
    },
    'offset_print': {
      setup_per_color: 35.00,
      run_rate_per_1000: 12.50,
      coverage_multiplier: 1.0  // base rate at 100%
    },
    'screen_print': {
      setup_per_screen: 45.00,
      run_rate_per_1000: 28.00,
      white_ink_surcharge: 1.15  // 15% more expensive
    },
    'spot_uv': {
      setup_base: 65.00,
      material_rate_per_1000: 18.00,
      base_thickness: 20  // microns, rate is based on this
    },
    'die_cut': {
      die_lookup: { 'DC-4421': { setup: 55.00, per_1000: 8.50 } },
      complexity_multiplier: { simple: 0.8, medium: 1.0, complex: 1.4 }
    },
    'lamination': {
      psa_setup: 40.00,
      psa_per_1000: 15.00,
      tolerance_surcharge: { 1.0: 1.0, 0.5: 1.20, 0.25: 1.50 }
    }
  },
  quantity_curves: {
    'standard': {  // price-per-unit decreases with quantity
      evaluate: (qty) => {
        if (qty <= 500) return 1.0;
        if (qty <= 2000) return 0.85;
        if (qty <= 5000) return 0.72;
        return 0.65;
      }
    }
  },
  global_params: {
    print_waste_pct: 4,
    coating_waste_pct: 3,
    cutting_waste_pct: 2,
    screen_waste_pct: 6,
    assembly_waste_pct: 2,
    markup_pct: 40
  },
  resolve_node: (id) => config.nodes[id]
};

This context object exists in memory for the entire pricing computation. Every pricing function reads from it. Zero additional database queries will be made.
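As an illustration of how a pricing function consumes this context, here is a sketch that reproduces the CMYK Front numbers from the walkthrough below. The function name and the attribute shape (`colors`, `coverage` as a fraction) are hypothetical; rounding to cents is applied to the adjusted run cost to match the worked figures:

```javascript
// Price one offset print process from the in-memory context.
// No database queries: everything comes from context.rate_tables.
function priceOffsetPrint(node, context) {
  const rates = context.rate_tables['offset_print'];
  const wastePct = context.global_params.print_waste_pct;
  const effectiveQty = context.quantity * (1 + wastePct / 100);  // 5200

  const setup = node.attributes.colors * rates.setup_per_color;  // $140.00
  const run = (effectiveQty / 1000) * rates.run_rate_per_1000;   // $65.00
  const covered = run * node.attributes.coverage;                // $55.25
  const curve = context.quantity_curves.standard.evaluate(context.quantity);
  const adjustedRun = Math.round(covered * curve * 100) / 100;   // $39.78

  return { setup, run: adjustedRun, total: setup + adjustedRun };
}
```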

18.3 Step 1: Topological Sort Determines Evaluation Order

The engine performs a reverse topological sort to determine the order in which nodes will be evaluated:

Reverse topological sort result:

  Index 0: cmyk-front    (leaf – no children)
  Index 1: cmyk-back     (leaf – no children)
  Index 2: spot-uv       (leaf – no children)
  Index 3: die-cut       (leaf – no children)
  Index 4: screen-print  (leaf – no children)
  Index 5: layer-1       (aggregator – children: 0,1,2,3)
  Index 6: layer-2       (aggregator – children: 4)
  Index 7: lamination    (leaf – no children, assembly)
  Index 8: product       (root aggregator – children: 5,6,7)

The engine will now iterate from index 0 to index 8, evaluating each node. When it reaches a node, all of its children have already been evaluated and their results are in the memoization cache.

18.4 Steps 2-6: Evaluating Leaf Nodes

Evaluate: CMYK Front (index 0)

  Input: 4 colors, 85% coverage, offset
  qty = 5000, waste = 4% → effective_qty = 5200
  Setup:        4 colors × $35.00/color = $140.00
  Run:          5200 / 1000 × $12.50 = $65.00
  Coverage:     $65.00 × 0.85 = $55.25
  Qty curve:    0.72 (5000 qty bracket)
  Adjusted run: $55.25 × 0.72 = $39.78
  Result: { setup: $140.00, run: $39.78, total: $179.78 }
  Cache: memo['cmyk-front'] = $179.78

Evaluate: CMYK Back (index 1)

  Input: 4 colors, 60% coverage, offset
  qty = 5000, waste = 4% → effective_qty = 5200
  Setup:        4 colors × $35.00/color = $140.00
  Run:          5200 / 1000 × $12.50 = $65.00
  Coverage:     $65.00 × 0.60 = $39.00
  Qty curve:    0.72
  Adjusted run: $39.00 × 0.72 = $28.08
  Result: { setup: $140.00, run: $28.08, total: $168.08 }
  Cache: memo['cmyk-back'] = $168.08

Evaluate: Spot UV (index 2)

  Input: 30% area, 25μm thickness
  qty = 5000, waste = 3% → effective_qty = 5150
  Setup:            $65.00 (fixed)
  Material:         5150 / 1000 × $18.00 = $92.70
  Area factor:      $92.70 × 0.30 = $27.81
  Thickness factor: 25 / 20 = 1.25
  Adjusted:         $27.81 × 1.25 = $34.76
  Qty curve:        0.72
  Final run:        $34.76 × 0.72 = $25.03
  Result: { setup: $65.00, run: $25.03, total: $90.03 }
  Cache: memo['spot-uv'] = $90.03

Evaluate: Die Cut (index 3)

  Input: die DC-4421, medium complexity
  qty = 5000, waste = 2% → effective_qty = 5100
  parent layer gsm = 350 → thickness_multiplier = 1.25
  (cross-node dependency: reads parent attribute)
  Setup:      $55.00 (from die lookup)
  Run:        5100 / 1000 × $8.50 = $43.35
  Complexity: $43.35 × 1.0 (medium) = $43.35
  Thickness:  $43.35 × 1.25 = $54.19
  Qty curve:  0.72
  Final run:  $54.19 × 0.72 = $39.02
  Result: { setup: $55.00, run: $39.02, total: $94.02 }
  Cache: memo['die-cut'] = $94.02

Evaluate: Screen Print (index 4)

  Input: white ink, 2 passes, 40% coverage
  qty = 5000, waste = 6% → effective_qty = 5300
  Setup:               2 passes × $45.00/screen = $90.00
  Run:                 5300 / 1000 × $28.00 = $148.40
  Coverage:            $148.40 × 0.40 = $59.36
  White ink surcharge: $59.36 × 1.15 = $68.26
  Qty curve:           0.72
  Final run:           $68.26 × 0.72 = $49.15
  Result: { setup: $90.00, run: $49.15, total: $139.15 }
  Cache: memo['screen-print'] = $139.15

18.5 Steps 7-8: Evaluating Aggregator Nodes

Now the engine reaches the layer nodes. These are aggregators: they compute their own cost (substrate material) and add their children's costs.

Evaluate: Layer 1 (index 5)

  SUBSTRATE COST (this node's own cost):
    350gsm Silk @ $0.045/sheet
    qty = 5000, waste = max of children waste factors
    effective_qty = 5200 (4% print waste dominates)
    Substrate: 5200 × $0.045 = $234.00
    Qty curve: $234.00 × 0.72 = $168.48
  CHILDREN ROLLUP (from memoization cache):
    memo['cmyk-front'] = $179.78
    memo['cmyk-back']  = $168.08
    memo['spot-uv']    = $90.03
    memo['die-cut']    = $94.02
    Children total = $531.91
  LAYER 1 TOTAL: $168.48 + $531.91 = $700.39
  Cache: memo['layer-1'] = $700.39

Evaluate: Layer 2 (index 6)

  SUBSTRATE COST:
    200μm Clear PET @ $0.082/sheet
    effective_qty = 5300 (6% screen waste)
    Substrate: 5300 × $0.082 = $434.60
    Qty curve: $434.60 × 0.72 = $312.91
  CHILDREN ROLLUP:
    memo['screen-print'] = $139.15
  LAYER 2 TOTAL: $312.91 + $139.15 = $452.06
  Cache: memo['layer-2'] = $452.06

Evaluate: Lamination (index 7)

  Input: PSA method, 0.5mm tolerance
  qty = 5000, waste = 2% → effective_qty = 5100
  Setup:               $40.00
  Run:                 5100 / 1000 × $15.00 = $76.50
  Tolerance surcharge: $76.50 × 1.20 = $91.80
  Qty curve:           0.72
  Final run:           $91.80 × 0.72 = $66.10
  Result: { setup: $40.00, run: $66.10, total: $106.10 }
  Cache: memo['lamination'] = $106.10

18.6 Step 9: Evaluating the Root Aggregator

Evaluate: Product (index 8)

  CHILDREN ROLLUP:
    memo['layer-1']    = $700.39
    memo['layer-2']    = $452.06
    memo['lamination'] = $106.10
    Subtotal = $1,258.55
  MARKUP APPLICATION:
    Markup: 40%
    $1,258.55 × 1.40 = $1,761.97
  FINAL QUOTE:
    Total:      $1,761.97
    Per unit:   $1,761.97 / 5000 = $0.3524
    Setup:      $140 + $140 + $65 + $55 + $90 + $40 = $530.00
    Materials:  (computed per above)
    Waste cost: sum of (effective_qty - qty) × unit_rates

18.7 The Complete Memoization Map

After evaluation, the memoization cache contains the complete breakdown. This is what the UI uses to display a line-item quote:

Node          Type              Setup      Run        Total       Eval Order
CMYK Front    print_process     $140.00    $39.78     $179.78     0 (leaf)
CMYK Back     print_process     $140.00    $28.08     $168.08     1 (leaf)
Spot UV       coating_process   $65.00     $25.03     $90.03      2 (leaf)
Die Cut       cutting_process   $55.00     $39.02     $94.02      3 (leaf)
Screen Print  print_process     $90.00     $49.15     $139.15     4 (leaf)
Layer 1       substrate_layer   $0.00      $168.48    $700.39     5 (agg)
Layer 2       substrate_layer   $0.00      $312.91    $452.06     6 (agg)
Lamination    assembly_process  $40.00     $66.10     $106.10     7 (leaf)
Product       product (root)    $0.00      $0.00      $1,761.97   8 (root)

Every number in this table is traceable. An estimator can click on any line item, see the formula that produced it, see which rate table values were used, and verify the result. This transparency is not possible in a system where pricing is computed through opaque stored procedures and scattered table joins.

18.8 Re-Quoting After a Single Change

Now the customer asks: 'What if we increase the spot UV area from 30% to 50%?' In the legacy system, the entire quote is recomputed from scratch (200+ queries). In the graph engine:

  1. Invalidate memo['spot-uv'] (the changed node).

  2. Invalidate memo['layer-1'] (parent of changed node).

  3. Invalidate memo['product'] (parent of layer-1).

  4. Nodes cmyk-front, cmyk-back, die-cut, screen-print, layer-2, lamination are UNCHANGED. Their cached results are still valid.

The engine re-evaluates only 3 nodes out of 9. It takes approximately 30% of the original computation time. For larger products with 50+ nodes, this optimization is even more dramatic: a change to one leaf process may require re-evaluating only 3-4 nodes while 46+ retain their cached results.
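A sketch of this incremental re-quote, combining upward invalidation with a memoized bottom-up evaluation; `evaluate` is a hypothetical per-node costing function standing in for the engine of Chapter 6:

```javascript
// Invalidate the changed node and every ancestor up to the root.
// Siblings and their subtrees keep their memo entries.
function invalidateUp(config, memo, changedNodeId) {
  const stack = [changedNodeId];
  const seen = new Set();
  while (stack.length > 0) {
    const id = stack.pop();
    if (seen.has(id)) continue;
    seen.add(id);
    memo.delete(id);
    stack.push(...config.nodes[id].parents);
  }
}

// Re-price the product: cached subtrees return immediately, so only
// the invalidated path from the changed node to the root is recomputed.
function requote(config, memo, changedNodeId, evaluate) {
  invalidateUp(config, memo, changedNodeId);
  function price(id) {
    if (memo.has(id)) return memo.get(id);  // cache hit: subtree untouched
    const node = config.nodes[id];
    const result = evaluate(node, node.children.map(price));
    memo.set(id, result);
    return result;
  }
  return price(config.root_node_id);
}
```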

19. Quantity Propagation, Waste Cascading, and Cross-Node Dependencies

This is the chapter where the real-world complexity of print production meets graph theory. These are the problems that separate a toy quoting system from one that produces accurate, production-ready estimates.

19.1 Quantity Propagation: How Quantity Flows Through the Graph

Quantity is not a simple global variable. In a multi-layer product, different parts of the graph may need different effective quantities because waste rates differ by process.

The base quantity (5,000 cards) is set at the root. But by the time that quantity reaches the substrate of Layer 1, it must account for the waste of every process that will be applied to that substrate. If you print on the substrate (4% waste), then coat it (3% waste), then die cut it (2% waste), the substrate order quantity must be enough to yield 5,000 good finished pieces after all three stages of waste.

Forward Waste Accumulation (Simple but Wrong)

The naive approach is to add waste percentages: 4% + 3% + 2% = 9%, so order 5,450 sheets. This is wrong because waste compounds multiplicatively: the coating stage loses 3% of the sheets that survived printing, not 3% of the original quantity, so simple addition understates the true requirement.

Reverse Yield Calculation (Correct)

The correct approach works backward from the required good output:

function calculateEffectiveQuantity(requiredGood, processes) {
  // Start from the LAST process and work backward
  // (reverse of production order)
  let needed = requiredGood;

  // Processes in reverse production order
  const reversed = [...processes].reverse();
  for (const process of reversed) {
    const yield_rate = 1 - (process.waste_pct / 100);
    needed = Math.ceil(needed / yield_rate);
  }
  return needed;
}

// Example: 5000 good pieces, processes in production order:
//   1. Print (4% waste), 2. Coat (3% waste), 3. Die cut (2% waste)
// Working backward from die cut:
//   After die cut:   need 5000 good
//   Before die cut:  5000 / 0.98 = 5103
//   Before coating:  5103 / 0.97 = 5261
//   Before printing: 5261 / 0.96 = 5481
// Order 5481 sheets (not 5450!)
// The difference (31 sheets) matters at scale. At 50,000 qty the
// compound error is 291 sheets; at $0.045/sheet that is about $13.10
// of material cost the naive approach misses per job.

19.2 Waste Cascading: When One Process Wastes Another's Output

Waste cascading is the phenomenon where waste at a later production stage destroys value that was added by earlier stages. If a die cutting operation wastes 2% of sheets, those wasted sheets have already been printed, coated, and UV-treated. The cost of that waste includes not just the substrate cost but the cost of all processes that were applied before the waste occurred.

The pricing engine must account for this. The correct approach is to compute waste cost at each stage based on the cumulative value of the product at that point, not just the raw material cost:

function computeWasteCost(node, processOrder, nodeResults) {
  // Find this node's position in the process order
  const position = processOrder.indexOf(node.id);

  // Sum the per-unit cost of all processes BEFORE this one
  // (these costs are 'baked in' to each sheet at this stage)
  let cumulative_unit_value = 0;
  for (let i = 0; i < position; i++) {
    const prevResult = nodeResults[processOrder[i]];
    cumulative_unit_value += prevResult.unit_cost;
  }

  // Add substrate cost
  cumulative_unit_value += nodeResults['substrate'].unit_cost;

  // Waste at this stage costs the cumulative value
  const waste_qty = node.effective_qty - node.good_qty;
  const waste_cost = waste_qty * cumulative_unit_value;
  return waste_cost;
}

// Example: die cut wastes 102 sheets (2% of 5103)
// Each wasted sheet had been printed ($0.035) and coated ($0.005),
// plus substrate ($0.045)
// Waste cost = 102 * ($0.045 + $0.035 + $0.005) = $8.67
// NOT just 102 * $0.045 (substrate only) = $4.59

Why This Matters Commercially

Most legacy MIS systems compute waste cost as substrate cost only, because tracking cumulative value through the process chain is difficult with table lookups. This systematically underestimates the true cost of waste, especially for products with many value-adding processes before the wasteful step. The graph engine computes this correctly because the process ordering and per-node costs are all available in the memoization cache.

19.3 Cross-Node Dependencies

Cross-node dependencies occur when a node's pricing depends on attributes of a sibling, parent, or distant node in the graph. The die cut example from Chapter 18 showed a child-to-parent dependency (die cut reads parent layer's GSM). Here are the other common patterns:

Sibling Dependencies

A coating process may adjust its cost based on what print process was applied before it. Spot UV over digital print has different preparation requirements than spot UV over offset print. The coating pricing function needs to examine its sibling print node:

function evaluateSpotUV(node, context, childResults) {
  // Find sibling print process
  const parentLayer = context.resolve_node(node.parents[0]);
  const printSibling = parentLayer.children
    .map(id => context.resolve_node(id))
    .find(n => n.type === 'print_process');

  // Adjust prep cost based on print method
  const prep_surcharge =
    printSibling.attributes.method === 'digital' ? 25.00 : 0;

  // ... rest of pricing logic with prep_surcharge added
}

Cross-Layer Dependencies

An assembly process (lamination) that joins two layers needs to know the dimensions and material of both layers to calculate cost correctly. PET-to-cardboard lamination has different rates than PET-to-PET:

function evaluateLamination(node, context, childResults) {
  // Read both input layers from the 'inputs' attribute
  const inputLayers = node.attributes.inputs
    .map(id => context.resolve_node(id));

  // Determine material combination
  const materials = inputLayers.map(
    l => l.attributes.material_category // 'paper', 'pet', etc.
  );
  const combo = materials.sort().join('_to_'); // 'paper_to_pet'

  // Look up rate for this specific combination
  const rate = context.rate_tables.lamination[combo];
  // ... compute with rate
}

Quantity-Dependent Process Availability

Some processes are only available above certain quantities. Offset printing may require a minimum of 1,000 sheets to be economical; below that, digital printing is substituted. This is not just a pricing change but a configuration change that the engine must handle:

function resolveProcessMethod(node, context) {
  if (node.type === 'print_process') {
    if (node.attributes.method === 'offset' &&
        context.quantity < 1000) {
      // Auto-substitute digital for small runs
      return {
        ...node,
        attributes: {
          ...node.attributes,
          method: 'digital',
          auto_substituted: true,
          original_method: 'offset'
        },
        pricing_function_id: 'pf-digital-print'
      };
    }
  }
  return node;
}

19.4 Process Ordering and Its Effect on Pricing

The order in which processes are applied to a layer matters for pricing, even though the graph's children array does not inherently encode temporal order. Spot UV before die cutting has different economics than die cutting before spot UV (you waste UV-coated material vs. uncoated material).

The solution is to add an explicit process_order attribute to process nodes, or to define a production_sequence on the layer node:

// On the layer node:
attributes: {
  material: '350gsm Silk',
  production_sequence: [
    'cmyk-front',  // step 1
    'cmyk-back',   // step 2
    'spot-uv',     // step 3
    'die-cut'      // step 4 (last, so waste is cheapest)
  ]
}

The layer's aggregator pricing function uses this sequence to determine the correct effective quantity for each process and to compute waste cascading costs accurately. The sequence itself is part of the product configuration and can be modified without changing any code.
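
Under those assumptions, a minimal sketch of how an aggregator might derive each process's effective quantity from the `production_sequence` (the function name and the `wasteByProcess` map shape are illustrative):

```javascript
// Work backward from the required good output, recording the quantity
// that must ENTER each stage of the production sequence.
function effectiveQuantities(requiredGood, sequence, wasteByProcess) {
  const result = {};
  let needed = requiredGood;
  for (const id of [...sequence].reverse()) {
    const yieldRate = 1 - wasteByProcess[id] / 100;
    needed = Math.ceil(needed / yieldRate);
    result[id] = needed; // quantity this stage must start with
  }
  return result;
}
```

For the 5,000-piece example (print 4%, coat 3%, die cut 2%) this yields 5,103 entering the die cut, 5,261 entering coating, and 5,481 entering the press.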

19.5 Gang Running and Sheet Layout: Quantity Transformation

One of the most complex pricing scenarios in print is gang running (also called n-up or imposition), where multiple copies of a card are printed on a single large sheet, then cut apart. This transforms the quantity from 'number of finished cards' to 'number of press sheets,' and the transformation depends on the card size, the press sheet size, and the layout.

function computeSheetLayout(cardWidth, cardHeight, sheetWidth,
                            sheetHeight, gutterMm, quantity) {
  // Try both orientations and pick the best
  const layout1 = {
    across: Math.floor(sheetWidth / (cardWidth + gutterMm)),
    down: Math.floor(sheetHeight / (cardHeight + gutterMm))
  };
  const layout2 = { // rotated 90°
    across: Math.floor(sheetWidth / (cardHeight + gutterMm)),
    down: Math.floor(sheetHeight / (cardWidth + gutterMm))
  };
  const up1 = layout1.across * layout1.down;
  const up2 = layout2.across * layout2.down;
  const best = up1 >= up2 ? layout1 : layout2;
  const n_up = Math.max(up1, up2);
  const sheets_needed = Math.ceil(quantity / n_up);
  return {
    n_up,                // cards per sheet
    across: best.across,
    down: best.down,
    sheets_needed,
    waste_cards: sheets_needed * n_up - quantity
  };
}

// Example: 88x55mm card on a 720x1020mm press sheet, 3mm gutter,
// quantity 5000:
// Layout 1: floor(720/91) x floor(1020/58) = 7 x 17 = 119-up
// Layout 2: floor(720/58) x floor(1020/91) = 12 x 11 = 132-up
// Best: 132-up, need ceil(5000/132) = 38 sheets
// Waste: 38*132 - 5000 = 16 cards (effectively zero waste)

The sheet layout calculation transforms the quantity context for all press-related pricing. The offset print pricing function receives 38 sheets, not 5,000 cards. Setup costs are per press run; run rates are per sheet impression. The substrate pricing uses 38 sheets at the full press-sheet cost, not 5,000 pieces at the card-size cost.

This transformation is handled by a layout resolver node that sits between the product root and the layer, converting card-quantity to sheet-quantity:

Product (qty: 5000 cards)
└─ Layout Resolver (transforms qty: 5000 cards → 38 sheets)
   └─ Layer 1 (qty context: 38 sheets)
      ├─ CMYK Front (prices at 38 sheet impressions)
      ├─ Spot UV (prices at 38 sheets)
      └─ Die Cut (prices at 38 sheets, cuts 132 cards/sheet)

20. Pitfalls, Edge Cases, and Failure Modes

This chapter catalogs every significant way the system can fail, produce incorrect results, or behave unexpectedly. Each pitfall includes the root cause, the symptom, and the fix. Treat this as a checklist during implementation.

20.1 Floating-Point Pricing Errors

JavaScript and most languages use IEEE 754 double-precision floating-point for numbers. This leads to infamous precision errors:

0.1 + 0.2 // → 0.30000000000000004, not 0.3

// In pricing, this manifests as:
//   $12.50 * 0.72 = $8.999999999999998 (not $9.00)
// Over 50 line items, these errors can accumulate to visible
// differences ($0.01 - $0.05) that destroy user trust.

The fix: use integer arithmetic in the smallest currency unit (cents or tenths of cents). Store all monetary values as integers, perform all arithmetic as integer operations, and convert to decimal only for display:

// WRONG: floating point
const price = 12.50 * 0.72; // 8.999999...

// CORRECT: integer cents
const price_cents = 1250 * 72 / 100; // 900 cents = $9.00

// For sub-cent precision (rate tables often have fractions):
// use tenths of cents (multiply all values by 1000)
const price_mills = 12500 * 720 / 1000; // 9000 mills = $9.00

// Convert to display only at the very end:
function formatPrice(mills) {
  return '$' + (Math.round(mills) / 1000).toFixed(2);
}

20.2 Stale Memoization Cache

The memoization cache is per-quote-request. It should never persist across requests. If it does (due to a bug where the cache is module-scoped instead of function-scoped), a configuration change will not be reflected in subsequent quotes because the cache returns old results.

// WRONG: module-scoped cache persists across calls
const globalCache = new Map(); // DANGEROUS

function generateQuote(config) {
  return evaluateNode(config.root_node_id, globalCache);
}

// CORRECT: cache is scoped to each quote request
function generateQuote(config) {
  const requestCache = new Map(); // FRESH per call
  return evaluateNode(config.root_node_id, requestCache);
}

The subtree price cache (Redis-level, for unchanged configurations) is a separate concern. That cache uses a content hash as the key, so any configuration change produces a new key and a cache miss. The two caching levels must not be confused.

20.3 Circular Reference in Cross-Node Dependencies

The cycle detection from Chapter 17 prevents cycles in the parent-child graph structure. But cross-node dependencies (Chapter 19) can create logical cycles that are invisible to the structural cycle detector.

Example: Process A reads an attribute from Process B to determine its cost. Process B reads an attribute from Process A to determine its cost. Neither is a child of the other, so there is no structural cycle. But the pricing functions create a circular dependency that causes infinite recursion or undefined behavior.

The fix: cross-node pricing dependencies must be declared, and the engine must build a dependency graph that includes both structural (parent-child) and pricing (reads-attribute-from) edges, then check for cycles in the combined graph:

interface PricingFunction {
  id: string;
  // Declared dependencies (beyond parent-child)
  reads_from: {
    node_ref: string;   // 'parent', 'sibling:type', 'node:id'
    attributes: string[];
  }[];
  evaluate(node, context, childResults): PricingResult;
}

// At registration time, build the full dependency graph
// and reject any function that creates a cycle
function validatePricingDependencies(typeRegistry) {
  // Build a graph of type-level pricing dependencies.
  // If TypeA's function reads from TypeB, and TypeB's
  // function reads from TypeA, reject both.
}
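
One way to implement that combined check is a standard DFS with gray/black marking over the merged edge set. This is a sketch, with `edges` as a plain map from node (or type) id to the ids it depends on, whether structurally or via declared reads:

```javascript
// Return a witness path for the first cycle found, or null if the
// combined dependency graph is acyclic (safe to register).
function findDependencyCycle(edges) {
  const WHITE = 0, GRAY = 1, BLACK = 2;
  const color = {};
  for (const id of Object.keys(edges)) color[id] = WHITE;

  function visit(id, path) {
    color[id] = GRAY; // on the current DFS path
    for (const dep of edges[id] ?? []) {
      if (color[dep] === GRAY) return [...path, id, dep]; // back edge = cycle
      if (color[dep] !== BLACK) {
        const cycle = visit(dep, [...path, id]);
        if (cycle) return cycle;
      }
    }
    color[id] = BLACK; // fully explored
    return null;
  }

  for (const id of Object.keys(edges)) {
    if (color[id] === WHITE) {
      const cycle = visit(id, []);
      if (cycle) return cycle;
    }
  }
  return null;
}
```

The mutual-reads case from the example (A reads B, B reads A) is caught here even though neither node is a child of the other.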

20.4 Concurrent Configuration Edits

Two users editing the same configuration simultaneously can cause lost updates. User A loads version 3, changes the UV area. User B loads version 3, changes the paper weight. User A saves (version becomes 4). User B saves (overwrites version 4 with their changes, losing User A's UV edit).

The fix is optimistic concurrency control using the version field:

async function saveConfig(config) {
  const result = await db.query(
    `UPDATE product_configurations
     SET config = $1, version = version + 1,
         updated_at = NOW()
     WHERE id = $2 AND version = $3
     RETURNING version`,
    [config, config.id, config.version]
  );
  if (result.rowCount === 0) {
    // Version mismatch: someone else saved first
    throw new ConcurrentModificationError(
      `Configuration ${config.id} was modified by another user.`
      + ` Please reload and reapply your changes.`
    );
  }
  config.version = result.rows[0].version;
}

The WHERE version = $3 clause ensures the update only succeeds if no one else has modified the configuration since it was loaded. If it fails, the user is prompted to reload and reapply their change. This is a standard pattern in collaborative editing systems.

20.5 Orphaned Nodes After Bulk Operations

Bulk operations (deleting a layer with all its processes, cloning a subtree, merging two configurations) can leave orphaned nodes if the operation is not atomic. If the system crashes after removing a layer from the root's children but before deleting the layer's process nodes from the flat map, those process nodes become orphans.

The fix: wrap bulk operations in database transactions, and run orphan detection (from Chapter 17.7) as a post-save validation step. If orphans are detected after a transaction commits, log a warning and run a cleanup job.
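
The orphan detection itself is a forward-reachability pass. A minimal sketch, assuming the configuration stores a flat `nodes` map and a `root_node_id`:

```javascript
// Return ids present in the flat map but not reachable from the root;
// suitable as a post-save validation step or a cleanup job's input.
function findOrphans(config) {
  const reachable = new Set();
  const stack = [config.root_node_id];
  while (stack.length > 0) {
    const id = stack.pop();
    if (reachable.has(id)) continue; // shared DAG nodes visited once
    reachable.add(id);
    const node = config.nodes[id];
    if (node) stack.push(...(node.children ?? []));
  }
  return Object.keys(config.nodes).filter(id => !reachable.has(id));
}
```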

20.6 Rate Table Version Mismatch

Rate tables are loaded into the pricing context at the start of a quote. If a rate table is updated between loading and quote generation (e.g., a paper supplier changes their price), the quote uses the old rates. This is usually correct behavior (the quote reflects rates at the time it was generated), but it must be handled explicitly: record the rate table version on the quote itself, so the result is reproducible and a quote computed against stale rates can be identified and re-priced if needed.

20.7 Deep Graph Performance Degradation

For most products, the graph has 2-4 levels of depth and 10-50 nodes. The engine handles this in microseconds. But what happens when a product has 20 layers, each with 10 processes, and 15 assembly operations referencing various layers? That is 215+ nodes and 200+ edges.

The sequential traversal still completes in single-digit milliseconds for this scale. But if the pricing functions themselves are complex (involving iterative calculations, PDF parsing for area computation, or external API calls for real-time material pricing), each node evaluation might take 10-50ms. At 215 nodes and 30ms average, that is 6.5 seconds, which is unacceptable for interactive quoting.

The fixes, in order of implementation priority:

  1. Profile first. Measure where time is actually spent. Often 90% of the time is in 2-3 functions. Optimize those.

  2. Parallel subtree evaluation. Independent branches (Layer 1 and Layer 2) can be evaluated in parallel using Promise.all. This cuts time proportional to the graph's branching factor.

  3. Precompute expensive attributes. If a pricing function parses a PDF to compute UV area, do this once at configuration time and store the result as a node attribute, not at quote time.

  4. Lazy evaluation for non-critical nodes. If the user is interactively configuring and only needs to see the total, evaluate only the subtrees that affect the total. Evaluate the full breakdown only when requested.

  5. Worker thread offloading. Move the pricing computation to a Web Worker (frontend) or Worker Thread (Node.js) so it does not block the UI or event loop.
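
Fix 2, parallel subtree evaluation, can be sketched as follows. The context shape here is assumed, not the document's exact API; pricing functions are taken to be async (e.g., calling an external material-pricing API). The key detail is memoizing the *promise* rather than the resolved result, so a shared diamond node started concurrently by two parents is still evaluated exactly once:

```javascript
// Evaluate independent child subtrees concurrently with Promise.all.
// Only valid for branches with no cross-node dependencies between them
// (d-separated branches, in this document's terms).
function evaluateNodeParallel(nodeId, ctx, memo) {
  if (memo.has(nodeId)) return memo.get(nodeId);
  const promise = (async () => {
    const node = ctx.resolve_node(nodeId);
    // Kick off all child evaluations at once, then await them together
    const childResults = await Promise.all(
      (node.children ?? []).map(id => evaluateNodeParallel(id, ctx, memo))
    );
    return ctx.pricing_functions[node.pricing_function_id]
      .evaluate(node, ctx, childResults);
  })();
  memo.set(nodeId, promise);
  return promise;
}
```

Wall-clock time drops roughly in proportion to the graph's branching factor when per-node work dominates.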

20.8 Event Ordering in Distributed Systems

If you scale to multiple application servers, events can arrive at the event store out of order. Server A records 'process started' and Server B records 'process completed' nearly simultaneously. If the network delay causes 'completed' to arrive before 'started' in the event store, the projection will show an impossible state transition.

The fix: use a monotonically increasing sequence number per configuration (not a global sequence, which creates a bottleneck). Each event includes the expected previous sequence number. If the event store receives an event with a gap in the sequence, it holds the event in a buffer until the gap is filled, or flags the inconsistency for manual resolution.
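
A minimal sketch of that gap-buffering consumer; the class and field names are illustrative:

```javascript
// Per-configuration sequence enforcement: events carry a `seq` number;
// out-of-order events are buffered until the gap is filled, then
// applied in order. Duplicates (seq already applied) are dropped.
class SequencedEventBuffer {
  constructor(apply) {
    this.apply = apply;       // callback invoked for each in-order event
    this.nextSeq = new Map(); // configId -> next expected seq
    this.pending = new Map(); // configId -> Map(seq -> event)
  }

  receive(event) {
    const { configId, seq } = event;
    const expected = this.nextSeq.get(configId) ?? 1;
    if (seq < expected) return; // duplicate: already applied
    const buffer = this.pending.get(configId) ?? new Map();
    buffer.set(seq, event);
    this.pending.set(configId, buffer);

    // Drain as far as the contiguous run allows
    let next = expected;
    while (buffer.has(next)) {
      this.apply(buffer.get(next));
      buffer.delete(next);
      next++;
    }
    this.nextSeq.set(configId, next);
  }
}
```

A production version would also time out long-unfilled gaps and flag them for manual resolution, as described above.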

20.9 The 'Kitchen Sink' Node Type Antipattern

A common mistake is creating a single 'process' node type with dozens of optional attributes to cover every possible process. This defeats the purpose of the type system:

// ANTIPATTERN: one type to rule them all
{
  type_id: 'generic_process',
  attribute_schema: {
    properties: {
      colors: { type: 'number' },      // only for print
      area_pct: { type: 'number' },    // only for coatings
      die_ref: { type: 'string' },     // only for die cut
      edge_color: { type: 'string' },  // only for edge paint
      foil_type: { type: 'string' },   // only for foil stamp
      // ... 30 more optional fields
    }
  }
}

// CORRECT: specific types with relevant attributes
//   'print_process' has colors, coverage, method
//   'coating_process' has area_pct, thickness, finish
//   'cutting_process' has die_ref, complexity
// Each type is focused, validated, and self-documenting

The kitchen-sink type provides no validation (any combination of attributes is 'valid'), generates confusing UI forms (most fields are irrelevant), and makes pricing function dispatch ambiguous (which function handles a node with both 'colors' and 'die_ref'?). Keep types specific and focused.

20.10 Testing Strategy

Testing a graph-based pricing engine requires a different approach than testing CRUD operations. Here is the testing pyramid for this architecture:

Unit Tests: One per Pricing Function

Each pricing function should have 10-20 unit tests covering normal cases, edge cases (zero quantity, maximum coverage, minimum order quantities), boundary conditions (exactly at quantity break thresholds), and rounding behavior. These tests use a mocked PricingContext with known rate table values.

Integration Tests: Known Product Configurations

Create 5-10 reference configurations that represent your most common and most complex products. Compute the expected quote by hand (using a spreadsheet). The integration test loads the configuration, runs the engine, and asserts that every line item matches the spreadsheet within $0.01.

Shadow Testing: Engine vs. Legacy System

During migration, every quote generated by the legacy system should also be generated by the graph engine. Differences are logged and investigated. This is the most powerful validation tool because it uses real-world data with real-world complexity.

Property-Based Tests: Invariant Checking

Rather than testing specific outputs, test invariants that should always hold: the grand total equals the sum of the line items; the total never decreases as quantity increases; evaluating the same configuration twice produces identical results; and removing an optional process never increases the total.
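
Without bringing in a property-testing library, the idea can be sketched as a randomized loop over a stand-in pricing function; everything here is illustrative:

```javascript
// Stand-in for the real engine: setup charge plus tiered marginal
// rates (first 1000 units at 50 mills, the remainder at 40 mills).
function quoteTotal(quantity) {
  const tier1 = Math.min(quantity, 1000) * 50;
  const tier2 = Math.max(quantity - 1000, 0) * 40;
  return 150000 + tier1 + tier2;
}

// Property test: check invariants over many random quantities.
for (let i = 0; i < 1000; i++) {
  const qty = 1 + Math.floor(Math.random() * 100000);
  // Invariant 1: determinism - same input, same output
  if (quoteTotal(qty) !== quoteTotal(qty)) {
    throw new Error('non-deterministic at qty ' + qty);
  }
  // Invariant 2: total never decreases as quantity increases
  if (quoteTotal(qty + 1) < quoteTotal(qty)) {
    throw new Error('total decreased at qty ' + qty);
  }
}
```

Note the marginal-rate structure: a naive "cheaper unit price above 1,000" break would violate invariant 2 at the threshold, which is exactly the kind of bug this style of test catches.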

21. Glossary

Adjacency List — A graph representation where each node stores references (IDs) to its connected nodes. The flat-map format used in this architecture is an adjacency list.

ConfigNode — The universal data structure representing any element in a product configuration: a product, layer, process, or assembly step.

Configuration Graph — The complete DAG of ConfigNodes that defines a product. Stored as a single JSON document.

CQRS — Command Query Responsibility Segregation. A pattern where the write-side data model is different from the read-side data model. Writes go to the event store; reads come from purpose-built projections.

DAG — Directed Acyclic Graph. A graph where edges have direction and there are no cycles. Products are modeled as DAGs because a process can serve multiple layers (shared node).

Eventual Consistency — The property of a distributed system where read-side data may lag behind write-side data by a short interval (typically 1-5 seconds). A fundamental trade-off of CQRS.

Event Sourcing — A pattern where state changes are stored as immutable events rather than overwriting current state. Current state is derived by replaying the event log.

Event Upcasting — The process of converting old event formats to new ones during replay, enabling event schema evolution without data migration.

GIN Index — Generalized Inverted Index. A PostgreSQL index type that enables fast queries into JSONB documents.

Idempotency — The property where performing an operation multiple times produces the same result as performing it once. Critical for event consumers that may process the same event more than once.

JSONB — Binary JSON storage in PostgreSQL. Supports indexing, querying, and partial updates on JSON documents.

Memoization — Caching the result of a function call so that identical calls return the cached result instead of re-computing. Used in the pricing engine to avoid redundant evaluation of shared DAG nodes.

Node Type Definition — A schema that describes the attributes, allowed children, and default pricing function for a category of ConfigNode. Stored in a registry; drives UI rendering and validation.

Optimistic Concurrency — A strategy where updates check a version number before writing. If the version has changed since the data was read, the update is rejected and must be retried.

Post-Order Traversal — A tree/graph traversal that visits children before parents. Used by the pricing engine to compute costs bottom-up.

Pricing Context — The pre-loaded collection of rate tables, quantity curves, and global parameters used during quote computation. Loaded once per quote, not per node.

Pricing Function — A registered function that computes the cost of a single ConfigNode given its attributes, context, and children's results.

Projection — A read-side view of data derived from events. Optimized for a specific query pattern (e.g., job board, station queue). Can be rebuilt from events at any time.

Shadow Quoting — Running the new pricing engine in parallel with the legacy system to compare outputs. Used during migration to validate accuracy before switching over.

Strangler Fig — A migration pattern where a new system is built alongside the old one, progressively taking over traffic until the old system is retired. Named after the strangler fig tree that grows around its host.

YAGNI — You Aren't Gonna Need It. A principle that advises against building infrastructure or features in anticipation of needs that have not yet materialized.

Cycle Detection — An algorithm that identifies circular references in a graph. Uses three-color marking (white/gray/black) during DFS. Must run on every graph mutation.

Diamond Dependency — A pattern where two nodes share a common descendant, creating a diamond shape in the graph. Causes double-counting if not handled with memoization and cost-allocation rules.

Gang Running (N-Up) — Printing multiple copies of a product on a single press sheet, then cutting apart. Transforms the quantity context from finished pieces to press sheets.

Orphan Node — A node that exists in the flat map but is not reachable from the root node. Caused by incomplete deletion operations. Detected by comparing reachable nodes against all nodes.

Reverse Topological Sort — An ordering of graph nodes where every child appears before its parent. Guarantees that when a node is evaluated, all its children's results are already available.

Waste Cascading — The phenomenon where waste at a later production stage destroys value added by all earlier stages. Correct waste costing must account for cumulative value, not just raw material cost.

Reverse Yield Calculation — The correct method for computing effective quantities by working backward from required good output, accounting for compounding waste at each process stage.

Cross-Node Dependency — A pricing dependency where one node's cost calculation reads attributes from a sibling, parent, or distant node. Must be declared and checked for cycles.

Optimistic Concurrency Control — A strategy for handling concurrent edits by checking a version number before saving. Rejects writes that would overwrite changes made by another user since the data was loaded.

Borenstein Routing Model — A framework from a 2000 paper in the European Journal of Operational Research for representing manufacturing routing flexibility as DAGs, at three representation levels: Precedence Graph, Feasible Operations Tree, and Projected Route Graph.

Dependency Path — A directed path through the product graph from one node to another. Determines cache invalidation scope (which ancestors need re-pricing when a node changes), parallel evaluation boundaries (nodes without dependency paths can be priced simultaneously), and pricing function input resolution.

Forward Reachability — Starting from a node and following child edges to determine all descendants. Used to enumerate all components of a product for pricing and production tracking.

Reverse Reachability — Starting from a node and following parent edges upward to determine all ancestors. Used for cache invalidation: when a node changes, all ancestors need re-pricing.

Transitive Reduction — A graph simplification that removes redundant edges. An edge A-to-C is redundant if a path A-to-B-to-C already exists. Prevents double-counting in pricing aggregation and improves visual clarity.

d-Separation (in product graphs) — Borrowed from causal inference. Two nodes are d-separated if no dependency path connects them through the pricing context. d-Separated nodes can be evaluated in parallel because their pricing computations are independent.

Graph Depth — The longest path from the root node to any leaf. Equals the maximum recursion depth of the pricing engine. Typical values for print products: 2-5.

Graph Breadth — The maximum number of children at any single level. Determines the potential for parallel evaluation. A product with 12 layers has breadth 12 at level 1.

Shared Node Ratio — The percentage of nodes with more than one parent. A ratio of 0% means the graph is a pure tree. Higher ratios indicate more DAG complexity and more memoization benefit.

End of Document