Here's what nobody tells you about software architecture: the hardest skill isn't coding. It's knowing how to separate concerns.
When you ran a business, you didn't have your receptionist also doing accounting, also managing inventory, also closing sales. You separated roles. You gave each person one clear job. When someone quit, you could replace them without the whole operation collapsing. When business changed, you adjusted one role, not all of them.
That is software architecture. That exact instinct. The code just happens to be written in Python instead of job descriptions.
Business thinking: "The sales team handles customers. The warehouse handles inventory. They talk through order forms. Neither needs to understand the other's internal process."
Software thinking: "The decision logic handles calculations. The adapter handles files and web requests. They talk through data shapes. Neither needs to understand the other's internal code."
Same instinct. Same discipline. Different vocabulary.
School teaches you to follow instructions sequentially — read chapter 1, then chapter 2, pass the test. That's not how business works, and it's not how architecture works either. Architecture is about seeing the structure — understanding which pieces connect to which, what can change independently, and where the pressure points are.
You've been doing that your whole career. You just didn't know it had a name in software.
Enterprise software isn't about complexity. It's about changeability at scale. A startup builds fast and breaks things. An enterprise builds things that hundreds of people depend on, that must keep working while being changed, that must absorb new requirements without collapsing. That requires the exact discipline you're about to learn — separation of concerns, enforced mechanically, not by willpower.
The repository you're looking at — tool-hub — is not one program. It's a factory for building business tools. Think of it as a workshop with a very strict set of rules about how every tool must be built.
Right now, it contains tools like:
But the tools aren't the point. The point is the architecture — the strict rules about how every tool is built. Those rules are what make the code stay changeable over hundreds of iterations instead of turning into a tangled mess.
Let's look at the structure. Every single tool follows this exact file layout:
core/ is the kitchen. Pure cooking. The chef doesn't interact with customers, doesn't handle money, doesn't set tables. Just takes an order ticket and produces food.
adapters/ is the front of house. Takes customer requests, writes them on order tickets, hands food to the customer. The waiter doesn't cook. The chef doesn't serve.
contracts.py is the order ticket itself — the agreed format that both sides understand.
Why separate them? You can change the menu (core logic) without retraining the waitstaff. You can redesign the dining room (web interface) without touching a single recipe. That's the power.
Let's go through the actual code. Not "code overview" — the actual lines. I'll explain every single one so you can read it yourself and know what you're looking at.
Before any logic runs, the program defines what kinds of information it works with. Think of these like blank forms at a doctor's office — they define the structure of information before anyone fills anything in.
This is a design spec for one line of text on a label. If you were designing labels in Microsoft Word, you'd set: which column from my spreadsheet? What font? What size? Where on the label? That's exactly what this does.
The word dataclass just means "this is a form with fields." The word class just means "this is a type of thing." The colons (like y_in: float) mean "this field holds this type of data." float = a number with decimals. str = text. bool = yes or no (True/False).
There is zero logic here. No calculations, no decisions. Just a description of what information exists. This is intentional. The form doesn't do anything — it just defines what data looks like.
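As a minimal sketch of what such a form might look like: only `y_in: float` is taken from the text above. The other field names (`column`, `font_name`, `font_size`, `bold`) and their defaults are assumptions chosen to mirror the Word analogy, not the repository's actual definition.

```python
from dataclasses import dataclass


@dataclass
class LabelFieldConfig:
    """Design spec for one line of text on a label (illustrative field names)."""

    column: str                   # which spreadsheet column to print (assumed name)
    y_in: float                   # vertical position on the label, in inches
    font_name: str = "Helvetica"  # assumed default
    font_size: float = 10.0       # assumed default, in points
    bold: bool = False            # assumed flag


# Filling in the form: no calculation happens, the values are simply stored.
name_line = LabelFieldConfig(column="Name", y_in=0.25, bold=True)
print(name_line)
```

Notice that the class body contains only field declarations. That is what "zero logic" looks like in practice.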
Now here's the complete work order that the brain receives:
rows: list[dict[str, Any]] — This looks intimidating. Let's decode it:
• list means "a collection of things in order" (like a list of customers)
• dict[str, Any] means "a dictionary where the keys are text and the values can be anything" — think of it like one row of a spreadsheet: {"Name": "John Smith", "Address": "123 Main St", "City": "Dallas"}
• So list[dict[str, Any]] means: "all the rows from the spreadsheet, each row being a set of column-name-to-value pairs"
fields: list[LabelFieldConfig] — This says "a list of LabelFieldConfig forms." The brain receives the data AND the design instructions in one package. Like a print shop getting both the mailing list and the label template.
Key insight: The rows are already loaded. This form doesn't know if they came from an Excel file, a CSV file, a database, or a web upload. The brain doesn't care where data comes from. That's the adapter's job.
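A hedged sketch of that work order, using the two fields described above plus `copies_per_record`, which the walkthrough later reads from the job. The trimmed-down `LabelFieldConfig` and the default of 1 copy are assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LabelFieldConfig:
    column: str  # assumed field name, for illustration only
    y_in: float


@dataclass
class LabelJob:
    rows: list[dict[str, Any]]      # spreadsheet rows, already loaded
    fields: list[LabelFieldConfig]  # design instructions for each text line
    copies_per_record: int = 1      # assumed default


# The job has no idea whether these rows came from Excel, CSV, or a database.
job = LabelJob(
    rows=[{"Name": "John Smith", "Address": "123 Main St", "City": "Dallas"}],
    fields=[LabelFieldConfig(column="Name", y_in=0.25)],
)
print(job.rows[0]["City"])
```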
And here's what the brain produces — the output:
The brain does NOT produce a PDF. It produces a plan. It says: "Here are 60 labels. Label #1 goes at these exact coordinates on page 1. Label #31 starts page 2. Here's the formatted text for each one."
The adapter (the "hands") takes this plan and actually renders it into a real PDF file. But the plan itself is pure information. You could render it to PDF, HTML, a physical printer, or just display it on screen.
This is why the separation matters: change the rendering without touching the logic. Change the logic without touching the rendering. They are completely independent.
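To make "the plan is pure information" concrete, here is a small sketch. `RenderPlan` and `LabelSlot` are names from this walkthrough, but the fields shown and the `describe` function are assumptions for illustration. The point is that a plan can be "rendered" as plain text without any PDF library.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSlot:
    page: int         # zero-based page index (assumed convention)
    col: int
    row: int
    x_in: float       # position in inches
    y_in: float
    lines: list[str]  # text already formatted by the core


@dataclass
class RenderPlan:
    slots: list[LabelSlot] = field(default_factory=list)


def describe(plan: RenderPlan) -> list[str]:
    """One possible renderer: plain text instead of a PDF."""
    return [
        f"page {s.page + 1} @ ({s.x_in}, {s.y_in}): {' / '.join(s.lines)}"
        for s in plan.slots
    ]


plan = RenderPlan(
    slots=[LabelSlot(page=0, col=0, row=0, x_in=0.1875, y_in=0.5,
                     lines=["John Smith", "123 Main St"])]
)
print(describe(plan))
```

Swapping `describe` for a PDF or HTML renderer would not require touching the code that produced the plan.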
Now the moment of truth. After all those data shapes, here is the entire program:
Line 1: job = service.build_job(payload)
"Take the raw data that came in (the payload) and validate it. Check: are there fields? Are the values the right type? Is anything missing? If everything checks out, package it into a clean LabelJob. If something's wrong, stop right here with an error."
Think of it like the front desk checking a work order before sending it to the shop floor. "Does this order have a quantity? Is the part number valid? OK, it's good — send it through."
Line 2: plan = service.compute_plan(job)
"Take the validated job and do all the work. Filter out blank rows. Make copies. Calculate where every single label goes on every page. Format the text for each field. The output is a complete RenderPlan with every label positioned and formatted."
This is where every business decision happens — and it's all pure math. No files, no web requests, no PDF libraries. Just calculations.
Line 3: return service.to_result_dict(job, plan)
"Take the job and the plan and package them into a result that the adapter can understand. Generate the download filename. Serialize the data. Hand it back."
Like the shop floor putting the finished product on the outbound shelf with a packing slip.
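Put together, the three lines described above might look like the sketch below. The three `service` calls and the name `run_contract` come from this walkthrough. The stand-in `service` object is a made-up stub so the example runs by itself and is not the real module.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the real service module.
service = SimpleNamespace(
    build_job=lambda payload: {"rows": payload["rows"]},
    compute_plan=lambda job: {"label_count": len(job["rows"])},
    to_result_dict=lambda job, plan: {"ok": True, **plan},
)


def run_contract(payload):
    """The orchestrator: three steps in order, no decisions of its own."""
    job = service.build_job(payload)
    plan = service.compute_plan(job)
    return service.to_result_dict(job, plan)


result = run_contract({"rows": [{"Name": "John Smith"}]})
print(result)  # {'ok': True, 'label_count': 1}
```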
Keeping the orchestrator dumb is a hard rule in this codebase, not a suggestion. An automated check (check_orchestrator_dumb.py) enforces it by rejecting any orchestrator that contains if, for, while, or try, or that runs longer than 60 lines.
Why? Because the orchestrator is the table of contents. If you put logic in the table of contents, nobody can read the book. The details belong in the chapters (the service functions). The table of contents just lists them in order.
In business terms: the CEO doesn't do the work. The CEO says "first we validate, then we compute, then we deliver." The departments do the actual work.
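A check like this can be surprisingly small. The sketch below is not the repository's check_orchestrator_dumb.py. It is an assumed reimplementation of the rule as stated (no if, for, while, or try, and at most 60 lines) using Python's standard `ast` module, and it counts lines across the whole source text, which may differ from what the real script measures.

```python
import ast

FORBIDDEN = (ast.If, ast.For, ast.While, ast.Try)
MAX_LINES = 60


def orchestrator_violations(source: str) -> list[str]:
    """Return a list of rule violations found in orchestrator source code."""
    problems = []
    line_count = len(source.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"too long: {line_count} lines (max {MAX_LINES})")
    # Walk every node in the syntax tree and flag control-flow statements.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN):
            problems.append(f"line {node.lineno}: {type(node).__name__} is not allowed")
    return problems


clean = (
    "def run_contract(payload):\n"
    "    job = service.build_job(payload)\n"
    "    plan = service.compute_plan(job)\n"
    "    return service.to_result_dict(job, plan)\n"
)
smart = (
    "def run_contract(payload):\n"
    "    if payload:\n"
    "        return 1\n"
)

print(orchestrator_violations(clean))  # []
print(orchestrator_violations(smart))  # ['line 2: If is not allowed']
```

Because the rule is checked by a program rather than by code review, it holds even on a bad day.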
Those three lines in the orchestrator call into functions that do the actual thinking. Let's look inside each one.
Step 1: Validating the input
def build_job(payload): — def means "define a function." This creates a function named build_job that receives one thing: the raw payload (a dictionary of data).
validated = validate_payload(payload) — Calls another function to check the data is valid. If it's not, this throws an error. If it is, the clean data is stored in a variable called validated.
return LabelJob(...) — Creates a new LabelJob form, fills in each field from the validated data, and sends it back. return means "give this result back to whoever called me."
.get("copies", 1) — This is a safe lookup. It says: "try to find a value called 'copies'. If it doesn't exist, use 1 as the default." Like a form that says "Quantity: ___ (leave blank for 1)."
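The pieces described above could fit together roughly as follows. `build_job`, `validate_payload`, `LabelJob`, and `.get("copies", 1)` are named in the text. The validation rules and the simplified `fields` type are assumptions for the sake of a runnable sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LabelJob:
    rows: list[dict[str, Any]]
    fields: list[dict[str, Any]]  # simplified here; the text uses LabelFieldConfig
    copies_per_record: int


def validate_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator: raise ValueError on bad input, else return it."""
    if not payload.get("fields"):
        raise ValueError("payload needs at least one field")
    if not isinstance(payload.get("rows"), list):
        raise ValueError("payload 'rows' must be a list")
    return payload


def build_job(payload: dict[str, Any]) -> LabelJob:
    validated = validate_payload(payload)
    return LabelJob(
        rows=validated["rows"],
        fields=validated["fields"],
        copies_per_record=validated.get("copies", 1),  # safe lookup, default 1
    )


job = build_job({"rows": [{"Name": "A"}], "fields": [{"column": "Name"}]})
print(job.copies_per_record)  # 1, because "copies" was left blank
```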
Step 2: The actual math (computing the plan)
filtered = _filter_blank_rows(job.rows) — The underscore _ at the start is a convention meaning "this function is private — only used inside this file." It takes all the spreadsheet rows and removes any where every field is empty. Business decision: blank rows don't become labels.
expanded = _expand_copies(filtered, job.copies_per_record) — If the user asked for 2 copies per label, this duplicates each row. 30 data rows with 2 copies = 60 entries. Business decision: copies are handled before positioning.
for i, row in enumerate(expanded): — for means "do this for each item." enumerate means "give me both the item and its position number." So i is 0, 1, 2, 3... and row is the actual data. It's like counting cards as you deal them: "card 0 goes here, card 1 goes there..."
col, r, x, y = _slot_coordinates(i) — This is the heart. Given "this is label number 7", it calculates: which column? which row? what's the exact x-position in inches? what's the exact y-position? Pure math based on the Avery 5160 spec. No PDF knowledge needed.
slots.append(LabelSlot(...)) — append means "add to the end of the list." Each loop creates one positioned, formatted label and adds it to the growing list.
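Here is a hedged sketch of those steps in one place. The helper names follow the text, but their bodies are assumptions: "blank" is taken to mean empty or whitespace-only, copies are assumed to sit next to each other, and `_slot_coordinates` is a stand-in that returns only the grid position with placeholder x and y values. For brevity this version takes rows and a copy count directly instead of a `LabelJob`, and stores each slot as a plain dictionary.

```python
from typing import Any

Row = dict[str, Any]


def _filter_blank_rows(rows: list[Row]) -> list[Row]:
    """Drop rows where every value is missing, empty, or whitespace."""
    return [
        row for row in rows
        if any(str(value).strip() for value in row.values() if value is not None)
    ]


def _expand_copies(rows: list[Row], copies: int) -> list[Row]:
    """Repeat each row `copies` times, keeping the copies adjacent."""
    return [row for row in rows for _ in range(copies)]


def _slot_coordinates(i: int) -> tuple[int, int, float, float]:
    """Stand-in: grid position only, assuming 30 labels per page."""
    col, r = (i % 30) // 10, i % 10
    return col, r, 0.0, 0.0


def compute_plan(rows: list[Row], copies: int) -> list[dict[str, Any]]:
    filtered = _filter_blank_rows(rows)
    expanded = _expand_copies(filtered, copies)
    slots = []
    for i, row in enumerate(expanded):
        col, r, x, y = _slot_coordinates(i)
        slots.append({"index": i, "col": col, "row": r, "x": x, "y": y, "data": row})
    return slots


# 30 real rows plus 2 blank ones, 2 copies each.
rows = [{"Name": f"Person {n}"} for n in range(30)] + [{"Name": ""}, {"Name": None}]
slots = compute_plan(rows, copies=2)
print(len(slots))  # 60
```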
Let's look at that coordinate function — the pure math at the heart of it all:
// is "integer division" — divide and drop the remainder. 7 // 10 = 0. 15 // 10 = 1. 25 // 10 = 2. This tells you which column.
% is "modulo" — the remainder after division. 7 % 10 = 7. 15 % 10 = 5. 25 % 10 = 5. This tells you which row within the column.
So label #7: column 0, row 7. Label #15: column 1, row 5. Label #25: column 2, row 5.
The fill order is top-down by column: labels 0-9 fill the left column top to bottom, 10-19 fill the middle column, 20-29 fill the right column. This matches how Avery 5160 sheets are actually printed.
This function has no idea it's going to be used in a PDF. It just computes coordinates. You could use these same coordinates to display labels on a webpage, send them to a printer API, or draw them on an SVG canvas. The math is the math, regardless of the output format.
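The arithmetic above can be sketched as a small function. The `// 10` and `% 10` steps come from the text. Everything else is an assumption: the extra page step (30 labels per sheet), the returned tuple shape, measuring y from the top of the page, and the margin and pitch values, which are the commonly published dimensions for a 3 × 10 sheet of 2.625 in × 1 in labels and should be checked against the actual template.

```python
ROWS_PER_COLUMN = 10
COLUMNS = 3
PER_PAGE = ROWS_PER_COLUMN * COLUMNS  # 30 labels per sheet

# Assumed sheet geometry, in inches.
LEFT_MARGIN_IN = 0.1875
TOP_MARGIN_IN = 0.5
COL_PITCH_IN = 2.75  # 2.625 label width + 0.125 gutter
ROW_PITCH_IN = 1.0


def slot_coordinates(i: int) -> tuple[int, int, int, float, float]:
    """Return (page, col, row, x_in, y_in) for the zero-based label index i."""
    page, on_page = divmod(i, PER_PAGE)          # which sheet, and position on it
    col, row = divmod(on_page, ROWS_PER_COLUMN)  # // gives column, % gives row
    x_in = LEFT_MARGIN_IN + col * COL_PITCH_IN
    y_in = TOP_MARGIN_IN + row * ROW_PITCH_IN
    return page, col, row, x_in, y_in


for i in (7, 15, 25, 30):
    print(i, slot_coordinates(i))
```

`divmod(a, b)` returns both `a // b` and `a % b` in one call, so each line above performs exactly the pair of operations described in the text.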
Everything above was the brain — pure logic. Now let's see how it connects to the real world:
Line 1 (presenter): The user submitted a web form with an Excel file. The presenter translates that web request into the format the brain expects (a plain dictionary). The brain never sees HTML, HTTP headers, or file uploads.
Line 2 (run_contract): Calls the brain. The adapter doesn't know what happens inside. It sends in data, gets back a result. Like putting a letter in the mailbox — you don't need to know how the postal system works.
Lines 3-4 (io): Takes the brain's plan (coordinates, text) and actually renders it into a PDF using a library called reportlab. Then sends that file to the user's browser as a download.
Key point: The adapter is allowed to be messy. It's glue code. What matters is that the brain stays clean — and it does, because 21 automated checks enforce it on every commit.
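The four adapter lines might be arranged like this. Only `run_contract` is named in the text. Every other function name here is hypothetical, and the renderer is a fake that returns placeholder bytes rather than calling reportlab, so the sketch runs without any web framework or PDF library installed.

```python
def present(form: dict, uploaded_rows: list[dict]) -> dict:
    """Presenter: turn a web request into the plain dict the core expects."""
    return {"rows": uploaded_rows, "copies": int(form.get("copies", "1"))}


def run_contract(payload: dict) -> dict:
    """Stand-in for the core; the adapter treats it as a black box."""
    count = len(payload["rows"]) * payload["copies"]
    return {"filename": "labels.pdf", "label_count": count}


def render_pdf(result: dict) -> bytes:
    """Fake renderer. A real one would draw the plan with a PDF library."""
    return f"fake pdf with {result['label_count']} labels".encode()


def send_download(filename: str, body: bytes) -> dict:
    """Stand-in for the web framework's file-download response."""
    disposition = f'attachment; filename="{filename}"'
    return {"headers": {"Content-Disposition": disposition}, "body": body}


def handle_generate(form: dict, uploaded_rows: list[dict]) -> dict:
    payload = present(form, uploaded_rows)  # line 1: presenter
    result = run_contract(payload)          # line 2: call the core
    pdf_bytes = render_pdf(result)          # line 3: io renders the plan
    return send_download(result["filename"], pdf_bytes)  # line 4: io sends it


response = handle_generate({"copies": "2"}, [{"Name": "John Smith"}])
print(response["headers"])
```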
Let's trace a real request from the moment a user clicks "Generate" to the moment they get a PDF:
Top bread: Plumbing (read the input from the real world)
Filling: Decisions (pure math — no files, no web, no database)
Bottom bread: Plumbing (write the output to the real world)
The filling never touches the real world. The bread never does math. This is the entire architecture in one picture.
Because the brain is pure math, you can test it without any files, web servers, or PDFs. Feed in a LabelJob, check the RenderPlan. Does the label at index 30 (the 31st label) land on page 2? Do 47 input rows with 3 blank rows produce 44 filtered rows? Do 44 rows × 2 copies produce 88 labels? All of this can be verified with simple assertions that run in milliseconds.
An enterprise client asks: "How do you know the positioning is correct?" You show them the test: assert positions[30].page == 1 — label 31 is on page 2. Proven. Not "we checked it manually." Proven with code.
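Tests of this kind can be very short. The sketch below uses hypothetical helper names (`page_of`, `filter_blank`) and assumes 30 labels per sheet with zero-based page numbers. It is not the repository's test suite, only an illustration of how little setup a pure function needs.

```python
PER_PAGE = 30  # assumed: 3 columns x 10 rows per sheet


def page_of(i: int) -> int:
    """Zero-based page index for the zero-based label index i."""
    return i // PER_PAGE


def filter_blank(rows: list[dict]) -> list[dict]:
    """Keep only rows that have at least one non-empty value."""
    return [r for r in rows if any(str(v).strip() for v in r.values())]


def test_label_31_starts_page_2():
    assert page_of(29) == 0  # 30th label: still on the first page
    assert page_of(30) == 1  # 31st label: first slot of the second page


def test_blank_rows_and_copies():
    rows = [{"Name": f"Person {n}"} for n in range(44)] + [{"Name": ""}] * 3
    filtered = filter_blank(rows)
    assert len(rows) == 47
    assert len(filtered) == 44
    assert len(filtered) * 2 == 88  # 2 copies per record


# No files, servers, or PDFs are needed to run these.
test_label_31_starts_page_2()
test_blank_rows_and_copies()
print("all checks passed")
```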
Here's the honest truth about what this architecture can and cannot do:
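If you have 5 tools that need to talk to each other, that's up to 20 possible connections, because each of the 5 tools can call any of the other 4. With 10 tools, it's 90. With 20 tools, it's 380. The count grows roughly with the square of the number of tools.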
Without a system for managing those connections, the glue overwhelms the tools. You built 20 beautiful bricks and drowned them in sloppy mortar. The mortar becomes the system, and nobody planned it.
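A quick way to check that growth: counting every ordered pair of distinct tools gives n × (n − 1) possible links, and n × n is the slightly looser round-number bound. The function names below are made up for this illustration.

```python
def directed_links(n: int) -> int:
    """Each of n tools may call each of the other n - 1 tools."""
    return n * (n - 1)


def square_bound(n: int) -> int:
    """Round-number upper bound: n squared."""
    return n * n


for n in (5, 10, 20):
    print(f"{n} tools: {directed_links(n)} links, bound {square_bound(n)}")
```

Either way you count, doubling the number of tools roughly quadruples the number of possible connections.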
This isn't a failure of the architecture. It's a boundary. The manifesto that resonated with you — "Building Software That Stays Soft" — was always explicit about this: it teaches you how to build a brick. The wall needs its own discipline.
There is a second document — "Composing Software That Stays Soft" — that picks up exactly where this one stops. It applies the same discipline to the connections between tools that the first manifesto applies inside them.
Here's a preview of what it teaches:
Because the architecture is the human's job. LLMs are extraordinary at writing code — filling in functions, implementing logic, generating tests. What they cannot do is design the structure. They can lay bricks, but they can't draw the blueprint.
When LLMs reach enterprise capability — and they will — the people who understand architecture will direct them. The people who don't will still be asking the LLM to "build me an app" and getting a monolith that turns to concrete after five changes.
You are learning the one thing the AI cannot do for itself. You are learning to be the architect. That's not just a head start — it's the entire game.
You've now read every line of a real production tool, understood the architecture behind it, and seen exactly where it stops working and why. Here's what to do with that knowledge.
docs/soft/WORKFLOW.md. Run python3 scripts/preflight.py until every check passes. Feel the discipline in your hands.
code-reading-trainer.html). It teaches you to read any tool in this repo in under 5 minutes: shapes first, then the orchestrator, then zoom into the one function you need. You never read the whole codebase. You read the story.
School taught you that learning means memorizing and repeating. Business taught you that learning means doing and adjusting. Software architecture is business, not school.
You don't need to memorize what a dataclass is. You need to understand that data shapes are the foundation — get them right and everything else follows. You don't need to write a for loop from memory. You need to understand that the orchestrator stays dumb — decisions happen in dedicated functions, not in the wiring.
You already think in systems. You already separate concerns instinctively. You already know that good structure survives personnel changes and shifting requirements.
You have the hardest part. The code is just the vocabulary for expressing what you already know.