Shipping an agent your accountant will actually trust

Accountants don’t trust AI. They trust audit trails, approvals, and paper. If you want an agent anywhere near anything financial, you need to earn that trust the hard way — by making it behave like the kind of system they already trust.

This is a build log from a real engagement. The names are changed; the architecture is not.

The problem

The client had a manual AP workflow: invoices arrived by email, a human typed figures into a spreadsheet, a second human reviewed, a third approved. Slow, error-prone, and expensive. They wanted to automate it. They did not want to “just trust the AI.”

The constraint that shaped everything

Early in the conversation, the controller said: “I need to know that if something goes wrong at 2 AM on a Tuesday, I can read a log and understand exactly what happened and why.”

That sentence determined the architecture.

What we built

The agent doesn’t make decisions. It proposes them. Every action — extract invoice data, match to PO, flag for review, route for approval — is logged as a discrete event with a reason, a confidence level, and a pointer to the source data.

{
  "event": "invoice.matched",
  "invoice_id": "INV-2024-0412",
  "matched_po": "PO-8871",
  "confidence": 0.94,
  "reason": "vendor name, amount, and line items match within 2%",
  "timestamp": "2024-04-08T14:22:11Z",
  "requires_approval": false
}
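For readers who want to see the shape of such an event in code, here is a minimal Python sketch. It is not the client's implementation. The AgentEvent class and its to_json_line method are names invented for illustration, and source_ref is a hypothetical field standing in for the "pointer to the source data", which the sample event does not show.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class AgentEvent:
    """One proposed action, mirroring the fields of the sample event."""
    event: str
    invoice_id: str
    matched_po: str
    confidence: float
    reason: str
    timestamp: str                    # ISO 8601, UTC
    requires_approval: bool
    source_ref: Optional[str] = None  # hypothetical pointer to the source document

    def to_json_line(self) -> str:
        # Drop unset optional fields so the output keeps the logged shape.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload)


example = AgentEvent(
    event="invoice.matched",
    invoice_id="INV-2024-0412",
    matched_po="PO-8871",
    confidence=0.94,
    reason="vendor name, amount, and line items match within 2%",
    timestamp="2024-04-08T14:22:11Z",
    requires_approval=False,
)
print(example.to_json_line())
```

Freezing the dataclass is a small design choice in the same spirit as the log itself: once an event exists, nothing downstream can quietly edit it.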

If confidence falls below a threshold, the item enters a human review queue — not an email, not a Slack message, but a queue in a system the controller already uses.
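The routing rule is simple enough to sketch in a few lines of Python. The threshold value below is an assumption (the real one is not stated here), and review_queue is an in-memory stand-in for the queue in the controller's existing system.

```python
from collections import deque

REVIEW_THRESHOLD = 0.90  # assumed value for illustration; the real threshold is not stated

review_queue = deque()   # stand-in for the queue in the controller's existing system


def route(event: dict, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'review' if the event needs a human, otherwise 'proceed'."""
    if event["confidence"] < threshold:
        review_queue.append(event)
        return "review"
    return "proceed"
```

With these assumed numbers, the sample event at 0.94 would proceed, and anything scoring under 0.90 would wait for a person.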

The architecture

Input: email + attachment via a monitored inbox (SES + Lambda)
Extraction: structured data from PDFs (Textract + a validation layer)
Matching: deterministic rules first, model-assisted disambiguation second
Logging: every event written to an append-only store before any action is taken
Approval: routed through the existing ERP workflow, not a new tool
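Two of these stages carry most of the weight: rules before model, and log before action. A rough Python sketch of how they fit together follows, under stated assumptions. Every name here (Invoice, PurchaseOrder, AppendOnlyLog, propose_match, the "invoice.unmatched" event, the "method" field) is hypothetical. The 2% tolerance is borrowed from the sample event's reason string, and the disambiguate callable stands in for the model-assisted step.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    vendor: str
    amount: float
    po_number: Optional[str]


@dataclass(frozen=True)
class PurchaseOrder:
    po_number: str
    vendor: str
    amount: float


class AppendOnlyLog:
    """In-memory stand-in for an append-only store: no update, no delete."""

    def __init__(self) -> None:
        self._events = []

    def append(self, event: dict) -> None:
        self._events.append(dict(event))  # copy so callers cannot mutate history

    def events(self) -> tuple:
        return tuple(dict(e) for e in self._events)


def match_deterministic(inv: Invoice, pos: Sequence[PurchaseOrder],
                        tolerance: float = 0.02) -> Optional[PurchaseOrder]:
    """Rule pass: same PO number, same vendor, amount within tolerance."""
    for po in pos:
        same_vendor = inv.vendor.strip().lower() == po.vendor.strip().lower()
        close_amount = abs(inv.amount - po.amount) <= tolerance * po.amount
        if inv.po_number == po.po_number and same_vendor and close_amount:
            return po
    return None


def propose_match(inv: Invoice, pos: Sequence[PurchaseOrder], log: AppendOnlyLog,
                  disambiguate: Callable, act: Callable) -> dict:
    """Try rules first, fall back to the model, and log before acting."""
    po = match_deterministic(inv, pos)
    method = "rules"
    if po is None:
        po = disambiguate(inv, pos)  # model-assisted step, may return None
        method = "model"
    event = {
        "event": "invoice.matched" if po else "invoice.unmatched",
        "invoice_id": inv.invoice_id,
        "matched_po": po.po_number if po else None,
        "method": method,
    }
    log.append(event)  # the event is recorded before any action runs
    act(event)
    return event
```

The ordering inside propose_match is the point: if act fails partway through, the log already says what was about to happen and why.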

No new UI. No dashboard. No Slack bot.

Six months later

Data-entry error rate: 3.4% under the manual process, 0.1% now (model errors are caught by the validation layer). Time from invoice receipt to payment approval: 4.1 days before, 0.8 days now. The controller has reviewed the event log once, during the initial rollout. She said it read like a filing cabinet, and meant it as a compliment.