Transaction Forensics
Stanford TECH 41 — Christopher Bailey — Built with Claude Code (AI pair-programming)
834 Tests Passing · 7 Data Adapters · 4 Analysis Engines
AI Disclosure: Architecture and analysis methodology by Christopher Bailey. Implementation pair-programmed with Claude Code (Anthropic). All commits co-authored — see git history for full attribution.
Structured data tells you what happened.
Unstructured text tells you why.
Transaction Forensics is a process mining engine that combines ERP transaction logs with the unstructured text that surrounds them — emails, Slack messages, progress reports, call transcripts, and order notes — to surface the discrepancies between what organizations report and what actually happened. Built as a Stanford TECH 41 project.
The Core Insight
Every enterprise system generates two kinds of data. Structured transactions — timestamps, amounts, stage changes, user IDs — tell you the official story. Unstructured text — the emails, Slack threads, meeting notes, timesheets, SOWs, and progress reports that surround those transactions — tell you what actually happened. The gap between them is where fraud, waste, and dysfunction hide.
Structured Data Says
CRM OPPORTUNITY
"Deal in Negotiation for 6 months"
SAP P2P
"Purchase Order created 03/15"
TIMESHEET
"40 hours billed to Project Alpha"
PROJECT STATUS
"Phase 2: On Track, Green"
vs
Unstructured Text Reveals
SLACK THREAD
"Customer said not ready — but Sales moved it forward anyway. No documented sign-off."
EMAIL CHAIN
"Requisition wasn't approved yet. Create the PO now, we'll get the paperwork later."
PROGRESS REPORT
"Assigned to Project Alpha but worked on Beta all week. SOW deliverables not started."
MEETING TRANSCRIPT
"We're 3 weeks behind. Tell the client we're on track while we figure this out."
Evidence From Our Analysis
SAP IDES: Retroactive Documentation
Structured data shows PO and PR both exist. Timestamps reveal the PO was created before the PR — approval was documented after the fact. Only detectable by cross-referencing temporal sequence.
See IDES Compliance tab →
HERB: Approvals in Slack, Not Systems
37,064 enterprise documents analyzed (32.8K Slack messages, 3.6K pull requests, 400 docs, 321 transcripts). 1,226 "LGTM/Approved" messages found in Slack channels — informal approvals with no audit trail. Structured approval workflows show no record of these decisions.
See NLP Patterns tab →
BPI: 57K Payment Blocks — Why?
Structured data shows 22.7% of POs hit payment blocks. The event log can't explain why — that answer lives in vendor correspondence, invoice discrepancy notes, and buyer emails that aren't in the event log.
See BPI Challenge tab →
20 years of ERP consulting taught me this: In every engagement where something went wrong — billing disputes, project failures, compliance gaps — the structured transaction data looked clean. The truth was always in the unstructured layer: the email where someone said "skip the approval," the timesheet that didn't match the progress report, the SOW deliverable that was marked complete but never started. This tool automates finding those discrepancies at scale.
System Architecture
Data Sources
SAP ERP (IDES/ECC)
Salesforce CRM
BPI Challenge (XES)
Slack / HERB Comms
NetSuite ERP
CSV / Custom
7 Adapters (BPI, CSV, ECC, S4, SALT, SFDC, Synthetic), each implementing the IDataAdapter interface to normalize sources into a unified event log
Analysis Engines
Conformance Checker
Token-based replay
van der Aalst algorithms
Process model builder
Temporal Analyzer
Throughput times
Bottleneck detection
Delay probability
Pattern Clustering
TF-IDF + K-Means
Effect sizing (Cohen's d)
Stability bootstrap
Cross-System Resolver
Entity matching
Levenshtein + proximity
Gap detection
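The resolver's string-matching half can be sketched in a few lines: a classic Levenshtein edit distance plus a threshold filter. The function names and threshold here are illustrative, not the repo's actual tuning, and the temporal-proximity weighting is omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free when chars match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def match_entities(name: str, candidates: list[str], max_dist: int = 2) -> list[str]:
    """Fuzzy-match an entity name against candidates from another system."""
    return [c for c in candidates if levenshtein(name.lower(), c.lower()) <= max_dist]
```

In practice the engine also weighs event proximity in time; this sketch shows only the name-matching step.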
Forensic Output: Compliance Violations, Bottleneck Reports, Pattern Cards, Evidence Ledger
TypeScript (MCP Server)
Python 3.11 (Pattern Engine)
834 tests (602 TS + 232 Py)
Deterministic (seed=42)
Zero frontend dependencies
Four Forensic Lenses
8,800
CRM Pipeline Forensics
Sales opportunity analysis — win rates, velocity patterns, agent performance, quarter-end compression. Kaggle real-world CRM data.
Explore →
251K
BPI Challenge 2019
Real purchase-to-pay from a multinational. 1.6M events, payment blocks, process variability, resource concentration risks.
Explore →
7
SAP IDES Compliance
Compliance violations in SAP's own demo system. Maverick buying, retroactive documentation, segregation of duties risks.
Explore →
37K
NLP Pattern Analysis
Salesforce HERB — 37K documents (Slack, PRs, transcripts) clustered into communication patterns. Network graphs, bridge users, team dynamics.
Explore →
Real-World Client Data
3 anonymized engagements — 3M+ ERP records, $103K savings, ITGC violations, 28.6% RMA rate
License audit + ticket forensics + high-growth hardware company ERP forensics with credit hold overrides and SOD violations
View Cases →
Design Principles
Adapter Pattern
Every data source implements IDataAdapter — normalize once, analyze everywhere. Adding a new ERP means writing one adapter, not rewriting analysis logic. Currently: SAP, Salesforce, NetSuite, BPI (XES/OCEL), CSV, synthetic.
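A minimal sketch of the idea in Python (the real IDataAdapter is a TypeScript interface with 8 methods; the event fields and row keys here are illustrative):

```python
from dataclasses import dataclass
from typing import Iterator, Protocol

@dataclass
class Event:
    """Unified event-log record every adapter normalizes into."""
    case_id: str
    activity: str
    timestamp: str
    resource: str

class DataAdapter(Protocol):
    def read_events(self) -> Iterator[Event]: ...

class CSVAdapter:
    """One concrete adapter; a new ERP means one more class like this."""
    def __init__(self, rows: list[dict]):
        self.rows = rows

    def read_events(self) -> Iterator[Event]:
        for r in self.rows:
            yield Event(r["case"], r["activity"], r["ts"], r.get("user", ""))

def count_cases(adapter: DataAdapter) -> int:
    """Engines are written once against the protocol, not per source."""
    return len({e.case_id for e in adapter.read_events()})
```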
Deterministic Reproducibility
All analysis uses seed=42. Every pattern card, every cluster, every statistical test can be reproduced exactly. Run make demo and get identical output. No non-determinism in the forensic chain.
Evidence-Based Findings
Every claim links to an evidence ledger entry with source files, row counts, timestamps, and reproducibility parameters. Effect sizes use Cohen's d with 95% CI. Weak results are labeled as exploratory — no overclaiming.
How AI Was Used — Honest Accounting
What I Did (Christopher Bailey)
• Defined the problem space and research questions
• Selected data sources and licensed datasets
• Designed the adapter architecture and analysis pipeline
• Chose conformance algorithms (van der Aalst token replay)
• Interpreted findings and wrote forensic narratives
• Determined what's a real finding vs. a statistical artifact
• Real-world client engagement and domain expertise (20 yrs ERP)
What Claude Code Did (AI Pair-Programmer)
• Implemented data adapters and parsers (TypeScript)
• Built pattern engine and clustering pipeline (Python)
• Wrote conformance checking engine
• Generated test suites (834 tests)
• Built this dashboard (vanilla HTML/CSS/JS)
• Statistical computations (effect sizing, CI, p-values)
• All code visible in git history with co-author tags
The honest version: Claude Code is a force multiplier. The 834-test, 7-adapter, 4-engine system you see here was built in weeks, not months. But the AI doesn't know what's worth finding — it doesn't know that a PO-before-PR is a Sarbanes-Oxley risk, or that 22.7% payment block rates are 4x industry norms. Domain expertise decides what to look for; AI makes looking fast.
Data Sources
Kaggle CRM Sales Opportunities — Apache 2.0
BPI Challenge 2019 — 4TU.ResearchData, CC BY 4.0
SAP IDES — sap-extractor, MIT License
Salesforce HERB — HuggingFace, CC-BY-NC-4.0
Client data — anonymized, used with permission
Reproduce This
git clone https://github.com/chrbailey/SAP-Transaction-Forensics
make demo      # one-command bootstrap
make test      # 834 tests
make demo-kaggle # real Kaggle CRM data
View on GitHub →
Pipeline Overview
63.2%
Win Rate
4,238 Won / 6,711 Closed
$2,361
Avg Deal Size
Across all closed-won deals
57 days
Median Velocity
Time from open to close
2,089
Open Pipeline
Opportunities in progress
Conformance Analysis
New Business Pipeline Model
Expected stage sequence for opportunity progression
Prospecting
Qualification
Needs Analysis
Value Proposition
Id. Decision Makers
Perception Analysis
Proposal / Price Quote
Negotiation / Review
Closed Won
0.05
Avg Fitness Score
Average fitness 0.05 — expected when real-world CRM data is measured against an aspirational 8-stage model. Most organizations skip stages; low conformance is typical, not alarming.
53,862
Deviations Detected
Stage skips, reversals, and out-of-sequence transitions detected across 8,300 opportunities (closed + open) with stage history.
8,300
Cases Analyzed
All opportunities with stage history (closed + open) analyzed for sequence conformance against the defined process model.
Quarter-End Compression
Monthly close distribution — QE months highlighted in orange
Jan: 0 · Feb: 0 · Mar: 647 · Apr: 586 · May: 805 · Jun: 641 · Jul: 627 · Aug: 785 · Sep: 635 · Oct: 566 · Nov: 768 · Dec: 651
38.4% close in quarter-end months (Mar/Jun/Sep/Dec) — 1.15× the expected baseline of 33.3% (4 of 12 months). Slight concentration but within normal range for B2B sales cycles.
Deal Velocity Distribution
Time-to-close buckets — closed won opportunities only
0–30 days
1,817
42.9%
31–60 days
377
8.9%
61–90 days
1,078
25.4%
91–180 days
966
22.8%
Velocity Insights
57
Median Days
4,238
Deals Measured
The bimodal distribution — 43% closing in under 30 days, 23% taking 91–180 days — suggests two distinct deal types: quick transactional sales and extended enterprise negotiations.
Top Agents by Win Rate
Minimum 50 closed opportunities — top 10 performers
Agent | Win Rate | Won | Closed | Revenue
1. Hayden Neloms | 70.4% | 107 | 152 | $272K
2. Maureen Marcano | 70.0% | 149 | 213 | $350K
3. Wilburn Farren | 69.6% | 55 | 79 | $158K
4. Cecily Lampkin | 66.9% | 107 | 160 | $230K
5. Versie Hillebrand | 66.7% | 176 | 264 | $188K
6. Moses Frase | 66.2% | 129 | 195 | $207K
7. Boris Faz | 66.0% | 101 | 153 | $262K
8. James Ascencio | 65.5% | 135 | 206 | $414K
9. Corliss Cosme | 65.5% | 150 | 229 | $421K
10. Reed Clapper | 65.4% | 155 | 237 | $438K
Product Revenue Mix
Closed-won revenue by product — total $10,005K
GTXPro
$3,510K
35.1%
GTX Plus Pro
$2,630K
26.3%
MG Advanced
$2,216K
22.2%
GTX Plus Basic
$705K
7.1%
GTX Basic
$499K
5.0%
GTK 500
$401K
4.0%
MG Special
$44K
0.4%
Concentration
Top 3 products (GTXPro, GTX Plus Pro, MG Advanced) account for 83.6% of total revenue. MG Special at $44K represents tail inventory with minimal impact.
Account Concentration
Top Accounts by Revenue
Closed-won revenue across 85 accounts
Diversification Assessment
CR5 concentration ratio (top-5 revenue share)
12.1%
Top 5 Share
85
Total Accounts
Healthy distribution. Top 5 accounts represent only 12.1% of revenue — well below typical key-account concentration risk thresholds. No single customer dependency.
Key Findings
Quarter-End Concentration
Low
38.4%
38.4% of closed deals close in quarter-end months (Mar/Jun/Sep/Dec), vs. a 33.3% uniform baseline (4 of 12 months). The 1.15× ratio suggests mild concentration — typical in B2B sales where fiscal quarters influence buyer and seller timing. Not a strong anomaly signal on its own.
Stage Conformance Gap (Expected)
Expected
Avg fitness: 0.05
This is an expected result, not an anomaly. The 8-stage New Business pipeline (Prospecting → Qualification → Needs Analysis → ... → Negotiation/Review → Closed Won) is an aspirational reference model, not an operational requirement. Real-world CRM data typically records only 2-4 stages per deal. The 53,862 "missing_activity" deviations across 8,300 opportunities (closed + open) quantify the gap between prescribed process and actual practice — useful for process improvement, but not indicative of control failures. Average fitness score: 0.05.
Agent Performance Spread
Low
15-pt spread
Win rates among active agents range from approximately 55% to 70.4%, a 15-point spread. Top performers (Hayden Neloms, Maureen Marcano) sustain 70%+ win rates across 150–200+ closed deals, suggesting reproducible behavioral patterns worth codifying as playbooks.
Account Diversification
Low
12.1% top-5
The top 5 accounts (Kan-code, Konex, Condax, Cheers, Hottechi) represent only 12.1% of total closed-won revenue across 85 accounts. Revenue is broadly distributed, reducing customer concentration risk. No single account exceeds 3.4% of total revenue.
CSV Ingest
SFDC Normalize
Event Log Build
Conformance Check
Pattern Detection
Report
Stage 1: CSV Ingest
Source: Kaggle CRM Sales Opportunities (innocentmfa)
Files: sales_pipeline.csv (8,800 rows), accounts.csv (85), products.csv (7), sales_teams.csv (35)
Converter: convert_kaggle_crm.py maps CSV fields to SFDC JSON schema
Stage 2: SFDC Normalization
Adapter: SFDCSyntheticAdapter implements IDataAdapter (8 methods)
Field mapper: Opportunity.Id → VBELN, Account.Id → KUNNR, Amount → NETWR
Pipeline models: Prospecting → Qualification → Needs Analysis → ... → Closed Won
Stage mapping: "Engaging" → "Qualification", "Won" → "Closed Won", "Lost" → "Closed Lost"
Stage 3: Event Log Construction
Records: 49,408 events from 8,800 opportunities
Stage transitions: 31,787 entries (with synthetic intermediate stages for Won deals)
Activities: 17,621 task records (product-related subjects)
Format: case_id, activity, timestamp, resource, attributes
Stage 4: Conformance Checking
Algorithm: Token-based replay (van der Aalst, 2016)
Model: sfdc_new_business — 8-stage pipeline with mandatory transitions
Cases: 8,300 analyzed | Fitness range: 0.00 – 0.10 (real data lacks full stage coverage)
Deviations: 53,862 total (45,386 missing_activity, 8,476 skipped_activity)
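The fitness idea can be sketched for a strictly sequential model, assuming the standard token-replay penalty terms for missing and remaining tokens. This is a simplification, not the repo's actual engine: branching models, reversals, and out-of-model activities are ignored here.

```python
def replay_fitness(trace: list[str], model: list[str]) -> float:
    """Simplified token-based replay against a sequential activity list.

    Skipped model steps require artificially produced ("missing") tokens;
    model steps never reached leave "remaining" tokens. Both reduce fitness,
    following the van der Aalst token-replay formula.
    """
    produced = consumed = missing = 0
    pos = 0  # next expected position in the sequential model
    for act in trace:
        try:
            idx = model.index(act, pos)
        except ValueError:
            continue  # reversal or unknown activity: ignored in this sketch
        missing += idx - pos        # tokens for skipped steps
        pos = idx + 1
        produced += 1
        consumed += 1
    remaining = len(model) - pos    # steps the trace never reached
    f_c = 1 - missing / consumed if consumed else 0.0
    f_p = 1 - remaining / (produced + remaining) if produced + remaining else 0.0
    return 0.5 * f_c + 0.5 * f_p
```

A trace that jumps straight from Prospecting to Closed Won scores well below a trace that walks every stage, which is exactly why real CRM data scores low against an aspirational 8-stage model.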
Stage 5: Pattern Detection
Quarter-end compression: 38.4% of closes in QE months (1.15x the 33.3% baseline)
Deal velocity bimodal: 42.9% close within 30 days, 22.8% take 91-180 days
Agent spread: 15-point win rate range (55%–70%) across 30 agents
Account concentration: Top 5 = 12.1% of revenue (healthy diversification)
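The quarter-end figure is a simple ratio against the 4-of-12-months baseline; a sketch using the monthly close counts from this page:

```python
def quarter_end_lift(month_counts: dict[int, int]) -> tuple[float, float]:
    """Share of closes landing in quarter-end months, plus the lift over
    the uniform baseline of 4 out of 12 months (1/3)."""
    qe_months = {3, 6, 9, 12}
    total = sum(month_counts.values())
    share = sum(v for m, v in month_counts.items() if m in qe_months) / total
    return share, share * 3  # share divided by the 1/3 baseline

# Monthly close counts from the pipeline analysis above
closes = {1: 0, 2: 0, 3: 647, 4: 586, 5: 805, 6: 641,
          7: 627, 8: 785, 9: 635, 10: 566, 11: 768, 12: 651}
share, lift = quarter_end_lift(closes)
```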
Technology Stack
MCP Server: TypeScript (ESM, strict mode), 7 data adapters
Pattern Engine: Python 3.11, scikit-learn, scipy
Conformance: Token-based replay, ProcessModelBuilder, van der Aalst algorithms
Cross-System: Entity resolver (Levenshtein + proximity), unified event log
Tests: 834 passing (602 TypeScript + 232 Python)
Data: Kaggle CRM Sales Opportunities (Apache 2.0 license)
Frontend: Vanilla HTML/CSS/JS (zero dependencies)
Deployment: Vercel (static)
Source: github.com/chrbailey/SAP-Transaction-Forensics
Cluster Quality Note: Global silhouette score is 0.028 (KMeans) / 0.09 (BERTopic), indicating weak cluster separation. Findings should be treated as exploratory signals, not confirmed patterns. See Pipeline Transparency for methodology details. HERB dataset timestamps extend to 2027 (synthetic); temporal patterns reflect relative timing.


View Analysis Code on GitHub
This is an analysis tool, not a hosted service. Clone the repo to run the pattern engine on your own Slack exports, case comments, or CRM data with text fields and timestamps.
View on GitHub →
Data Source: BPI Challenge 2019 — 4TU.ResearchData (CC BY 4.0). Real purchase-to-pay event log from a multinational coatings and paints company. 251,734 purchase order items, 1,595,923 events, 628 unique resources. Period: Jan 2018 — Jan 2019.
Process Overview
64.3
Median Throughput (days)
Average 72.3 days; max 25,670 days (stale POs)
32%
Top 2 Variants Coverage
Only 2 of hundreds of paths cover a third of cases
57,136
Payment Blocks
22.7% of all POs required manual block removal
2,835
Incomplete Cases
POs that never progressed past creation
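Throughput here is simply last event minus first event per case; a stdlib sketch over hypothetical (case_id, ISO timestamp) pairs:

```python
from collections import defaultdict
from datetime import datetime

def case_throughput_days(events: list[tuple[str, str]]) -> dict[str, int]:
    """Per-case throughput: whole days between first and last event."""
    stamps = defaultdict(list)
    for case_id, ts in events:
        stamps[case_id].append(datetime.fromisoformat(ts))
    return {c: (max(t) - min(t)).days for c, t in stamps.items()}

# Hypothetical purchase-order event log
events = [
    ("PO-1", "2018-01-05"), ("PO-1", "2018-03-10"),
    ("PO-2", "2018-02-01"), ("PO-2", "2018-02-15"),
    ("PO-3", "2018-01-20"),  # abandoned PO: single event, zero throughput
]
durations = case_throughput_days(events)
```

Single-event cases like PO-3 come out at zero days, which is how abandoned POs surface in the duration distribution.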
Most Common Process Paths
# Process Path Cases %
1 Create PO → Vendor Invoice → Goods Receipt → Invoice Receipt → Clear Invoice 50,286 20.0%
2 Create PO → Goods Receipt → Vendor Invoice → Invoice Receipt → Clear Invoice 30,798 12.2%
3 Create PO → Goods Receipt (incomplete) 9,443 3.8%
4 Create PO → Vendor Invoice → GR → IR → Remove Payment Block → Clear 6,931 2.8%
5 Create PO (abandoned — never progressed) 2,835 1.1%
Key finding: Variants 1 and 2 differ only in whether the vendor invoice arrives before or after goods receipt — a classic "3-way match" ordering question in P2P. Variant 4 shows 2.8% of cases require payment block removal, indicating invoice discrepancies.
Anomalies Detected
Process Issues
Payment blocks requiring intervention: 57,136
Quantity changes after PO creation: 21,449
Price changes after PO creation: 12,423
Deleted / cancelled orders: 5,298
Abandoned POs (single event): 2,835
Resource Concentration
Top 10 resources handle 63% of all 1.6M events, creating operational risk if key personnel are unavailable.
System
25.0%
user_002
10.4%
user_001
6.0%
batch_001
4.6%
user_002 alone handles 166,353 events (10.4%). Single-person bottleneck risk.
Document & Matching Types
Purchase Order Types
Standard PO: 152,562 (60.6%)
Framework Order: 62,543 (24.9%)
Consignment: 36,629 (14.6%)
Invoice Matching Strategy
3-way match, invoice after GR: 164,874 (65.5%)
3-way match, invoice before GR: 67,583 (26.9%)
2-way match: 19,277 (7.7%)
3-way matching compares PO, goods receipt, and invoice. The 26.9% "invoice before GR" cases are where vendors bill before physical delivery — common but higher fraud risk.
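A minimal sketch of the 3-way check over hypothetical document dicts; the ordering flag mirrors the invoice-before-GR risk described above. Field names and tolerances are illustrative.

```python
from datetime import date

def three_way_match(po: dict, gr: dict, inv: dict, price_tol: float = 0.0) -> list[str]:
    """Compare PO, goods receipt, and invoice; flag mismatches and the
    higher-risk invoice-before-GR ordering."""
    issues = []
    if gr["qty"] != po["qty"]:
        issues.append("qty_mismatch_gr")
    if inv["qty"] != gr["qty"]:
        issues.append("qty_mismatch_invoice")
    if abs(inv["unit_price"] - po["unit_price"]) > price_tol:
        issues.append("price_mismatch")
    if inv["date"] < gr["date"]:
        issues.append("invoice_before_gr")  # vendor billed before delivery
    return issues

po = {"qty": 100, "unit_price": 9.50}
gr = {"qty": 100, "date": date(2018, 3, 20)}
inv = {"qty": 100, "unit_price": 9.50, "date": date(2018, 3, 25)}
```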
Forensic Insight

This dataset comes from a real multinational company (anonymized for BPI Challenge 2019). The process mining community uses it as a benchmark for purchase-to-pay analysis. Our forensic engine processed the full 1.6M event log and surfaced several structural concerns:

Payment blocks (22.7%) — Nearly 1 in 4 purchase orders hit a payment block requiring human intervention. This signals systematic issues in invoice matching or vendor master data quality. In a healthy P2P process, payment block rates should be under 5%.

Process variability — With the top 2 variants covering only 32% of cases, the remaining 68% follow hundreds of different paths. This "spaghetti process" pattern makes it difficult to automate, audit, or optimize. Industry benchmarks target 80%+ coverage in the top 5 variants.

Resource concentration — user_002 processes 10.4% of all events. If this person is unavailable (vacation, resignation), the bottleneck could cascade across the entire P2P process. This is a classic single-point-of-failure that process mining can identify but traditional audits miss.

Compliance Alert: 7 process violations detected in SAP's own IDES demo system. IDES (Internet Demonstration and Evaluation System) is SAP's official reference environment used for training and certification. These violations exist in SAP's reference data — our engine found them through automated conformance checking.
Violations Detected
7
Total Violations
In SAP's own reference data
6
Missing Purchase Requisition
PO created without prior approval workflow
1
Retroactive Documentation
PO created before PR — approval was backdated
1
Segregation of Duties Risk
Same resource created both PR and PO
Violation Breakdown
HIGH MISSING_PR: Purchase Order Without Purchase Requisition (6 cases)
Six purchase orders were created directly without an associated Purchase Requisition. In a compliant Procure-to-Pay process, every PO must originate from an approved PR to ensure proper authorization and budget control.
Root Cause
Maverick buying — users bypassing requisition approval workflow to create POs directly
Business Risk
Unauthorized spending, budget overruns, audit findings. Bypasses approval controls designed to prevent fraud.
Remediation
Enforce system control: block PO creation without linked PR. Add validation rule in SAP transaction ME21N.
CRITICAL PO_BEFORE_PR: Retroactive Approval Documentation (1 case)
One Purchase Order was created before its associated Purchase Requisition. This means the order was placed first and the approval was documented after the fact — a clear compliance violation where the authorization workflow was bypassed and retroactively papered over.
Root Cause
Approval workflow bypassed — order placed first, requisition created afterward to satisfy documentation requirements
Business Risk
Fraudulent documentation, Sarbanes-Oxley violations, audit failure. Represents intentional control circumvention.
Remediation
Implement sequential control: system must enforce PR approval timestamp < PO creation timestamp. Add automated alert.
MEDIUM SOD_VIOLATION: Segregation of Duties Risk (1 case)
One case where the same resource created both the Purchase Requisition (request) and the Purchase Order (fulfillment). Proper segregation of duties requires different individuals to request and approve purchases to prevent self-dealing.
Root Cause
Missing role separation — user has authorization for both ME51N (create PR) and ME21N (create PO)
Business Risk
Self-dealing, fictitious vendor schemes. One person can request, approve, and fulfill without oversight.
Remediation
Review SAP role assignments (PFCG). Ensure PR creator and PO creator roles are mutually exclusive.
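The three violation types above reduce to one null check, one timestamp comparison, and one identity comparison. A sketch with hypothetical field names (ISO-8601 date strings compare correctly as plain text):

```python
def check_p2p_controls(doc: dict) -> list[str]:
    """Classify a purchase document against the three controls above."""
    if doc.get("pr_created") is None:
        return ["MISSING_PR"]               # PO with no requisition at all
    violations = []
    if doc["po_created"] < doc["pr_created"]:
        violations.append("PO_BEFORE_PR")   # retroactive documentation
    if doc.get("pr_creator") and doc["pr_creator"] == doc.get("po_creator"):
        violations.append("SOD_VIOLATION")  # same person requested and ordered
    return violations
```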
IDES Process Comparison — O2C vs P2P
Order-to-Cash (O2C)
Sales side — from customer order to payment
Cases analyzed: 646
Total events: 5,708
Unique activities: 8
Process variants: 158
Median duration: 2.7 days
Max duration: 6,578 days
Normal completion rate: 83.6%
Invoice cancellation delay: 17.7 days avg
Key finding: 158 unique process variants for just 8 activities and 646 cases — extreme variability. The max duration of 6,578 days (18 years) indicates stale/orphaned orders in the demo system that were never closed.
Procure-to-Pay (P2P)
Purchasing side — from requisition to vendor payment
Cases analyzed: 2,486
Total events: 7,420
Unique activities: 20
Process variants: 142
Average duration: 45.2 days
Max duration: 1,027 days
Compliance violations: 7
Batch processing outlier: 2,181 events
Key finding: 7 compliance violations in SAP's own demo data. The "PO before PR" case is particularly notable — it represents retroactive documentation, a pattern that in production systems is a red flag for fraud investigators.
Why This Matters

SAP IDES is not production data — it's SAP's official demo and training environment. Thousands of consultants learn SAP using this system. Yet our automated conformance checker found 7 compliance violations that exist in the reference data itself.

This demonstrates two things: (1) Automated process mining catches what manual review misses, even in well-known systems. (2) If reference data contains these patterns, production systems — with real users under real deadline pressure — almost certainly contain more.

The conformance checking engine uses token-based replay (van der Aalst algorithm) to compare actual event sequences against expected process models. For P2P, the expected model requires: PR → PO → Goods Receipt → Invoice → Payment. Any deviation is flagged, measured, and classified by severity.

The O2C analysis reveals a different problem: 158 process variants from just 8 activities. This is a "spaghetti process" — technically functional but impossible to audit or optimize at scale. Combined with the 6,578-day max duration (stale orders from the 1990s still open), it paints a picture of a system that works but accumulates technical debt in its process layer.

Analysis Methodology
Conformance Engine: Token-based replay, ProcessModelBuilder, van der Aalst algorithms
Data Adapters: BPI (XES/OCEL), SAP IDES (sap-extractor, MIT), Synthetic (seed=42)
Pattern Engine: Python 3.11, scikit-learn, scipy (TF-IDF + K-Means + effect sizing)
Temporal Analysis: Throughput time, bottleneck detection, delay probability
Tests: 834 passing (602 TypeScript + 232 Python)
Source: github.com/chrbailey/SAP-Transaction-Forensics
Client Data Notice: All data on this page comes from real consulting engagements. Company names, individual names, and email addresses have been anonymized. Financial figures, ticket counts, and category distributions are actual. Used with permission for educational purposes.
Case 1 — Healthcare Company: NetSuite License Optimization
Engagement: ERP User License Audit
289-user NetSuite environment — automated classification found $103,896 in annual savings
14.4x
ROI
289
Total Users
69
Eliminable
$103,896
Annual Savings
0.8 mo
Payback Period
Savings by Category
Dormant full-access (8 users, no login 90+ days): $46,464
Departed employee center (est. 53): $31,800
Approval-only users (4, replace w/ SuiteFlow): $23,232
Deprecated integrations (est. 4 of 8): $2,400
What Structured Data Shows vs. What We Found

Structured: NetSuite user list shows 289 active users with assigned roles. Looks clean.

Unstructured signals: Login timestamps reveal 8 full-access users ($5,808/yr each) haven't logged in for 90+ days. Cross-referencing with HR termination dates shows ~53 Employee Center users are departed employees still consuming licenses. 4 users' entire activity consists of clicking "Approve" on purchase orders — replaceable by an email-based workflow that costs nothing.

The gap: $103,896/year in waste invisible to anyone looking at the user list alone.
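The cross-reference logic is simple once login timestamps and HR termination lists sit side by side. A sketch with hypothetical field names and the per-seat cost from this case:

```python
from datetime import date, timedelta

SEAT_COST = 5808  # assumed annual cost per full-access seat (from this case)

def license_waste(users: list[dict], terminated_emails: set[str],
                  today: date, dormant_days: int = 90):
    """Flag dormant full-access seats and departed employees still licensed,
    and price the dormant seats."""
    cutoff = today - timedelta(days=dormant_days)
    dormant = [u for u in users
               if u["license"] == "full" and u["last_login"] < cutoff]
    departed = [u for u in users if u["email"] in terminated_emails]
    return dormant, departed, len(dormant) * SEAT_COST

today = date(2024, 6, 1)
users = [
    {"email": "a@x.com", "license": "full", "last_login": date(2024, 1, 10)},
    {"email": "b@x.com", "license": "full", "last_login": date(2024, 5, 20)},
    {"email": "c@x.com", "license": "employee_center", "last_login": date(2023, 11, 2)},
]
dormant, departed, waste = license_waste(users, {"c@x.com"}, today)
```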

Case 2 — MedTech Manufacturer: Help Desk Ticket Forensics During Acquisition
Engagement: NetSuite Implementation + Post-Acquisition Support
2,525 help desk tickets reveal organizational stress invisible in ERP transaction data
Diagnostics manufacturer acquired by Fortune 500. Structured data showed normal operations. Tickets told a different story.
2,525
Help Tickets
11
Categories
38%
Uncategorized
3,992
ERP Users
1,423
Inventory Items
Ticket Category Distribution
Uncategorized
956
Finance
469
Access
257
Procurement
215
Inventory
119
Manufacturing
107
Warehouse
103
Cost Accounting
84
Quality
77
Order Mgmt
66
What Ticket Text Reveals — Unstructured Signals from Real Tickets
DATA INTEGRITY
"How did 20413 turn into 20433?"
Inventory team can't explain item number mutation. Structured data shows both items exist. The ticket reveals someone doesn't trust the data — and they're right to question it.
SYSTEM WORKAROUNDS
"Explore creating dummy transactions for MRP"
Manufacturing is building fake transactions to work around MRP limitations. Structured data will show these as real — auditors would never know.
ESCALATION CULTURE
"Bill Payment Email Notification for Vendors — URGENT"
Multiple "URGENT" tickets for routine vendor payments. Finance team is under pressure. Transaction data shows payments made on time — the stress is invisible.
ACQUISITION CHAOS
257 "Request for NetSuite Access" tickets
10% of all tickets are access requests — many from acquiring company email domains. IT is drowning in onboarding during the acquisition. ERP data shows users; tickets show the churn.
ERP Transaction Data
• 3,992 employees in system
• 1,044 active customers
• 1,423 inventory items tracked
• 307 bills of materials
• 465 GL accounts
• 5,035 warehouse bin locations
Status: Operational
vs
Ticket Text Analysis
• 38% of tickets uncategorized (overwhelmed)
• Dummy transactions created as workarounds
• Item numbers mutating unexplainably
• "URGENT" escalation culture in Finance
• Acquiring company flooding access requests
• Lot traceability questions (FDA compliance)
Status: Organization under stress
Available data for forensic analysis: 2,525 help desk tickets (with summary, assignee, priority, category, timestamps, response/close times) + complete NetSuite master data (employees, customers, vendors, items, BOM, chart of accounts, inventory, financial statements). The combination of structured ERP data with unstructured ticket text is exactly the dual-layer forensic approach this tool is designed for.
Case 3 — Connected Hardware Manufacturer: High-Growth ERP Forensics
Engagement: ERP Migration Assessment + ITGC Audit + International Expansion (multi-year engagement)
3M+ ERP records forensically analyzed — credit hold overrides, 28.6% return rate, SOD violations, approval chain complexity
High-growth hardware manufacturer during rapid scaling. Legacy ERP → enterprise ERP migration. Structured transaction data + ITGC audit findings + process documentation.
3M+
CSV Rows
102K
Sales Orders
97K
RMA Returns
43K
Vendors
10K
Customers
28.6%
RMA Rate
Data Sources Analyzed
Master Data
10K customers, 43K vendors
8.7K fixed assets, 5K contacts
Transaction Data
102K sales orders, 1M+ EDI lines
97K RMAs, 164K credit memos
Governance / Text
ITGC audit, SOD analysis
7.6K deductions, PES call notes
Forensic Findings — What the Data Revealed
ITGC VIOLATIONS (Deloitte Audit)
7 users with Administrator role. Terminated employee still active.
153 active users, 40 unique roles. SOD violations at both role and user level. 4 generic shared accounts. No formalized change management policy. Admin access to both dev and prod environments. No post-implementation review process. Critical gaps for a publicly traded company.
Source: Deloitte SOD Role Definition Analysis
CREDIT HOLD OVERRIDES
Sales orders shipped despite "Customer On Credit Hold" flag
Sales order headers contain both "Customer On Credit Hold" and "Shipment Hold Released by Finance" fields. Cross-referencing reveals orders where credit holds were manually overridden — Finance releasing shipments to customers already flagged for credit risk. The structured status says "shipped." The override field tells you it shouldn't have been.
Source: Sales Order Header — 102K records
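The cross-reference described above reduces to a three-field filter; a sketch with hypothetical field names derived from the header columns:

```python
def credit_hold_overrides(orders: list[dict]) -> list[str]:
    """Shipped orders where a credit hold existed but Finance released
    the shipment anyway."""
    return [o["order_id"] for o in orders
            if o.get("customer_on_credit_hold")
            and o.get("shipment_hold_released_by_finance")
            and o.get("status") == "shipped"]

orders = [
    {"order_id": "SO-1", "customer_on_credit_hold": True,
     "shipment_hold_released_by_finance": True, "status": "shipped"},
    {"order_id": "SO-2", "customer_on_credit_hold": True,
     "shipment_hold_released_by_finance": False, "status": "open"},
    {"order_id": "SO-3", "customer_on_credit_hold": False,
     "shipment_hold_released_by_finance": False, "status": "shipped"},
]
```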
RETURN RATE ANOMALY
28.62% of customer accounts had return events — only 67.5% on-time delivery
Forensic case analysis of 1,090 customer accounts: 312 had at least one RMA event (28.62%). 97K total RMA line items in the extract across 6 types: Open Box, Closed Box, Destroyed in Field, Stock Rotation, Warranty, Error Shipment. Transaction data shows the returns; memo fields and reason codes hint at systemic quality or logistics failures the structured data can't explain.
Source: Case Outcome Analysis (1,090 accounts, 21,099 events) + RMA Extract (97K line items)
APPROVAL CHAIN COMPLEXITY
7,610 customer deductions with multi-approver routing and rerouting
Marketing deductions (MDF) routed through "Next Approver" and "Set Rerouted Next Approver" chains. Multiple email notification flags. Deductions linked to DFI invoices, credit memos, and proof-of-performance documents. The approval chain is so complex that the rerouting field exists specifically because the normal chain fails regularly.
Source: Customer Deductions — 7,610 records, 50+ columns
Structured ERP Data Says
• 102K sales orders processed
• 97K returns authorized
• 43K vendors in master data
• 153 active users, roles assigned
• Orders shipped, invoiced, cleared
• International entities operational
Status: ERP Functioning
vs
Governance + Text Layer Reveals
• Credit holds overridden to ship anyway
• 28.6% returns — systemic product/logistics issue
• 7 users with admin (SOX risk, public co.)
• Terminated employee still accessing system
• Approval chains so broken a "reroute" field exists
• 144K PO changes — constant purchasing churn
Status: Controls Gap, SOX Exposure
Engagement scope: Multi-year consulting engagement spanning ERP vendor selection, enterprise ERP migration assessment, ITGC audit (Big Four SOD analysis), international tax restructuring (European entity), regional expansion (APAC), and PCI/SOX compliance programs. Data sources: legacy ERP production extracts, audit firm findings, change management logs, 24+ project status call notes, SOW/FRD documentation. All company names, customer IDs, employee names, and identifying details anonymized.
The Pattern Across All Three Cases
Case 1 (license audit): Structured user data hides $103K in waste — login timestamps and role assignments alone told the story. Case 2 (acquisition tickets): When structured data looks normal, 2,525 help desk tickets reveal an organization under stress — workarounds, data trust issues, and IT drowning in access requests. Case 3 (high-growth ERP): 3 million rows of clean-looking transaction data mask credit hold overrides, a 28.6% return rate, SOD violations in a rapidly scaling company, and approval chains so dysfunctional that a "reroute approver" field was built into the system. In every case, the structured data said "operational." The unstructured layer — tickets, audit findings, override fields, memo text — told the real story.