I scanned my own audit service with an automated tool — 6/57 passed
I sell EU AI Act audits. The logical question: would my own service pass an audit?
This morning (05.05.2026) I installed AIR Blackbox — an open-source EU AI Act compliance scanner with 51+ automated checks across Articles 9-15 and post-quantum signed evidence. Apache 2.0. Technically deep, used by several EU SaaS companies in production. The perfect tool for this experiment.
I ran it on aiactaudit.pl source. Result:
Out of 57 checks total · Static: 6/49 · Runtime: 0/8
Embarrassing? Not necessarily. This result says something important about automated compliance tools — and WHY every audit requires human interpretation.
Automated EU AI Act tools (like AIR Blackbox) scan AI applications — projects with LLM API integrations, ML pipelines, agent orchestration. My audit service is a landing page plus a service workflow, NOT an AI app. 80% of the checks address runtime AI behavior (prompt injection, automation bias, model drift) — irrelevant to my use case. The lesson: a scope-aware audit (what we deliver) beats a one-size-fits-all checklist. Automated tools are valuable BUT require human interpretation to answer "what applies to YOUR use case?".
Experiment setup
Step by step, hiding nothing:
```bash
# Install (requires Python 3.10+)
brew install python@3.11
python3.11 -m pip install --user air-blackbox

# Run static code scan
air-blackbox comply --scan /path/to/aiactaudit.pl --no-llm
```
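Once the scan finishes, I post-process the report. A minimal sketch, assuming a hypothetical JSON report with one object per check (the `article`/`status` schema here is my assumption for illustration, not AIR Blackbox's documented output format):

```python
import json
from collections import Counter

def summarize(report_path: str) -> Counter:
    """Tally check statuses from a hypothetical JSON report:
    [{"article": "Art. 12", "status": "fail"}, ...]"""
    with open(report_path) as f:
        checks = json.load(f)
    return Counter(c["status"] for c in checks)
```

The same Counter makes it trivial to diff two scans — e.g. before and after applying fixes.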
Project scanned: directory _AIAct/ containing:
- 14 HTML files (landing, articles, sample audit, calculator, intake form)
- 1 Python file (`render_audit.py` — template renderer for audit deliverables)
- JSON + CSV (lead lists, configs)
- Markdown docs (research, internal notes)
Note: aiactaudit.pl does NOT use LLM API in production. It's a service-business landing, NOT an AI application. This is a key detail for interpreting the results.
What AIR Blackbox checked — 7 categories
| Category | Checks | My result |
|---|---|---|
| Article 9 — Risk Management | ~7 | 1 pass / 4 warn / 2 fail |
| Article 10 — Data Governance | ~5 | 0 pass / 3 warn / 2 fail |
| Article 11 — Technical Documentation | ~5 | 1 pass / 2 warn / 2 fail |
| Article 12 — Logging & Audit Trail | ~5 | 0 pass / 1 warn / 4 fail |
| Article 13 — Transparency | ~4 | 0 pass / 4 warn / 0 fail |
| Article 14 — Human Oversight + Agent Boundaries | ~10 | 0 pass / 9 warn / 1 fail |
| Article 15 — Accuracy / Cybersecurity | ~10 | 3 pass / 5 warn / 2 fail |
| GDPR cross-checks | ~8 | 0 pass / 8 warn / 0 fail |
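Transcribing the table into code makes the tallies explicit. (Figures copied from the table above; the report's per-category check counts are approximate, which is why these sums — 5 passes, 36 warns, 13 fails — don't line up exactly with the 6/49 static headline.)

```python
# (pass, warn, fail) per category, transcribed from the table above.
results = {
    "Art. 9":  (1, 4, 2),
    "Art. 10": (0, 3, 2),
    "Art. 11": (1, 2, 2),
    "Art. 12": (0, 1, 4),
    "Art. 13": (0, 4, 0),
    "Art. 14": (0, 9, 1),
    "Art. 15": (3, 5, 2),
    "GDPR":    (0, 8, 0),
}

passes = sum(r[0] for r in results.values())  # 5
warns  = sum(r[1] for r in results.values())  # 36
fails  = sum(r[2] for r in results.values())  # 13
```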
Top 5 fails — what they REALLY mean
1. ❌ "No risk classification (Article 6)"
What AIR Blackbox says: "Article 6 requires system risk classification. No documentation found."
Reality for me: Article 6 applies to AI systems. My system is a service business with a landing page. It's like scanning a coffee shop for "missing aircraft autopilot certification". The check is technically correct BUT not applicable.
The actual fix: add a `RISK_CLASSIFICATION.md` at the project root explicitly stating "this is not an AI system per the Article 6 definition".
2. ❌ "No logging infrastructure detected"
What AIR Blackbox says: "No Python logging framework, no tamper-evident audit chain (Article 12 requires)."
Reality: aiactaudit.pl is static HTML deployed on Vercel. Vercel has its own logging. Plus I'm a service provider — my "audit trail" is emails + intake form submissions + GA4 events. Article 12 requires logging for high-risk AI systems, NOT for landing pages.
Real gap: I should have formal Article 12 logging IF my service uses AI internally for audit work (e.g. LLM helping classify Annex III). Currently I use Claude Code for content prep (NOT in runtime audit decisions). That's limited risk at worst.
3. ❌ "Token expiry / execution bounding" (Article 14)
What AIR Blackbox says: "Agent may run indefinitely without bounds."
Reality: AIR Blackbox checks autonomous AI agents in production. My product is a landing page + email service. I have no autonomous agents in runtime. Check N/A.
4. ❌ "Data governance documentation"
What AIR Blackbox says: "Article 10 requires data governance docs. None found."
Reality: Article 10 applies to training data for AI systems. I don't train models. My "data" is a lead-list CSV plus intake form submissions — that's GDPR territory, NOT Article 10. The relevant artifacts are privacy.html (I have it) and a RoPA (I have it), not Article 10 governance docs. It's about privacy.
5. ❌ "GDPR consent management patterns"
What AIR Blackbox says: "No consent patterns in code."
Reality: I have cookie consent (static HTML), a privacy policy, and an intake form with explicit consent text. AIR Blackbox scans code patterns (a regex match for `consent_` in Python), not HTML compliance text. Static analysis has blind spots.
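That blind spot is easy to demonstrate. A sketch, using a pattern similar to (but not necessarily identical to) what a static scanner might grep for:

```python
import re

# Hypothetical scanner heuristic: look for consent-handling identifiers in code.
CONSENT_PATTERN = re.compile(r"consent_\w+")

python_src = "def submit(form):\n    if form.consent_marketing: save(form)"
html_src = '<div id="cookie-banner">We use cookies. <button>Accept</button></div>'

found_in_python = bool(CONSENT_PATTERN.search(python_src))  # identifier matches
found_in_html   = bool(CONSENT_PATTERN.search(html_src))    # real banner is invisible
```

The regex fires on the Python identifier but sees nothing in the HTML — even though the HTML is where the actual consent mechanism lives.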
What this all says about automated tools
AIR Blackbox is an excellent tool for what it's designed to do — scanning AI applications with LLM API, agent frameworks, ML pipelines. Full feature set:
- 51+ static checks Articles 9-15 (code patterns, config, docs)
- Runtime monitoring via gateway proxy (intercepting LLM calls)
- Post-quantum signed evidence ML-DSA-65 (FIPS 204)
- Multi-framework mapping (ISO 42001, NIST AI RMF, Colorado SB 24-205)
For LangChain agent farms, OpenAI Assistants production deployment, custom LLM apps — it's a game-changer. Continuous monitoring, audit-ready evidence, integration with 7+ frameworks.
But — and here's the nuance — an automated scanner doesn't know whether your project is:
- AI application (where all 57 checks apply)
- Service business with AI in internal tooling (where ~30% applies, plus extra checks for delivery)
- Static landing page (where ~10% applies, most N/A)
- SaaS using LLM API (where applicability depends on whether use case falls into Annex III)
These categories require human classification first. Otherwise you get a report with 80% fails that are irrelevant — and 20% real gaps get lost in the noise.
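To make the point concrete, a back-of-the-envelope sketch (the percentages are the rough figures from the list above, nothing the tool computes):

```python
# Rough share of the 57 checks that applies, per project category above.
APPLICABILITY = {
    "ai_application": 1.00,       # all checks apply
    "service_internal_ai": 0.30,  # ~30% apply
    "static_landing": 0.10,       # ~10% apply, most N/A
}

def expected_applicable(total_checks: int, category: str) -> int:
    """Estimate how many checks are even in scope for a project category."""
    return round(total_checks * APPLICABILITY[category])

# expected_applicable(57, "static_landing") == 6 — close to my raw 6/57 "score"
```

Which is exactly why a raw score without classification is uninterpretable: 6/57 can mean "everything in scope passes" or "almost nothing does".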
What AIR Blackbox caught as real (Pricora comparison)
I also scanned pricora_platform/ (my second project — Next.js SaaS for accounting offices in PL). Result:
Pricora: 10/57 passing — 4 points more than aiactaudit.pl
Pricora has more passing checks because:
- Larger codebase (Next.js + TypeScript) → more patterns detected
- Structured logging hints in code (NOT ML logging, but Sentry-style)
- Authentication patterns (Supabase Auth) — partial matches for agent identity binding
- API security headers + rate limiting in Next.js middleware
Pricora also isn't an AI app — it's a pricing calculator. Same scope mismatch as aiactaudit. But more code = more incidental matches.
Conclusion: both scores (6/57 and 10/57) are scope-mismatched. Neither project falls into an Annex III high-risk category; at most, Article 50 transparency obligations could be argued for Pricora because it has a calculator. Neither is high-risk.
Why scope-aware audit (our €799) beats automated checklist
Step by step, what audit does on day one:
- Annex III risk classification — is the project high-risk or not? (Decision tree here.) Without this classification, 80% of the remaining checks are N/A.
- Provider vs deployer scope — who's responsible for what? (GPAI obligations details here.)
- System inventory — what exactly counts as an "AI system" in the project. Often NOT everything (e.g. a simple sentiment-analysis helper for customer support may fall outside the obligations that matter).
- Articles 9-15 applicability matrix — which articles apply to the identified AI systems. (Art. 10 and Art. 14 details here.)
- Gap analysis — REAL gaps relative to actual applicable checks, NOT static checklist.
- Roadmap — prioritized action plan with effort estimates.
This requires human judgment. Automated tools can support each step (AIR Blackbox is excellent for the gap analysis once classification, scope, and inventory are done), but they don't replace it.
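Step 4 above — the applicability matrix — is just a table, but a human has to produce it. A sketch of what the artifact can look like (articles and reasons are illustrative, filled in for a landing-page project like mine):

```python
# Human-produced applicability matrix: article -> (applies?, reason).
APPLICABILITY_MATRIX = {
    "Art. 6":  (False, "not an AI system; no risk classification needed"),
    "Art. 10": (False, "no model training; GDPR covers the data instead"),
    "Art. 12": (False, "no high-risk AI system to log"),
    "Art. 13": (False, "no AI interaction to disclose"),
}

def applicable_articles(matrix):
    """List only the articles the classification marked as in scope."""
    return [a for a, (applies, _) in matrix.items() if applies]
```

For an actual high-risk AI app the same matrix would be mostly `True` — and then the scanner's findings become actionable.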
Real use case for AIR Blackbox
If someone ordered an audit from me and showed a project like e.g. HR-Tech SaaS (Annex III #4 employment), I would:
- Classify as high-risk Annex III #4 (manual, 2h work)
- Run AIR Blackbox on their code (15 min)
- Filter results — ~40/57 checks apply for HR-Tech
- Interpret findings by hand — which gaps are real, which are false positives
- Map each real gap to Article + remediation effort
- Build roadmap PDF + Loom walkthrough
This is a hybrid approach: tool for automation, human for scope + judgment + delivery. €799 fee covers human work; tool is free Apache 2.0 OSS.
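The filtering step in that workflow is mechanical once the classification exists. A sketch, with a made-up findings structure (not AIR Blackbox's real output format):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    article: str  # e.g. "Art. 12"
    check: str
    status: str   # "pass" | "warn" | "fail"

def real_gaps(findings, applicable):
    """Drop passes and anything outside the human-classified scope."""
    return [f for f in findings if f.status != "pass" and f.article in applicable]

findings = [
    Finding("Art. 12", "tamper-evident logging", "fail"),
    Finding("Art. 14", "agent execution bounds", "fail"),
    Finding("Art. 9",  "risk register", "warn"),
]
# For an HR-Tech SaaS: Art. 9 and Art. 12 apply; the agent check may not.
gaps = real_gaps(findings, applicable={"Art. 9", "Art. 12"})
```

The tool produces `findings`; the human produces `applicable`; only the intersection goes into the roadmap.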
Practical takeaways for EU SMB SaaS
- Don't fear automated tools — AIR Blackbox/VerifyWise/etc. are free and valuable. Run them on your own code.
- Don't trust raw scores — 6/57 for service business is fine. 6/57 for high-risk AI app is red alert. Interpretation matters.
- Map scope first — Annex III classification + system inventory MUST be done before scoring. Otherwise the report is noise.
- Use AIR Blackbox for actual AI deployments — if you have a LangChain agent farm in production, this tool is worth €100k saved consulting. Run it.
- Don't pay for "automated audit" services that simply run such tools and print a report — value is in interpretation, NOT the scan.
Honesty disclaimer (eat your own dog food)
You might think: "the guy sells audits and only passes 6/57 — what a joke". Fair point. But notice:
- My product is NOT an AI app — it's a service. Audit Article applicability is different.
- Still, let's DO the real fixes that apply (there are several, identifiable from the 31 warnings):
  - Add an explicit `RISK_CLASSIFICATION.md` stating "service business, not an AI system per Annex III"
  - Document a `RoPA.md` (Records of Processing Activities) for GDPR — even with a privacy policy, a formal RoPA won't hurt
  - Add a `SECURITY.md` describing data flow (lead CSV, intake form, audit deliverables)
- Run the scan again post-fix → the score should improve to ~15/57. Still ~70% N/A due to scope mismatch.
This honesty is my value proposition. EU AI Act consultants often sell "100% compliant" claims that are FTC-style misleading. I sell "clarity" — exactly what applies, what doesn't, what to fix, in what priority.
Check your real EU AI Act scope
Order an audit for €799 (founding tier, limited 10 spots) — in 5 days you get: Annex III classification, system inventory, gap analysis Articles 9-15, prioritized roadmap. PDF + Loom walkthrough. 30-day money-back guarantee.
Order audit →

Tools mentioned
- AIR Blackbox (Apache 2.0) — `pip install air-blackbox`
- VerifyWise (BSL 1.1) — self-hosted compliance platform
- EU Compliance Bridge (EUPL-1.2) — AI Act + EAA mapping
All open source. All valuable for the appropriate use case. None replaces human-led audit.