Why our AI code reviewer asks before it posts

Every AI code review tool today auto-posts every comment directly to GitHub. Engineers turn them off in week two because the noise drowns out real bugs. Here's what we did differently — and why the human gate is the whole point.

The Problem — Every AI Review Tool Auto-Posts

You have seen it happen. A team installs an AI review tool. At first the engineers are curious. A few weeks later, the tool is posting 40 comments per PR, three-quarters of them "consider using a const here" and "this function could be simplified." The real bugs are somewhere in there, buried. Nobody can find them.

CodeRabbit's own research (Business Wire, Dec 2025) found that AI-generated code produces 1.7x more issues than human-written code. That figure is about code written by AI, not reviewed by it. The same trait shows up in AI review, though: a model is good at finding possible issues, not at telling which ones are real.

An audit of 28 PRs using CodeRabbit found 15% of comments were "Useless/Noise" and 21% were "Nitpicking" (buildmvpfast.com). On r/programming: "Are you drowning in AI code review noise? 70% of AI PR comments are useless." On r/webdev: "We tried CodeRabbit and Qodo — at best added noise, at worst they flagged false positive issues."

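There is also the security angle. A security write-up discussed on r/netsec showed how a remote code execution bug in CodeRabbit's review pipeline exposed its GitHub App credentials. The app held write access to repos, which such tools need in order to post comments, so the leak meant write access to over 1 million repositories.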

The result is the same every time: engineers stop trusting the tool. It becomes a boy-who-cried-wolf situation. The real bugs surface, but nobody is watching anymore.

Our Bet — The AI Proposes, The Human Approves

PullLight works differently. The AI analyzes the diff and produces findings — but those findings go to a review queue first. They do not post to GitHub automatically.

The flow:

1. A PR is opened or updated. GitHub fires a webhook.
2. PullLight fetches the diff and runs it through Claude Sonnet 4.5.
3. The AI returns a structured list of findings, or an empty array if it finds nothing worth flagging.
4. Findings land in the /reviews queue. No GitHub activity yet.
5. An engineer reviews the findings, approves each comment they want to keep, and hits submit.
6. PullLight posts the approved comments to GitHub via the API.

The queue is at /reviews — it shows every PR in the system, which ones have pending findings, and lets you skim the diff alongside each finding. You can approve all, approve some, or dismiss everything. Nothing posts without your say.
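The gate itself is small. The sketch below is minimal and rests on two assumptions. First, each queued finding carries a reviewer `decision` field ("pending", "approved", or "dismissed"). Second, it has the file, line, and message fields the model returns. The field and helper names are hypothetical. Only approved findings become payloads for GitHub's "Create a review comment for a pull request" endpoint, and the HTTP call itself is left out.

```javascript
// Hypothetical sketch of the human gate. Assumes each queued finding has
// file, line, message, and a reviewer decision: "pending" | "approved" | "dismissed".

// Body for GitHub's "Create a review comment for a pull request" endpoint:
// POST /repos/{owner}/{repo}/pulls/{pull_number}/comments
function toReviewCommentPayload(finding, commitId) {
  return {
    body: finding.message,
    commit_id: commitId, // head SHA of the PR the finding was made against
    path: finding.file,
    line: finding.line,
    side: "RIGHT", // comment on the new version of the file
  };
}

// Only approved findings leave the queue; pending and dismissed ones never post.
function commentsToPost(findings, commitId) {
  return findings
    .filter((f) => f.decision === "approved")
    .map((f) => toReviewCommentPayload(f, commitId));
}

// Usage: one approved, one dismissed, one still pending.
const queue = [
  { file: "src/db.js", line: 42, message: "Query built by string concatenation.", decision: "approved" },
  { file: "src/util.js", line: 7, message: "Consider a const here.", decision: "dismissed" },
  { file: "src/api.js", line: 88, message: "Missing auth check.", decision: "pending" },
];

console.log(commentsToPost(queue, "abc123")); // only the src/db.js comment is returned
```

Filtering on an explicit "approved" value means every other state posts nothing, including states added later. The default is silence.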

That human gate is what makes this work. Signal, not noise.

Architecture

The pipeline in plain terms:

GitHub Webhook → Fetch Diff → Claude Sonnet 4.5 (structured output)
                                      ↓
                              pending_reviews table
                                      ↓
                              /reviews approval queue
                                      ↓
                              GitHub API (post on approve)

The webhook handler verifies GitHub's signature before any processing: the X-Hub-Signature-256 header must match an HMAC-SHA256 of the payload, keyed with the app's webhook secret. The diff is fetched via the GitHub API using the installation token. Claude returns a JSON array of findings, each with file, line, severity, category, and the comment body. Those get written to the pending_reviews table. The queue serves as the human-gate UI. On approval, each comment is posted to GitHub via the review comments API.
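The first two steps, signature check and diff fetch, can be sketched with Node's built-in crypto module. This follows GitHub's documented webhook scheme: X-Hub-Signature-256 carries "sha256=" plus a hex HMAC-SHA256 of the raw request body. The function names are illustrative, not PullLight's code.

```javascript
const crypto = require("node:crypto");

// Check GitHub's X-Hub-Signature-256 header against an HMAC-SHA256 of the raw body.
// rawBody must be the exact bytes GitHub sent, not re-serialized JSON.
function verifyWebhookSignature(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== "string" || !signatureHeader.startsWith("sha256=")) {
    return false;
  }
  const expected = Buffer.from(
    "sha256=" + crypto.createHmac("sha256", secret).update(rawBody).digest("hex"),
    "utf8"
  );
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

// Request for a PR as a unified diff: the "application/vnd.github.diff" media type
// makes the pulls endpoint return the diff instead of JSON.
// installationToken is assumed to be a GitHub App installation access token.
function diffRequest(owner, repo, pullNumber, installationToken) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/pulls/${pullNumber}`,
    headers: {
      Accept: "application/vnd.github.diff",
      Authorization: `Bearer ${installationToken}`,
      "X-GitHub-Api-Version": "2022-11-28",
    },
  };
}

// Usage on Node 18+ (network call, shown for illustration only):
//   const { url, headers } = diffRequest("octo", "repo", 7, token);
//   const diff = await (await fetch(url, { headers })).text();
```

A constant-time comparison is used so an attacker cannot recover the expected signature byte by byte from response timing.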

No ambient writes. No background processes. Each step is explicit and auditable.

The `return []` Decision

This is the most important thing we do — and the hardest to get right.

When the AI finds nothing wrong, we tell it to return an empty array. Not "LGTM." Not "No issues found." Not a reassuring comment to make the engineer feel like the tool is working. An empty array. Silence.

Here is the relevant prompt text and the code that handles the result (simplified from services/ai-review.js):

Return ONLY a raw JSON array. No markdown fences, no prose, no explanation.
If you find nothing worth flagging, return an empty array: []

// If the code is clean, return an empty array.
// Do NOT write "LGTM" comments. Do NOT post "No issues found."
// Silence is the correct output when nothing is wrong.

const findings = analysisResult.length === 0
  ? []
  : analysisResult.map(r => ({ type: r.severity, path: r.file, line: r.line, message: r.message }));
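The empty-array contract is easier to trust when the code side parses strictly. Below is a sketch with a hypothetical `parseFindings` helper, assuming the field names listed in the architecture section (file, line, severity, category, message). "[]" parses to a silent, empty result. Prose such as "LGTM", or output wrapped in markdown fences, is rejected as malformed instead of being mistaken for a clean review.

```javascript
// Hypothetical strict parser for the model's raw reply.
function parseFindings(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText.trim());
  } catch {
    throw new Error("Model output is not JSON"); // e.g. "LGTM" or fenced output
  }
  if (!Array.isArray(parsed)) {
    throw new Error("Model output is not a JSON array");
  }
  // "[]" falls through here and returns [], which means: post nothing.
  // Entries missing the fields a line comment needs are dropped.
  return parsed
    .filter(
      (r) =>
        r !== null &&
        typeof r === "object" &&
        typeof r.file === "string" &&
        Number.isInteger(r.line) &&
        r.line > 0 &&
        typeof r.message === "string"
    )
    .map((r) => ({
      type: r.severity,
      path: r.file,
      line: r.line,
      category: r.category,
      message: r.message,
    }));
}
```

Throwing on bad output, rather than returning [], is a deliberate choice in this sketch. It keeps a broken model reply from looking identical to clean code, and the caller can log or retry.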

Why does this matter more than detection quality? Because false positives are the primary reason engineers disable AI review tools. Every noise comment erodes trust a little more. Returning [] when code is clean is harder than it sounds — it requires the AI to resist the temptation to say something, and it forces quality over quantity.

Our cap is 5 findings per PR. One accurate high-severity finding is worth ten speculative lows. We would rather miss something than flood an engineer with noise they have to wade through.
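A cap like this can be applied right after parsing. This sketch assumes three severity levels named high, medium, and low; those names and the MAX_FINDINGS constant are illustrative. It keeps the most severe findings, so anything cut by the cap comes from the lower-severity tail.

```javascript
// Illustrative severity ordering; unknown severities sort last.
const MAX_FINDINGS = 5;
const SEVERITY_RANK = { high: 0, medium: 1, low: 2 };

function capFindings(findings, max = MAX_FINDINGS) {
  const rank = (f) => SEVERITY_RANK[f.severity] ?? Object.keys(SEVERITY_RANK).length;
  // Copy before sorting so the caller's array is untouched. Array.prototype.sort is
  // stable (ES2019+), so equal-severity findings keep the model's order.
  return [...findings].sort((a, b) => rank(a) - rank(b)).slice(0, max);
}

// Usage: seven findings in, the five most severe out.
const sample = [
  { severity: "low", message: "a" },
  { severity: "high", message: "b" },
  { severity: "low", message: "c" },
  { severity: "medium", message: "d" },
  { severity: "low", message: "e" },
  { severity: "high", message: "f" },
  { severity: "low", message: "g" },
];
console.log(capFindings(sample).map((f) => f.message)); // → [ 'b', 'f', 'd', 'a', 'c' ]
```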

The Viral Loop

Every comment posted by PullLight includes a small footer linking back to /repos/{owner}/{repo} — a public page showing the repo's PullLight stats: total PRs reviewed, total findings, severity breakdown, top bug categories.
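The stats on that page are a straightforward aggregation. This sketch assumes a few things. Reviews are stored as one record per review run, each holding a PR number and the approved findings it produced; that record shape is hypothetical. A PR with zero findings still counts as reviewed. The footer wording and the site origin (https://example.com here) are placeholders.

```javascript
// Hypothetical aggregation for the public /repos/{owner}/{repo} page.
function repoStats(reviews) {
  const prs = new Set();
  const bySeverity = {};
  const byCategory = {};
  let totalFindings = 0;
  for (const review of reviews) {
    prs.add(review.pr); // re-reviews of the same PR count once
    for (const f of review.findings) {
      totalFindings += 1;
      bySeverity[f.severity] = (bySeverity[f.severity] ?? 0) + 1;
      byCategory[f.category] = (byCategory[f.category] ?? 0) + 1;
    }
  }
  const topCategories = Object.entries(byCategory)
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([category, count]) => ({ category, count }));
  return { totalPrs: prs.size, totalFindings, bySeverity, topCategories };
}

// Placeholder footer appended to each posted comment body.
function withFooter(body, owner, repo, siteOrigin) {
  return `${body}\n\n---\nFound by PullLight: ${siteOrigin}/repos/${owner}/${repo}`;
}
```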

Each approved comment is a compounding signal. It is a micro case study: here is a real bug found in a real PR, on a real repo, posted by a real engineer who chose to approve it. The /repos/{owner}/{repo} page turns that into a permanent, shareable landing page.

Every catch becomes a marketing asset. We did not set out to build a growth loop; it fell out of the product design. When you gate everything behind human approval and surface only signal, the output is inherently credible.

What's Next

We are working on a few things:

More CVE case studies. PullLight has already caught real CVEs in Next.js, waitress, and jsonpath-plus. Deep dives on how those catches worked — what the diff looked like, what the finding said — make the case better than any marketing copy.
Deeper repo-level context. Learning from the codebase's patterns over time. If a repo has a history of SQL injection bugs in certain code patterns, the model can weight those patterns more heavily in future reviews.
Learning from approve/reject signals. Which comments get accepted and which get dismissed tells us what engineers actually find valuable. That signal is already flowing back into the prompt design.
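The approve/reject signal can be summarized per category before it informs prompt changes. This sketch assumes each finding records a category and a final decision ("approved" or "dismissed"); both fields are hypothetical. Pending findings are skipped because no one has judged them yet.

```javascript
// Hypothetical per-category approval rates from reviewer decisions.
function approvalRates(findings) {
  const tally = new Map();
  for (const { category, decision } of findings) {
    if (decision !== "approved" && decision !== "dismissed") continue; // skip pending
    const t = tally.get(category) ?? { approved: 0, dismissed: 0 };
    t[decision] += 1;
    tally.set(category, t);
  }
  return [...tally].map(([category, { approved, dismissed }]) => ({
    category,
    approved,
    dismissed,
    rate: approved / (approved + dismissed), // denominator is at least 1 here
  }));
}
```

A category with a persistently low approval rate is a candidate for de-emphasis in the prompt. How PullLight actually weighs this signal is not described here.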

Try It

The GitHub App is free for open-source repos. Install it on one project and see what comes back.

Install PullLight →

Or run the demo analyzer on any public PR. No install required: paste a PR URL and see the findings.

Follow @pulllight_ai for case studies as they land.