PullLight Security & Data Handling

01 / What we see

What PullLight reads.

PullLight only sees what's in the PR, and only when you open one. It does not clone repos, scan branches, or read code beyond the diff under review and the specific files that diff references.

Full list of data PullLight accesses

PR diff — the actual lines added and changed in this PR. Your code, but only the part you're submitting.
PR title and description — the context you wrote when you opened the PR.
Repo metadata — repo name, owner, language, file paths touched in the diff.
File contents (read on-demand) — when the diff references a function or file outside the diff, PullLight reads that specific file to provide accurate context. It does not read arbitrary files.

All access is driven by GitHub webhook events and GitHub API calls scoped to the repos you explicitly install the App on.

What PullLight does NOT see

Full repository clones — PullLight never clones the full repo.
Unchanged branches — it only sees the diff of the PR being opened.
Secrets, environment variables, .env* files, or configuration containing credentials.
Git history or commit history beyond what GitHub's PR diff API surfaces.
Issues, wiki pages, pull request comments, or any other repository content.

02 / Model input

What gets sent to Claude.

A focused, minimal payload — your diff plus just enough context to give useful feedback. Nothing more.

Exact payload sent to Claude

What goes in:

The PR diff (lines added, lines removed, file paths, hunk context)
PR title and description (if provided)
On-demand file reads for functions/classes referenced in the diff (only what's needed)
Repo language and structure hints from GitHub API

How it's sent:

The payload is sent to claude-sonnet-4-5 via the Polsia AI proxy — a proxy layer, not direct API access. Polsia routes to Anthropic. The payload is a structured prompt asking Claude to identify bugs, security issues, logic errors, and style problems in the diff.
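As a rough illustration of what a "structured prompt" built from the inputs above could look like, here is a minimal sketch. The function name `buildReviewPrompt`, the section markers, and the input object shape are all hypothetical; the source does not publish the actual prompt format.

```javascript
// Hypothetical prompt assembly. Field names and section markers are
// illustrative assumptions, not PullLight's real prompt format.
function buildReviewPrompt({ title, description, language, files, contextFiles = [] }) {
  const sections = [
    "Review the following pull request diff. Identify bugs, security issues, " +
      "logic errors, and style problems.",
    `Title: ${title}`,
  ];
  // The description and language hint are optional inputs.
  if (description) sections.push(`Description: ${description}`);
  if (language) sections.push(`Primary language: ${language}`);
  // One section per changed file, carrying only the diff hunk text.
  for (const file of files) {
    sections.push(`--- diff: ${file.filename} ---\n${file.patch}`);
  }
  // On-demand file reads are appended as separate context sections.
  for (const ctx of contextFiles) {
    sections.push(`--- context: ${ctx.filename} ---\n${ctx.content}`);
  }
  return sections.join("\n\n");
}
```

Keeping the diff and the on-demand context in separately labeled sections makes it easy to audit exactly which files contributed to a given model call.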

What's filtered before the model call:

Webhook delivery headers and tokens — stripped before prompt assembly.
.env, .env.*, .env.local, .env.production files — detected by path and excluded from diff reading.
GitHub App credentials and tokens — never included in model prompts.
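The path-based exclusion of .env files could be sketched roughly as follows. The helper names `isEnvFile` and `filterDiffFiles` are hypothetical, and the input is assumed to be an array of objects with a `filename` field, which is the shape GitHub's "list pull request files" endpoint returns.

```javascript
const path = require("node:path");

// Hypothetical helper: true when a path's final segment is ".env" or
// starts with ".env." (".env.local", ".env.production", and so on).
function isEnvFile(filePath) {
  const base = path.posix.basename(filePath.replace(/\\/g, "/"));
  return base === ".env" || base.startsWith(".env.");
}

// Drops matching files before any diff content is read.
// `files` is assumed to look like [{ filename, patch }, ...].
function filterDiffFiles(files) {
  return files.filter((file) => !isEnvFile(file.filename));
}
```

Matching on the basename rather than the full path means a nested file such as config/.env.local is excluded too, while a source file like src/environment.js is not.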

No training on customer code

PullLight reaches Claude through the Anthropic API (routed via the Polsia AI proxy), not through consumer-facing Claude products. API access means your code is not used for model training, by Polsia or by Anthropic.

Anthropic's API terms explicitly exclude customer inputs from training. We rely on this, not on a promise — it's a contractual and legal guarantee.

03 / Retention

How long we keep data.

PullLight doesn't hold onto your code. Pending reviews are temporary; catches in the public feed are sanitized hourly.

Pending reviews (active AI sessions)

AI analysis sessions are held in the pending_reviews table while they await human approval at /reviews.

Lifetime: until the review is approved, rejected, or expires (7 days without action).

After approval/rejection: the AI findings and review data are retained for 30 days for audit purposes, then purged. The actual diff and file contents are not retained beyond the session.
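The two retention windows can be expressed as a small date calculation. This is a sketch under assumptions: the record shape `{ createdAt, decidedAt }` and the function name `retentionDeadline` are hypothetical, and the 7-day expiry is assumed to be measured from when the review was created.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const PENDING_TTL_DAYS = 7; // pending review expires without action
const AUDIT_RETENTION_DAYS = 30; // findings kept after approval or rejection

// Hypothetical record shape: { createdAt: Date, decidedAt: Date | null }.
// Returns which cleanup applies next and when it is due.
function retentionDeadline(review) {
  if (review.decidedAt === null) {
    return {
      action: "expire",
      at: new Date(review.createdAt.getTime() + PENDING_TTL_DAYS * DAY_MS),
    };
  }
  return {
    action: "purge",
    at: new Date(review.decidedAt.getTime() + AUDIT_RETENTION_DAYS * DAY_MS),
  };
}
```

A scheduled cleanup job could call this for each row and act on any record whose deadline has passed.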

Public /catches feed (sanitization)

PullLight publishes anonymized bug findings to /catches — but only after sanitization. A cron job runs every hour and strips anything that could identify a customer or expose secrets:

API keys, tokens, secrets — regex patterns matching known secret formats
Email addresses and phone numbers
UUIDs, GUIDs, and internal IDs
Variable and function names that include company-specific identifiers
Real names, usernames, or organization names

The sanitized snippets in /catches are the only bug findings that leave the system. Nothing customer-identifying is published.
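To illustrate the regex-based part of the sanitization pass, here is a minimal sketch covering three of the categories listed above: secrets in two well-known formats, email addresses, and UUIDs. The pattern list and the name `sanitizeSnippet` are assumptions; the real job's patterns are not published, and names, phone numbers, and company-specific identifiers would need handling beyond simple regexes.

```javascript
// Illustrative redaction rules. The secret patterns cover only GitHub-style
// tokens (ghp_, gho_, ...) and AWS access key IDs; a real list would be longer.
const REDACTIONS = [
  { label: "[SECRET]", pattern: /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g },
  { label: "[SECRET]", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
  { label: "[EMAIL]", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  {
    label: "[UUID]",
    pattern: /\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b/g,
  },
];

// Applies every rule in order and returns the redacted text.
function sanitizeSnippet(text) {
  return REDACTIONS.reduce(
    (out, { label, pattern }) => out.replace(pattern, label),
    text
  );
}
```

Replacing matches with a typed placeholder rather than deleting them keeps the published snippet readable while still removing the sensitive value.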

04 / Human approval

No comment posts without your approval.

This is the core differentiator vs CodeRabbit, Greptile, and other auto-publishing bots. PullLight queues every AI finding in /reviews — you decide what gets posted.

The approval flow, step by step

Step 1: A PR is opened. GitHub webhook fires → PullLight analyzes the diff with Claude.

Step 2: AI findings land in your /reviews queue with file path, line number, severity, and a plain-language explanation.

Step 3: You review the findings. Approve individual comments, dismiss ones that don't apply, or skip the whole review.

Step 4: Only approved comments are posted to the PR as GitHub review comments, each with a "Reviewed by PullLight" footer.

PullLight will never post a comment on a PR without explicit human action in the /reviews queue. The queue is gated — there's no auto-publish bypass.
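The gate described in the steps above amounts to a filter on an explicit status field. A minimal sketch, assuming each finding carries a hypothetical `status` of "pending", "approved", or "dismissed" (the actual schema is not documented here):

```javascript
const FOOTER = "Reviewed by PullLight";

// Hypothetical finding shape: { path, line, explanation, status }.
// Only findings a human explicitly approved become review comments;
// pending and dismissed findings never reach the PR.
function commentsToPost(findings) {
  return findings
    .filter((finding) => finding.status === "approved")
    .map((finding) => ({
      path: finding.path,
      line: finding.line,
      body: `${finding.explanation}\n\n${FOOTER}`,
    }));
}
```

Because the default state is "pending", a finding that nobody acts on is never eligible for posting.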

Why this matters

AI models hallucinate. When CodeRabbit, Greptile, or Copilot PR Review auto-publish comments, they add noise — wrong code, unhelpful suggestions, confident-but-wrong assertions that erode trust in code review.

PullLight's human-in-the-loop model means only signal-rich, human-approved findings ever reach your PR. Your team sees fewer false positives. Reviewers don't learn to ignore the bot.

05 / Auth

How PullLight connects to GitHub.

No personal access tokens. No SSH keys. A GitHub App with scoped permissions — and credentials encrypted at rest.

GitHub App OAuth (no PATs)

PullLight uses a GitHub App for authentication — the recommended approach, not a workaround. GitHub Apps are scoped to specific repos, can be installed by org admins, and can be revoked instantly from GitHub's settings without touching any external systems.

PullLight never asks for or stores personal access tokens (PATs). If you see a PAT request anywhere, that is not PullLight.

Credentials at rest

GitHub App credentials (app_id, install_url, private key PEM) are stored in the github_app_config table, encrypted with AES-GCM before insertion. The encryption key is stored in an environment variable on the server — not in the database.

If the database is ever exposed, the credentials are unreadable without the server-side encryption key. This is not security-through-obscurity — it's a real encryption layer on top of the database access.
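Application-level AES-GCM encryption of a credential before it is written to the database might look like the sketch below, using Node's built-in crypto module. The 256-bit key size, the iv-tag-ciphertext storage layout, and the function names are assumptions; the source only states that AES-GCM is used and that the key lives in a server-side environment variable.

```javascript
const crypto = require("node:crypto");

// `key` is assumed to be a 32-byte Buffer loaded from a server-side
// environment variable (the variable name is not specified in the source).
function encryptCredential(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  // Assumed storage layout: iv (12 bytes) | tag (16 bytes) | ciphertext.
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptCredential(encoded, key) {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM is an authenticated mode, so a stored value that has been modified, or a decryption attempt with the wrong key, fails loudly instead of returning garbage.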

Webhook signature verification

Every GitHub webhook delivery includes an X-Hub-Signature-256 header: an HMAC-SHA256 of the raw payload, keyed with the webhook secret configured for the GitHub App (not the App's private key). PullLight verifies this signature before processing any webhook. Deliveries with a missing or invalid signature are rejected before they reach any business logic.
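A signature check of this kind can be sketched with Node's crypto module. GitHub documents the header value as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The function name `isValidSignature` is hypothetical, and this is an illustration rather than PullLight's actual handler.

```javascript
const crypto = require("node:crypto");

// rawBody must be the exact bytes GitHub sent (a string or Buffer),
// not a re-serialized JSON object.
function isValidSignature(rawBody, signatureHeader, webhookSecret) {
  if (typeof signatureHeader !== "string" || !signatureHeader.startsWith("sha256=")) {
    return false;
  }
  const expected =
    "sha256=" + crypto.createHmac("sha256", webhookSecret).update(rawBody).digest("hex");
  const received = Buffer.from(signatureHeader, "utf8");
  const computed = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === computed.length && crypto.timingSafeEqual(received, computed);
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.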

Per-request authentication

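Every GitHub API call is authenticated with an installation access token: short-lived (GitHub expires installation tokens after one hour, and the App JWT used to request them is valid for at most 10 minutes), scoped to the specific installation, and rotated on every request. PullLight does not hold onto or reuse long-lived tokens.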
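In GitHub's documented flow, an App first signs a short-lived JWT (RS256) with its private key, then exchanges it for an installation access token via POST /app/installations/{installation_id}/access_tokens. The sketch below covers only the JWT step, using Node's built-in crypto. The function name `buildAppJwt` and the 9-minute lifetime are illustrative choices, not PullLight's confirmed implementation.

```javascript
const crypto = require("node:crypto");

const b64url = (input) => Buffer.from(input).toString("base64url");

// Builds the App-level JWT that GitHub accepts when requesting an
// installation token. `privateKey` is the App's PEM string or a KeyObject.
function buildAppJwt(appId, privateKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const header = { alg: "RS256", typ: "JWT" };
  const claims = {
    iat: nowSeconds - 60, // backdated to tolerate clock drift
    exp: nowSeconds + 9 * 60, // stays under GitHub's 10-minute ceiling
    iss: String(appId),
  };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(claims))}`;
  // "sha256" with an RSA key yields RSASSA-PKCS1-v1_5, which is RS256.
  const signature = crypto
    .sign("sha256", Buffer.from(signingInput), privateKey)
    .toString("base64url");
  return `${signingInput}.${signature}`;
}
```

Minting a fresh JWT and installation token per request keeps any single credential useful for only a short window if it leaks.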

06 / Infrastructure

Where PullLight runs.

Render US region. PostgreSQL encrypted at rest. TLS everywhere.

Hosting and region

The PullLight application runs on Render in the US region. No data is stored or processed outside the US.

Database encryption

The PostgreSQL database (Neon) encrypts all data at rest by default — table data, indexes, WAL files, and backups. Encryption keys are managed by Neon, not by PullLight.

Transport security (TLS)

All connections to and from PullLight use TLS 1.2 or higher:

HTTPS for all inbound requests (enforced on Render)
TLS 1.2+ for all outbound connections (GitHub API, Polsia AI proxy, Postmark)
Certificate pinning is not implemented; PullLight relies on standard TLS certificate validation and hostname verification instead
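For outbound calls from a Node service, a TLS 1.2 floor with certificate validation can be pinned down in a shared agent. This is a configuration sketch; the name `outboundAgent` is hypothetical and the source does not say how PullLight enforces the minimum version.

```javascript
const https = require("node:https");

// Shared agent for outbound HTTPS requests. minVersion refuses handshakes
// below TLS 1.2; rejectUnauthorized (Node's default) keeps certificate
// chain and hostname validation switched on.
const outboundAgent = new https.Agent({
  minVersion: "TLSv1.2",
  rejectUnauthorized: true,
});
```

Passing the agent as the `agent` option of `https.request` applies the same floor to every outbound call that uses it.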

07 / Roadmap

What's coming.

Security is an active investment. Here's what we're building and when.

SOC 2 Type I (target Q3 2026)

We are pursuing a SOC 2 Type I report, targeting completion in Q3 2026. The audit covers controls around data access, change management, incident response, and monitoring.

If SOC 2 compliance is a hard requirement for your procurement process, contact us at security@pulllight.io and we'll share what we have so far.

Self-hosted / VPC option (Enterprise tier)

For organizations that require data residency, air-gapped environments, or self-managed infrastructure, we are building a self-hosted or VPC-hosted deployment option for Enterprise customers.

If on-prem or VPC deployment is a hard requirement, let us know — we scope and price these engagements directly.

Enterprise interest: Self-hosted, VPC, or custom compliance requirements? Talk to us directly at hello@pulllight.io.

08 / Subprocessors

Who else handles your data.

Six vendors total. Every one listed here. No undisclosed third-party sharing, no data brokers, no advertising networks.

Vendor	Purpose	What they see	Region
Anthropic	AI model (Claude)	PR diff, PR title/description, on-demand file reads. No PII, no credentials. Not used for training (commercial API terms).	US
GitHub	Source repository & webhook delivery	GitHub App receives PR webhook payloads and posts approved review comments. GitHub processes all repo data under their own privacy policy.	US
Render	Application hosting	Runs the PullLight web service. Sees inbound HTTP traffic and logs. No direct access to code content beyond what transit requires.	US
Neon	PostgreSQL database	Stores installations, pending reviews, review findings, and audit logs. Data encrypted at rest (AES-256). Credentials encrypted by PullLight before storage.	US
Postmark	Transactional email	Sends onboarding and waitlist emails. Sees email address and send status only. No code content.	US
Stripe	Payment processing	Handles subscription billing. Sees billing name, email, and payment method only. No code or repo data.	US

For the full legal language covering data collection, GDPR/CCPA rights, and retention obligations, see the Privacy Policy and Terms of Service.

09 / PR Risk Score

How PR Risk Score works.

Every analyzed PR gets a single 0–100 risk score computed deterministically from finding data — no ML, no black box. Here is the exact formula:

severity_weighted_sum = 25×critical + 10×high + 3×medium + 1×low
base   = max(0, 100 − 100 / (1 + severity_weighted_sum))
bonus  = +5 if any finding category ∈ {auth, injection, secret-leak, race-condition}
score  = min(100, floor(base + bonus))
score  = 0 when there are no findings

Why this formula? The asymptotic shape (100 − 100/(1+x)) rises steeply at low weighted sums and flattens above x ≈ 10, so the base term approaches 100 without ever reaching it. Under the formula as written, a single low finding scores 50, a single medium 75, a single high 90, and a single critical 96. Category bonuses add a flat 5 points for findings in the most exploitable classes (auth bypasses, injection paths, secret leaks, and race conditions), and the result is capped at 100.

Color bands: 0–24 = green (low), 25–49 = yellow (medium), 50–74 = orange (high), 75–100 = red (critical). Scores appear on the GitHub Check Run, the /reviews queue, and the /dashboard/risk page.

The formula is implemented in lib/risk-score.js. It is deterministic: given the same findings you will always get the same score. PullLight never uses the score to block merges — conclusion stays neutral.
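The formula and color bands translate directly into a few lines of JavaScript. The sketch below is written from the published formula only; it is not the contents of lib/risk-score.js, and the finding shape `{ severity, category }` and the function names are assumptions.

```javascript
const WEIGHTS = { critical: 25, high: 10, medium: 3, low: 1 };
const BONUS_CATEGORIES = new Set(["auth", "injection", "secret-leak", "race-condition"]);

// findings: assumed array of { severity, category } objects.
function riskScore(findings) {
  if (findings.length === 0) return 0;
  // Unknown severities contribute nothing to the weighted sum.
  const weighted = findings.reduce((sum, f) => sum + (WEIGHTS[f.severity] ?? 0), 0);
  const base = Math.max(0, 100 - 100 / (1 + weighted));
  const bonus = findings.some((f) => BONUS_CATEGORIES.has(f.category)) ? 5 : 0;
  return Math.min(100, Math.floor(base + bonus));
}

// Maps a score to the color bands listed above.
function riskBand(score) {
  if (score >= 75) return "red";
  if (score >= 50) return "orange";
  if (score >= 25) return "yellow";
  return "green";
}
```

For example, `riskScore([{ severity: "medium", category: "style" }])` evaluates to 75 (weighted sum 3, base 100 − 100/4), and a single critical finding in the auth category reaches the cap of 100 (floor of 96.15 + 5, clamped).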

Your code never trains a model. Ever.