Security Architecture

Pixee's security architecture is built on three principles: minimize data exposure (send only relevant code snippets to LLM inference, never entire repositories), preserve human authority (PR-only workflow, never direct commits), and isolate analysis (stateless inference calls that cannot persist or affect other analyses). This page details data flow, access control, credential management, and AI governance for enterprise deployments.

For Pixee's latest security certifications, audit reports, and data processing policies, visit the Pixee Trust Center.

Security teams evaluating Pixee should read this page alongside Compliance. Together, they cover the technical architecture and the compliance mapping that enterprise reviews require.

Security Principles

Five design decisions govern Pixee's security architecture:

Minimum data exposure. Only security-relevant code context is sent to LLM inference: never entire repositories, and never entire files when the file is large. Prompts exclude absolute filesystem paths, repository URLs, git metadata, commit hashes, author information, and CI/CD details.
No full repository access by LLM. The LLM sees only what it needs to analyze the specific vulnerability. Any file paths it receives are relative to the project root.
Stateless inference. Each analysis is an isolated inference call. A single analysis cannot modify the model, persist data across analyses, or affect other applications. This isolation is fundamental to how the system operates.
PR-only workflow. Every code change is delivered as a pull request. This is an architectural constraint, not an optional setting. There is no mode, configuration, or override that allows direct commits.
Human authority preserved. Pixee adds validation layers before changes reach your existing review process. Developers and their teams always have final approval. Your branch protection rules, required reviewers, CI/CD pipelines, and SAST re-scanning all apply to Pixee changes exactly as they apply to human-written code.
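
As an illustration of the minimum-data-exposure principle, the sketch below builds an LLM context that contains only a project-relative path and a small window of lines around a finding. It is a minimal sketch, not Pixee's implementation: the function name `build_llm_context`, the `window` parameter, and the returned keys are all hypothetical.

```python
from pathlib import PurePosixPath


def build_llm_context(project_root: str, file_path: str, source: str,
                      finding_line: int, window: int = 5) -> dict:
    """Return a relative path and a snippet around a 1-indexed finding line.

    Hypothetical helper: names and return shape are assumptions.
    """
    # Keep only the path relative to the project root; this raises
    # ValueError if the file lies outside the root.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(project_root))

    lines = source.splitlines()
    start = max(finding_line - 1 - window, 0)
    end = min(finding_line + window, len(lines))

    # No repository URL, commit hash, author, or CI/CD detail is included.
    return {
        "path": str(relative),
        "start_line": start + 1,
        "snippet": "\n".join(lines[start:end]),
    }
```

The caller never passes git metadata into the function, so it cannot leak into the result by accident.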

Data Flow Architecture

The following describes how data flows through a self-hosted Pixee deployment:

1. SCM events (push, PR creation, scanner upload) arrive at the Pixee platform via webhook.
2. The Pixee platform extracts the relevant code context for the vulnerability being analyzed.
3. Code context is sent to the customer's LLM provider (only vulnerability-relevant snippets).
4. The LLM response is processed by the evaluation layer (independent inference call).
5. Approved fixes are delivered as pull requests back to the customer's SCM.
6. Triage decisions are persisted with timestamp, classification, and LLM justification.

For Dedicated SaaS, steps 2-4 run on Pixee-managed infrastructure. For all self-hosted models, every step runs within the customer's network.
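
The flow above can be sketched as a single handler that wires the steps together. This is a hedged illustration only: `handle_event`, `TriageRecord`, the injected callables, and the `classification` and `justification` keys on the proposal are hypothetical names, not Pixee's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class TriageRecord:
    # Fields mirror what the text says is persisted for each triage decision.
    timestamp: datetime
    classification: str
    justification: str


def handle_event(event: dict,
                 extract_context: Callable[[dict], str],
                 generate_fix: Callable[[str], dict],
                 evaluate_fix: Callable[[str, dict], bool],
                 open_pull_request: Callable[[dict], None],
                 persist: Callable[[TriageRecord], None]) -> bool:
    """Run one analysis for a webhook event (step 1 is the event arriving)."""
    context = extract_context(event)             # step 2: relevant context only
    proposal = generate_fix(context)             # step 3: LLM inference
    approved = evaluate_fix(context, proposal)   # step 4: independent evaluation
    if approved:
        open_pull_request(proposal)              # step 5: PR, never a direct commit
    persist(TriageRecord(                        # step 6: persist the decision
        timestamp=datetime.now(timezone.utc),
        classification=proposal["classification"],
        justification=proposal["justification"],
    ))
    return approved
```

Passing the stages in as callables keeps the sketch stateless: nothing from one call is carried into the next.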

Data Classification Table

This table is the single most important reference for security review. It specifies what data each component accesses.

Data Type	What Pixee Platform Sees	What LLM Sees	What Is Persisted
Source code	Relevant vulnerability context	Relevant code snippets only (not full repos)	Triage outcome and fix content, not source code
Scanner findings	Full SARIF/finding data	Finding metadata only	Classification + justification
Fix proposals	Full proposed diff	Generated by LLM	PR content + quality scores (safety, effectiveness, cleanliness)
Repository metadata	Repo name, language, branch	Not sent	Analysis metadata
Developer identity	PR author for attribution	Never sent	Anonymized in telemetry (if telemetry is enabled)
File paths	Relative to project root	Relative to project root	Relative paths only
Git metadata	Not accessed	Not sent	Not persisted
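
One way to read the "What LLM Sees" column is as an allowlist applied before any prompt is built. The sketch below assumes a hypothetical analysis dictionary; the field names and the `to_llm_payload` helper are illustrative and do not come from Pixee.

```python
# Hypothetical field names standing in for the rows the table marks as
# visible to the LLM: code snippets, relative paths, finding metadata.
LLM_ALLOWED_FIELDS = frozenset({"snippet", "relative_path", "finding_metadata"})


def to_llm_payload(analysis: dict) -> dict:
    """Keep only allowlisted fields; anything else is dropped by default."""
    return {key: value for key, value in analysis.items()
            if key in LLM_ALLOWED_FIELDS}
```

An allowlist fails closed: a newly added field such as developer identity stays out of the prompt unless someone adds it deliberately.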

Data Flow by Deployment Model

Data Type	Dedicated SaaS (single-tenant)	Self-Hosted (Embedded/Helm)	Air-Gapped
Source code snippets	Pixee cloud (dedicated)	Customer network	Customer network
Scanner findings	Pixee cloud (dedicated)	Customer network	Customer network
LLM inference	Pixee-managed	Customer's provider	Customer's private endpoint
Triage records	Pixee cloud (dedicated)	Customer network	Customer network
License validation	Pixee cloud	Pixee cloud (proxyable)	Pixee cloud (proxy required)

Authentication and Access Control

Mechanism	Implementation
SSO	Google Workspace, Microsoft Entra ID, Okta (direct login), or embedded Authentik OIDC
Role-based access	Admin, Security Lead, Member roles via `pixee_roles` scope
SCM authentication	GitHub App (private key), GitLab PAT, Azure DevOps PAT, Bitbucket API tokens
Credential storage	Kubernetes secrets with `existingSecret` support for external secret managers
Session management	RP-Initiated Logout, token splitting, recovery flows

Embedded Authentik OIDC runs inside the customer cluster for self-hosted deployments -- no external IdP dependency for login availability. It federates to the customer's upstream corporate IdP (Google Workspace, Microsoft Entra ID, Okta, or any OIDC source) with auto-redirect and direct login.

Authentication modes are selected at install time: evaluation mode (no auth), embedded OIDC, or direct corporate IdP login. Production deployments should use SSO.
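
A role check against the `pixee_roles` scope might look like the sketch below. It assumes the roles arrive as a `pixee_roles` entry in already-verified token claims, either as a list or as a single string. The source does not specify that shape, and `require_any_role` is a hypothetical helper.

```python
from typing import Iterable


def require_any_role(claims: dict, allowed: Iterable[str]) -> bool:
    """Return True if the verified claims grant at least one allowed role.

    Assumption: roles are carried under the "pixee_roles" key.
    """
    granted = claims.get("pixee_roles", [])
    if isinstance(granted, str):
        granted = [granted]  # tolerate a single role sent as a plain string
    allowed_set = set(allowed)
    return any(role in allowed_set for role in granted)
```

For example, `require_any_role(claims, {"Admin", "Security Lead"})` could guard a settings page, while claims with no roles at all are denied.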

Credential Management

Pixee handles four categories of credentials. None are stored in Helm values -- all use Kubernetes secrets, and object store access can use pod identity instead of a static credential.

Credential Type	Storage	External Manager Support
SCM tokens (GitHub App key, GitLab PAT, ADO PAT, Bitbucket API token)	Kubernetes secret	`existingSecret` (Vault, External Secrets Operator, SOPS)
LLM API keys	Kubernetes secret	`existingSecret`
Database credentials (if BYO PostgreSQL)	Kubernetes secret	`existingSecret`
Object store credentials (if BYO S3/Blob/GCS)	Kubernetes secret or pod identity	IRSA (AWS), Workload Identity (GCP), Managed Identity (Azure)

Key points for security review:

Customer owns all LLM API keys. Pixee never has access to customer LLM credentials.
LLM traffic routes through the customer's account with the customer's billing.
Pod identity support eliminates static credentials for cloud-native deployments.
SCM tokens require minimum scopes -- documented per platform in the installation guide.
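
Because credentials live in Kubernetes secrets rather than Helm values, a workload typically reads them at runtime from a mounted secret volume. The sketch below shows that general pattern under stated assumptions: the mount directory, key name, and `read_secret` helper are hypothetical and not taken from Pixee's charts.

```python
from pathlib import Path


def read_secret(mount_dir: str, key: str) -> str:
    """Read one key from a secret volume mounted as files.

    Kubernetes exposes each key of a mounted secret as a file whose
    content is the value. Paths and key names here are assumptions.
    """
    value = (Path(mount_dir) / key).read_text(encoding="utf-8").strip()
    if not value:
        raise ValueError(f"secret key {key!r} is empty")
    return value
```

The same code works whether the secret was created by the chart or supplied through `existingSecret` by an external manager, since only the mounted files matter.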

Network Security

Requirement	Detail
Outbound connections	License validation (proxyable). LLM provider (customer-controlled). Telemetry (opt-in, toggleable).
Inbound connections	SCM webhooks on port 443 only
Proxy support	`httpProxy`, `httpsProxy`, `noProxy` with per-provider endpoint overrides
TLS options	Upload certificate, self-signed, Let's Encrypt, cert-manager
TLS-intercepting proxy	CA cert injection supported for environments with SSL inspection

Self-hosted deployments require no inbound connections except SCM webhooks. All other communication is outbound and proxyable.
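
To make the `noProxy` setting concrete, the sketch below applies the common convention for such lists: comma-separated hostnames or domain suffixes that bypass the proxy. It illustrates the convention only. How Pixee evaluates `noProxy` is not stated in the source, and `bypasses_proxy` is a hypothetical helper.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if host matches an entry in a comma-separated no-proxy list.

    An entry matches the exact host or any subdomain of it; "*" matches all.
    """
    host = host.lower().rstrip(".")
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return True
    return False
```

With `noProxy` set to `localhost,.internal.example.com`, a private LLM endpoint under `internal.example.com` would be reached directly, while public provider hosts would still go through the proxy.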

AI Governance

For security leaders evaluating AI risk, Pixee's AI governance architecture addresses common concerns:

Input scope limitation. Only vulnerability-relevant code is sent to the LLM. No customer proprietary patterns, business logic context, or secrets are included in prompts.

No training on customer data. Customer code is used only for the specific analysis request. Azure OpenAI and equivalent providers offer contractual guarantees that customer data is not used for model training. Pixee does not retain customer code after analysis completion.

Provider-family-aware prompting. Pixee uses provider-specific prompt structures rather than a one-size-fits-all approach; for example, Anthropic models receive Anthropic-optimized prompts. This is a quality measure, not a security measure, but it indicates the system is purpose-built for each provider rather than a generic wrapper.

Independent evaluation. The evaluator runs as a separate inference call from the generator. The generator cannot grade its own work. Fixes that fail evaluation are suppressed entirely.

Responsible AI documentation. The architecture provides concrete, verifiable answers to governance committee questions. See Security & Trust for the full trust framework.
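
The independent-evaluation point above can be sketched as a gate that sits between generation and delivery. The three score names come from the quality scores this page mentions. The 0-to-1 scale, the `0.8` threshold, and the `gate_fix` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QualityScores:
    safety: float
    effectiveness: float
    cleanliness: float


def gate_fix(diff: str,
             evaluate: Callable[[str], QualityScores],
             threshold: float = 0.8) -> Optional[str]:
    """Return the diff only if every score meets the threshold.

    `evaluate` stands in for a separate inference call; the generator
    never scores its own output. Scale and threshold are assumptions.
    """
    scores = evaluate(diff)
    lowest = min(scores.safety, scores.effectiveness, scores.cleanliness)
    if lowest < threshold:
        return None  # suppressed entirely: no PR is opened
    return diff
```

Using the minimum score means a fix that is effective but unsafe is suppressed rather than averaged into a passing grade.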

Adversarial Input Protection

Code analyzed by an LLM-based system is itself a potential attack vector. Malicious code embedded in analyzed files could attempt to manipulate the LLM through prompt injection -- crafted comments or string literals designed to alter the model's behavior. Pixee addresses this risk through defense in depth across five layers:

Input scope limitation. Only vulnerability-relevant code snippets are sent to the LLM, not full repositories or entire files. This limits the attacker's surface area for embedding adversarial payloads -- most of the codebase never reaches the model.

Output validation via independent evaluator. The structurally independent evaluation layer (separate context window, separate system prompt, separate scoring rubric) acts as a second opinion on every generated fix. Malicious or anomalous outputs that deviate from expected security fix patterns are caught and suppressed by this independent check.

Narrow generation scope. Fixes are constrained to known security patterns drawn from OWASP and SANS remediation guidance, not open-ended code generation. The model operates within a bounded solution space, making it harder for adversarial inputs to steer output toward arbitrary code execution or data exfiltration.

Deterministic-first routing. The majority of fixes use zero-LLM deterministic codemods. Vulnerabilities handled by codemods never reach an LLM at all, eliminating prompt injection risk entirely for those fix types. This reduces the overall attack surface to only the subset of fixes that require AI generation.

PR-only delivery. Even in the unlikely event that a manipulated fix passed evaluation, it would arrive as a pull request subject to developer review, CI/CD pipeline execution, and SAST re-scanning. No code reaches production without human approval.

Defense in depth reduces the risk of adversarial input attacks. No system can guarantee zero adversarial success against a sufficiently motivated attacker -- but the combination of limited input scope, independent validation, constrained generation, deterministic routing, and human review creates multiple independent barriers that an adversarial payload would need to defeat simultaneously.
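
Deterministic-first routing can be illustrated with a small dispatcher: if a codemod exists for the rule, the LLM is never called. This is a sketch under assumptions. The rule identifiers, the codemod registry, and `route_fix` are hypothetical and do not describe Pixee's internals.

```python
from typing import Callable, Dict, Tuple


def route_fix(rule_id: str,
              snippet: str,
              codemods: Dict[str, Callable[[str], str]],
              llm_fix: Callable[[str], str]) -> Tuple[str, str]:
    """Prefer a deterministic codemod; fall back to the LLM otherwise.

    Returns (route, patched_snippet) so the route taken can be audited.
    """
    codemod = codemods.get(rule_id)
    if codemod is not None:
        # Deterministic path: the snippet never reaches a model, so
        # prompt injection in comments or strings has nothing to target.
        return "codemod", codemod(snippet)
    return "llm", llm_fix(snippet)
```

Returning the route alongside the patch makes it straightforward to record which fixes involved a model at all.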

Network Architecture

Pixee Enterprise runs as a set of Kubernetes pods with defined communication paths between components:

Platform service -- receives webhooks from the SCM, serves the dashboard UI, and coordinates analysis requests. This is the entry point for all external communication.
Analysis service -- processes scanner findings, runs triage logic, and generates remediation patches. Operates on finding data passed internally from the platform service.
LLM proxy -- routes inference requests to the configured model endpoint(s). Supports OpenAI, Azure AI Foundry, Anthropic, and Azure Anthropic providers. All LLM traffic passes through this proxy for consistent audit logging and provider abstraction.
Database -- PostgreSQL for triage decisions, finding metadata, quality scores, and audit records. Stores the persistent state that powers the dashboard and compliance reporting.

Inter-pod communication uses the internal cluster network only. No pod-to-pod traffic is exposed externally. Service discovery is handled through standard Kubernetes DNS.

Inbound traffic: HTTPS on port 443 from SCM webhooks and dashboard users. No other inbound ports are required.

Outbound traffic: HTTPS to the SCM API for PR creation and status updates. HTTPS to the configured LLM endpoint for inference requests. HTTPS to the Pixee license server for validation (can be routed through a corporate proxy). No other outbound connections are required.

No component requires privileged Kubernetes access. All pods run with standard security contexts. Network policies can further restrict inter-pod communication to only the defined paths above.

Data Retention

Pixee retains the following data categories, each with distinct lifecycle controls:

Triage decisions are retained for the lifetime of the deployment. Retained artifacts include finding metadata, classification verdict, justification prose, and timestamp. These records support audit defensibility and compliance reporting.

Fix artifacts include PR metadata and quality scores (Safety, Effectiveness, Cleanliness). Source code diffs are stored in Git history under customer control -- Pixee does not maintain a separate copy of fixed source code.

LLM interaction logs (prompts and responses) are retained for audit and debugging purposes. Enterprise customers can configure the retention period to match their organization's data governance policies.

Data purge follows standard Kubernetes PVC lifecycle. Customers control retention and deletion through their infrastructure configuration. For Dedicated SaaS deployments, data deletion requests are handled through standard support channels.
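
A configurable retention period for LLM interaction logs amounts to a cutoff comparison. The sketch below assumes each log entry is a dictionary with a timezone-aware `created_at` value. That shape and the `purge_expired` helper are hypothetical, since the source does not describe how retention is enforced.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def purge_expired(logs: Iterable[dict],
                  retention_days: int,
                  now: Optional[datetime] = None) -> List[dict]:
    """Return only the entries still inside the retention window.

    Assumes each entry has a timezone-aware "created_at" datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [entry for entry in logs if entry["created_at"] >= cutoff]
```

Accepting `now` as a parameter keeps the function deterministic in tests and in audits that replay a purge as of a given date.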
