Security Architecture
Pixee's security architecture is built on three principles: minimize data exposure (send only relevant code snippets to LLM inference, never entire repositories), preserve human authority (PR-only workflow, never direct commits), and isolate analysis (stateless inference calls that cannot persist data or affect other analyses). This page details data flow, access control, credential management, and AI governance for enterprise deployments.
For Pixee's latest security certifications, audit reports, and data processing policies, visit the Pixee Trust Center.
Security teams evaluating Pixee should read this page alongside Compliance. Together, they cover the technical architecture and the compliance mapping that enterprise reviews require.
Security Principles
Five design decisions govern Pixee's security architecture:
- Minimum data exposure. Only security-relevant code context is sent to LLM inference. Not entire repositories. Not entire files (for large files). No absolute filesystem paths, repository URLs, git metadata, commit hashes, author information, or CI/CD details are included.
- No full repository access by LLM. File paths are relative to the project root. No git metadata or author information is sent. The LLM sees only what it needs to analyze the specific vulnerability.
- Stateless inference. Each analysis is an isolated inference call. A single analysis cannot modify the model, persist data across analyses, or affect other applications. This is fundamental to how the system operates.
- PR-only workflow. Every code change is delivered as a pull request. This is an architectural constraint, not an optional setting. There is no mode, configuration, or override that allows direct commits.
- Human authority preserved. Pixee adds validation layers before changes reach your existing review process. Developers and their teams always have final approval. Your branch protection rules, required reviewers, CI/CD pipelines, and SAST re-scanning all apply to Pixee changes exactly as they apply to human-written code.
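The first two decisions can be illustrated with a minimal Python sketch. It is not Pixee's implementation: the function name `build_llm_context`, the `radius` parameter, and the returned field names are hypothetical, chosen only to show how a snippet window and a root-relative path might be derived before anything is sent for inference.

```python
from pathlib import PurePosixPath


def build_llm_context(project_root, file_path, lines, finding_line, radius=5):
    """Return only the lines around a finding, keyed by a root-relative path.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    # Express the path relative to the project root so that no absolute
    # filesystem path is included in the context.
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(project_root))
    # Clamp the window to the file boundaries (finding_line is 1-indexed).
    start = max(finding_line - radius, 1)
    end = min(finding_line + radius, len(lines))
    return {
        "path": str(relative),
        "start_line": start,
        "snippet": "\n".join(lines[start - 1:end]),
    }
```

The returned dictionary has no field for repository URLs, commit hashes, or author data, so those values cannot reach the prompt through this path.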
Data Flow Architecture
The following describes how data flows through a self-hosted Pixee deployment:
1. SCM events (push, PR creation, scanner upload) arrive at the Pixee platform via webhook
2. Pixee platform extracts the relevant code context for the vulnerability being analyzed
3. Code context is sent to the customer's LLM provider (only vulnerability-relevant snippets)
4. LLM response is processed by the evaluation layer (independent inference call)
5. Approved fixes are delivered as pull requests back to the customer's SCM
6. Triage decisions are persisted with timestamp, classification, and LLM justification
For Dedicated SaaS, steps 2-4 run on Pixee-managed infrastructure. For all self-hosted models, every step runs within the customer's network.
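The record persisted in the final step can be pictured as a small immutable structure. The sketch below is an assumption-laden illustration: `TriageRecord`, `record_triage`, and the `finding_id` field are hypothetical names, and the actual storage schema is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TriageRecord:
    finding_id: str       # hypothetical identifier for the scanner finding
    classification: str   # triage verdict
    justification: str    # LLM-written reasoning for the verdict
    timestamp: str        # ISO 8601, UTC


def record_triage(finding_id, classification, justification, now=None):
    """Build the record that would be persisted for one triage decision."""
    now = now or datetime.now(timezone.utc)
    return TriageRecord(finding_id, classification, justification, now.isoformat())
```

Freezing the dataclass reflects the audit use case: once written, a decision is read back for reporting rather than edited in place.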
Data Classification Table
This table is the single most important reference for security review. It specifies what data each component accesses.
| Data Type | What Pixee Platform Sees | What LLM Sees | What Is Persisted |
|---|---|---|---|
| Source code | Relevant vulnerability context | Relevant code snippets only (not full repos) | Triage outcome and fix content, not source code |
| Scanner findings | Full SARIF/finding data | Finding metadata only | Classification + justification |
| Fix proposals | Full proposed diff | Generated by LLM | PR content + quality scores (safety, effectiveness, cleanliness) |
| Repository metadata | Repo name, language, branch | Not sent | Analysis metadata |
| Developer identity | PR author for attribution | Never sent | Anonymized in telemetry (if telemetry is enabled) |
| File paths | Relative to project root | Relative to project root | Relative paths only |
| Git metadata | Not accessed | Not sent | Not persisted |
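One way to read the "What LLM Sees" column is as an allowlist applied before a prompt is built. The following sketch assumes hypothetical field names (`snippet`, `relative_path`, `finding_metadata`) and a hypothetical `to_llm_payload` helper; it shows the allowlist idea only and makes no claim about how Pixee enforces the table.

```python
# Hypothetical field names mirroring the rows the table marks as sent to the LLM.
LLM_ALLOWED_FIELDS = frozenset({"snippet", "relative_path", "finding_metadata"})


def to_llm_payload(analysis_context):
    """Keep only allowlisted fields; everything else is dropped by default."""
    return {
        key: value
        for key, value in analysis_context.items()
        if key in LLM_ALLOWED_FIELDS
    }
```

An allowlist fails closed: a new field added to the analysis context stays out of the prompt until someone deliberately adds it to the set.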
Data Flow by Deployment Model
| Data Type | Dedicated SaaS (single-tenant) | Self-Hosted (Embedded/Helm) | Air-Gapped |
|---|---|---|---|
| Source code snippets | Pixee cloud (dedicated) | Customer network | Customer network |
| Scanner findings | Pixee cloud (dedicated) | Customer network | Customer network |
| LLM inference | Pixee-managed | Customer's provider | Customer's private endpoint |
| Triage records | Pixee cloud (dedicated) | Customer network | Customer network |
| License validation | Pixee cloud | Pixee cloud (proxyable) | Pixee cloud (proxy required) |
Authentication and Access Control
| Mechanism | Implementation |
|---|---|
| SSO | Google Workspace, Microsoft Entra ID, Okta (direct login), or embedded Authentik OIDC |
| Role-based access | Admin, Security Lead, Member roles via pixee_roles scope |
| SCM authentication | GitHub App (private key), GitLab PAT, Azure DevOps PAT, Bitbucket API tokens |
| Credential storage | Kubernetes secrets with existingSecret support for external secret managers |
| Session management | RP-Initiated Logout, token splitting, recovery flows |
Embedded Authentik OIDC runs inside the customer cluster for self-hosted deployments -- no external IdP dependency for login availability. It federates to the customer's upstream corporate IdP (Google Workspace, Microsoft Entra ID, Okta, or any OIDC source) with auto-redirect and direct login.
Authentication modes are selected at install time: evaluation mode (no auth), embedded OIDC, or direct corporate IdP login. Production deployments should use SSO.
Credential Management
Pixee handles four categories of credentials. None are stored in Helm values -- all use Kubernetes secrets.
| Credential Type | Storage | External Manager Support |
|---|---|---|
| SCM tokens (GitHub App key, GitLab PAT, ADO PAT, Bitbucket API token) | Kubernetes secret | existingSecret (Vault, External Secrets Operator, SOPS) |
| LLM API keys | Kubernetes secret | existingSecret |
| Database credentials (if BYO PostgreSQL) | Kubernetes secret | existingSecret |
| Object store credentials (if BYO S3/Blob/GCS) | Kubernetes secret or pod identity | IRSA (AWS), Workload Identity (GCP), Managed Identity (Azure) |
Key points for security review:
- Customer owns all LLM API keys. Pixee never has access to customer LLM credentials.
- LLM traffic routes through the customer's account with the customer's billing.
- Pod identity support eliminates static credentials for cloud-native deployments.
- SCM tokens require minimum scopes -- documented per platform in the installation guide.
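From a workload's point of view, a Kubernetes secret typically appears as a mounted file or an environment variable. The sketch below shows that general pattern under stated assumptions: `load_credential`, the mount directory `/var/run/secrets/pixee`, and the name-to-variable mapping are hypothetical and are not taken from the Pixee chart.

```python
import os
from pathlib import Path

# Hypothetical mount point; the real path depends on how the secret is mounted.
DEFAULT_SECRET_DIR = "/var/run/secrets/pixee"


def load_credential(name, secret_dir=DEFAULT_SECRET_DIR, environ=None):
    """Read a credential from a mounted secret file, else from the environment."""
    environ = os.environ if environ is None else environ
    secret_file = Path(secret_dir) / name
    if secret_file.is_file():
        # Mounted secret files often end with a trailing newline.
        return secret_file.read_text(encoding="utf-8").strip()
    env_name = name.upper().replace("-", "_")
    if env_name in environ:
        return environ[env_name]
    # Fail loudly rather than fall back to a default or hard-coded value.
    raise LookupError(f"credential {name!r} not found in {secret_dir} or ${env_name}")
```

Because the value is resolved at runtime, the same code works whether the secret was created directly or synced in by an external secret manager.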
Network Security
| Requirement | Detail |
|---|---|
| Outbound connections | SCM API (PR creation and status updates). License validation (proxyable). LLM provider (customer-controlled). Telemetry (opt-in, toggleable). |
| Inbound connections | HTTPS on port 443 only (SCM webhooks and dashboard users) |
| Proxy support | httpProxy, httpsProxy, noProxy with per-provider endpoint overrides |
| TLS options | Upload certificate, self-signed, Let's Encrypt, cert-manager |
| TLS-intercepting proxy | CA cert injection supported for environments with SSL inspection |
Self-hosted deployments require no inbound connections other than HTTPS on port 443 for SCM webhooks and dashboard access. All other communication is outbound and proxyable.
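The interaction between the proxy settings can be sketched in a few lines. This is an illustration of conventional proxy-bypass semantics, not Pixee's code: `select_proxy` is a hypothetical function, its parameters merely mirror the `httpProxy`, `httpsProxy`, and `noProxy` settings named in the table, and per-provider endpoint overrides are left out.

```python
from urllib.parse import urlsplit


def select_proxy(url, http_proxy=None, https_proxy=None, no_proxy=""):
    """Return the proxy URL to use for `url`, or None to connect directly."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for entry in no_proxy.split(","):
        # Treat ".example.com" and "example.com" the same way.
        entry = entry.strip().lower().lstrip(".")
        if entry and (host == entry or host.endswith("." + entry)):
            return None
    return https_proxy if parts.scheme == "https" else http_proxy
```

In-cluster hostnames would normally be listed in the bypass list so that pod-to-pod traffic never leaves the cluster network.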
AI Governance
For security leaders evaluating AI risk, Pixee's AI governance architecture addresses common concerns:
Input scope limitation. Only vulnerability-relevant code is sent to the LLM. No customer proprietary patterns, business logic context, or secrets are included in prompts.
No training on customer data. Customer code is used only for the specific analysis request. Azure OpenAI and equivalent providers offer contractual guarantees that customer data is not used for model training. Pixee does not retain customer code after analysis completion.
Provider-family-aware prompting. Pixee uses provider-specific prompt structures rather than a one-size-fits-all prompt; for example, Anthropic models receive Anthropic-optimized prompts. This is a quality measure, not a security measure -- but it indicates the system is purpose-built for each provider, not a generic wrapper.
Independent evaluation. The evaluator runs as a separate inference call from the generator. The generator cannot grade its own work. Fixes that fail evaluation are suppressed entirely.
Responsible AI documentation. The architecture provides concrete, verifiable answers to governance committee questions. See Security & Trust for the full trust framework.
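The independent evaluation described above amounts to two separate calls with a gate between them. In the sketch below the generator and evaluator are passed in as plain callables standing in for the two inference calls. The `Scores` type, the 0-to-1 scale, and the `threshold` default are assumptions made for illustration; only the three score names come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Scores:
    safety: float
    effectiveness: float
    cleanliness: float


def propose_fix(
    context: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Scores],
    threshold: float = 0.8,  # assumed cut-off, not a documented value
) -> Optional[str]:
    """Generate a fix, score it with a separate call, and suppress failures."""
    fix = generate(context)          # first call: produces the candidate fix
    scores = evaluate(context, fix)  # second, separate call: grades it
    if min(scores.safety, scores.effectiveness, scores.cleanliness) < threshold:
        return None                  # a failing fix is suppressed entirely
    return fix
```

Gating on the lowest score means a fix cannot compensate for a weak safety result with strong cleanliness, which matches the "suppressed entirely" behavior.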
Adversarial Input Protection
Code analyzed by an LLM-based system is itself a potential attack vector. Malicious code embedded in analyzed files could attempt to manipulate the LLM through prompt injection -- crafted comments or string literals designed to alter the model's behavior. Pixee addresses this risk through defense in depth across five layers:
Input scope limitation. Only vulnerability-relevant code snippets are sent to the LLM, not full repositories or entire files. This limits the attacker's surface area for embedding adversarial payloads -- most of the codebase never reaches the model.
Output validation via independent evaluator. The structurally independent evaluation layer (separate context window, separate system prompt, separate scoring rubric) acts as a second opinion on every generated fix. Malicious or anomalous outputs that deviate from expected security fix patterns are caught and suppressed by this independent check.
Narrow generation scope. Fixes are constrained to known security patterns drawn from OWASP and SANS remediation guidance, not open-ended code generation. The model operates within a bounded solution space, making it harder for adversarial inputs to steer output toward arbitrary code execution or data exfiltration.
Deterministic-first routing. The majority of fixes use zero-LLM deterministic codemods. Vulnerabilities handled by codemods never reach an LLM at all, eliminating prompt injection risk entirely for those fix types. This reduces the overall attack surface to only the subset of fixes that require AI generation.
PR-only delivery. Even in the unlikely event that a manipulated fix passed evaluation, it would arrive as a pull request subject to developer review, CI/CD pipeline execution, and SAST re-scanning. No code reaches production without human approval.
Defense in depth reduces the risk of adversarial input attacks. No system can guarantee zero adversarial success against a sufficiently motivated attacker -- but the combination of limited input scope, independent validation, constrained generation, deterministic routing, and human review creates multiple independent barriers that an adversarial payload would need to defeat simultaneously.
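Deterministic-first routing, one of the layers above, can be sketched as a lookup that is tried before any model call. The function `fix_finding`, the rule identifier, and the registry shape are hypothetical; the sketch only demonstrates that findings matched by a codemod never invoke the LLM path.

```python
from typing import Callable, Dict, Tuple


def fix_finding(
    rule_id: str,
    code: str,
    codemods: Dict[str, Callable[[str], str]],
    llm_fix: Callable[[str], str],
) -> Tuple[str, str]:
    """Prefer a deterministic codemod; fall back to the LLM only if none exists."""
    codemod = codemods.get(rule_id)
    if codemod is not None:
        # Deterministic path: the code is never placed in a prompt.
        return "codemod", codemod(code)
    return "llm", llm_fix(code)
```

For example, a registry entry that rewrites `yaml.load(` to `yaml.safe_load(` would handle that rule without the analyzed code ever being sent to a model.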
Network Architecture
Pixee Enterprise runs as a set of Kubernetes pods with defined communication paths between components:
- Platform service -- receives webhooks from the SCM, serves the dashboard UI, and coordinates analysis requests. This is the entry point for all external communication.
- Analysis service -- processes scanner findings, runs triage logic, and generates remediation patches. Operates on finding data passed internally from the platform service.
- LLM proxy -- routes inference requests to the configured model endpoint(s). Supports OpenAI, Azure AI Foundry, Anthropic, and Azure Anthropic providers. All LLM traffic passes through this proxy for consistent audit logging and provider abstraction.
- Database -- PostgreSQL for triage decisions, finding metadata, quality scores, and audit records. Stores the persistent state that powers the dashboard and compliance reporting.
Inter-pod communication uses the internal cluster network only. No pod-to-pod traffic is exposed externally. Service discovery is handled through standard Kubernetes DNS.
Inbound traffic: HTTPS on port 443 from SCM webhooks and dashboard users. No other inbound ports are required.
Outbound traffic: HTTPS to the SCM API for PR creation and status updates. HTTPS to the configured LLM endpoint for inference requests. HTTPS to the Pixee license server for validation (can be routed through a corporate proxy). No other outbound connections are required.
No component requires privileged Kubernetes access. All pods run with standard security contexts. Network policies can further restrict inter-pod communication to only the defined paths above.
Data Retention
Pixee retains the following data categories, each with distinct lifecycle controls:
Triage decisions are retained for the lifetime of the deployment. Retained artifacts include finding metadata, classification verdict, justification prose, and timestamp. These records support audit defensibility and compliance reporting.
Fix artifacts include PR metadata and quality scores (Safety, Effectiveness, Cleanliness). Source code diffs are stored in Git history under customer control -- Pixee does not maintain a separate copy of fixed source code.
LLM interaction logs (prompts and responses) are retained for audit and debugging purposes. Enterprise customers can configure the retention period to match their organization's data governance policies.
For self-hosted deployments, data purge follows the standard Kubernetes PersistentVolumeClaim (PVC) lifecycle, so customers control retention and deletion through their own infrastructure configuration. For Dedicated SaaS deployments, data deletion requests are handled through standard support channels.
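A configurable retention period for LLM interaction logs reduces to a cutoff comparison. The sketch below is a generic illustration: `purge_expired`, the `created_at` key, and the day-based granularity are assumptions, not a description of Pixee's retention setting.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(logs, retention_days, now=None):
    """Return only the log entries that are still inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # Entries created before the cutoff are dropped; the rest are kept.
    return [entry for entry in logs if entry["created_at"] >= cutoff]
```

Passing `now` explicitly keeps the function deterministic, which makes a retention policy straightforward to test before it is applied to real audit data.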