SARIF Reference

SARIF (Static Analysis Results Interchange Format) is the OASIS standard that Pixee uses to ingest security findings from its native scanner integrations and from any other SARIF-producing tool. Pixee reads SARIF files to learn which vulnerabilities were found, where they are located, and what dataflow information is available, and then routes each finding to the appropriate triage and remediation engine. This page documents how Pixee consumes SARIF.

What is SARIF?

SARIF is an OASIS open standard for representing static analysis tool output. It provides a common JSON format for any scanner to express findings, locations, code flows, and rule metadata.

Pixee supports SARIF version 2.1.0, the current stable release of the standard.

SARIF matters to Pixee's architecture because it enables scanner-agnostic remediation. Rather than building custom parsers for every scanner output format, Pixee normalizes all findings through SARIF. Any tool that produces valid SARIF can feed findings into Pixee for triage and remediation.

How Pixee uses SARIF

Scanner  -->  SARIF file  -->  Pixee ingestion  -->  Triage  -->  Fix  -->  PR
  1. Ingestion. SARIF files arrive via webhook from native scanner integrations, via the Universal SARIF integration, or via API upload.
  2. Normalization. Scanner-specific handlers extract maximum metadata from each tool's SARIF output. When a native handler does not exist, the Universal SARIF handler processes any valid SARIF document.
  3. Triage routing. Normalized findings enter the three-tier triage engine. Findings with richer SARIF data (code flows, related locations) receive higher-quality triage and remediation.
  4. Output. Remediation results are delivered as pull requests on the target repository.
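The normalization step can be pictured as a lookup on runs[].tool.driver.name with a universal fallback. The sketch below is a hypothetical illustration, not Pixee's implementation: HANDLERS, codeql_handler, universal_handler, and the keys of the normalized finding dicts are invented names, and the CodeQL handler only tags its output to show where tool-specific extraction would go.

```python
from typing import Callable

# Hypothetical: a handler turns one SARIF run into a list of normalized finding dicts.
Handler = Callable[[dict], list]


def universal_handler(run: dict) -> list:
    """Fallback handler: read only the fields any valid SARIF result should carry."""
    findings = []
    for result in run.get("results", []):
        phys = result["locations"][0]["physicalLocation"]
        findings.append({
            "handler": "universal",
            "rule_id": result.get("ruleId"),
            "message": result.get("message", {}).get("text", ""),
            "uri": phys["artifactLocation"]["uri"],
            "start_line": phys["region"]["startLine"],
            "has_code_flows": bool(result.get("codeFlows")),
        })
    return findings


def codeql_handler(run: dict) -> list:
    """Hypothetical native handler: start from the universal fields, then add tool-specific ones."""
    findings = universal_handler(run)
    for finding in findings:
        finding["handler"] = "codeql"
    return findings


# Hypothetical registry keyed by runs[].tool.driver.name.
HANDLERS: dict[str, Handler] = {"CodeQL": codeql_handler}


def normalize(sarif: dict) -> list:
    """Route each run to its scanner-specific handler, else to the universal handler."""
    findings = []
    for run in sarif.get("runs", []):
        handler = HANDLERS.get(run["tool"]["driver"]["name"], universal_handler)
        findings.extend(handler(run))
    return findings
```

A run from any tool missing from the registry still produces findings, which mirrors the role the Universal SARIF handler plays in step 2.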

Required and optional SARIF fields

Pixee reads specific SARIF fields to route findings and generate fixes. Richer SARIF input produces better results.

Required fields

These fields must be present for Pixee to process a finding:

SARIF Field | Type | Pixee Usage
--- | --- | ---
runs[].tool.driver.name | string | Scanner identification and handler routing
runs[].results[].ruleId | string | Rule matching for fix routing and knowledge base lookup
runs[].results[].message.text | string | Finding description for triage context
runs[].results[].locations[].physicalLocation.artifactLocation.uri | string | File path of the vulnerability
runs[].results[].locations[].physicalLocation.region.startLine | integer | Line number of the vulnerability
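When testing an integration, it can help to start from a document that carries only these fields. The following is a sketch with placeholder values: ExampleScanner, the rule ID, the message, and the file path are made up. The top-level version property is required by the SARIF 2.1.0 specification itself; $schema is optional but lets editors and validators recognize the file.

```python
import json

# Minimal SARIF 2.1.0 log containing only the fields Pixee requires.
# All values below are placeholders for illustration.
minimal_sarif = {
    "version": "2.1.0",
    "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
    "runs": [
        {
            "tool": {"driver": {"name": "ExampleScanner"}},
            "results": [
                {
                    "ruleId": "example/sql-injection",
                    "message": {"text": "User input flows into a SQL query."},
                    "locations": [
                        {
                            "physicalLocation": {
                                # Repository-relative path with forward slashes
                                "artifactLocation": {"uri": "src/app/db.py"},
                                "region": {"startLine": 42},
                            }
                        }
                    ],
                }
            ],
        }
    ],
}

# Write the document to disk so it can be validated and uploaded.
with open("minimal.sarif", "w", encoding="utf-8") as f:
    json.dump(minimal_sarif, f, indent=2)
```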

Recommended fields

These fields are not required but significantly improve triage accuracy and fix quality:

SARIF Field | Type | Pixee Usage
--- | --- | ---
runs[].results[].codeFlows[] | array | Dataflow and taint propagation paths. Enables cross-file fix context.
runs[].results[].codeFlows[].threadFlows[] | array | Step-by-step execution paths through the vulnerability
runs[].results[].relatedLocations[] | array | Additional code context (sink locations, intermediate variables)
runs[].results[].level | string | Severity classification (error, warning, note, or none)
runs[].tool.driver.rules[] | array | Rule metadata including descriptions and help text
runs[].tool.extensions[] | array | Extension packs with additional rule documentation
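Rule metadata can live in two places: tool.driver.rules[] or the rules[] array of an entry in tool.extensions[] (CodeQL, for example, ships query help through extensions). A minimal lookup sketch follows, assuming rules are matched by id only. SARIF also lets a result reference a rule by index through its rule property, which this sketch ignores, and the preference order in rule_help is an assumption chosen for illustration.

```python
from typing import Optional


def find_rule(run: dict, rule_id: str) -> Optional[dict]:
    """Search tool.driver.rules first, then each tool.extensions[].rules, for a matching id."""
    tool = run.get("tool", {})
    components = [tool.get("driver", {})] + tool.get("extensions", [])
    for component in components:
        for rule in component.get("rules", []):
            if rule.get("id") == rule_id:
                return rule
    return None


def rule_help(rule: Optional[dict]) -> str:
    """Return the richest explanation available: help.markdown, help.text, then fullDescription.text."""
    if not rule:
        return ""
    help_obj = rule.get("help", {})
    return (
        help_obj.get("markdown")
        or help_obj.get("text")
        or rule.get("fullDescription", {}).get("text", "")
    )
```

Usage: rule_help(find_rule(run, result["ruleId"])) yields an empty string when the scanner shipped no rule metadata, which is the sparse case where a knowledge-base lookup on the rule ID has to carry the context instead.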

Optional fields

SARIF Field | Type | Pixee Usage
--- | --- | ---
runs[].results[].fingerprints | object | Finding deduplication across scans
runs[].results[].partialFingerprints | object | Fuzzy matching for findings that shift between scans
runs[].results[].suppressions[] | array | Previously suppressed findings (Pixee respects these)
runs[].results[].properties | object | Custom scanner metadata preserved through the pipeline
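A rough sketch of how fingerprints and suppressions could be applied when comparing two scans. Assumptions: a result counts as suppressed if any of its suppression entries has status accepted or no status at all, and findings are keyed on fingerprints, then partialFingerprints, then rule ID plus location. Exact matching on partialFingerprints is a simplification of the fuzzy matching described above, not Pixee's algorithm.

```python
from typing import Iterable


def is_suppressed(result: dict) -> bool:
    # Assumption: a suppression with no explicit status is treated as accepted.
    return any(
        s.get("status", "accepted") == "accepted"
        for s in result.get("suppressions", [])
    )


def dedup_key(result: dict) -> tuple:
    """Prefer fingerprints, then partialFingerprints, then rule ID plus primary location."""
    for field in ("fingerprints", "partialFingerprints"):
        prints = result.get(field)
        if prints:
            return (field, tuple(sorted(prints.items())))
    phys = result["locations"][0]["physicalLocation"]
    return (
        "location",
        result.get("ruleId"),
        phys["artifactLocation"]["uri"],
        phys["region"]["startLine"],
    )


def new_findings(previous: Iterable[dict], current: Iterable[dict]) -> list:
    """Return unsuppressed results in `current` whose key was not seen in `previous` or earlier in `current`."""
    seen = {dedup_key(r) for r in previous}
    fresh = []
    for result in current:
        if is_suppressed(result):
            continue
        key = dedup_key(result)
        if key not in seen:
            seen.add(key)
            fresh.append(result)
    return fresh
```

Keying on a fingerprint rather than a line number is what lets a finding that moved down the file match its earlier occurrence.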

Dataflow quality and fix quality

The richness of SARIF codeFlows data directly affects fix quality. Pixee classifies dataflow quality into four tiers:

Tier | SARIF Signal | Fix Impact
--- | --- | ---
STRONG_MULTI_FILE | codeFlows with threadFlows spanning multiple files | Highest fix quality. Cross-file context enables precise remediation.
STRONG_SINGLE_FILE | codeFlows with threadFlows within a single file | High fix quality. Full taint path available for context-aware fixes.
WEAK | Partial or low-confidence codeFlows | Moderate fix quality. Pixee uses heuristics to supplement incomplete paths.
SINGLE_LOCATION | Only locations[], no codeFlows | Baseline fix quality. Pixee relies on rule knowledge base and surrounding code analysis.
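The tiers can be approximated from the codeFlows structure alone. In this hypothetical classifier, a code flow with fewer than min_steps located steps is treated as WEAK. That threshold is a stand-in for the "partial or low-confidence" criterion, whose exact definition this page does not give, so treat the function as an illustration of the signals rather than Pixee's classifier.

```python
def thread_flow_uris(code_flow: dict) -> list:
    """Collect the artifact URI of every threadFlow step in one codeFlow."""
    uris = []
    for thread_flow in code_flow.get("threadFlows", []):
        for step in thread_flow.get("locations", []):
            phys = step.get("location", {}).get("physicalLocation", {})
            uri = phys.get("artifactLocation", {}).get("uri")
            if uri:
                uris.append(uri)
    return uris


def classify_dataflow(result: dict, min_steps: int = 2) -> str:
    """Map one SARIF result to a dataflow tier (hypothetical thresholds)."""
    code_flows = result.get("codeFlows", [])
    if not code_flows:
        return "SINGLE_LOCATION"
    best = "WEAK"
    for code_flow in code_flows:
        uris = thread_flow_uris(code_flow)
        if len(uris) < min_steps:
            continue  # too short to count as a complete taint path
        if len(set(uris)) > 1:
            return "STRONG_MULTI_FILE"  # the path crosses file boundaries
        best = "STRONG_SINGLE_FILE"
    return best
```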

Recommendation: Configure your scanner to export codeFlows and threadFlows when available. CodeQL path queries include codeFlows by default; Semgrep adds dataflow traces to its SARIF output when run with --dataflow-traces. Other scanners may also require explicit configuration to include flow data in SARIF output.

SARIF validation

Pixee validates incoming SARIF documents against the 2.1.0 schema before processing.

Common validation failures:

Issue | Cause | Fix
--- | --- | ---
Missing runs array | Malformed SARIF document | Ensure the top-level object contains a runs array with at least one run
Empty results | Scanner found no findings | Expected behavior. Pixee logs the scan but generates no fixes.
Missing locations on a result | Scanner omitted location data | Configure scanner to include file and line information
Invalid uri in artifactLocation | Path uses backslashes or absolute system paths | Use forward slashes and repository-relative paths
Missing ruleId | Scanner omitted the rule identifier | Ensure scanner output includes rule IDs. Pixee cannot route findings without a rule ID.
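A quick structural pre-check can catch the failures in this table before upload. This sketch is not a substitute for validation against the 2.1.0 schema, and its test for non-relative paths (any backslash, a leading slash, a file: scheme, or a drive letter) is an assumption about what counts as an absolute system path.

```python
import json


def check_sarif(doc: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if doc.get("version") != "2.1.0":
        problems.append(f"version is {doc.get('version')!r}, expected '2.1.0'")
    runs = doc.get("runs")
    if not isinstance(runs, list) or not runs:
        problems.append("top-level 'runs' array is missing or empty")
        return problems
    for i, run in enumerate(runs):
        if not run.get("tool", {}).get("driver", {}).get("name"):
            problems.append(f"runs[{i}]: missing tool.driver.name")
        # An empty or absent results array is valid: the scanner found nothing.
        for j, result in enumerate(run.get("results") or []):
            where = f"runs[{i}].results[{j}]"
            if not result.get("ruleId"):
                problems.append(f"{where}: missing ruleId")
            if not result.get("message", {}).get("text"):
                problems.append(f"{where}: missing message.text")
            locations = result.get("locations") or []
            if not locations:
                problems.append(f"{where}: missing locations")
                continue
            phys = locations[0].get("physicalLocation", {})
            uri = phys.get("artifactLocation", {}).get("uri", "")
            if not uri:
                problems.append(f"{where}: missing artifactLocation.uri")
            elif "\\" in uri or uri.startswith(("/", "file:")) or uri[1:2] == ":":
                # Backslashes, absolute paths, and drive letters are not repository-relative.
                problems.append(f"{where}: uri {uri!r} is not a repository-relative forward-slash path")
            if not phys.get("region", {}).get("startLine"):
                problems.append(f"{where}: missing region.startLine")
    return problems


def check_file(path: str) -> list:
    """Load a SARIF file from disk and run check_sarif on it."""
    with open(path, encoding="utf-8") as f:
        return check_sarif(json.load(f))
```

For example, check_file("my-results.sarif") returns an empty list when every check passes, and one message per problem otherwise.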

Validate your SARIF files before upload using the SARIF Multitool:

sarif validate my-results.sarif

Integration examples

Upload SARIF via API

curl -X POST \
  -H "Authorization: Bearer $PIXEE_TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @scanner-results.sarif \
  https://app.pixee.ai/api/v1/repositories/REPO_ID/sarif

Python: Upload and poll for results

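import time

import requests

TOKEN = "YOUR_API_TOKEN"
BASE = "https://app.pixee.ai/api/v1"
REPO_ID = "your-repo-id"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Upload the SARIF file as the raw JSON request body
with open("scanner-results.sarif", "rb") as f:
    sarif_data = f.read()

upload = requests.post(
    f"{BASE}/repositories/{REPO_ID}/sarif",
    headers={**headers, "Content-Type": "application/json"},
    data=sarif_data,
    timeout=60,
)
upload.raise_for_status()  # stop early on authentication or validation errors
scan_id = upload.json()["scan_id"]

# Poll every 10 seconds until the scan finishes, giving up after 30 minutes
deadline = time.monotonic() + 30 * 60
while True:
    resp = requests.get(
        f"{BASE}/repositories/{REPO_ID}/scans/{scan_id}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    status = resp.json()

    if status["state"] in ("completed", "failed"):
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Scan {scan_id} did not finish within 30 minutes")
    time.sleep(10)

print(f"Scan {status['state']}: {status.get('fixes_generated', 0)} fixes generated")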

CI/CD pipeline: Scanner to Pixee to PR

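# GitHub Actions example (job steps)
- uses: actions/checkout@v4

- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: python  # replace with your repository's language(s)

- name: Run CodeQL
  uses: github/codeql-action/analyze@v3
  with:
    output: sarif-results

- name: Upload to Pixee
  run: |
    # CodeQL writes one <language>.sarif file per analyzed language
    for f in sarif-results/*.sarif; do
      curl --fail -X POST \
        -H "Authorization: Bearer ${{ secrets.PIXEE_TOKEN }}" \
        -H "Content-Type: application/json" \
        --data-binary @"$f" \
        https://app.pixee.ai/api/v1/repositories/${{ vars.PIXEE_REPO_ID }}/sarif
    done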

Scanner-specific SARIF notes

Native scanner integrations handle SARIF automatically. These notes apply when you generate SARIF manually or use the Universal SARIF integration.

Scanner | SARIF Notes
--- | ---
CodeQL | Produces rich SARIF with codeFlows, threadFlows, and tool.extensions[].rules[].help.markdown. Pixee extracts all of these for maximum triage context.
Semgrep | Exports SARIF via semgrep --sarif. Rule explanations are in fullDescription.text. Add --dataflow-traces to include dataflow traces in the SARIF output.
SonarQube | SARIF export varies by edition. Ensure codeFlows are included when available.
Checkmarx | Produces metadata-sparse SARIF. Pixee compensates with rule-ID-based prompting from its knowledge base.
Snyk | Use snyk code test --sarif for SAST results.
Trivy | Use trivy fs --format sarif for filesystem scanning results.

For full setup guides per scanner, see Integrations Overview.