Output Files

The pipeline writes its outputs under the `results/` directory.

Wide Data Outputs

The primary analytical outputs are gzipped CSV matrices where rows are peptides and columns are samples.
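As a minimal sketch, one of these matrices could be loaded with only the Python standard library. The exact column layout is not specified here, so this assumes the header row is an identifier column followed by sample names, and that each remaining row is a peptide identifier followed by numeric values. `load_matrix` is a hypothetical helper, not part of the pipeline.

```python
import csv
import gzip


def load_matrix(path):
    """Read a gzipped CSV matrix into (sample_names, {peptide_id: [values]}).

    Assumption: the first column holds the peptide identifier and the
    header row holds the sample names.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        samples = header[1:]
        # Skip blank rows; convert every value cell to float.
        rows = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, rows
```

For example, `samples, counts = load_matrix("results/dataset_counts.csv.gz")` would return the sample names and a peptide-to-values mapping, assuming the file follows the layout described above.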

Counts Matrix (`dataset_counts.csv.gz`)

Raw aligned read counts per peptide per sample.

CPM Matrix (`dataset_cpm.csv.gz`)

Counts-per-million normalized values. Accounts for sequencing depth differences between samples.
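The normalization can be illustrated with a short sketch using the standard CPM definition: each count is divided by its sample's total count and multiplied by one million. This is not the pipeline's own code; `counts_to_cpm` is a hypothetical helper, and it assumes the library size is the sum of all peptide counts in a sample.

```python
def counts_to_cpm(counts):
    """Convert {peptide_id: [count per sample]} to counts per million.

    Assumption: library size is the column sum over all peptides.
    Samples with zero total reads yield 0.0 rather than dividing by zero.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        pep: [
            row[j] / totals[j] * 1e6 if totals[j] else 0.0
            for j in range(n_samples)
        ]
        for pep, row in counts.items()
    }
```

Because each column is scaled by its own total, two samples sequenced at different depths become directly comparable.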

Z-Score Matrix (`dataset_zscore.csv.gz`)

Statistical enrichment Z-scores computed by fitting a regression model on bead-only negative control samples using `phippery.modeling.zscore()`. Higher Z-scores indicate stronger antibody binding signal above background.

Virus Scores

Per-sample CSV files containing virus exposure scores aggregated at the species level. The scoring algorithm:

1. Groups peptides by species.
2. Sorts viruses by total hit count (descending).
3. For each virus, counts only novel peptides that do not share a 7-amino-acid subsequence with any previously assigned peptide.
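The steps above can be sketched as a greedy pass over enriched peptides. This is an illustration under stated assumptions, not the pipeline's implementation: `virus_scores` and `kmers` are hypothetical names, the input is assumed to be `(species, peptide_sequence)` pairs for peptides already called as hits, ties in hit count are broken alphabetically, and "previously assigned" is taken to include peptides already counted for the same virus.

```python
from collections import defaultdict

K = 7  # length of the shared subsequence that marks two peptides as redundant


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def virus_scores(hits):
    """hits: iterable of (species, peptide_sequence) for enriched peptides."""
    by_species = defaultdict(list)
    for species, seq in hits:
        by_species[species].append(seq)

    # Most hits first; alphabetical tie-break is an assumption for determinism.
    order = sorted(by_species, key=lambda s: (-len(by_species[s]), s))

    seen = set()  # k-mers of every peptide counted so far
    scores = {}
    for species in order:
        score = 0
        for seq in by_species[species]:
            peptide_kmers = kmers(seq)
            if peptide_kmers.isdisjoint(seen):
                score += 1
                seen |= peptide_kmers
        scores[species] = score
    return scores
```

Processing the virus with the most hits first means that a peptide shared between two related viruses is credited to the one with stronger overall evidence.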

Merged Scores (`merged_scores.csv`)

Combined matrix: Species (rows) × Samples (columns).
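As a rough sketch of how per-sample scores combine into this layout, the function below builds the Species × Samples table as CSV text. The names are hypothetical, and filling a missing species with 0 is an assumption; the pipeline may represent missing values differently.

```python
import csv
import io


def merge_scores(per_sample):
    """per_sample: {sample: {species: score}} -> CSV text, Species x Samples."""
    samples = sorted(per_sample)
    species = sorted({sp for scores in per_sample.values() for sp in scores})
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Species"] + samples)
    for sp in species:
        # Assumption: a species absent from a sample gets a score of 0.
        writer.writerow([sp] + [per_sample[s].get(sp, 0) for s in samples])
    return buf.getvalue()
```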

FHIR Output

FHIR Transaction Bundle (`*.fhir.json`)

HL7 FHIR R4 compliant bundles containing:

Resource	Description
Patient	Patient resource per sample
Specimen	Serum specimen for PhIP-Seq analysis
Organization	Testing facility
Practitioner	Lab personnel
PractitionerRole	Practitioner’s role at the organization
Observation	Per-peptide Z-score observations with interpretation
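A quick way to check a bundle against the table above is to count its resources by type. The sketch below relies only on the standard FHIR Bundle layout, in which each item of `entry` carries a `resource` with a `resourceType`. The function names are hypothetical.

```python
import json
from collections import Counter


def count_resources(bundle):
    """Count resources by resourceType in a parsed FHIR Bundle dict."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


def count_resources_in_file(path):
    """Load a *.fhir.json file and count its resources by type."""
    with open(path, encoding="utf-8") as handle:
        return count_resources(json.load(handle))
```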

Observation Interpretation

Each peptide observation includes:

- Z-score value as `valueQuantity`
- Interpretation:
  - POS (Positive): Z-score > 3.5
  - NEG (Negative): Z-score ≤ 3.5
- Method: SNOMED CT 708049000 (Phage immunoprecipitation sequencing)
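The fields listed above might be assembled into an Observation roughly as follows. This is a sketch, not the pipeline's serializer: the `status` value, the free-text `code`, and the coding system URIs (the HL7 v3 ObservationInterpretation code system and the SNOMED CT system URI) are assumptions about how the pipeline fills these elements, and `peptide_observation` is a hypothetical name.

```python
Z_POSITIVE_CUTOFF = 3.5  # Z-score > 3.5 is POS, otherwise NEG


def peptide_observation(peptide_id, zscore):
    """Build a minimal FHIR R4 Observation dict for one peptide Z-score."""
    positive = zscore > Z_POSITIVE_CUTOFF
    return {
        "resourceType": "Observation",
        "status": "final",  # assumption
        # Placeholder code element; the pipeline's actual coding is not shown here.
        "code": {"text": f"PhIP-Seq Z-score for peptide {peptide_id}"},
        "valueQuantity": {"value": zscore},
        "interpretation": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationInterpretation",
                "code": "POS" if positive else "NEG",
                "display": "Positive" if positive else "Negative",
            }]
        }],
        "method": {
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": "708049000",
                "display": "Phage immunoprecipitation sequencing",
            }]
        },
    }
```

Note that the cutoff is strict: a Z-score of exactly 3.5 is reported as NEG.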

IEDB Annotation Outputs

Per-sample annotation results from cross-referencing with the Immune Epitope Database.

File	Description
`*_annotated_peptides.csv`	All peptides with IEDB match counts, Z-scores, and novelty flags
`*_novel_peptides.csv`	Peptides with no IEDB epitope match
`*_significant_epitopes.csv`	Matched epitopes from peptides with Z-score ≥ threshold
`*_annotation_summary.txt`	Human-readable summary with match rates and statistics

Neutralization Prediction Outputs

Per-sample and aggregated predictions of neutralizing antibody potential from detected epitopes.

File	Description
`neutralization_scores_per_sample.csv`	Per-peptide composite scores, structural features, and prediction categories (High/Moderate/Low)
`high_confidence_candidates.csv`	Filtered subset of peptides predicted as High neutralization potential
`neutralization_summary.txt`	Summary statistics: sample count, reactive peptide count, prediction distribution, threshold, score percentiles
`detailed_analysis.json`	Score distribution statistics (mean, std, percentiles), prediction counts, algorithm weights used, peptides with structural data and cluster membership
`conformational_epitope_clusters.csv`	Spatial clusters of peptides (same PDB + sample) within 8 Ångströms; includes cluster center coordinates (X, Y, Z)
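The clustering method behind `conformational_epitope_clusters.csv` is not described here, so the following is only one plausible reading. It groups peptides by PDB entry and sample, links any two peptides whose coordinates lie within 8 Å of each other (single linkage), and reports the mean coordinate as the cluster center. The one-coordinate-per-peptide input, the single-linkage rule, and the minimum cluster size of 2 are all assumptions, and `cluster_peptides` is a hypothetical name.

```python
import math
from collections import defaultdict

CUTOFF_ANGSTROM = 8.0


def cluster_peptides(points, cutoff=CUTOFF_ANGSTROM, min_size=2):
    """points: list of (pdb_id, sample, peptide_id, (x, y, z)).

    Returns a list of dicts with keys pdb, sample, peptides, center.
    """
    groups = defaultdict(list)
    for pdb, sample, pep, xyz in points:
        groups[(pdb, sample)].append((pep, xyz))

    clusters = []
    for (pdb, sample), members in groups.items():
        parent = list(range(len(members)))  # union-find forest

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        # Link every pair of peptides closer than the cutoff.
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if math.dist(members[i][1], members[j][1]) <= cutoff:
                    parent[find(i)] = find(j)

        by_root = defaultdict(list)
        for i, member in enumerate(members):
            by_root[find(i)].append(member)

        for group in by_root.values():
            if len(group) < min_size:
                continue  # assumption: isolated peptides are not clusters
            coords = [xyz for _, xyz in group]
            center = tuple(sum(axis) / len(coords) for axis in zip(*coords))
            clusters.append({
                "pdb": pdb,
                "sample": sample,
                "peptides": sorted(pep for pep, _ in group),
                "center": center,
            })
    return clusters
```

With single linkage, two peptides more than 8 Å apart can still share a cluster if a third peptide bridges them. A stricter rule, such as requiring every pair to be within the cutoff, would produce smaller clusters.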

Prediction Categories:

High: composite ≥ 3.0
Moderate: 1.95 ≤ composite < 3.0
Low: composite < 1.95
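The category boundaries translate directly into a small function. The function and constant names are illustrative; only the cutoffs come from the list above.

```python
HIGH_CUTOFF = 3.0
MODERATE_CUTOFF = 1.95


def prediction_category(composite):
    """Map a composite neutralization score to High, Moderate, or Low."""
    if composite >= HIGH_CUTOFF:
        return "High"
    if composite >= MODERATE_CUTOFF:
        return "Moderate"
    return "Low"
```

Both cutoffs are inclusive on the lower bound, so a composite score of exactly 1.95 is Moderate and exactly 3.0 is High.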

Visualization Outputs

Virus Score Heatmap (`virus_score_heatmap.html`)

Interactive heatmap showing virus exposure scores across all samples.

Z-Score Heatmap (`zscore_heatmap.html`)

Interactive heatmap with a dropdown menu to select virus organisms. Displays Z-scores per peptide oligo per sample with hover details.

Streamlit Dashboard

A Streamlit web application for interactive 3D protein visualization.

Features:

Mol* Viewer: Renders PDB/CIF structures with epitope mapping.
Epitope Highlighting: Red for all enriched epitopes; green for selected.
Sequence Mapping: Sequence-to-structure alignment.
Controls: PDB selection, sample selection, background color, auto-spin.

Deployment:

cd results/streamlit_app/
chmod +x deploy_streamlit.sh
./deploy_streamlit.sh