Output Files
The pipeline produces a set of outputs organized under the results/ directory.
Wide Data Outputs
The primary analytical outputs are gzipped CSV matrices where rows are peptides and columns are samples.
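As a minimal sketch of how one of these matrices could be read with only the standard library, the helper below assumes the first column holds peptide identifiers and the header row holds sample names. That layout, and the function name `read_matrix`, are assumptions for illustration and are not confirmed by the pipeline documentation.

```python
import csv
import gzip


def read_matrix(path):
    """Read a gzipped CSV matrix into (sample_names, {peptide_id: [values]}).

    Assumes (hypothetically) that column 0 is the peptide identifier and
    that every remaining cell is numeric.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        samples = header[1:]
        rows = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return samples, rows
```

The same helper would apply to the counts, CPM, and Z-score matrices if they share this layout.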
Counts Matrix (dataset_counts.csv.gz)
Raw aligned read counts per peptide per sample.
CPM Matrix (dataset_cpm.csv.gz)
Counts-per-million normalized values. Accounts for sequencing depth differences between samples.
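For illustration, the conventional counts-per-million calculation divides each count by its sample's total count and multiplies by one million. The sketch below uses that textbook definition; whether the pipeline applies pseudocounts or other adjustments is not stated here, and `counts_to_cpm` is a hypothetical name.

```python
def counts_to_cpm(counts):
    """Convert {peptide: [count per sample]} to counts per million.

    Textbook definition: count / sample_total * 1e6. Samples whose total
    is zero are mapped to 0.0 to avoid division by zero (an assumption).
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        peptide: [
            row[i] / totals[i] * 1e6 if totals[i] else 0.0
            for i in range(n_samples)
        ]
        for peptide, row in counts.items()
    }


example = counts_to_cpm({"p1": [10, 0], "p2": [30, 50]})
# example["p1"] == [250000.0, 0.0]; example["p2"] == [750000.0, 1000000.0]
```

After normalization each sample column sums to one million, which makes samples of different sequencing depth comparable.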
Z-Score Matrix (dataset_zscore.csv.gz)
Statistical enrichment Z-scores computed by fitting a regression model on bead-only negative control samples using phippery.modeling.zscore(). Higher Z-scores indicate stronger antibody binding signal above background.
Virus Scores
Per-sample CSV files containing virus exposure scores aggregated at the species level. The scoring algorithm:
Groups peptides by species.
Sorts viruses by total hit count (descending).
For each virus, counts only novel peptides that do not share a 7-amino-acid subsequence with any previously assigned peptide.
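The three steps above can be sketched as follows. This is an illustrative reading of the description, not the pipeline's actual code: it assumes that "previously assigned" includes peptides already counted for the same virus, that only counted peptides contribute their 7-mers to the seen set, and that ties in hit count are broken alphabetically. The function names are hypothetical.

```python
from collections import defaultdict

K = 7  # length of the shared subsequence that disqualifies a peptide


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def virus_scores(hits, k=K):
    """Score one sample from (species, peptide_sequence) pairs of enriched peptides."""
    # Step 1: group peptides by species.
    by_species = defaultdict(list)
    for species, seq in hits:
        by_species[species].append(seq)

    # Step 2: order species by total hit count, descending (name breaks ties).
    ordered = sorted(by_species, key=lambda s: (-len(by_species[s]), s))

    # Step 3: count only peptides sharing no k-mer with an already counted peptide.
    seen = set()
    scores = {}
    for species in ordered:
        score = 0
        for seq in by_species[species]:
            peptide_kmers = kmers(seq, k)
            if peptide_kmers.isdisjoint(seen):
                score += 1
                seen |= peptide_kmers
        scores[species] = score
    return scores
```

Processing the virus with the most hits first means shared peptides are credited to the most strongly supported species, so cross-reactive peptides do not inflate the scores of related viruses.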
Merged Scores (merged_scores.csv)
Combined matrix: Species (rows) × Samples (columns).
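A merge of this shape could look like the sketch below, which assumes per-sample scores are available as dictionaries and that a species absent from a sample is filled with 0. Both the fill value and the name `merge_scores` are assumptions for illustration.

```python
def merge_scores(per_sample):
    """Combine {sample: {species: score}} into a Species x Samples matrix.

    Returns (sample_names, {species: [score per sample]}), with samples and
    species sorted alphabetically and missing entries filled with 0.
    """
    samples = sorted(per_sample)
    species = sorted({sp for scores in per_sample.values() for sp in scores})
    matrix = {
        sp: [per_sample[sample].get(sp, 0) for sample in samples]
        for sp in species
    }
    return samples, matrix
```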
FHIR Output
FHIR Transaction Bundle (*.fhir.json)
HL7 FHIR R4 compliant bundles containing:
| Resource | Description |
|---|---|
| Patient | Patient resource per sample |
| Specimen | Serum specimen for PhIP-Seq analysis |
| Organization | Testing facility |
| Practitioner | Lab personnel |
| PractitionerRole | Practitioner’s role at the organization |
| Observation | Per-peptide Z-score observations with interpretation |
Observation Interpretation
Each peptide observation includes:
Z-score value as valueQuantity
Interpretation: POS (Positive) for Z-score > 3.5; NEG (Negative) for Z-score ≤ 3.5
Method: SNOMED CT 708049000 (Phage immunoprecipitation sequencing)
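To make the interpretation rule concrete, the sketch below builds a plain dictionary shaped like a FHIR R4 Observation. The threshold, interpretation codes, and SNOMED CT method code come from the description above; the `status` value, the free-text `code`, and the function name are assumptions, and the pipeline's real bundles may carry additional fields such as subject and specimen references.

```python
Z_THRESHOLD = 3.5  # POS if strictly greater, NEG otherwise


def peptide_observation(peptide_id, zscore, threshold=Z_THRESHOLD):
    """Return an Observation-like dict for one peptide Z-score (illustrative only)."""
    if zscore > threshold:
        code, display = "POS", "Positive"
    else:
        code, display = "NEG", "Negative"
    return {
        "resourceType": "Observation",
        "status": "final",  # assumed; not stated in the documentation
        "code": {"text": f"PhIP-Seq Z-score for {peptide_id}"},  # assumed label
        "valueQuantity": {"value": zscore},
        "interpretation": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/v3-ObservationInterpretation",
                "code": code,
                "display": display,
            }]
        }],
        "method": {
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": "708049000",
                "display": "Phage immunoprecipitation sequencing",
            }]
        },
    }
```

Note that a Z-score of exactly 3.5 is interpreted as NEG, because the positive condition is a strict inequality.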
IEDB Annotation Outputs
Per-sample annotation results from cross-referencing with the Immune Epitope Database.
| File | Description |
|---|---|
| | All peptides with IEDB match counts, Z-scores, and novelty flags |
| | Peptides with no IEDB epitope match |
| | Matched epitopes from peptides with Z-score ≥ threshold |
| | Human-readable summary with match rates and statistics |
Neutralization Prediction Outputs
Per-sample and aggregated predictions of neutralizing antibody potential from detected epitopes.
| File | Description |
|---|---|
| | Per-peptide composite scores, structural features, and prediction categories (High/Moderate/Low) |
| | Filtered subset of peptides predicted as High neutralization potential |
| | Summary statistics: sample count, reactive peptide count, prediction distribution, threshold, score percentiles |
| | Score distribution statistics (mean, std, percentiles), prediction counts, algorithm weights used, peptides with structural data and cluster membership |
| | Spatial clusters of peptides (same PDB + sample) within 8 Ångströms; includes cluster center coordinates (X, Y, Z) |
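The spatial clustering described in the last row could be approximated by single-linkage grouping with an 8 Å distance cutoff, as sketched below. The pipeline's actual clustering method is not documented here, so this is only one plausible reading; it also assumes each peptide is represented by a single (X, Y, Z) coordinate and that the inputs have already been restricted to one PDB structure and one sample. The function name is hypothetical.

```python
import math


def cluster_points(points, cutoff=8.0):
    """Group 3D points so that points within `cutoff` of each other are linked.

    Returns a list of (member_indices, center), where center is the mean
    coordinate of the members. Uses a small union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that lies within the cutoff distance.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        center = tuple(
            sum(points[m][axis] for m in members) / len(members)
            for axis in range(3)
        )
        clusters.append((members, center))
    return clusters
```

With single linkage, two peptides farther apart than the cutoff can still share a cluster when a chain of intermediate peptides connects them; a stricter definition would require every pair in a cluster to be within 8 Å.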
Prediction Categories:
High: composite ≥ 3.0
Moderate: 1.95 ≤ composite < 3.0
Low: composite < 1.95
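The thresholds above translate into a simple cascade. This is a minimal sketch using the cutoffs listed; the function name and the idea of passing the cutoffs as parameters are illustrative choices.

```python
def prediction_category(composite, high=3.0, moderate=1.95):
    """Map a composite score to High, Moderate, or Low using the listed cutoffs."""
    if composite >= high:
        return "High"
    if composite >= moderate:
        return "Moderate"
    return "Low"
```

Checking the High cutoff first means a score of exactly 3.0 is High and a score of exactly 1.95 is Moderate.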
Visualization Outputs
Virus Score Heatmap (virus_score_heatmap.html)
Interactive heatmap showing virus exposure scores across all samples.
Z-Score Heatmap (zscore_heatmap.html)
Interactive heatmap with a dropdown menu to select virus organisms. Displays Z-scores per peptide oligo per sample with hover details.
Streamlit Dashboard
A Streamlit web application for interactive 3D protein visualization.
Features:
Mol* Viewer: Renders PDB/CIF structures with epitope mapping.
Epitope Highlighting: Red for all enriched epitopes; green for selected.
Sequence Mapping: Sequence-to-structure alignment.
Controls: PDB selection, sample selection, background color, auto-spin.
Deployment:
cd results/streamlit_app/
chmod +x deploy_streamlit.sh
./deploy_streamlit.sh