Overview

This pipeline is a Nextflow-based workflow for analyzing Phage Immunoprecipitation Sequencing (PhIP-Seq) data, adapted from phip-flow by the Matsen Group. It takes raw nanopore sequencing reads through alignment, statistical enrichment analysis, virus scoring, and IEDB epitope annotation, then generates FHIR-compliant immunological reports with interactive 3D protein structure visualizations.

PhIP-Seq is a high-throughput method for profiling antibody repertoires against thousands of viral peptides simultaneously. The pipeline builds on the phippery Python library, whose dataset format serves as its core data structure.

Key Features

  • Nanopore Sequencing Support: Alignment of nanopore reads using Bowtie2.

  • Statistical Enrichment Analysis: Counts-per-million (CPM) normalization, size factor estimation, Z-scores computed against bead-only controls, and optional edgeR/BEER analysis.

  • Virus Scoring: Calculates per-species virus exposure scores based on enriched peptide hits with novel epitope filtering.

  • FDR Analysis: Applies Benjamini-Hochberg FDR correction to Z-score-derived p-values to control for multiple testing.

  • IEDB Epitope Annotation: Cross-references enriched peptides against the Immune Epitope Database (IEDB) to identify known and novel epitopes.

  • Neutralization Prediction: Identifies potential neutralizing peptides (in development).

  • FHIR Compliance: Generates HL7 FHIR R4 bundles encoding PhIP-Seq observations for clinical data exchange.

  • 3D Protein Visualization: Interactive Streamlit dashboard with Mol* viewer for mapping enriched epitopes onto PDB structures.

  • Interactive Heatmaps: Virus score and Z-score heatmaps for visual exploration.
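The core statistics listed above (CPM normalization, Z-scores against bead-only controls, and Benjamini-Hochberg FDR correction) can be illustrated with a short, dependency-free Python sketch. This is not the code in bin/fit-predict-zscore.py: it assumes a simplified per-peptide background (each peptide's bead-only mean and standard deviation) and one-sided normal p-values, while the pipeline may fit a different background model. All function names and the toy counts are hypothetical.

```python
import math
from statistics import mean, stdev


def cpm(counts):
    """Counts-per-million: rescale one sample's raw peptide counts to sum to 1e6."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def zscores_vs_beads(sample_cpm, bead_cpms):
    """Per-peptide Z-scores of one sample against bead-only control samples.

    bead_cpms holds one CPM vector per bead-only sample (at least two are
    needed for a standard deviation). Simplifying assumption: each peptide's
    own bead-only mean and standard deviation define its background; no
    binning or pooling across peptides is done.
    """
    scores = []
    for i, x in enumerate(sample_cpm):
        background = [bead[i] for bead in bead_cpms]
        mu, sd = mean(background), stdev(background)
        scores.append((x - mu) / sd if sd > 0 else 0.0)
    return scores


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay
    # monotone: q(rank) = min over rank' >= rank of p(rank') * m / rank'.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: three bead-only controls and one sample with peptide 2 enriched.
beads = [cpm([100, 200, 300, 400]), cpm([110, 190, 310, 390]), cpm([90, 210, 290, 410])]
sample = cpm([100, 200, 3000, 400])
z = zscores_vs_beads(sample, beads)
q = benjamini_hochberg([upper_tail_p(v) for v in z])
```

Normalizing to CPM before comparing with the bead-only controls removes differences in sequencing depth, so the Z-score reflects enrichment rather than library size.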

Key Outputs

  • Normalized count matrices (CPM, size factors, Z-scores)

  • Virus scores per sample per species

  • IEDB epitope annotation reports

  • Neutralization predictions (in development)

  • FHIR genomics bundles

  • Interactive heatmaps (virus scores, Z-scores)

  • Streamlit 3D protein structure visualization
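Size factors, one of the normalized outputs above, correct for per-sample sequencing depth. The sketch below assumes a DESeq-style median-of-ratios estimator; the pipeline itself relies on phippery's implementation, which may differ in details such as how zero counts are handled. The function names and toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (DESeq-style; assumed estimator).

    counts is a list of samples, each a list of raw peptide counts in the
    same peptide order. Peptides with a zero count in any sample are
    skipped, since their geometric mean across samples is zero.
    """
    n_peptides = len(counts[0])
    # Log geometric mean of each peptide across all samples (None if unusable).
    log_geo_means = []
    for j in range(n_peptides):
        column = [sample[j] for sample in counts]
        if all(c > 0 for c in column):
            log_geo_means.append(sum(math.log(c) for c in column) / len(column))
        else:
            log_geo_means.append(None)
    if all(g is None for g in log_geo_means):
        raise ValueError("no peptide has a nonzero count in every sample")

    factors = []
    for sample in counts:
        # Median of this sample's log ratios to the per-peptide reference.
        log_ratios = [
            math.log(sample[j]) - g for j, g in enumerate(log_geo_means) if g is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def apply_size_factors(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / f for c in sample] for sample, f in zip(counts, factors)]


counts = [[10, 20, 30], [20, 40, 60]]  # second sample sequenced twice as deeply
factors = size_factors(counts)
normalized = apply_size_factors(counts, factors)
```

Using the median rather than the total count keeps a few strongly enriched peptides from dominating a sample's depth estimate, which matters in PhIP-Seq where a handful of hits can account for a large share of reads.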

Directory Structure

phip-seq-ont
├── main.nf                          # Main workflow entry point
├── nextflow.config                  # Configuration and parameters
├── workflows/
│   ├── alignment.nf                 # Read alignment and count collection
│   ├── statistics.nf                # Normalization and Z-score calculation
│   ├── output.nf                    # Dataset export
│   ├── fhir_report.nf               # HL7 FHIR R4 bundle generation
│   ├── fdr.nf                       # Benjamini-Hochberg correction
│   ├── virusscore.nf                # Per-species exposure scoring
│   ├── iedb_annotation.nf           # IEDB epitope annotation and novelty classification
│   ├── visualization.nf             # Interactive heatmaps
│   ├── streamlit.nf                 # 3D protein visualization dashboard
│   ├── neutralization_score.nf      # Neutralization scoring (in development)
│   ├── aggregate.nf                 # Result aggregation
│   └── edgeR_BEER.nf                # Optional edgeR/BEER analysis
├── bin/
│   ├── calc_scores_nofilter.py      # Virus score calculation
│   ├── fit-predict-zscore.py        # Z-score calculation
│   ├── generate-fasta.py            # Peptide FASTA generation
│   ├── merge-counts-stats.py        # Count/stats merging into phippery dataset
│   ├── phipseq_to_fhir.py           # PhIP-Seq to FHIR converter
│   ├── replicate-counts.py          # Replicate sequence count aggregation
│   ├── run_BEER.Rscript             # BEER Bayesian enrichment (optional)
│   ├── run_edgeR.Rscript            # edgeR differential enrichment (optional)
│   ├── validate-peptide-table.py    # Peptide table validation
│   └── validate-sample-table.py     # Sample table validation
├── templates/                       # Shell script templates (alignment, SAM processing)
└── data/                            # Input dataset (sample table, peptide table, FASTQs)
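An HL7 FHIR R4 Bundle is a JSON resource that groups other resources; Bundle requires a type, and each Observation requires at least a status and a code. The sketch below shows the general shape of a bundle that a converter such as bin/phipseq_to_fhir.py could emit. The function names, the free-text code, the patient reference, and the one-Observation-per-peptide layout are assumptions for illustration only; the actual converter may use other resources, terminology codes, and extensions.

```python
import json
import uuid


def zscore_observation(patient_id, peptide_id, zscore):
    """Build a minimal FHIR R4 Observation for one peptide Z-score (hypothetical layout)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        # Free-text code only; a real report would use an agreed terminology code.
        "code": {"text": f"PhIP-Seq Z-score for peptide {peptide_id}"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {"value": zscore},
    }


def build_bundle(patient_id, zscores):
    """Wrap one Observation per peptide in a FHIR R4 'collection' Bundle."""
    entries = []
    for peptide_id, z in zscores.items():
        entries.append({
            # urn:uuid identifiers are a common choice for resources not yet stored on a server.
            "fullUrl": f"urn:uuid:{uuid.uuid4()}",
            "resource": zscore_observation(patient_id, peptide_id, z),
        })
    return {"resourceType": "Bundle", "type": "collection", "entry": entries}


# Hypothetical patient and peptide identifiers.
bundle = build_bundle("example-patient", {"pep_001": 12.4, "pep_002": 3.1})
print(json.dumps(bundle, indent=2))
```

A "collection" bundle is simply a set of resources grouped for exchange; a server-side workflow would more likely use a "transaction" bundle with request entries.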