# Overview

[This pipeline](https://github.com/oucru-id/phip-seq-ont) is a Nextflow-based workflow for the analysis of Phage Immunoprecipitation Sequencing (PhIP-Seq) data, adapted from [phip-flow](https://github.com/matsengrp/phip-flow) by the Matsen Group. It takes raw nanopore sequencing reads through alignment, statistical enrichment analysis, virus scoring, and IEDB epitope annotation, then generates FHIR-compliant immunological reports with interactive 3D protein structure visualizations.

PhIP-Seq is a high-throughput method for profiling antibody repertoires against thousands of viral peptides simultaneously. This pipeline uses the [phippery](https://matsen.group/phippery/introduction.html) Python library for its core data structure.

## Key Features

* **Nanopore Sequencing Support**: Alignment of nanopore reads using Bowtie2.
* **Statistical Enrichment Analysis**: Counts-per-million (CPM) normalization, size factor estimation, Z-scores against bead-only controls, and optional edgeR/BEER analysis.
* **Virus Scoring**: Calculates per-species virus exposure scores based on enriched peptide hits, with novel epitope filtering.
* **FDR Analysis**: Benjamini-Hochberg FDR correction on Z-score-derived p-values for multiple testing control.
* **IEDB Epitope Annotation**: Cross-references enriched peptides against the Immune Epitope Database (IEDB) to identify known and novel epitopes.
* **Neutralizing Prediction**: Identifies potential neutralizing peptides (in development).
* **FHIR Compliance**: Generates HL7 FHIR R4 bundles encoding PhIP-Seq observations for clinical data exchange.
* **3D Protein Visualization**: Interactive Streamlit dashboard with Mol* viewer for mapping enriched epitopes onto PDB structures.
* **Interactive Heatmaps**: Virus score and Z-score heatmaps for visual exploration.
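
As a rough illustration of the CPM normalization and bead-only Z-score steps listed above, the sketch below uses only the Python standard library. It is a simplified stand-in, not the pipeline's or phippery's implementation: the function names, the toy count matrix, and the choice of a plain per-peptide mean and sample standard deviation over the bead-only columns are all assumptions made for this example.

```python
from statistics import mean, stdev


def counts_per_million(counts):
    """Scale each sample (column) so that its counts sum to one million."""
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / totals[j] * 1e6 for j in range(n_samples)] for row in counts]


def zscores_vs_beads(cpm, bead_cols):
    """Z-score each peptide (row) against its own bead-only control values."""
    z = []
    for row in cpm:
        beads = [row[j] for j in bead_cols]
        mu, sd = mean(beads), stdev(beads)
        # The Z-score is undefined when the controls show no spread.
        z.append([(x - mu) / sd if sd > 0 else float("nan") for x in row])
    return z


# Hypothetical toy matrix: 3 peptides (rows) x 4 samples (columns).
# Columns 0 and 1 are assumed to be bead-only controls.
raw = [
    [10, 12, 200, 11],
    [40, 38, 39, 41],
    [50, 50, 61, 48],
]
cpm = counts_per_million(raw)
z = zscores_vs_beads(cpm, bead_cols=[0, 1])

for peptide, row in enumerate(z):
    print(peptide, [round(v, 2) for v in row])
```

In this toy data the first peptide is strongly enriched in the third sample, while the third peptide has identical control values and therefore gets no Z-score. A real analysis would need more than two controls and some handling of that zero-variance case.
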
## Key Outputs

* Normalized count matrices (CPM, size factors, Z-scores)
* Virus scores per sample per species
* IEDB epitope annotation reports
* Neutralizing prediction (in development)
* FHIR genomics bundles
* Interactive heatmaps (virus scores, Z-scores)
* Streamlit 3D protein structure visualization

## Directory Structure

```
phip-seq-ont
├── main.nf                        # Main workflow entry point
├── nextflow.config                # Configuration and parameters
├── workflows/
│   ├── alignment.nf               # Read alignment and count collection
│   ├── statistics.nf              # Normalization and Z-score calculation
│   ├── output.nf                  # Dataset export
│   ├── fhir_report.nf             # HL7 FHIR R4 bundle generation
│   ├── fdr.nf                     # Benjamini-Hochberg correction
│   ├── virusscore.nf              # Per-species exposure scoring
│   ├── iedb_annotation.nf         # IEDB epitope annotation and novelty classification
│   ├── visualization.nf           # Interactive heatmaps
│   ├── streamlit.nf               # 3D protein visualization dashboard
│   ├── neutralization_score.nf    # Neutralization scoring (in development)
│   ├── aggregate.nf               # Result aggregation
│   └── edgeR_BEER.nf              # Optional edgeR/BEER analysis
├── bin/
│   ├── calc_scores_nofilter.py    # Virus score calculation
│   ├── fit-predict-zscore.py      # Z-score calculation
│   ├── generate-fasta.py          # Peptide FASTA generation
│   ├── merge-counts-stats.py      # Count/stats merging into phippery dataset
│   ├── phipseq_to_fhir.py         # PhIP-Seq to FHIR converter
│   ├── replicate-counts.py        # Replicate sequence count aggregation
│   ├── run_BEER.Rscript           # BEER Bayesian enrichment (optional)
│   ├── run_edgeR.Rscript          # edgeR differential enrichment (optional)
│   ├── validate-peptide-table.py  # Peptide table validation
│   └── validate-sample-table.py   # Sample table validation
├── templates/                     # Shell script templates (alignment, SAM processing)
└── data/                          # Dataset (sample table, peptide table, FASTQs)
```
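
The Benjamini-Hochberg step handled by `workflows/fdr.nf` can be sketched in a few lines of standard-library Python. This is a generic illustration of the procedure, not the code the pipeline runs: the function names, the example Z-scores, the 0.05 threshold, and the use of a one-sided standard normal tail to turn Z-scores into p-values are all assumptions made for this example.

```python
from math import erfc, sqrt


def zscore_to_pvalue(z):
    """One-sided upper-tail p-value under a standard normal null (assumed)."""
    return 0.5 * erfc(z / sqrt(2))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical Z-scores for five peptides in one sample.
zscores = [4.2, 0.3, 2.5, -1.0, 3.1]
pvals = [zscore_to_pvalue(z) for z in zscores]
qvals = benjamini_hochberg(pvals)

# Call a peptide enriched at an assumed FDR threshold of 0.05.
enriched = [q < 0.05 for q in qvals]
print(enriched)
```

Adjusted values are returned in input order so they can be attached back to the peptide table without re-sorting.
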