Input Requirements
The pipeline requires two primary input tables and raw sequencing data in FASTQ format.
Sample Table
Format: CSV file with a header row.
Required Columns:
- fastq_filepath: Path to the FASTQ file for each sample.
- sample_ID: Human-readable sample identifier.
- control_status: Classification of each sample. Allowed values:
  - beads_only: Negative control (bead-only immunoprecipitation). At least 2 are required when run_zscore_fit_predict = true.
  - library: Library control (optional).
  - empirical: Experimental samples.
Note: Do not include a column named sample_id (lowercase). The pipeline auto-generates integer sample_id indices.
Example Sample Table
| sample_ID | fastq_filepath | control_status |
|---|---|---|
| Patient_1_B | reads/patient1_baseline.fastq.gz | empirical |
| Patient_1_E | reads/patient1_endline.fastq.gz | empirical |
| neg1 | reads/neg_control_1.fastq.gz | beads_only |
| neg2 | reads/neg_control_2.fastq.gz | beads_only |
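The sample table rules above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name validate_sample_table and its return format are hypothetical, and the run_zscore_fit_predict argument simply mirrors the parameter named above.

```python
import csv
import io

REQUIRED_COLUMNS = {"sample_ID", "fastq_filepath", "control_status"}
ALLOWED_STATUSES = {"beads_only", "library", "empirical"}


def validate_sample_table(handle, run_zscore_fit_predict=False):
    """Return a list of problems found in a sample table (empty if none).

    Hypothetical helper: it only encodes the rules stated in this section.
    """
    reader = csv.DictReader(handle)
    columns = set(reader.fieldnames or [])
    problems = []

    missing = REQUIRED_COLUMNS - columns
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    # The lowercase name is reserved for the auto-generated integer index.
    if "sample_id" in columns:
        problems.append("column 'sample_id' is reserved for auto-generated indices")

    statuses = [row.get("control_status", "") for row in reader]
    # The header is line 1, so data rows start at line 2.
    for line_no, status in enumerate(statuses, start=2):
        if status not in ALLOWED_STATUSES:
            problems.append(f"line {line_no}: unknown control_status {status!r}")

    if run_zscore_fit_predict and statuses.count("beads_only") < 2:
        problems.append("need at least 2 beads_only samples for z-score fitting")
    return problems


# The example sample table from this section, written as CSV.
example = io.StringIO(
    "sample_ID,fastq_filepath,control_status\n"
    "Patient_1_B,reads/patient1_baseline.fastq.gz,empirical\n"
    "Patient_1_E,reads/patient1_endline.fastq.gz,empirical\n"
    "neg1,reads/neg_control_1.fastq.gz,beads_only\n"
    "neg2,reads/neg_control_2.fastq.gz,beads_only\n"
)
print(validate_sample_table(example, run_zscore_fit_predict=True))  # prints []
```

An empty list means none of the stated rules were violated; it does not check that the FASTQ files exist.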
Peptide Table
Format: CSV file with a header row.
Required Columns:
- oligo: The oligonucleotide sequence for each peptide tile. Uppercase characters are used for alignment; lowercase characters are trimmed.
Optional Columns:
- peptide_id: Integer index (auto-generated if not provided).
- Organism: Species or virus name associated with the peptide (used for virus scoring and heatmaps).
- Species: Species-level grouping.
- peptide: Amino acid sequence of the peptide tile.
- pdb_id: PDB structure identifier (used for 3D visualization).
- id: Unique identifier for merging with oligo metadata.
Example Peptide Table
| oligo | Organism | Species | peptide | pdb_id |
|---|---|---|---|---|
| ATGCGA… | Human coronavirus OC43 | Betacoronavirus 1 | MSDNGP… | 6OHW |
| GCTAGC… | Influenza A virus | Influenza A virus | DTLCIA… | 4LXV |
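To illustrate the case convention in the oligo column, here is a small sketch of how the alignment portion could be derived. It assumes that "trimmed" means every lowercase character is dropped (for example, lowercase adapter flanks around an uppercase tile). The function name alignment_sequence and the example oligo are hypothetical and are not taken from the pipeline.

```python
def alignment_sequence(oligo: str) -> str:
    """Keep only the uppercase characters of an oligo.

    Assumption: lowercase bases mark sequence that is not aligned,
    so they are removed wherever they occur.
    """
    return "".join(base for base in oligo if base.isupper())


# Hypothetical oligo: lowercase flanks around an uppercase tile.
print(alignment_sequence("aggaattcATGCGAGCTAGCcaggga"))  # prints ATGCGAGCTAGC
```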
Sequencing Data
Format: FASTQ files (.fastq.gz or .fastq).
Platform: Oxford Nanopore Technologies (ONT).
Compression: Set params.fastq_stream_func = 'zcat' for gzipped files, or 'cat' for uncompressed.
Location: By default, paths in the sample table are resolved relative to $baseDir. If you provide a custom sample table, paths are resolved relative to $launchDir. Absolute paths are also supported.
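The compression and location rules above can be sketched as two small helpers. This is an illustration only, not the pipeline's own code: the function names are hypothetical, base_dir stands in for whichever of $baseDir or $launchDir applies, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def fastq_stream_func(fastq_path: str) -> str:
    """Suggest a value for params.fastq_stream_func from the file extension."""
    return "zcat" if fastq_path.endswith(".gz") else "cat"


def resolve_fastq_path(fastq_path: str, base_dir: str) -> PurePosixPath:
    """Resolve a sample-table path against base_dir unless it is absolute."""
    path = PurePosixPath(fastq_path)
    return path if path.is_absolute() else PurePosixPath(base_dir) / path


print(fastq_stream_func("reads/neg_control_1.fastq.gz"))  # prints zcat
print(resolve_fastq_path("reads/neg_control_1.fastq.gz", "/work/project"))
# prints /work/project/reads/neg_control_1.fastq.gz
```

Because the text describes params.fastq_stream_func as a single setting, this sketch assumes all FASTQ files in one run share the same compression.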
Optional Input Files
IEDB Database
Path: Configured via params.iedb_database.
Format: TSV file from the Immune Epitope Database.
Usage: Cross-referenced to annotate enriched peptides with known epitope information.
IEDB Neutralizing Database
Path: Configured via params.neutralization_db.
Format: TSV file from the Immune Epitope Database.
Usage: Cross-referenced to annotate neutralizing peptides with known epitope information.
PDB Structures
Path: Configured via params.pdb_dir.
Format: Directory containing .pdb or .cif structure files named by PDB ID.
Usage: Used by the Streamlit 3D visualization dashboard to render protein structures with mapped epitopes.
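As a rough illustration of the naming convention, the sketch below looks up a structure file for a given pdb_id. It assumes files are named exactly as the ID plus extension (for example 6OHW.pdb) and that .pdb is tried before .cif; the text does not confirm the lookup order or case handling. The function name find_structure and the directory name structures are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_structure(pdb_dir: str, pdb_id: str) -> Optional[Path]:
    """Return the first matching structure file for pdb_id, or None.

    Assumption: file names are '<pdb_id>.pdb' or '<pdb_id>.cif',
    matched with the same letter case as the pdb_id column.
    """
    for suffix in (".pdb", ".cif"):
        candidate = Path(pdb_dir) / f"{pdb_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None


# Returns None when the directory or file does not exist.
print(find_structure("structures", "6OHW"))
```

A check like this can flag pdb_id values in the peptide table that have no structure file before the dashboard is opened.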