Input Requirements

The pipeline requires two primary input tables and raw sequencing data in FASTQ format.

Sample Table

Format: CSV file with a header row.
Required Columns:
- fastq_filepath — Path to the FASTQ file for each sample.
- sample_ID — Human-readable sample identifier.
- control_status — Classification of each sample. Allowed values:
  - beads_only — Negative control (bead-only immunoprecipitation). At least two beads_only samples are required when run_zscore_fit_predict = true.
  - library — Library control (optional).
  - empirical — Experimental samples.
Note: Do not include a column named sample_id (lowercase). The pipeline auto-generates integer sample_id indices.

Example:

sample_ID,fastq_filepath,control_status
Patient_1_B,reads/patient1_baseline.fastq.gz,empirical
Patient_1_E,reads/patient1_endline.fastq.gz,empirical
neg1,reads/neg_control_1.fastq.gz,beads_only
neg2,reads/neg_control_2.fastq.gz,beads_only
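The rules above can be checked before launching a run. The following is a minimal Python sketch, not part of the pipeline: the function name validate_sample_table is hypothetical, and run_zscore_fit_predict is passed in as a plain argument that stands in for the pipeline setting of the same name. It checks the required columns, rejects a lowercase sample_id column, flags unknown control_status values, and enforces the two-beads_only minimum.

```python
import csv
from collections import Counter

# Column names and allowed values as listed in this section.
REQUIRED_COLUMNS = {"fastq_filepath", "sample_ID", "control_status"}
ALLOWED_STATUSES = {"beads_only", "library", "empirical"}


def validate_sample_table(lines, run_zscore_fit_predict=True):
    """Return a list of problems in a sample table; an empty list means none found.

    `lines` is any iterable of CSV lines, e.g. open("samples.csv", newline="").
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    columns = set(reader.fieldnames or [])
    problems = []

    missing = REQUIRED_COLUMNS - columns
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    # The pipeline creates its own integer sample_id, so a user column would clash.
    if "sample_id" in columns:
        problems.append("remove the lowercase 'sample_id' column")

    # Blank or missing values become "" so they are reported as unknown.
    counts = Counter((row.get("control_status") or "").strip() for row in rows)
    unknown = set(counts) - ALLOWED_STATUSES
    if unknown:
        problems.append(f"unknown control_status values: {sorted(unknown)}")
    if run_zscore_fit_predict and counts["beads_only"] < 2:
        problems.append("at least 2 beads_only samples are required")
    return problems
```

For the example table above, validate_sample_table(open("samples.csv", newline="")) should return an empty list; dropping one of the two beads_only rows would produce the minimum-count message.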

Peptide Table

Format: CSV file with a header row.
Required Columns:
- oligo — The oligonucleotide sequence for each peptide tile. Uppercase characters are used for alignment; lowercase characters are trimmed.
Optional Columns:
- peptide_id — Integer index (auto-generated if not provided).
- Organism — Species or virus name associated with the peptide (used for virus scoring and heatmaps).
- Species — Species-level grouping.
- peptide — Amino acid sequence of the peptide tile.
- pdb_id — PDB structure identifier (used for 3D visualization).
- id — Unique identifier for merging with oligo metadata.

Example:

oligo,Organism,Species,peptide,pdb_id
ATGCGA…,Human coronavirus OC43,Betacoronavirus 1,MSDNGP…,6OHW
GCTAGC…,Influenza A virus,Influenza A virus,DTLCIA…,4LXV
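The oligo trimming rule and the optional peptide_id column can be illustrated with a short Python sketch. It is an assumption-based illustration, not the pipeline's code: trim_oligo and load_peptide_table are hypothetical names, trim_oligo removes every lowercase character (which matches "trimming" when lowercase bases only flank the tile), and the 0-based numbering used when peptide_id is absent is a guess that may differ from the pipeline's own indices.

```python
import csv


def trim_oligo(oligo):
    """Drop lowercase characters, keeping the uppercase part used for alignment."""
    return "".join(ch for ch in oligo if ch.isupper())


def load_peptide_table(lines):
    """Read a peptide table from any iterable of CSV lines and add trimmed oligos.

    If there is no peptide_id column, rows are numbered 0, 1, 2, ...
    (an assumption; the pipeline's own numbering may differ).
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    has_ids = "peptide_id" in (reader.fieldnames or [])
    for index, row in enumerate(rows):
        row["oligo_trimmed"] = trim_oligo(row["oligo"])
        if not has_ids:
            row["peptide_id"] = index
    return rows
```

For instance, trim_oligo("ggaATGCGAcct") keeps only "ATGCGA".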

FASTQ Files

Format: FASTQ files (.fastq.gz or .fastq).
Platform: Oxford Nanopore Technologies (ONT).
Compression: Set params.fastq_stream_func = 'zcat' for gzipped files, or 'cat' for uncompressed files.
Location: Paths in the default sample table are resolved relative to $baseDir (the directory containing the pipeline's main script). Paths in a custom sample table are resolved relative to $launchDir (the directory from which the pipeline is launched). Absolute paths work in either case.
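The compression setting and the path-resolution rule can be sketched in Python as a pre-flight check. Both function names are hypothetical. Because params.fastq_stream_func is a single setting, choose_stream_func assumes every FASTQ file in a run shares the same compression and raises an error on a mix; resolve_fastq_path mirrors the rule stated above, with base_dir and launch_dir standing in for $baseDir and $launchDir.

```python
from pathlib import Path


def choose_stream_func(fastq_paths):
    """Suggest a value for params.fastq_stream_func from the file extensions."""
    gzipped = {str(p).endswith(".gz") for p in fastq_paths}
    if gzipped == {True}:
        return "zcat"
    if gzipped == {False}:
        return "cat"
    # Empty input, or a mix of .gz and plain files (assumed unsupported).
    raise ValueError("expected FASTQ files that are all gzipped or all uncompressed")


def resolve_fastq_path(fastq_path, base_dir, launch_dir, custom_table):
    """Absolute paths pass through; relative ones use $launchDir or $baseDir."""
    path = Path(fastq_path)
    if path.is_absolute():
        return path
    root = Path(launch_dir) if custom_table else Path(base_dir)
    return root / path
```

With a custom sample table launched from /home/user/run, the relative entry reads/neg_control_1.fastq.gz would resolve to /home/user/run/reads/neg_control_1.fastq.gz.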

IEDB Database

Path: Configured via params.iedb_database.
Format: TSV file from the Immune Epitope Database (IEDB).
Usage: Cross-referenced to annotate enriched peptides with known epitope information.
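One plausible form of this cross-referencing is sketched below in Python. It is an illustration under stated assumptions, not the pipeline's matching method: it assumes the TSV has a single header row, treats the column name epitope_sequence as a placeholder for whatever column holds epitope sequences in your export, and annotates a peptide with every known epitope that occurs in it as an exact substring. The function names are hypothetical.

```python
import csv


def load_epitopes(lines, sequence_column="epitope_sequence"):
    """Read epitope sequences from TSV lines; the column name is a placeholder."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row[sequence_column].strip().upper()
        for row in reader
        if row.get(sequence_column)  # skip blank entries
    }


def annotate_peptides(peptides, epitopes):
    """Map each enriched peptide to the known epitopes it contains (exact substring match)."""
    return {
        peptide: sorted(ep for ep in epitopes if ep in peptide.upper())
        for peptide in peptides
    }
```

Exact substring matching is the simplest choice; allowing mismatches or partial overlaps would be a different (and more permissive) annotation rule.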

Neutralization Database

Path: Configured via params.neutralization_db.
Format: TSV file from the Immune Epitope Database.
Usage: Cross-referenced to annotate neutralizing peptides with known epitope information.

PDB Structure Files

Path: Configured via params.pdb_dir.
Format: Directory containing .pdb or .cif structure files named by PDB ID.
Usage: Used by the Streamlit 3D visualization dashboard to render protein structures with mapped epitopes.
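To confirm that every pdb_id in the peptide table has a matching structure file, a lookup like the following Python sketch can help. The function names are hypothetical, the case-insensitive match assumes files are named <PDB_ID>.pdb or <PDB_ID>.cif, and preferring .pdb over .cif when both exist is an assumption, not documented dashboard behavior.

```python
from pathlib import Path

# Assumed preference order when both formats exist for the same PDB ID.
EXTENSIONS = (".pdb", ".cif")


def pick_structure(paths, pdb_id):
    """Return the path named <pdb_id>.pdb or <pdb_id>.cif (case-insensitive), else None."""
    by_name = {Path(p).name.lower(): Path(p) for p in paths}
    for ext in EXTENSIONS:
        match = by_name.get(pdb_id.lower() + ext)
        if match is not None:
            return match
    return None


def find_structure(pdb_dir, pdb_id):
    """Look up the structure file for pdb_id inside the params.pdb_dir directory."""
    files = (p for p in Path(pdb_dir).iterdir() if p.is_file())
    return pick_structure(files, pdb_id)
```

Running find_structure over each pdb_id in the peptide table (for example 6OHW and 4LXV from the example above) and reporting the None results lists structures missing from the directory.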