# Input Requirements

The pipeline requires two primary input tables and raw sequencing data in FASTQ format.

## Sample Table

* **Format**: CSV file with a header row.
* **Required Columns**:
    * `fastq_filepath`: Path to the FASTQ file for each sample.
    * `sample_ID`: Human-readable sample identifier.
    * `control_status`: Classification of each sample. Must be one of:
        * `beads_only`: Negative control (bead-only immunoprecipitation). **At least 2 required** when `run_zscore_fit_predict = true`.
        * `library`: Library control (optional).
        * `empirical`: Experimental samples.
* **Note**: Do **not** include a column named `sample_id` (lowercase). The pipeline auto-generates integer `sample_id` indices.

### Example Sample Table

| sample_ID | fastq_filepath | control_status |
| :--- | :--- | :--- |
| Patient_1_B | reads/patient1_baseline.fastq.gz | empirical |
| Patient_1_E | reads/patient1_endline.fastq.gz | empirical |
| neg1 | reads/neg_control_1.fastq.gz | beads_only |
| neg2 | reads/neg_control_2.fastq.gz | beads_only |

## Peptide Table

* **Format**: CSV file with a header row.
* **Required Columns**:
    * `oligo`: The oligonucleotide sequence for each peptide tile. Uppercase characters are used for alignment; lowercase characters are trimmed.
* **Optional Columns**:
    * `peptide_id`: Integer index (auto-generated if not provided).
    * `Organism`: Species or virus name associated with the peptide (used for virus scoring and heatmaps).
    * `Species`: Species-level grouping.
    * `peptide`: Amino acid sequence of the peptide tile.
    * `pdb_id`: PDB structure identifier (used for 3D visualization).
    * `id`: Unique identifier for merging with oligo metadata.

### Example Peptide Table

| oligo | Organism | Species | peptide | pdb_id |
| :--- | :--- | :--- | :--- | :--- |
| ATGCGA... | Human coronavirus OC43 | Betacoronavirus 1 | MSDNGP... | 6OHW |
| GCTAGC... | Influenza A virus | Influenza A virus | DTLCIA... | 4LXV |

## Sequencing Data

* **Format**: FASTQ files (`.fastq.gz` or `.fastq`).
* **Platform**: Oxford Nanopore Technologies (ONT).
* **Compression**: Set `params.fastq_stream_func = 'zcat'` for gzipped files, or `'cat'` for uncompressed files.
* **Location**: By default, paths in the sample table are resolved relative to `$baseDir`. If you provide a custom sample table, paths are resolved relative to `$launchDir`. Absolute paths are also supported.

## Optional Input Files

### IEDB Database

* **Path**: Configured via `params.iedb_database`.
* **Format**: TSV file from the [Immune Epitope Database](https://www.iedb.org/).
* **Usage**: Cross-referenced to annotate enriched peptides with known epitope information.

### IEDB Neutralizing Database

* **Path**: Configured via `params.neutralization_db`.
* **Format**: TSV file from the [Immune Epitope Database](https://www.iedb.org/).
* **Usage**: Cross-referenced to annotate neutralizing peptides with known epitope information.

### PDB Structures

* **Path**: Configured via `params.pdb_dir`.
* **Format**: Directory containing `.pdb` or `.cif` structure files named by PDB ID.
* **Usage**: Used by the Streamlit 3D visualization dashboard to render protein structures with mapped epitopes.
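The dashboard locates structures by PDB ID, so it can help to confirm before launch that every `pdb_id` in the peptide table has a matching file in `params.pdb_dir`. The sketch below is a hypothetical pre-flight helper. It is not part of the pipeline. It assumes that files are named `<PDB_ID>.pdb` or `<PDB_ID>.cif` and that IDs should be compared case-insensitively.

```python
import csv
from pathlib import Path

# Hypothetical helper, not shipped with the pipeline.
# Assumes structure files are named <PDB_ID>.pdb or <PDB_ID>.cif.
STRUCTURE_SUFFIXES = {".pdb", ".cif"}


def missing_structures(peptide_table: str, pdb_dir: str) -> set:
    """Return upper-cased pdb_id values that have no structure file in pdb_dir."""
    # Index available structure files by stem (the PDB ID), upper-cased
    # because PDB IDs are case-insensitive.
    available = {
        p.stem.upper()
        for p in Path(pdb_dir).iterdir()
        if p.is_file() and p.suffix.lower() in STRUCTURE_SUFFIXES
    }
    with open(peptide_table, newline="") as fh:
        reader = csv.DictReader(fh)
        # pdb_id is an optional column; if it is absent there is nothing to check.
        if reader.fieldnames is None or "pdb_id" not in reader.fieldnames:
            return set()
        # Collect non-empty IDs; short rows yield None, which is skipped.
        wanted = {
            row["pdb_id"].strip().upper()
            for row in reader
            if row["pdb_id"] and row["pdb_id"].strip()
        }
    return wanted - available
```

For example, `missing_structures("peptides.csv", "pdb/")` (hypothetical paths) returns an empty set when every referenced structure is present. Because IDs are upper-cased on both sides, a `pdb_id` of `4lxv` matches a file named `4LXV.cif`.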