Input Requirements

The pipeline requires two primary input tables and raw sequencing data in FASTQ format.

Sample Table

  • Format: CSV file with a header row.

  • Required Columns:

    • fastq_filepath — Path to the FASTQ file for each sample.

    • sample_ID — Human-readable sample identifier.

    • control_status — Classification of each sample. Required values:

      • beads_only — Negative control (bead-only immunoprecipitation). At least 2 required when run_zscore_fit_predict = true.

      • library — Library control (optional).

      • empirical — Experimental samples.

  • Note: Do not include a column named sample_id (lowercase). The pipeline auto-generates integer sample_id indices.

Example Sample Table

sample_ID

fastq_filepath

control_status

Patient_1_B

reads/patient1_baseline.fastq.gz

empirical

Patient_1_E

reads/patient1_endline.fastq.gz

empirical

neg1

reads/neg_control_1.fastq.gz

beads_only

neg2

reads/neg_control_2.fastq.gz

beads_only

Peptide Table

  • Format: CSV file with a header row.

  • Required Columns:

    • oligo — The oligonucleotide sequence for each peptide tile. Uppercase characters are used for alignment; lowercase characters are trimmed.

  • Optional Columns:

    • peptide_id — Integer index (auto-generated if not provided).

    • Organism — Species or virus name associated with the peptide (used for virus scoring and heatmaps).

    • Species — Species-level grouping.

    • peptide — Amino acid sequence of the peptide tile.

    • pdb_id — PDB structure identifier (used for 3D visualization).

    • id — Unique identifier for merging with oligo metadata.

Example Peptide Table

oligo

Organism

Species

peptide

pdb_id

ATGCGA…

Human coronavirus OC43

Betacoronavirus 1

MSDNGP…

6OHW

GCTAGC…

Influenza A virus

Influenza A virus

DTLCIA…

4LXV

Sequencing Data

  • Format: FASTQ files (.fastq.gz or .fastq).

  • Platform: Oxford Nanopore Technologies (ONT).

  • Compression: Set params.fastq_stream_func = 'zcat' for gzipped files, or 'cat' for uncompressed.

  • Location: By default, paths in the sample table are resolved relative to $baseDir. If you provide a custom sample table, paths are resolved relative to $launchDir. Absolute paths are also supported.

Optional Input Files

IEDB Database

  • Path: Configured via params.iedb_database.

  • Format: TSV file from the Immune Epitope Database.

  • Usage: Cross-referenced to annotate enriched peptides with known epitope information.

IEDB Neutralizing Database

  • Path: Configured via params.neutralization_db.

  • Format: TSV file from the Immune Epitope Database.

  • Usage: Cross-referenced to annotate neutralizing peptides with known epitope information.

PDB Structures

  • Path: Configured via params.pdb_dir.

  • Format: Directory containing .pdb or .cif structure files named by PDB ID.

  • Usage: Used by the Streamlit 3D visualization dashboard to render protein structures with mapped epitopes.