Whole-genome Sequencing Data Variant Calling

Summary of methods

--pubdir
- Default: /<PATH>
- Description: The directory where saved outputs will be stored.
-w
- Default: /<PATH>
- The directory used for all intermediate files and Nextflow processes. This directory can become quite large, so it should be located on scratch space or another directory with ample storage.
--sample_folder
- Default: /<PATH>
- The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
--fasta
- Default: /<PATH>
- Path to the reference genome in FASTA format
--read_type
- Default: PE
- Comment: Type of reads: paired end (PE) or single end (SE).

--concat_lanes
- Default: false
- Options: false, true. If true, FASTQ files will be concatenated by sample. Use this when a sample's reads are split across individual sequencing lanes.
--extension
- Default: .fastq.gz
- Expected file extension for input read files. Modify this if files are not compressed or have a different extension (e.g. ".fastq" or ".fq.gz").
--pattern
- Default: *_R{1,2}*
- Expected R1/R2 matching pattern for paired-end read files in the --sample_folder path. Combined with --extension, the default values will match files such as sampleID_L001_R1.fastq.gz and sampleID_L001_R2.fastq.gz.
--keep_intermediate
- Default: false
- If true, the workflow will output intermediate alignment files (unsorted BAMs, etc.).
--keep_reference
- Default: false
- If true, the workflow will save a copy of the BWA indices to the output directory.
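
To check which file names the default --pattern and --extension values would pick up, the glob can be approximated in Python. This is only a minimal sketch, not the pipeline's own code: it assumes the two values are simply concatenated into one glob, the helper `expand_braces` is hypothetical (Python's fnmatch has no brace expansion), and the file names are made up for illustration.

```python
import fnmatch
import re

def expand_braces(pattern):
    """Expand {a,b,...} groups in a glob into plain globs (hypothetical helper)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that any remaining brace groups are expanded too
        expanded.extend(expand_braces(head + option + tail))
    return expanded

def matches(filename, pattern="*_R{1,2}*", extension=".fastq.gz"):
    """True if filename matches pattern + extension (assumed to be joined this way)."""
    globs = expand_braces(pattern + extension)
    return any(fnmatch.fnmatchcase(filename, glob) for glob in globs)

# Illustrative file names only
for name in ["sampleA_L001_R1.fastq.gz",
             "sampleA_L001_R2.fastq.gz",
             "sampleA_L001_R1.fq.gz"]:
    print(name, matches(name))
```

The first two names match the defaults; the third does not until --extension is changed to ".fq.gz".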

--quality_phred
- Default: 15
- Phred quality score threshold. Bases with a quality below this value are counted as unqualified.
--unqualified_perc
- Default: 40
- Maximum percentage of unqualified bases a read may contain and still pass filtering.
--tmpdir
- Default: "~/scratch/$USER/tmp/"
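
The two read-quality thresholds above work together: --quality_phred decides which bases count as unqualified, and --unqualified_perc caps how many of them a read may contain. The following is a rough sketch of that rule, assuming Phred+33 encoded quality strings and assuming a read is kept when its unqualified percentage does not exceed the threshold. The function name `read_passes` is hypothetical, and the pipeline's trimming step may treat edge cases differently.

```python
def read_passes(quality_string, quality_phred=15, unqualified_perc=40):
    """Return True if the read's share of low-quality bases is within the limit."""
    if not quality_string:
        # Assumption: an empty read never passes
        return False
    # Decode Phred+33 ASCII characters into integer quality scores
    scores = [ord(ch) - 33 for ch in quality_string]
    unqualified = sum(1 for score in scores if score < quality_phred)
    return 100.0 * unqualified / len(scores) <= unqualified_perc

# 'I' decodes to Q40 and '#' to Q2
print(read_passes("IIIIIII###"))  # 30% unqualified -> True
print(read_passes("IIIII#####"))  # 50% unqualified -> False
```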

--mismatch_penalty
- Default: "-B 8"
--bwa_min_score
- Default: null
- Threshold of alignment quality score to emit read alignment

--ploidy
- Default: 2
- Options: 1, 2
--mpileup_depth
- Default: 100
- At a given position, read at most this many reads per input file.
--skip_indels
- Default: false
- If true, do not call indel sites
--variants_only
- Default: true
- If true, only emit variant sites (e.g. not homozygous reference)

--filter_dp
- Default: "DP < 25"
- Sites with depth lower than this value will have a filter flag set as "LowCoverage"
--filter_very_low_qual
- Default: "QUAL < 30.0"
- Sites with a quality score less than this value will have a filter flag set as "VeryLowQual"
--filter_low_qual
- Default: "QUAL > 30.0 && QUAL < 50.0"
- Sites with a quality score within this range will have a filter flag set as "LowQual"
--filter_qd
- Default: "QD < 1.5"
- Sites with a depth-normalized quality score less than this value will have a filter flag set as "LowQD"
--filter_fs
- Default: "FS > 60.0"
- Sites with a strand bias greater than this value will have a filter flag set as "StrandBias"
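
The default filter expressions can be read as independent tests on each site, where a site collects the flag of every expression it satisfies. The sketch below mirrors that logic in plain Python. It is illustrative only: the expressions are hard-coded as lambdas rather than parsed, the site is assumed to be a dictionary with numeric DP, QUAL, QD and FS values, and the "PASS" label for unflagged sites follows the usual VCF convention rather than anything stated here.

```python
# (flag name, test) pairs mirroring the default filter expressions
DEFAULT_FILTERS = [
    ("LowCoverage", lambda site: site["DP"] < 25),
    ("VeryLowQual", lambda site: site["QUAL"] < 30.0),
    ("LowQual", lambda site: 30.0 < site["QUAL"] < 50.0),
    ("LowQD", lambda site: site["QD"] < 1.5),
    ("StrandBias", lambda site: site["FS"] > 60.0),
]

def soft_filter(site):
    """Return the semicolon-joined flags a site would receive, or PASS if none apply."""
    flags = [name for name, test in DEFAULT_FILTERS if test(site)]
    return ";".join(flags) if flags else "PASS"

print(soft_filter({"DP": 10, "QUAL": 45.0, "QD": 2.0, "FS": 1.0}))   # LowCoverage;LowQual
print(soft_filter({"DP": 40, "QUAL": 99.0, "QD": 12.0, "FS": 75.0})) # StrandBias
print(soft_filter({"DP": 40, "QUAL": 30.0, "QD": 5.0, "FS": 0.0}))   # PASS
```

With the defaults as written, a site with QUAL exactly 30.0 satisfies neither quality expression, which is why the last example is unflagged.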

Naming Convention	Description
`wgs_cohort_variants_bcftools_filtered.vcf`	Soft-filtered VCF including all samples
`wgs_cohort_variants_bcftools_filtered_genotype_table.tsv`	Tab-delimited table of genotypes from the filtered VCF across all samples
`wgs_cohort_variants_bcftools_filtered_site_table.tsv`	Tab-delimited table of site-level information from the filtered VCF
`wgs_report.html`	Nextflow autogenerated report
`index/`	BWA-generated indices saved with optional parameter `keep_reference`
`multiqc/`	MultiQC report summarizing quality metrics across all samples in the run
`${sampleID}/stat/`	QC and Picard metrics files for each sample
`trace.txt`	Nextflow trace of processes
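
Because the cohort VCF is soft-filtered, flagged sites stay in the file and carry their flags in the FILTER column (the seventh tab-separated field of a VCF record). A quick way to see how many sites each flag affects is to tally that column. The function name `count_filters` and the example records below are hypothetical, and the sketch assumes an uncompressed VCF.

```python
from collections import Counter

def count_filters(vcf_lines):
    """Tally FILTER values over VCF data lines, skipping headers and blank lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is column 7; multiple flags are separated by semicolons
        for flag in fields[6].split(";"):
            counts[flag] += 1
    return counts

# Made-up records for illustration
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t45.0\tLowCoverage;LowQual\tDP=10",
    "chr1\t200\t.\tC\tT\t99.0\tPASS\tDP=40",
]
print(count_filters(example))
```

To run it on the pipeline output, pass an open file handle for `wgs_cohort_variants_bcftools_filtered.vcf` in place of the example list.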

Last update: January 12, 2025