Whole-genome Sequencing Data Variant Calling
Summary of methods
- Index generated with BWA
- Filter and trim reads with fastp
- Strand determination with how_are_we_stranded_here
- Quality checking of reads with FastQC
- Reads mapped to the reference with BWA-MEM
- Alignment post-processing and QC with Picard
- Variant calling with bcftools
- Variant filtering with GATK
- Aggregation of QC tables using MultiQC
Parameters
Required parameters:
- --pubdir
  - Default: /<PATH>
  - The directory where the saved outputs will be stored.
- -w
  - Default: /<PATH>
  - The directory used by all intermediary files and Nextflow processes. This directory can become quite large, so it should be on scratch space or another location with ample storage.
- --sample_folder
  - Default: /<PATH>
  - The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
- --fasta
  - Default: /<PATH>
  - Path to the reference genome in FASTA format.
- --read_type
  - Default: PE
  - Type of reads: paired-end (PE) or single-end (SE).
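As an illustration of how the required parameters fit together, here is a minimal Python sketch that assembles a `nextflow run` argument list. The entry-point name `main.nf` and all paths are hypothetical placeholders, not values taken from this pipeline; only the parameter names come from the list above.

```python
import shlex


def build_run_command(pubdir, workdir, sample_folder, fasta,
                      read_type="PE", script="main.nf"):
    """Assemble a `nextflow run` argument list from the required parameters.

    `script` is a hypothetical placeholder for the pipeline entry point.
    """
    if read_type not in ("PE", "SE"):
        raise ValueError("read_type must be 'PE' or 'SE'")
    return [
        "nextflow", "run", script,
        "-w", workdir,                      # Nextflow option (single dash)
        "--pubdir", pubdir,                 # pipeline parameters (double dash)
        "--sample_folder", sample_folder,
        "--fasta", fasta,
        "--read_type", read_type,
    ]


# Placeholder paths for illustration only.
cmd = build_run_command(
    pubdir="/path/to/results",
    workdir="/path/to/scratch/work",
    sample_folder="/path/to/fastq",
    fasta="/path/to/reference.fa",
)
print(shlex.join(cmd))
```

Note that `-w` takes a single dash because it is an option of Nextflow itself, while the double-dash options are passed through to the pipeline as parameters.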
Optional parameters:
General optional parameters:
- --concat_lanes
  - Default: false
  - Options: false and true. If true, FASTQ files will be concatenated by sample. Use this when samples are divided across individual sequencing lanes.
- --extension
  - Default: .fastq.gz
  - Expected file extension for input read files. Modify this if files are not compressed or have a different form (e.g. ".fastq" or ".fq.gz").
- --pattern
  - Default: *_R{1,2}*
  - Expected R1/R2 matching pattern for paired-end read files in the --sample_folder path. In concert with --extension, the default values will match files such as sampleID_L001_R1.fastq.gz and sampleID_L001_R2.fastq.gz.
- --keep_intermediate
  - Default: false
  - If true, the workflow will output intermediate alignment files (unsorted BAMs, etc.).
- --keep_reference
  - Default: false
  - If true, the workflow will save a copy of the BWA indices to the output directory.
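To check which file names the --pattern and --extension defaults would pick up, the matching can be approximated in Python. This is a sketch under two assumptions: that the pipeline concatenates the pattern and the extension into a single glob, and that Python's `fnmatch` plus a small brace-expansion helper behaves closely enough to Nextflow's glob matching. The file names are invented for illustration.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand every {a,b,...} group in a glob into plain alternatives."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace groups are expanded as well.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches(filename, pattern="*_R{1,2}*", extension=".fastq.gz"):
    """Return True if filename matches the pattern followed by the extension."""
    globs = expand_braces(pattern + extension)
    return any(fnmatchcase(filename, glob) for glob in globs)


# Invented file names for illustration.
names = [
    "sampleID_L001_R1.fastq.gz",
    "sampleID_L001_R2.fastq.gz",
    "sampleID_L001_I1.fastq.gz",   # index read, no _R1/_R2 token
    "sampleID_L001_R1.fq.gz",      # different extension
]
for name in names:
    print(name, matches(name))
```

With the defaults, only the first two names match; the last one would need `--extension .fq.gz`.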
fastp filtering parameters:
- --quality_phred
  - Default: 15
  - Phred quality score threshold. Bases scoring below this value are counted as unqualified.
- --unqualified_perc
  - Default: 40
  - Maximum percentage of unqualified bases a read may contain and still pass the filter.
- --tmpdir
  - Default: "~/scratch/$USER/tmp/"
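The interaction between --quality_phred and --unqualified_perc can be sketched as follows. This is an approximation of the per-read rule, not fastp's own code, and it assumes Phred+33 encoded quality strings.

```python
def passes_quality_filter(quality_string, quality_phred=15, unqualified_perc=40):
    """Approximate the per-read quality rule described above.

    A base is "unqualified" when its Phred score is below quality_phred.
    The read is rejected when more than unqualified_perc percent of its
    bases are unqualified. Assumes Phred+33 encoded qualities.
    """
    scores = [ord(char) - 33 for char in quality_string]
    unqualified = sum(score < quality_phred for score in scores)
    # Integer comparison avoids floating-point rounding at the boundary.
    return unqualified * 100 <= unqualified_perc * len(scores)


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(passes_quality_filter("IIIIII####"))  # 40% unqualified, at the limit
print(passes_quality_filter("IIIII#####"))  # 50% unqualified, over the limit
```

Raising --quality_phred or lowering --unqualified_perc both make the filter stricter.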
BWA-MEM mapping parameters:
- --mismatch_penalty
  - Default: "-B 8"
- --bwa_min_score
  - Default: null
  - Minimum alignment score required for a read alignment to be emitted.
Variant calling parameters:
- --ploidy
  - Default: 2
  - Options: 1, 2
- --mpileup_depth
  - Default: 100
  - At each position, read at most this many reads per input file.
- --skip_indels
  - Default: false
  - If true, do not call indel sites.
- --variants_only
  - Default: true
  - If true, only emit variant sites (homozygous-reference sites are omitted).
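One plausible way these parameters could translate into bcftools command lines is sketched below. The flag mapping (`-d`, `-I`, `-v`, `--ploidy`) uses standard bcftools options, but how the pipeline actually wires its parameters to bcftools is an assumption here, and the file names are placeholders.

```python
def build_calling_commands(bam, fasta, ploidy=2, mpileup_depth=100,
                           skip_indels=False, variants_only=True):
    """Build illustrative `bcftools mpileup` and `bcftools call` argument lists.

    The parameter-to-flag mapping is an assumption, not taken from the pipeline.
    """
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")

    mpileup = ["bcftools", "mpileup", "-f", fasta, "-d", str(mpileup_depth)]
    if skip_indels:
        mpileup.append("-I")        # do not perform indel calling
    mpileup.append(bam)

    call = ["bcftools", "call", "-m", "--ploidy", str(ploidy)]
    if variants_only:
        call.append("-v")           # output variant sites only
    return mpileup, call


# Placeholder file names for illustration only.
mpileup_cmd, call_cmd = build_calling_commands("sample.bam", "reference.fa")
print(" ".join(mpileup_cmd), "|", " ".join(call_cmd))
```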
Variant filtering parameters:
- --filter_dp
  - Default: "DP < 25"
  - Sites with a depth lower than this value will have the filter flag "LowCoverage" set.
- --filter_very_low_qual
  - Default: "QUAL < 30.0"
  - Sites with a quality score lower than this value will have the filter flag "VeryLowQual" set.
- --filter_low_qual
  - Default: "QUAL > 30.0 && QUAL < 50.0"
  - Sites with a quality score within this range (above 30.0 and below 50.0) will have the filter flag "LowQual" set.
- --filter_qd
  - Default: "QD < 1.5"
  - Sites with a depth-normalized quality score (QD) lower than this value will have the filter flag "LowQD" set.
- --filter_fs
  - Default: "FS > 60.0"
  - Sites with a strand bias score (FS) greater than this value will have the filter flag "StrandBias" set.
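The default filter expressions can be read as independent tests, each adding its own flag to a site. The sketch below re-implements them in plain Python to make the thresholds concrete; it is not how the pipeline applies them, and the example site values are invented.

```python
# Each entry pairs a filter flag with a predicate mirroring a default expression.
FILTERS = [
    ("LowCoverage", lambda site: site["DP"] < 25),
    ("VeryLowQual", lambda site: site["QUAL"] < 30.0),
    ("LowQual", lambda site: 30.0 < site["QUAL"] < 50.0),
    ("LowQD", lambda site: site["QD"] < 1.5),
    ("StrandBias", lambda site: site["FS"] > 60.0),
]


def soft_filter(site):
    """Return the filter flags a site would receive, or ["PASS"] if none apply."""
    flags = [name for name, fails in FILTERS if fails(site)]
    return flags or ["PASS"]


# Invented example sites.
print(soft_filter({"DP": 40, "QUAL": 80.0, "QD": 12.0, "FS": 2.0}))
print(soft_filter({"DP": 30, "QUAL": 40.0, "QD": 5.0, "FS": 1.0}))
print(soft_filter({"DP": 10, "QUAL": 20.0, "QD": 1.0, "FS": 70.0}))
```

As written, the two quality expressions leave a gap: a site with QUAL exactly 30.0 matches neither "QUAL < 30.0" nor "QUAL > 30.0 && QUAL < 50.0". The sketch also assumes every annotation is present on every site.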
Pipeline Default Outputs
| Naming Convention | Description |
|---|---|
| wgs_cohort_variants_bcftools_filtered.vcf | Soft-filtered VCF including all samples |
| wgs_cohort_variants_bcftools_filtered_genotype_table.tsv | Tab-delimited genotype table for the filtered cohort variants |
| wgs_cohort_variants_bcftools_filtered_site_table.tsv | Tab-delimited site table for the filtered cohort variants |
| wgs_report.html | Nextflow autogenerated report |
| index/ | BWA-generated indices, saved when the optional parameter --keep_reference is set |
| multiqc/ | MultiQC report summarizing quality metrics across all samples in the run |
| ${sampleID}/stat/ | QC and Picard metrics files for each sample |
| trace.txt | Nextflow trace of processes |
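Because the cohort VCF is soft-filtered, every site is retained and the FILTER column records which flags were set. A quick way to summarize those flags is sketched below. It relies only on the standard VCF layout (FILTER is the seventh tab-separated column, with multiple flags joined by ";"); the example records are invented.

```python
from collections import Counter


def count_filters(vcf_lines):
    """Count FILTER flags across the data lines of an uncompressed VCF."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        # FILTER is the seventh column; multiple flags are separated by ';'.
        for flag in fields[6].split(";"):
            counts[flag] += 1
    return counts


# Invented records for illustration.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t80.0\tPASS\tDP=40\n",
    "chr1\t200\t.\tC\tT\t20.0\tLowCoverage;VeryLowQual\tDP=10\n",
]
print(count_filters(example))
```

For a real run, the same function can be passed an open file handle, for example `count_filters(open("wgs_cohort_variants_bcftools_filtered.vcf"))`.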
Last update: January 12, 2025