RNA-seq

Summary of methods

Index generated with RSEM
Filter and trim reads with fastp
Strand determination with how_are_we_stranded_here
Quality checking of reads with FastQC
Reads mapped to reference and quantified with RSEM
Alignment post-processing and QC with Picard
Aggregation of QC tables using MultiQC

Parameters

Required parameters:

--pubdir
- Default: /<PATH>
- Description: The directory in which the saved outputs will be stored.
-w
- Default: /<PATH>
- The working directory used by Nextflow for all intermediate files and processes. This directory can become quite large, so it should be on scratch space or in another location with ample storage.
--sample_folder
- Default: /<PATH>
- The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
--fasta
- Default: /<PATH>
- Path to the reference genome in FASTA format
--gtf
- Default: /<PATH>
- Path to the annotation for the reference genome in GTF format
--read_type
- Default: PE
- Comment: Type of reads: paired-end (PE) or single-end (SE).
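
As a rough illustration of how the required parameters fit together, the sketch below assembles the parameter portion of a command line and checks --read_type. The helper name `build_args` and all paths are hypothetical and not part of the pipeline. The launch command itself is deliberately left out, and the sketch ignores the case where --rsem_index replaces --fasta and --gtf.

```python
# Hypothetical helper for illustration; not part of the pipeline.
REQUIRED = ("pubdir", "sample_folder", "fasta", "gtf", "read_type")


def build_args(params, work_dir):
    """Return the parameter portion of a command line as a list of strings."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    if params["read_type"] not in ("PE", "SE"):
        raise ValueError("--read_type must be PE or SE")
    # -w takes a single dash, unlike the double-dash pipeline parameters.
    args = ["-w", work_dir]
    for name in REQUIRED:
        args += ["--" + name, str(params[name])]
    return args


# All paths below are placeholders.
example = build_args(
    {
        "pubdir": "/path/to/results",
        "sample_folder": "/path/to/fastqs",
        "fasta": "/path/to/genome.fa",
        "gtf": "/path/to/annotation.gtf",
        "read_type": "PE",
    },
    work_dir="/path/to/scratch/work",
)
print(" ".join(example))
```

Building the list programmatically makes it easy to catch a missing path or a mistyped read type before any compute time is spent.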

Optional parameters:

General optional parameters:

--rsem_aligner
- Default: bowtie2
- Options: bowtie2 or star
--strandedness
- Default: NA
- Supported options are reverse_stranded, forward_stranded, non_stranded
- Strandedness of libraries is determined automatically by how_are_we_stranded_here. This parameter is used as a fallback if the automatic strand determination fails.
--concat_lanes
- Default: false
- Options: true or false. If true, FASTQ files are concatenated by sample. Use this when a sample's reads are divided across individual sequencing lanes.
--rsem_index
- Default: /<PATH>
- Path to a directory of indices previously generated with this pipeline, used instead of building a new index. The pipeline will generate an error if any index files are missing. If using this parameter, do not use --fasta and --gtf.
--extension
- Default: .fastq.gz
- Expected file extension for input read files. Modify this if the files are not compressed or use a different extension (e.g. ".fastq" or ".fq.gz").
--pattern
- Default: *_R{1,2}*
- Expected R1/R2 matching pattern for paired-end read files in the --sample_folder path. In concert with --extension, the default values will match files such as sampleID_L001_R1.fastq.gz and sampleID_L001_R2.fastq.gz.
--keep_intermediate
- Default: false
- If true, the workflow will output intermediate alignment files (unsorted BAMs, etc.).
--keep_reference
- Default: false
- If true, the workflow will save a copy of the RSEM indices to the output directory.
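
To check which file names a given --pattern and --extension pair would pick up, the following sketch emulates the glob in Python. It assumes the pipeline simply concatenates the pattern and the extension into one glob, which is not confirmed by this page. The helpers `expand_braces` and `matches` are hypothetical, and the file names are made up.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups, since fnmatch has no brace support."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches(filename, pattern="*_R{1,2}*", extension=".fastq.gz"):
    """Assumption: the effective glob is pattern + extension."""
    globs = expand_braces(pattern + extension)
    return any(fnmatch.fnmatchcase(filename, g) for g in globs)


# Made-up file names for illustration.
for name in ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz",
             "sampleA_R1.fq.gz", "sampleA_1.fastq.gz"]:
    print(name, matches(name))
```

With the defaults, the first two names match. The third needs `--extension .fq.gz`, and the fourth needs a different --pattern because it has no `_R1` or `_R2` token.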

fastp filtering parameters:

--quality_phred
- Default: 15
- Phred quality score threshold. Bases with a quality below this value are counted as unqualified.
--unqualified_perc
- Default: 40
- Maximum percentage of unqualified bases allowed in a read. Reads exceeding this percentage are discarded.
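
The interaction of the two thresholds can be sketched as follows. This is a simplified model of the quality filter only, assuming Phred+33 encoded quality strings. fastp applies further filters that are not modelled here, and the function name `passes_quality_filter` is hypothetical.

```python
def passes_quality_filter(quality_string, quality_phred=15, unqualified_perc=40):
    """Return True if a read would be kept under the two thresholds.

    Simplified model: only the base-quality filter is considered.
    """
    if not quality_string:
        return False  # assumption: treat an empty read as failing
    # Phred+33 encoding: score = ASCII code - 33
    scores = [ord(ch) - 33 for ch in quality_string]
    unqualified = sum(1 for score in scores if score < quality_phred)
    # Keep the read unless the unqualified share exceeds the limit.
    return unqualified * 100 <= unqualified_perc * len(scores)


good = "IIIIIIIIII"  # Q40 at every base
edge = "IIIIII####"  # 4 of 10 bases at Q2: exactly 40% unqualified
bad = "IIIII#####"   # 5 of 10 bases at Q2: 50% unqualified

print(passes_quality_filter(good), passes_quality_filter(edge), passes_quality_filter(bad))
```

Raising --quality_phred makes more bases count as unqualified, while lowering --unqualified_perc tolerates fewer of them per read. Either change makes the filter stricter.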

bowtie2 parameters:

--seed_length
- Default: 25
- From RSEM manual: If RSEM runs Bowtie, it uses this value for Bowtie's seed length parameter. Any read with its or at least one of its mates' (for paired-end reads) length less than this value will be ignored.
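
The length rule quoted above can be written as a small check. This is only an illustration of the stated rule, not RSEM code, and the function name `is_ignored` is hypothetical.

```python
def is_ignored(read_lengths, seed_length=25):
    """Apply the quoted rule to one read.

    read_lengths holds one length for a single-end read,
    or two lengths (one per mate) for a paired-end read.
    """
    return any(length < seed_length for length in read_lengths)


print(is_ignored([50]))      # single-end read, long enough
print(is_ignored([50, 20]))  # paired-end read with one short mate
```

For paired-end data, one short mate is enough for the whole pair to be ignored, so aggressive trimming upstream can reduce the number of usable pairs.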

Pipeline Default Outputs

Naming Convention	Description
`rsem.merged.gene_counts.tsv`	RSEM-generated gene-level raw counts merged across all samples
`rsem.merged.gene_tpm.tsv`	RSEM-generated gene-level TPM values merged across all samples
`rsem.merged.isoform_counts.tsv`	RSEM-generated isoform-level raw counts merged across all samples
`rsem.merged.isoform_tpm.tsv`	RSEM-generated isoform-level TPM values merged across all samples
`rnaseq_report.html`	Nextflow autogenerated report
`index/`	RSEM-generated indices saved with optional parameter `--keep_reference`
`multiqc/`	MultiQC report summarizing quality metrics across all samples in the run
`${sampleID}/bam/`	RSEM-generated alignments of reads to the reference genome and transcriptome
`${sampleID}/stat/`	QC, strand, and Picard metrics files for each sample
`${sampleID}/${sampleID}.genes.results`	RSEM-generated quantification of gene-level count abundances
`${sampleID}/${sampleID}.isoforms.results`	RSEM-generated quantification of transcript-level count abundances
`trace.txt`	Nextflow trace of processes
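
For downstream use, a per-sample `${sampleID}.genes.results` file can be read with the standard library. The sketch below relies on the column layout that RSEM writes for gene-level results (`gene_id`, `transcript_id(s)`, `length`, `effective_length`, `expected_count`, `TPM`, `FPKM`). The function name `read_gene_results` is hypothetical, and the rows are made-up values standing in for a real file.

```python
import csv
import io


def read_gene_results(handle):
    """Map gene_id to expected count and TPM from an RSEM genes.results file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["gene_id"]: {
            "expected_count": float(row["expected_count"]),
            "TPM": float(row["TPM"]),
        }
        for row in reader
    }


# Made-up rows in the genes.results layout, for illustration only.
sample_text = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "geneA\ttxA1,txA2\t1500.00\t1350.00\t120.00\t35.20\t28.10\n"
    "geneB\ttxB1\t900.00\t750.00\t0.00\t0.00\t0.00\n"
)

results = read_gene_results(io.StringIO(sample_text))
expressed = [gene for gene, values in results.items() if values["TPM"] > 0]
print(expressed)
```

To read a real output, replace the `io.StringIO` object with an open file handle, for example `open(path, newline="")`.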

Last update: January 11, 2025