
RNA-seq

Summary of methods

  • Index generated with RSEM
  • Filter and trim reads with fastp
  • Strand determination with how_are_we_stranded_here
  • Quality checking of reads with FastQC
  • Reads mapped to reference and quantified with RSEM
  • Alignment post-processing and QC with Picard
  • Aggregation of QC tables using MultiQC

Parameters

Required parameters:

  • --pubdir

    • Default: /<PATH>
    • Description: The directory in which the saved outputs will be stored.
  • -w

    • Default: /<PATH>
    • The working directory used for all intermediate files and Nextflow processes. This directory can become quite large, so it should be on scratch space or in another location with ample storage.
  • --sample_folder

    • Default: /<PATH>
    • The path to the folder that contains all the samples to be run by the pipeline. The files in this path can also be symbolic links.
  • --fasta

    • Default: /<PATH>
    • Path to the reference genome in FASTA format
  • --gtf

    • Default: /<PATH>
    • Path to the annotation for the reference genome in GTF format
  • --read_type

    • Default: PE
    • Comment: Type of reads: paired end (PE) or single end (SE).
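The required parameters above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the helper `check_required` and the dictionary keys are hypothetical names that simply mirror the parameter names listed above.

```python
# Hypothetical pre-flight check; key names mirror the required parameters above.
REQUIRED_KEYS = ("pubdir", "w", "sample_folder", "fasta", "gtf", "read_type")
VALID_READ_TYPES = ("PE", "SE")


def check_required(params):
    """Return a list of problems found in params; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        # Treat absent or empty values as missing
        if not params.get(key):
            problems.append(f"missing required parameter: {key}")
    read_type = params.get("read_type")
    if read_type and read_type not in VALID_READ_TYPES:
        problems.append(f"read_type must be PE or SE, got: {read_type}")
    return problems
```

An empty list means every required parameter is present and `read_type` is one of the two documented values. The sketch does not check that the paths exist.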

Optional parameters:

General optional parameters:

  • --rsem_aligner

    • Default: bowtie2
    • Options: bowtie2 or star
  • --strandedness

    • Default: NA
    • Supported options are reverse_stranded, forward_stranded, non_stranded
    • Strandedness of libraries is determined automatically by how_are_we_stranded_here. This parameter is used as a fallback if the automatic strand determination fails.
  • --concat_lanes

    • Default: false
    • Options: true or false. If true, FASTQ files are concatenated by sample. Use this when a sample's reads are split across multiple sequencing lanes.
  • --rsem_index

    • Default: /<PATH>
    • Path to a directory of RSEM indices previously generated with this pipeline. The pipeline raises an error if any index files are missing. When using this parameter, do not supply --fasta or --gtf.
  • --extension

    • Default: .fastq.gz
    • Expected file extension for input read files. Modify this if the files are not compressed or use a different extension (e.g. ".fastq" or ".fq.gz").
  • --pattern

    • Default: *_R{1,2}*
    • Expected R1/R2 matching pattern for paired-end read files in the --sample_folder path. Combined with --extension, the default values match files such as sampleID_L001_R1.fastq.gz and sampleID_L001_R2.fastq.gz.
  • --keep_intermediate

    • Default: false
    • If true, the workflow outputs intermediate alignment files (unsorted BAMs, etc.).
  • --keep_reference

    • Default: false
    • If true, the workflow saves a copy of the RSEM indices to the output directory.
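To check which file names a given --pattern and --extension combination would pick up, the glob can be approximated in Python. This is a sketch under assumptions: the helpers `expand_braces` and `matches` are hypothetical, and Python's `fnmatch` only approximates Nextflow's own glob matching, so edge cases may differ.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into a list of plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so patterns with several brace groups are fully expanded
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches(filename, pattern="*_R{1,2}*", extension=".fastq.gz"):
    """True if filename matches pattern + extension as a shell-style glob."""
    return any(
        fnmatch.fnmatchcase(filename, glob)
        for glob in expand_braces(pattern + extension)
    )
```

With the defaults, the pattern expands to `*_R1*.fastq.gz` and `*_R2*.fastq.gz`, so a file name must contain `_R1` or `_R2` and end in `.fastq.gz` to be selected.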

fastp filtering parameters:

  • --quality_phred

    • Default: 15
    • Phred quality score at or above which a base is considered qualified.
  • --unqualified_perc

    • Default: 40
    • Maximum percentage of unqualified bases (those below --quality_phred) that a read may contain and still pass the filter.
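The interaction of the two fastp thresholds can be illustrated on a single read. The sketch below is an approximation, not fastp's implementation: `passes_quality_filter` is a hypothetical helper, Phred+33 encoding is assumed, and fastp's other filters (read length, N content, adapter trimming) are ignored.

```python
def passes_quality_filter(quality_string, quality_phred=15, unqualified_perc=40):
    """Approximate the per-read quality rule for a Phred+33 quality string."""
    if not quality_string:
        # Assumption: an empty read never passes
        return False
    # Phred+33: quality score = ASCII code of the character minus 33
    scores = [ord(ch) - 33 for ch in quality_string]
    unqualified = sum(1 for q in scores if q < quality_phred)
    # The read passes while the unqualified share does not exceed the limit
    return 100.0 * unqualified / len(scores) <= unqualified_perc
```

With the defaults, a read is kept when no more than 40% of its bases fall below Q15.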

bowtie2 parameters:

  • --seed_length
    • Default: 25
    • From RSEM manual: If RSEM runs Bowtie, it uses this value for Bowtie's seed length parameter. Any read with its or at least one of its mates' (for paired-end reads) length less than this value will be ignored.
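The rule quoted from the RSEM manual can be restated as a small check. This is a sketch of the documented behaviour only, not RSEM's code; `is_ignored` is a hypothetical helper.

```python
def is_ignored(read_lengths, seed_length=25):
    """Return True if a read (or read pair) would be ignored.

    read_lengths holds one length for a single-end read, or the lengths
    of both mates for a paired-end read.
    """
    # A pair is ignored as soon as either mate is shorter than the seed length
    return any(length < seed_length for length in read_lengths)
```

For example, with the default of 25, a pair with mate lengths 100 and 20 would be ignored, while a single 25-base read would be kept.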

Pipeline Default Outputs

Naming Convention                         Description
rsem.merged.gene_counts.tsv               RSEM-generated gene-level raw counts merged across all samples
rsem.merged.gene_tpm.tsv                  RSEM-generated gene-level TPM counts merged across all samples
rsem.merged.isoform_counts.tsv            RSEM-generated isoform-level raw counts merged across all samples
rsem.merged.isoform_tpm.tsv               RSEM-generated isoform-level TPM counts merged across all samples
rnaseq_report.html                        Nextflow autogenerated report
index/                                    RSEM-generated indices saved with optional parameter keep_reference
multiqc/                                  MultiQC report summarizing quality metrics across all samples in the run
${sampleID}/bam/                          RSEM-generated alignments of reads to the reference genome and transcriptome
${sampleID}/stat/                         QC, strand, and Picard metrics files for each sample
${sampleID}/${sampleID}.genes.results     RSEM-generated quantification of gene-level count abundances
${sampleID}/${sampleID}.isoforms.results  RSEM-generated quantification of transcript-level count abundances
trace.txt                                 Nextflow trace of processes
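To show how the count and TPM outputs relate, the sketch below parses text in the layout of an RSEM `.genes.results` file and recomputes TPM from `expected_count` and `effective_length`. It assumes the standard RSEM column names. The helper `recompute_tpm` is hypothetical and all values in `EXAMPLE` are made up for illustration.

```python
import csv
import io

# Made-up example in the tab-separated layout of an RSEM genes.results file
EXAMPLE = "\n".join([
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM",
    "geneA\ttxA1\t600.00\t400.00\t100.00\t250000.00\t1.00",
    "geneB\ttxB1\t1000.00\t800.00\t600.00\t750000.00\t3.00",
])


def recompute_tpm(results_text):
    """Return {gene_id: TPM} computed from expected_count / effective_length."""
    rates = {}
    for row in csv.DictReader(io.StringIO(results_text), delimiter="\t"):
        eff_len = float(row["effective_length"])
        count = float(row["expected_count"])
        # Genes with zero effective length contribute no rate
        rates[row["gene_id"]] = count / eff_len if eff_len > 0 else 0.0
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    # Scale the length-normalized rates so they sum to one million
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}
```

Calling `recompute_tpm(EXAMPLE)` gives values that sum to one million across genes. On real output the recomputed values should be close to the TPM column RSEM reports, up to rounding.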

Last update: January 11, 2025