Module-RNA_QUA


DOC_ID : T15-0002

RNA_QUA module : 

DOC_ID : M31-3000
Editor : Anita
Reviewer : Angela

Function :

    RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. It provides a user-friendly interface and supports multithreaded computation of the EM algorithm, single-end and paired-end reads, quality scores, variable-length reads, and RSPD (read start position distribution) estimation. It also provides posterior mean and 95% credibility interval estimates of expression levels. For visualization, it can generate BAM and Wiggle files in both transcript and genomic coordinates.

    RSEM has many uses; this module uses it to calculate expression values by running the rsem-calculate-expression program with the following parameters:

  • -p/--num-threads : Number of threads to use. Bowtie/Bowtie2, expression estimation and 'samtools sort' all use this many threads.
  • --paired-end : Input reads are paired-end reads.
  • --bam : Input file is in BAM format.

In its default mode, this program aligns input reads against a reference transcriptome with Bowtie and calculates expression values from the alignments. RSEM assumes the data are single-end reads with quality scores unless the '--paired-end' or '--no-qualities' options are specified. Because this module passes '--bam', the alignment step is skipped and RSEM estimates expression directly from the supplied BAM file.
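The options above combine into a single rsem-calculate-expression call of the form `rsem-calculate-expression [options] --bam [--paired-end] input reference_name sample_name`. The bash sketch below shows one plausible mapping from this module's seqType, sampleName and refGenome parameters onto that call. It is an assumption, not the actual contents of RNA_QUA.sh: the function name build_rsem_cmd and the default of 8 threads are hypothetical, the processed/ paths follow the input/output layout described under Usage, and the function only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the actual RNA_QUA.sh): map the module's
# seqType / sampleName / refGenome onto an rsem-calculate-expression call.
# Paths follow the processed/ layout of this module; the default thread
# count of 8 is an assumption. The command is printed, not executed.
build_rsem_cmd() {
  local seq_type="$1"      # SE or PE
  local sample="$2"        # e.g. Sample1
  local ref="$3"           # reference prefix, e.g. refs/.../GRCh38.104.genome
  local threads="${4:-8}"  # -p/--num-threads (assumed default)

  local -a cmd=(rsem-calculate-expression -p "$threads")
  case "$seq_type" in
    PE) cmd+=(--paired-end) ;;
    SE) ;;  # single-end is RSEM's default, no extra flag needed
    *) echo "seqType must be SE or PE, got: $seq_type" >&2; return 1 ;;
  esac

  # --bam: input is already aligned to the transcriptome, so RSEM skips alignment.
  # Positional arguments: input file, reference name, output sample prefix.
  cmd+=(--bam "processed/${sample}Aligned.toTranscriptome.out.bam" "$ref" "processed/${sample}")
  echo "${cmd[*]}"
}

# Usage (dry run):
build_rsem_cmd PE Sample1 refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome
```

Printing rather than executing keeps the sketch safe to try on a login node; the printed line can be inspected and then run inside a Slurm job.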

rsem-calculate-expression produces the following output files:

  • sample_name.isoforms.results : File containing isoform level expression estimates. 
  • sample_name.genes.results : File containing gene level expression estimates.
  • sample_name.stat : This is a folder, not a file. All model-related statistics are stored in this folder; 'rsem-plot-model' can generate plots from it.
  • sample_name.alleles.results : Only generated when the RSEM references are built with allele-specific transcripts.
  • sample_name.transcript.bam, sample_name.transcript.sorted.bam and sample_name.transcript.sorted.bam.bai : Only generated when --no-bam-output is not specified.
  • sample_name.time : Only generated when --time is specified.
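To pull a single measure out of the gene-level results, the TPM column of sample_name.genes.results can be selected by its header name. The awk sketch below assumes the tab-separated layout RSEM writes for this file (a header line containing gene_id and TPM columns); genes_tpm is a hypothetical helper name, not part of RSEM or this module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print gene_id and TPM from an RSEM *.genes.results file.
# Assumes a tab-separated file with a header line. The TPM column is found
# by name, so extra columns (e.g. from --calc-ci) do not shift the lookup.
genes_tpm() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "TPM") col = i
      if (!col) { print "no TPM column in header" > "/dev/stderr"; exit 1 }
      print "gene_id", "TPM"
      next
    }
    { print $1, $col }
  ' "$1"
}

# Usage: genes_tpm processed/Sample1.genes.results > Sample1.tpm.tsv
```

Looking the column up by name rather than by a fixed position is a design choice: it keeps the helper working if optional RSEM flags add columns to the results file.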

Ref :
  • http://deweylab.github.io/RSEM/README.html
  • http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html

Installation :

All required software is included in the GA environment.

Note :

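►Before running an analysis, create a project folder with CreateProject.sh; see the Project standard folder structure document.

►Before running the module, confirm which compute partition (--partition) to use: general-node users should use ct56; biomedical-node users should use ngs48G (see Note 1).

►To see how to use the module, run it with the -h option.
 

#Note 1 : To confirm your user type, log in to NCHC iService and select Member Center / Project Management / My Projects. If the project name is "國家生醫數位資料與分析運算雲端服務平台III" (National Biomedical Digital Data and Analysis Cloud Service Platform III), you are a biomedical-node user.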

Description :

Tested environment : GApp0.0.0.2
Software version : rsem=1.3.3=pl526ha52163a_0
Usage (Slurm) : Command in Slurm (Taiwania III)
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,sampleName=Sample1,refGenome=refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome' modules/RNA_QUA.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,sampleName=Sample1,refGenome=refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome' modules/RNA_QUA.sh
Usage (Linux console) : Command in Linux console
Paired-End
bash modules/RNA_QUA.sh -p $(pwd) -t PE -s Sample1 -r refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome
Single-End
bash modules/RNA_QUA.sh -p $(pwd) -t SE -s Sample1 -r refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome
#For Slurm operation, please refer to "Basic operation of Taiwania III".
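The Slurm commands above submit one sample per job. For several samples, the same sbatch call can be repeated in a loop. The sketch below is a hypothetical helper (submit_rna_qua and DRY_RUN are illustrative names, not part of the module); it reuses the --export string from the commands above and assumes $projectID and $email are already set. When DRY_RUN is non-empty it only prints each command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit RNA_QUA.sh once per sample, using the same
# --export string as the single-sample commands. Assumes $projectID and
# $email are set. If DRY_RUN is non-empty, commands are printed instead.
submit_rna_qua() {
  local seq_type="$1" ref="$2"
  shift 2
  local sample
  for sample in "$@"; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
    ${DRY_RUN:+echo} sbatch -A "$projectID" --mail-user="$email" \
      --export="projDir=$(pwd)/,seqType=${seq_type},sampleName=${sample},refGenome=${ref}" \
      modules/RNA_QUA.sh
  done
}

# Usage, run from the project folder:
#   DRY_RUN=1 submit_rna_qua PE refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome Sample1 Sample2
```

Running once with DRY_RUN set is a cheap way to check the generated --export strings before any jobs reach the queue.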

Usage :

The following explains the usage of module parameters :

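  • RNA_QUA.sh : Module to estimate gene expression values.
    Remark : The analysis module must be stored in the [modules] folder.
  • projDir : Path to the analysis project folder (see the project folder structure description).
    Remark : The script must be run from the analysis project folder; $(pwd) returns the user's current path.
  • seqType : Sequence type (same as in the RNA_TMP step).
    Remark : 1. For Single-End sequencing, use SE.
             2. For Paired-End sequencing, use PE.
  • sampleName : Name of the sample to analyze.
    Input file format : *Aligned.toTranscriptome.out.bam
    Input path : processed/
    Example : sampleName = Sample1 reads processed/Sample1Aligned.toTranscriptome.out.bam
    Output file format : *.bam and *.results
    Output path : processed/
    Example : sampleName = Sample1 generates Sample1.transcript.bam, Sample1.genes.results and Sample1.isoforms.results in processed/
  • refGenome : Name and path of the reference genome files used for alignment.
    File format : *.genome.*
    Path : refs/
    Example : refGenome = refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome reads GRCh38.104.genome.* from refs/Homo_sapiens/GRCh38en104Star2.7.5a/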
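Before submitting a job, it can help to confirm that the input BAM and reference files exist where the module expects them. The sketch below assumes the processed/ input path listed for sampleName and that refGenome is an RSEM reference prefix; .grp, .ti, .seq and .transcripts.fa are files rsem-prepare-reference normally writes for a prefix, but whether the GA reference folders contain exactly this set is an assumption. check_rna_qua_inputs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before running RNA_QUA.sh.
# Assumes the processed/ input layout of this module and that refGenome
# is an RSEM reference prefix (assumed files: .grp/.ti/.seq/.transcripts.fa).
check_rna_qua_inputs() {
  local proj_dir="${1%/}" sample="$2" ref="$3"
  local bam="${proj_dir}/processed/${sample}Aligned.toTranscriptome.out.bam"
  local missing=0 ext

  if [ ! -f "$bam" ]; then
    echo "missing input BAM: $bam" >&2
    missing=1
  fi
  for ext in grp ti seq transcripts.fa; do
    if [ ! -f "${proj_dir}/${ref}.${ext}" ]; then
      echo "missing reference file: ${ref}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 if everything was found, 1 otherwise
}

# Usage, from the project folder:
#   check_rna_qua_inputs "$(pwd)" Sample1 refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome
```

The function reports every missing file instead of stopping at the first one, so a single run shows everything that needs fixing.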
