DOC_ID : T15-0002
RNA_QUA module :
DOC_ID : M31-3000
Editor : Anita
Reviewer : Angela
Function :
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels. For visualization, it can generate BAM and Wiggle files in both transcript and genomic coordinates.
RSEM has many uses; this module uses it to calculate expression values by running the rsem-calculate-expression program. The module sets the following parameters :
- -p/--num-threads : Number of threads to use. Bowtie/Bowtie2, expression estimation and 'samtools sort' all use this many threads.
- --paired-end : Input reads are paired-end reads.
- --bam : Input file is in BAM format.
In its default mode, this program aligns input reads against a reference transcriptome with Bowtie and calculates expression values using the alignments. Because this module passes '--bam', RSEM takes the alignments from the input BAM file instead of running the aligner itself. RSEM assumes the data are single-end reads with quality scores, unless the '--paired-end' or '--no-qualities' options are specified.
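The parameters described above can be put together as in the following minimal sketch. It is not taken from RNA_QUA.sh: the function name build_rsem_cmd, the thread count and the argument order (input BAM, reference name, sample name) are assumptions for illustration, and the flag names are the ones listed above, so check `rsem-calculate-expression --help` for the installed version. The sketch only prints the command; it does not run RSEM.

```shell
#!/usr/bin/env bash
# Sketch only: prints an rsem-calculate-expression command line instead of running it.
# build_rsem_cmd and its argument order are hypothetical, not part of RNA_QUA.sh.

build_rsem_cmd() {
    local seq_type=$1 threads=$2 bam=$3 reference=$4 sample=$5
    # Flags taken from the parameter list above: -p (threads) and --bam (BAM input).
    local cmd=(rsem-calculate-expression -p "$threads" --bam)
    # RSEM assumes single-end reads unless --paired-end is given.
    if [ "$seq_type" = "PE" ]; then
        cmd+=(--paired-end)
    fi
    cmd+=("$bam" "$reference" "$sample")
    printf '%s\n' "${cmd[*]}"
}

# Example with the sample and reference names used elsewhere in this document
# (8 threads is an arbitrary choice).
build_rsem_cmd PE 8 processed/Sample1Aligned.toTranscriptome.out.bam \
    refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome processed/Sample1
```

Printing the command first makes it easy to review the flags before a long job is submitted.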
After the rsem-calculate-expression program finishes, the following files are produced :
- sample_name.isoforms.results : File containing isoform level expression estimates.
- sample_name.genes.results : File containing gene level expression estimates.
- sample_name.stat : This is a folder, not a file. All model-related statistics are stored in this folder. 'rsem-plot-model' can generate plots from this folder.
- sample_name.alleles.results : Only generated when the RSEM references are built with allele-specific transcripts.
- sample_name.transcript.bam, sample_name.transcript.sorted.bam and sample_name.transcript.sorted.bam.bai : Only generated when --no-bam-output is not specified.
- sample_name.time : Only generated when --time is specified.
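As a quick check of the gene-level output listed above, the following sketch prints the genes with the highest TPM from a genes.results file. It assumes the file is tab-separated with a header row that includes gene_id and TPM columns, as RSEM writes it; the function name top_genes_by_tpm is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: list the N genes with the highest TPM in an RSEM *.genes.results file.
# Columns are located by header name, so the sketch does not depend on column order.

top_genes_by_tpm() {
    local results=$1 n=${2:-10}
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
        { print $(col["gene_id"]) "\t" $(col["TPM"]) }           # keep gene ID and TPM only
    ' "$results" | LC_ALL=C sort -t $'\t' -k2,2nr | head -n "$n"
}

# Example (not executed here): top_genes_by_tpm processed/Sample1.genes.results 5
```

The same approach works for isoforms.results if gene_id is replaced with transcript_id.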
Ref : http://deweylab.github.io/RSEM/README.html, http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html
Installation :
All required software is included in the GA environment.
Note :
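►Before running the analysis, create a project folder with CreateProject.sh; see the Project standard folder structure document.
►Before running the module, confirm which compute partition (--partition) applies to you : users of general nodes should use ct56 ; users of biomedical nodes should use ngs48G (see Note 1).
►To learn how to use the module, run it with the -h option.
#Note 1 : To confirm your user type, log in to NCHC iService and go to Member Center / Project Management / My Projects. If the project name is "國家生醫數位資料與分析運算雲端服務平台III", you are a biomedical node user.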
Description :
Tested environment | GApp0.0.0.2 |
software version | rsem=1.3.3=pl526ha52163a_0 |
Usage(Slurm, Taiwania III), Paired-End | sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,sampleName=Sample1,refGenome=refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome' modules/RNA_QUA.sh |
Usage(Slurm, Taiwania III), Single-End | sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,sampleName=Sample1,refGenome=refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome' modules/RNA_QUA.sh |
Usage(Linux console), Paired-End | bash modules/RNA_QUA.sh -p $(pwd) -t PE -s Sample1 -r refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome |
Usage(Linux console), Single-End | bash modules/RNA_QUA.sh -p $(pwd) -t SE -s Sample1 -r refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome |
#For Slurm operation, please refer to "Basic operation of Taiwania III" |
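The --export value in the Slurm command above is easy to mistype, so it can help to assemble it from variables. The following is a minimal sketch under stated assumptions: build_export is a hypothetical helper, MY_PROJECT_ID and user@example.com are placeholders, and passing --partition on the command line is only one possible way to select the partition mentioned in the Note section. The sketch prints the sbatch command rather than submitting it.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the --export value used by the Slurm command in the table above.
# build_export is a hypothetical helper, not part of RNA_QUA.sh.

build_export() {
    local proj_dir=$1 seq_type=$2 sample=$3 ref=$4
    # seqType accepts only SE or PE.
    case "$seq_type" in
        SE|PE) ;;
        *) echo "seqType must be SE or PE, got: $seq_type" >&2; return 1 ;;
    esac
    # projDir ends with a single trailing slash, as in the documented command.
    printf 'projDir=%s/,seqType=%s,sampleName=%s,refGenome=%s\n' \
        "${proj_dir%/}" "$seq_type" "$sample" "$ref"
}

# Placeholders: replace with your own project ID and e-mail address.
projectID=${projectID:-MY_PROJECT_ID}
email=${email:-user@example.com}

# Dry run: print the command instead of submitting it.
# ct56 is the general-node partition; biomedical-node users would use ngs48G.
echo sbatch -A "$projectID" --partition=ct56 --mail-user="$email" \
    --export="$(build_export "$(pwd)" PE Sample1 \
        refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome)" \
    modules/RNA_QUA.sh
```

Removing the leading echo submits the job once the printed command looks right.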
Usage :
The following explains the usage of module parameters :
Parameter | Description | Remark |
RNA_QUA.sh | Module to estimate gene expression values | The module script must be stored in the [modules] folder |
projDir | Path to the analysis project folder (see the project folder structure description) | The script must be run from the analysis project folder; $(pwd) returns the user's current directory |
seqType | Sequence type (same as in the RNA_TMP step) | 1. Use SE for single-end sequencing. 2. Use PE for paired-end sequencing. |
sampleName | Name of the input file to analyze. Format : *Aligned.toTranscriptome.out.bam. Path : processed/ | For example, sampleName = Sample1 reads Sample1Aligned.toTranscriptome.out.bam from processed/ |
sampleName | Names of the output files. Format : *.bam and *.results. Path : processed/ | For example, sampleName = Sample1 generates Sample1.transcript.bam, Sample1.genes.results and Sample1.isoforms.results in processed/ |
refGenome | Name and path of the genome reference files to use. Format : *.genome.*. Path : refs/ | For example, refGenome = refs/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome reads GRCh38.104.genome.* from under refs/ |
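The naming convention in the parameter table can be checked before a job is submitted. The sketch below derives the expected input and output paths from sampleName and verifies that the input BAM exists. The function names rna_qua_input, rna_qua_outputs and check_input are hypothetical, and the sketch assumes all paths are relative to the project folder.

```shell
#!/usr/bin/env bash
# Sketch only: map sampleName to the file names described in the parameter table.
# All function names here are hypothetical helpers, not part of RNA_QUA.sh.

# Input BAM that the module reads for a given sample name.
rna_qua_input() {
    printf 'processed/%sAligned.toTranscriptome.out.bam\n' "$1"
}

# Output files the table says are generated for a given sample name.
rna_qua_outputs() {
    local s=$1
    printf 'processed/%s.transcript.bam\n' "$s"
    printf 'processed/%s.genes.results\n' "$s"
    printf 'processed/%s.isoforms.results\n' "$s"
}

# Return 0 if the input BAM exists under the project folder, 1 otherwise.
check_input() {
    local proj_dir=$1 sample=$2 bam
    bam="${proj_dir%/}/$(rna_qua_input "$sample")"
    if [ -f "$bam" ]; then
        echo "found: $bam"
    else
        echo "missing: $bam" >&2
        return 1
    fi
}

# Example (not executed here): check_input "$(pwd)" Sample1 && rna_qua_outputs Sample1
```

Running the check first avoids waiting in the queue for a job that fails immediately on a misspelled sample name.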