Module-RNA_TMP



DOC_ID : T15-0002

RNA_TMP module : 

DOC_ID : M30-3000
Editor : Mira
Reviewer : Angela

Function :

  Spliced Transcripts Alignment to a Reference (STAR) is a fast RNA-seq read mapper with support for splice-junction and fusion read detection.

  STAR has been shown to have high accuracy and to outperform other aligners by more than a factor of 50 in mapping speed, but it is memory intensive. The algorithm achieves this highly efficient mapping through a two-step process :

  1. Seed searching
  2. Clustering, stitching, and scoring
  • Seed searching
    For every read that it aligns, STAR searches for the longest sequence that exactly matches one or more locations on the reference genome. These longest matching sequences are called Maximal Mappable Prefixes (MMPs).
    The different parts of the read that are mapped separately are called ‘seeds’, so the first MMP mapped to the genome is called seed1. STAR then searches again, using only the unmapped portion of the read, for the next longest sequence that exactly matches the reference genome; this next MMP is seed2.

    This sequential searching of only the unmapped portions of reads underlies the efficiency of the STAR algorithm. STAR uses an uncompressed suffix array (SA) to search for the MMPs, which allows quick searching against even the largest reference genomes. Other, slower aligners often search for the entire read sequence before splitting reads and performing iterative rounds of mapping.
  • Clustering, stitching, and scoring
    The separate seeds are stitched together to create a complete read by first clustering them based on proximity to a set of ‘anchor’ seeds, that is, seeds that are not multi-mapping. The seeds are then stitched together based on the best alignment for the read (scored on mismatches, indels, gaps, etc.).
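  The sequential seed search described above can be illustrated with a toy sketch. This is not how STAR is implemented: a naive substring scan stands in for the suffix-array lookup, and the function names, the reference string, and the read are all hypothetical.

```bash
#!/usr/bin/env bash
# Toy illustration of sequential maximal-mappable-prefix (MMP) seed search.
# Assumption: a naive substring scan replaces STAR's suffix-array lookup.

# Print the longest prefix of the query ($1) that occurs exactly in the reference ($2).
longest_prefix() {
  local prefix="$1" ref="$2"
  while [ -n "$prefix" ]; do
    case $ref in
      *"$prefix"*) printf '%s\n' "$prefix"; return 0 ;;
    esac
    prefix=${prefix%?}   # drop the last base and try a shorter prefix
  done
  return 1
}

# Split a read into successive seeds: seed1 is the MMP of the whole read,
# seed2 is the MMP of the still-unmapped remainder, and so on.
find_seeds() {
  local query="$1" ref="$2" seed
  while [ -n "$query" ]; do
    if seed=$(longest_prefix "$query" "$ref"); then
      printf '%s\n' "$seed"
      query=${query#"$seed"}   # keep only the unmapped portion
    else
      query=${query#?}         # simplification: skip a base that matches nowhere
    fi
  done
}

# Hypothetical reference: two exon-like blocks separated by a T-rich spacer.
ref="ACGTACGGTTTTTTTTCCATGGAC"
find_seeds "ACGTACGGCCATGGAC" "$ref"
```

  For the toy inputs shown, the read spans the spacer, so the search yields two seeds: ACGTACGG, then CCATGGAC. Clustering, stitching, and scoring of the seeds are not covered by this sketch.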

  Ref : https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html

Installation :

All required software is included in the GA environment.

Note :

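►Before running an analysis, use CreateProject.sh to create a project folder; see the “Project standard folder structure” document.

►Before running the module, confirm which compute partition (--partition) you belong to : general-node users are advised to use ct56 ; biomedical-node users are advised to use ngs48G (see Note 1).

►To learn how to use the module, run it with the -h option.

#Note 1 : To confirm your user type, log in to NCHC iService and go to Member Center / Project Management / My Projects. If the project name is “國家生醫數位資料與分析運算雲端服務平台III”, you are a biomedical-node user.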

Description :

Tested environment : GApp0.0.0.2
Software version : star=2.7.5a=0

Usage (Slurm) : Command in Slurm (Taiwania III)

Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample1,sampleName=Sample1,gtfName=refs/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.104.gtf,refFolder=refs/Homo_sapiens/GRCh38en104Star2.7.5a/' modules/RNA_TMP.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample1,sampleName=Sample1,gtfName=refs/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.104.gtf,refFolder=refs/Homo_sapiens/GRCh38en104Star2.7.5a/' modules/RNA_TMP.sh

Usage (Linux console) : Command in Linux console

Paired-End
bash modules/RNA_TMP.sh -p $(pwd) -t PE -i Sample1 -s Sample1 -g refs/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.104.gtf -r refs/Homo_sapiens/GRCh38en104Star2.7.5a/
Single-End
bash modules/RNA_TMP.sh -p $(pwd) -t SE -i Sample1 -s Sample1 -g refs/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.104.gtf -r refs/Homo_sapiens/GRCh38en104Star2.7.5a/

#For Slurm operation, please refer to “Basic operation of Taiwania III”
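
When several samples share the same reference, the Slurm command above can be generated per sample rather than typed by hand. The following is a minimal dry-run sketch that only prints the commands. The helper names (build_export, print_jobs), the sample names, and the PROJECT_ID and user@example.com placeholders are assumptions for illustration; only the export keys and paths come from the commands above.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print one sbatch command per sample instead of submitting it.
# Assumption: helper names, sample names, and placeholder values are illustrative.

ref_dir="refs/Homo_sapiens/GRCh38en104Star2.7.5a/"
gtf="${ref_dir}Homo_sapiens.GRCh38.104.gtf"

# Build the comma-separated --export value passed to modules/RNA_TMP.sh.
build_export() {
  local proj_dir="$1" seq_type="$2" sample="$3"
  printf 'projDir=%s/,seqType=%s,inFile=%s,sampleName=%s,gtfName=%s,refFolder=%s' \
    "$proj_dir" "$seq_type" "$sample" "$sample" "$gtf" "$ref_dir"
}

# Print the submission command for every sample name given after seqType.
print_jobs() {
  local proj_dir="$1" seq_type="$2" sample
  shift 2
  for sample in "$@"; do
    printf "sbatch -A %s --mail-user=%s --export='%s' modules/RNA_TMP.sh\n" \
      "${projectID:-PROJECT_ID}" "${email:-user@example.com}" \
      "$(build_export "$proj_dir" "$seq_type" "$sample")"
  done
}

# Run from the project folder, as the module expects.
print_jobs "$(pwd)" PE Sample1 Sample2
```

Printing first makes it easy to review each command before submitting it; here inFile and sampleName are set to the same value, as in the examples above.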

Usage :

The following explains the usage of module parameters :

Parameter / Description / Remark

RNA_TMP.sh
  Description : Module for genome mapping
  Remark : The analysis module must be stored in the [modules] folder

projDir
  Description : Path of the analysis project folder (see the project folder structure documentation)
  Remark : The script must be run from within the analysis project folder; $(pwd) returns the user's current path

seqType
  Description : Sequence type : paired-end => PE, single-end => SE
  Remark : For example :
    1. seqType=PE means the samples to be analyzed are paired-end, so both *_R1.fastq.gz & *_R2.fastq.gz are present
    2. seqType=SE means the samples to be analyzed are single-end, so only *_R1.fastq.gz is present

inFile
  Description : Name of the file to analyze
  Remark : File format : *.fastq or *.fastq.gz
    File path : processed/
    For example : inFile=Sample1 reads the following from processed/ :
    1. paired-end : Sample1_R1.fastq.gz & Sample1_R2.fastq.gz
    2. single-end : Sample1_R1.fastq.gz

sampleName
  Description : Name of the output BAM files
  Remark : File format : *.bam
    File path : processed/, QC/
    For example : sampleName=Sample1 generates Sample1Aligned.sortedByCoord.out.bam and Sample1Aligned.toTranscriptome.out.bam in processed/, and Sample1Log.final.out in QC/

refFolder
  Description : The reference genome folder to align against
  Remark : File path : refs/
    For example : refFolder=refs/Homo_sapiens/GRCh38en104Star2.7.5a/ reads data from refs/Homo_sapiens/GRCh38en104Star2.7.5a/

gtfName
  Description : The GTF annotation file to use for the alignment
  Remark : File format : *.gtf
    File path : refs/
    For example : gtfName=refs/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.104.gtf reads Homo_sapiens.GRCh38.104.gtf from refs/Homo_sapiens/GRCh38en104Star2.7.5a/
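
The way seqType and inFile determine the input files can be checked before a job is submitted. The sketch below is a hypothetical pre-flight helper, not part of RNA_TMP.sh: the function names are invented for illustration, and it only looks for the *.fastq.gz form even though *.fastq is also accepted.

```bash
#!/usr/bin/env bash
# Pre-flight sketch: list and verify the FASTQ files implied by seqType/inFile.
# Assumption: helper names are illustrative; only *.fastq.gz inputs are checked.

# Print the input paths (relative to the project folder) for one sample.
expected_inputs() {
  local seq_type="$1" in_file="$2"
  case $seq_type in
    PE) printf 'processed/%s_R1.fastq.gz\nprocessed/%s_R2.fastq.gz\n' "$in_file" "$in_file" ;;
    SE) printf 'processed/%s_R1.fastq.gz\n' "$in_file" ;;
    *)  printf 'unknown seqType: %s (expected PE or SE)\n' "$seq_type" >&2; return 1 ;;
  esac
}

# Return 0 if every expected input exists, 1 if any is missing, 2 on a bad seqType.
# Sample names are assumed to contain no whitespace.
check_inputs() {
  local proj_dir="$1" seq_type="$2" in_file="$3" files f missing=0
  files=$(expected_inputs "$seq_type" "$in_file") || return 2
  for f in $files; do
    if [ ! -f "$proj_dir/$f" ]; then
      printf 'missing: %s\n' "$proj_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Show what a paired-end run of Sample1 would read.
expected_inputs PE Sample1
```

A call such as check_inputs "$(pwd)" PE Sample1 could then be run from the project folder before the sbatch or bash command.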
