Doc_ID: A08-0001GRCm38en101Star275a
Editor: Mira
Reviewer: Anita
Description
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. It provides a user-friendly interface and supports multithreaded computation of the EM algorithm, single-end and paired-end reads, quality scores, variable-length reads, and RSPD estimation. It also reports posterior mean and 95% credibility interval estimates for expression levels.
Build RSEM references using RefSeq, Ensembl, or GENCODE annotations
RefSeq and Ensembl are two frequently used annotations. For human and mouse, GENCODE annotations are also available. This document shows how to build RSEM references using the Ensembl annotation. Note that the genome and the annotation file must come from the same annotation source.
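One way to check that a genome and an annotation file are paired is to compare their sequence names. The helper below is a minimal sketch; `check_seqnames` is a hypothetical name and not part of RSEM or STAR. It assumes Ensembl-style files, where the FASTA header's first word and GTF column 1 hold the sequence name.

```shell
# Hypothetical helper: print sequence names that occur in the GTF but not in
# the FASTA. Empty output means every annotated sequence has a genome sequence.
check_seqnames() {
    local fasta="$1" gtf="$2" tmp status
    tmp=$(mktemp -d) || return 1
    # FASTA: first word of each header line, without the leading '>'.
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u > "$tmp/fasta.names"
    # GTF: column 1 of every non-comment line.
    grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u > "$tmp/gtf.names"
    # comm -13 keeps only the lines unique to the second file (the GTF names).
    LC_ALL=C comm -13 "$tmp/fasta.names" "$tmp/gtf.names"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Run it on the decompressed files, for example `check_seqnames Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.101.chr.gtf`. Any name it prints (such as a `chr`-prefixed name from another source) suggests a mismatched pair.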
Source
Download
- URL:
- DNA sequence: ftp://ftp.ensembl.org/pub/release-101/fasta/mus_musculus/dna/ => select "Mus_musculus.GRCm38.dna.primary_assembly.fa.gz" to download
- GTF: ftp://ftp.ensembl.org/pub/release-101/gtf/mus_musculus/ => select "Mus_musculus.GRCm38.101.chr.gtf.gz" to download
File size: 1. Mus_musculus.GRCm38.dna.primary_assembly.fa.gz 769 MB (2.71 GB after decompression)
2. Mus_musculus.GRCm38.101.chr.gtf.gz 31.9 MB (1.01 GB after decompression)
Genome assembly version: GRCm38 Release 101
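Before decompressing, it can be worth confirming that each download is intact and roughly the size listed above. The following is a minimal sketch; `verify_gz` is a hypothetical helper name, and it relies only on `gzip -t` and `wc`.

```shell
# Hypothetical helper: test a .gz archive and report its compressed size.
verify_gz() {
    if gzip -t "$1" 2>/dev/null; then
        # wc -c prints the byte count; tr strips padding some platforms add.
        echo "OK $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
    else
        echo "CORRUPT $1" >&2
        return 1
    fi
}
```

For example, `verify_gz Mus_musculus.GRCm38.dna.primary_assembly.fa.gz` should report a size close to 769 MB. It has to be run before `gunzip`, which removes the `.gz` file.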
Detailed information:
The index is built with STAR version 2.7.5a and RSEM version 1.3.3. Make sure the GApp standard analysis environment is installed first.
#Activate standard analysis environment
conda activate GApp
#Move to the working directory (depends on the user's configured workspace)
cd /User_Work
#Create the folder for the FASTA and GTF files
mkdir -p refs/Mus_musculus/GRCm38en101Star2.7.5a/
#Move into the folder
cd refs/Mus_musculus/GRCm38en101Star2.7.5a/
#Download the FASTA
wget ftp://ftp.ensembl.org/pub/release-101/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
#Decompress the file
gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
#Download the GTF
wget ftp://ftp.ensembl.org/pub/release-101/gtf/mus_musculus/Mus_musculus.GRCm38.101.chr.gtf.gz
#Decompress the file
gunzip Mus_musculus.GRCm38.101.chr.gtf.gz
#Prepare the index with RSEM and STAR
rsem-prepare-reference \
--gtf /User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/Mus_musculus.GRCm38.101.chr.gtf \
--star \
-p 20 \
/User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/Mus_musculus.GRCm38.dna.primary_assembly.fa \
/User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/GRCm38.101.genome
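Once the reference is built, it is passed to `rsem-calculate-expression` by its prefix, not by a file name. The following is a minimal sketch of quantifying one paired-end sample. `quantify_sample`, the FASTQ file names and the output prefix are placeholders, not part of this document's pipeline, and the thread count simply mirrors the build command.

```shell
# Reference prefix produced by rsem-prepare-reference above.
REF=/User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/GRCm38.101.genome

# Hypothetical wrapper. $1 and $2: gzipped FASTQ mates; $3: output prefix.
quantify_sample() {
    rsem-calculate-expression \
        --star \
        --star-gzipped-read-file \
        --paired-end \
        -p 20 \
        "$1" "$2" "$REF" "$3"
}
```

A call such as `quantify_sample sample_R1.fastq.gz sample_R2.fastq.gz sample` would align with STAR and write results under the `sample` prefix. Drop `--star-gzipped-read-file` if the reads are not gzipped.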
Statistics
Summary
Assembly | GRCm38.p6 (Genome Reference Consortium Mouse Reference 38), INSDC Assembly GCA_000001635.8, Jan 2012 |
Base Pairs | 3,486,944,526 |
Golden Path Length | 2,730,871,774 |
Annotation provider | Ensembl |
Annotation method | Full genebuild |
Genebuild started | Jan 2012 |
Genebuild released | Jul 2012 |
Genebuild last updated/patched | Feb 2020 |
Database version | 101.38 |
Gencode version | GENCODE M25 |
Gene counts (Primary assembly)
Coding genes | 22,519 (incl 273 readthrough) |
Non coding genes | 16,074 |
Small non coding genes | 5,531 |
Long non coding genes | 9,981 (incl 75 readthrough) |
Misc non coding genes | 562 |
Pseudogenes | 13,656 (incl 4 readthrough) |
Gene transcripts | 142,699 |
Gene counts (Alternative sequence)
Coding genes | 351 (incl 5 readthrough) |
Non coding genes | 227 |
Small non coding genes | 110 |
Long non coding genes | 111 (incl 1 readthrough) |
Misc non coding genes | 6 |
Pseudogenes | 201 |
Gene transcripts | 2,027 |
Other
Genscan gene predictions | 57,381 |
Short Variants | 83,761,978 |
Structural variants | 791,878 |
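The gene counts above can be loosely cross-checked against the downloaded GTF by counting `gene` features per `gene_biotype`. The helper below is a minimal sketch; `count_gene_biotypes` is a hypothetical name. Ensembl groups individual biotypes into the categories shown in the tables, and the `.chr` GTF may not cover every sequence counted in them, so the totals are not expected to match line for line.

```shell
# Hypothetical helper: count "gene" features per gene_biotype in an Ensembl GTF.
count_gene_biotypes() {
    awk -F'\t' '
        $3 == "gene" {
            # Pull the value of the gene_biotype attribute out of column 9.
            if (match($9, /gene_biotype "[^"]+"/)) {
                # Skip the 14-character prefix and the closing quote.
                biotype = substr($9, RSTART + 14, RLENGTH - 15)
                count[biotype]++
            }
        }
        END { for (b in count) printf "%s\t%d\n", b, count[b] }
    ' "$1" | sort
}
```

For example, `count_gene_biotypes Mus_musculus.GRCm38.101.chr.gtf` prints one tab-separated line per biotype.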
Index and modification
Index
Index software | File list |
STAR | chrLength.txt chrName.txt chrNameLength.txt chrStart.txt exonGeTrInfo.tab exonInfo.tab geneInfo.tab Genome genomeParameters.txt SAindex sjdbInfo.txt sjdbList.fromGTF.out.tab sjdbList.out.tab transcriptInfo.tab |
rsem-prepare-reference | GRCm38.101.genome.chrlist GRCm38.101.genome.grp GRCm38.101.genome.idx.fa GRCm38.101.genome.n2g.idx.fa GRCm38.101.genome.seq GRCm38.101.genome.ti GRCm38.101.genome.transcripts.fa |
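After the build finishes, the output folder can be compared against the file list above. The helper below is a minimal sketch; `check_index_files` is a hypothetical name, and it checks only the files named in this table, not everything STAR or RSEM may write.

```shell
# Hypothetical helper. $1: index directory; $2: RSEM reference prefix.
# Prints each listed file that is absent and returns non-zero if any is missing.
check_index_files() {
    local dir="$1" prefix="$2" status=0 f
    for f in \
        chrLength.txt chrName.txt chrNameLength.txt chrStart.txt \
        exonGeTrInfo.tab exonInfo.tab geneInfo.tab Genome genomeParameters.txt \
        SAindex sjdbInfo.txt sjdbList.fromGTF.out.tab sjdbList.out.tab \
        transcriptInfo.tab \
        "$prefix.chrlist" "$prefix.grp" "$prefix.idx.fa" "$prefix.n2g.idx.fa" \
        "$prefix.seq" "$prefix.ti" "$prefix.transcripts.fa"
    do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

For this build the call would be `check_index_files /User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a GRCm38.101.genome`.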
Bundle files
Type | File list |
dbSNP |