Genome Reference:GRCm38en101Star2.7.5a


Doc_ID: A08-0001GRCm38en101Star275a

Editor: Mira

Reviewer : Anita

Description

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads, and RSPD (read start position distribution) estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels.

Build RSEM references using RefSeq, Ensembl, or GENCODE annotations
RefSeq and Ensembl are two frequently used annotation sources. For human and mouse, GENCODE annotations are also available. This page shows how to build RSEM references using the Ensembl annotation. Note that the genome FASTA must be paired with the annotation file from the same source, since, for example, chromosome naming can differ between sources.

Source

Download

File size : 1. Mus_musculus.GRCm38.dna.primary_assembly.fa.gz 769 MB (2.71 GB after decompression)

                2. Mus_musculus.GRCm38.101.chr.gtf.gz 31.9 MB (1.01 GB after decompression)

Genome assembly version: GRCm38, Release 101

Detail information:
The index is built with STAR version 2.7.5a and RSEM version 1.3.3. Make sure the GApp standard analysis environment has been installed before starting.

#Activate standard analysis environment

conda activate GApp

#Move to the working directory (depends on the workspace configured by the user)

cd /User_Work

#Create the directory that holds the FASTA and GTF files
mkdir -p refs/Mus_musculus/GRCm38en101Star2.7.5a/

#Move into the directory

cd refs/Mus_musculus/GRCm38en101Star2.7.5a/

#Download the FASTA
wget ftp://ftp.ensembl.org/pub/release-101/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

#Decompress the file
gunzip  Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

#Download the GTF
wget ftp://ftp.ensembl.org/pub/release-101/gtf/mus_musculus/Mus_musculus.GRCm38.101.chr.gtf.gz
 

#Decompress the file
gunzip Mus_musculus.GRCm38.101.chr.gtf.gz
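
Because the genome and the annotation must be paired, one quick sanity check before building the index is to confirm that every sequence name used in the GTF also appears in the FASTA headers. The following is only a minimal sketch: `missing_seqnames` is a hypothetical helper name (not part of RSEM or STAR), and it assumes both files have already been decompressed.

```shell
# Hypothetical helper (not an RSEM/STAR command): print every sequence name
# that occurs in column 1 of the GTF but has no matching FASTA header.
# Usage: missing_seqnames <genome.fa> <annotation.gtf>
missing_seqnames() {
    _fa_names=$(mktemp)
    _gtf_names=$(mktemp)
    # FASTA side: first word after '>' on each header line
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$_fa_names"
    # GTF side: column 1 of every non-comment line
    grep -v '^#' "$2" | cut -f1 | LC_ALL=C sort -u > "$_gtf_names"
    # comm -13 keeps only the lines unique to the second file (the GTF names)
    LC_ALL=C comm -13 "$_fa_names" "$_gtf_names"
    rm -f "$_fa_names" "$_gtf_names"
}

# Example call for the files downloaded above (left commented out here):
# missing_seqnames Mus_musculus.GRCm38.dna.primary_assembly.fa Mus_musculus.GRCm38.101.chr.gtf
```

If the function prints nothing, all sequence names in the GTF were found in the FASTA. Any printed name points to a genome and annotation that probably come from different sources.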

#Prepare the index with RSEM and STAR
#--gtf: annotation used to extract the transcripts; --star: also build the STAR index; -p: number of threads

rsem-prepare-reference \
--gtf /User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/Mus_musculus.GRCm38.101.chr.gtf \
--star \
-p 20 \
/User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/Mus_musculus.GRCm38.dna.primary_assembly.fa \
/User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a/GRCm38.101.genome

Statistics

Summary

Assembly: GRCm38.p6 (Genome Reference Consortium Mouse Reference 38), INSDC Assembly GCA_000001635.8, Jan 2012
Base Pairs: 3,486,944,526
Golden Path Length: 2,730,871,774
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Jan 2012
Genebuild released: Jul 2012
Genebuild last updated/patched: Feb 2020
Database version: 101.38
Gencode version: GENCODE M25

Gene counts (Primary assembly)

Coding genes: 22,519 (incl 273 readthrough)
Non coding genes: 16,074
Small non coding genes: 5,531
Long non coding genes: 9,981 (incl 75 readthrough)
Misc non coding genes: 562
Pseudogenes: 13,656 (incl 4 readthrough)
Gene transcripts: 142,699

Gene counts (Alternative sequence)

Coding genes: 351 (incl 5 readthrough)
Non coding genes: 227
Small non coding genes: 110
Long non coding genes: 111 (incl 1 readthrough)
Misc non coding genes: 6
Pseudogenes: 201
Gene transcripts: 2,027

Other

Genscan gene predictions: 57,381
Short Variants: 83,761,978
Structural variants: 791,878
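
The gene and transcript counts above can be roughly cross-checked against the downloaded GTF by counting feature lines. This is a minimal sketch: `count_gtf_features` is a hypothetical helper name, and the totals may not match the published statistics exactly, because the `.chr.gtf` file only covers the sequences it contains.

```shell
# Hypothetical helper: count GTF lines whose feature type (column 3)
# equals the given value, ignoring comment lines that start with '#'.
# Usage: count_gtf_features <annotation.gtf> <feature>
count_gtf_features() {
    awk -F '\t' -v feat="$2" '!/^#/ && $3 == feat { n++ } END { print n + 0 }' "$1"
}

# Example calls for the annotation used on this page (left commented out here):
# count_gtf_features Mus_musculus.GRCm38.101.chr.gtf gene
# count_gtf_features Mus_musculus.GRCm38.101.chr.gtf transcript
```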

Index and modification

Index

Index software: STAR
File list:
chrLength.txt
chrName.txt
chrNameLength.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab

Index software: rsem-prepare-reference
File list:
GRCm38.101.genome.chrlist
GRCm38.101.genome.grp
GRCm38.101.genome.idx.fa
GRCm38.101.genome.n2g.idx.fa
GRCm38.101.genome.seq
GRCm38.101.genome.ti
GRCm38.101.genome.transcripts.fa
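
After `rsem-prepare-reference` finishes, the output directory can be compared against the file list above. The following is a minimal sketch: `missing_index_files` is a hypothetical helper name, and it checks only the files named on this page, so any other files the tools write are not covered.

```shell
# Hypothetical helper: print each expected index file that is absent.
# Arguments: $1 = reference directory, $2 = reference name prefix
# (GRCm38.101.genome for the build on this page).
missing_index_files() {
    for _f in \
        chrLength.txt chrName.txt chrNameLength.txt chrStart.txt \
        exonGeTrInfo.tab exonInfo.tab geneInfo.tab Genome \
        genomeParameters.txt SAindex sjdbInfo.txt \
        sjdbList.fromGTF.out.tab sjdbList.out.tab transcriptInfo.tab \
        "$2.chrlist" "$2.grp" "$2.idx.fa" "$2.n2g.idx.fa" \
        "$2.seq" "$2.ti" "$2.transcripts.fa"
    do
        # Report the file name only when it does not exist in the directory
        [ -e "$1/$_f" ] || echo "$_f"
    done
}

# Example call for the build on this page (left commented out here):
# missing_index_files /User_Work/refs/Mus_musculus/GRCm38en101Star2.7.5a GRCm38.101.genome
```

An empty result means every file in the list is present. A printed name suggests the build was interrupted or was written under a different prefix.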
 
   

Bundle files

Type: dbSNP
File list:
