Genome Reference: GRCh38en104Star2.7.5a


DOC_ID : T11-0001
 

Doc_ID: A08-0001GRCh38en104Star275a
Editor: Mira
Reviewer: hsujc

Description

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads, and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels.

Build RSEM references using RefSeq, Ensembl, or GENCODE annotations
RefSeq and Ensembl are two frequently used annotations. For human and mouse, GENCODE annotations are also available. Here, we show how to build RSEM references using the Ensembl annotation. It is important to pair each genome version with its compatible GTF file.

Source

Genome assembly version: GRCh38, Ensembl release 104

Detailed information:

Use STAR version 2.7.5a and RSEM version 1.3.3 to build the index. Make sure the GApp standard analysis environment is already installed.

#Activate standard analysis environment

conda activate GApp
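Before starting the build, it may help to confirm that the activated environment exposes the tools used on this page. The sketch below is only an illustration: `require_tools` is a hypothetical helper name, and it checks presence on PATH, not the versions stated above.

```shell
# Hypothetical helper (not part of GApp): report every tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the tool names used on this page:
# require_tools STAR rsem-prepare-reference samtools gtfToGenePred perl wget
```

Versions still need to be checked by hand, for example with `STAR --version` and `samtools --version`.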

#Move to the Ref folder

cd ~/GA_bundle/Ref/

#Create the folder that will hold the FASTA and GTF files
mkdir -p Homo_sapiens/GRCh38en104Star2.7.5a/

#Move into the folder

cd Homo_sapiens/GRCh38en104Star2.7.5a/

#Download the FASTA
wget ftp://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

#Decompress the file
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

#Download the GTF
wget ftp://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz

#Decompress the file
gunzip Homo_sapiens.GRCh38.104.gtf.gz
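Because the genome and the GTF must be compatible, one quick sanity check is to confirm that every sequence name used in the GTF also appears as a FASTA header. This is a minimal sketch, assuming both files are already decompressed; `missing_seqnames` is a hypothetical helper name, and an empty result only shows that the names agree, not that the releases match.

```shell
# Hypothetical helper: print GTF sequence names that have no FASTA header.
# $1 = FASTA path, $2 = GTF path (both uncompressed).
missing_seqnames() {
    fasta_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA header ">1 dna:chromosome ..." becomes "1"
    awk '/^>/ { print substr($1, 2) }' "$1" | LC_ALL=C sort -u > "$fasta_names"
    # Skip GTF header lines (they start with "#") and keep column 1
    grep -v '^#' "$2" | cut -f1 | LC_ALL=C sort -u > "$gtf_names"
    # comm -13 keeps the lines that occur only in the second file
    LC_ALL=C comm -13 "$fasta_names" "$gtf_names"
    rm -f "$fasta_names" "$gtf_names"
}

# Example call with the files downloaded above:
# missing_seqnames Homo_sapiens.GRCh38.dna.primary_assembly.fa Homo_sapiens.GRCh38.104.gtf
```

Any name printed here (for instance a `chr`-prefixed name against an unprefixed FASTA) would suggest the two files come from different sources.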

#Prepare the index with RSEM and STAR
rsem-prepare-reference \
--gtf ~/GA_bundle/Ref/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.104.gtf \
--star \
-p 20 \
~/GA_bundle/Ref/Homo_sapiens/GRCh38en104Star2.7.5a/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
~/GA_bundle/Ref/Homo_sapiens/GRCh38en104Star2.7.5a/GRCh38.104.genome

Use gtfToGenePred and samtools version 1.10 to build the ref_flat file and the ribosomal intervals file, which provide gene positions and rRNA positions.

#Convert the GTF to a ref_flat file with gtfToGenePred

gtfToGenePred -genePredExt -geneNameAsName2 -ignoreGroupsWithoutExons Homo_sapiens.GRCh38.104.gtf /dev/stdout | awk 'BEGIN { OFS="\t" } { print $12, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }' > Homo_sapiens.GRCh38.104.gtf.refflat
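The awk step above moves the gene name (column 12 of the genePredExt output) to the front and keeps columns 1 to 10, so every row of the ref_flat file should have 11 tab-separated fields with the strand in column 4. A minimal sketch of a check on that layout follows; `refflat_bad_rows` is a hypothetical helper name, and it tests only the field count and the strand symbol.

```shell
# Hypothetical helper: count rows that do not look like refFlat records.
# A row is counted as bad when it does not have exactly 11 tab-separated
# fields or when column 4 (strand) is neither "+" nor "-".
refflat_bad_rows() {
    awk -F'\t' 'NF != 11 || ($4 != "+" && $4 != "-") { bad++ } END { print bad + 0 }' "$1"
}

# Example call with the file written above:
# refflat_bad_rows Homo_sapiens.GRCh38.104.gtf.refflat
```

A count of 0 is what one would expect if the conversion and the column reordering both ran cleanly.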

#Extract genome size information from the reference genome

#Step1
samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa

#Step2
cut -f1,2 Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai > sizes.genome

#Step3
perl -lane 'print "\@SQ\tSN:$F[0]\tLN:$F[1]\tAS:GRCh38"' sizes.genome | grep -v _ >> Homo_sapiens.GRCh38.104.gtf.rRNA.refflat

#Append the rRNA information extracted from the GTF

grep 'gene_biotype "rRNA"' Homo_sapiens.GRCh38.104.gtf | awk '$3 == "gene"' | cut -f1,4,5,7,9 | perl -lane '/gene_id "([^"]+)"/ or die "no gene_id on $."; print join "\t", (@F[0,1,2,3], $1)' | sort -k1V -k2n -k3n >> Homo_sapiens.GRCh38.104.gtf.rRNA.refflat
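The two commands above write the rRNA file in two parts: `@SQ` header lines built from sizes.genome, followed by five-column interval lines (sequence, start, end, strand, gene_id). Both use `>>`, so running them a second time appends duplicate lines. The sketch below summarizes the file so that such problems are easier to spot; `interval_list_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: count header lines, interval lines and anything else.
interval_list_summary() {
    awk -F'\t' '
        /^@SQ/  { header++; next }      # sequence dictionary line from Step3
        NF == 5 { intervals++; next }   # sequence, start, end, strand, gene_id
                { other++ }             # any other line is unexpected
        END     { printf "header=%d intervals=%d other=%d\n", header, intervals, other }
    ' "$1"
}

# Example call with the file written above:
# interval_list_summary Homo_sapiens.GRCh38.104.gtf.rRNA.refflat
```

If the header count exceeds the number of lines in sizes.genome, the file was probably appended to more than once and should be deleted and rebuilt.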

Statistics

Summary

Assembly: GRCh38.p13 (Genome Reference Consortium Human Build 38), INSDC Assembly GCA_000001405.28, Dec 2013
Base Pairs: 3,096,649,726
Golden Path Length: 3,096,649,726
Assembly provider: Genome Reference Consortium
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Jan 2014
Genebuild released: Jul 2014
Genebuild last updated/patched: Mar 2021
Database version: 104.38
Gencode version: GENCODE 38

Gene counts (Primary assembly)

Coding genes: 20,442 (incl 644 readthrough)
Non coding genes: 23,982
 Small non coding genes: 4,865
 Long non coding genes: 16,896 (incl 307 readthrough)
 Misc non coding genes: 2,221
Pseudogenes: 15,228 (incl 6 readthrough)
Gene transcripts: 237,081

Gene counts (Alternative sequence)

Coding genes: 3,053 (incl 26 readthrough)
Non coding genes: 1,555
 Small non coding genes: 297
 Long non coding genes: 1,071 (incl 25 readthrough)
 Misc non coding genes: 187
Pseudogenes: 1,799
Gene transcripts: 21,638

Other

Genscan gene predictions: 51,756
Short Variants: 714,562,852
Structural variants: 6,768,792

Index and modification

Index

Index software: STAR, rsem-prepare-reference
File list:
chrLength.txt
chrName.txt
chrNameLength.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
GRCh38.104.genome.chrlist
GRCh38.104.genome.grp
GRCh38.104.genome.idx.fa
GRCh38.104.genome.n2g.idx.fa
GRCh38.104.genome.seq
GRCh38.104.genome.ti
GRCh38.104.genome.transcripts.fa
Log.out
SA
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab

Index software: gtfToGenePred, samtools, perl
File list:
Homo_sapiens.GRCh38.104.gtf.refflat
Homo_sapiens.GRCh38.104.gtf.rRNA.refflat
Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai
sizes.genome
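
After the build, the file list above can serve as a checklist. The following is a minimal sketch that tests a subset of the listed STAR and RSEM files for existence and non-zero size; `check_index_files` is a hypothetical helper name, and the choice of which files to test is an assumption, not an official completeness criterion.

```shell
# Hypothetical helper: verify that a subset of the index files from the
# list above exists and is non-empty.
# $1 = index directory, $2 = RSEM reference prefix (e.g. GRCh38.104.genome).
check_index_files() {
    dir=$1
    prefix=$2
    status=0
    for f in Genome SA SAindex chrName.txt genomeParameters.txt \
             "$prefix.grp" "$prefix.ti" "$prefix.seq" "$prefix.transcripts.fa"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call with the paths used on this page:
# check_index_files ~/GA_bundle/Ref/Homo_sapiens/GRCh38en104Star2.7.5a GRCh38.104.genome
```

A non-zero return status points to an interrupted or incomplete build; Log.out in the same folder is the first place to look in that case.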

Bundle files

Type: NA
File list: NA
