Module-AN_SnpEff



DOC_ID : T15-0002
 

AN_SnpEff module : 

DOC_ID : M53-3000
Editor : Anita
Reviewer : Angela

Function :

  SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes).

  SnpEff Summary :

  A typical SnpEff use case would be:

  • Input: The inputs are predicted variants (SNPs, insertions, deletions and MNPs). The input file is usually obtained as a result of a sequencing experiment, and it is usually in variant call format (VCF).
  • Output: SnpEff analyzes the input variants. It annotates the variants and calculates the effects they produce on known genes (e.g. amino acid changes). A list of effects and annotations that SnpEff can calculate can be found here.
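Since the input is expected to be a VCF file, a quick pre-check can catch a wrong file before annotation starts. The sketch below is only an illustration: `looks_like_vcf` is a hypothetical helper (not part of SnpEff or of this module), and it assumes an uncompressed file whose first line is the `##fileformat=VCF...` header that VCF files begin with.

```shell
#!/usr/bin/env bash
# looks_like_vcf is a hypothetical helper, not part of SnpEff.
# It succeeds (exit status 0) when the first line of an uncompressed file
# starts with the "##fileformat=VCF" header, and fails otherwise
# (including when the file is missing or empty).
looks_like_vcf() {
  head -n 1 "$1" 2>/dev/null | grep -q '^##fileformat=VCF'
}

# Intended use (the path is only an example):
#   if looks_like_vcf processed/Sample01_combined.vcf; then
#     echo "input looks like VCF"
#   else
#     echo "input does not look like VCF" >&2
#   fi
```

A compressed (`.vcf.gz`) file would fail this check, so it would need to be decompressed first or the check adapted.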

SnpEff Features :

Feature : Comment

Local install : SnpEff can be installed on your local computer or servers. Local installations are preferred for processing genomic data. As opposed to remote web-based services, running a program locally has many advantages:
  • There is no need to upload huge genomic datasets.
  • Processing doesn’t depend on the availability or processing capacity of remote servers.
  • Service continuity: no need to worry whether a remote service will be maintained in the future.
  • Security and confidentiality issues of uploading data to third-party servers are not a problem.
  • It avoids the legal problems of processing clinical data on “outside” servers.
Multi platform : SnpEff is written in Java. It runs on Unix / Linux, OS X and Windows.
Simple installation : Installation is as simple as downloading a ZIP file and double-clicking on it.
Genomes : The human genome and all model organisms are supported. Over 2,500 genomes are supported, including most mammalian, plant, bacterial and fungal genomes with published genomic data.
Speed : SnpEff can annotate up to 1,000,000 variants per minute.
GATK & Galaxy integration : SnpEff can be easily integrated with GATK and Galaxy pipelines.
Input and Output formats : SnpEff accepts input files in the following formats:
  • VCF format, which is the de facto standard for sequencing variants.
  • BED format, to annotate enrichment experiments (e.g. ChIP-Seq peaks) or other genomic data.
Variants supported : SnpEff can annotate SNPs, MNPs, insertions and deletions. Support for mixed variants and structural variants is available (although sometimes limited).
Effects supported : Many effects are calculated, such as SYNONYMOUS_CODING, NON_SYNONYMOUS_CODING, FRAME_SHIFT and STOP_GAINED, to name a few.
Public databases : SnpEff can annotate using publicly available data from well-known databases, for instance:
  • ENCODE datasets are supported by SnpEff (by means of BigWig files provided by the ENCODE project).
  • Epigenome Roadmap provides datasets that can be used with SnpEff.
  • TFBS: transcription factor binding site predictions can be annotated. The motif data used in these annotations is generated by the Jaspar and ENSEMBL projects.
  • The NextProt database can be used to annotate protein domains as well as important functional sites in a protein (e.g. phosphorylation sites).
Common variants (dbSnp) : Annotating “common” variants from dbSnp and 1,000 Genomes can be easily done (see SnpSift annotate).

Databases :

In order to produce the annotations, SnpEff requires a database. Currently, there are pre-built databases for over 20,000 reference genomes. To see which databases are supported, run the databases command :

$ java -jar snpEff.jar databases | less
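Because the listing is long, it can help to filter it for the genome of interest instead of paging through it. The following is a minimal sketch: `find_snpeff_db` is a hypothetical helper name, and the only assumption about the listing is that each database appears on its own line.

```shell
#!/usr/bin/env bash
# find_snpeff_db is a hypothetical helper, not part of SnpEff.
# It reads a database listing on stdin and keeps only the lines that
# mention the given keyword (case-insensitive).
find_snpeff_db() {
  grep -i -- "$1"
}

# Intended use (requires Java and snpEff.jar in the current directory):
#   java -jar snpEff.jar databases | find_snpeff_db hg38
```

The filter drops the header row of the listing along with every non-matching line, so only candidate database names remain.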

Ref : https://pcingola.github.io/SnpEff/se_introduction/

Installation :

All software is included in the GA environment.

Note :

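► Before running the analysis, create a project folder with CreateProject.2.0.sh; see the Project standard folder structure document.

► Before running the module, confirm which compute partition (--partition) you belong to : users of the general nodes are advised to use ct56 ; users of the biomedical nodes are advised to use ngs24G (see Note 1).

► To learn how to use the module, run it with the -h option.

# Note 1 : To confirm your user identity, log in to NCHC iService and select Member Center / Project Management / My Projects. If the project name is ”國家生醫數位資料與分析運算雲端服務平台III”, you are a biomedical-node user.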

Description :

Tested environment : GApp0.0.0.2
Software version : snpEff=/opt/ohpc/Taiwania3/pkg/biology/SnpEff/snpEff_v5.0e/snpEff.jar (SnpEff 5.0e 2021-03-09)
Usage (Slurm) : Command in Slurm (Taiwania III)
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,refGenome=hg38,sampleName=Sample01_combined.vcf,output=Sample01' modules/AN_SnpEff.sh
Usage (Linux console) : Command in Linux console
bash modules/AN_SnpEff.sh -p $(pwd) -s Sample01_combined.vcf -o Sample01 -r hg38
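To see how the Slurm command line is put together, the sketch below composes it as a string without submitting anything. `build_sbatch_cmd` is a hypothetical helper, and passing `--partition` on the command line is an assumption based on the note about compute partitions; the module script may already set the partition itself. The remaining flags and the variable names `projDir`, `refGenome`, `sampleName` and `output` are taken from the usage line above.

```shell
#!/usr/bin/env bash
# build_sbatch_cmd is a hypothetical dry-run helper: it prints the sbatch
# command line instead of running it, so the values can be reviewed first.
# Assumption: the partition is passed explicitly with --partition.
build_sbatch_cmd() {
  local projectID="$1"
  local email="$2"
  local partition="$3"
  local projDir="${4%/}/"   # normalise to exactly one trailing slash
  local refGenome="$5"
  local sampleName="$6"
  local output="$7"
  printf "sbatch -A %s --partition=%s --mail-user=%s --export='projDir=%s,refGenome=%s,sampleName=%s,output=%s' modules/AN_SnpEff.sh\n" \
    "$projectID" "$partition" "$email" "$projDir" "$refGenome" "$sampleName" "$output"
}

# Example with placeholder values (MY_PROJECT and the e-mail address are made up).
build_sbatch_cmd MY_PROJECT user@example.org ct56 "$(pwd)" hg38 Sample01_combined.vcf Sample01
```

Printing the command first makes it easy to check the partition and file names before pasting the line into the shell.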


Usage :

The following explains the usage of module parameters :

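Parameter / Description / Remark

AN_SnpEff.sh
  Description : Module for annotating gene markers and phenotypes
  Remark : The analysis module must be stored in the [modules] folder.
projDir
  Description : Path to the analysis project folder (see the Project standard folder structure document)
  Remark : The script must be run from the analysis project folder; $(pwd) returns the user's current path.
sampleName
  Description : Name of the input file
  Remark : Data format : *.vcf. Data path : processed/. Example : sampleName=Sample01_combined.vcf reads the file Sample01_combined.vcf from the processed/ folder.
output
  Description : Name of the output files
  Remark : Data format : *.ann.vcf / *.txt / *.html. Data path : report/, QC/. Example : output=Sample01 generates Sample01.ann.vcf in the report/ folder, and Sample01_summary.genes.txt and Sample01_summary.html in the QC/ folder.
refGenome
  Description : Reference genome database used for the analysis
  Remark : Currently hg38, hg19 and mm10 are supported.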
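The mapping from parameters to files can be sketched as a small shell helper. This is an illustration only: `expected_paths` and `is_supported_genome` are hypothetical names that are not part of AN_SnpEff.sh, and the folder layout is taken directly from the parameter remarks above.

```shell
#!/usr/bin/env bash
# expected_paths is a hypothetical helper: it prints the input file the
# module would read and the output files it would write, following the
# folder layout given in the parameter remarks.
expected_paths() {
  local projDir="${1%/}"   # drop a trailing slash, if any
  local sampleName="$2"
  local output="$3"
  printf 'input=%s\n'   "$projDir/processed/$sampleName"
  printf 'ann_vcf=%s\n' "$projDir/report/$output.ann.vcf"
  printf 'genes=%s\n'   "$projDir/QC/${output}_summary.genes.txt"
  printf 'html=%s\n'    "$projDir/QC/${output}_summary.html"
}

# is_supported_genome is a hypothetical helper: it succeeds only for the
# reference genomes listed as supported.
is_supported_genome() {
  case "$1" in
    hg38|hg19|mm10) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with a made-up project folder.
expected_paths /work/demo Sample01_combined.vcf Sample01
```

Checking these paths before submitting a job is a cheap way to catch a misspelled sample name or an unsupported reference genome.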
