Module-GATK_MU2V



DOC_ID : T15-0002
 

GATK_MU2V module : 

DOC_ID : M09-3000
Editor : Anita
Reviewer : Angela

Function :

This workflow requires a pair of BAM files (one tumor sample and one normal sample). The BAM files should follow the GATK Best Practices for data pre-processing, and the SM parameter should be set to the sample name. The GATK_GMP module is suggested for generating correctly formatted BAM files. The module executes 4 steps for somatic variant calling:
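
Mutect2 matches the normal sample by the SM read-group tag, so it can be worth checking the tag before running the module. The following is a minimal sketch and not part of GATK_MU2V: the function name `sm_tags` is hypothetical, and it only parses SAM header text supplied on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK_MU2V): print the unique SM values
# found in the @RG lines of a SAM/BAM header read from standard input.
sm_tags() {
    awk -F'\t' '
        /^@RG/ {
            # Scan the tab-separated fields of each read-group line for SM:<name>
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)
        }' | sort -u
}
```

Assuming samtools is available in the environment, `samtools view -H processed/Sample01_bqsr.bam | sm_tags` would be expected to print a single line, `Sample01`, when the BAM header was prepared as described.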

  1. Call candidate variants : When Mutect2 encounters a candidate region of somatic variation, it reassembles the reads in that region and generates artificial haplotypes. As in the HaplotypeCaller algorithm, Mutect2 aligns each read to each haplotype via the Pair-HMM algorithm to obtain a matrix of likelihoods. It then applies a Bayesian somatic likelihoods model to obtain the log odds for alleles to be somatic variants versus sequencing errors.
    • _bqsr.bam ⇒ _mutect2-unfiltered.vcf.gz + _f1r2.tar.gz
  2. Learn Orientation Bias Artifacts : This tool uses the optional F1R2 counts output of Mutect2 to learn the parameters of a model for orientation bias. For each trinucleotide context, it finds prior probabilities of single-stranded substitution errors that occur before sequencing. This is extremely important for FFPE tumor samples.
    • _f1r2.tar.gz ⇒ _tumor-artifact-prior.tar.gz
  3. Calculate Contamination : This step emits an estimate of the fraction of reads due to cross-sample contamination for each tumor sample and an estimate of the allelic copy number segmentation of each tumor sample.
    • _bqsr.bam ⇒ normal-pileups.table
    • _bqsr.bam ⇒ tumor-pileups.table
    • normal-pileups.table + tumor-pileups.table ⇒ _tumor-normal-contamination.table + _segments.table
  4. Filter Variants : Mutect2’s somatic likelihoods model assumes that read errors are independent, so that, for example, four reads each with an error probability of 1/1000 yield odds of roughly 1000^4 (a log10 odds of about 12) in favor of being a real variant versus a sequencing error. FilterMutectCalls accounts for correlated errors, that is, the possibility that all variant reads at a site were due to some common source of error. It accomplishes this through several hard filters to detect alignment artifacts and through probabilistic models for strand and orientation bias artifacts, polymerase slippage artifacts, germline variants, and contamination. Additionally, it learns a Bayesian model for the overall SNV and indel mutation rate and allele fraction spectrum of the tumor to refine the log odds emitted by Mutect2.
    • _mutect2-unfiltered.vcf.gz + _tumor-normal-contamination.table + _segments.table ⇒ _mutect2_VF.vcf.gz
       
    Ref : https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-
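
The four steps above can be sketched as a sequence of GATK4 commands. This is an illustration under stated assumptions, not the module's actual script: the `REF` and `COMMON_VCF` variables, the use of OutName as the prefix for intermediate files, the `--ob-priors` input to the filtering step, and the `DRY_RUN` switch are all assumptions introduced here. By default the sketch only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the four steps as GATK4 commands (assumed, not taken from GATK_MU2V.sh).
# With DRY_RUN=1 (the default) each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: mutect2_steps TestName ControlName OutName
# The body runs in a subshell, so its variables do not leak to the caller.
mutect2_steps() (
    tumor=$1; normal=$2; out=$3
    ref=${REF:-ref.fasta}                          # assumed reference FASTA
    common=${COMMON_VCF:-common_variants.vcf.gz}   # assumed germline resource
    d=processed

    # 1. Call candidate variants and collect F1R2 counts
    run gatk Mutect2 -R "$ref" \
        -I "$d/${tumor}_bqsr.bam" -I "$d/${normal}_bqsr.bam" \
        -normal "$normal" \
        --f1r2-tar-gz "$d/${out}_f1r2.tar.gz" \
        -O "$d/${out}_mutect2-unfiltered.vcf.gz"

    # 2. Learn the orientation bias model from the F1R2 counts
    run gatk LearnReadOrientationModel \
        -I "$d/${out}_f1r2.tar.gz" \
        -O "$d/${out}_tumor-artifact-prior.tar.gz"

    # 3. Summarize pileups for both samples, then estimate contamination
    run gatk GetPileupSummaries -I "$d/${normal}_bqsr.bam" \
        -V "$common" -L "$common" -O "$d/${out}_normal-pileups.table"
    run gatk GetPileupSummaries -I "$d/${tumor}_bqsr.bam" \
        -V "$common" -L "$common" -O "$d/${out}_tumor-pileups.table"
    run gatk CalculateContamination \
        -I "$d/${out}_tumor-pileups.table" \
        -matched "$d/${out}_normal-pileups.table" \
        -O "$d/${out}_tumor-normal-contamination.table" \
        --tumor-segmentation "$d/${out}_segments.table"

    # 4. Filter the raw calls
    run gatk FilterMutectCalls -R "$ref" \
        -V "$d/${out}_mutect2-unfiltered.vcf.gz" \
        --contamination-table "$d/${out}_tumor-normal-contamination.table" \
        --tumor-segmentation "$d/${out}_segments.table" \
        --ob-priors "$d/${out}_tumor-artifact-prior.tar.gz" \
        -O "$d/${out}_mutect2_VF.vcf.gz"
)
```

Calling `mutect2_steps Sample02 Sample01 Case01` prints six `gatk` command lines; setting `DRY_RUN=0` would execute them, provided `gatk` and the assumed input files exist.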

Installation :

All required software is included in the GA environment.

Note :

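►Before running the analysis, use CreateProject.sh to create a project folder; see the "Project standard folder structure" document.

►Before running the module, confirm which compute partition (--partition) applies to you: general-node users are advised to use ct56; biomedical-node users are advised to use ngs48G (see Note 1).

►To learn how to use the module, run it with the -h option.

#Note 1 : To check your user type, log in to NCHC iService and select Member Center / Project Management / My Projects. If the project name is "國家生醫數位資料與分析運算雲端服務平台III" (National Biomedical Digital Data and Analysis Computing Cloud Service Platform III), you are a biomedical-node user.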

Description :

Tested environment : GApp0.0.0.2
Software version : gatk4=4.1.8.1
Usage (Slurm) : Command in Slurm (Taiwania III)
    sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,refGenome=hg38,TestName=Sample02,ControlName=Sample01,OutName=Case01' modules/GATK_MU2V.sh
Usage (Linux console) : Command in Linux console
    bash modules/GATK_MU2V.sh -p $(pwd) -r hg38 -t Sample02 -c Sample01 -o Case01
# For Slurm operation, please refer to "Basic operation of Taiwania III".
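
The `--export` value in the Slurm command is easy to mistype, so one option is to assemble it with a small helper. This is a sketch only: `build_export` is a hypothetical function, and passing the partition with `-p` on the command line is an assumption that applies only if the partition is not already set inside the module script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the --export value used by the Slurm example.
# Arguments: project dir, refGenome, TestName, ControlName, OutName
build_export() {
    # ${1%/} drops one trailing slash so projDir always ends with exactly one "/"
    printf 'projDir=%s/,refGenome=%s,TestName=%s,ControlName=%s,OutName=%s' \
        "${1%/}" "$2" "$3" "$4" "$5"
}

# Example submission (assumed; replace ngs48G with ct56 for general-node users):
#   sbatch -A "$projectID" -p ngs48G --mail-user="$email" \
#       --export="$(build_export "$(pwd)" hg38 Sample02 Sample01 Case01)" \
#       modules/GATK_MU2V.sh
```

Quoting `$(pwd)` inside the helper call also keeps the command intact when the project path contains spaces.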

Usage :

The following explains the usage of module parameters :

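Each parameter is listed with its description and remark :

GATK_MU2V.sh
    • Description : The module name of GATK somatic variant calling
    • Remark : The analysis module must be stored in the [modules] folder
projDir
    • Description : Path to the analysis project folder (see the project folder structure description)
    • Remark : The script must be run from the analysis project folder; $(pwd) returns the user's current path
refGenome
    • Description : The reference genome database used for the analysis
    • Remark : The GATK-hg38, GATK-b37, and GATK-hg19 genome databases are currently supported
ControlName
    • Description : Name of the normal sample file to analyze
    • Remark : Data format : *_bqsr.bam ; data path : processed/ ; for example, ControlName=Sample01 reads Sample01_bqsr.bam from processed/ as the control group
TestName
    • Description : Name of the file to be compared against the control
    • Remark : Data format : *_bqsr.bam ; data path : processed/ ; for example, TestName=Sample02 reads Sample02_bqsr.bam from processed/ as the test group
OutName
    • Description : Name of the output files
    • Remark : Data format : *_mutect2_VF.vcf.gz ; data path : processed/ ; for example, OutName=Case01 generates Case01_mutect2_VF.vcf.gz and Case01_segments.table in processed/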
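
Since the module locates its inputs purely by the naming rules above, a quick check that both BAM files exist can save a failed job. The following is a minimal sketch; `check_inputs` is a hypothetical helper and not part of the module.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that processed/<name>_bqsr.bam exists
# for both the test (tumor) and control (normal) samples.
# Arguments: project dir, TestName, ControlName
check_inputs() {
    ci_status=0
    for ci_name in "$2" "$3"; do
        ci_bam="$1/processed/${ci_name}_bqsr.bam"
        if [ ! -f "$ci_bam" ]; then
            echo "missing: $ci_bam" >&2
            ci_status=1
        fi
    done
    # Non-zero when at least one expected BAM file is absent
    return "$ci_status"
}
```

For example, `check_inputs "$(pwd)" Sample02 Sample01 && bash modules/GATK_MU2V.sh -p $(pwd) -r hg38 -t Sample02 -c Sample01 -o Case01` would start the module only when both inputs are present.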
