High-throughput Genome and Big Data Analysis Core Facility

DOC_ID : T15-0002
QC_Fastp module : DOC_ID : M27-3000
Editor : Anita/Mira
Reviewer : Angela

Function :
1. The fastp software analyzes quality curves, base content, k-mers, Q20 / Q30, GC ratio, duplication, adapter content and more, so that quality can be compared before and after filtering. It filters out bad reads (for example, reads whose quality is too low), removes the poor-quality portions, and runs faster than Trimmomatic. It lets you choose how many bp to trim from the front and the tail of each read and cuts adapters. It corrects base quality in the overlapping region of paired reads. It preprocesses data carrying unique molecular identifiers (UMI), whether the UMI sits in the insert or in the index. It produces reports in JSON and HTML format.
2. The Fastp module can take a random subset of reads for analysis, remove poor-quality portions and adapters, and handle both Paired-End and Single-End data.
Ref: https://github.com/OpenGene/fastp

Installation : All software is included in the GA environment.

Note :
► Before running an analysis, use CreateProject.sh to create a project folder; see the "Project standard folder structure" document.
► Before running the module, confirm which compute partition (--partition) you belong to: users of the general nodes are advised to use ct56; users of the biomedical nodes are advised to use ngs7G (Note 1).
► To learn how to use the module, run it with the -h option.
# Note 1 : To confirm your user status, log in to the NCHC iService, then select Member Center / Project Management / My Projects. If the project name is "國家生醫數位資料與分析運算雲端服務平台III" (National Biomedical Digital Data and Analysis Computing Cloud Service Platform III), you are a biomedical-node user.

Description :
Tested environment : GApp0.0.0.2
Software version : fastp=0.20.1

Usage (Slurm)
Command in Slurm (Taiwania III)

Rapid Quality Analysis (partial reads)
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample01,sampleName=Sample01,reads_to_process=1000000' modules/QC_Fastp.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample01,sampleName=Sample01,reads_to_process=1000000' modules/QC_Fastp.sh

Analyze quality and read clean-up
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample01,sampleName=Sample01,out=cleanup' modules/QC_Fastp.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample01,sampleName=Sample01,out=cleanup' modules/QC_Fastp.sh

Analyze quality and trimming read
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample01,sampleName=Sample01,adapter=path/XXXX,trim_front1=number,trim_front2=number,trim_tail1=number,trim_tail2=number,out=trim' modules/QC_Fastp.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample01,sampleName=Sample01,adapter=path/XXXX,trim_front1=number,trim_tail1=number,out=trim' modules/QC_Fastp.sh

Usage (Linux console)
Command in Linux console

Rapid Quality Analysis (partial reads)
Paired-End
bash modules/QC_Fastp.sh -p $(pwd) -t PE -i Sample01 -s Sample01 -R 1000000
Single-End
bash modules/QC_Fastp.sh -p $(pwd) -t SE -i Sample01 -s Sample01 -R 1000000

Analyze […]
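The Slurm commands above pass every module parameter through one comma-separated `--export` string, which is easy to mistype when several samples are queued. The following is a minimal dry-run sketch, not part of the module itself: it assembles the paired-end rapid-QC string from the variable names shown above and prints one `sbatch` command per sample. The sample list, the `build_export` helper, and the fallback values for `projectID` and `email` are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the sbatch commands instead of submitting them.
# Assumption: sample names and fallback account/email values are placeholders.
samples=(Sample01 Sample02)
projectID="${projectID:-PROJECT_ID}"
email="${email:-user@example.com}"
projDir="$(pwd)/"

# Hypothetical helper: build the comma-separated --export value
# for a paired-end rapid quality analysis (partial reads).
build_export() {
    local sample="$1" reads="$2"
    printf 'projDir=%s,seqType=PE,inFile=%s,sampleName=%s,reads_to_process=%s' \
        "$projDir" "$sample" "$sample" "$reads"
}

for s in "${samples[@]}"; do
    # Remove the leading "echo" to submit the job for real.
    echo sbatch -A "$projectID" --mail-user="$email" \
        --export="$(build_export "$s" 1000000)" modules/QC_Fastp.sh
done
```

Printing the commands first makes it possible to check each `--export` string before any job reaches the queue. Because the values are comma-separated, this sketch assumes the project path itself contains no commas.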

Module-QC_Fastp

This entry was posted on January 17, 2022 by angela

DOC_ID : T15-0002
GATK_GMP module : DOC_ID : M05-3000
Editor : Anita
Reviewer : Angela

Function : Maps raw reads to the reference genome and creates a BAM file for small-indel variant calling or structural variant analysis. The module removes duplicates and adapters to reduce biases from library preparation. Base quality scores are also recalibrated according to GATK's algorithm.

Map to reference genome and sorting : The first step is performed per read group and consists of mapping each individual read pair to the reference genome, a synthetic single-stranded representation of a common genome sequence that is intended to provide a common coordinate framework for all genomic analyses. Convert paired raw read file( […]

A total solution for your genome study

Module-QC_Fastp

Module-GATK_GMP

Module-QC_dnaBamQC

Module-GATK_VC

Module-GATK_MU2V

Module-GATK_GVCF

Module-AN_SnpEff

Module-AN_SnpSift