Daily Archives: Monday January 17th, 2022


DOC_ID : T15-0002
QC_Fastp module
DOC_ID : M27-3000
Editor : Anita/Mira
Reviewer : Angela

Function :
1. Fastp software functions
- Analyzes quality curves, base content, KMER, Q20 / Q30, GC ratio, duplication, adapter content, and more, to compare quality before and after filtering.
- Filters out bad reads (for example, quality too low) and removes poor-quality portions, and runs faster than Trimmomatic.
- Lets you choose how many bp to trim from the front and the tail, and cuts adapters.
- Performs quality correction in the overlapping regions of paired reads.
- Preprocesses data carrying unique molecular identifiers (UMI), whether the UMI is in the insert or in the index.
- Produces JSON and HTML report files.
2. Fastp module functions
- Can take random fragments for analysis
- Removes poor-quality portions and adapters
- Can analyze both Paired-End and Single-End data

Ref: https://github.com/OpenGene/fastp

Installation : All software is included in the GA environment.

Note :
► Before running the analysis, use CreateProject.sh to create a project folder; see the "Project standard folder structure" document.
► Confirm which compute partition (--partition) you belong to before running the module: general-node users are advised to use ct56; biomedical-node users are advised to use ngs7G (see Note 1).
► To learn how to use the module, run it with the -h option.
#Note 1 : To confirm your user identity, log in to NCHC iService and choose Member Center / Project Management / My Projects. If the project name is "國家生醫數位資料與分析運算雲端服務平台III" (National Biomedical Digital Data and Analysis Computing Cloud Service Platform III), you are a biomedical-node user.

Description :
Tested environment : GApp0.0.0.2
Software version : fastp=0.20.1

Usage (Slurm)
Command in Slurm (Taiwania III)

Rapid Quality Analysis (partial reads)
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample01,sampleName=Sample01,reads_to_process=1000000' modules/QC_Fastp.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample01,sampleName=Sample01,reads_to_process=1000000' modules/QC_Fastp.sh

Analyze quality and clean up reads
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample01,sampleName=Sample01,out=cleanup' modules/QC_Fastp.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample01,sampleName=Sample01,out=cleanup' modules/QC_Fastp.sh

Analyze quality and trim reads
Paired-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=PE,inFile=Sample01,sampleName=Sample01,adapter=path/XXXX,trim_front1=number,trim_front2=number,trim_tail1=number,trim_tail2=number,out=trim' modules/QC_Fastp.sh
Single-End
sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,seqType=SE,inFile=Sample01,sampleName=Sample01,adapter=path/XXXX,trim_front1=number,trim_tail1=number,out=trim' modules/QC_Fastp.sh

Usage (Linux console)
Command in Linux console

Rapid Quality Analysis (partial reads)
Paired-End
bash modules/QC_Fastp.sh -p $(pwd) -t PE -i Sample01 -s Sample01 -R 1000000
Single-End
bash modules/QC_Fastp.sh -p $(pwd) -t SE -i Sample01 -s Sample01 -R 1000000

Analyze […]
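The Slurm commands in this excerpt all pass their settings through a single --export string. As a minimal sketch, the hypothetical helper below (build_export is not part of QC_Fastp.sh) assembles that string from a project directory and a list of key=value pairs, then prints the sbatch command for review rather than submitting it. PROJECT_ID and EMAIL are placeholder fallbacks used only when $projectID and $email are unset.

```shell
# Hypothetical helper, not part of QC_Fastp.sh: join key=value pairs into the
# comma-separated value that the documented commands pass to --export.
build_export() {
    proj_dir="$1"
    shift
    export_str="projDir=${proj_dir}/"
    for kv in "$@"; do
        export_str="${export_str},${kv}"
    done
    printf '%s\n' "$export_str"
}

# Dry run: print the command instead of submitting the job.
export_value="$(build_export "$(pwd)" seqType=PE inFile=Sample01 sampleName=Sample01 reads_to_process=1000000)"
echo "sbatch -A ${projectID:-PROJECT_ID} --mail-user=${email:-EMAIL} --export='${export_value}' modules/QC_Fastp.sh"
```

Printing the command first makes it easy to check the key names against the module's -h output before a job is queued.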

Module-QC_Fastp


DOC_ID : T15-0002
QIIME2_DPP_PE module
DOC_ID : M19-3000
Editor : Anita
Reviewer : Angela

Function :
QIIME2_DPP performs data import, denoising, dereplication, and chimera filtering of paired-end sequences :

Data import :
Import the data (fastq to qza format) according to the manifest file.
Use the qiime demux summarize command to split sequences and measure the quality and depth of each sample. The visualized output files (qza to qzv format) are saved by specifying the output path with the --o-visualization parameter.

Sequence clean-up :
Denoise, dereplicate, and filter chimeras with DADA2, producing 3 outputs :
paired-end-demux.qza ⇒ table_dada2.qza ⇒ table.qzv
paired-end-demux.qza ⇒ rep-seqs_dada2.qza ⇒ rep-seqs_dada2.qzv
paired-end-demux.qza ⇒ stats_dada2.qza ⇒ stats_dada2.qzv

Ref : https://docs.qiime2.org/2021.8/tutorials/overview/

Installation : All software is included in the GA environment.

Note :
► Before running the analysis, use CreateProject.sh to create a project folder; see the Project standard […]
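The import step in this excerpt depends on a manifest file. The sketch below shows one way such a file could be generated for paired-end data, using the tab-separated column names of the QIIME 2 V2 paired-end manifest format. The _R1.fastq.gz / _R2.fastq.gz naming and the make_pe_manifest function are assumptions for illustration; the module may expect a different layout.

```shell
# Sketch: write a QIIME 2 paired-end manifest (tab-separated) for every
# <sample>_R1.fastq.gz / <sample>_R2.fastq.gz pair in a directory.
# The _R1/_R2 file naming is an assumption, not something the module documents.
make_pe_manifest() {
    fastq_dir="$1"
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    for fwd in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue            # no matching files: header only
        rev="${fwd%_R1.fastq.gz}_R2.fastq.gz"
        [ -e "$rev" ] || continue            # skip samples without a mate file
        sample="$(basename "$fwd" _R1.fastq.gz)"
        printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
    done
}

# Demo on empty placeholder files in a temporary directory.
demo_dir="$(mktemp -d)"
touch "$demo_dir/Sample01_R1.fastq.gz" "$demo_dir/Sample01_R2.fastq.gz"
make_pe_manifest "$demo_dir" > "$demo_dir/manifest.tsv"
cat "$demo_dir/manifest.tsv"
```

A manifest like this is the kind of input that qiime tools import accepts with --input-format PairedEndFastqManifestPhred33V2; whether the module uses that exact format is not stated in the excerpt.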

Module-QIIME2_DPP_PE


DOC_ID : T15-0002
GATK_GMP module
DOC_ID : M05-3000
Editor : Anita
Reviewer : Angela

Function :
Map raw reads to the reference genome and create a BAM file for small indel variant calling or structural variant analysis. This module removes duplicates and adapters to reduce biases from library preparation. The base quality scores are also recalibrated according to GATK's algorithm :

Map to reference genome and sorting :
The first step is performed per read group and consists of mapping each individual read pair to the reference genome, which is a synthetic single-stranded representation of common genome sequence that is intended to provide a common coordinate framework for all genomic analysis.
Convert paired raw read file( […]

Module-GATK_GMP


DOC_ID : T15-0002
QIIME2_DPP_SE module
DOC_ID : M55-3000
Editor : Anita
Reviewer : Angela

Function :
QIIME2_DPP performs data import, denoising, dereplication, and chimera filtering of single-end sequences :

Data import :
Import the data (fastq to qza format) according to the manifest file.
Use the qiime demux summarize command to split sequences and measure the quality and depth of each sample. The visualized output files (qza to qzv format) are saved by specifying the output path with the --o-visualization parameter.

Sequence clean-up :
Denoise, dereplicate, and filter chimeras with DADA2, producing 3 outputs :
single-end-demux.qza ⇒ table_dada2.qza ⇒ table.qzv
single-end-demux.qza ⇒ rep-seqs_dada2.qza ⇒ rep-seqs_dada2.qzv
single-end-demux.qza ⇒ stats_dada2.qza ⇒ stats_dada2.qzv

Ref : https://docs.qiime2.org/2021.8/tutorials/overview/

Installation : All software is included in the GA environment.

Note :
► Before running the analysis, use CreateProject.sh to create a project folder; see the Project standard folder […]
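Since the clean-up step is described as producing three DADA2 artifacts, a quick check after a run can confirm that all of them exist. The sketch below is a hypothetical helper (missing_dada2_outputs is not part of the module) and assumes the artifacts sit directly in one results directory, which the excerpt does not confirm.

```shell
# Sketch: report which of the three DADA2 artifacts named in the excerpt are
# missing from a results directory. The flat directory layout is an assumption.
missing_dada2_outputs() {
    result_dir="$1"
    for name in table_dada2.qza rep-seqs_dada2.qza stats_dada2.qza; do
        [ -e "$result_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Demo: two of the three files exist, so only the third is reported.
demo_dir="$(mktemp -d)"
touch "$demo_dir/table_dada2.qza" "$demo_dir/stats_dada2.qza"
missing_dada2_outputs "$demo_dir"
```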

Module-QIIME2_DPP_SE



DOC_ID : T15-0002
QIIME2_DA module
DOC_ID : M21-3000
Editor : Anita
Reviewer : Angela

Function :
The QIIME2 DA module performs phylogenetic diversity analysis and alpha and beta diversity analysis. The detailed functions and output files are listed below :

Phylogenetic diversity analyses :
Generate and manipulate phylogenetic trees using fasttree and mafft alignment. The output is used in the alpha/beta analysis.
rep-seqs_dada2.qza ⇒ aligned-rep-seqs.qza
aligned-rep-seqs.qza ⇒ masked-aligned-rep-seqs.qza
masked-aligned-rep-seqs.qza ⇒ unrooted-tree.qza
unrooted-tree.qza ⇒ rooted-tree.qza

Beta diversity analysis :
Generate core-metrics-phylogenetic (core-metrics-results).
Test for associations between categorical metadata columns and alpha diversity data (Faith Phylogenetic Diversity and Evenness metrics).
Faith Phylogenetic Diversity (a measure of community richness) : faith_pd_vector.qza ⇒ faith-pd-group-significance.qzv
Evenness metrics : evenness_vector.qza ⇒ […]
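The four-step artifact chain in the phylogenetic part of this excerpt (sequences, alignment, masked alignment, unrooted tree, rooted tree) can be sketched as one QIIME 2 command per step. The function below only prints the commands. It assumes the standard q2-alignment and q2-phylogeny actions with midpoint rooting; the module's actual invocation is not shown in the excerpt and may differ.

```shell
# Sketch: print (not run) one QIIME 2 command per step of the chain
# rep-seqs_dada2.qza -> aligned -> masked -> unrooted tree -> rooted tree.
# Assumes the standard q2-alignment and q2-phylogeny actions; the module's
# real commands may differ.
print_phylogeny_steps() {
    cat <<'EOF'
qiime alignment mafft --i-sequences rep-seqs_dada2.qza --o-alignment aligned-rep-seqs.qza
qiime alignment mask --i-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza
qiime phylogeny fasttree --i-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza
qiime phylogeny midpoint-root --i-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza
EOF
}

print_phylogeny_steps
```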

Module-QIIME2_DA


DOC_ID : T15-0002
QIIME2_TA module
DOC_ID : M20-3000
Editor : Anita
Reviewer : Angela

Function :
We use taxonomy classifiers to determine the closest taxonomic affiliation with some degree of confidence or consensus, based on alignment, k-mer frequencies, etc. This module contains several steps :

Classification with the Greengenes pre-trained classifier :
Generate taxonomy_gg.qza
tabulate: Interactively explore metadata in an HTML table (taxonomy_gg.qza ⇒ taxonomy_gg.qzv)
barplot: Visualize taxonomy with an interactive bar plot (table_dada2.qza and taxonomy_gg.qza ⇒ taxa_gg-bar-plots.qzv)

Classification with the SILVA pre-trained classifier :
Generate taxonomy_silva.qza
tabulate: Interactively explore metadata in an HTML table (taxonomy_silva.qza ⇒ taxonomy_silva.qzv)
barplot: Visualize taxonomy with an interactive bar plot (table_dada2.qza and taxonomy_silva.qza […]

Module-QIIME2_TA


DOC_ID : T11-0001
Doc_ID: A08-0001
NCBIBioProj33317scikit0.24.1
Editor: Anita
Reviewer:

Description
Following an article on the QIIME2 forum, use RESCRIPt to download sequences and taxonomy from NCBI GenBank and train a classifier suitable for QIIME2 analysis.
Ref : https://forum.qiime2.org/t/using-rescript-to-compile-sequence-databases-and-taxonomy-classifiers-from-ncbi-genbank/15947

Source Download URL : -
File size :
ncbi-refseqs-unfiltered.qza   198KB
ncbi-refseqs-taxonomy-unfiltered.qza   19.1KB
Genome assembly version : BioProj33317

Detail information :
Use RESCRIPt to download sequences and taxonomy from NCBI GenBank and train classifiers suitable for QIIME2 analysis. Make sure the qiime2 standard analysis environment is already installed.

#Activate standard analysis environment
conda activate qiime2

#Move to the /work/<user account> folder
cd /work/u5777333/

#Create a folder for the reference sequences and the corresponding taxonomy files
mkdir -p NCBIclassifier/BioProject_33317

#Move into the folder
cd NCBIclassifier/BioProject_33317

#Install RESCRIPt
conda install -c conda-forge -c bioconda -c qiime2 -c defaults xmltodict
pip install git+https://github.com/bokulich-lab/RESCRIPt.git

#Use RESCRIPt to download sequence and taxonomy data from NCBI GenBank
qiime rescript get-ncbi-data \
    --p-query '33317[BioProject]' \
    --o-sequences ncbi-refseqs-unfiltered.qza \
    --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza

#Filter unusually short 16S rRNA gene sequences
qiime rescript filter-seqs-length-by-taxon \
    --i-sequences ncbi-refseqs-unfiltered.qza \
    --i-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza \
    --p-labels Archaea Bacteria \
    --p-min-lens 900 1200 \
    --o-filtered-seqs ncbi-refseqs.qza \
    --o-discarded-seqs ncbi-refseqs-tooshort.qza

#using the --m-ids-to-keep-file parameter to only […]
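The length filter in this excerpt pairs each label with its own minimum length: Archaea with 900 and Bacteria with 1200. The toy awk sketch below illustrates that pairing on a made-up three-row table. It is not RESCRIPt's implementation; the input layout (id, label, length) is hypothetical, and keeping rows whose label has no threshold is an assumption.

```shell
# Toy illustration of per-taxon minimum lengths; this is not RESCRIPt's
# implementation. Input columns (tab-separated): id, label, sequence length.
filter_by_label_length() {
    awk -F '\t' '
        BEGIN { min["Archaea"] = 900; min["Bacteria"] = 1200 }
        # Keep a row if its label has no threshold or its length meets it.
        !($2 in min) || $3 >= min[$2] { print $1 }
    '
}

# Made-up example rows: seqB is a Bacteria sequence shorter than 1200.
printf 'seqA\tArchaea\t950\nseqB\tBacteria\t1000\nseqC\tBacteria\t1500\n' | filter_by_label_length
```

Here seqA and seqC pass their thresholds, while seqB is dropped, which corresponds to a sequence that would end up in the discarded output.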

Genome Reference:NCBIBioProj33317scikit0.24.1


DOC_ID : T11-0001
Doc_ID: A08-0001
NCBIBioProj33175scikit0.24.1
Editor: Anita
Reviewer:

Description
Following an article on the QIIME2 forum, use RESCRIPt to download sequences and taxonomy from NCBI GenBank and train a classifier suitable for QIIME2 analysis.
Ref : https://forum.qiime2.org/t/using-rescript-to-compile-sequence-databases-and-taxonomy-classifiers-from-ncbi-genbank/15947

Source Download URL : -
File size :
ncbi-refseqs-unfiltered.qza   4.64MB
ncbi-refseqs-taxonomy-unfiltered.qza   377KB
Genome assembly version : BioProj33175

Detail information :
Use RESCRIPt to download sequences and taxonomy from NCBI GenBank and train classifiers suitable for QIIME2 analysis. Make sure the qiime2 standard analysis environment is already installed.

#Activate standard analysis environment
conda activate qiime2

#Move to the /work/<user account> folder
cd /work/u5777333/

#Create a folder for the reference sequences and the corresponding taxonomy files
mkdir -p NCBIclassifier/BioProject_33175

#Move into the folder
cd NCBIclassifier/BioProject_33175

#Install RESCRIPt
conda install -c conda-forge -c bioconda -c qiime2 -c defaults xmltodict
pip install git+https://github.com/bokulich-lab/RESCRIPt.git

#Use RESCRIPt to download sequence and taxonomy data from NCBI GenBank
qiime rescript get-ncbi-data \
    --p-query '33175[BioProject]' \
    --o-sequences ncbi-refseqs-unfiltered.qza \
    --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza

#Filter unusually short 16S rRNA gene sequences
qiime rescript filter-seqs-length-by-taxon \
    --i-sequences ncbi-refseqs-unfiltered.qza \
    --i-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza \
    --p-labels Archaea Bacteria \
    --p-min-lens 900 1200 \
    --o-filtered-seqs ncbi-refseqs.qza \
    --o-discarded-seqs ncbi-refseqs-tooshort.qza

#using the --m-ids-to-keep-file parameter to only […]

Genome Reference:NCBIBioProj33175scikit0.24.1