DOC_ID : T15-0002
PICRUSt2 module :
DOC_ID : M22-3000
Editor : Anita
Reviewer : Angela
Function :
PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a software tool for predicting functional abundances from marker gene sequences alone.
This module runs the following steps :
- sequence placement : Place ASV reads into the reference tree.
- .fasta ⇒ out.tre
- .fasta ⇒ intermediate/place_seqs/epa_out
- .fasta ⇒ intermediate/place_seqs/query_align.stockholm
- .fasta ⇒ intermediate/place_seqs/ref_seqs_hmmalign.fasta
- .fasta ⇒ intermediate/place_seqs/study_seqs_hmmalign.fasta
- hidden-state prediction of genomes :
- Hidden-state prediction of ASV marker gene (16S) copy numbers and NSTI
- 16S + out.tre ⇒ marker_predicted_and_nsti.tsv.gz
- Hidden-state prediction of ASV gene families restricted to the EC number database
- EC + out.tre ⇒ EC_predicted.tsv.gz
- Hidden-state prediction of ASV gene families restricted to the KO number database
- KO + out.tre ⇒ KO_predicted.tsv.gz
- metagenome prediction :
- Generate EC_metagenome predictions
- feature-table.biom + marker_predicted_and_nsti.tsv.gz + EC_predicted.tsv.gz ⇒ EC_metagenome_out
- Add EC_metagenome functional descriptions
- pred_metagenome_unstrat.tsv.gz + EC ⇒ pred_metagenome_unstrat_descrip.tsv.gz
- Generate KO_metagenome predictions
- feature-table.biom + marker_predicted_and_nsti.tsv.gz + KO_predicted.tsv.gz ⇒ KO_metagenome_out
- Add KO_metagenome functional descriptions
- pred_metagenome_unstrat.tsv.gz + KO ⇒ pred_metagenome_unstrat_descrip.tsv.gz
- pathway-level predictions :
- Pathway-level inference
- pred_metagenome_contrib.tsv.gz ⇒ EC_pathways_out
- pred_metagenome_contrib.tsv.gz ⇒ EC_pathways_out/EC_pathways_working
- Add EC_pathway functional descriptions
- path_abun_unstrat.tsv.gz + METACYC ⇒ path_abun_unstrat_descrip.tsv.gz
Ref : https://github.com/picrust/picrust2/wiki
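The steps listed above can be sketched as a dry run that only prints the underlying PICRUSt2 commands. The script names and flags follow the tutorial on the PICRUSt2 wiki; the function name picrust2_plan, the thread count, and the assumption that modules/PICRUSt2.sh chains the steps in exactly this way are illustrative and not taken from the module itself.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the PICRUSt2 commands that correspond to the module steps.
# Nothing is executed. picrust2_plan is a hypothetical helper; how
# modules/PICRUSt2.sh actually wires these steps together is an assumption.

picrust2_plan() {
  local fasta="$1" biom="$2" threads="${3:-1}"
  local db

  # 1. Sequence placement: ASV FASTA -> out.tre (+ intermediate files)
  printf '%s\n' "place_seqs.py -s $fasta -o out.tre -p $threads --intermediate intermediate/place_seqs"

  # 2. Hidden-state prediction: 16S copy number with NSTI (-n), then EC and KO
  printf '%s\n' "hsp.py -i 16S -t out.tre -o marker_predicted_and_nsti.tsv.gz -p $threads -n"
  for db in EC KO; do
    printf '%s\n' "hsp.py -i $db -t out.tre -o ${db}_predicted.tsv.gz -p $threads"
  done

  # 3. Metagenome prediction and functional descriptions for EC and KO
  for db in EC KO; do
    printf '%s\n' "metagenome_pipeline.py -i $biom -m marker_predicted_and_nsti.tsv.gz -f ${db}_predicted.tsv.gz -o ${db}_metagenome_out --strat_out"
    printf '%s\n' "add_descriptions.py -i ${db}_metagenome_out/pred_metagenome_unstrat.tsv.gz -m $db -o ${db}_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz"
  done

  # 4. Pathway-level inference from the stratified EC output, then descriptions
  printf '%s\n' "pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_contrib.tsv.gz -o EC_pathways_out -p $threads"
  printf '%s\n' "add_descriptions.py -i EC_pathways_out/path_abun_unstrat.tsv.gz -m METACYC -o EC_pathways_out/path_abun_unstrat_descrip.tsv.gz"
}

picrust2_plan processed/dna-sequences.fasta processed/feature-table.biom 4
```

Printing the plan rather than running it makes it easy to compare the expected inputs and outputs of each step against the files the module actually produces.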
Installation :
All required software is included in the GA environment.
Note :
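► Before running the analysis, use CreateProject.sh to create a project folder; see the "Project standard folder structure" document.
► Before running the module, confirm which compute partition (--partition) applies to you : general-node users are advised to use ct56 ; biomedical-node users are advised to use ngs48G (see Note 1).
► To see how to use the module, run it with the -h option.
#Note 1 : To confirm your user type, log in to NCHC iService and go to Member Center / Project Management / My Projects (會員中心/計畫管理/我的計畫). If the project name is "國家生醫數位資料與分析運算雲端服務平台III", you are a biomedical-node user.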
Description :
Tested environment | GAPC20.0.0.1 |
Software version | picrust2=2.3.0_b |
Usage(Slurm) | Command in Slurm (Taiwania III) : sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,sampleName=dna-sequences.fasta,tableBiom=feature-table.biom' modules/PICRUSt2.sh |
Usage(Linux console) | Command in Linux console : bash modules/PICRUSt2.sh -p $(pwd) -s dna-sequences.fasta -b feature-table.biom |
#For Slurm operation, please refer to "Basic operation of Taiwania III"
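As a minimal sketch, the Slurm command above can be assembled with an explicit partition, as the Note section recommends. The helper build_sbatch_cmd and the fallback values MY_PROJECT and user@example.org are hypothetical placeholders; --partition is a standard sbatch option, and the rest of the command is copied from the table. The helper only prints the command and does not submit anything.

```shell
#!/usr/bin/env bash
# Sketch: assemble the sbatch command line for the module, adding --partition.
# build_sbatch_cmd is a hypothetical helper; it prints the command and does
# not call sbatch. Paths containing spaces would need extra quoting.

build_sbatch_cmd() {
  local project_id="$1" email="$2" partition="$3" proj_dir="$4"
  # Strip any trailing slash, then add exactly one, as in the documented command.
  local export_vars="projDir=${proj_dir%/}/,sampleName=dna-sequences.fasta,tableBiom=feature-table.biom"
  printf 'sbatch -A %s --partition=%s --mail-user=%s --export=%s modules/PICRUSt2.sh\n' \
    "$project_id" "$partition" "$email" "$export_vars"
}

# ct56 for general-node users; ngs48G for biomedical-node users.
build_sbatch_cmd "${projectID:-MY_PROJECT}" "${email:-user@example.org}" ct56 "$(pwd)"
```

Reviewing the printed line before submitting helps catch a wrong partition or a missing trailing slash on projDir.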
Usage :
The following explains the usage of module parameters :
Parameter | Description | Remark |
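PICRUSt2.sh | Module for predicting functional abundances | The module must be stored in the [modules] folder |
projDir | Path to the analysis project folder (see the project folder structure documentation) | The script must be run from the analysis project folder; $(pwd) returns the user's current path |
sampleName | FASTA file of amplicon sequence variants (ASVs). Data format : *.fasta. Data path : processed/. ► The file name set for this parameter must be "dna-sequences.fasta" | Example : sampleName="dna-sequences.fasta" decompresses the file rep-seqs_dada2.qza stored in the processed/ folder and reads the dna-sequences.fasta inside it |
tableBiom | BIOM table of the abundance of each amplicon sequence variant in each sample. Data format : *.biom. Data path : processed/ | Example : tableBiom=feature-table.biom reads the file feature-table.biom stored in the processed/ folder |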
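The file locations described in the table can be checked before the module is launched. The following is a minimal sketch, assuming the layout stated above (modules/PICRUSt2.sh, processed/rep-seqs_dada2.qza, and the BIOM table under processed/); check_inputs is a hypothetical helper and is not part of the module.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check of the files the parameter table refers to.
# check_inputs is a hypothetical helper; the file layout is taken from the
# parameter descriptions and is otherwise an assumption.

check_inputs() {
  local proj_dir="${1%/}" table_biom="${2:-feature-table.biom}"
  local missing=0 f
  for f in "modules/PICRUSt2.sh" "processed/rep-seqs_dada2.qza" "processed/$table_biom"; do
    if [ ! -f "$proj_dir/$f" ]; then
      echo "missing: $proj_dir/$f" >&2
      missing=1
    fi
  done
  # Return 0 when every file exists, 1 otherwise.
  return "$missing"
}

if check_inputs "$(pwd)"; then
  echo "inputs OK"
else
  echo "inputs incomplete"
fi
```

Run it from the analysis project folder, the same place from which the module itself must be started.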