DOC_ID : T15-0002

PICRUSt2 module :

DOC_ID : M22-3000
Editor : Anita
Reviewer : Angela

Function :

PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is software for predicting functional abundances from marker gene sequences alone.

This module runs four main stages, each listed below with its inputs and outputs :

sequence placement : Place ASV sequences into the reference tree.
- .fasta ⇒ out.tre
- .fasta ⇒ intermediate/place_seqs/epa_out
- .fasta ⇒ intermediate/place_seqs/query_align.stockholm
- .fasta ⇒ intermediate/place_seqs/ref_seqs_hmmalign.fasta
- .fasta ⇒ intermediate/place_seqs/study_seqs_hmmalign.fasta
hidden-state prediction of genomes :
- Hidden-state prediction of 16S marker gene copy numbers, with NSTI values, for each ASV
  - 16S + out.tre ⇒ marker_predicted_and_nsti.tsv.gz
- Hidden-state prediction of ASV gene families restricted to the EC number database
  - EC + out.tre ⇒ EC_predicted.tsv.gz
- Hidden-state prediction of ASV gene families restricted to the KO database
  - KO + out.tre ⇒ KO_predicted.tsv.gz
metagenome prediction :
- Generate EC_metagenome predictions
  - feature-table.biom + marker_predicted_and_nsti.tsv.gz + EC_predicted.tsv.gz ⇒ EC_metagenome_out
- Add EC_metagenome functional descriptions
  - pred_metagenome_unstrat.tsv.gz + EC ⇒ pred_metagenome_unstrat_descrip.tsv.gz
- Generate KO_metagenome predictions
  - feature-table.biom + marker_predicted_and_nsti.tsv.gz + KO_predicted.tsv.gz ⇒ KO_metagenome_out
- Add KO_metagenome functional descriptions
  - pred_metagenome_unstrat.tsv.gz + KO ⇒ pred_metagenome_unstrat_descrip.tsv.gz
pathway-level predictions :
- Pathway-level inference
  - pred_metagenome_contrib.tsv.gz ⇒ EC_pathways_out
  - pred_metagenome_contrib.tsv.gz ⇒ EC_pathways_out/EC_pathways_working
- Add EC_pathway functional descriptions
  - path_abun_unstrat.tsv.gz + METACYC ⇒ path_abun_unstrat_descrip.tsv.gz
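
For orientation, the stages above correspond to the standalone scripts described in the PICRUSt2 wiki. The following is a dry-run sketch that only prints those commands. It assumes that PICRUSt2.sh wraps these scripts in this order, which this document does not confirm. The input paths and the thread count are placeholders, and `picrust2_plan` is a hypothetical helper name.

```shell
# Dry-run sketch: print the standalone PICRUSt2 commands behind each stage.
# Assumption: the module wraps these scripts; nothing is executed here.

FASTA="dna-sequences.fasta"   # ASV sequences (placeholder path)
BIOM="feature-table.biom"     # ASV abundance table (placeholder path)
THREADS=1                     # hypothetical thread count

picrust2_plan() {
  # 1. Sequence placement
  echo "place_seqs.py -s $FASTA -o out.tre -p $THREADS --intermediate intermediate/place_seqs"

  # 2. Hidden-state prediction: 16S copy number with NSTI, then EC and KO
  echo "hsp.py -i 16S -t out.tre -o marker_predicted_and_nsti.tsv.gz -p $THREADS -n"
  for db in EC KO; do
    echo "hsp.py -i $db -t out.tre -o ${db}_predicted.tsv.gz -p $THREADS"
  done

  # 3. Metagenome prediction, then functional descriptions
  for db in EC KO; do
    echo "metagenome_pipeline.py -i $BIOM -m marker_predicted_and_nsti.tsv.gz -f ${db}_predicted.tsv.gz -o ${db}_metagenome_out"
    echo "add_descriptions.py -i ${db}_metagenome_out/pred_metagenome_unstrat.tsv.gz -m $db -o ${db}_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz"
  done

  # 4. Pathway-level inference, then pathway descriptions
  echo "pathway_pipeline.py -i EC_metagenome_out/pred_metagenome_contrib.tsv.gz -o EC_pathways_out -p $THREADS --intermediate EC_pathways_out/EC_pathways_working"
  echo "add_descriptions.py -i EC_pathways_out/path_abun_unstrat.tsv.gz -m METACYC -o EC_pathways_out/path_abun_unstrat_descrip.tsv.gz"
}

picrust2_plan
```

Printing the plan rather than running it makes it easy to compare against the module's own log before changing any step.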

Ref : https://github.com/picrust/picrust2/wiki

Installation :

All required software is included in the GA environment.

Note :

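► Before running the analysis, use CreateProject.sh to create a project folder; see the "Project standard folder structure" document.

► Before running the module, confirm which compute partition (--partition) applies to you : general-node users are advised to use ct56; biomedical-node users are advised to use ngs48G (see Note 1).

► To see how the module is used, run it with the -h option.

#Note 1 : To confirm your user type, log in to the NCHC iService and select Member Center / Project Management / My Projects. If the project name is "國家生醫數位資料與分析運算雲端服務平台III", you are a biomedical-node user.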

Description :

Tested environment	GAPC20.0.0.1
Software version	picrust2=2.3.0_b
Usage(Slurm)	Command in Slurm (Taiwania III) `sbatch -A $projectID --mail-user=$email --export='projDir='$(pwd)'/,sampleName=dna-sequences.fasta,tableBiom=feature-table.biom' modules/PICRUSt2.sh`
Usage(Linux console)	Command in Linux console `bash modules/PICRUSt2.sh -p $(pwd) -s dna-sequences.fasta -b feature-table.biom`
#For Slurm operation, please refer to "Basic operation of Taiwania III"
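
The Slurm command above does not name a partition, while the Note section recommends ct56 for general-node users and ngs48G for biomedical-node users. The following is a minimal sketch that assembles the submission command with an explicit `--partition` (a standard sbatch option) and prints it for review instead of submitting. The helpers `pick_partition` and `build_sbatch_cmd` and the example values are hypothetical; whether PICRUSt2.sh already sets a partition internally is not stated in this document.

```shell
# Hypothetical helper: map a user type to the partition recommended in the notes.
pick_partition() {
  case "$1" in
    general)    echo "ct56" ;;
    biomedical) echo "ngs48G" ;;
    *) echo "unknown user type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (do not run) the sbatch command.
# $1 = user type, $2 = project ID, $3 = e-mail, $4 = project directory
build_sbatch_cmd() {
  partition=$(pick_partition "$1") || return 1
  printf '%s\n' "sbatch -A $2 --partition=$partition --mail-user=$3 --export='projDir=$4/,sampleName=dna-sequences.fasta,tableBiom=feature-table.biom' modules/PICRUSt2.sh"
}

# Example values are placeholders; run this from the project folder.
build_sbatch_cmd general MY_PROJECT_ID user@example.org "$(pwd)"
```

Review the printed line, then paste it into the shell to submit the job.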

Usage :

The following explains the usage of module parameters :

Parameter	Description	Remark
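PICRUSt2.sh	Module for predicting functional abundances	The analysis module must be stored in the [modules] folder
projDir	Path to the analysis project folder (see the project folder structure description)	The script must be run from the analysis project folder; $(pwd) returns the user's current path
sampleName	FASTA file of amplicon sequence variants (ASVs). Data format : .fasta. Data path : processed/. ► The file name set for this parameter must be "dna-sequences.fasta"	Example : sampleName="dna-sequences.fasta" extracts the rep-seqs_dada2.qza file stored in the processed/ folder and reads the dna-sequences.fasta inside it
tableBiom	BIOM table of the abundance of each amplicon sequence variant in the samples. Data format : *.biom. Data path : processed/	Example : tableBiom=feature-table.biom reads the file feature-table.biom stored in the processed/ folder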
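
Before submitting, it can help to confirm that the files named in the table are where the module expects them. The following is a minimal pre-flight sketch based only on the paths given above (modules/PICRUSt2.sh, processed/rep-seqs_dada2.qza and processed/feature-table.biom). The `check_inputs` function is hypothetical and is not part of the module.

```shell
# Hypothetical pre-flight check: verify the module and its inputs exist.
# $1 = project folder (defaults to the current directory)
check_inputs() {
  proj_dir="${1:-$(pwd)}"
  missing=0
  for f in \
    "modules/PICRUSt2.sh" \
    "processed/rep-seqs_dada2.qza" \
    "processed/feature-table.biom"
  do
    if [ -f "$proj_dir/$f" ]; then
      echo "ok       $f"
    else
      echo "missing  $f" >&2
      missing=1
    fi
  done
  # Return non-zero if any file was missing.
  return "$missing"
}

# Run from the project folder.
check_inputs "$(pwd)" || echo "fix the missing files before running the module"
```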
