Standard Analysis Project Directory


DOC_ID:A03-1000S
Editor: Lucas Tung
Review: Anita

Introduction

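We found that the biggest obstacle to reusing our past scripts was file paths. Absolute paths differ from one analysis project to another, and individual naming habits differ as well, so scripts written for different projects or by different analysts could neither be reused nor combined. Our team therefore agreed on a standardized project directory structure. Everyone writes scripts against this convention and uses relative paths wherever possible. As a result, scripts can be reused across projects and chained together without modification, which in turn led to the later modularization. Our existing modules are designed to this standard, and future modules designed to it can be merged easily into the existing analysis workflow. We strongly recommend creating the project directory structure with the CreateProject.sh module under GA_bundle/Module/ to ensure that the directory structure is compatible.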

# Run the CreateProject.sh module to generate a project folder
>bash ~/GA_bundle/Module/CreateProject.sh myProject

# List the contents of the folder
>ls myProject

# Example (using the CreateProject.sh module to create an analysis project folder named ProjectDemo)
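
For orientation, the following is a minimal sketch of the kind of skeleton such a command could produce. It is not the actual CreateProject.sh: the function name create_project_skeleton and the scratch location are hypothetical, and only the sub-folder names come from the folder definition in this document. How the real module populates the mirrored folders (modules, apps, refs) is not specified here, so they are left empty.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for illustration; NOT the actual CreateProject.sh.
set -eu

# Create the standard sub-folders under the given project root.
# Folder names are taken from the standard project folder definition.
create_project_skeleton() {
    local root="$1"
    local sub
    for sub in raw processed analyzed report log QC temp modules apps refs; do
        mkdir -p "$root/$sub"
    done
}

# Use a scratch location so that no real project is touched (assumption).
demoRoot="$(mktemp -d)"
create_project_skeleton "$demoRoot/ProjectDemo"

# List the resulting sub-folders.
ls "$demoRoot/ProjectDemo"
```

For real projects, CreateProject.sh remains the recommended route, since it is the module that guarantees compatibility with the existing workflow.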

Definition of Standard Project Folder Structure

| Root folder name | Sub-folder name | Var name in module | File format | Description |
|---|---|---|---|---|
| project root | | projDir | Manifest file; Work script | Manifest file: information about the whole project, analysis design, and sample categories. Work script: the workflow script runs from this location; after the script finishes, it is moved to the [workflow] subfolder. |
| | raw | rawDir | Sequence files: .fastq, .fasta; Array: .cel files | Raw data generated directly by the service provider, such as raw reads and array scan files. |
| | processed | prcDir | Quality control: trim_R1/R2.fastq, cleanup_R1/R2.fastq; Mapped sequences: .bam, .bai; Variant calling: g.vcf.gz, vcf.gz | Trimmed or cleaned raw data. Intermediate data files. Data is not directly readable. Data is used for multiple purposes or for future analysis, e.g. VCF, BAM, … |
| | analyzed | anaDir | *.csv, *.txt, *.xlsx, *.RData | Data after manual filtering, statistical analysis (p-value, FDR), modeling… Data is directly readable. Intermediate data produced during analysis optimization, e.g. differentially expressed gene lists from ANOVA or t-test; lists with cutoff P<0.001, <0.00001. |
| | report | rptDir | *.html, *.qzv, *.VCF, *.db | Data that directly presents the conclusions of the project. Data is readable and in its optimized version. Summary of the project. QC: FastQC, MultiQC results. Plots/charts; QIIME2 annotated; VCF: ANNOVAR export; Database: VarDB. |
| | log | logDir | Summary.log, *.OU, *.ER | Module log files and job execution logs. |
| | QC | qcDir | *.html; *.gz/*.zip | QC data for each sample. |
| | temp | tmpDir | | Disposable intermediate data. |
| | modules | modules | | Modules (mirror of the ~/GA_bundle/Module folder). |
| | apps | apps | | Custom apps (mirror of the ~/GA_bundle/App folder). |
| | refs | refs | | Reference genomes (mirror of the ~/GA_bundle/Ref folder). |
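
To illustrate how a module might rely on this structure, here is a minimal sketch of a script preamble that derives every path from the project root and reports any standard sub-folder that is missing. The variable names (projDir, rawDir, prcDir, and so on) follow the table above. The function check_project_layout and the scratch demo project are hypothetical and are not part of GA_bundle.

```shell
#!/usr/bin/env bash
# Sketch only: variable names follow the folder table; everything else is assumed.
set -eu

# Print the standard sub-folders that are missing under the given project root.
check_project_layout() {
    local root="$1"
    local sub
    local missing=""
    for sub in raw processed analyzed report log QC temp modules apps refs; do
        [ -d "$root/$sub" ] || missing="$missing $sub"
    done
    echo "${missing# }"
}

# Demo project in a scratch location with only some folders present (assumption).
projDir="$(mktemp -d)/myProject"
mkdir -p "$projDir/raw" "$projDir/processed" "$projDir/log"

# Derive all working paths from the project root, never from absolute paths.
rawDir="$projDir/raw"
prcDir="$projDir/processed"
anaDir="$projDir/analyzed"
rptDir="$projDir/report"
logDir="$projDir/log"
qcDir="$projDir/QC"
tmpDir="$projDir/temp"

missingDirs="$(check_project_layout "$projDir")"
echo "missing: $missingDirs"
```

Because every path is built from projDir, the same preamble works unchanged in any project that follows the standard layout, which is the property the structure is meant to provide.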