DOC_ID:A03-1000S
Editor: Lucas Tung
Review: Anita
Introduction
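We found that the biggest obstacle to reusing analysis scripts was inconsistent file paths. Each analysis project has different absolute paths, and each analyst has different naming habits, so scripts written for different projects or by different analysts could be neither reused nor shared. Our team therefore agreed on a standardized project directory structure. Everyone writes scripts according to this convention and uses relative paths wherever possible. As a result, scripts can be reused across projects and chained together without modification, which in turn led to the later modularization. Our existing modules are designed around this standard, and future modules that follow it can easily be merged into the existing analysis workflow. We strongly recommend using the CreateProject.sh module in the GA_bundle/Module/ directory to create the project directory structure, to ensure compatibility.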
Definition of Standard Project Folder Structure
| Root folder name | Sub-folder name | Var Name in Module | File format | Description |
|---|---|---|---|---|
| project root | | projDir | Manifest file | Manifest file: information about the whole project, analysis design, and sample categories. Work script: workflow scripts run from this location; once a script finishes, it is moved to the [workflow] subfolder. |
| | raw | rawDir | Sequence files: .fastq, .fasta; Array: .cel files | Raw data generated directly by the service provider, such as raw reads and array scan files. |
| | processed | prcDir | Quality control: trim_R1/R2.fastq, cleanup_R1/R2.fastq; Mapping sequence: .bam, .bai; Variant calling: g.vcf.gz, vcf.gz | Trimmed or cleaned raw data and intermediate data files. Data is not directly readable. Data is multi-purpose or kept for future analysis, e.g. VCF, BAM, … |
| | analyzed | anaDir | *.csv, *.txt, *.xlsx, *.RData | Data after manual filtering, statistical analysis (p-value, FDR), modeling… Data is directly readable. Intermediate data produced during analysis optimization, e.g. differentially expressed gene lists from ANOVA or t-test, or lists with cutoff P<0.001, <0.00001. |
| | report | rptDir | *.html, *.qzv, *.VCF, *.db | Data that directly presents the conclusions of the project, in a readable and optimized version. Summary of the project. QC: FastQC, MultiQC results. Plot/chart: QIIME2. Annotated VCF: ANNOVAR export. Database: VarDB. |
| | log | logDir | Summary.log, *.OU, *.ER | Module log files and job execution logs. |
| | QC | qcDir | *.html, *.gz/*.zip | QC data for each sample. |
| | temp | tmpDir | | Disposable intermediate data. |
| | modules | modules | | Modules (mirror of ~/GA_bundle/Module folder). |
| | apps | apps | | Custom apps (mirror of ~/GA_bundle/App folder). |
| | refs | refs | | Reference genome (mirror of ~/GA_bundle/Ref folder). |
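
The layout above can be sketched in a few lines of shell. This is a minimal illustration only, not the actual CreateProject.sh from GA_bundle/Module/, whose behavior is not described here. The function name `init_project_layout` is hypothetical; the folder and variable names are taken from the table. Passing `.` as the project root keeps every derived path relative, in line with the relative-path convention. How modules, apps, and refs are mirrored from ~/GA_bundle is not specified, so the sketch only creates empty folders for them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real CreateProject.sh. It only illustrates
# the standard folder layout and the variable names listed in the table.

# init_project_layout [ROOT]
# Creates the standard sub-folders under ROOT (default: current directory)
# and sets the path variables that modules refer to.
init_project_layout() {
    projDir="${1:-.}"

    # Variable names as listed in the "Var Name in Module" column.
    rawDir="$projDir/raw"
    prcDir="$projDir/processed"
    anaDir="$projDir/analyzed"
    rptDir="$projDir/report"
    logDir="$projDir/log"
    qcDir="$projDir/QC"
    tmpDir="$projDir/temp"

    # Create every data folder; mkdir -p is a no-op for folders that exist.
    local d
    for d in "$rawDir" "$prcDir" "$anaDir" "$rptDir" "$logDir" "$qcDir" "$tmpDir"; do
        mkdir -p "$d" || return 1
    done

    # Placeholders for the mirrored folders. Assumption: the mirroring itself
    # (copy, symlink, or sync from ~/GA_bundle) is handled elsewhere.
    for d in modules apps refs; do
        mkdir -p "$projDir/$d" || return 1
    done
}
```

A script run from the project root could call `init_project_layout .` and then refer to `"$rawDir"`, `"$prcDir"`, and so on instead of hard-coded absolute paths, which is what allows the same script to run unchanged in another project.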