DOC_ID : A04-1000S
Editor: Lucas Tung
Reviewer : Anita
Module design regulation
Module name
[Pipeline]_[Function_description].[Version].sh
- Pipeline:
- SN : Sentieon
- GATK : GATK package
- RSEM : RSEM RNAseq
- QIIME2 : QIIME2 package
- scRNA : Single cell RNAseq
- …
- Function description of the module:
- GMP : Genome mapping paired
- VC : Variant Calling
- …
- Version : X.Y.Z
- X: Major version of the design
- Y: Minor version. Incremented when the module's function or operation changes
- Z: Patch version. Incremented for bug fixes that leave the module's function and operation unchanged
Ex. SN_GMP.1.5.0.sh
– SN: Sentieon pipeline
– GMP: genome mapping paired module
– Major version: 1
– Minor version: 5
– Patch version: 0
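The naming fields can be recovered from a file name with plain bash parameter expansion. The following is a minimal sketch, assuming the layout of the example name SN_GMP.1.5.0.sh (pipeline and function code joined by "_", followed by the dotted version); the function name parse_module_name is hypothetical and not part of the regulation.

```shell
#!/bin/bash
# Hypothetical helper: split a module file name into its naming fields.
# Assumes the layout of the example: <Pipeline>_<Function>.<X>.<Y>.<Z>.sh
parse_module_name() {
    local file base pipeline rest func version major minor patch
    file=$(basename "$1")
    base=${file%.sh}         # drop the .sh extension
    pipeline=${base%%_*}     # text before the first "_"
    rest=${base#*_}          # text after the first "_"
    func=${rest%%.*}         # text before the first "."
    version=${rest#*.}       # the remaining X.Y.Z
    IFS=. read -r major minor patch <<< "$version"
    echo "pipeline=$pipeline function=$func major=$major minor=$minor patch=$patch"
}

parse_module_name "SN_GMP.1.5.0.sh"
```

The sketch would not handle a function code that itself contains "_" or "."; it only illustrates how the three fields relate to the file name.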
Name of parameter
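- Directory variables must follow the project folder structure. Any variable that points to a default folder must use the standard parameter name (e.g. $projDir, $rawDir …; see the standard project directory structure).
- Directory variables passed as command args must end with a trailing "/" to avoid ambiguity (e.g. MyScript -v 'projDir=/project/GP1/user/').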
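A module could also enforce the trailing "/" itself instead of relying on the caller. The following is a minimal sketch; the helper name with_trailing_slash is hypothetical, and building rawDir as "${projDir}raw/" is an assumption based on the project folder structure, not a rule stated here.

```shell
#!/bin/bash
# Hypothetical helper: strip one trailing "/" if present, then append one,
# so the result ends with "/" whether or not the caller supplied it.
with_trailing_slash() {
    local dir=${1%/}
    printf '%s/\n' "$dir"
}

projDir=$(with_trailing_slash "/project/GP1/user")
# Assumed layout: the raw folder sits directly under the project folder.
rawDir="${projDir}raw/"
echo "$projDir"
echo "$rawDir"
```

Because each directory variable ends with "/", paths can be joined by simple concatenation without checking for a missing or doubled separator.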
Module depository
Modules are stored in GA_bundle/Module and grouped into folders by minor version, as in the following example:
The major version folder
- 1.X/: Current stable modules
- 1.X/RC: Release candidates and versions being patched
- 1.X/Arch: Retired versions of patched scripts
!!! RC modules are untested and must not be used in real analysis projects.
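One way to make the RC warning harder to miss is a small guard in whatever script launches modules. The following is only a sketch: the function is_rc_module and the example path are hypothetical, and it assumes RC modules can be recognized by an "/RC/" component in their path.

```shell
#!/bin/bash
# Hypothetical guard: succeed if the module path points into an RC folder.
is_rc_module() {
    case "$1" in
        */RC/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Illustrative path only; real module locations may differ.
module="GA_bundle/Module/1.X/RC/SN_GMP.1.5.1_editorName.sh"
if is_rc_module "$module"; then
    echo "WARNING: $module is an untested RC module; do not use it in a real analysis project." >&2
fi
```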
Software environment of module
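A module may use software from the GA application collection or from the apps folder inside GA_bundle. In either case, the software's install location is referenced through a bash variable.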
GA application collection:
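The GA system uses Conda to manage software installation, and each application collection defines a software environment. Developers can design modules with the software already available in an existing collection; to add new software, contact the administrator. To use an environment, add the following lines at the start of the module to activate it: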
Ex.
## Load conda's shell functions so that "conda activate" works in this script
source ~/miniconda3/etc/profile.d/conda.sh
## Activate GApp environment
conda activate GApp
## Define application path
bwa="bwa"
## Execute application
$bwa mem -r refs/hg38.fasta..
GA_bundle apps folder:
Some software is not available in the conda repository. Developers can install it in GA_bundle and sync it to users' home directories; it is then run through the soft link in the project folder.
Ex.
## Define software path and version
bwa='apps/bwa0.2.71/bin/bwa'
## Execute application
$bwa mem -r refs/hg38.fasta..
Module version control
- SOP for creating and modifying a module
- Create the module from the module template (1.X) or from the original version (1.X.Y), name it ModuleX.1.X.0_editorName.sh or ModuleX.1.X.Y+1_editorName.sh respectively, and place it in the Version/RC folder.
- Test the module with the DemoDataSet (\TeamSharing\DemoDataSet\XXX).
- Move the module out of the RC folder when testing is done.
- Hold a meeting to publish the module.
- Move the original version to the Arch folder.
- Remove the _editorName tag from the module file name.
- Update the module document and note what changed in this version.
- Checklist for creating and modifying a module: add the update information (version, date, and person) to the module header under "##section 2 Description:", confirm the document ID, and update the demo code. Ex.
#= version: 1.5.0, date: 20200107, modifier: anita
#= version: 1.5.3, date: 20200221, modifier: Lucas, anita
#= version: 2.0.0, date: 20200624, author: Lucas
#= Document : MXX-XXXX
#= demo code : bash MyModule.2.0.sh -p $(pwd)/ -i XXXX -s YYYY
#= demo code : qsub -P projectID -W group_list=projectID -m e -M useremail@gmail.com -v "projDir=$(pwd)/,inFile=XXXX,sampleName=YYYY" MyModule.2.0.sh
#= demo code (for big black) : qsub -m e -M useremail@gmail.com -v "projDir=$(pwd)/,inFile=XXXX,sampleName=YYYY" MyModule.2.0.sh
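The file moves in the SOP above can be sketched as shell commands. This is an illustration under assumptions, not the team's actual procedure: it runs in a scratch directory created by mktemp, and the folder and file names (Module/1.X, ModuleX.1.5.0.sh, editorName) are placeholders taken from the naming pattern in the SOP.

```shell
#!/bin/bash
# Sketch of the SOP file moves in a scratch directory, so nothing real is touched.
# All paths and file names here are illustrative placeholders.
work=$(mktemp -d)
ver="$work/Module/1.X"
mkdir -p "$ver/RC" "$ver/Arch"
echo '#!/bin/bash' > "$ver/ModuleX.1.5.0.sh"   # stand-in for the current stable module

# Create the new patch version in RC, tagged with the editor name
cp "$ver/ModuleX.1.5.0.sh" "$ver/RC/ModuleX.1.5.1_editorName.sh"

# (Testing with the DemoDataSet and the publish meeting are manual steps.)

# Move the tested module out of the RC folder
mv "$ver/RC/ModuleX.1.5.1_editorName.sh" "$ver/"

# Move the original version to the Arch folder
mv "$ver/ModuleX.1.5.0.sh" "$ver/Arch/"

# Remove the _editorName tag from the file name
mv "$ver/ModuleX.1.5.1_editorName.sh" "$ver/ModuleX.1.5.1.sh"

ls -R "$work/Module"
```

After these steps the version folder holds only the new stable module, the previous version sits in Arch, and RC is empty again.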
Module framework
#!/bin/bash
##section 1: PBS arguments
#PBS -N MyModule
#PBS -l select=1:ncpus=20
#PBS -l place=pack
#PBS -q ngs96G
#PBS -o log/
#PBS -e log/
##section 2 Description:
#= version: 1.0, date: , author:
#= version: 1.5.0, date: 20200107, modifier:
#= version: 1.5.3, date: 20200221, modifier:
#= version: 2.0.0, date: 20200624, author:
#= Document : MXX-XXXX
#= demo code : bash MyModule.2.0.sh -p $(pwd)/ -i XXXX -s YYYY
#= demo code : qsub -P projectID -W group_list=projectID -m e -M useremail@gmail.com -v "projDir=$(pwd)/,inFile=XXXX,sampleName=YYYY" MyModule.2.0.sh
#= demo code (for big black) : qsub -m e -M useremail@gmail.com -v "projDir=$(pwd)/,inFile=XXXX,sampleName=YYYY" MyModule.2.0.sh
## prerequisite: project folder structure
## [projDir] project
##   +-- [rawDir] raw
##   +-- [prcDir] processed
##   +-- [anaDir] analyzed
##   +-- [rptDir] report
##   +-- [tmpDir] temp
##   +-- [logDir] log
##   +-- [qcDir] QC
##section 3: Software path
## Load conda's shell functions so that "conda activate" works in this script
source ~/miniconda3/etc/profile.d/conda.sh
## Activate the GApp environment
conda activate GApp
gatk="gatk"
refs="Homo_sapiens/NCBI/GRCh38Decoy/Sequence/BWAIndex/genome.fa"
# In-house software is in the soft-linked [app] folder
##section 4: Parsing args and set default value
## Parse args when run from the Linux console; PBS Pro jobs receive these variables through qsub -v instead
while getopts "i:s:p:" argv
do
    case $argv in
        i) inFile=$OPTARG
           ;;
        s) sampleName=$OPTARG
           ;;
        p) projDir=$OPTARG
           ;;
    esac
done
## Check args input and set default value
if [ -z "$inFile" ]; then
    echo "-i input File Name must be specified!"
    exit 1
fi
if [ -z "$sampleName" ]; then
    echo "-s output sample Name must be specified!"
    exit 1
fi
if [ -z "$projDir" ]; then
    echo "-p projDir must be specified!"
    exit 1
fi
##section 5: Executing script
## The module runs from the project root; stop if the folder cannot be entered.
cd "$projDir" || exit 1
#Step 1:
# write log to log/Summary.log
# Essential log information:
# time: $(date), SampleName, ModuleName:Step, Current status
echo -e "$(date)\t$sampleName\tMyModule:Step1\tStart" >> log/Summary.log
if [ ! -f "processed/$sampleName.ubam" ]; then  # skip when the result file already exists
    $gatk FastqToSam \
        --java-options "-Djava.io.tmpdir=$tmpDir" \
        -F1 "raw/${inFile}_R1.fastq.gz" \
        -F2 "raw/${inFile}_R2.fastq.gz" \
        -O "processed/$sampleName.ubam" \
        -RG "$RG" \
        -SM "$SM" \
        -PU "$PU" \
        -PL "$PL" \
        -LB "$LB"
    # check whether the step finished successfully
    if [ $? -eq 0 ]; then
        echo -e "$(date)\t$sampleName\tMyModule:Step1\tDone" >> log/Summary.log
    else
        # on failure, move the intermediate file to temp and exit with -1 (seen by the caller as status 255)
        echo -e "$(date)\t$sampleName\tMyModule:Step1\tStop" >> log/Summary.log
        mv "processed/$sampleName.ubam" "temp/$sampleName.ubam.err"
        exit -1
    fi
else
    echo -e "$(date)\t$sampleName\tMyModule:Step1\tSkip" >> log/Summary.log
fi
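The four status lines in the step above share one format (time, sample name, ModuleName:Step, status), so they could be written by a single helper. The following is a minimal sketch, not part of the template: the function name log_status and its optional fourth argument are hypothetical, and the usage lines write to a scratch file rather than a real project log.

```shell
#!/bin/bash
# Hypothetical helper: append one tab-separated status line
# (time, sample name, ModuleName:Step, status) to the summary log.
log_status() {
    local sample=$1 step=$2 status=$3
    local logFile=${4:-log/Summary.log}   # defaults to the log path used in the template
    printf '%s\t%s\t%s\t%s\n' "$(date)" "$sample" "$step" "$status" >> "$logFile"
}

# Usage against a scratch log file
tmp=$(mktemp -d)
log_status "YYYY" "MyModule:Step1" "Start" "$tmp/Summary.log"
log_status "YYYY" "MyModule:Step1" "Done"  "$tmp/Summary.log"
cat "$tmp/Summary.log"
```

Using printf with explicit tab escapes avoids depending on how echo -e treats backslashes in the sample name.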