Standard Module Framework


DOC_ID : A04-1000S
Editor: Lucas Tung
Reviewer : Anita
 

Module design regulation

Module name

[Pipeline]_[Function_description].[Version].sh

  • Pipeline:
    • SN : Sentieon
    • GATK : GATK package
    • RSEM : RSEM RNAseq
    • QIIME2 : QIIME2 package
    • scRNA : Single cell RNAseq

  • Function description of the module:
    • GMP : Genome mapping paired
    • VC : Variant Calling

  • Version : X.Y.Z
    • X: Major version, tied to the module design
    • Y: Minor version, incremented when the function or operation of the module changes
    • Z: Patch version, for bug fixes that leave the function and operation of the module unchanged

Ex. SN_GMP.1.5.0.sh

– SN: Sentieon pipeline
– GMP: genome mapping paired module
– Major version: 1
– Minor version: 5
– Patch version: 0
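
The naming rule above can be checked mechanically. The following is a minimal sketch, assuming the underscore-and-dot layout of the example SN_GMP.1.5.0.sh; the function name parse_module_name is hypothetical and not part of the framework.

```shell
#!/bin/bash
# Hypothetical helper (not part of the framework): split a module file name
# of the form Pipeline_Function.X.Y.Z.sh into its fields.
parse_module_name() {
    file=$(basename "$1")
    # Reject names that do not follow the Pipeline_Function.X.Y.Z.sh pattern.
    if ! printf '%s\n' "$file" | grep -Eq '^[A-Za-z0-9]+_[A-Za-z0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.sh$'; then
        echo "invalid module name: $file" >&2
        return 1
    fi
    stem=${file%.sh}          # SN_GMP.1.5.0
    name=${stem%%.*}          # SN_GMP
    version=${stem#*.}        # 1.5.0
    echo "pipeline=${name%%_*} function=${name#*_} version=$version"
}

parse_module_name "SN_GMP.1.5.0.sh"
```

A name that does not match the pattern makes the function return a non-zero status, so it could be used as a guard before a module is moved out of the RC folder.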

Name of parameter

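  1. Directory variables must follow the project folder structure. Any variable that points to a default folder must use the standard parameter name (e.g. $projDir, $rawDir …; see the standard project directory structure).
  2. In command args, a directory variable must end with a "/" (as at the end of the example path here) to avoid confusion (e.g. MyScript -v 'projDir=/project/GP1/user/').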
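
The trailing "/" rule can be enforced inside a module instead of relying on the caller. This is a minimal sketch; the helper name with_trailing_slash is hypothetical, and the mapping of $rawDir to the raw folder is assumed from the project folder structure shown in the module framework.

```shell
#!/bin/bash
# Hypothetical helper: return the directory value with a trailing "/".
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;    # already ends with "/"
        *)  printf '%s/\n' "$1" ;;   # append the missing "/"
    esac
}

projDir=$(with_trailing_slash "/project/GP1/user")
# Because projDir ends with "/", sub-folders can be appended directly.
rawDir="${projDir}raw/"
echo "$rawDir"
```

With the trailing "/" guaranteed, paths are built by plain concatenation and never produce values such as /project/GP1/userraw/.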
     

Module depository

Modules are stored in GA_bundle/Module and are grouped by minor version, as in the following example:

The major version folder

  • 1.X/: The current stable versions of the modules
  • 1.X/RC: Release candidates or versions being patched
  • 1.X/Arch: Retired versions of patched scripts

!!! RC version modules are not tested. These modules must not be used in real analysis projects.


Software environment of module

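A module may use software from the GA application collection or from the apps folder in GA_bundle. In both cases, the software install location is ultimately referenced through a bash variable.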

GA application collection:

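The GA system uses Conda to manage software installation and defines each integrated software environment as an application collection. Developers can design modules with the software already in a collection; to add new software, contact the administrator. To use a software environment, add the following steps at the beginning of the module to activate it: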

Ex.
## Make the conda command available in this shell
source ~/miniconda3/etc/profile.d/conda.sh

## Activate GApp environment
conda activate GApp

## Define application path
bwa="bwa"

## Execute application
$bwa mem -r refs/hg38.fasta.. 

GA_bundle apps folder:

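Some software is not available in the conda repository. Developers can install it in GA_bundle; the developer then syncs it to the users' home directories, and it is executed through the soft-link in the project folder.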

Ex.
## Define software path and version
bwa='apps/bwa0.2.71/bin/bwa'

## Execute application
$bwa mem -r refs/hg38.fasta.. 
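
Because every tool is reached through a bash variable, a module can verify the variable before running anything. The following is a minimal sketch; the function require_tool is hypothetical and not part of the framework.

```shell
#!/bin/bash
# Hypothetical check: report an error if a tool variable does not resolve
# to an executable (either a command on PATH or a path to a file).
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "required tool not found: $1" >&2
        return 1
    fi
}

bwa='apps/bwa0.2.71/bin/bwa'
require_tool "$bwa" || echo "bwa is not available at $bwa"
```

The same call works for a conda-provided tool (bwa="bwa") and for a tool under the apps soft-link, so the check does not need to know where the software came from.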


Module version control

  • SOP for creating and modifying a module
    1. Create the module from the module template (1.X) or from the original version (1.X.Y), and name it ModuleX.1.X.0_editorName.sh or ModuleX.1.X.Y+1_editorName.sh in the Version/RC folder.
    2. Test the module with the DemoDataSet ( \TeamSharing\DemoDataSet\XXX ).
    3. Move the module out of the RC folder when testing is done.
    4. Hold a meeting to publish the module:
      • Move the original version to the Arch folder.
      • Remove the _editorName tag from the module name.
      • Update the module document and note what was modified in this version.
     
  • Checklist for creating and modifying a module: in the module header, under "##section 2: Description", add the update information (version, date, and person), confirm the document ID, and update the demo code. Ex.
    #= version: 1.5.0, date: 20200107, modifier: anita
    #= version: 1.5.3, date: 20200221, modifier: Lucas, anita
    #= version: 2.0.0, date: 20200624, author: Lucas
    #= Document : MXX-XXXX
    #= demo code : bash MyModule.2.0.sh -p $(pwd)/ -i XXXX -s YYYY
    #= demo code : qsub -P projectID -W group_list=projectID -m e -M useremail@gmail.com -v 'projDir='$(pwd)'/,inFile=XXXX,sampleName=YYYY' MyModule.2.0.sh
    #= demo code (for big black) : qsub -m e -M useremail@gmail.com -v 'projDir='$(pwd)'/,inFile=XXXX,sampleName=YYYY' MyModule.2.0.sh
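
The working-copy name from step 1 of the SOP (ModuleX.1.X.Y+1_editorName.sh) can be derived from the current file name. This is a minimal sketch; the function next_patch_name is hypothetical, and it assumes the patch number has no leading zeros.

```shell
#!/bin/bash
# Hypothetical helper: build the RC working-copy name for the next patch
# version, e.g. SN_GMP.1.5.0.sh edited by anita -> SN_GMP.1.5.1_anita.sh
next_patch_name() {
    stem=${1%.sh}             # SN_GMP.1.5.0
    patch=${stem##*.}         # 0
    prefix=${stem%.*}         # SN_GMP.1.5
    echo "${prefix}.$((patch + 1))_$2.sh"
}

next_patch_name "SN_GMP.1.5.0.sh" "anita"
```

When the module is published, removing the _editorName tag from this name gives the final file name for the stable folder.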


Module framework

#!/bin/bash
##section 1: PBS arguments
#PBS -N MyModule
#PBS -l select=1:ncpus=20
#PBS -l place=pack
#PBS -q ngs96G
#PBS -o log/
#PBS -e log/

##section 2: Description
#= version: 1.0, date: , author:
#= version: 1.5.0, date: 20200107, modifier:
#= version: 1.5.3, date: 20200221, modifier:
#= version: 2.0.0, date: 20200624, author:
#= Document : MXX-XXXX
#= demo code : bash MyModule.2.0.sh -p $(pwd)/ -i XXXX -s YYYY
#= demo code : qsub -P projectID -W group_list=projectID -m e -M useremail@gmail.com -v 'projDir='$(pwd)'/,inFile=XXXX,sampleName=YYYY' MyModule.2.0.sh
#= demo code (for big black) : qsub -m e -M useremail@gmail.com -v 'projDir='$(pwd)'/,inFile=XXXX,sampleName=YYYY' MyModule.2.0.sh

## Prerequisite: project folder structure
## [projDir] project
##   +-- [rawDir] raw
##   +-- [prcDir] processed
##   +-- [anaDir] analyzed
##   +-- [rptDir] report
##   +-- [tmpDir] temp
##   +-- [logDir] log
##   +-- [qcDir] QC

##section 3: Software path
## Make the conda command available in this shell
source ~/miniconda3/etc/profile.d/conda.sh

## Activate the GApp environment and define application paths
conda activate GApp
gatk="gatk"
refs="Homo_sapiens/NCBI/GRCh38Decoy/Sequence/BWAIndex/genome.fa"

# Homemade software is in the soft-linked [app] folder

##section 4: Parsing args and set default value
## Args setup for the Linux console; it is skipped when the job is submitted through PBS Pro
while getopts "i:s:p:" argv
do
  case $argv in
    i) inFile=$OPTARG
      ;;
    s) sampleName=$OPTARG
      ;;
    p) projDir=$OPTARG
      ;;
  esac
done

## Check args input and set default value
if [ -z "$inFile" ]; then
  echo "-i input File Name must be specified!"
  exit 1
fi
if [ -z "$sampleName" ]; then
  echo "-s output sample Name must be specified!"
  exit 1
fi
if [ -z "$projDir" ]; then
  echo "-p projDir must be specified!"
  exit 1
fi

##section 5: Executing script
## The module runs from the project root; stop if the folder cannot be entered.
cd "$projDir" || exit 1

#Step 1:
# Write log to log/Summary.log
# Essential log information:
# time: $(date), SampleName, ModuleName:Step, Current status
echo -e "$(date)\t$sampleName\tMyModule:Step1\tStart" >> log/Summary.log

if [ ! -f "processed/$sampleName.ubam" ]; then # Skip when the result file already exists
   $gatk FastqToSam \
   --java-options "-Djava.io.tmpdir=$tmpDir" \
   -F1 "raw/${inFile}_R1.fastq.gz" \
   -F2 "raw/${inFile}_R2.fastq.gz" \
   -O "processed/$sampleName.ubam" \
   -RG "$RG" \
   -SM "$SM" \
   -PU "$PU" \
   -PL "$PL" \
   -LB "$LB"
   # Check whether the step finished successfully
   if [ $? -eq 0 ]; then
      echo -e "$(date)\t$sampleName\tMyModule:Step1\tDone" >> log/Summary.log
   else
      # If it failed, move the intermediate file to temp and report -1
      echo -e "$(date)\t$sampleName\tMyModule:Step1\tStop" >> log/Summary.log
      mv "processed/$sampleName.ubam" "temp/$sampleName.ubam.err"
      exit -1
   fi
else
   echo -e "$(date)\t$sampleName\tMyModule:Step1\tSkip" >> log/Summary.log
fi
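
The four log lines in Step 1 share one format: time, sample name, ModuleName:Step, status, separated by tabs. A small function can keep that format in one place. This is a minimal sketch; the function log_status and the logFile override are hypothetical and not part of the framework, and the demo writes to a temporary file instead of log/Summary.log.

```shell
#!/bin/bash
# Hypothetical helper wrapping the Summary.log convention of the framework.
# $1 = sample name, $2 = ModuleName:Step, $3 = status (Start/Done/Stop/Skip)
log_status() {
    printf '%s\t%s\t%s\t%s\n' "$(date)" "$1" "$2" "$3" >> "${logFile:-log/Summary.log}"
}

# Demo: write to a temporary file so the sketch runs outside a project folder.
logFile=$(mktemp)
log_status "YYYY" "MyModule:Step1" "Start"
log_status "YYYY" "MyModule:Step1" "Done"
cat "$logFile"
```

Inside a module, leaving logFile unset sends the lines to log/Summary.log relative to the project root, which matches the echo commands in Step 1.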