@@ -88,8 +88,8 @@ Instructions are made for release 20.2.
 
 1. Create working directory.
 ```console
-mkdir --mode=775 ${HOME_PATH}/stats_pipeline_input_dr20.2
-cd ${HOME_PATH}/stats_pipeline_input_dr20.2
+mkdir --mode=775 ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
+cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
 ```
 
 2. Copy the input parquet files (±80*10^6 data points) and mp_chooser_json.
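Since the input arrives as many parquet part files, a partial copy can easily go unnoticed. A minimal sanity check after the copy is sketched below; `demo_input` and its file names are placeholders (in a real run, point `input_dir` at the `stats_pipeline_input_dr20.2` working directory):

```shell
# Placeholder directory standing in for stats_pipeline_input_dr20.2;
# the part files created here are illustrative only.
input_dir=demo_input
mkdir -p "$input_dir"
touch "$input_dir"/part-00000.parquet "$input_dir"/part-00001.parquet

# Count the parquet part files and report the total size on disk,
# so a truncated copy shows up immediately.
n_parts=$(find "$input_dir" -maxdepth 1 -name '*.parquet' | wc -l)
echo "parquet parts: $n_parts"
du -sh "$input_dir"
```

Comparing the part count and total size against the source directory is usually enough to confirm the transfer completed.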
@@ -102,9 +102,7 @@ According to [Observations Output Schema](https://github.com/mpi2/impc-etl/wiki/
 
 3. Convert JSON mp_chooser file to Rdata.
 ```console
-R
-a = jsonlite::fromJSON('part-*.txt');save(a,file='mp_chooser_20230411.json.Rdata')
-q()
+R -e "a = jsonlite::fromJSON('part-00000-b2483dca-4c84-4c90-a79b-e97df8c95091-c000.txt');save(a,file='mp_chooser_20230411.json.Rdata')"
 ```
 **Note:** we kept the name of the mp_chooser file exactly as mp_chooser_20230411.json.Rdata because this name is used in the code.
 
@@ -116,8 +114,8 @@ git clone https://github.com/mpi2/impc_stats_pipeline.git
 
 5. Update mp_chooser file in several directories.
 ```console
-cp ${HOME_PATH}/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/annotation/
-cp ${HOME_PATH}/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/jobs/Postgres
+cp ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/annotation/
+cp ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/jobs/Postgres
 ```
 
 6. Update master branch of the repository on GitHub with the new version of mp_chooser.
@@ -131,24 +129,23 @@ git push origin master
 7. Update packages to the latest version.
 ```console
 cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
-R
-source('https://raw.githubusercontent.com/mpi2/impc_stats_pipeline/master/Late%20adults%20stats%20pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/UpdatePackagesFromGithub.R')
-q()
+wget https://raw.githubusercontent.com/mpi2/impc_stats_pipeline/dev/Late%20adults%20stats%20pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/UpdatePackagesFromGithub.R
+Rscript UpdatePackagesFromGithub.R mpi2 master
+rm UpdatePackagesFromGithub.R
 ```
 
 ### Run Statistical Pipeline
 8. Start screen.
 ```console
 cd ~
 screen -S stats-pipeline
-bsub -Is -q long bash
 cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
 ```
 
 9. Run statistical pipeline.
 ```console
-R
-DRrequiredAgeing:::StatsPipeline(DRversion=20.2)
+alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'
+bsub-1gb -o ../stats_pipeline_logs/stats_pipeline_20.2.log -e ../stats_pipeline_logs/stats_pipeline_20.2.err R -e 'DRrequiredAgeing:::StatsPipeline(DRversion=20.2)'
 ```
 - To leave the screen session, press `Ctrl + A`, then `D`.
 - Don't forget to write down the number that appears after detaching (for example, 3507472) and the number of the cluster node.
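The `bsub-1gb` alias used in the submission above bundles the LSF resource flags: `-q long` selects the long queue, `-R "select[type==X86_64 && mem > 1000] rusage[mem=1000]"` asks the scheduler for an X86_64 host with more than 1000 MB of free memory and reserves 1000 MB for the job, and `-M1000` sets the per-job memory limit. A small sketch of the expansion is below; nothing is submitted to LSF, it only prints what the alias stands for:

```shell
# Alias expansion demo; no LSF required.
# expand_aliases is needed only if you later *invoke* the alias from a
# non-interactive script; the alias builtin itself works regardless.
shopt -s expand_aliases
alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'

# Print the stored definition to see exactly what a bsub-1gb submission contains.
expansion=$(alias bsub-1gb)
echo "$expansion"
```

Defining the alias once per login shell (or per screen session) keeps the two long submission lines in this document identical except for the log names and the R call.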
@@ -162,10 +159,9 @@ DRrequiredAgeing:::StatsPipeline(DRversion=20.2)
 ## Step 2. Run Annotation Pipeline
 The `IMPC_HadoopLoad` command uses the LSF cluster to assign annotations to the StatPackets and transfer the files to the Hadoop cluster. The files will be transferred to Hadoop:/hadoop/user/mi_stats/impc/statpackets/DRXX.
 ```console
-q()
 cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/SP/jobs/Results_IMPC_SP_Windowed
-R
-DRrequiredAgeing:::IMPC_HadoopLoad(prefix='DR20.2',transfer=FALSE)
+alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'
+bsub-1gb -o ../stats_pipeline_logs/annotation_pipeline_20.2.log -e ../stats_pipeline_logs/annotation_pipeline_20.2.err R -e 'DRrequiredAgeing:::IMPC_HadoopLoad(prefix="DR20.2",transfer=FALSE)'
 ```
 - The most complex part of this process is that some files will fail to transfer, and you will need to use the `scp` command to copy them to the Hadoop cluster manually.
 - When you are sure that all files are there, you can share the path with Federico.
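Before retrying with `scp`, it helps to know exactly which StatPackets are missing on the Hadoop side. A sketch of that check using `comm` on two sorted listings is below; the `demo_local`/`demo_remote` directories and file names are placeholders (in practice the second listing would come from the Hadoop cluster, e.g. over `ssh`, and the real paths would be `Results_IMPC_SP_Windowed` and `/hadoop/user/mi_stats/impc/statpackets/DR20.2`):

```shell
# Demo directories standing in for the local results directory and the
# Hadoop target; the .statpacket files created here are illustrative only.
mkdir -p demo_local demo_remote
touch demo_local/SP_a.statpacket demo_local/SP_b.statpacket demo_remote/SP_a.statpacket

# Sorted listings of both sides.
ls demo_local | sort > local.list
ls demo_remote | sort > remote.list

# comm -23 keeps lines unique to the first file: present locally,
# absent remotely -- exactly the files whose transfer failed.
comm -23 local.list remote.list > missing.list
cat missing.list

# Retry sketch (hostname and user are assumptions):
# while read -r f; do
#   scp "demo_local/$f" user@hadoop-host:/hadoop/user/mi_stats/impc/statpackets/DR20.2/
# done < missing.list
```

Rerunning the comparison until `missing.list` is empty gives a concrete "all files are there" check before sharing the path.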