Commit a28c1d5

Merge pull request #53 from marinak-ebi/update-instructions
Update instructions
2 parents 4d479de + c2b7b59

1 file changed, +12 -16 lines changed

README.md

Lines changed: 12 additions & 16 deletions
@@ -88,8 +88,8 @@ Instructions are made for release 20.2.
 
 1. Create working directory.
 ```console
-mkdir --mode=775 ${HOME_PATH}/stats_pipeline_input_dr20.2
-cd ${HOME_PATH}/stats_pipeline_input_dr20.2
+mkdir --mode=775 ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
+cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
 ```
 
 2. Copy the input parquet files (±80*10^6 data points) and mp_chooser_json.
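
The updated step 1 spells out the full `${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/...` path with the release number typed in. A minimal sketch, assuming a POSIX shell, of building that path in one place and failing early when `KOMP_PATH` is not set; `stats_input_dir` and `DR_VERSION` are hypothetical names, not part of the pipeline.

```shell
# Hypothetical helper (not part of impc_stats_pipeline): print the input
# directory for a data release, following the layout used in the README.
stats_input_dir() {
  # ${VAR:?msg} makes a non-interactive shell exit with an error if
  # KOMP_PATH is unset or empty, instead of producing a path under "/".
  : "${KOMP_PATH:?KOMP_PATH must be set}"
  dr_version="$1"
  printf '%s\n' "${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr${dr_version}"
}

# Usage (commented out so the sketch has no side effects):
# DR_VERSION=20.2
# INPUT_DIR=$(stats_input_dir "$DR_VERSION") || exit 1
# mkdir --mode=775 "$INPUT_DIR" && cd "$INPUT_DIR"
```

Calling the helper inside `$(...)` means only that subshell exits on a missing `KOMP_PATH`, so the `|| exit 1` decides what the calling script does.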
@@ -102,9 +102,7 @@ According to [Observations Output Schema](https://github.com/mpi2/impc-etl/wiki/
 
 3. Convert JSON mp_chooser file to Rdata.
 ```console
-R
-a = jsonlite::fromJSON('part-*.txt');save(a,file='mp_chooser_20230411.json.Rdata')
-q()
+R -e "a = jsonlite::fromJSON('part-00000-b2483dca-4c84-4c90-a79b-e97df8c95091-c000.txt');save(a,file='mp_chooser_20230411.json.Rdata')"
 ```
 **Note:** we kept the name of the mp_chooser file exactly as mp_chooser_20230411.json.Rdata, because it is used on the code.
 
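
`jsonlite::fromJSON` accepts a JSON string, URL or file path and does not expand wildcards, so the old `'part-*.txt'` argument would not match the Spark part file. The new line names the file explicitly, but that hashed name is likely to differ for every export. A minimal sketch, assuming a POSIX shell and exactly one part file per export directory, that lets the shell resolve the name before calling R; `find_single_part_file` is a hypothetical helper.

```shell
# Hypothetical helper: print the single part-*.txt file in a directory
# (default: current directory), or fail if there are zero or several.
find_single_part_file() {
  dir="${1:-.}"
  # Positional parameters are local to the function, so this is safe.
  set -- "$dir"/part-*.txt
  # With no match, the pattern stays unexpanded and does not exist.
  if [ ! -e "$1" ]; then
    echo "no part-*.txt file in $dir" >&2
    return 1
  fi
  if [ "$#" -ne 1 ]; then
    echo "expected one part-*.txt file in $dir, found $#" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage (commented out; assumes R with jsonlite is on PATH):
# part_file=$(find_single_part_file .) || exit 1
# R -e "a = jsonlite::fromJSON('${part_file}');save(a,file='mp_chooser_20230411.json.Rdata')"
```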
@@ -116,8 +114,8 @@ git clone https://github.com/mpi2/impc_stats_pipeline.git
 
 5. Update mp_chooser file in several directories.
 ```console
-cp ${HOME_PATH}/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/annotation/
-cp ${HOME_PATH}/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/jobs/Postgres
+cp ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/annotation/
+cp ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata /tmp/impc_stats_pipeline/Late\ adults\ stats\ pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/jobs/Postgres
 ```
 
 6. Update master branch of the repository on GitHub with the new version of mp_chooser.
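
Step 5 copies the same `mp_chooser_20230411.json.Rdata` into two package directories, and only the first destination ends in `/`. Without the trailing slash, `cp` would silently create a regular file named `Postgres` if that directory did not exist. A minimal sketch of the same copy as a loop that always appends `/`, so a missing directory makes `cp` fail instead; `copy_to_dirs` is a hypothetical helper.

```shell
# Hypothetical helper: copy one file into each of the given directories.
# The trailing "/" makes cp fail if a destination is not an existing
# directory; the loop stops at the first failure.
copy_to_dirs() {
  src="$1"
  shift
  for dest in "$@"; do
    cp -- "$src" "$dest/" || return 1
  done
}

# Usage (commented out). Paths are taken from the README; quoting replaces
# the backslash-escaped spaces in "Late adults stats pipeline".
# pkg="/tmp/impc_stats_pipeline/Late adults stats pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata"
# copy_to_dirs \
#   "${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/mp_chooser_20230411.json.Rdata" \
#   "$pkg/annotation" "$pkg/StatsPipeline/jobs/Postgres"
```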
@@ -131,24 +129,23 @@ git push origin master
 7. Update packages to the latest version.
 ```console
 cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
-R
-source('https://raw.githubusercontent.com/mpi2/impc_stats_pipeline/master/Late%20adults%20stats%20pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/UpdatePackagesFromGithub.R')
-q()
+wget https://raw.githubusercontent.com/mpi2/impc_stats_pipeline/dev/Late%20adults%20stats%20pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/UpdatePackagesFromGithub.R
+Rscript UpdatePackagesFromGithub.R mpi2 master
+rm UpdatePackagesFromGithub.R
 ```
 
 ### Run Statistical Pipeline
 8. Start screen.
 ```console
 cd ~
 screen -S stats-pipeline
-bsub -Is -q long bash
 cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2
 ```
 
 9. Run statistical pipeline.
 ```console
-R
-DRrequiredAgeing:::StatsPipeline(DRversion=20.2)
+alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'
+bsub-1gb -o ../stats_pipeline_logs/stats_pipeline_20.2.log -e ../stats_pipeline_logs/stats_pipeline_20.2.err R -e 'DRrequiredAgeing:::StatsPipeline(DRversion=20.2)'
 ```
 - To leave screen press combination `Ctrl + A + D`.
 - Don't forget to write down the number that will appear after leaving the screen, for example, 3507472, and number of cluster node.
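
The new step 7 downloads `UpdatePackagesFromGithub.R` (from the `dev` branch) into the working directory, runs it with the arguments `mpi2 master`, and deletes it in a separate command, so a failed run can leave the file behind. A minimal sketch, assuming `wget` and `Rscript` are on `PATH` and that the script does not depend on its own file name, that downloads to a temporary file and removes it whether or not `Rscript` succeeds; `run_remote_rscript` is a hypothetical helper.

```shell
# Hypothetical helper: download an R script to a temporary file, run it with
# Rscript and the remaining arguments, then delete it whether or not Rscript
# succeeded. Returns Rscript's exit status, or 1 if the download fails.
run_remote_rscript() {
  url="$1"
  shift
  tmp_script=$(mktemp) || return 1
  if ! wget -q -O "$tmp_script" "$url"; then
    echo "download failed: $url" >&2
    rm -f "$tmp_script"
    return 1
  fi
  status=0
  Rscript "$tmp_script" "$@" || status=$?
  rm -f "$tmp_script"
  return "$status"
}

# Usage (commented out), with the URL and arguments from the README:
# run_remote_rscript 'https://raw.githubusercontent.com/mpi2/impc_stats_pipeline/dev/Late%20adults%20stats%20pipeline/DRrequiredAgeing/DRrequiredAgeingPackage/inst/extdata/StatsPipeline/UpdatePackagesFromGithub.R' mpi2 master
```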
@@ -162,10 +159,9 @@ DRrequiredAgeing:::StatsPipeline(DRversion=20.2)
 ## Step 2. Run Annotation Pipeline
 The `IMPC_HadoopLoad` command uses the power of LSF cluster to assign the annotations to the StatPackets and transfers the files to the Hadoop cluster. The files will be transferred to Hadoop:/hadoop/user/mi_stats/impc/statpackets/DRXX.
 ```console
-q()
 cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr20.2/SP/jobs/Results_IMPC_SP_Windowed
-R
-DRrequiredAgeing:::IMPC_HadoopLoad(prefix='DR20.2',transfer=FALSE)
+alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'
+bsub-1gb -o ../stats_pipeline_logs/annotation_pipeline_20.2.log -e ../stats_pipeline_logs/annotation_pipeline_20.2.err R -e 'DRrequiredAgeing:::IMPC_HadoopLoad(prefix="DR20.2",transfer=FALSE)'
 ```
 - The most complex part of this process is that some files will fail to transfer and you need to use scp command to transfer files to the Hadoop cluster manually.
 - When you are sure that all files are there, you can share the path with Federico.
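
Step 9 and Step 2 both define the `bsub-1gb` alias before using it. Bash expands aliases only in interactive shells unless `shopt -s expand_aliases` is set, so the alias stops working if these lines are moved into a script. A minimal sketch of a shell-function equivalent, plus a hypothetical `job_id_from_bsub_output` helper that reads the job ID from bsub's usual `Job <ID> is submitted to queue <name>.` confirmation. The usage comment also assumes one absolute log directory (`LOG_DIR`, a hypothetical name): the relative `../stats_pipeline_logs` resolves to `IMPC_DRs/stats_pipeline_logs` in step 9 but to `stats_pipeline_input_dr20.2/SP/jobs/stats_pipeline_logs` in Step 2, assuming the job runs in its submission directory.

```shell
# Function equivalent of the README's bsub-1gb alias: same queue, resource
# string and memory limit. Extra arguments (-o, -e, the command to run) are
# passed through to bsub. Unlike an alias, it also works in scripts.
bsub_1gb() {
  bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000 "$@"
}

# Hypothetical helper: read bsub output on stdin and print the numeric job ID
# from a line such as "Job <12345> is submitted to queue <long>."
job_id_from_bsub_output() {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Usage (commented out; requires an LSF cluster and R with DRrequiredAgeing):
# LOG_DIR="${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_logs"
# mkdir -p "$LOG_DIR"
# job_id=$(bsub_1gb -o "$LOG_DIR/stats_pipeline_20.2.log" \
#   -e "$LOG_DIR/stats_pipeline_20.2.err" \
#   R -e 'DRrequiredAgeing:::StatsPipeline(DRversion=20.2)' | job_id_from_bsub_output)
# echo "Submitted job ${job_id}"    # check its state later with: bjobs "$job_id"
```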

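
The first note in Step 2 says some files fail to transfer and have to be copied to the Hadoop cluster with `scp` by hand. A minimal sketch of a generic retry wrapper for such manual copies, assuming a POSIX shell; `retry`, `HADOOP_HOST` and `failed_statpacket_file` are placeholders, and `DRXX` stands for the release directory as in the README.

```shell
# Hypothetical helper: run a command up to max_attempts times, sleeping
# delay seconds between attempts. Returns 0 on the first success, 1 if
# every attempt fails.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "failed after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (commented out; HADOOP_HOST and the file name are placeholders):
# retry 3 10 scp failed_statpacket_file "${HADOOP_HOST}:/hadoop/user/mi_stats/impc/statpackets/DRXX/"
```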