2. Set necessary variables:

```console
export VERSION="21.0"
export REMOTE="mpi2"
export BRANCH="master"
export KOMP_PATH="<absolute_path_to_directory>"
```
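As a quick sanity check (a minimal sketch; it only assumes the variables above were exported in the current shell), confirm they are visible before proceeding:

```console
# Print the variables the rest of this walkthrough relies on
echo "VERSION=${VERSION} REMOTE=${REMOTE} BRANCH=${BRANCH}"
# The working directory should already exist under KOMP_PATH
ls "${KOMP_PATH}"
```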
**Note:** Be cautious, the location of the input files may vary.<br>
Refer to the [Observations Output Schema](https://github.com/mpi2/impc-etl/wiki/Observations-Output-Schema). In the current dataset, some fields that should be arrays are presented as comma-separated lists.
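For illustration, such fields may need to be split back into arrays before use; a minimal R sketch, using a hypothetical comma-separated value:

```console
# Split a comma-separated string into a character vector (an array in R terms)
R -e 'x <- "MP:0001,MP:0002,MP:0003"; strsplit(x, ",")[[1]]'
```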
5. Convert the mp_chooser JSON file to Rdata:

```console
R -e "a = jsonlite::fromJSON('mp_chooser.json');save(a,file='mp_chooser.json.Rdata')"
```
**Note:** Log in with [credentials](https://www.ebi.ac.uk/seqdb/confluence/display/MouseInformatics/GitHub+Machine+User) using a personal access token for the impc-stats-pipeline repo.
6. Update packages to the latest version:

```console
cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr${VERSION}
```

**Note:** Remember to note down the job ID number that will appear after submitting the job.
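The exact update commands are release-specific, but as a hypothetical sketch (assuming the R packages are reinstalled from the `${REMOTE}/impc-stats-pipeline` GitHub repository on branch `${BRANCH}`; the in-repo package sub-path, if any, would need adjusting):

```console
# Hypothetical: reinstall the pipeline packages from GitHub
R -e "devtools::install_github('${REMOTE}/impc-stats-pipeline', ref = '${BRANCH}', upgrade = 'always')"
```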
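The statistical pipeline runs for a long time, so launch it from inside a screen session (a minimal sketch; the session name is chosen to match the `screen -r 3507472.stats-pipeline` example used later):

```console
# Start a named screen session to survive disconnects
screen -S stats-pipeline
```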
7. Run the statistical pipeline:

```console
alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'
bsub-1gb -o ../stats_pipeline_logs/stats_pipeline_${VERSION}.log -e ../stats_pipeline_logs/stats_pipeline_${VERSION}.err R -e "DRrequiredAgeing:::StatsPipeline(DRversion=${VERSION})"
```
- To leave the screen session, press the combination `Ctrl + A + D`.
- Don't forget to write down the number that appears after leaving the screen, for example, 3507472.
- Also make sure to remember which login node you started the screen session on.
8. Monitor progress using the following commands:
- Activate screen to check progress: `screen -r 3507472.stats-pipeline`
- Use `squeue` to check job status.
- Review the log files:

```console
less ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_logs/stats_pipeline_${VERSION}.log
less ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_logs/stats_pipeline_${VERSION}.err
```
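For example, to list only your own jobs with a custom format (a sketch; the flags shown are standard SLURM `squeue` options, and `$USER` is assumed to be your cluster username):

```console
# Job ID, partition, name, state, and elapsed time for your jobs
squeue -u "$USER" -o "%.18i %.9P %.30j %.8T %.10M"
```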
## Step 2. Run Annotation Pipeline
The `IMPC_HadoopLoad` command uses the power of the cluster to assign the annotations to the StatPackets and transfers the files to the Hadoop cluster, under `Hadoop:/hadoop/user/mi_stats/impc/statpackets/DRXX`.
1. Reconnect to the screen session. Make sure to connect to the same login node you used to start the screen session.
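For example, to get back onto the node and reattach (both commands as used earlier in this walkthrough; the node name and session number are the ones you noted down):

```console
ssh codon-login-01
screen -r 3507472.stats-pipeline
```

Then submit the annotation job: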
```console
cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr${VERSION}/SP/jobs/Results_IMPC_SP_Windowed
alias bsub-1gb='bsub -q long -R "select[type==X86_64 && mem > 1000] rusage[mem=1000]" -M1000'
bsub-1gb -o ../stats_pipeline_logs/annotation_pipeline_${VERSION}.log -e ../stats_pipeline_logs/annotation_pipeline_${VERSION}.err R -e "DRrequiredAgeing:::IMPC_HadoopLoad(prefix='DR${VERSION}',transfer=FALSE)"
```
- The most complex part of this process is that some files may fail to transfer, and you will need to use the `scp` command to copy them to the Hadoop cluster manually (a sketch follows this list).
- When you are sure that all files are there, you can share the path with Federico.
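Such a manual transfer might look like the following hypothetical sketch; the Hadoop host name and the file name are placeholders, and the destination follows the path pattern given above:

```console
# Copy a StatPacket that failed the automated transfer (placeholders shown)
scp failed_file.statpacket user@hadoop-host:/hadoop/user/mi_stats/impc/statpackets/DR${VERSION}/
```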
**Note:** In the slides `transfer=TRUE`; here we ran with `transfer=FALSE`, which means the files have not been transferred this time.
This process generates statistical reports typically utilized by the IMPC working groups.
3. The commands below will generate two CSV files in the `${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_drXX.y/SP/jobs/Results_IMPC_SP_Windowed` directory for the unidimensional and categorical results. The files can be gzipped and moved to the FTP directory. You can decorate and format the files using one of the formatted files from previous data releases.
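For instance, compressing the two outputs in place could look like this (a sketch; the CSV file names are placeholders for the actual generated names):

```console
cd ${KOMP_PATH}/impc_statistical_pipeline/IMPC_DRs/stats_pipeline_input_dr${VERSION}/SP/jobs/Results_IMPC_SP_Windowed
# Compress the result files before moving them to the FTP directory
gzip unidimensional_results.csv categorical_results.csv
```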