replace URLs with versioned urls where possible since some are 'disappearing' already #15

@yarikoptic

What would you like to do:

  • Report an issue

While preparing a datalad dataset we ran into a bunch of URLs returning 404 because the files had been deleted from the bucket. It seems, though, that versioning was enabled on the bucket after those files were added and before they were removed, so those versions (or some other versions) may still be available if the null version ID is provided, e.g.

$> wget -S 'http://fcp-indi.s3.amazonaws.com/data/Projects/CORR/Outputs/IBA_TRT/freesurfer/0027256-session_2/mri/T1.mgz?versionId=null' 
--2018-03-23 09:06:04--  http://fcp-indi.s3.amazonaws.com/data/Projects/CORR/Outputs/IBA_TRT/freesurfer/0027256-session_2/mri/T1.mgz?versionId=null
Resolving fcp-indi.s3.amazonaws.com (fcp-indi.s3.amazonaws.com)... 52.216.133.139
Connecting to fcp-indi.s3.amazonaws.com (fcp-indi.s3.amazonaws.com)|52.216.133.139|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  x-amz-id-2: jv1iiXrsK4IGUiRUAESIivfdxWabFalvSyDeW5SeHN0fpqfqY21l50xXf81cqvEsso8sBd8UOVA=
  x-amz-request-id: FA464ADD438B76F3
  Date: Fri, 23 Mar 2018 13:06:05 GMT
  Last-Modified: Mon, 17 Oct 2016 19:49:07 GMT
  ETag: "f71962c9688a8cc17e4e6ddff40c1946"
  x-amz-version-id: null
  Accept-Ranges: bytes
  Content-Type: application/octet-stream
  Content-Length: 3777778
  Server: AmazonS3
Length: 3777778 (3,6M) [application/octet-stream]
Saving to: ‘T1.mgz?versionId=null’

T1.mgz?versionId=null                                    100%[================================================================================================================================>]   3,60M  1,21MB/s    in 3,0s    

2018-03-23 09:06:07 (1,21 MB/s) - ‘T1.mgz?versionId=null’ saved [3777778/3777778]

$> wget -S 'http://fcp-indi.s3.amazonaws.com/data/Projects/CORR/Outputs/IBA_TRT/freesurfer/0027256-session_2/mri/T1.mgz'               
--2018-03-23 09:13:40--  http://fcp-indi.s3.amazonaws.com/data/Projects/CORR/Outputs/IBA_TRT/freesurfer/0027256-session_2/mri/T1.mgz
Resolving fcp-indi.s3.amazonaws.com (fcp-indi.s3.amazonaws.com)... 54.231.33.131
Connecting to fcp-indi.s3.amazonaws.com (fcp-indi.s3.amazonaws.com)|54.231.33.131|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 404 Not Found
  x-amz-request-id: D524E765315BF904
  x-amz-id-2: A49WIVJJZJB+N92BpqNIiSt75osl29SojPLHKzvgX1XPZRumO+43YGBjwwfPSEYWrTBCBwmxqX4=
  x-amz-delete-marker: true
  x-amz-version-id: ZT77s.ror9NN7Yt7bjGtH5h36leBw8Yp
  Content-Type: application/xml
  Transfer-Encoding: chunked
  Date: Fri, 23 Mar 2018 13:13:39 GMT
  Server: AmazonS3
2018-03-23 09:13:40 ERROR 404: Not Found.
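
The two responses above differ in a way that can be checked programmatically: the unversioned request returns 404 together with an `x-amz-delete-marker: true` header, while the `versionId=null` request returns 200. A minimal Python sketch of that check follows; the helper names `head` and `is_delete_marker` are hypothetical, and it assumes the bucket allows anonymous HEAD requests the same way it allowed the wget requests above.

```python
import urllib.error
import urllib.request


def head(url):
    """Send a HEAD request and return (status, headers) even for HTTP errors."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry the S3 headers we care about
        return err.code, dict(err.headers)


def is_delete_marker(status, headers):
    """True if the response looks like the 404 shown above: the current
    version of the key is a delete marker, so older versions may still exist."""
    normalized = {k.lower(): v for k, v in headers.items()}
    return status == 404 and normalized.get("x-amz-delete-marker", "").lower() == "true"
```

A URL for which `is_delete_marker(*head(url))` is true would then be a candidate for retrying with an explicit `versionId`.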

Since many URLs come from the versioned fcp-indi bucket, I wonder whether it would be good to remove the ambiguity and make access more robust (unless the bucket gets removed/recreated, which would invalidate the versionIds) by replacing URLs with versioned URLs, like
http://fcp-indi.s3.amazonaws.com/data/Projects/BGSP/orig_bids/sub-1435/ses-01/anat/sub-1435_ses-01_T1w.nii.gz?versionId=ZzwCQ1fzDpWfUZzNvVGqwAONQ_QL.eI9
instead of
http://fcp-indi.s3.amazonaws.com/data/Projects/BGSP/orig_bids/sub-1435/ses-01/anat/sub-1435_ses-01_T1w.nii.gz
datalad ls could be of help here:

$> datalad ls -aL s3://fcp-indi/data/Projects/BGSP/orig_bids/sub-1435/ses-01/anat/sub-1435_ses-01_T1w.nii.gz                                                                          
Connecting to bucket: fcp-indi
[INFO   ] S3 session: Connecting to the bucket fcp-indi 
Bucket info:
  Versioning: S3ResponseError: 403 Forbidden
     Website: S3ResponseError: 403 Forbidden
         ACL: S3ResponseError: 403 Forbidden
data/Projects/BGSP/orig_bids/sub-1435/ses-01/anat/sub-1435_ses-01_T1w.nii.gz 2016-12-04T13:20:43.000Z 4853715 ver:ZzwCQ1fzDpWfUZzNvVGqwAONQ_QL.eI9  acl:AccessDenied  http://fcp-indi.s3.amazonaws.com/data/Projects/BGSP/orig_bids/sub-1435/ses-01/anat/sub-1435_ses-01_T1w.nii.gz?versionId=ZzwCQ1fzDpWfUZzNvVGqwAONQ_QL.eI9 [OK]
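
As a rough illustration of the proposed rewrite, the Python sketch below pulls the version ID out of a listing line and pins it onto a plain URL. The function names are hypothetical, and the `ver:<id>` field layout is assumed only from the single output line shown above, not from any documented datalad format.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def version_id_from_ls_line(line):
    """Extract the value of the 'ver:' field from a listing line, or None."""
    match = re.search(r"\bver:(\S+)", line)
    return match.group(1) if match else None


def with_version_id(url, version_id):
    """Return url with its versionId query parameter set to version_id,
    replacing any versionId already present."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "versionId"]
    query.append(("versionId", version_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Applied to the listing line above and the unversioned BGSP URL, this would reproduce the versioned URL that datalad ls prints at the end of the line.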
