Fetching the HBN example dataset

This example demonstrates how to use bids_bep16_conv.datasets to fetch the HBN example dataset.

Much of the functionality of the bids_bep16_conv toolbox relies on downloading candidate example datasets. Each dataset has its own functions to check and evaluate QC files to find suitable participants, as well as dedicated download functions that obtain the data from the BIDS connectivity OSF project. The files differ between example datasets and the pipeline/workflow used to produce them, but they are obtained in a way that they conform to BIDS common derivatives, specifically so that they can serve as input for tools that generate BEP16-related output.

Here we show how the HBN example dataset was generated and how it can be accessed, by describing the workflow and the functions used.

The HBN dataset and its derivatives are provided openly via the FCP-INDI AWS bucket, which contains the outputs of various pipelines/workflows. Here, we focus on the preprocessing conducted via QSIprep.

First, we need to find a suitable participant in terms of overall data quality. Luckily, QSIprep provides a file that includes a quality control score for each participant. Using the datasets.get_HBN_qc() function, we can obtain and check this file:

from bids_bep16_conv import datasets

HBN_qc_file = datasets.get_HBN_qc(return_df=True)
print(HBN_qc_file)

Data will be downloaded to bids_bep16_datasets/HBN/source-HBN_desc-qsiprep_participants.tsv

  0%|          | 0/210607 [00:00<?, ?it/s]
 31%|###1      | 64.0k/206k [00:00<00:00, 628kB/s]
100%|##########| 206k/206k [00:00<00:00, 1.33MB/s]
            subject_id scan_site_id  ... dl_qc_score            site_variant
0     sub-NDARAA306NT2           RU  ...       0.470    RU_64dir_Most_Common
1     sub-NDARAA536PTU           SI  ...       0.701      SI_64dir_Obliquity
2     sub-NDARAA947ZG5         CBIC  ...       0.509  CBIC_64dir_Most_Common
3     sub-NDARAA948VFH           RU  ...       0.979    RU_64dir_Most_Common
4     sub-NDARAB055BPR           RU  ...       0.035    RU_64dir_Most_Common
...                ...          ...  ...         ...                     ...
2129  sub-NDARZW873DN3         CBIC  ...       0.982  CBIC_64dir_Most_Common
2130  sub-NDARZX163EWC         CBIC  ...       0.993  CBIC_64dir_Most_Common
2131  sub-NDARZY101JNB         CBIC  ...       0.992  CBIC_64dir_Most_Common
2132  sub-NDARZZ740MLM           RU  ...       0.014    RU_64dir_Most_Common
2133  sub-NDARZZ810LVF         CBIC  ...       0.861  CBIC_64dir_Most_Common

[2134 rows x 12 columns]
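
The file named in the download message is a plain tab-separated table, so it can also be inspected without the toolbox. The following is a minimal sketch using only the Python standard library. The helper name read_qc_rows is hypothetical and not part of bids_bep16_conv, and the in-memory example reproduces only three columns of two rows shown in the output above.

```python
import csv
import io


def read_qc_rows(handle):
    """Parse a tab-separated QC table into a list of dicts (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Convert the score from text to float so it can be compared numerically
        row["dl_qc_score"] = float(row["dl_qc_score"])
        rows.append(row)
    return rows


# In-memory stand-in for the TSV, using two rows from the printed DataFrame
example_tsv = (
    "subject_id\tscan_site_id\tdl_qc_score\n"
    "sub-NDARAA306NT2\tRU\t0.470\n"
    "sub-NDARAA536PTU\tSI\t0.701\n"
)
qc_rows = read_qc_rows(io.StringIO(example_tsv))
for row in qc_rows:
    print(row["subject_id"], row["dl_qc_score"])
```

For the real file, the same function could be called on a handle opened with open(path, newline=""), where path is the location printed in the download message.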

What we get is a DataFrame holding the content of QSIprep’s participants.tsv file. It contains various demographic variables as well as the quality control scores we are interested in. To make the evaluation more straightforward, we can use the datasets.eval_HBN_qc() function, which sorts the DataFrame by the dl_qc_score variable. We can furthermore indicate how many of the highest-scoring participants should be reported, and whether the sorted DataFrame and a raincloud plot of the dl_qc_score variable across the dataset should be returned.

Here, we are going to get the participants with the 3 highest scores, the sorted DataFrame and the raincloud plot.

HBN_qc_participants_df_sorted = datasets.eval_HBN_qc(HBN_qc_file,
                                                     n_high_participants=3,
                                                     visualize=True, return_sorted_df=True)

The 3 participants with the highest QC score are:
418     sub-NDAREK918EC2
2002    sub-NDARYM277DEA
1151    sub-NDARMV189NXG
Name: subject_id, dtype: object
/usr/share/miniconda/envs/bids_bep16_conv/lib/python3.8/site-packages/seaborn/_core.py:1303: UserWarning: Vertical orientation ignored with only `x` specified.
  warnings.warn(single_var_warning.format("Vertical", "x"))
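
Conceptually, selecting the highest-scoring participants amounts to ranking rows by dl_qc_score and keeping the first few. The following is a minimal sketch of that idea, not the toolbox's implementation. The helper top_n_by_score is hypothetical, and the input is restricted to the first five rows visible in the truncated printout of the QC table, so its result differs from the dataset-wide ranking reported above.

```python
import heapq

# (subject_id, dl_qc_score) pairs copied from the first five rows printed earlier
visible_rows = [
    ("sub-NDARAA306NT2", 0.470),
    ("sub-NDARAA536PTU", 0.701),
    ("sub-NDARAA947ZG5", 0.509),
    ("sub-NDARAA948VFH", 0.979),
    ("sub-NDARAB055BPR", 0.035),
]


def top_n_by_score(rows, n):
    """Return the n rows with the highest score, best first (hypothetical helper)."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


top_three = top_n_by_score(visible_rows, 3)
for subject_id, score in top_three:
    print(subject_id, score)
```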

As you can see in the raincloud plot, the score has a rather interesting distribution, and the Series printed above indicates that participant sub-NDAREK918EC2 has the highest dl_qc_score. However, closer inspection showed that this participant does not have all the files necessary to test multiple analysis pipelines and the respective conversion to BEP16. Thus, the data of the participant with the second highest dl_qc_score was used instead: sub-NDARYM277DEA’s QSIprep outputs were downloaded from the FCP-INDI AWS bucket and subsequently uploaded to the dataset component of the BIDS connectivity OSF project for access and management.
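
The selection logic described here, taking the best-ranked participant whose files are complete, can be sketched as follows. This is an illustration only: first_complete is a hypothetical helper, and the set of incomplete participants encodes the outcome of the manual inspection mentioned above rather than an automated check.

```python
# Ranking taken from the QC evaluation output, best score first
ranked = ["sub-NDAREK918EC2", "sub-NDARYM277DEA", "sub-NDARMV189NXG"]

# Participants found to lack required files during manual inspection
incomplete = {"sub-NDAREK918EC2"}


def first_complete(ranked_subjects, incomplete_subjects):
    """Return the best-ranked subject not flagged as incomplete, or None."""
    return next(
        (s for s in ranked_subjects if s not in incomplete_subjects), None
    )


selected = first_complete(ranked, incomplete)
print(selected)
```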

That being said, we can use the datasets.download_HBN() function to download the respective data, for example to our Desktop.

HBN_dataset_path = datasets.download_HBN()

Downloading sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.bval

  0%|          | 0/642 [00:00<?, ?it/s]
100%|##########| 642/642 [00:00<00:00, 715kB/s]
Downloading sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.bvec

  0%|          | 0/4457 [00:00<?, ?it/s]
100%|##########| 4.35k/4.35k [00:00<00:00, 4.86MB/s]
Downloading sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.nii.gz

  0%|          | 0/359208318 [00:00<?, ?it/s]
100%|##########| 343M/343M [00:06<00:00, 59.8MB/s]
Downloading sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-brain_mask.nii.gz

  0%|          | 0/13243 [00:00<?, ?it/s]
100%|##########| 12.9k/12.9k [00:00<00:00, 13.6MB/s]
Downloading sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.json

  0%|          | 0/3250 [00:00<?, ?it/s]
100%|##########| 3.17k/3.17k [00:00<00:00, 3.57MB/s]
Downloading QSIprep/dataset_description.json

  0%|          | 0/499 [00:00<?, ?it/s]
100%|##########| 499/499 [00:00<00:00, 569kB/s]
Downloading HBN/dataset_description.json

  0%|          | 0/60 [00:00<?, ?it/s]
100%|##########| 60.0/60.0 [00:00<00:00, 66.2kB/s]
The following HBN files are available:
HBN/
├─dataset_description.json
├─derivatives/
│ └─QSIprep/
│   ├─dataset_description.json
│   └─sub-NDARYM277DEA/
│     └─ses-HBNsiteCBIC/
│       └─dwi/
│         ├─sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.json
│         ├─sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.bval
│         ├─sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.nii.gz
│         ├─sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-brain_mask.nii.gz
│         └─sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w_desc-preproc_dwi.bvec
└─source-HBN_desc-qsiprep_participants.tsv
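
After a download it can be useful to confirm that every file in the tree above is actually present. The following is a minimal sketch under the assumption that the check is run against the HBN root directory. The helper find_missing is hypothetical and not part of bids_bep16_conv, and the demonstration uses a temporary directory containing a single file instead of the real download.

```python
import tempfile
from pathlib import Path

PREFIX = "sub-NDARYM277DEA_ses-HBNsiteCBIC_acq-64dir_space-T1w"
DWI_DIR = "derivatives/QSIprep/sub-NDARYM277DEA/ses-HBNsiteCBIC/dwi"

# Relative paths copied from the directory tree above
EXPECTED_FILES = [
    "dataset_description.json",
    "derivatives/QSIprep/dataset_description.json",
    f"{DWI_DIR}/{PREFIX}_desc-preproc_dwi.json",
    f"{DWI_DIR}/{PREFIX}_desc-preproc_dwi.bval",
    f"{DWI_DIR}/{PREFIX}_desc-preproc_dwi.bvec",
    f"{DWI_DIR}/{PREFIX}_desc-preproc_dwi.nii.gz",
    f"{DWI_DIR}/{PREFIX}_desc-brain_mask.nii.gz",
    "source-HBN_desc-qsiprep_participants.tsv",
]


def find_missing(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


# Demonstration on a temporary directory that contains only one of the files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dataset_description.json").touch()
    missing = find_missing(tmp)

print(len(missing), "of", len(EXPECTED_FILES), "expected files are missing")
```

An empty list returned by find_missing would indicate that all listed files were found.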

Importantly, this function not only obtains the participant’s QSIprep output but also obtains the dataset_description.json files and generates the data JSON sidecar file required by BIDS common derivatives. The latter is achieved by downloading the respective raw data JSON sidecar file and appending the needed inheritance-related and spatial reference-related information.
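
The sidecar step can be pictured as copying the raw metadata and adding derivative-specific fields. The sketch below assumes that the inheritance-related and spatial reference-related information corresponds to the Sources and SpatialReference fields defined for BIDS derivatives; which fields the toolbox actually writes is not stated here. The helper build_derivative_sidecar is hypothetical, and all values are placeholders rather than content of the HBN files.

```python
import json


def build_derivative_sidecar(raw_sidecar, sources, spatial_reference):
    """Return a copy of the raw sidecar with derivative metadata appended."""
    sidecar = dict(raw_sidecar)  # copy so the raw metadata stays untouched
    sidecar["Sources"] = list(sources)
    sidecar["SpatialReference"] = spatial_reference
    return sidecar


# Placeholder values for illustration only, not taken from the HBN files
raw_sidecar = {"EchoTime": 0.1}
derivative_sidecar = build_derivative_sidecar(
    raw_sidecar,
    sources=["path/to/raw_dwi.nii.gz"],
    spatial_reference="path/to/T1w_reference.nii.gz",
)
print(json.dumps(derivative_sidecar, indent=2))
```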

With that, we have a usable HBN sub-dataset that conforms to BIDS common derivatives and provides the inputs required by BEP16 and the respective further processing.
