Data input for BIDS datasets

DataGrabber and SelectFiles are great if you are dealing with generic datasets with arbitrary organization. However, if you have decided to use the Brain Imaging Data Structure (BIDS) to organize your data (or got your hands on a BIDS dataset), you can take advantage of the formal structure that BIDS imposes. In this short tutorial, you will learn how to do this.

pybids - a Python API for working with BIDS datasets

pybids is a lightweight Python API for querying a BIDS folder structure for specific files and metadata. You can install it from PyPI:

pip install pybids

Please note that it should already be installed in the tutorial Docker image.

The layout object and simple queries

To begin working with pybids, we need to initialize a layout object. We will use it for all of our queries.

from bids.layout import BIDSLayout
layout = BIDSLayout("/data/ds000114/")
!tree -L 4 /data/ds000114/
/data/ds000114/
├── dataset_description.json
├── derivatives
│   └── fmriprep
│       ├── mni_icbm152_nlin_asym_09c
│       │   ├── 1mm_brainmask.nii.gz
│       │   ├── 1mm_T1.nii.gz
│       │   ├── 1mm_tpm_csf.nii.gz
│       │   ├── 1mm_tpm_gm.nii.gz
│       │   ├── 1mm_tpm_wm.nii.gz
│       │   ├── 2mm_brainmask.nii.gz
│       │   ├── 2mm_T1.nii.gz
│       │   ├── 2mm_tpm_csf.nii.gz
│       │   ├── 2mm_tpm_gm.nii.gz
│       │   └── 2mm_tpm_wm.nii.gz
│       ├── sub-01
│       │   └── ses-test
│       ├── sub-02
│       │   └── ses-test
│       ├── sub-03
│       │   └── ses-test
│       └── sub-07
│           └── ses-test
├── sub-01
│   └── ses-test
│       ├── anat
│       │   ├── sub-01_ses-test_T1w_bet.nii.gz
│       │   └── sub-01_ses-test_T1w.nii.gz
│       └── func
│           ├── sub-01_ses-test_task-fingerfootlips_bold.nii.gz
│           └── sub-01_ses-test_task-fingerfootlips_events.tsv
├── sub-02
│   └── ses-test
│       ├── anat
│       │   └── sub-02_ses-test_T1w.nii.gz
│       └── func
│           ├── sub-02_ses-test_task-fingerfootlips_bold.nii.gz
│           └── sub-02_ses-test_task-fingerfootlips_events.tsv
├── sub-03
│   └── ses-test
│       ├── anat
│       │   └── sub-03_ses-test_T1w.nii.gz
│       └── func
│           ├── sub-03_ses-test_task-fingerfootlips_bold.nii.gz
│           └── sub-03_ses-test_task-fingerfootlips_events.tsv
├── sub-07
│   └── ses-test
│       ├── anat
│       │   └── sub-07_ses-test_T1w.nii.gz
│       └── func
│           ├── sub-07_ses-test_task-fingerfootlips_bold.nii.gz
│           └── sub-07_ses-test_task-fingerfootlips_events.tsv
├── task-covertverbgeneration_bold.json
├── task-covertverbgeneration_events.tsv
├── task-fingerfootlips_bold.json
├── task-fingerfootlips_events.tsv
├── task-linebisection_bold.json
├── task-overtverbgeneration_bold.json
├── task-overtverbgeneration_events.tsv
├── task-overtwordrepetition_bold.json
└── task-overtwordrepetition_events.tsv

27 directories, 33 files
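
The file names in the tree above encode their metadata as underscore-separated key-value pairs (entities such as sub-01 or task-fingerfootlips), followed by a suffix and an extension. The sketch below illustrates that naming scheme using only the standard library. The helper parse_bids_name is hypothetical and is not how pybids parses names internally; it also ignores edge cases covered by the BIDS specification.

```python
from pathlib import PurePosixPath


def parse_bids_name(path):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Hypothetical helper for illustration only (not part of pybids).
    """
    name = PurePosixPath(path).name
    # Everything after the first dot is treated as the extension (".nii.gz").
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    # Parts of the form "key-value" are entities.
    entities = dict(p.split("-", 1) for p in parts if "-" in p)
    # The last part is the suffix when it is not a key-value pair.
    suffix = parts[-1] if "-" not in parts[-1] else None
    return {"entities": entities, "suffix": suffix, "extension": dot + ext}


info = parse_bids_name(
    "/data/ds000114/sub-01/ses-test/func/"
    "sub-01_ses-test_task-fingerfootlips_bold.nii.gz"
)
print(info)
```

The queries that follow filter on exactly these pieces: the subject, session, and task entities, the suffix, and the extension.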

Let’s figure out what the subject labels in this dataset are.

layout.get_subjects()
['01', '02', '03', '07']

What datatypes are included in this dataset?

layout.get_datatypes()
['anat', 'func']

Which different data suffixes are included in this dataset?

layout.get_suffixes(datatype='func')
['bold', 'events']

What are the different tasks included in this dataset?

layout.get_tasks()
['covertverbgeneration',
 'fingerfootlips',
 'linebisection',
 'overtverbgeneration',
 'overtwordrepetition']

We can also ask for all of the data for a particular subject and one datatype.

layout.get(subject='01', datatype="anat", session="test")
[<BIDSImageFile filename='/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz'>]

We can also ask for a specific subset of data. Note that we are using the extension filter to get just the imaging data (BIDS allows both .nii and .nii.gz, so we need to include both).

layout.get(subject='01', suffix='bold', extensions=['nii', 'nii.gz'])
[<BIDSImageFile filename='/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'>]

You probably noticed that this method does not return plain file paths but objects that carry the relevant query fields. We can easily extract just the file paths.

layout.get(subject='01', suffix='bold', extensions=['nii', 'nii.gz'], return_type='file')
['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz']
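
Conceptually, the filters in a query are combined with a logical AND: a file is returned only if it matches every filter that was given. The following standard-library sketch mimics that behaviour on a handful of paths copied from the tree above. The function simple_get is a hypothetical stand-in, not the pybids implementation, and it supports only three filters.

```python
# A few paths copied from the directory tree shown earlier.
FILES = [
    "/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz",
    "/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz",
    "/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_events.tsv",
    "/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-fingerfootlips_bold.nii.gz",
]


def simple_get(files, subject=None, suffix=None, extensions=None):
    """Hypothetical helper: keep the paths that match every filter that is set."""
    selected = []
    for path in files:
        name = path.rsplit("/", 1)[-1]
        stem, _, ext = name.partition(".")
        parts = stem.split("_")
        if subject is not None and f"sub-{subject}" not in parts:
            continue  # wrong subject
        if suffix is not None and parts[-1] != suffix:
            continue  # wrong suffix
        if extensions is not None and ext not in extensions:
            continue  # not one of the requested extensions
        selected.append(path)
    return selected


bold_sub01 = simple_get(FILES, subject="01", suffix="bold",
                        extensions=["nii", "nii.gz"])
print(bold_sub01)
```

Dropping a filter widens the result: without subject, the BOLD file of sub-02 would be returned as well.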

Exercise 1:

List all files for the “linebisection” task for subject 02.

#write your solution here
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/ds000114/")

layout.get(subject='02', return_type='file', task="linebisection")
[]

BIDSDataGrabber: Including pybids in your nipype workflow

This is great, but what we really want is to include this in our nipype workflows. To do this, we can import BIDSDataGrabber, which provides an Interface for BIDSLayout.get.

from nipype.interfaces.io import BIDSDataGrabber
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.utility import Function

bg = Node(BIDSDataGrabber(), name='bids-grabber')
bg.inputs.base_dir = '/data/ds000114'

You can define static filters that apply to all queries by setting the corresponding input.

bg.inputs.subject = '01'
res = bg.run()
res.outputs
211017-17:06:08,618 nipype.workflow INFO:
	 [Node] Setting-up "bids-grabber" in "/tmp/tmphxehejzv/bids-grabber".
211017-17:06:08,628 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:08,798 nipype.workflow INFO:
	 [Node] Finished "bids-grabber".
T1w = ['/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz']
bold = ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz']

Note that by default BIDSDataGrabber fetches NIfTI files matching the datatypes func and anat and returns them as two output fields (bold and T1w).

To define custom fields, simply define the arguments to pass to BIDSLayout.get as a dictionary, like so:

bg.inputs.output_query = {'bolds': dict(suffix='bold')}
res = bg.run()
res.outputs
211017-17:06:08,808 nipype.workflow INFO:
	 [Node] Setting-up "bids-grabber" in "/tmp/tmphxehejzv/bids-grabber".
211017-17:06:08,815 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:08,968 nipype.workflow INFO:
	 [Node] Finished "bids-grabber".
bolds = ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz']

This results in a single output field, bolds, which returns all files with suffix:bold for subject:"01".
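
One way to picture the interplay between static filters and output_query: every entry of the dictionary becomes one query, and the static filters are merged into each of them. The sketch below shows that merging in plain Python. It reflects an assumption about how the interface combines the two, and fake_get and grab are hypothetical stand-ins that only echo the filters instead of touching a dataset.

```python
def fake_get(**filters):
    """Hypothetical stand-in for BIDSLayout.get: just echoes the filters."""
    return sorted(f"{key}={value}" for key, value in filters.items())


def grab(static_filters, output_query):
    """Run one query per output field, merged with the static filters."""
    outputs = {}
    for field, query in output_query.items():
        merged = dict(query)           # copy so the original query is untouched
        merged.update(static_filters)  # static filters apply to every field
        outputs[field] = fake_get(**merged)
    return outputs


outputs = grab(
    static_filters={"subject": "01"},
    output_query={
        "bolds": dict(suffix="bold"),
        "T1w": dict(datatype="anat", suffix="T1w"),
    },
)
print(outputs)
```

Each key of output_query ends up as one output field, so a dictionary with two entries yields two fields that can be connected to different nodes.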

Now, let's put it in a workflow. We are not going to analyze any data, but for demonstration purposes, we will add a couple of nodes that pretend to analyze their inputs.

def printMe(paths):
    print("\n\nanalyzing " + str(paths) + "\n\n")
    
analyzeBOLD = Node(Function(function=printMe, input_names=["paths"],
                            output_names=[]), name="analyzeBOLD")
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD, "paths")
wf.run()
211017-17:06:09,466 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
211017-17:06:09,488 nipype.workflow INFO:
	 Running serially.
211017-17:06:09,490 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmphxehejzv/bids-grabber".
211017-17:06:09,496 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:09,637 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211017-17:06:09,638 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD" in "/tmp/tmptpeuydx8/bids_demo/analyzeBOLD".
211017-17:06:09,644 nipype.workflow INFO:
	 [Node] Running "analyzeBOLD" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz']


211017-17:06:09,650 nipype.workflow INFO:
	 [Node] Finished "bids_demo.analyzeBOLD".
<networkx.classes.digraph.DiGraph at 0x7efe348cf250>

Exercise 2:

Modify the BIDSDataGrabber and the workflow to collect T1w images for subject 07.

# write your solution here
ls /data/ds000114/sub-07/ses-test/anat/
sub-07_ses-test_T1w.nii.gz*
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.io import BIDSDataGrabber

ex2_BIDSDataGrabber = BIDSDataGrabber()
ex2_BIDSDataGrabber.inputs.base_dir = '/data/ds000114'
ex2_BIDSDataGrabber.inputs.subject = '07'
ex2_BIDSDataGrabber.inputs.output_query = {'T1w': dict(datatype='anat')}

ex2_res = ex2_BIDSDataGrabber.run()
ex2_res.outputs
T1w = ['/data/ds000114/sub-07/ses-test/anat/sub-07_ses-test_T1w.nii.gz']

Iterating over subject labels

In the previous example, we demonstrated how to use pybids to “analyze” one subject. How can we scale it to all subjects? Easy: use iterables (more in Iteration/Iterables). To keep the output short, the example below iterates over only the first two subject labels.

bg_all = Node(BIDSDataGrabber(), name='bids-grabber')
bg_all.inputs.base_dir = '/data/ds000114'
bg_all.inputs.output_query = {'bolds': dict(suffix='bold')}
bg_all.iterables = ('subject', layout.get_subjects()[:2])
wf = Workflow(name="bids_demo")
wf.connect(bg_all, "bolds", analyzeBOLD, "paths")
wf.run()
211017-17:06:10,516 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
211017-17:06:10,551 nipype.workflow INFO:
	 Running serially.
211017-17:06:10,553 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmplzu8x624/bids_demo/_subject_02/bids-grabber".
211017-17:06:10,559 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:10,705 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211017-17:06:10,706 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD" in "/tmp/tmpgy59onqc/bids_demo/_subject_02/analyzeBOLD".
211017-17:06:10,715 nipype.workflow INFO:
	 [Node] Running "analyzeBOLD" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-fingerfootlips_bold.nii.gz']


211017-17:06:10,731 nipype.workflow INFO:
	 [Node] Finished "bids_demo.analyzeBOLD".
211017-17:06:10,733 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmpci8w2j1j/bids_demo/_subject_01/bids-grabber".
211017-17:06:10,745 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:10,892 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211017-17:06:10,893 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD" in "/tmp/tmpjj2y9d__/bids_demo/_subject_01/analyzeBOLD".
211017-17:06:10,900 nipype.workflow INFO:
	 [Node] Running "analyzeBOLD" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz']


211017-17:06:10,945 nipype.workflow INFO:
	 [Node] Finished "bids_demo.analyzeBOLD".
<networkx.classes.digraph.DiGraph at 0x7efe30401550>

Accessing additional metadata

Querying different files is nice, but sometimes you want to access more metadata, for example RepetitionTime. pybids can help with that as well.

layout.get_metadata('/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz')
{'EchoTime': 0.05,
 'FlipAngle': 90,
 'RepetitionTime': 2.5,
 'SliceTiming': [0.0,
  1.2499999999999998,
  0.08333333333333333,
  1.333333333333333,
  0.16666666666666666,
  1.4166666666666663,
  0.25,
  1.4999999999999996,
  0.3333333333333333,
  1.5833333333333328,
  0.41666666666666663,
  1.666666666666666,
  0.5,
  1.7499999999999993,
  0.5833333333333333,
  1.8333333333333326,
  0.6666666666666666,
  1.9166666666666659,
  0.75,
  1.9999999999999991,
  0.8333333333333333,
  2.083333333333332,
  0.9166666666666666,
  2.1666666666666656,
  1.0,
  2.249999999999999,
  1.0833333333333333,
  2.333333333333332,
  1.1666666666666665,
  2.416666666666665],
 'TaskName': 'finger_foot_lips'}
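
Note that the tree shown earlier has no JSON file next to the BOLD image. The values come from task-fingerfootlips_bold.json at the dataset root, which BIDS permits through its inheritance principle: a sidecar higher up in the hierarchy applies to every matching file below it. The following is a simplified standard-library sketch of that lookup. The helpers entities_of and collect_metadata are hypothetical and much cruder than what pybids does, and the value 99.0 is made up purely to show that a sidecar for another task is ignored.

```python
import json
import tempfile
from pathlib import Path


def entities_of(name):
    """Return the set of key-value entities and the suffix of a file name."""
    stem = name.split(".", 1)[0]
    parts = stem.split("_")
    return set(parts[:-1]), parts[-1]


def collect_metadata(root, data_file):
    """Merge matching JSON sidecars from the dataset root down to the file.

    A sidecar matches when it has the same suffix and all of its entities
    also appear in the data file name; deeper sidecars override shallower ones.
    """
    root, data_file = Path(root), Path(data_file)
    file_entities, file_suffix = entities_of(data_file.name)
    metadata = {}
    folder = root
    # Visit the root first, then each folder on the way to the data file.
    for part in ("",) + data_file.parent.relative_to(root).parts:
        folder = folder / part
        for sidecar in sorted(folder.glob("*.json")):
            side_entities, side_suffix = entities_of(sidecar.name)
            if side_suffix == file_suffix and side_entities <= file_entities:
                metadata.update(json.loads(sidecar.read_text()))
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    func = root / "sub-01" / "ses-test" / "func"
    func.mkdir(parents=True)
    bold = func / "sub-01_ses-test_task-fingerfootlips_bold.nii.gz"
    bold.touch()
    # Dataset-level sidecar, like the one at the top of ds000114.
    (root / "task-fingerfootlips_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.5, "EchoTime": 0.05})
    )
    # Sidecar for a different task (made-up value); it must not be picked up.
    (root / "task-linebisection_bold.json").write_text(
        json.dumps({"RepetitionTime": 99.0})
    )
    metadata = collect_metadata(root, bold)

print(metadata)
```

This is why a single JSON file per task at the top level is enough to describe the acquisition parameters of every subject.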

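Can we incorporate this into our pipeline? Yes, we can! To do so, let’s wrap a Function interface in a MapNode so that our custom function, which uses BIDSLayout, runs once for each file. The import sits inside the function because a Function node runs its code in isolation from the surrounding script. (More about MapNode in MapNode)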

def printMetadata(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    print("\n\nanalyzing " + path + "\nTR: "+ str(layout.get_metadata(path)["RepetitionTime"]) + "\n\n")
    
analyzeBOLD2 = MapNode(Function(function=printMetadata, input_names=["path", "data_dir"],
                             output_names=[]), name="analyzeBOLD2", iterfield="path")
analyzeBOLD2.inputs.data_dir = "/data/ds000114/"
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD2, "path")
wf.run()
211017-17:06:10,998 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
211017-17:06:11,32 nipype.workflow INFO:
	 Running serially.
211017-17:06:11,34 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmphxehejzv/bids-grabber".
211017-17:06:11,40 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:11,207 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211017-17:06:11,208 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD2" in "/tmp/tmpu4dk_t8u/bids_demo/analyzeBOLD2".
211017-17:06:11,216 nipype.workflow INFO:
	 [Node] Setting-up "_analyzeBOLD20" in "/tmp/tmpu4dk_t8u/bids_demo/analyzeBOLD2/mapflow/_analyzeBOLD20".
211017-17:06:11,220 nipype.workflow INFO:
	 [Node] Running "_analyzeBOLD20" ("nipype.interfaces.utility.wrappers.Function")


analyzing /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz
TR: 2.5


211017-17:06:11,360 nipype.workflow INFO:
	 [Node] Finished "_analyzeBOLD20".
211017-17:06:11,364 nipype.workflow INFO:
	 [Node] Finished "bids_demo.analyzeBOLD2".
<networkx.classes.digraph.DiGraph at 0x7efe3037eb50>
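
The role of iterfield can be pictured as a plain loop: the list arriving on path is split into its elements, the function is called once per element, and the remaining inputs (here data_dir) are passed unchanged to every call. The sketch below mimics that with ordinary Python. The names map_over and describe are hypothetical, and the real MapNode additionally handles working directories, caching, and output collection.

```python
def map_over(function, iterfield, **inputs):
    """Hypothetical helper: call `function` once per element of one input."""
    items = inputs.pop(iterfield)  # the list that gets split up
    # Every call receives one element plus all remaining (shared) inputs.
    return [function(**{iterfield: item}, **inputs) for item in items]


def describe(path, data_dir):
    """Toy replacement for printMetadata that returns the relative path."""
    return path.replace(data_dir, "")


results = map_over(
    describe,
    iterfield="path",
    path=[
        "/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz",
        "/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-fingerfootlips_bold.nii.gz",
    ],
    data_dir="/data/ds000114/",
)
print(results)
```

In the workflow above only one file arrives on path, which is why the log shows a single sub-node, _analyzeBOLD20.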

Exercise 3:

Modify the printMetadata function to also print EchoTime.

# write your solution here
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.io import BIDSDataGrabber
from nipype.interfaces.utility import Function

ex3_BIDSDataGrabber = Node(BIDSDataGrabber(), name='bids-grabber')
ex3_BIDSDataGrabber.inputs.base_dir = '/data/ds000114'
ex3_BIDSDataGrabber.inputs.subject = '01'
ex3_BIDSDataGrabber.inputs.output_query = {'bolds': dict(suffix='bold')}
# and now modify analyzeBOLD2
def printMetadata_et(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    print("\n\nanalyzing " + path + "\nTR: "+ 
          str(layout.get_metadata(path)["RepetitionTime"]) +
          "\nET: "+ str(layout.get_metadata(path)["EchoTime"])+ "\n\n")
    
ex3_analyzeBOLD2 = MapNode(Function(function=printMetadata_et, 
                                    input_names=["path", "data_dir"],
                                    output_names=[]), 
                           name="ex3", iterfield="path")
ex3_analyzeBOLD2.inputs.data_dir = "/data/ds000114/"

# and create a new workflow
ex3_wf = Workflow(name="ex3")
ex3_wf.connect(ex3_BIDSDataGrabber, "bolds", ex3_analyzeBOLD2, "path")
ex3_wf.run()
211017-17:06:11,407 nipype.workflow INFO:
	 Workflow ex3 settings: ['check', 'execution', 'logging', 'monitoring']
211017-17:06:11,435 nipype.workflow INFO:
	 Running serially.
211017-17:06:11,436 nipype.workflow INFO:
	 [Node] Setting-up "ex3.bids-grabber" in "/tmp/tmpt33niaf5/ex3/bids-grabber".
211017-17:06:11,443 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211017-17:06:11,594 nipype.workflow INFO:
	 [Node] Finished "ex3.bids-grabber".
211017-17:06:11,596 nipype.workflow INFO:
	 [Node] Setting-up "ex3.ex3" in "/tmp/tmpyzlhckrd/ex3/ex3".
211017-17:06:11,604 nipype.workflow INFO:
	 [Node] Setting-up "_ex30" in "/tmp/tmpyzlhckrd/ex3/ex3/mapflow/_ex30".
211017-17:06:11,609 nipype.workflow INFO:
	 [Node] Running "_ex30" ("nipype.interfaces.utility.wrappers.Function")


analyzing /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz
TR: 2.5
ET: 0.05


211017-17:06:11,774 nipype.workflow INFO:
	 [Node] Finished "_ex30".
211017-17:06:11,777 nipype.workflow INFO:
	 [Node] Finished "ex3.ex3".
<networkx.classes.digraph.DiGraph at 0x7efe302a5290>