Introduction VI - Introduction to Python II

Peer Herholz (he/him)
Habilitation candidate - Fiebach Lab, Neurocognitive Psychology at Goethe-University Frankfurt
Research affiliate - NeuroDataScience lab at MNI/McGill
Member - BIDS, ReproNim, Brainhack, Neuromod, OHBM SEA-SIG, UNIQUE

logo logo   @peerherholz

Before we get started …


Objectives 📍

  • learn basic and efficient usage of the python programming language

    • building blocks of & operations in python

    • operators & comparisons

    • strings, lists, tuples & dictionaries

What we will do in this section of the course is a short introduction to Python to help beginners to get familiar with this programming language.

It is divided into the following chapters:

Here’s what we will focus on in the first block:

Recap of the last session

Before we dive into new endeavors, it might be important to briefly recap the things we’ve talked about so far. Specifically, we will do this to evaluate if everyone’s roughly on the same page. Thus, if some of the aspects within the recap are either new or fuzzy to you, please have a quick look at the respective part of the last session again and as usual: ask questions wherever something is not clear.

What is Python?

  • Python is a programming language

  • Specifically, it’s a widely used/very flexible, high-level, general-purpose, dynamic programming language

  • That’s a mouthful! Let’s explore each of these points in more detail…

Module

Most of the functionality in Python is provided by modules. To use a module in a Python program it first has to be imported. A module can be imported using the import statement.

Assuming you want to import the entire pandas module to do some data exploration, wrangling and statistics, how would you do that?

# Pleae write your solution in this cell
import pandas

As this might be a bit hard to navigate, specifically for finding/referencing functions. Thus, it might be a good idea to provide a respective access name. For example, could you show how you would provide the pandas module the access name pd?

# Please write your solution in this cell

During your analyzes you recognize that some of the analyses you want to run require functions from the statistics module pingouin. Is there a way to only import the functions you want from this module, e.g. the wilcoxon test from within pingouin.nonparametric?

# Please write your solution in this cell

Variables and data types

  • in programming variables are things that store values

  • in Python, we declare a variable by assigning it a value with the = sign

    • name = value

    • code variables != math variables

      • in mathematics = refers to equality (statement of truth), e.g. y = 10x + 2

      • in coding = refers to assignments, e.g. x = x + 1

    • Variables are pointers, not data stores!

  • Python supports a variety of data types and structures:

    • booleans

    • numbers (ints, floats, etc.)

    • strings

    • lists

    • dictionaries

    • many others!

  • We don’t specify a variable’s type at assignment

Assignment

(Not your homework assignment but the operator in python.)

The assignment operator in Python is =. Python is a dynamically typed language, so we do not need to specify the type of a variable when we create one.

Assigning a value to a new variable creates the variable:

Within your analyzes you need to create a variable called n_students and assign it the value 21, how would that work?

# Please write your solution in this cell
n_students = 21

Quickly after you realize that the value should actually be 20. What options do you have to change the value of this variable?

# Please write your solution in this cell
n_students = 20
n_students = n_students - 1

During the analyzes you noticed that the data type of n_students changed. How can you find out the data type?

# Please write your solution in this cell
type(n_students)
int

Is there a way to change the data type of n_students to something else, e.g. float?

# Please write your solution in this cell
float(n_students)
19.0

Along the way you want to create another variable, this time called acquisition_time and the value December. How would we do that and what data type would that be?

from pingouin.nonparametric import wilcoxon
import pandas as pd
# Please write your solution here
acquisition_time = "December"
type(acquisition_time)
str

As a final step you want to create two variables that indicate that the outcome of a statistical test is either significant or not. How would you do that for the following example: for outcome_anova it’s true that the result was significant and for outcome_ancova it’s false that the result was significant?

# Please write your solution in this cell 
outcome_anova = True
outcome_ancova = False

Alright, thanks for taking the time to go through this recap. Again: if you could solve/answer all questions, you should have the information/knowledge needed for this session.

Here’s what we will focus on in the first block:

Operators and comparisons

One of the most basic utilizations of python might be simple arithmetic operations and comparisons. operators and comparisons in python work as one would expect:

  • Arithmetic operators available in python: +, -, *, /, ** power, % modulo

  • comparisons available in python: <, >, >= (greater or equal), <= (less or equal), == (equal), != (not equal) and is (identical)

Obviously, these operators and comparisons can be used within tremendously complex analyzes and actually build their basis.

Lets check them out further via a few quick examples, starting with operators:

[1 + 2, 
 1 - 2,
 1 * 2,
 1 / 2,
 1 ** 2,
 1 % 2]
[3, -1, 2, 0.5, 1, 1]

In Python 2.7, what kind of division (/) will be executed, depends on the type of the numbers involved. If all numbers are integers, the division will be an integer division, otherwise, it will be a float division. In Python 3 this has been changed and fractions aren’t lost when dividing integers (for integer division you can use another operator, //). In Python 3 the following two operations will give the same result (in Python 2 the first one will be treated as an integer division). It’s thus important to remember that the data type of division outcomes will always be float.

print(1 / 2)
print(1 / 2.0)
0.5
0.5

Python also respects arithemic rules, like the sequence of +/- and *//.

1 + 2/4
1.5
1 + 2 + 3/4
3.75

The same holds true for () and operators:

(1 + 2)/4
0.75
(1 + 2 + 3)/4
1.5

Thus, always watch out for how you define arithmetic operations!

Just as a reminder: the power operator in python is ** and not ^:

2 ** 2
4

This arithmetic operations also show some “handy” properties in combination with assignments, specifically you can apply these operations and modify the value of a given variable “in-place”. This means that you don’t have to assign a given variable a new value via an additional line like so:

a = 2
a = a * 2
print(a)
4

but you can shortcut the command a = a * 2 to a *= 2. This also works with other operators: +=, -= and /=.

b = 3
b *= 3
print(b)
9

Interestingly, we meet booleans again. This time in the form of operators. So booleans can not only be referred to as a data type but also operators. Whereas the data type entails the values True and False, the operators are spelled out as the words and, not, or. They therefore allow us to evaluate if

  • something and something else is the case

  • something is not the case

  • something or something else is the case

How about we check this on an example, i.e. the significance of our test results from the recap:

outcome_anova = True
outcome_ancova = False
outcome_anova and outcome_ancova
False
not outcome_ancova
True
outcome_anova or outcome_ancova
True

While the “classic” operators appear to be rather simple and the “boolean” operators rather abstract, a sufficient understanding of both is very important to efficiently utilize the python programming language. However, don’t worry: we’ll use them throughout the entire course going forward to gain further experience.

After spending a look at operators, it’s time to check out comparisons in more detail. Again, most of them might seem familiar and work as you would expect. Here’s the list again:

  • Comparisons in python >, <, >= (greater or equal), <= (less or equal), == (equal), != (not equal) and is (identical)

The first four are the “classics” and something you might remember from your math classes in high school. Nevertheless, it’s worth to check how they exactly work in python.

If we compare numerical values, we obtain booleans that indicate if the comparisons is True or False. Lets start with the “classics”.

2 > 1, 2 < 1
(True, False)
2 > 2, 2 < 2
(False, False)
2 >= 2, 2 <= 2
(True, True)

So far so good and no major surprises. Now lets have a look at those comparisons that might be less familiar. At first, ==. You might think: “What, a double assignment?” but actually == is the equal comparison and thus compares two variables, numbers, etc., evaluating if they are equal to each other.

1 == 1
True
outcome_anova == outcome_ancova
False
'This course' == "cool"
False
1 == 1 == 2
False

One interesting thing to mention here is that equal values of different data types, i.e. integers and floats, are still evaluated as equal by ==:

1 == 1.0
True

Contrarily to evaluating if two or more things are equal via ==, we can utilize != to evaluate if two are more things are not equal. The behavior of these comparison concerning the outcome is however identical: we get booleans.

2 != 3
True
outcome_anova = True 
outcome_ancova = False
outcome_anova != outcome_ancova 
True
1 != 1 != 2
False

There’s actually one very specific comparison that only works for one data type: string comparison. The string comparison is reflected by the word in and evaluates if a string is part of another string. For example, you can evaluate if a word or certain string pattern is part of another string. Two fantastic beings are going to help showcasing this!

Please welcome, the Wombat & the Capybara.

"cool" in "Wombats are cool"
True
"ras ar" in "Wombats and capybaras are cool"
True

The string comparison can also be combined with the boolean operator to evaluate if a string or string pattern is not part of another string.

"stupid" not in "Wombats and capybaras"
True

Before we finish the operators & comparison part, it’s important to outline one important aspects that you’ve actually already seen here and there but was never mentioned/explained in detail: operators & comparisons work directly on variables, that is their values. For example, if we want to change the number of a variable called n_lectures from 5 to 6, we can simply run:

n_lectures = 5
n_lectures = n_lectures + 1 
n_lectures
6

or use the shortcut as seen before

n_lectures = 5
n_lectures += 1
n_lectures
6

This works with other types and operators/comparisons too, for example strings and ==:

'Wombats' == 'Capybaras'
False

Exercise 4.1

You want to compute the mean of the following reaction times: 1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7. Is there a way to achieve that using operators?

# Please write your solution here
(1.2 + 1.0 + 1.5 + 1.9 + 1.3 + 1.2 + 1.7)/7
1.4

Spoiler: there are of course many existing functions for all sorts of equations and statistics so you don’t have to write it yourself every time. For example, we could also compute the mean using numpy’s mean function:

import numpy as np
np.mean([1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7]) 
1.4

Exercise 4.2

Having computed the mean, you need to compare it to a reference value. The latter can be possibly found in a string that entails data from a previous analyses. If that’s the case, the string should contain the words "mean reaction time". Is there a way to evaluate this?

The string would be: “In contrast to the majority of the prior studies the mean reaction time of the here described analyses was 1.2.”

# Please write your solution here
reference = "In contrast to the majority of the prior studies the mean reaction time of the here described analyses was `1.2`."
"mean reaction time" in reference
True

Exercise 4.3

Having found the reference value, that is 1.2 we can compare it to our mean. Specifically, you want to know if the mean is less or equal than 1.2. The outcome of the comparison should then be the value of a new variable called mean_reference_comp.

# Please write your solution here
mean = (1.2 + 1.0 + 1.5 + 1.9 + 1.3 + 1.2 + 1.7)/7
mean_reference_comp = mean <= 1.2
mean_reference_comp
False

Fantastic work folks, really really great! Time for a quick party!


via GIPHY

Having already checked out modules, help & descriptions, variables and data types, operators and comparisons, we will continue with the final section of the first block of our python introduction. More precisely, we will advance to new, more complex data types and structures: strings, lists, tuples and dictionaries.

Strings, List and dictionaries

So far, we’ve explored integers, float, strings and boolean as fundamental types. However, there are a few more that are equally important within the python programing language and allow you to easily achieve complex behavoir and ease up your everyday programming life: strings, lists, tuples and dictionaries.


Strings

Wait, what? Why are we talking about strings again? Well, actually, strings are more than a “just” fundamental type. There are quite a few things you can do with strings that we haven’t talked about yet. However, first things first: strings contain text:

statement = "The wombat and the capybara are equally cute. However, while the wombat lives in Australia, the capybara can be found in south america."
statement
'The wombat and the capybara are equally cute. However, while the wombat lives in Australia, the capybara can be found in south america.'
type(statement)
str

So, what else can we do? For example, we can get the length of the string which reflects the number of characters in the string. The respective len function is one of the python functions that’s always available to you, even without importing it. Notably, len can operate on various data types which we will explore later.

len(statement)
135

The string data types also allows us to replace parts of it, i.e. substrings, with a different string. The respective syntax is string.replace("substring_to_replace", "replacement_string"), that is, .replace searches for "substring_to_replace" and replaces it with "replacement_string". If we for example want to state that wombats and capybaras are awesome instead of cute, we could do the following:

statement.replace("cute", "awesome")
'The wombat and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'

Importantly, strings are not replaced in-place but require a new variable assignment.

statement
'The wombat and the capybara are equally cute. However, while the wombat lives in Australia, the capybara can be found in south america.'
statement_2 = statement.replace("cute", "awesome")
statement_2
'The wombat and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'

We can also index a string using string[x] to get the character at the specified index:

statement_2
'The wombat and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'
statement_2[1]
'h'

Pump the breaks right there: why do we get h when we specify 1 as index? Shouldn’t this get us the first index and thus T?

via GIPHY

HEADS UP EVERYONE: INDEXING IN PYTHON STARTS AT 0


This means that the first index is 0, the second index 1, the third index 2, etc. . This holds true independent of the data type and is one of the major confusions when folks start programming in python, so always watch out!

statement_2
'The wombat and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'
statement_2[0]
'T'
statement_2[1]
'h'
statement_2[2]
'e'

If we want to get more than one character of a string we can use the following syntax string[start:stop] which extracts characters between index start and stop. This technique is called slicing.

statement_2[4:10]
'wombat'

If we omit either (or both) of start or stop from [start:stop], the default is the beginning and the end of the string, respectively:

statement_2[:10]
'The wombat'
statement_2[10:]
' and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'
statement_2[:]
'The wombat and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'

We can also define the step size using the syntax [start:end:step] (the default value for step is 1, as we saw above):

statement_2[::1]
'The wombat and the capybara are equally awesome. However, while the wombat lives in Australia, the capybara can be found in south america.'
statement_2[::2]
'Tewma n h ayaaaeeulyaeoe oee,wietewma ie nAsrla h ayaacnb on nsuhaeia'

String formatting

Besides operating on strings we can also apply different formatting styles. More precisely, this refers to different ways of displaying strings. The main function we’ll explore regarding this will be the print function. Comparable to len, it’s one of the python functions that’s always available to you, even without import.

For example, if we print strings added with +, they are concatenated without space:

print("The" + "results" + "were" + "significant")
Theresultsweresignificant

The print function concatenates strings differently, depending how the inputs are specified. If we just provide all strings without anything else, they will be concatenated without spaces:

print("The" "results" "were" "significant")
Theresultsweresignificant

If we provide strings separated by ,, they will be concatenated with spaces:

print("The", "results", "were", "significant")
The results were significant

Interestingly, the print function converts all inputs to strings, no matter their actual type:

print("The", "results", "were", "significant", 0.049, False)
The results were significant 0.049 False

Another very cool and handy option that we can specify placeholders which will be filled with an input according to a given formatting style. Python has two string formatting styles. An example of the old style is below, the placeholder or specifier %.3f transforms the input number into a string, that corresponds to a floating point number with 3 decimal places and the specifier %d transforms the input number into a string, corresponding to a decimal number.

print("The results were significant at %.3f" %(0.049))
The results were significant at 0.049
print("The results were significant at %d" %(0.049))
The results were significant at 0

As you can see, you have to be very careful with string formatting as important information might otherwise get lost!

We can achieve the same outcome using the new style string formatting which uses {} followed by .format().

print("The results were significant at {:.3f}" .format(0.049))
The results were significant at 0.049

If you would like to include line-breaks and/or tabs in your strings, you can use \n and \t respectively:

print("Geez, there are some many things \nPython can do with \t strings.")
Geez, there are some many things 
Python can do with 	 strings.

We can of course also combine the different string formatting options:

print("Animal: {}\nHabitat: {}\nRating: {}".format("Wombat", "Australia", 5))
Animal: Wombat
Habitat: Australia
Rating: 5

Single Quote

You can specify strings using single quotes such as 'Quote me on this'. All white space i.e. spaces and tabs, within the quotes, are preserved as-is.

Double Quotes

Strings in double quotes work exactly the same way as strings in single quotes. An example is "What's your name?".

Triple Quotes

You can specify multi-line strings using triple quotes - (""" or '''). You can use single quotes and double quotes freely within the triple quotes. An example is:

'''I'm the first line. Check how line-breaks are shown in the second line.
Do you see the line-break?
"What's going on here?," you might ask.
Well, "that's just how tiple quotes work."
'''
'I\'m the first line. Check how line-breaks are shown in the second line.\nDo you see the line-break?\n"What\'s going on here?," you might ask.\nWell, "that\'s just how tiple quotes work."\n'
Exercise 5.1

Create two variables called info_wombat and info_capybara and provide them the following values respectively:

“The wombat is quadrupedal marsupial and can weigh up to 35 kg.”

“The capybara is the largest rodent on earth. Its relatives include the guinea pig and the chinchilla.”

Once created, please verify that the type is string.

# Please write your solution here
info_wombat = "The wombat is quadrupedal marsupial and can weigh up to 35 kg."
info_capybara = "The capybara is the largest rodent on earth. Its relatives include the guinea pig and the chinchilla."

print(type(info_wombat))
print(type(info_capybara))
<class 'str'>
<class 'str'>
Exercise 5.2

Compute the length and print within the strings “The wombat information has [insert length here] characters.” and “The capybara information has [insert length here] characters.” After that, please compare the length of the strings and print if they are equal.

# Please write your solution here
print("The wombat information has %s characters." %len(info_wombat))
print("The capybara information has %s characters." %len(info_capybara))
print("The information has an equal amount of characters: %s" %str(len(info_wombat)==len(info_capybara)))
The wombat information has 62 characters.
The capybara information has 101 characters.
The information has an equal amount of characters: False
Exercise 5.3

Get the following indices from the info_wombat and info_capybara respectively: 4-10 and 4-12. Replace the resulting word in info_wombat with capybara and the resulting word in info_capybara with wombat.

print(info_wombat[4:10])
print(info_capybara[4:12])

print(info_wombat.replace('wombat', 'capybara'))
print(info_capybara.replace('capybara', 'wombat'))
wombat
capybara
The capybara is quadrupedal marsupial and can weigh up to 35 kg.
The wombat is the largest rodent on earth. Its relatives include the guinea pig and the chinchilla.

List

Next up: lists. In general, lists are very similar to strings. One crucial difference is that list elements (things in the list) can be of any type: integers, floats, strings, etc. . Additionally, types can be freely mixed within a list, that is, each element of a list can be of a different type. Lists are among the data types and structures you’ll work with almost every time you do something in python. They are super handy and comparably to strings, have a lot of “in-built” functionality.


The basic syntax for creating lists in python is [...]:

[1,2,3,4]
[1, 2, 3, 4]
type([1,2,3,4])
list

You can of course also set lists as the value of a variable. For example, we can create a list with our reaction times from before:

reaction_times = [1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7]
print(type(reaction_times))
print(reaction_times)
<class 'list'>
[1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7]

Going back to the comparison with strings, we can use the same index and slicing techniques to manipulate lists as we could use on strings: list[index], list[start:stop].

print(reaction_times)
print(reaction_times[1:3])
print(reaction_times[::2])
[1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7]
[1.0, 1.5]
[1.2, 1.5, 1.3, 1.7]
via GIPHY

HEADS UP EVERYONE: INDEXING IN PYTHON STARTS AT 0


This means that the first index is 0, the second index 1, the third index 2, etc. . This holds true independent of the data type and is one of the major confusions when folks start programming in python, so always watch out!

Thus, to get the first index of our reaction_times, we have to do the following:

reaction_times[0]
1.2

There’s another important aspect related to index and slicing. Have a look at the following example that should get us the reaction times from index 1 to 4:

print(reaction_times)
print(reaction_times[1:4])
[1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7]
[1.0, 1.5, 1.9]

Isn’t there something missing, specifically the last index we wanted to grasp, i.e. 4?

via GIPHY

HEADS UP EVERYONE: SLICING IN PYTHON EXCLUDES THE “STOP” INDEX


This means that the slicing technique gives you everything up to the stop index but does not include the stop index itself. For example, reaction_times[1:4] will return the list elements from index 1 up to 4 but not the fourth index. This holds true independent of the data type and is one of the major confusions when folks start programming in python, so always watch out!

So, to get to list elements from index 0 - 4, including 4, we have to do the following:

print(reaction_times)
print(reaction_times[0:5])
[1.2, 1.0, 1.5, 1.9, 1.3, 1.2, 1.7]
[1.2, 1.0, 1.5, 1.9, 1.3]

As mentioned before, elements in a list do not all have to be of the same type:

mixed_list = [1, 'a', 4.0, 'What is happening?']

print(mixed_list)
[1, 'a', 4.0, 'What is happening?']

Another nice thing to know: python lists can be inhomogeneous and arbitrarily nested, meaning that we can define and access lists within lists. Is this list-ception??

nested_list = [1, [2, [3, [4, [5]]]]]

print("our nest list looks like this: %s" %nested_list)
print("the length of our nested list is: %s" %len(nested_list))
print("the first index of our nested_list is: %s" %nested_list[0])
print("the second index of our nested_list is: %s" %nested_list[1])
print("the second index of our nested_list is a list we can index again via nested_list[1][1]: %s" %nested_list[1][1])
our nest list looks like this: [1, [2, [3, [4, [5]]]]]
the length of our nested list is: 2
the first index of our nested_list is: 1
the second index of our nested_list is: [2, [3, [4, [5]]]]
the second index of our nested_list is a list we can index again via nested_list[1][index]: [3, [4, [5]]]

Lets have a look at another list. Assuming you obtain data describing the favorite movies and snacks of a sample population, we can put the respective responses in lists for easy handling:

movies = ['The_Intouchables', 'James Bond', 'Forrest Gump', 'Retired Extremely Dangerous', 'The Imitation Game',
          'The Philosophers', 'Call Me by Your Name', 'Shutter Island', 'Love actually', 'The Great Gatsby',
          'Interstellar', 'Inception', 'Lord of the Rings - The Two Towers', 'Fight Club', 'Shutter Island',
          'Harry Potter', 'Harry Potter and the MatLab-Prince', 'Shindlers List', 'Inception']

BTW: thanks so much for these, I really enjoyed going through the notebooks. However, Harry Potter and the MatLab-Prince….that one really got me. I mean there was even a short summary of the story…

snacks = ['pancakes', 'Banana', 'dominoes', 'carrots', 'hummus', 'chocolate', 'chocolate', 'Pringles', 'snickers',
          'chocolate','Kinder bueno', 'sushi',  'mint chocolate', 'fruit', 'dried mango', 'dark chocolate', 
          'too complicated', 'snickers', 'Rocher']

A smaller subsample also provided their favorite animal. (If you also provided one but it doesn’t show up it means that you most likely used the “add attachment” option to add images or a way to refers to local files but doesn’t embed them directly in the notebook. Unfortunately, things can be a bit strange there…so don’t worry, we address this in subsequent sessions.)

animals = ['cat', 'lizard', 'coral', 'elephant', 'barred owl', 'groundhog']

So lets check what we can do with these lists. At first, here they are again:

print('The favorite movies were: %s' %movies)
print('\n')
print('The favorite snacks were: %s'%snacks)
print('\n')
print('The favorite animals were: %s' %animals)
The favorite movies were: ['The_Intouchables', 'James Bond', 'Forrest Gump', 'Retired Extremely Dangerous', 'The Imitation Game', 'The Philosophers', 'Call Me by Your Name', 'Shutter Island', 'Love actually', 'The Great Gatsby', 'Interstellar', 'Inception', 'Lord of the Rings - The Two Towers', 'Fight Club', 'Shutter Island', 'Harry Potter', 'Harry Potter and the MatLab-Prince', 'Shindlers List', 'Inception']


The favorite snacks were: ['pancakes', 'Banana', 'dominoes', 'carrots', 'hummus', 'chocolate', 'chocolate', 'Pringles', 'snickers', 'chocolate', 'Kinder bueno', 'sushi', 'mint chocolate', 'fruit', 'dried mango', 'dark chocolate', 'too complicated', 'snickers', 'Rocher']


The favorite animals were: ['cat', 'lizard', 'coral', 'elephant', 'barred owl', 'groundhog']

Initially we might want to count how many responses there were. We can achieve this via our old friend the len function. If we want to also check if we got responses from all 19 participants of our sample population, we can directly use comparisons.

print('Regarding movies there were %s responses' %len(movies))
print('We got responses from all participants: %s' %str(len(movies)==19))
Regarding movies there were 19 responses
We got responses from all participants: True

We can do the same for snacks and animals:

print('Regarding snacks there were %s responses' %len(snacks))
print('We got responses from all participants: %s' %str(len(snacks)==19))
Regarding snacks there were 19 responses
We got responses from all participants: True
print('Regarding animals there were %s responses' %len(animals))
print('We got responses from all participants: %s' %str(len(animals)==19))
Regarding animals there were 6 responses
We got responses from all participants: False

Another thing we might want to check is the number of unique responses, that is if some values in our list appear multiple times and we might also want to get these values. In python we have various ways to do this, most often you’ll see (and most likely use) set and numpy’s unique functions. While the first is another example of in-built python functions that don’t need to be imported, the second is a function of the numpy module. They however can achieve the same goal, that is getting us the number of unique values.

print('There are %s unique responses regarding movies' %len(set(movies)))
There are 17 unique responses regarding movies
import numpy as np
print('There are %s unique responses regarding movies' %len(np.unique(movies)))
There are 17 unique responses regarding movies

The functions themselves will also give us the list of unique values:

import numpy as np
print('The unique responses regarding movies were:  %s' %np.unique(movies))
The unique responses regarding movies were:  ['Call Me by Your Name' 'Fight Club' 'Forrest Gump' 'Harry Potter'
 'Harry Potter and the MatLab-Prince' 'Inception' 'Interstellar'
 'James Bond' 'Lord of the Rings - The Two Towers' 'Love actually'
 'Retired Extremely Dangerous' 'Shindlers List' 'Shutter Island'
 'The Great Gatsby' 'The Imitation Game' 'The Philosophers'
 'The_Intouchables']

Doing the same for snacks and animals again is straightforward:

print('There are %s unique responses regarding snacks' %len(np.unique(snacks)))
print('The unique responses regarding snacks were:  %s' %np.unique(snacks))
There are 16 unique responses regarding snacks
The unique responses regarding snacks were:  ['Banana' 'Kinder bueno' 'Pringles' 'Rocher' 'carrots' 'chocolate'
 'dark chocolate' 'dominoes' 'dried mango' 'fruit' 'hummus'
 'mint chocolate' 'pancakes' 'snickers' 'sushi' 'too complicated']
print('There are %s unique responses regarding animals' %len(np.unique(animals)))
print('The unique responses regarding animals were:  %s' %np.unique(animals))
There are 6 unique responses regarding animals
The unique responses regarding animals were:  ['barred owl' 'cat' 'coral' 'elephant' 'groundhog' 'lizard']

In-built functions

As indicated before, lists have a great set of in-built functions that allow to perform various operations/transformations on them: sort, append, insert and remove. Please note, as these functions are part of the data typelist”, you don’t prepend (sort(list)) but append them: list.sort(), list.append(), list.insert() and list.remove().

Lets start with sort which, as you might have expected, will sort our list.

movies.sort()
movies
['Call Me by Your Name',
 'Fight Club',
 'Forrest Gump',
 'Harry Potter',
 'Harry Potter and the MatLab-Prince',
 'Inception',
 'Inception',
 'Interstellar',
 'James Bond',
 'Lord of the Rings - The Two Towers',
 'Love actually',
 'Retired Extremely Dangerous',
 'Shindlers List',
 'Shutter Island',
 'Shutter Island',
 'The Great Gatsby',
 'The Imitation Game',
 'The Philosophers',
 'The_Intouchables']

Please note that our list is modified/changed in-place. While it’s nice to not have to do a new assignment, this can become problematic if the index is of relevance!

.sort() also allows you to specify how the list should be sorted: ascending or descending. This is controlled via the reverse argument of the .sort() function. By default, lists are sorted in an descending order. If you want to sort your list in an ascending order you have to set the reverse argument to True: list.sort(reverse=True).

movies.sort(reverse=True)
movies
['The_Intouchables',
 'The Philosophers',
 'The Imitation Game',
 'The Great Gatsby',
 'Shutter Island',
 'Shutter Island',
 'Shindlers List',
 'Retired Extremely Dangerous',
 'Love actually',
 'Lord of the Rings - The Two Towers',
 'James Bond',
 'Interstellar',
 'Inception',
 'Inception',
 'Harry Potter and the MatLab-Prince',
 'Harry Potter',
 'Forrest Gump',
 'Fight Club',
 'Call Me by Your Name']

We of course also want to sort our lists of snacks and animals:

snacks.sort()
snacks
['Banana',
 'Kinder bueno',
 'Pringles',
 'Rocher',
 'carrots',
 'chocolate',
 'chocolate',
 'chocolate',
 'dark chocolate',
 'dominoes',
 'dried mango',
 'fruit',
 'hummus',
 'mint chocolate',
 'pancakes',
 'snickers',
 'snickers',
 'sushi',
 'too complicated']
animals.sort()
animals
['barred owl', 'cat', 'coral', 'elephant', 'groundhog', 'lizard']

Lets assume we got new data from two participants and thus need to update our list, we can simply use .append() to, well, append or add these new entries:

movies.append('My Neighbor Totoro')
movies
['The_Intouchables',
 'The Philosophers',
 'The Imitation Game',
 'The Great Gatsby',
 'Shutter Island',
 'Shutter Island',
 'Shindlers List',
 'Retired Extremely Dangerous',
 'Love actually',
 'Lord of the Rings - The Two Towers',
 'James Bond',
 'Interstellar',
 'Inception',
 'Inception',
 'Harry Potter and the MatLab-Prince',
 'Harry Potter',
 'Forrest Gump',
 'Fight Club',
 'Call Me by Your Name',
 'My Neighbor Totoro']

We obviously do the same for snacks and animals again:

snacks.append('Wasabi Peanuts')
snacks
['Banana',
 'Kinder bueno',
 'Pringles',
 'Rocher',
 'carrots',
 'chocolate',
 'chocolate',
 'chocolate',
 'dark chocolate',
 'dominoes',
 'dried mango',
 'fruit',
 'hummus',
 'mint chocolate',
 'pancakes',
 'snickers',
 'snickers',
 'sushi',
 'too complicated',
 'Wasabi Peanuts']
animals.append('bear')
animals
['barred owl', 'cat', 'coral', 'elephant', 'groundhog', 'lizard', 'bear']

Should the index of the new value be important, you have to use .insert as .append will only ever, you guessed it: append. The .insert functions takes two arguments, the index where a new value should be inserted and the value that should be inserted: list.insert(index, value). The index of the subsequent values will shift +1 accordingly. Assuming, we want to add a new value at the third index of each of our lists, this how we would do that:

movies.insert(2, 'The Big Lebowski')
print(movies)
['The_Intouchables', 'The Philosophers', 'The Big Lebowski', 'The Imitation Game', 'The Great Gatsby', 'Shutter Island', 'Shutter Island', 'Shindlers List', 'Retired Extremely Dangerous', 'Love actually', 'Lord of the Rings - The Two Towers', 'James Bond', 'Interstellar', 'Inception', 'Inception', 'Harry Potter and the MatLab-Prince', 'Harry Potter', 'Forrest Gump', 'Fight Club', 'Call Me by Your Name', 'My Neighbor Totoro']
snacks.insert(2, 'PB&J')
print(snacks)
['Banana', 'Kinder bueno', 'PB&J', 'Pringles', 'Rocher', 'carrots', 'chocolate', 'chocolate', 'chocolate', 'dark chocolate', 'dominoes', 'dried mango', 'fruit', 'hummus', 'mint chocolate', 'pancakes', 'snickers', 'snickers', 'sushi', 'too complicated', 'Wasabi Peanuts']
animals.insert(2, 'Manatee')
print(animals)
['barred owl', 'cat', 'Manatee', 'coral', 'elephant', 'groundhog', 'lizard', 'bear']

If you want to change the value of a list element (e.g. you noticed an error and need to change the value), you can do that directly by assigning a new value to the element at the given index:

print('The element at index 15 of the list movies is: %s\n' %movies[15])
movies[15] = 'Harry Potter and the Python-Prince'
print('It is now %s\n' %movies[15])
print(movies)
The element at index 14 of the list movies is: Harry Potter and the MatLab-Prince

It is now Harry Potter and the Python-Prince

['The_Intouchables', 'The Philosophers', 'The Big Lebowski', 'The Imitation Game', 'The Great Gatsby', 'Shutter Island', 'Shutter Island', 'Shindlers List', 'Retired Extremely Dangerous', 'Love actually', 'Lord of the Rings - The Two Towers', 'James Bond', 'Interstellar', 'Inception', 'Inception', 'Harry Potter and the Python-Prince', 'Harry Potter', 'Forrest Gump', 'Fight Club', 'Call Me by Your Name', 'My Neighbor Totoro']

Please note that this is final and the original value overwritten. The characteristic of modifying lists by assigning new values to elements in the list is called mutable in technical jargon.

If you want to remove an element of a given list (e.g. you noticed there are unwanted duplicates, etc.), you basically have two options list.remove(element) and del list[index] and which one you have to use depends on the goal. As you can see .remove(element) expects the element that should be removed from the list, that is the element with the specified value. For example, if we want to remove the duplicate Shutter Island, we can do the following:

movies.remove("Shutter Island")
print(movies)
['The_Intouchables', 'The Philosophers', 'The Big Lebowski', 'The Imitation Game', 'The Great Gatsby', 'Shutter Island', 'Shindlers List', 'Retired Extremely Dangerous', 'Love actually', 'Lord of the Rings - The Two Towers', 'James Bond', 'Interstellar', 'Inception', 'Inception', 'Harry Potter and the Python-Prince', 'Harry Potter', 'Forrest Gump', 'Fight Club', 'Call Me by Your Name', 'My Neighbor Totoro']

As you can see, only one “Shutter Island” is removed and not both, this is because .remove(element) only removes the first element of the specified value. Thus, if there are more elements you want to remove, the del function could be more handy.

You may have noticed that the del function expects the index of the element that should be removed from the list. Thus, if we, for example, want to remove the duplicate Inception, we can also achieve that via the following:

del movies[12]
print(movies)
['The_Intouchables', 'The Philosophers', 'The Big Lebowski', 'The Imitation Game', 'The Great Gatsby', 'Shutter Island', 'Shindlers List', 'Retired Extremely Dangerous', 'Love actually', 'Lord of the Rings - The Two Towers', 'James Bond', 'Interstellar', 'Inception', 'Harry Potter and the Python-Prince', 'Harry Potter', 'Forrest Gump', 'Fight Club', 'Call Me by Your Name', 'My Neighbor Totoro']

If there are multiple elements you want to remove, you can make use of slicing again. For example, our snack list has multiple elements with the value chocolate. Nothing against chocolate, but you might want to have this value only once in our list. To achieve that, we basically combine slicing and del via indicating what indices should be removed:

del snacks[7:9]
print(snacks)
['Banana', 'Kinder bueno', 'PB&J', 'Pringles', 'Rocher', 'carrots', 'chocolate', 'dark chocolate', 'dominoes', 'dried mango', 'fruit', 'hummus', 'mint chocolate', 'pancakes', 'snickers', 'snickers', 'sushi', 'too complicated', 'Wasabi Peanuts']

Lists play a very important role in python and are for example used in loops and other flow control structures (discussed in the next session). There are a number of convenient functions for generating lists of various types, for example, the range function (note that in Python 3 range creates a generator, so you have to use the list function to convert the output to a list). Creating a list that ranges from 10 to 50 advancing in steps of 2 is as easy as:

start = 10
stop = 50
step = 2

list(range(start, stop, step))
[10,
 12,
 14,
 16,
 18,
 20,
 22,
 24,
 26,
 28,
 30,
 32,
 34,
 36,
 38,
 40,
 42,
 44,
 46,
 48]

This can be very handy if you want to create participant lists and/or condition lists for your experiment and/or analyzes.

Exercise 6.1

Create a variable called rare_animals that stores the following values: ‘Addax’, ‘Black-footed Ferret’, ‘Northern Bald Ibis’, ‘Cross River Gorilla’, ‘Saola’, ‘Amur Leopard’, ‘Philippine Crocodile’, ‘Sumatran Rhino’, ‘South China Tiger’, ‘Vaquita’. After that, please count how many elements the list has and provide the info within the following statement: “There are [insert number of elements here] animals in the list.”

# Please write your solution here
rare_animals = ['Addax', 'Black-footed Ferret', 'Northern Bald Ibis', 'Cross River Gorilla', 'Saola', 'Amur Leopard', 'Philippine Crocodile', 'Sumatran Rhino', 'South China Tiger', 'Vaquita']
print("There are %s animals in the list." %len(rare_animals))
There are 10 animals in the list.

Exercise 6.2

Please add ‘Manatee’ to the list and subsequently evaluate how many unique elements the list has.

# Please write your solution here
rare_animals.append('Manatee')
len(set(rare_animals))
11

Exercise 6.3

Learning that the manatee is not endangered anymore, please remove it from the list. Unfortunately, we have to add “Giant Panda”. Could you please do that at index 3.

# Please write your solution here
rare_animals.remove('Manatee') # or del rare_animals[11]
rare_animals.insert(2, 'Giant Panda')
rare_animals
['Addax',
 'Black-footed Ferret',
 'Giant Panda',
 'Northern Bald Ibis',
 'Cross River Gorilla',
 'Saola',
 'Amur Leopard',
 'Philippine Crocodile',
 'Sumatran Rhino',
 'South China Tiger',
 'Vaquita']

Tuples

Tuples are like lists, except that they cannot be modified once created, that is they are immutable.


In python, tuples are created using the syntax (..., ..., ...), or even ..., ...:

point = (10, 20, 'Whoa another thing')
print(type(point))
print(point)
<class 'tuple'>
(10, 20, 'Whoa another thing')

Elements of tuples can also be referenced via their respective index:

print(point[0])
print(point[1])
print(point[2])
10
20
Whoa another thing

However, as mentioned above, if we try to assign a new value to an element in a tuple we get an error:

try:
    point[0] = 20
except(TypeError) as er:
    print("TypeError:", er)
else:
    raise
TypeError: 'tuple' object does not support item assignment

Thus, tuples also don’t have the set of functions to modify elements lists do.

Exercise 7.1

Please create a tuple called deep_thought with the following values: ‘answer’, 42.

# Please write your solution here
deep_thought = ('answer', 42)
deep_thought
('answer', 42)

Dictionaries

Dictionaries are also like lists, except that each element is a key-value pair. That is, elements or entries of the dictionary, the values, can only be assessed via their respective key and not via indexing, slicing, etc. .


The syntax for dictionaries is {key1 : value1, ...}.

Dictionaries are fantastic if you need to organize your data in a highly structured way where a precise mapping of a multiple of types is crucial and lists might be insufficient. For example, we want to have the information we assessed above for our lists in a detailed and holistic manner. Instead of specifying multiple lists and variables, we could also create a dictionary that comprises all of that information.

movie_info = {"n_responses" :        len(movies),
              "n_responses_unique" : len(set(movies)),
              "responses" :          movies}

print(type(movie_info))
print(movie_info)
<class 'dict'>
{'n_responses': 19, 'n_responses_unique': 19, 'responses': ['The_Intouchables', 'The Philosophers', 'The Big Lebowski', 'The Imitation Game', 'The Great Gatsby', 'Shutter Island', 'Shindlers List', 'Retired Extremely Dangerous', 'Love actually', 'Lord of the Rings - The Two Towers', 'James Bond', 'Interstellar', 'Inception', 'Harry Potter and the Python-Prince', 'Harry Potter', 'Forrest Gump', 'Fight Club', 'Call Me by Your Name', 'My Neighbor Totoro']}

See how great this is? We have everything in one place and can access the information we need/want via the respective key.

movie_info['n_responses']
19
movie_info['n_responses_unique']
19
movie_info['responses']
['The_Intouchables',
 'The Philosophers',
 'The Big Lebowski',
 'The Imitation Game',
 'The Great Gatsby',
 'Shutter Island',
 'Shindlers List',
 'Retired Extremely Dangerous',
 'Love actually',
 'Lord of the Rings - The Two Towers',
 'James Bond',
 'Interstellar',
 'Inception',
 'Harry Potter and the Python-Prince',
 'Harry Potter',
 'Forrest Gump',
 'Fight Club',
 'Call Me by Your Name',
 'My Neighbor Totoro']

As you can see and mentioned before: like lists, each value of a dictionary can entail various types: integer, float, string and even data types: lists, tuples, etc. . However, in contrast to lists, we can access entries can only via their key name and not indices:

movie_info[1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/61/0lj9r7px3k52gv9yfyx6ky300000gn/T/ipykernel_86733/2976284230.py in <module>
----> 1 movie_info[1]

KeyError: 1

If you try that, you’ll get a KeyError indicating that the key whose value you want to access doesn’t exist. If you’re uncertain about the keys in your dictionary, you can get a list of all of them via dictionary.keys():

movie_info.keys()
dict_keys(['n_responses', 'n_responses_unique', 'responses'])

Comparably, if you want to get all the values, you can use dictionary.values() to obtain a respective list:

movie_info.values()
dict_values([19, 19, ['The_Intouchables', 'The Philosophers', 'The Big Lebowski', 'The Imitation Game', 'The Great Gatsby', 'Shutter Island', 'Shindlers List', 'Retired Extremely Dangerous', 'Love actually', 'Lord of the Rings - The Two Towers', 'James Bond', 'Interstellar', 'Inception', 'Harry Potter and the Python-Prince', 'Harry Potter', 'Forrest Gump', 'Fight Club', 'Call Me by Your Name', 'My Neighbor Totoro']])

Assuming you want to add new information to your dictionary, i.e. a new key, this can directly be done via dictionary[new_key] = value. For example, you run some stats on our list of movies and determined that this selection is significantly awesome with a p value of 0.000001, we can easily add this to our dictionary:

movie_info['movie_selection_awesome'] = True
movie_info['movie_selection_awesome_p_value'] = 0.000001
movie_info
{'n_responses': 19,
 'n_responses_unique': 19,
 'responses': ['The_Intouchables',
  'The Philosophers',
  'The Big Lebowski',
  'The Imitation Game',
  'The Great Gatsby',
  'Shutter Island',
  'Shindlers List',
  'Retired Extremely Dangerous',
  'Love actually',
  'Lord of the Rings - The Two Towers',
  'James Bond',
  'Interstellar',
  'Inception',
  'Harry Potter and the Python-Prince',
  'Harry Potter',
  'Forrest Gump',
  'Fight Club',
  'Call Me by Your Name',
  'My Neighbor Totoro'],
 'movie_selection_awesome': True,
 'movie_selection_awesome_p_value': 1e-06}

As with lists, the value of a given element, here entry can be modified and deleted. This means they are mutable. If we for example forgot to correct our p value for multiple comparisons, we can simply overwrite the original with corrected one:

movie_info['movie_selection_awesome_p_value'] = 0.001
movie_info
{'n_responses': 19,
 'n_responses_unique': 19,
 'responses': ['The_Intouchables',
  'The Philosophers',
  'The Big Lebowski',
  'The Imitation Game',
  'The Great Gatsby',
  'Shutter Island',
  'Shindlers List',
  'Retired Extremely Dangerous',
  'Love actually',
  'Lord of the Rings - The Two Towers',
  'James Bond',
  'Interstellar',
  'Inception',
  'Harry Potter and the Python-Prince',
  'Harry Potter',
  'Forrest Gump',
  'Fight Club',
  'Call Me by Your Name',
  'My Neighbor Totoro'],
 'movie_selection_awesome': True,
 'movie_selection_awesome_p_value': 0.001}

Assuming we then want to delete the p value from our dictionary because we remembered that the sole focus on significance and p values brought the entirety of science down to a very bad place, we can achieve this via:

del movie_info['movie_selection_awesome_p_value']
movie_info
{'n_responses': 19,
 'n_responses_unique': 19,
 'responses': ['The_Intouchables',
  'The Philosophers',
  'The Big Lebowski',
  'The Imitation Game',
  'The Great Gatsby',
  'Shutter Island',
  'Shindlers List',
  'Retired Extremely Dangerous',
  'Love actually',
  'Lord of the Rings - The Two Towers',
  'James Bond',
  'Interstellar',
  'Inception',
  'Harry Potter and the Python-Prince',
  'Harry Potter',
  'Forrest Gump',
  'Fight Club',
  'Call Me by Your Name',
  'My Neighbor Totoro'],
 'movie_selection_awesome': True}

Exercise 8.1

Oh damn, we completely forgot to create a comparable dictionary for our snacks list. How would create one that follows the example from the movie list? NB: you can skip the p value right away:

# Please write your solution here
snack_info = {"n_responses" :            len(snacks),
              "n_responses_unique" :     len(set(snacks)),
              "responses" :              snacks,
              "snack_selection_awesome": True}

print(type(snack_info))
print(snack_info)
<class 'dict'>
{'n_responses': 19, 'n_responses_unique': 18, 'responses': ['Banana', 'Kinder bueno', 'PB&J', 'Pringles', 'Rocher', 'carrots', 'chocolate', 'dark chocolate', 'dominoes', 'dried mango', 'fruit', 'hummus', 'mint chocolate', 'pancakes', 'snickers', 'snickers', 'sushi', 'too complicated', 'Wasabi Peanuts'], 'snack_selection_awesome': True}

Exercise 8.2

Obviously, we would love to do the same thing for our list of animals again!

# Please write your solution here
animal_info = {"n_responses" :            len(animals),
              "n_responses_unique" :      len(set(animals)),
              "responses" :               animals,
              "animal_selection_awesome": True}

print(type(animal_info))
print(animal_info)
<class 'dict'>
{'n_responses': 8, 'n_responses_unique': 8, 'responses': ['barred owl', 'cat', 'Manatee', 'coral', 'elephant', 'groundhog', 'lizard', 'bear'], 'animal_selection_awesome': True}

Everything is an object in Python

  • All of these data types are actually just objects in python

  • Everything is an object in python!

  • The operations you can perform with a variable depend on the object’s definition

  • E.g., the multiplication operator * is defined for some objects but not others


Homework assignment #4

Your fourth homework assignment will entail working through a few tasks covering the contents discussed in this session within of a jupyter notebook. You can download it here. In order to open it, put the homework assignment notebook within the folder you stored the course materials, start a jupyter notebook as during the sessions, navigate to the homework assignment notebook, open it and have fun! NB: a substantial part of it will be optional and thus the notebook will look way longer than it actually is.

Deadline: 15/12/2021, 11:59 PM EST

Also, in preparation for our switch to more applied things, starting with experiments in python and thus endeavors in psychopy, could you please create a new respective conda environment (remember those?) via the following lines:

conda create -n psychopy psychopy
conda activate psychopy
pip install jedi psychtoolbox pygame pyo pyobjc python-vlc ujson

and then test if everything works via running psychopy (type it and then press “enter”) from within the new environment?