Introduction V - Introduction to Python - I

Peer Herholz (he/him)
Habilitation candidate - Fiebach Lab, Neurocognitive Psychology at Goethe-University Frankfurt
Research affiliate - NeuroDataScience lab at MNI/McGill
Member - BIDS, ReproNim, Brainhack, Neuromod, OHBM SEA-SIG, UNIQUE

@peerherholz

Before we get started 1…


Objectives 📍

  • learn basic and efficient usage of the python programming language

    • what is python & how to utilize it

    • building blocks of & operations in python

What is Python?

  • Python is a programming language

  • Specifically, it’s a widely used/very flexible, high-level, general-purpose, dynamic programming language

  • That’s a mouthful! Let’s explore each of these points in more detail…

Widely-used

  • Python is one of the fastest-growing major programming languages

  • Top 3 overall (with JavaScript, Java)

High-level

Python features a high level of abstraction

  • Many operations that are explicit in lower-level languages (e.g., C/C++) are implicit in Python

  • E.g., memory allocation, garbage collection, etc.

  • Python lets you write code faster

File reading in Java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
 
public class ReadFile {
    public static void main(String[] args) throws IOException{
        String fileContents = readEntireFile("./foo.txt");
    }
 
    private static String readEntireFile(String filename) throws IOException {
        FileReader in = new FileReader(filename);
        StringBuilder contents = new StringBuilder();
        char[] buffer = new char[4096];
        int read = 0;
        do {
            contents.append(buffer, 0, read);
            read = in.read(buffer);
        } while (read >= 0);
        return contents.toString();
    }
}

File reading in Python

open(filename).read()
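
The one-liner works, but it leaves closing the file to the interpreter. A slightly more careful sketch uses a with statement. The file name foo.txt is borrowed from the Java example; its location in a temporary folder and its content are made up here so that the snippet runs on its own:

```python
import tempfile
from pathlib import Path

# Create a small throwaway file so the example is self-contained
# (folder and content are assumptions made for illustration only).
tmp_dir = Path(tempfile.mkdtemp())
filename = tmp_dir / "foo.txt"
filename.write_text("One ring to rule them all\n", encoding="utf-8")

# The with statement closes the file automatically, even if an error occurs.
with open(filename, encoding="utf-8") as f:
    file_contents = f.read()

print(file_contents)
```

Even with the extra safety, this is still far shorter than the Java version.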

General-purpose

You can do almost everything in Python

  • Comprehensive standard library

  • Enormous ecosystem of third-party packages

  • Widely used in many areas of software development (web, dev-ops, data science, etc.)

Dynamic

Code is interpreted at run-time

  • No compilation process*; code is read line-by-line when executed

  • Eliminates delays between development and execution

  • The downside: poorer performance compared to compiled languages

(Try typing import antigravity into a new cell and running it!)

This section of the course provides a short introduction to Python to help beginners get familiar with the language.

It is divided into the following chapters:

Here’s what we will focus on in the first block:

Modules

Most of the functionality in Python is provided by modules. To use a module in a Python program it first has to be imported. A module can be imported using the import statement.

For example, to import the module math, which contains many standard mathematical functions, we can do:

import math

This includes the whole module and makes it available for use later in the program. For example, we can do:

import math

x = math.cos(2 * math.pi)

print(x)
1.0

Importing the whole module is often unnecessary when only a few of its functions are needed. As an alternative, we can import only selected names from a module by explicitly listing them. Note that Python still loads the entire module in the background, so this mainly keeps our namespace small and our code shorter rather than saving loading time or memory:

from math import cos, pi

x = cos(2 * pi)

print(x)
1.0

You can use tab completion again to get a list of the functions, classes, etc. that a given module provides. Try it out by placing the cursor after the import statement and pressing tab:

from math import 

Similarly, you can use the help function to find out more about a given module:

import math
help(math)

It is also possible to give an imported module or symbol your own access name with the as keyword:

import numpy as np
from math import pi as number_pi

x  = np.rad2deg(number_pi)

print(x)
180.0

You can provide basically any name (as long as it follows Python’s naming conventions), but choosing an intelligible one is a good idea. The following works, yet is not exactly self-explanatory:

import matplotlib as pineapple
pineapple.

Exercise 1.1

Import the max function from numpy and find out what it does.

# write your solution in this code cell
from numpy import max
help(max)

Exercise 1.2

Import the scipy package under the access name middle_earth and check its functions.

# write your solution in this code cell
import scipy as middle_earth
help(middle_earth)
Help on package scipy:

NAME
    scipy

DESCRIPTION
    SciPy: A scientific computing package for Python
    ================================================
    
    Documentation is available in the docstrings and
    online at https://docs.scipy.org.
    
    Contents
    --------
    SciPy imports all the functions from the NumPy namespace, and in
    addition provides:
    
    Subpackages
    -----------
    Using any of these subpackages requires an explicit import. For example,
    ``import scipy.cluster``.
    
    ::
    
     cluster                      --- Vector Quantization / Kmeans
     fft                          --- Discrete Fourier transforms
     fftpack                      --- Legacy discrete Fourier transforms
     integrate                    --- Integration routines
     interpolate                  --- Interpolation Tools
     io                           --- Data input and output
     linalg                       --- Linear algebra routines
     linalg.blas                  --- Wrappers to BLAS library
     linalg.lapack                --- Wrappers to LAPACK library
     misc                         --- Various utilities that don't have
                                      another home.
     ndimage                      --- N-D image package
     odr                          --- Orthogonal Distance Regression
     optimize                     --- Optimization Tools
     signal                       --- Signal Processing Tools
     signal.windows               --- Window functions
     sparse                       --- Sparse Matrices
     sparse.linalg                --- Sparse Linear Algebra
     sparse.linalg.dsolve         --- Linear Solvers
     sparse.linalg.dsolve.umfpack --- :Interface to the UMFPACK library:
                                      Conjugate Gradient Method (LOBPCG)
     sparse.linalg.eigen          --- Sparse Eigenvalue Solvers
     sparse.linalg.eigen.lobpcg   --- Locally Optimal Block Preconditioned
                                      Conjugate Gradient Method (LOBPCG)
     spatial                      --- Spatial data structures and algorithms
     special                      --- Special functions
     stats                        --- Statistical Functions
    
    Utility tools
    -------------
    ::
    
     test              --- Run scipy unittests
     show_config       --- Show scipy build configuration
     show_numpy_config --- Show numpy build configuration
     __version__       --- SciPy version string
     __numpy_version__ --- Numpy version string

PACKAGE CONTENTS
    __config__
    _build_utils (package)
    _distributor_init
    _lib (package)
    cluster (package)
    conftest
    constants (package)
    fft (package)
    fftpack (package)
    integrate (package)
    interpolate (package)
    io (package)
    linalg (package)
    misc (package)
    ndimage (package)
    odr (package)
    optimize (package)
    setup
    signal (package)
    sparse (package)
    spatial (package)
    special (package)
    stats (package)
    version

DATA
    test = <scipy._lib._testutils.PytestTester object>

VERSION
    1.7.1

FILE
    /Users/peerherholz/anaconda3/envs/pfp_2021/lib/python3.9/site-packages/scipy/__init__.py

Exercise 1.3

What happens when we try to import a module that is either misspelled or doesn’t exist in our environment or at all?

  1. python provides us a hint that the module name might be misspelled

  2. we’ll get an error telling us that the module doesn’t exist

  3. python automatically searches for the module and if it exists downloads/installs it

import welovethiscourse
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/var/folders/61/0lj9r7px3k52gv9yfyx6ky300000gn/T/ipykernel_52496/1194139970.py in <module>
----> 1 import welovethiscourse

ModuleNotFoundError: No module named 'welovethiscourse'
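
If you would rather check whether a module is available before importing it, the standard library can do that without raising an error. The following is a minimal sketch; the helper name module_available is made up for this example, and it assumes that no package called welovethiscourse is installed:

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


print(module_available("math"))              # True
print(module_available("welovethiscourse"))  # False (assuming it is not installed)
```

Python never downloads or installs a missing module on its own; that is the job of a package manager such as conda or pip.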

Namespaces and imports

  • Python is very serious about maintaining orderly namespaces

  • If you want to use some code outside the current scope, you need to explicitly “import” it

  • Python’s import system often annoys beginners, but it substantially increases code clarity

    • Almost completely eliminates naming conflicts and confusion

Help and Descriptions

Using the function help we can get a description of almost all functions.

help(math.log)
math.log(10)
math.log(10, 2)
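
As the help text explains, math.log takes an optional second argument that sets the base. A small sketch of the two calls above, with variable names chosen just for this example:

```python
import math

natural = math.log(10)      # one argument: natural logarithm (base e)
base_two = math.log(10, 2)  # second argument: the base of the logarithm

print(natural)
print(base_two)

# Cross-check with the change-of-base rule: log_b(x) = ln(x) / ln(b)
print(math.isclose(base_two, math.log(10) / math.log(2)))  # True
```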

Variables and data types

  • in programming variables are things that store values

  • in Python, we declare a variable by assigning it a value with the = sign

    • name = value

    • code variables != math variables

      • in mathematics = refers to equality (statement of truth), e.g. y = 10x + 2

      • in coding = refers to assignments, e.g. x = x + 1

    • Variables are pointers, not data stores!

  • Python supports a variety of data types and structures:

    • booleans

    • numbers (ints, floats, etc.)

    • strings

    • lists

    • dictionaries

    • many others!

  • We don’t specify a variable’s type at assignment
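
The statement that variables are pointers is easiest to see with a value that can be changed in place, such as a list (lists are covered in more detail later). A minimal sketch with made-up variable names:

```python
fellowship = ["Frodo", "Sam"]
same_list = fellowship           # a second name for the SAME list object
same_list.append("Merry")        # changes the one list both names point to

print(fellowship)                # ['Frodo', 'Sam', 'Merry']
print(same_list is fellowship)   # True

n_rings = 1
n_copy = n_rings
n_copy = n_copy + 1              # assignment points n_copy at a new value
print(n_rings)                   # 1
print(n_copy)                    # 2
```

Assigning with = never copies the data; it only attaches a name to a value.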

Variables and types

Symbol names

Variable names in Python can contain alphanumerical characters (a-z, A-Z, 0-9) and the underscore _. Normal variable names must start with a letter or an underscore.

By convention, variable names start with a lower-case letter and class names start with a capital letter.

In addition, there are a number of Python keywords that cannot be used as variable names. These keywords are:

False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield
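
The exact set of keywords depends on the Python version, so it can help to ask the running interpreter directly. A short sketch using the standard-library keyword module; the helper valid_name and the example names are made up for illustration:

```python
import keyword

print(keyword.kwlist)               # all keywords of the running Python version
print(keyword.iskeyword("lambda"))  # True
print(keyword.iskeyword("print"))   # False in Python 3, where print is a function


def valid_name(name):
    """Return True if `name` can be used as a variable name."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(valid_name("ring_bearer"))  # True
print(valid_name("2towers"))      # False: starts with a digit
print(valid_name("class"))        # False: reserved keyword
```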

Assignment

(Not your homework assignment but the operator in python.)

The assignment operator in Python is =. Python is a dynamically typed language, so we do not need to specify the type of a variable when we create one.

Assigning a value to a new variable creates the variable:

# variable assignment
x = 1.0

Again, this is not a statement that x equals 1.0; it assigns the value 1.0 to the variable x. Our variable x is now stored in the current namespace:

x
1.0

This means that we can directly utilize the value of our variable:

x + 3
4.0

Although not explicitly specified, a variable does have a type associated with it. The type is derived from the value it was assigned.

type(x)
float

If we assign a new value to a variable, its type can change.

x = 1
type(x)
int

This outlines one further very important characteristic of Python (and many other programming languages): variables can be directly overwritten by assigning them a new value. We don’t get an error like “This name is already taken.” Thus, always keep track of which variable names are already in use to avoid unintentional overwriting and errors (reproducibility/replicability much?).

ring_bearer = 'Bilbo'
ring_bearer
'Bilbo'
ring_bearer = 'Frodo'
ring_bearer
'Frodo'

If we try to use a variable that has not yet been defined, we get a NameError. (Note for later sessions: in the notebooks we will use try/except blocks to handle such exceptions so that the notebook doesn’t stop. The code below tries to execute the print function; if a NameError occurs, the error message is printed instead. Any other exception is not caught and is raised as usual. You will learn more about exception handling later.):

try:
    print(Peer)
except NameError as err:
    print("NameError", err)
NameError name 'Peer' is not defined

Variable names:

  • Can include letters (A-Z), digits (0-9), and underscores ( _ )

  • Cannot start with a digit

  • Are case sensitive (questions: where did “lower/upper case” originate?)

This means that, for example:

  • shire0 is a valid variable name, whereas 0shire is not

  • shire and Shire are different variables

Exercise 2.1

Create the following variables n_elves, n_dwarfs, n_humans with the respective values 3, 7.0 and nine.

# write your solution here
n_elves = 3
n_dwarfs = 7.0
n_humans = "nine"

Exercise 2.2

What’s the output of n_elves + n_dwarfs?

  1. n_elves + n_dwarfs

  2. 10

  3. 10.0

n_elves + n_dwarfs

Exercise 2.3

Consider the following lines of code.

ring_bearer = 'Gollum'
ring_bearer
ring_bearer = 'Bilbo'
ring_bearer

What is the final output?

  1. 'Bilbo'

  2. 'Gollum'

  3. neither, the variable got deleted

ring_bearer = 'Gollum'
ring_bearer  
ring_bearer = 'Bilbo'
ring_bearer

Fundamental types & data structures

  • Most code requires more complex structures built out of basic data types

  • The data type describes what kind of value is assigned to a variable

  • Python provides built-in support for many common structures

    • Many additional structures can be found in the collections module
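
To give a first impression of the collections module, here is a minimal sketch with two of its structures. The example data and the names races, counts and Member are made up for illustration:

```python
from collections import Counter, namedtuple

races = ["hobbit", "hobbit", "elf", "dwarf", "hobbit",
         "human", "human", "hobbit", "wizard"]

# Counter tallies how often each value occurs.
counts = Counter(races)
print(counts["hobbit"])       # 4
print(counts.most_common(1))  # [('hobbit', 4)]

# namedtuple creates a small record type with named fields.
Member = namedtuple("Member", ["name", "race"])
frodo = Member(name="Frodo", race="hobbit")
print(frodo.name, frodo.race)  # Frodo hobbit
```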

Most of the time you’ll encounter the following data types

  • integers (e.g. 1, 42, 180)

  • floating-point numbers (e.g. 1.0, 42.42, 180.90)

  • strings (e.g. "Rivendell", "Weathertop")

  • Boolean (True, False)

If you’re unsure about the data type of a given variable, you can always use the type() function.

Integers

Let’s check out the different data types in more detail, starting with integers. Integers are whole numbers that can be positive, negative or zero (e.g. 1, 42, 180, -1, -42, -180).

x = 1
type(x)
int
n_nazgul = 9
type(n_nazgul)
int
remaining_rings = -1
type(remaining_rings)
int

Floating-point numbers

So how do floating-point numbers differ? Floating-point numbers are numbers with a decimal point that can be positive or negative (e.g. 1.0, 42.42, 180.90, -1.0, -42.42, -180.90).

x_float = 1.0
type(x_float)
float
n_nazgul_float = 9.0
type(n_nazgul_float)
float
remaining_rings_float = -1.0
type(remaining_rings_float)
float
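
Two practical consequences of the int/float distinction are worth seeing once: mixing both types yields a float, and floats are stored with limited precision. A small sketch with made-up variable names:

```python
import math

mixed = 3 + 7.0
print(mixed, type(mixed))        # 10.0 <class 'float'>

print(7 / 2)                     # 3.5 (true division always returns a float)
print(7 // 2)                    # 3   (floor division of two ints returns an int)

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True
```

For this reason, floats are usually compared with a tolerance (e.g. via math.isclose) rather than with ==.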

Strings

Next up: strings. Strings are text elements: single letters, words and whole sentences can all be strings in Python. To define a string, Python needs quotation marks; more precisely, strings start and end with matching quotation marks, e.g. "Rivendell". You can choose between " and ' as both will work (NB: when displaying a string, Python usually puts ' around it even if you specified "). However, it is recommended to decide on one and be consistent.

location = "Weathertop"
type(location)
str
abbreviation = 'LOTR'
type(abbreviation)
str
book_one = "The fellowship of the ring"
type(book_one)
str
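
A short sketch of how the two kinds of quotation marks behave; the variable names are chosen for this example only:

```python
single = 'Rivendell'
double = "Rivendell"
print(single == double)  # True: the quote style is not part of the string

# Using " on the outside makes it easy to include an apostrophe.
possessive = "Frodo's ring"
print(possessive)        # Frodo's ring
print(len(possessive))   # 12 characters, including the space and the apostrophe
print(repr(possessive))  # "Frodo's ring" (Python switches to " for display here)
```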

Booleans

How about some Booleans? At this point it gets a bit more “abstract”. While there are many possible numbers and strings, a Boolean can only have one of two values: True or False. That is, a Boolean says something about whether something is the case or not. It’s easier to understand with some examples. First try the type() function with a Boolean as an argument.

b1 = True
type(b1)
bool
b2 = False
type(b2)
bool
lotr_is_awesome = True
type(lotr_is_awesome)
bool

Interestingly, True and False also have numeric values! True has a value of 1 and False has a value of 0.

True + True
2
False + False
0
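
This numeric behaviour is handy for counting how often something is the case. A minimal sketch, using a made-up list of answers:

```python
print(isinstance(True, int))  # True: bool is a special kind of integer

answers = [True, False, True, True]
print(sum(answers))           # 3, the number of True values

print(True == 1)              # True
print(False == 0)             # True
```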

Converting data types

As mentioned before the data type is not set when assigning a value to a variable but determined based on its properties. Additionally, the data type of a given value can also be changed via set of functions.

  • int() -> convert the value of a variable to an integer

  • float() -> convert the value of a variable to a floating-point number

  • str() -> convert the value of a variable to a string

  • bool() -> convert the value of a variable to a Boolean

int("4")
4
float(3)
3.0
str(2)
'2'
bool(1)
True
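
Conversions do not always behave the way one might guess. The following sketch collects a few cases that commonly surprise beginners; the variable names are made up for this example:

```python
print(int(4.9))       # 4: int() cuts off the decimals, it does not round
print(float("3.5"))   # 3.5
print(bool("False"))  # True: every non-empty string counts as True
print(bool(""))       # False: only the empty string counts as False

# Converting returns a NEW value; the original variable is unchanged.
n_rings_text = "9"
n_rings = int(n_rings_text)
print(type(n_rings_text), type(n_rings))  # <class 'str'> <class 'int'>

# Text that does not look like a number cannot be converted.
try:
    int("nine")
except ValueError as err:
    print("ValueError:", err)
```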

Exercise 3.1

Define the following variables with the respective values and data types: fellowship_n_humans with a value of two as a float, fellowship_n_hobbits with a value of four as a string and fellowship_n_elves with a value of one as an integer.

# write your solution here
fellowship_n_humans = 2.0
fellowship_n_hobbits = 'four'
fellowship_n_elves = 1

Exercise 3.2

What outcome would you expect based on the following lines of code?

  1. True - False

  2. type(True)

Answers:

  1. 1

  2. bool

Exercise 3.3

Define two variables, fellowship_n_dwarfs with a value of one as a string and fellowship_n_wizards with a value of one as a float. Subsequently, change the data type of fellowship_n_dwarfs to integer and the data type of fellowship_n_wizards to string.

fellowship_n_dwarfs = '1'
fellowship_n_wizards = 1.0
int(fellowship_n_dwarfs)
1
str(fellowship_n_wizards)
'1.0'

Why do programming/science in Python?

Let’s go through some advantages of the Python programming language.


https://funvizeo.com/media/memes/9114fb92b16ca1b8/java-python-think-why-waste-time-word-when-few-word-trick-meme-7a08727102156f3c-e9db4e91c4b2a7d5.jpg

Easy to learn

  • Readable, explicit syntax

  • Most packages are very well documented

    • e.g., scikit-learn’s documentation is widely held up as a model

  • A huge number of tutorials, guides, and other educational materials

Comprehensive standard library

  • The Python standard library contains a huge number of high-quality modules

  • When in doubt, check the standard library first before you write your own tools!

  • For example:

    • os: operating system tools

    • re: regular expressions

    • collections: useful data structures

    • multiprocessing: simple parallelization tools

    • pickle: serialization

    • json: reading and writing JSON
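
To make this more concrete, here is a brief sketch that touches three of the modules listed above. The example data, text and path components are made up for illustration:

```python
import json
import os
import re

# json: convert a dictionary to JSON text and back again
record = {"name": "Frodo", "age": 50, "ring_bearer": True}
as_text = json.dumps(record)
print(as_text)
print(json.loads(as_text) == record)  # True

# re: find all groups of digits in a piece of text
numbers = re.findall(r"\d+", "3 elves, 7 dwarfs, 9 humans")
print(numbers)                        # ['3', '7', '9']

# os: build a file path with the separator of the current operating system
print(os.path.join("course", "materials", "notebook.ipynb"))
```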

Exceptional external libraries

  • Python has very good (often best-in-class) external packages for almost everything

  • Particularly important for “data science”, which draws on a very broad toolkit

  • Package management is easy (conda, pip)

  • Examples:

    • Web development: flask, Django

    • Database ORMs: SQLAlchemy, Django ORM (w/ adapters for all major DBs)

    • Scraping/parsing text/markup: beautifulsoup, scrapy

    • Natural language processing (NLP): nltk, gensim, textblob

    • Numerical computation and data analysis: numpy, scipy, pandas, xarray, statsmodels, pingouin

    • Machine learning: scikit-learn, TensorFlow, keras

    • Image processing: pillow, scikit-image, OpenCV

    • audio processing: librosa, pyaudio

    • Plotting: matplotlib, seaborn, altair, ggplot, Bokeh

    • GUI development: PyQt, wxPython

    • Testing: pytest

    • Etc. etc. etc.

(Relatively) good performance

  • Python is a high-level dynamic language — this comes at a performance cost

  • For many (not all!) use cases, this cost is irrelevant in practice

  • In general, the less Python code you write yourself, the better your performance will be

    • Much of the standard library consists of Python interfaces to C functions

    • Numpy, scikit-learn, etc. all rely heavily on C/C++ or Fortran
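
A rough way to see this for yourself is to time a hand-written loop against the equivalent built-in function. The sketch below uses the standard-library timeit module; the function name python_loop_sum is made up for this example, and the measured times will differ from machine to machine:

```python
import timeit

numbers = list(range(100_000))


def python_loop_sum(values):
    """Add up all values with an explicit Python loop."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches compute the same result ...
print(python_loop_sum(numbers) == sum(numbers))  # True

# ... but typically take different amounts of time (20 repetitions each).
loop_time = timeit.timeit(lambda: python_loop_sum(numbers), number=20)
builtin_time = timeit.timeit(lambda: sum(numbers), number=20)
print(f"loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

On a typical setup the built-in sum, which is implemented in C, should come out clearly ahead.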

Python vs. other data science languages

  • Python competes for mind share with many other languages

  • Most notably, R

  • To a lesser extent, Matlab, Mathematica, SAS, Julia, Java, Scala, etc.

R

  • R is dominant in traditional statistics and some fields of science

    • Has attracted many SAS, SPSS, and Stata users

  • Exceptional statistics support; hundreds of best-in-class libraries

  • Designed to make data analysis and visualization as easy as possible

  • Slow

  • Language quirks drive many experienced software developers crazy

  • Less support for most things non-data-related

MATLAB

  • A proprietary numerical computing language widely used by engineers

  • Good performance and very active development, but expensive

  • Closed ecosystem, relatively few third-party libraries

    • There is an open-source port (Octave)

  • Not suitable for use as a general-purpose language

So, why Python?

Why choose Python over other languages?

  • Arguably none of these offers the same combination of readability, flexibility, libraries, and performance

  • Python is sometimes described as “the second best language for everything”

  • Doesn’t mean you should always use Python

    • Depends on your needs, community, etc.

You can have your cake and eat it!

  • Many languages—particularly R—now interface seamlessly with Python

  • You can work primarily in Python, fall back on R when you need it (or vice versa)

  • The best of all possible worlds?

The core “Python for psychology” stack

  • The Python ecosystem contains tens of thousands of packages

  • Several are very widely used in psychology research:

    • Jupyter: interactive notebooks

    • Numpy: numerical computing in Python

    • pandas: data structures for Python

    • Scipy: scientific Python tools

    • Matplotlib: plotting in Python

    • seaborn: plotting in Python

    • scikit-learn: machine learning in Python

    • statsmodels: statistical analyses in Python

    • pingouin: statistical analyses in Python

    • psychopy: running experiments in Python

    • nilearn: brain imaging analyses in Python

    • mne: electrophysiology analyses in Python

  • Except for scikit-learn, nilearn and mne, we’ll cover all of them very briefly in this course

    • there are many free tutorials online that will go into greater detail and also cover the other packages

Homework assignment #3

Your third homework assignment entails working through a few tasks that cover the contents discussed in this session, all within a Jupyter notebook. You can download it here. To open it, put the homework assignment notebook in the folder where you stored the course materials, start Jupyter Notebook as during the sessions, navigate to the homework assignment notebook, open it and have fun!

Deadline: 08/12/2021, 11:59 PM EST