Introduction to the (unix) command line: bash#
Before we get started 1…#
most of what you’ll see within this lecture was prepared by Ross Markello and further adapted by Peer Herholz
based on the Software Carpentries “Introduction to the Shell” under CC-BY 4.0
Before we get started 2…#
We’re going to be working with a dataset from https://swcarpentry.github.io/shell-novice/data/data-shell.zip.
Download that file and unzip it on your Desktop!
(The link will be posted so you can just click on it.)
Goals#
learn basic and efficient usage of the shell for various tasks
navigating directories
file handling: copy, paste, create, delete
What is the “shell”?#
The shell is a command-line interface (CLI) to your computer
This is in contrast to the graphical user interfaces (GUIs) that you normally use!
The shell is also a scripting language that can be used to automate repetitive tasks
But what’s this “bash shell”?#
It’s one of many available shells!
sh - Bourne SHell
ksh - Korn SHell
dash - Debian Almquist SHell
csh - C SHell
tcsh - TENEX C SHell
zsh - Z SHell
bash - Bourne Again SHell <-- We’ll focus on this one!
WHY so many?#
They all have different strengths / weaknesses
You will see many of them throughout much of neuroimaging software, too!
sh is most frequently used in FSL
csh/tcsh is very common in FreeSurfer and AFNI
So we’re going to focus on the bash shell?#
Yes! It’s perhaps the most common shell, available on almost every OS:
It’s the default shell on most Linux systems
It’s the default shell in the Windows Subsystem for Linux (WSL)
It’s the default shell on Mac <=10.14
zsh is the new default on Mac Catalina (for licensing reasons 🙄)
But bash is still available!!
Alright, but why use the shell at all?#
Isn’t the GUI good enough?
Yes, but the shell is very powerful
Sequences of shell commands can be strung together to quickly and reproducibly make powerful pipelines
Also, you need to use the shell to access remote machines / high-performance computing environments (like Compute Canada)
NOTE: We will not be able to cover all (or even most) aspects of the shell today.
But, we’ll get through some basics that you can build on going forward.
The (bash) shell#
Now, let’s open up your terminal!
Windows: Open the Ubuntu application
Mac/Linux: Open the Terminal
When the shell is first opened, you are presented with a prompt, indicating that the shell is waiting for input:
$
The shell typically uses $ as the prompt, but may use a different symbol.
IMPORTANT: When typing commands, either in this lesson or from other sources, do not type the prompt, only the commands that follow it!
Am I using bash?#
Let’s check! You can use the following command to determine what shell you’re using:
echo $SHELL
If that doesn’t say something like /bin/bash, then simply type bash, press Enter, and try running the command again.
There might be other ways depending on your OS/installation; please let us know.
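One caveat: $SHELL normally records your login shell, so it may not change after you start bash from inside another shell. Here is a minimal sketch for checking which shell is actually running. It relies on BASH_VERSION, a variable that bash sets for itself; the name running and the fallback text are made up for this example.

```shell
# $SHELL usually names the login shell, not necessarily the shell running right now
echo "login shell: $SHELL"

# BASH_VERSION is set by bash itself; the fallback text is used in any other shell
running="${BASH_VERSION:-not bash}"
echo "running bash version: $running"
```

If the second line prints a version number, you are in bash.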
Note: The echo command does exactly what its name implies: it simply echoes whatever we provide it to the screen!
(It’s like print in Python / R or disp in MATLAB or printf in C or …)
What’s with the $SHELL?#
Things prefixed with $ in bash are (mostly) environment variables
All programming languages have variables!
We can assign variables in bash but when we want to reference them we need to add the $ prefix
We’ll dig into this a bit more later, but by default our shell comes with some preset variables
$SHELL is one of them!
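As a small sketch of assigning and then referencing a variable (the name workshop and its value are made up for this example):

```shell
# "workshop" is a hypothetical variable name chosen for this example
workshop="bash basics"      # assignment: no $ and no spaces around =
echo "$workshop"            # reference: needs the $ prefix
echo "workshop"             # without $, this is just the literal word
```

The first echo prints the stored value, the second prints the word workshop itself.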
Soooo, let’s try our ~first~ second command in bash!
This command lists the contents of our current directory:
ls
What happens if we make a typo? Or if the program we want isn’t installed on our computer?
Will the computer magically understand what we were trying to do?
ks
Nope! But you will get a (moderately) helpful error message 😁
The cons of the CLI#
You need to know the names of the commands you want to run!
Sometimes, commands are not immediately obvious
E.g., why ls over list_contents?
Key Points#
A shell is a program whose primary purpose is to accept commands and run programs
The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access remote machines
The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be
Working with Files and Directories#
How do we actually make new files and directories from the command line?
First, let’s remind ourselves of where we are:
cd ~/Desktop/data-shell
pwd
ls -F
Creating a directory#
We can create new directories with the mkdir (make directory) command:
mkdir thesis
Since we provided a relative path, we can expect that to have been created in our current working directory:
ls -F
(You could have also opened up the file explorer and made a new folder that way, too!)
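If you want to experiment without touching data-shell, here is a minimal sketch that works in a throwaway directory. The scratch variable and the figures/raw sub-directories are assumptions for illustration; mktemp -d and the -p flag of mkdir are standard.

```shell
# Work in a throwaway directory so nothing in data-shell is touched
scratch=$(mktemp -d)
cd "$scratch"

mkdir thesis                 # relative path: created inside the current directory
mkdir -p thesis/figures/raw  # -p also creates any missing parent directories
ls -F thesis                 # directories are marked with a trailing /
```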
Good naming conventions#
Don’t use spaces
Don’t begin the name with -
Stick with letters, numbers, ., -, and _
That is, avoid other special characters like ~!@#$%^&*()
Creating a text file#
Let’s navigate into our (empty) thesis directory and create a new file:
cd thesis
We can make a file via the following command:
touch draft.txt
touch creates an empty file. We can see that with ls -l:
ls -l
Moving files and directories#
Let’s start by going back to the data-shell directory:
cd ~/Desktop/data-shell
We now have a thesis/draft.txt file, which isn’t very informatively named. Let’s move it:
mv thesis/draft.txt thesis/quotes.txt
The first argument of mv is the file we’re moving, and the last argument is where we want it to go!
Let’s make sure that worked:
ls thesis
Note: we can provide more than two arguments to mv, as long as the final argument is a directory! That would mean “move all these things into this directory”.
Also note: mv is quite dangerous, because it will silently overwrite files if the destination already exists! Refer to the -i flag for “interactive” moving (with warnings!).
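To see the silent overwrite for yourself, here is a minimal sketch in a throwaway directory. All file names are made up for this example, and the if-check at the end is just one cautious pattern for scripts, not the only way to guard against overwriting.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "important notes" > notes.txt
echo "scribbles" > draft.txt

mv draft.txt notes.txt   # no warning: the old contents of notes.txt are gone
cat notes.txt            # prints: scribbles

# One cautious pattern for scripts: only move if the destination does not exist yet
echo "more scribbles" > draft2.txt
if [ -e notes.txt ]; then
    echo "notes.txt already exists, not moving draft2.txt"
else
    mv draft2.txt notes.txt
fi
```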
More on mv#
Note that we can also use mv to move files to a different directory (rather than just renaming them):
mv thesis/quotes.txt .
The . means “the current directory”, so we should have moved quotes.txt out of the thesis directory into our current directory.
Let’s check that worked as expected:
ls thesis
ls quotes.txt
(Note: providing a filename to ls instead of a directory will list only that filename if it exists. Otherwise, it will throw an error.)
Exercise: Moving files to a new folder#
After running the following commands, Jamie realizes that she put the files sucrose.dat and maltose.dat into the wrong folder. The files should have been placed in the raw folder.
$ ls -F
analyzed/ raw/
$ ls -F analyzed
fructose.dat glucose.dat maltose.dat sucrose.dat
$ cd analyzed
Fill in the blanks to move these files to the raw/ folder (i.e. the one she forgot to put them in):
$ mv sucrose.dat maltose.dat ____/____
mv sucrose.dat maltose.dat ../raw
Remember, the .. refers to the parent directory (i.e., one above the current directory)
Copying files and directories#
The cp (copy) command is like mv, but copies instead of moving!
cp quotes.txt thesis/quotations.txt
ls quotes.txt thesis/quotations.txt
We can use the -r (recursive) flag to copy a directory and all its contents:
cp -r thesis thesis_backup
ls thesis thesis_backup
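A quick way to convince yourself that the backup is a real, independent copy is to change the original afterwards. This sketch uses a throwaway directory, and the file contents are invented for illustration:

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir thesis
echo "first quote" > thesis/quotations.txt

cp -r thesis thesis_backup                    # copy the directory and everything in it
echo "second quote" >> thesis/quotations.txt  # change only the original

cat thesis_backup/quotations.txt              # the backup still holds only: first quote
```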
Exercise: Renaming files#
Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: statstics.txt
After creating and saving this file you realize you misspelled the filename! You want to correct the mistake and remove the incorrectly named file. Which of the following commands could you use to do so?
cp statstics.txt statistics.txt
mv statstics.txt statistics.txt
mv statstics.txt .
cp statstics.txt .
No: this would create a file with the correct name but would not remove the incorrectly named file
Yes: this would rename the file!
No, the . indicates where to move the file but does not provide a new name.
No, the . indicates where to copy the file but does not provide a new name.
Moving and Copying#
What is the output of the closing ls command in the sequence shown below?
$ pwd
/Users/jamie/data
$ ls
proteins.dat
$ mkdir recombine
$ mv proteins.dat recombine
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls
proteins-saved.dat recombine
recombine
proteins.dat recombine
proteins-saved.dat
No: proteins-saved.dat is located at /Users/jamie
Yes!
No: proteins.dat is located at /Users/jamie/data/recombine
No, proteins-saved.dat is located at /Users/jamie
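If you would rather check the answer by running it, the sequence can be replayed in a throwaway directory. In this sketch, "$scratch/data" stands in for /Users/jamie/data, which is an assumption made so the example runs anywhere:

```shell
# Rebuild a stand-in for /Users/jamie/data inside a throwaway directory
scratch=$(mktemp -d)
mkdir "$scratch/data"
cd "$scratch/data"
touch proteins.dat

mkdir recombine
mv proteins.dat recombine
cp recombine/proteins.dat ../proteins-saved.dat

ls        # only recombine is left in the current directory
ls ..     # the saved copy sits one directory up, next to data
```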
Removing files#
Let’s go back to data-shell and remove the quotes.txt file we created:
cd ~/Desktop/data-shell
rm quotes.txt
The rm command deletes files. Let’s check that the file is gone:
ls quotes.txt
Deleting is FOREVER 💀💀#
The shell DOES NOT HAVE A TRASH BIN.
You CANNOT recover files that have been deleted with rm
But, you can use the -i flag to do things a bit more safely!
This will prompt you to type Y or N before every file that is going to be deleted.
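Here is a rough sketch of the -i flag in action. Normally you would type the answer yourself; the answers are piped in here only so the example can run unattended, and the file names keep.txt and toss.txt are made up.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch keep.txt toss.txt

# Piping the answer in keeps this example non-interactive
echo "n" | rm -i keep.txt    # answer "no": keep.txt stays
echo "y" | rm -i toss.txt    # answer "yes": toss.txt is deleted

ls                           # only keep.txt remains
```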
Removing directories#
Let’s try and remove the thesis directory:
rm thesis
rm only works on files, by default, but we can tell it to recursively delete a directory and all its contents with the -r flag:
rm -r thesis
Because deleting is forever 💀💀, the rm -r command should be used with GREAT CAUTION.
Operations with multiple files and directories#
Oftentimes you need to copy or move several files at once. You can do this by specifying a list of filenames.
Exercise: Copy with Multiple Filenames#
(Work through these in the data-shell/data directory.)
In the example below, what does cp do when given several filenames and a directory name?
$ mkdir backup
$ cp amino-acids.txt animals.txt backup/
What does cp do when given three or more filenames?
$ ls
amino-acids.txt animals.txt backup/ elements/ morse.txt pdb/ planets.txt salmon.txt sunspot.txt
$ cp amino-acids.txt animals.txt morse.txt
When given multiple filenames followed by a directory, all the files are copied into the directory.
When given multiple filenames with no directory, cp throws an error:
cp: target morse.txt is not a directory
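Both cases can be reproduced in a throwaway directory. This is a minimal sketch; the empty files created with touch only stand in for the real data files, and the exact wording of the error message may differ between systems.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch amino-acids.txt animals.txt morse.txt

mkdir backup
cp amino-acids.txt animals.txt backup/    # last argument is a directory: both files land in it
ls backup

# Last argument is a regular file, so cp refuses; "|| echo" just reports the failure
cp amino-acids.txt animals.txt morse.txt || echo "cp failed as expected"
```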
Using wildcards for accessing multiple files at once#
* is a wildcard which matches zero or more characters.
Consider the data-shell/molecules directory:
ls molecules/*
This matches every file in the molecules directory.
ls molecules/*pdb
This matches every file in the molecules directory ending in .pdb.
ls molecules/p*.pdb
This matches all files in the molecules directory starting with p and ending with .pdb
Using wildcards for accessing multiple files at once (cont’d)#
? is a wildcard matching exactly one character.
ls molecules/?ethane.pdb
This matches any file in molecules that has one character followed by ethane.pdb. Compare to:
ls molecules/*ethane.pdb
Which matches any file in molecules that ends in ethane.pdb.
Using wildcards for accessing multiple files at once (cont’d)#
You can string wildcards together, too!
ls molecules/???ane.pdb
This matches any file in molecules that has any three characters and ends in ane.pdb
Wildcards are said to be “expanded” to create a list of matching files. This happens before running the relevant command. For example, the following command will fail, because nothing matches the pattern and so (by default) ls is handed the literal text molecules/*pdf:
ls molecules/*pdf
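Because expansion is done by the shell and not by ls, you can watch it happen with echo. This is a minimal sketch in a throwaway directory, assuming default bash settings; the three empty .pdb files only stand in for the real molecules directory.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch ethane.pdb methane.pdb octane.pdb

echo *.pdb          # prints: ethane.pdb methane.pdb octane.pdb
echo ?ethane.pdb    # prints: methane.pdb
echo *.pdf          # prints: *.pdf (no match, so the pattern is passed along unchanged)
```

In each case echo never sees the wildcard when there is a match; it only receives the list of filenames the shell produced.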
Exercise: List filenames matching a pattern#
When run in the molecules directory, which ls command(s) will produce this output?
ethane.pdb methane.pdb
ls *t*ane.pdb
ls *t?ne.*
ls *t??ne.pdb
ls ethane.*
No: this will give ethane.pdb methane.pdb octane.pdb pentane.pdb
No: this will give octane.pdb pentane.pdb
Yes!
No: this only shows files starting with ethane
Key points#
cp old new copies a file
mkdir path creates a new directory
mv old new moves (renames) a file or directory
rm path removes (deletes) a file
* matches zero or more characters in a filename, so *.txt matches all files ending in .txt
? matches any single character in a filename, so ?.txt matches a.txt but not any.txt
The shell does not have a trash bin: once something is deleted, it’s really gone
Summary#
The bash shell is very powerful!
It offers a command-line interface to your computer and file system
It makes it easy to operate on files quickly and efficiently (copying, renaming, etc.)
Sequences of shell commands can be strung together to quickly and reproducibly make powerful pipelines
Soapbox#
Bash is fantastic and you will (likely) find yourself using it a lot!
However, for complex pipelines and programs we would strongly encourage you to use a “newer” programming language
Like Python, which will also be discussed in this workshop!
There are a number of reasons for this (e.g., better control flow, error handling, and debugging)
References#
There are lots of excellent resources online for learning more about bash:
The GNU Manual is the reference for all bash commands: http://www.gnu.org/manual/manual.html
“Learning the Bash Shell” book: http://shop.oreilly.com/product/9780596009656.do
An interactive on-line bash shell course: https://www.learnshell.org/
Finding Things#
Oftentimes, our file system can be quite complex, with sub-directories inside sub-directories inside sub-directories.
What happens if we want to find one (or several) files, without having to type ls hundreds or thousands of times?
First, let’s navigate to the data-shell/writing directory:
cd ~/Desktop/data-shell/writing
The directory structure of data-shell/writing looks like:
Let’s get our bearings with ls:
ls
Unfortunately, this doesn’t list any of the files in the sub-directories. Enter find:
find .
Remember, . means “the current working directory”. Here, find provides us a full list of the entire directory structure!
Filtering find#
We can add some helpful options to find to filter things a bit:
find . -type d
This will list only the directories underneath our current directory (including sub-directories).
Alternatively, we can list only the files with:
find . -type f
We can also match things by name:
find . -name *.txt
Why didn’t this also get the other files??
Remember: wildcards are expanded BEFORE being passed to the command. So, we really want:
find . -name "*.txt"
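The difference is easy to reproduce in a throwaway directory. In this sketch the file names are made up, with exactly one .txt file in the top-level directory so that the unquoted pattern expands to a single name:

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir data
touch haiku.txt data/one.txt data/two.txt

# Unquoted: the shell expands *.txt to haiku.txt first, so find only looks for that name
find . -name *.txt

# Quoted: find receives the pattern itself and applies it in every sub-directory
find . -name "*.txt"
```

The first command reports one file, the second reports all three.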
Executing with find#
What if we want to perform some operation on the output of our find command? Say, list the file sizes for each file (as in ls -lh)?
We can do that with a bit of extra work:
find . -name "*.txt" -exec ls -lh {} \;
Note the very funky syntax:
The -exec option means execute the following command,
ls -lh is the command we want to execute,
{} signifies where the output of find should go so as to be provided to the command we’re executing, and
\; means “this is the end of the command we want to execute”
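The same pattern works with other commands. Here is a minimal sketch using wc -l (line counts) in a throwaway directory, with made-up files. It also shows the + terminator, a standard alternative to \; that hands many paths to a single invocation.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir data
printf 'one\ntwo\n' > haiku.txt
printf 'three\n' > data/notes.txt

# Runs "wc -l" once per matching file; {} is replaced by each path in turn
find . -name "*.txt" -exec wc -l {} \;

# Ending with + instead of \; passes many paths to a single wc call
find . -name "*.txt" -exec wc -l {} +
```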
We can also pass the output of find to the ls -lh command (via command substitution rather than a true pipe) as follows:
ls -lh $( find . -name "*.txt" )
Here, the $( ) syntax means “run this command first and insert its output here”, so ls -lh is provided the output of the find . -name "*.txt" command as arguments.
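The $( ) syntax is not specific to ls; its output can also be stored in a variable. A minimal sketch follows, where the variable name n_txt and the files are invented for illustration. One caveat: inserting find output as arguments splits on spaces, so filenames containing spaces are safer with -exec.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir data
touch haiku.txt data/one.txt data/two.txt

# The command inside $( ) runs first; its output becomes the value of n_txt
n_txt=$(find . -name "*.txt" | wc -l)
echo "found $n_txt text files"

# The same idea with ls: find's output is inserted as ls's arguments
# (this breaks on filenames containing spaces, where -exec is the safer route)
ls -lh $(find . -name "*.txt")
```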