{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Model evaluation & cross-validation\n",
"\n",
"[José C. García Alanis (he/him)](https://github.com/JoseAlanis) \n",
"Research Fellow - Child and Adolescent Psychology at [Uni Marburg](https://www.uni-marburg.de/de) \n",
"Member - [RTG 2271 | Breaking Expectations](https://www.uni-marburg.de/en/fb04/rtg-2271), [Brainhack](https://brainhack.org/)\n",
"\n",
"@JoiAlhaniz\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Aim(s) of this section\n",
"\n",
"As mentioned in the previous section, it is not sufficient to apply these methods to learn something about the nature of our data. It is always necessary to assess the quality of the implemented model. The goal of this section is to look at ways to estimate the generalization accuracy of a model on future (i.e., unseen, out-of-sample) data.\n",
"\n",
"In other words, by the end of this section you should:\n",
"- 1) know different techniques to evaluate a given model\n",
"- 2) understand the basic idea of cross-validation and its different variants\n",
"- 3) have an idea of how to assess the significance of a model's performance (e.g., via permutation tests)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Prepare data for model\n",
"\n",
"Let's bring back our example data set (you know the song ...)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 155 samples and 2016 features\n"
]
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# get the data set\n",
"data = np.load('MAIN2019_BASC064_subsamp_features.npz')['a']\n",
"\n",
"# get the labels\n",
"info = pd.read_csv('participants.csv')\n",
"\n",
"\n",
"print('There are %s samples and %s features' % (data.shape[0], data.shape[1]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Now let's look at the labels"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | participant_id | \n", "Age | \n", "AgeGroup | \n", "Child_Adult | \n", "Gender | \n", "Handedness | \n", "
---|---|---|---|---|---|---|
0 | \n", "sub-pixar123 | \n", "27.06 | \n", "Adult | \n", "adult | \n", "F | \n", "R | \n", "
1 | \n", "sub-pixar124 | \n", "33.44 | \n", "Adult | \n", "adult | \n", "M | \n", "R | \n", "
2 | \n", "sub-pixar125 | \n", "31.00 | \n", "Adult | \n", "adult | \n", "M | \n", "R | \n", "
3 | \n", "sub-pixar126 | \n", "19.00 | \n", "Adult | \n", "adult | \n", "F | \n", "R | \n", "
4 | \n", "sub-pixar127 | \n", "23.00 | \n", "Adult | \n", "adult | \n", "F | \n", "R | \n", "