diff --git a/notebooks/1_audio_files.ipynb b/notebooks/1_audio_files.ipynb index 1d15ab38305c258ecf337207046570b2ad985fdc..c7038f7a3b23334d33a1ffb6c1b841bba3e2d5dc 100644 --- a/notebooks/1_audio_files.ipynb +++ b/notebooks/1_audio_files.ipynb @@ -17,7 +17,7 @@ "source": [ "# Audio Files\n", "\n", - "Bundle the provided audio files (400, in MP3) in a tar, encrypt it using gzip and store it in the output folder." + "Bundles the provided audio files (400, in MP3) into a flat tarball and compresses it with gzip. Flattening the archive lets the filenames serve as unique identifiers and allows the genre information to be embedded in them. The resulting tarball is then used as the input for generating features from these files." ] }, { @@ -245,4 +245,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/notebooks/2_generate_features.ipynb b/notebooks/2_generate_features.ipynb index 9db55c9285abdb6e507929c01ad5ecdd90ceba12..41e91bf91d4c942786cb288335959979473cedc7 100644 --- a/notebooks/2_generate_features.ipynb +++ b/notebooks/2_generate_features.ipynb @@ -14,7 +14,12 @@ "tags": [] }, "source": [ - "# Feature Extraction of Base audio files from Invenio" + "# Feature Extraction of Base audio files from Invenio\n", + "\n", + "The gzip-compressed tarball (audio_tar input) is decompressed. Then, for each file, the MFCC features are extracted using librosa, \n", + "generating a dataframe with 40 MFCC coefficients (columns) and approximately 2500 samples (rows) per file.\n", + "\n", + "To persist the dataframe, it is written to a CSV file (raw_features output)." 
] }, { @@ -702,4 +707,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/notebooks/3_aggregate_features.ipynb b/notebooks/3_aggregate_features.ipynb index 1747a0d10f7b4d238aa614e06025dd679752f0b3..1d73fadfe0bb0e00cbc96939c1408b0f2d49c379 100644 --- a/notebooks/3_aggregate_features.ipynb +++ b/notebooks/3_aggregate_features.ipynb @@ -1,5 +1,13 @@ { "cells": [ + { + "cell_type": "markdown", + "source": [], + "metadata": { + "collapsed": false + }, + "id": "f8fcc84e1bdaecbc" + }, { "cell_type": "markdown", "id": "f48a4573", @@ -16,7 +24,11 @@ "source": [ "# Aggregate MFCC Features\n", "\n", - "Aggregate from n rows par file to 1 (calculate min, max, etc. for each feature)." + "Previously we generated MFCC features for each file, with each sample as a row. \\\n", + "Now we remove the time dimension and aggregate the features for each file, resulting in one row per file (calculating min, max, etc. for each feature).\n", + "\n", + "The resulting dataframe (400 rows, 40 * 5 columns) is written to the file system as a CSV and will be used in the next step to split the data into training and test sets.\n", + "\n" ] }, { @@ -633,4 +645,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/notebooks/4_split.ipynb b/notebooks/4_split.ipynb index 8fdabc8b2552daa6ee663c19e197de42e1b5e4c2..8cf23376ffb7c9e93653d0a2bd9c2c6b888fbe8e 100644 --- a/notebooks/4_split.ipynb +++ b/notebooks/4_split.ipynb @@ -14,7 +14,11 @@ "tags": [] }, "source": [ - "# Split the Features into Train and Test Set" + "# Split the Features into Train and Test Set\n", + "\n", + "The aggregated MFCC data is loaded and a random 80/20 split is performed; the resulting split is saved to a CSV file indicating whether each file (by filename) belongs to the training set.\n", + "\n", + "This dataframe will be used to filter the features dataframe down to the training set when training the models in the next step." 
] }, { @@ -387,4 +391,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/notebooks/5_ml_model.ipynb b/notebooks/5_ml_model.ipynb index b9e6a9de92ba2987e5b276202690877475c0a8cb..d075e30dea03bdf51430b0aad221d0f2b3ed296d 100644 --- a/notebooks/5_ml_model.ipynb +++ b/notebooks/5_ml_model.ipynb @@ -14,9 +14,20 @@ "tags": [] }, "source": [ - "# ML Experiment code\n", + "# ML Model\n", "\n", - "# Inputs: splits & aggregated features" + "This notebook experiments with a machine learning model (SVM) for the music genre classification task.\n", + "The previously created and aggregated MFCC features are used as training and test data for the SVM model.\n", + "\n", + "The split of the data was determined in the previous step and is likewise loaded from a CSV file.\n", + "\n", + "Once the feature dataframe is split into training and test sets, PCA is applied to reduce the high number of dimensions (200 columns) to a more manageable number (30 components).\n", + "\n", + "The training data is then used to find the optimal SVM parameters via 5-fold cross-validated hyperparameter tuning using grid search. The best model is then used to predict the test data.\n", + "To visualize the results, a confusion matrix is created and the overall accuracy (across all classes) is calculated.\n", + "The top-2 prediction accuracy is also calculated and visualized.\n", + "\n", + "The final model is then saved to a pickle file, and the predictions are saved to a CSV file." 
] }, { @@ -3742,4 +3753,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/notebooks/main.ipynb b/notebooks/main.ipynb index c5cad290a6f6f68c0b1324eaaa9f723cb2b93478..f55bf74d78932b1fc1038f262af5b60f58682f53 100644 --- a/notebooks/main.ipynb +++ b/notebooks/main.ipynb @@ -7,9 +7,9 @@ }, "source": [ "# Main Notebook\n", - "Executes all research notebooks one by one and initializes needed connectors and nbconfig.\n", + "Executes all research notebooks one by one and initializes the needed connectors and nbconfig. All produced entities are always saved to disk; if ONLY_LOCAL is False, they are also uploaded using FAIRnb.\n", "\n", - "The entities created by one notebook are passed to the next notebook as dependencies, while moving the entites location. This way the output entities are separated from the input entities." + "The entities created by one notebook are passed to the next notebook as dependencies, creating a pipeline of notebooks." ] }, { @@ ... }} diff --git a/notebooks/standalone.ipynb b/notebooks/standalone.ipynb index 187c9ffe4dbc026c09a591c9baa9a009288f99af..50b3b515e74278c531794481922940b5c366783f 100644 --- a/notebooks/standalone.ipynb +++ b/notebooks/standalone.ipynb @@ -8,7 +8,7 @@ "source": [ "# Standalone Notebook\n", "\n", - "Notebook containing the same functionality as main.ipynb, but it includes all steps in one notebook and does not spin up separate Jupyter Kernels and uploads the entities directly." + "Notebook containing the same functionality as main.ipynb, but it runs all steps in a single notebook without spinning up separate Jupyter kernels, and it uploads the entities directly if ONLY_LOCAL is False." ] }, { @@ -96,7 +96,7 @@ "source": [ "## 1. Audio Files\n", "\n", - "Bundle the provided audio files (400, in MP3) in a tar, encrypt it using gzip and store it in the output folder." + "Bundles the provided audio files (400, in MP3) into a flat tarball and compresses it with gzip. Flattening the archive lets the filenames serve as unique identifiers and allows the genre information to be embedded in them. The resulting tarball is then used as the input for generating features from these files.\n" ] }, { @@ -179,7 +179,11 @@ "source": [ "## 2. Feature Extraction of Base audio Files from Invenio\n", "\n", - "Load the audio files from the tar, and extract the MFCC features from them and store them in a dataframe." + "\n", + "The gzip-compressed tarball (audio_tar input) is decompressed. Then, for each file, the MFCC features are extracted using librosa, \n", + "generating a dataframe with 40 MFCC coefficients (columns) and approximately 2500 samples (rows) per file.\n", + "\n", + "To persist the dataframe, it is written to a CSV file (raw_features output)." ] }, { @@ -283,7 +287,12 @@ "collapsed": false }, "source": [ - "## 3. Aggregate MFCC Features" + "## 3. Aggregate MFCC Features\n", + "\n", + "Previously we generated MFCC features for each file, with each sample as a row. \\\n", + "Now we remove the time dimension and aggregate the features for each file, resulting in one row per file (calculating min, max, etc. for each feature).\n", + "\n", + "The resulting dataframe (400 rows, 40 * 5 columns) is written to the file system as a CSV and will be used in the next step to split the data into training and test sets." ] }, { @@ -386,7 +395,11 @@ "collapsed": false }, "source": [ - "## 4. Split the Features into Train and Test Set" + "## 4. Split the Features into Train and Test Set\n", + "\n", + "The aggregated MFCC data is loaded and a random 80/20 split is performed; the resulting split is saved to a CSV file indicating whether each file (by filename) belongs to the training set.\n", + "\n", + "This dataframe will be used to filter the features dataframe down to the training set when training the models in the next step." 
] }, { @@ -436,7 +449,21 @@ "collapsed": false }, "source": [ - "## 5: Machine Learning model training and evaluation" + "## 5: ML Model\n", + "\n", + "\n", + "This step experiments with a machine learning model (SVM) for the music genre classification task.\n", + "The previously created and aggregated MFCC features are used as training and test data for the SVM model.\n", + "\n", + "The split of the data was determined in the previous step and is likewise loaded from a CSV file.\n", + "\n", + "Once the feature dataframe is split into training and test sets, PCA is applied to reduce the high number of dimensions (200 columns) to a more manageable number (30 components).\n", + "\n", + "The training data is then used to find the optimal SVM parameters via 5-fold cross-validated hyperparameter tuning using grid search. The best model is then used to predict the test data.\n", + "To visualize the results, a confusion matrix is created and the overall accuracy (across all classes) is calculated.\n", + "The top-2 prediction accuracy is also calculated and visualized.\n", + "\n", + "The final model is then saved to a pickle file, and the predictions are saved to a CSV file." ] }, {