Classification of guitar playing techniques
In this blog post, I will describe a ML-project where I have attempted to classify different types of playing techniques performed on an electric guitar. The model is based on the use of a support vector machine with an RBF-kernel and linear discriminant analysis (LDA) as a dimensionality reduction-algorithm. The algorithm has been tuned with a grid-search and gone through 10-fold cross-validation. Where the best results average somewhere between 67% and 70% accuracy with the complete dataset used.
Why do you want to spend your valueable time training a model to distinguish between a bend and a slide?
Great question! The guitar is a complicated instrument. First of all it has a tuning system which really doesn´t make any sense. You also have the ability to play the same note in different positions on the neck. An E4 can be played on five different strings if you have a guitar with less than 24 frets. So, if you want to transcribe a guitar solo you need to figure out what notes are being played, on which strings they are being played, if the note is played with a pick or with a finger, is it a bended note, pre-bend, vibrato etc etc. So, why am I rambling on about this? Well, being able to classify different techniques is a building block in the art of automatic guitar transcription. If this sounds interesting you can have a look at this or this. Since I am in no way an expert at machine learning I put my focus on only one of the aspects of automatic guitar transcription.
For me to be able to perform this task I needed a bunch of audio clips. Luckily for me some people in Taiwan had compiled a nice dataset consisting of 6580 unique audio clips demonstrating 7 different playing techniques. If you want to have a look at this data I don´t recommend going to this website and try downloading it. The site is not being maintained and the download took me aprox. 12 hours. So just contact me instead and I will help you out. The dataset is divided into 7 main folders where different guitar sounds are used in order to emulate a “real-world” situation. And make the task of classifying stuff a bit more challenging.
As I hinted at in my intro the classifier I used for this project was a Support Vector Machine algorithm(SVM). SVMs gives you great flexibility, computational efficiency, capacity to handle high dimensional data. By conducting a grid search I did some fine tuning of the hyper-parameters and came to the conclusion that it performed okay with these parameters:
svm = SVC(C = 1.4, decision_function_shape = 'ovo', gamma = 'scale', kernel = 'rbf')
The kernel is the Gaussian RBF kernel with gamma set to scale. The gamma-parameter makes the bell-shaped curve of the RBF narrower and when it is set to scale the gamma is calculated depending on the size of the features. OvO stands for one-versus-one strategy. In the case of this particular dataset with 6 classes, the OvO decision function end up training 15 binary classifiers on the training set.
The features I choose to extract from the audio files were: Melspectrogram, MFCC, spectral centroid and spectral roll-off. The melspectrogram and MFCC were the most efficient features to extract. And both of these are for good reasons popular features when it comes to timbral tasks like this in speech recognition, music genre classification and other MIR-stuff.
In this 2-week intensive workshop I have gained a little insight into the world of machine learning. Although I didn´t manage to build a perfect model I got to experience the confusing and difficult task of classifying audio. If I were to continue on this project I would first and foremost spend more time understanding the basic concepts of meachine learning, dive a bit more into how the librosa library actually works and try to compare different approaches to this specific task. It would also by interesting to look at other components in the transcription process. And over time try to build a complete functioning transcribing system.