trebolium.github.io

CV available here

PhD Candidate

I’m Brendan. I am a PhD candidate at the Centre for Digital Music (C4DM), in the Electrical Engineering and Computer Science department (EECS), at Queen Mary University of London (QMUL). My research is focused on singing attribute conversion with neural networks.

My research at C4DM has allowed me to work intimately in the field of Machine Learning and Audio. While working here I have covered topics such as:

Languages: Python, Linux, git, php, xml
Libraries: Pytorch, Numpy, Pandas, Scikit-learn
Data Management: Collection, Mining, Preprocessing
Algorithms: Search, Evolutionary, Neural Networks
Neural Networks: CNNs, RNNs, Autoencoders, VAEs, GANs, Vocoders, Transformers, Diffusers
Other Predictive Algorithms: Linear and Logistic Regression, SVMs, Decision Trees, Clustering
Task Types: Discriminative and Generative Tasks, Supervised and Unsupervised Learning Techniques
PhD Projects: Voice Identification, Style Classification, Attribute Disentablement, Attribute Conversion, Audio Synthesis
Other Projects: Beat-tracking, Melodic Estimation, Audio Fingerprinting, Singing Voice Detection, Spoken Conversation Analysis
Experiment Design: User Interface, Listening Studies, Evaluation Strategies, Statistical Analysis

While working at C4DM, I have had the pleasure of working as a teaching assistant for a number of undergraduate and postgraduate courses such as:

Principles of Machine Learning
Artificial Intelligence
Python Programming
Creating Interactive Objects
Digital Audio
Professional Research Practice

Background

However, my background is immersed in music technology, performance, composition, and teaching. See my CV for more information about this.

Repositories to Visit

Wait till I’ve finished submiting my PhD in a few weeks! I plan to clean this section up. In the mean time, you are welcome to visit the repository on Perceptual Spaces of the Singing Voice here.

Repositories that you probably shouldn’t yet look at. But here they are for you, at my own peril, alongside the relevant description:

Title: Singing Technique Classification, Repo link, Paper Link
Title: Singing Technique Conversion, Repo link, Paper Link
Title: Singer Identity Conversion, Repo link, Paper Link
Title: Beat Tracker, Repo Link, Module on Music Informatics
Title: Audio Fingerprinting, Repo Link, Module on Music Information Retrieval
Title: Custom MIR Toolkit, Repo Link, <!– ## Voice Work

Our most recent work focuses on generating a voice timbre encoder designed specifically for the singing voice. Previous work exploring voice conversion in the singing domain have used encoder’s trained on speech data to achieve the task of singing voice conversion. We have used a similar architecture proposed by Wan et al. 2018 and implemented by CorentinJ for this task, and trained it on a number of features and combinations of datasets. The implementation of this network is featured here.

Commented on 2022.02.22 -> Please note that documentation and formatting for the public eye is underway. Links to some of the repos below have been temporarily disabled

Our published work on Zero-shot Singing Voice Conversion describes a process for converting the perceived singing technique of a sung passage to a target technique, without affecting any other vocal attributes. The framework involves using the AutoVC framework (a repository for this has kindly supplied by the author Kaizhi Qian/Auspicious3000) which is conditioned on the output embeddings of a pretrained singing technique classifier (the code of which is available here) The presentation can be found on the CMMR2021 Youtube channel here.

Our more recent work explores the WORLD vocoder for its voice-specific features and pitch-invariant properties. The WORLD vocoder is used to train a singer-identity encoder, which captures the most important features of the vocal timbre and provides these as embeddings. This will be used to train an autoVC, the bottleneck embeddings of which we look forward to using as input to a VAE.

To ensure our models’ latent space is reflecting something similar to that of human perception, we derived dissimilarity ratings by publishing a listening test, where users rated how different vocal sounds were from one another. The setup, analysis and conclusions are all documented in our paper, An Exploratory Study on Perceptual Spaces of the Singing Voice and illustrated at the Joint AI in Music Creativity conference presentation. Results and analysis are presented here!

MIR Tasks

I also have experience in DSP, which has been made applicable in my studies on music information retrieval. An example of this can be found in the Shazam imitator repo, were recorded clips of a song in a noisey environment can be submitted as a query and compared with a song database. The algorithm returns the top three most likely matches for the given query. Results were evaluated using a subset of classical and pop songs from the GTZAN dataset. The algorithm is inspired from Fundamentals of Music Processing (Muller, 2015).

I have also designed a basic beat-tracker that estimates tempo and follows the beat of music, using a combination of techniques from multiple researchers. Results were evaluated using the Ballroom dataset. The repository for this can be found here.

Separate documentation providing further referencing and context for these applications for these repositories are available on request. –>