Publication Type : Conference Paper
Source : Proceedings of 18th Sound and Music Computing Conference (SMC), Torino, Italy, 29 June – 01 July 2021, pp. 199–206
Url : https://zenodo.org/records/5043841
Campus : Coimbatore
School : School of Artificial Intelligence
Center : Center for Computational Engineering and Networking
Year : 2021
Abstract : Predominant instrument recognition in polyphonic music is addressed using the score-level fusion of two visual representations, namely, the Mel-spectrogram and the modgdgram. The modgdgram, a visual representation, is obtained by successively stacking the modified group delay functions of consecutive frames. Convolutional neural networks (CNNs) with an attention mechanism learn the distinctive local characteristics and classify each instrument into the group to which it belongs. The proposed system is systematically evaluated on the IRMAS dataset with eleven classes. We train the network using fixed-length, single-labeled audio excerpts and estimate the predominant instruments from variable-length audio recordings. A wave generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. The proposed system reports micro and macro F1 scores of 0.65 and 0.60, respectively, which are 20.37% and 27.66% higher than those obtained by the state-of-the-art Han model. The experiments demonstrate the potential of attention-based CNNs on the Mel-spectrogram/modgdgram fusion framework for the task of predominant instrument recognition.
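For readers unfamiliar with the modgdgram, the sketch below illustrates its construction as described in the abstract: the modified group delay function (MODGD) of each windowed frame is computed and the per-frame vectors are stacked into a two-dimensional time-frequency image. This is a minimal sketch, not the paper's implementation; the cepstral-smoothing step and the parameter values (alpha, gamma, lifter length, frame and hop sizes) follow common MODGD practice and are illustrative assumptions.

```python
import numpy as np
from scipy.signal import get_window

def modgd_frame(frame, n_fft=1024, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function of one windowed frame.
    alpha, gamma, and lifter are illustrative values, not the paper's."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)          # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n_fft)      # spectrum of n * x(n)
    # Cepstrally smoothed magnitude S suppresses zeros close to the
    # unit circle that make the raw group delay spiky.
    log_mag = np.log(np.abs(X) + 1e-10)
    cep = np.fft.irfft(log_mag, n_fft)
    cep[lifter:n_fft - lifter] = 0.0       # low-time liftering
    S = np.exp(np.fft.rfft(cep, n_fft).real)
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma) + 1e-10)
    return np.sign(tau) * np.abs(tau) ** alpha

def modgdgram(signal, frame_len=1024, hop=512):
    """Stack per-frame MODGD vectors into a 2-D image (freq_bins x frames)."""
    win = get_window("hamming", frame_len)
    cols = [modgd_frame(signal[s:s + frame_len] * win, n_fft=frame_len)
            for s in range(0, len(signal) - frame_len + 1, hop)]
    return np.stack(cols, axis=1)
```

In the fusion framework the abstract describes, an image of this kind and the Mel-spectrogram would each feed one attention-based CNN branch, and the per-class scores of the two branches are then combined at the score level (for example, by a weighted average of class posteriors; the paper's exact fusion rule is not given here).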
Cite this Research Publication : Lekshmi C. Reghunath and Rajeev Rajan, “Attention-based Predominant Instruments Recognition in Polyphonic Music”, in Proceedings of 18th Sound and Music Computing Conference (SMC), Torino, Italy, 29 June – 01 July 2021, pp. 199–206