Publication Type : Journal Article
Source : Circuits Syst Signal Process (CSSP), 42, 3464–3484 (2023). Impact factor: 2.3, SJR: 0.494, Indexing: SCIE
Url : https://doi.org/10.1007/s00034-022-02278-y
Campus : Coimbatore
School : School of Artificial Intelligence
Center : Center for Computational Engineering and Networking
Year : 2023
Abstract : Identification of multiple predominant instruments in polyphonic music is addressed using convolutional neural networks (CNNs) through the Mel-spectrogram, the modgd-gram, and their fusion. The modgd-gram, a visual representation, is obtained by stacking the modified group delay functions of consecutive frames. The CNN learns distinctive local characteristics from this visual representation and classifies each instrument into the group to which it belongs. The proposed system is systematically evaluated on the IRMAS dataset. We trained our networks on fixed-length audio excerpts to recognize multiple predominant instruments in variable-length testing files. A wave-generative adversarial network (WaveGAN) architecture is also employed to generate audio files for data augmentation. We experimented with different fusion techniques: early fusion, mid-level fusion, and late (score-level) fusion. The late fusion experiment reports micro and macro F1 scores of 0.69 and 0.62, respectively, which are 7.81% and 12.73% higher than those obtained by the state-of-the-art Han’s model. The architectural choice of a CNN with score-level fusion of the Mel-spectrogram and modgd-gram has merit in recognizing the predominant instruments in polyphonic music.
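The sketch below gives a concrete picture of how a modgd-gram can be built by stacking frame-wise modified group delay functions into a time-frequency image, in the spirit of the representation described in the abstract. It is a minimal illustration, not the paper's implementation: the library choices (NumPy, librosa) and the parameters alpha, gamma, and the cepstral liftering length are assumed values, not taken from the publication.

```python
import numpy as np
import librosa

def modgd_gram(y, sr, n_fft=1024, hop=256, alpha=0.4, gamma=0.9, lifter=8):
    """Stack frame-wise modified group delay functions into a modgd-gram.
    alpha, gamma, and lifter are illustrative defaults, not the paper's settings."""
    frames = librosa.util.frame(y, frame_length=n_fft, hop_length=hop).T
    window = np.hanning(n_fft)
    n = np.arange(n_fft)

    grams = []
    for x in frames:
        xw = x * window
        X = np.fft.rfft(xw)          # spectrum of x(n)
        Y = np.fft.rfft(n * xw)      # spectrum of n * x(n)
        mag = np.abs(X) + 1e-10

        # Cepstrally smoothed magnitude spectrum S(w): keep low-quefrency part only
        cep = np.fft.irfft(np.log(mag))
        cep[lifter:-lifter] = 0.0
        S = np.exp(np.fft.rfft(cep).real) + 1e-10

        # Modified group delay function of this frame
        tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma))
        grams.append(np.sign(tau) * (np.abs(tau) ** alpha))

    return np.array(grams).T         # (freq_bins, num_frames), one column per frame

# Example: a fixed-length excerpt fed to the modgd-gram CNN branch
y, sr = librosa.load("clip.wav", sr=22050, duration=3.0)
M = modgd_gram(y, sr)
```

A late (score-level) fusion along the lines described in the abstract would then average the per-class scores of the Mel-spectrogram CNN and the modgd-gram CNN and threshold the result to select the predominant instruments; an equal 0.5/0.5 weighting is likewise only an assumption for illustration.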
Cite this Research Publication : Lekshmi C. R., Rajeev, R., “Multiple Predominant Instruments Recognition in Polyphonic Music using Spectro/Modgd-gram Fusion”, Circuits Syst Signal Process (CSSP), 42, 3464–3484 (2023). Impact factor: 2.3, SJR: 0.494, Indexing: SCIE