Publication Type : Conference Paper
Publisher : Procedia Computer Science
Source : Procedia Computer Science, Volume 233, 2024, Pages 733-742, 5th International Conference on Innovative Data Communication Technologies and Application, ICIDCA 2024; Coimbatore; India; 10 January 2024 through 11 January 2024; Code 199107
Campus : Amritapuri
School : School of Computing
Center : AmritaCREATE
Department : Computer Science and Engineering
Year : 2024
Abstract : Active Speaker Detection is pivotal in a multitude of applications, particularly in processing live Audio-Video (AV) streams. Current implementations predominantly focus on processing saved video files, limiting their real-time applicability. Addressing this gap, the proposed model leverages a multi-threading-based system to detect active speakers in live AV streams. This system forms a critical component in an innovative software solution designed to generate real-time subtitles and overlay them beside the active speaker. This feature is especially beneficial for individuals with hearing impairments and facilitates the transcription of foreign languages into English, thereby improving human interaction and understanding. Our approach stands out for its ability to process live AV streams promptly for immediate speaker identification and subtitle overlay, marking a significant advancement in real-time communication assistance. © 2024 The Authors. Published by Elsevier B.V.
Cite this Research Publication : Madamanchi, S., Kushal, G., Ravikumar, S., Dhanvin, P., Remya, M.S., Nedungadi, P., "Real-Time Speaker Identification and Subtitle Overlay with Multithreaded Audio Video Processing," Procedia Computer Science, Volume 233, 2024, Pages 733-742, 5th International Conference on Innovative Data Communication Technologies and Application, ICIDCA 2024; Coimbatore; India; 10 January 2024 through 11 January 2024; Code 199107