Investigating Acoustic Features for Privacy-Aware Voice Liveness Detection
Voice liveness detection (VLD) is a crucial component of voice authentication systems, especially in the face of increasing voice replay attacks. These attacks involve an adversary using an electronic speaker to replay a genuine speaker's voice, potentially bypassing voice authentication systems. VLD aims to distinguish between a passphrase spoken by a live user and a replayed one, mitigating the risk of voice spoofing attacks.
In this study, we investigate acoustic features for privacy-aware VLD. Our approach pre-processes the audio signal and extracts time-domain and frequency-domain features, including the harmonics-to-noise ratio, pitch, formants, and spectral flatness. We then train several machine learning classifiers and evaluate them using accuracy, precision, recall, and F1 score. We also use the Gini importance score for feature selection and compare different sliding window lengths.
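As an illustration of one of the features named above, the following is a minimal sketch of spectral flatness computed over sliding windows. It is not the study's implementation: the 16 kHz sample rate, the 0.5 s hop, the synthetic tone and noise signals, and the helper names `spectral_flatness` and `sliding_windows` are all assumptions made for this example.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Ratio of the geometric mean to the arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal (peaky) spectrum, values closer to 1 a
    noise-like (flat) spectrum. eps avoids log(0) for empty bins.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return float(geometric_mean / arithmetic_mean)

def sliding_windows(signal, sample_rate, window_s=1.0, hop_s=0.5):
    """Yield fixed-length windows. hop_s is an assumed setting, not from the study."""
    size = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    for start in range(0, len(signal) - size + 1, hop):
        yield signal[start:start + size]

# Synthetic stand-ins for audio (assumed): a pure tone and white noise, 2 s each.
sample_rate = 16000
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)
noise = np.random.default_rng(0).standard_normal(t.size)

tone_flatness = [spectral_flatness(w) for w in sliding_windows(tone, sample_rate)]
noise_flatness = [spectral_flatness(w) for w in sliding_windows(noise, sample_rate)]
print(tone_flatness, noise_flatness)
```

In this toy setup the tone yields flatness values close to zero, while white noise yields much larger ones. In a real pipeline, each window's features would be stacked into one row of the classifier's input matrix.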
Our results show that the Random Forest classifier achieves the highest accuracy, 97.19%, with a model size of 6.9 MB and a prediction latency of 7.5 µs. The optimal sliding window length was found to be 1 second. We identify acoustic features that correlate with voice liveness and plan to investigate more handcrafted features in future work. Our findings contribute to the development of a privacy-aware, lightweight, accurate, and robust VLD mechanism that can enhance security in voice-based authentication systems.
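For reference, the four evaluation metrics used in this study follow the standard definitions based on confusion-matrix counts. The sketch below shows that arithmetic on made-up labels. Treating "live" as the positive class (1) and "replayed" as the negative class (0) is an assumption of this example, and the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative labels only (1 = live, 0 = replayed); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.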