The Role of Voice Cloning in Detecting Synthesized Audio

Presenter
James D. Cawley
Campus
UMass Boston
Sponsor
Zaur Rzakhanov, Department of Accounting and Finance, UMass Boston
Schedule
Session 2, 11:30 AM - 12:15 PM
Location
Poster Board A39, Campus Center Auditorium, Row 2 (A21-A40)
Abstract
Voice phishing scams like fake IRS calls threaten vulnerable populations such as senior citizens. Emails written by generative artificial intelligence programs are used to deceive company employees into giving away confidential information. This article considers an emerging threat to people's privacy: the combination of generative artificial intelligence and voice cloning technology to deceive the recipients of fraudulent phone calls. The article consists of a literature review and a survey, which together identify the factors that go into artificially intelligent voice cloning and how real voices differ from synthesized audio with respect to these factors. To this end, I cloned my own voice using PlayHT, a voice cloning tool built on a neural network model that does not require large amounts of voice data to produce a clone. The article offers a layman's explanation of how synthesized audio can be distinguished from real audio on a spectrogram and shows how these variables interact visually. The study aims to give ordinary citizens and company employees a way to check whether a voice is synthesized or real using easily accessible technology. Lastly, I discuss whether the survey data and the variables drawn from the literature review apply to specific populations such as company employees and senior citizens.
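To make the spectrogram idea more concrete, here is a minimal sketch in Python using NumPy. This is an assumption for illustration only: the poster does not specify any code or tooling, and the function name `spectrogram` and the frame and hop sizes are hypothetical choices, not the author's method. The code computes a magnitude spectrogram with a short-time Fourier transform (overlapping Hann-windowed frames, one FFT per frame). It does not classify audio as real or synthesized; it only produces the time-frequency picture that a person would inspect.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_len=1024, hop=256):
    """Hypothetical helper: return (freqs, times, magnitude_db) for a 1-D signal.

    Uses a short-time Fourier transform: the signal is cut into overlapping,
    Hann-windowed frames and each frame is transformed with a real FFT.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # One windowed frame per row; consecutive frames start `hop` samples apart.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # Magnitude of the one-sided spectrum of each frame: shape (n_frames, frame_len // 2 + 1).
    mag = np.abs(np.fft.rfft(frames, axis=1))

    # Convert to decibels; the small epsilon avoids taking log10(0) in silent regions.
    mag_db = 20.0 * np.log10(mag + 1e-10)

    # Frequency (Hz) of each FFT bin and time (s) at the centre of each frame.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate

    # Transpose so rows are frequencies and columns are time, as in a spectrogram plot.
    return freqs, times, mag_db.T


if __name__ == "__main__":
    # Usage: a one-second 440 Hz test tone sampled at 16 kHz.
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    freqs, times, S = spectrogram(tone, sr)
    peak = freqs[S.mean(axis=1).argmax()]
    print(S.shape, peak)
```

For the test tone, the loudest row of `S` falls in the frequency bin nearest 440 Hz (bins are `sample_rate / frame_len` = 15.625 Hz apart here). For a real or cloned voice recording, plotting `S` against `times` and `freqs` gives the kind of image on which the two can be compared; a longer `frame_len` sharpens frequency detail at the cost of time detail.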
Keywords
generative AI, phishing, sound analysis, text-to-speech synthesis, voice cloning
Research Area
Cybersecurity
