2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)

Abstract

Voice is an essential medium for human communication and collaboration, and its trustworthiness is of great importance. Synthesizing fake voices and detecting synthesized voices are two sides of the same coin, and both have made great strides with recent advances in deep learning. Attackers have started using AI techniques to synthesize, and even clone, human voices. Researchers have in turn proposed a series of AI-synthesized voice detection approaches and achieved promising results in laboratory environments. In this paper, we introduce the concept of speaker-irrelative features (SiFs) and a novel detection-bypass idea to camouflage AI-synthesized voices: replacing the SiFs of AI-synthesized voices with crafted ones. We implement a proof-of-concept framework named SiF-DeepVC based on this idea. Experiments show that existing detection systems consider the voices output by SiF-DeepVC more human-like than human voices, demonstrating that our detection-bypass idea is effective and that SiFs are noteworthy in camouflaging AI-synthesized voices.
