Abstract
Data privacy in genome-wide association studies (GWAS) is a critical yet under-explored research area. In this paper, we first provide a method to construct, from the public GWAS Catalog, a two-layered Bayesian network that explicitly captures the conditional dependencies between SNPs and traits. We then develop efficient algorithms for two attacks, an identity inference attack and a trait inference attack, both based on reasoning over the dependencies captured in the constructed Bayesian network. Unlike previously proposed attacks, ours can target any member of the general population, not only GWAS participants. Our empirical evaluations show that attackers can exploit unprotected statistics released from GWAS to identify individuals or derive private information. We thus show that mining GWAS statistics threatens the privacy of a much wider population and that privacy protection mechanisms should be employed.
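To make the idea of trait inference concrete, the following minimal sketch computes the posterior probability of a trait from observed SNP genotypes by Bayes' rule. It is an illustration only, not the algorithm of this paper: it assumes a single binary trait, assumes that SNP genotypes are conditionally independent given the trait, and uses made-up probabilities. The function name `trait_posterior`, the SNP label `snp_a`, and all numbers are hypothetical.

```python
from math import prod


def trait_posterior(prior, likelihoods, observed):
    """Return P(trait | observed genotypes) for one binary trait.

    Assumption (illustrative only): SNP genotypes are conditionally
    independent given the trait status.

    prior:       P(trait present)
    likelihoods: likelihoods[snp][has_trait][genotype] = P(genotype | trait status)
    observed:    mapping from SNP name to observed genotype (0, 1, or 2 risk alleles)
    """
    joint = {}
    for has_trait in (True, False):
        p_trait = prior if has_trait else 1.0 - prior
        # Multiply the trait prior by the likelihood of each observed genotype.
        joint[has_trait] = p_trait * prod(
            likelihoods[snp][has_trait][genotype]
            for snp, genotype in observed.items()
        )
    # Normalize over the two trait states.
    return joint[True] / (joint[True] + joint[False])


# Hypothetical numbers, not taken from any GWAS.
likelihoods = {
    "snp_a": {
        True: {0: 0.2, 1: 0.5, 2: 0.3},
        False: {0: 0.5, 1: 0.4, 2: 0.1},
    },
}
posterior = trait_posterior(prior=0.1, likelihoods=likelihoods, observed={"snp_a": 2})
```

With these hypothetical inputs, observing two risk alleles at `snp_a` raises the probability of the trait from the prior of 0.1 to 0.25, which shows how released statistics can sharpen an attacker's belief about an individual.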