Hengyi Zhang and Qinli Zhang* Pages 512-523 (12)
Background: A large number of studies have shown that susceptibility to diseases may be related to certain Single Nucleotide Polymorphisms (SNPs). Locating disease-associated SNPs in genes can therefore help us understand the genetic mechanism of disease, intervene in risk SNPs and prevent some genetic diseases. Methods: Based on Graph Signal Processing (GSP) theory, this paper proposes a novel method to locate risk SNPs. The proposed method first builds a graph signal model of all SNP loci, and then locates abnormal SNPs (risk SNPs) through a joint analysis of the vertex domain and frequency domain of the graph. Results: The experimental results on synthetic datasets show that our method outperforms many existing methods, including BOOST, SNPHarvester, SNPRule, Random Forest (RF), Chi-square Test and LASSO regression, in terms of power. The experimental results on two real Genome-Wide Association Studies (GWAS) datasets, Age-related Macular Degeneration (AMD) and Genetic Disease A (GDA), show that our method not only finds the risk SNPs found by several state-of-the-art methods, including RF, Chi-square Test and LASSO regression, but also discovers three potential risk SNPs. Conclusion: Our method is suitable and effective for the identification of risk SNPs in GWAS.
GSP, SNP, joint analysis of vertex domain and frequency domain, RF, LASSO, Chi-square test.
College of Animal Science and Technology, Northwest A&F University, Yangling; School of Computer Science and Engineering, Yulin Normal University, Yulin