Yongxian Fan* and Haibo Xu Pages 1169 - 1178 ( 10 )
Background: CRISPR/Cas9, a new generation of targeted gene editing technology with low cost and simple operation has been widely employed in the field of gene editing. The erroneous cutting of off-target sites in CRISPR/Cas9 is called off-target effect, which is also the biggest complication that CRISPR/Cas9 confronts in practical application. To be specific, the off-target effects could lead to unexpected gene editing results. Therefore, accurately predicting CRISPR/Cas9 off-target effect is a very important task. Predicting off-target effects of CRISPR/Cas9 by machine learning method is feasible, but most existing off-target tools did not pay close attention to the effects of gene encoding on prediction.
Methods: We compared three encoding methods based on One-Hot and combined the gene sequence with four CRISPR/Cas9 off-target prediction tools to build an ensemble model with XGBoost, designated as XGBCRISPR. The grid search is employed to find the optimal parameters to achieve the best performance.
Results: The performance is compared with existing tools based on the ROC value and PRC value. The experimental results show that the XGBCRISPR model is superior to the existing tools.
Conclusion: The new model could achieve better prediction result than existing tools, but the accuracy of model can be improved further as many off-target scores appear.
CRISPR/Cas9, off-target effects, machine learning, ensemblelearning, XGBoost, XGBCRISPR.
School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi