Josephine (Hsin) Liu [1,2], Phoebe (Yun) Liu [1,2], Joseph (Yu) Liu [1,2], Emily X. Ding [1], Robert J. Hou [1]
[1] Vineyards AI Lab, New Zealand; [2] Rangitoto College, New Zealand
This paper explores the application of Artificial Intelligence (AI) techniques to classifying Deoxyribonucleic Acid (DNA) sequences into their corresponding gene families. It focuses on how DNA sequences can be treated as a human language to be understood and classified. Specifically, we first transformed the DNA sequences into a format resembling human-language text, then employed Natural Language Processing (NLP) and Multi-layer Perceptron (MLP) algorithms to classify the sequences into 7 gene families. Our research drew DNA sequence data from three organisms: humans, dogs, and chimpanzees. We conducted several experiments to evaluate classification performance. In addition, to assess how well the solution generalizes, we designed experiments that involved cross-domain testing. The experimental results show high accuracy and efficiency and also reveal intriguing findings for the life sciences.
DNA Sequences, Auto Recognition, Natural Language Processing (NLP), Multi-layer Perceptron (MLP)
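To illustrate what treating DNA as a language might look like in practice, the following is a minimal sketch and not the method used in this paper. It assumes that the "human-like format" is obtained by splitting each sequence into overlapping k-letter words (k-mers) and that the resulting sentences are turned into word-count vectors that an MLP classifier could consume. The function names, the choice of k, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_words(sequence, k=6):
    """Split a DNA string into overlapping k-letter 'words' (k-mers).

    Assumption: overlapping k-mers stand in for the paper's transformation.
    """
    seq = sequence.lower()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def to_sentence(sequence, k=6):
    """Join the k-mers with spaces so the sequence reads like a sentence."""
    return " ".join(kmer_words(sequence, k))


def bag_of_words(sentences):
    """Build a sorted vocabulary and one word-count vector per sentence."""
    vocab = sorted({word for s in sentences for word in s.split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for s in sentences:
        vec = [0] * len(vocab)
        for word, count in Counter(s.split()).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors


# Hypothetical toy sequences, not data from the paper.
toy_sequences = ["ATGCATGCA", "ATGCCCGTA"]
sentences = [to_sentence(s, k=4) for s in toy_sequences]
vocab, X = bag_of_words(sentences)
```

Each row of `X` is a fixed-length numeric vector, which is the kind of input a multi-layer perceptron expects; the gene-family label for each sequence would serve as the training target.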