Exoplanets Identification and Clustering with Machine Learning Methods

Authors

Yucheng Jin, Lanyi Yang, Chia-En Chiang, University of California-Berkeley, USA

Abstract

The discovery of habitable exoplanets has long been a hot topic in astronomy. Traditional methods for exoplanet identification, such as the wobble method, direct imaging, and gravitational microlensing, not only require a considerable investment of manpower, time, and money, but are also limited by the performance of astronomical telescopes. In this study, we propose using machine learning methods to identify exoplanets. For supervised learning, we use the Kepler dataset collected by NASA from the Kepler Space Observatory to predict the existence of exoplanet candidates as a three-class classification task, using decision tree, random forest, naïve Bayes, and neural network models. For unsupervised learning, we use another NASA dataset consisting of confirmed exoplanet data and divide the confirmed exoplanets into clusters using k-means clustering. Our models achieve accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the supervised learning task and obtain reasonable clusters in the unsupervised learning task.


Keywords

Exoplanets Identification and Clustering, Kepler Dataset, Classification Tree, Random Forest, Naïve Bayes, Multi-layer Perceptron, K-means Clustering, K-fold Cross-validation.