Automatic Spectral Classification of Stars Using Machine Learning: An Approach Based on the Use of Unbalanced Data


Marco Oyarzo Huichaqueo¹ and Renato Munoz Orrego², ¹Rovira i Virgili University, Spain; ²Technical University of Madrid, Spain


With the growth of astronomical surveys, astronomers face the challenging task of analyzing large amounts of data in order to classify observed objects into classes that are hard to distinguish. This article presents a machine learning-based method for the automatic spectral classification of stars from the latest release of the SDSS database. We propose combining spectral data, derived stellar data, and calculated data to build input patterns. Using these patterns as inputs, we develop a Random Forest model that outputs the spectral class of the observed star. The model classifies stars into six hard-to-distinguish classes: A, F, G, K, M, and Carbon stars. Because the data are unbalanced, we train the model under three data use cases: the original data, under-sampled data, and over-sampled data. We further test the model on a fixed dataset and on a stratified dataset, and we analyze its performance through statistical metrics. The experimental results show that combining these data sources into the input pattern improves the prediction scores in all data use cases, and that the model trained with augmented (over-sampled) data outperforms the other two cases. Our results suggest that machine learning-based spectral classification of stars may be useful for astronomers.


Keywords: Spectral Classification, Machine Learning, Data Analysis, Astronomy.
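The abstract contrasts training on the original unbalanced data with training on under-sampled and over-sampled data. As a minimal illustration of what those two data use cases mean, the sketch below balances a labelled dataset by random under-sampling (keep as many samples per class as the smallest class has) and random over-sampling (duplicate randomly chosen samples until every class matches the largest class). It assumes plain random resampling; the abstract does not state which under- or over-sampling technique the authors used (synthetic methods such as SMOTE are a common alternative), and the function names and toy class counts here are hypothetical, not SDSS figures.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples, labels):
    """Group samples into a dict mapping class label -> list of samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many as the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(members) for members in groups.values())
    out_x, out_y = [], []
    for cls, members in groups.items():
        # Sample without replacement, so no sample appears twice.
        for x in rng.sample(members, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(members) for members in groups.values())
    out_x, out_y = [], []
    for cls, members in groups.items():
        # Keep every original sample, then add random duplicates (with replacement).
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for x in members + extra:
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: each "sample" is just its index; class counts are invented.
labels = ["G"] * 50 + ["K"] * 30 + ["M"] * 10 + ["Carbon"] * 2
samples = list(range(len(labels)))

x_under, y_under = random_undersample(samples, labels)
x_over, y_over = random_oversample(samples, labels)
print(Counter(y_under))  # every class reduced to 2 samples
print(Counter(y_over))   # every class raised to 50 samples
```

Under-sampling discards information from the majority classes, while over-sampling by duplication can encourage over-fitting to the repeated minority samples; in either case the resampling would normally be applied only to the training split, so that the test set keeps the original class distribution.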