Last semester, a group of classmates and I tackled Kaggle's “Galaxy Zoo Challenge” as a project for a machine learning course. We had a lot of fun and ended up with a Top 100 score. I thought others might benefit from our report and results, so I am making them available. The full report is available here, and its abstract is reproduced below:
In 2014, Kaggle organized the Galaxy Zoo challenge, which asked participants to analyze images of galaxies from the Sloan Digital Sky Survey in order to automate the tagging of morphological attributes. So that the machine could learn features directly from the data, we implemented a deep convolutional neural network based on an existing architecture that had placed highly in an image classification competition a number of years earlier. The dataset consisted of 79975 test images and 61578 training images, which were cropped and downsampled for dimensionality reduction and memory efficiency. The model was trained with a variety of batch sizes and input formats. These input formats included further dimensionality reduction, data augmentation, and color space conversion. One of these configurations ultimately attained a Top 100 score on the Kaggle leaderboard. The model's performance could likely be improved further by tuning other hyper-parameters, such as the learning rate and the number of epochs.
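To give a rough idea of the preprocessing the abstract describes (cropping, downsampling, color space conversion, and augmentation), here is a minimal NumPy sketch. It is an illustration only, not the code from the report: the 424x424 input size, the 212-pixel crop, the downsampling factor of 4, the grayscale weights, and all function names are assumptions chosen for the example.

```python
import numpy as np


def center_crop(img, size):
    """Keep the central size x size window, where the galaxy is assumed to sit."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def downsample(img, factor):
    """Shrink an H x W x C image by averaging non-overlapping factor x factor blocks."""
    h, w, c = img.shape
    h2, w2 = h // factor, w // factor
    img = img[:h2 * factor, :w2 * factor]  # drop any remainder rows/columns
    return img.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def to_grayscale(img):
    """One possible color space conversion: RGB to luma (ITU-R BT.601 weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return img @ weights


def augment(img, rng):
    """Random 90-degree rotation plus an optional horizontal flip."""
    k = int(rng.integers(0, 4))  # number of quarter turns
    img = np.rot90(img, k)
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return img


# Stand-in for a real image: random RGB values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((424, 424, 3))

cropped = center_crop(image, 212)   # assumed crop size
small = downsample(cropped, 4)      # assumed downsampling factor
gray = to_grayscale(small)          # single-channel variant
augmented = augment(small, rng)     # one randomly transformed copy
```

Rotations and flips are a natural augmentation here because a galaxy's morphology does not depend on its orientation in the image. Block averaging is just one simple way to downsample; an image library's resize function would work as well.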