Using Machine Learning for Categorizing Satellite Data (Apprentice Position)
Summary
Cyclostationary signals are a special class of signals whose statistical properties change with time in a periodic or nearly periodic way (https://cyclostationary.blog/). Much research remains to be done on satellite data, particularly on cyclostationary signals (Gardner, 1994), and on methods for analyzing these data for further processing. Normally, these data must be analyzed manually, and interesting features must be plotted and extracted by hand. In this work, we want to apply machine learning to the available data to extract these features automatically. Unsupervised machine learning models can analyze data and categorize it automatically based on its features, grouping data that share common features so they can be plotted and inspected together. Afterwards, further work can be done, including applying other machine learning models to study cyclostationary signals and the methods used to extract them.
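To make the definition concrete: a cyclostationary signal has a non-zero cyclic autocorrelation at some non-zero cycle frequency α, while a stationary signal does not. The sketch below estimates the lag-zero cyclic autocorrelation, R_x^α(0) = time average of x[n]^2 · exp(-i2παn), with NumPy. It assumes a synthetic amplitude-modulated noise signal rather than real satellite data; the function name `cyclic_autocorr` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def cyclic_autocorr(x, alpha, lag=0):
    """Estimate the cyclic autocorrelation R_x^alpha(lag) of a real signal.

    alpha is the cycle frequency in cycles per sample; the estimate is the
    time average of x[n + lag] * x[n] * exp(-i 2 pi alpha n).
    """
    n = np.arange(len(x) - lag)
    products = x[lag:] * x[: len(x) - lag]
    return np.mean(products * np.exp(-2j * np.pi * alpha * n))


# Synthetic example (hypothetical parameters): white Gaussian noise whose
# amplitude is modulated periodically with cycle frequency alpha0.
rng = np.random.default_rng(0)
N = 2**16
alpha0 = 0.05  # cycles per sample
n = np.arange(N)
stationary = rng.standard_normal(N)
cyclostationary = (1 + 0.5 * np.cos(2 * np.pi * alpha0 * n)) * stationary

# In expectation, |R| at alpha0 is 0.5 for the modulated signal
# (from the cross term of (1 + 0.5 cos)^2) and 0 for the plain noise.
r_cyclo = abs(cyclic_autocorr(cyclostationary, alpha0))
r_stat = abs(cyclic_autocorr(stationary, alpha0))
print(f"modulated: {r_cyclo:.3f}, stationary: {r_stat:.3f}")
```

Replacing the synthetic arrays with a recorded time series gives a simple first check for cyclostationary content before any machine learning model is applied.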
Preliminary work has been under way this spring with two students. The students have completed a basic literature review to understand the scope of the problem, covering satellite data analysis, the properties of cyclostationary signals, and the machine learning process. A basic analysis of a satellite dataset has been completed to visualize its various properties. K-means clustering and neural network algorithms have been implemented to analyze pulsar properties (Lorimer & Kramer, 2012). Python is currently used, together with its data analysis and machine learning libraries. The experiments so far show that the process is computationally expensive and takes significant resources and time to execute, even for basic algorithms.
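As an illustration of the K-means step, the sketch below implements Lloyd's algorithm directly in NumPy on a small synthetic feature matrix. It is not the students' implementation: the two-feature synthetic data, the function name `kmeans`, and the parameter values are hypothetical. In practice a library implementation such as scikit-learn's `KMeans` would normally be used, and features are standardized first because K-means relies on Euclidean distance.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster rows of X into k groups with Lloyd's algorithm.

    Returns (labels, centroids). Centroids start at k distinct random rows.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical feature table: one row per observation, two summary features.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
X = np.vstack([group_a, group_b])
# Standardize each feature to zero mean and unit variance before clustering.
X = (X - X.mean(axis=0)) / X.std(axis=0)

labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
```

The resulting cluster labels can be attached to each observation and used to group plots, which matches the goal of categorizing data by common features.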
In this project, we will continue the research by adding more datasets and implementing additional data analysis and machine learning algorithms. The experiments will require high-performance computing resources, which will be obtained through XSEDE.
References:
Gardner, W. A. (1994). An introduction to cyclostationary signals. In Cyclostationarity in communications and signal processing (pp. 1-90). New York: IEEE Press.
Lorimer, D. R., & Kramer, M. (2012). Handbook of pulsar astronomy. Cambridge: Cambridge University Press.
Job Description
Students will develop machine learning models to analyze the raw satellite data and the cyclostationary signals it contains. Students will carry out data preprocessing, initial data analysis, and visualization, and then develop models for extracting important features from the data. A few examples of questions the feature analysis should answer:
- Are there significant signals at non-zero cycle frequencies?
- If so, how can their shape be characterized?
- At which cycle frequencies are the signals present?
- At which radio frequencies do the signals peak?
- Are the signals present in both polarizations?
- Are there any differences between the polarizations?
Students will explore and implement various machine learning models suited to the available dataset. The developed models should be able to detect the presence of cyclostationary signals in new input data and categorize them. Students will also try to parallelize the models to improve their efficiency. Students will write reports on the results and share them with the academic community.
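Several of the questions above (which cycle frequencies carry signal, and whether both polarizations show it) can be framed as a scan over cycle frequencies per polarization. The sketch below assumes synthetic two-polarization data in which only polarization X is cyclostationary, and uses a crude, hypothetical detector that flags cycle frequencies whose lag-zero cyclic autocorrelation exceeds ten times the median across the scan. The function names, the factor of 10, and the data are illustrative assumptions; locating the radio frequencies where signals peak would need a spectral correlation analysis rather than this lag-zero statistic.

```python
import numpy as np


def cycle_frequency_scan(x, alphas):
    """Return |R_x^alpha(0)| for each cycle frequency in alphas (cycles/sample)."""
    n = np.arange(len(x))
    power = x * x
    return np.array([abs(np.mean(power * np.exp(-2j * np.pi * a * n))) for a in alphas])


def significant_cycle_frequencies(x, alphas, factor=10.0):
    """Hypothetical detector: flag alphas whose magnitude exceeds `factor`
    times the median magnitude across the scan (a crude noise-floor estimate)."""
    mags = cycle_frequency_scan(x, alphas)
    return alphas[mags > factor * np.median(mags)], mags


# Synthetic two-polarization recording (hypothetical): only polarization X
# carries a periodic power modulation at alpha0 = 0.05 cycles/sample.
rng = np.random.default_rng(0)
N = 2**15
n = np.arange(N)
alpha0 = 0.05
pol_x = (1 + 0.5 * np.cos(2 * np.pi * alpha0 * n)) * rng.standard_normal(N)
pol_y = rng.standard_normal(N)

# Scan non-zero cycle frequencies only; alpha = 0 just gives the mean power.
alphas = np.arange(1, 501) / 1000.0
results = {}
for name, pol in [("X", pol_x), ("Y", pol_y)]:
    found, mags = significant_cycle_frequencies(pol, alphas)
    results[name] = found
    print(f"pol {name}: strongest alpha = {alphas[mags.argmax()]:.3f}, "
          f"significant alphas = {found}")
```

Polarization X should report 0.05 (the weaker harmonic at 0.1 may also be flagged), while Y should report nothing; comparing the two lists is one simple way to address the polarization questions. A real detector would calibrate its threshold to a chosen false-alarm rate instead of using a fixed factor.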
Computational Resources
Students will use computational resources provided by XSEDE and will be responsible for exploring the various XSEDE resources that can be used for machine learning computation. Students will use Python libraries such as NumPy and pandas (including its DataFrame structures) for data processing, and PyTorch and TensorFlow for machine learning.
Contribution to Community
The students' work will be shared with the XSEDE community as reports and will contribute to the machine learning and satellite data research domains. Students will also write a conference paper and submit it by the end of the internship.
Position Type
Apprentice
Training Plan
Students have experience with basic machine learning concepts and with satellite data research. They have been training continuously in Python libraries for data analysis and machine learning. They will take HPC training through various XSEDE HPC workshop series, including those offered by Blue Waters and the Pittsburgh Supercomputing Center. Students will meet regularly with faculty, who will guide them through the research and paper publication process. Students will continue their work and will be provided the necessary support by the faculty.
Student Prerequisites/Conditions/Qualifications
Students are computer science majors and have good programming/computing experience.
Students have done research on satellite data before, which included creating a website and database to store satellite data and researching the machine learning tools used by other researchers.