Predictive Analytics for Extreme-Scale Scientific Computing Integrity using Provenance Data



Status: Completed
Mentor Name: Nitin Sukhija
Mentor's XSEDE Affiliation: Campus Champion
Mentor Has Been in XSEDE Community: 4-5 years
Project Title: Predictive Analytics for Extreme-Scale Scientific Computing Integrity using Provenance Data
Summary: The goal of this project is to develop a conceptual framework that combines scalable provenance data analysis tools with predictive models, built using machine learning and optimization techniques, to investigate the causes and outcomes of loss of scientific computing integrity.
Job Description: The student will identify and evaluate data processing engines such as MapReduce, Spark, H2O, Pregel, and Neo4j that can be used to collect and analyze the provenance data of scientific applications, based on criteria such as latency, in-memory utilization, throughput, and preprocessing tools. The student will also develop empirical prediction models, using distributed machine learning toolkits such as MLlib, SAMOA, and Amazon ML together with the data processing engines to extract features from the provenance data and to detect and predict the vulnerabilities and causes that lead to degraded integrity of scientific computations. In addition, students will use KARMA and other tools for collecting, visualizing, and navigating provenance data.
Computational Resources: In the fall, students used the existing XSEDE project allocation. We will request a start-up project allocation on Stampede, Comet, Bridges, and Jetstream. We anticipate using XSEDE resources for our education and research activities over the next few months. Through this allocation, the undergraduate student will use the national cyberinfrastructure to investigate the benefit of employing various machine learning toolkits and algorithms to build efficient prediction models for provenance data stored on big data systems such as Spark GraphX and others.
Contribution to Community: This exciting project will engage students in collaborative, active research leading to posters, oral presentations, and publications at national and international conferences. The summer research program will teach students the computing skills needed by academic researchers who want to become users of High Performance Computing (HPC) and other advanced cyberinfrastructure, along with the skills necessary to work at large supercomputing centers. The program will serve as a workforce-oriented program for undergraduate students, helping them obtain internships at national labs, preparing them for staff positions at supercomputing centers, and furthering the mission of the XSEDE project in the coming years.
Position Type: Intern
Training Plan: The student will work closely with the PI (Dr. Nitin Sukhija) and the Manager of the Operations Technology Group (Elizabeth Bautista) at NERSC, the DOE Office of Science computational facility at Lawrence Berkeley National Laboratory. The student will be trained in various big data systems and machine learning models, along with anomaly detection algorithms. The student is expected to attend bi-weekly research meetings and will be trained by us to develop test cases and perform sensitivity studies of various predictive models.
Student Prerequisites/Conditions/Qualifications: Experience with Python, Linux, machine learning, data analysis, and predictive modeling. Must be from Slippery Rock University of Pennsylvania.
Duration: Semester
Start Date: 01/23/2022
End Date: 05/14/2022
