Parallel Cache Oblivious Data Structures and Algorithms 2.0
Summary
Students will build a library of cache-oblivious data structures and measure their performance under different workloads. We will first implement serial versions of the algorithms, and then implement parallel versions of several known cache-oblivious data structures and algorithms.
(We will be building on our previous work from a previously funded XSEDE proposal, "Parallel Cache Oblivious Data Structures and Algorithms".)
In our last session, students learned the foundations of performance engineering. I have two groups of students working on two projects.
Group 1 (Ryan and Rob):
- Students have read materials on cache-oblivious algorithms
- We have additionally been reading the Open Data Structures book to gain further insights on data structure tradeoffs.
- Students have spent a significant amount of time brushing up on C/C++ coding.
- Students have begun using the LLVM compiler infrastructure to write optimization passes that analyze programs.
- We have had one publication accepted at a workshop
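To make the training concrete, a minimal sketch of the kind of algorithm the students will implement serially first: a cache-oblivious, divide-and-conquer matrix transpose. (This is our illustrative code, not from any particular library; the function name `transpose_rec` and the base-case size are our own choices.)

```cpp
#include <vector>
#include <cstddef>

// Cache-oblivious out-of-place transpose: recursively split the longer
// dimension of the current block.  The recursion eventually fits every
// level of the memory hierarchy without any cache parameters appearing
// in the code -- the defining property of cache-oblivious algorithms.
void transpose_rec(const std::vector<double>& A, std::vector<double>& B,
                   std::size_t n,                  // full matrix is n x n
                   std::size_t r0, std::size_t c0, // top-left of block
                   std::size_t rows, std::size_t cols) {
    const std::size_t BASE = 16;                   // small base case
    if (rows <= BASE && cols <= BASE) {
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j)
                B[(c0 + j) * n + (r0 + i)] = A[(r0 + i) * n + (c0 + j)];
    } else if (rows >= cols) {                     // split the longer side
        transpose_rec(A, B, n, r0, c0, rows / 2, cols);
        transpose_rec(A, B, n, r0 + rows / 2, c0, rows - rows / 2, cols);
    } else {
        transpose_rec(A, B, n, r0, c0, rows, cols / 2);
        transpose_rec(A, B, n, r0, c0 + cols / 2, rows, cols - cols / 2);
    }
}
```

The same recursive-splitting pattern extends to matrix multiply and to the van Emde Boas tree layout covered in the Demaine survey the students read.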
Group 2 (Faridat):
- Student has been practicing using supercomputing resources here at Northeastern (our Discovery cluster)
- Student has been sharpening both C++ and Python skills.
- Student has implemented an edge-detection algorithm that runs with CUDA
- Student then used profiling tools (based on XSEDE readings from the previous semester) and learned how to use NVProf to measure performance.
Job Description
Student Duties: Students will be responsible for working a minimum of 12 weeks (a full semester) at 10 hours a week. Students will meet with me once a week for a check-in to receive feedback, demonstrate progress, ask questions, and plan the next guided training tasks.
There are two scientific applications for this proposal:
1. Scientific Application: Students will work with me to learn and understand how cache-oblivious data structures work in C/C++.
2. Scientific Application: After implementing the algorithms, we will write up our results and submit them to workshops for publication (likely in the Fall).
Computational Resources
Students will make use of XSEDE trainings available online [1]. Additionally, an Education Allocation request will be used for these students to perform work primarily at the end of their training [2].
Note: This is a continuation of a previously funded XSEDE project. We will be continuing with the previous plan, and incorporating new developments with the LLVM compiler infrastructure to implement a way to swap in data structures. By the end of this term students will be writing a draft of a conference or workshop paper for submission.
Some updates to the goals we will pursue on a weekly basis:
[1] We are now learning how to read peer-reviewed papers from IEEE and ACM (exploring roughly 1 paper per week)
[2] We are continuing to work through MIT's Performance Engineering courseware on a weekly basis
[3] We will be committing our code projects to GitHub repositories
[4] We have a working edge-detector program whose performance we can continue to improve. We will start writing a draft of a conference paper for this work.
============(Previous plan included for reference)===================
The training plan is to bring in students (ideally in pairs) who are currently sophomores/juniors and have taken a Computer Systems course using C/C++. Students need not have any previous research experience, but generally will have experience using threads (e.g., pthreads) and will have taken an algorithms course.
[1] (2 weeks) Students will first work through the basics of cache-oblivious algorithms and data structures from: http://erikdemaine.org/papers/BRICS2002/paper.pdf
[2] (2 weeks) Students will then work through select lectures and exercises on caches from: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/
[3] (1 week) Students will then learn the basics of profiling
[4] (2 weeks) Next, students will implement a few data structures and algorithms
[5] (4 weeks) Students will work to find good real-world benchmarks, mining GitHub repositories for benchmarks that suffer from false-sharing performance problems
[6] The remaining time will be spent writing up and polishing the collected results.
As a reference text, we will use Computer Architecture: A Quantitative Approach (specifically the sections on the memory hierarchy).
Before using XSEDE resources, we will take advantage of Northeastern University's Discovery Cluster to run preliminary experiments.