Parallel Techniques and Data Structure for Compressive Genomics
Summary
We recently introduced phyNGSC, a scalable hybrid parallel algorithm that enables fast compression and decompression of big FASTQ datasets using distributed- and shared-memory programming models via MPI and OpenMP. This project will present the design and implementation of a novel parallel data structure that lessens the dependency on decompression and allows DNA sequences to be handled in their compressed state through fine-grained decompression, a technique known as in compresso data processing. Our proposed structure and methodology will enrich compressive genomics and facilitate sublinear analysis of big NGS datasets.
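To illustrate the general idea of fine-grained decompression, here is a minimal sketch in Python. It is not the phyNGSC format or the data structure proposed in this project: the class name `BlockCompressedReads`, the `block_size` parameter, and the use of zlib are all assumptions made for illustration. Reads are compressed in small independent blocks, so fetching one read only requires decompressing the block that contains it.

```python
import zlib


class BlockCompressedReads:
    """Hypothetical block-indexed store (illustration only, not phyNGSC).

    Reads are grouped into fixed-size blocks and each block is compressed
    independently, so a single read can be fetched by decompressing only
    the block that holds it.
    """

    def __init__(self, reads, block_size=4):
        self.block_size = block_size
        self.count = len(reads)
        self.blocks = []
        for start in range(0, len(reads), block_size):
            # One read per line inside a block; DNA reads contain no newlines.
            chunk = "\n".join(reads[start:start + block_size]).encode("ascii")
            self.blocks.append(zlib.compress(chunk))

    def get(self, i):
        """Return read i, decompressing only the block that contains it."""
        if not 0 <= i < self.count:
            raise IndexError(i)
        block_no, offset = divmod(i, self.block_size)
        chunk = zlib.decompress(self.blocks[block_no]).decode("ascii")
        return chunk.split("\n")[offset]


reads = ["ACGTACGT", "GGCATTAC", "TTTTAAAA", "CCGGCCGG", "ATATATAT"]
store = BlockCompressedReads(reads, block_size=2)
print(store.get(3))  # touches only the second block
```

Smaller blocks make each lookup cheaper but usually reduce the compression ratio; the sketch leaves that trade-off as a parameter.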
Job Description
The student will help design experiments, run them, and collect the experimental data from the different FASTQ/NGSC datasets. The student will also learn how to connect to XSEDE resources, use compilers, schedule jobs, and navigate the Unix command line, and will come to understand the importance of parallel and distributed computing for speeding up processes in computational science.
Computational Resources
Access to a cluster with distributed and parallel capabilities (MPI and OpenMP). In my previous requests I was assigned to Comet and, before it was decommissioned, to Gordon at the San Diego Supercomputer Center. The project will use approximately 30,000 Service Units (SUs) to run experiments and around 6 TB of storage space (preferably on a parallel file system such as Lustre) for the datasets we will be using, the largest of which is 2 TB.
Contribution to Community
Position Type
Apprentice
Training Plan
I will provide the students with guidance on how to submit jobs using PBS, how to use Unix commands, and how to issue compiler directives for the different programs (MPI, OpenMP, and the hybrid of the two). I have plenty of material from my previous experience with XSEDE, but I would greatly appreciate any resources you have that are more undergraduate-friendly.