
A Reproducible Workflow for Correlation Analysis of Proteomics and Morphological Data


XSEDE EMPOWER Position (Shodor NCSI)

Status: Completed
Mentor Name: Xinlian Liu
Mentor's XSEDE Affiliation: Campus Champion Fellow
Mentor Has Been in XSEDE Community: 4-5 years
Project Title: A Reproducible Workflow for Correlation Analysis of Proteomics and Morphological Data
Summary: Integrating multiple types of biological data from cancer samples shows promise for identifying new biomarkers that predict patient outcome. A relationship between these data types has been hypothesized, but more work is needed to confirm it. We focus on mRNA expression, proteomics, and tissue slide images for four cancer types: BRCA (breast invasive carcinoma), OV (ovarian serous cystadenocarcinoma), READ (rectum adenocarcinoma), and COAD (colon adenocarcinoma). We will preprocess the data by selecting qualified samples and extracting morphological features from the tissue slides. We will then perform pairwise correlation analysis between genes and image features and present the results side by side to detect patterns of correlation. We will investigate how mRNA and protein levels correlate with the morphological features. Finally, we will compare our results against clinical data to assess whether image features can serve as markers of cancer prognosis.
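The pairwise correlation step in the summary can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's actual code. It assumes the molecular data (mRNA or protein levels) and the extracted image features are already pandas DataFrames indexed by sample ID, with one feature per column. It uses Spearman rank correlation. That measure is our choice: the source does not name one. The function name `spearman_cross_correlation` is hypothetical.

```python
import numpy as np
import pandas as pd


def spearman_cross_correlation(molecular: pd.DataFrame, morphology: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation between every molecular feature and every image feature.

    Hypothetical helper: both tables are assumed to be indexed by sample ID
    with one feature per column. Only samples present in both tables are used.
    Returns a (molecular features x image features) DataFrame.
    """
    shared = molecular.index.intersection(morphology.index)
    if len(shared) < 3:
        raise ValueError("need at least 3 shared samples")

    # Spearman correlation is the Pearson correlation of the ranks (ties get average ranks).
    a = molecular.loc[shared].rank()
    b = morphology.loc[shared].rank()

    # Convert each ranked column to z-scores (sample standard deviation, ddof=1).
    # A constant column has zero spread and yields NaN correlations.
    a = (a - a.mean()) / a.std(ddof=1)
    b = (b - b.mean()) / b.std(ddof=1)

    # With z-scored columns, one matrix product gives all pairwise correlations at once.
    n = len(shared)
    corr = a.T.to_numpy() @ b.to_numpy() / (n - 1)
    return pd.DataFrame(corr, index=molecular.columns, columns=morphology.columns)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    samples = ["s1", "s2", "s3", "s4"]
    mrna = pd.DataFrame({"GENE_A": [1.0, 2.0, 3.0, 4.0],
                         "GENE_B": [4.0, 3.0, 2.0, 1.0]}, index=samples)
    image = pd.DataFrame({"nuclei_area": [10.0, 20.0, 30.0, 40.0]}, index=samples)
    print(spearman_cross_correlation(mrna, image))
```

Passing an mRNA table and a proteomics table to the same function gives the mRNA-protein comparison. With thousands of gene-feature pairs, a multiple-testing correction (for example Benjamini-Hochberg) would be needed before interpreting any single correlation. That step is left out of this sketch for brevity.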
Job Description:
1. Read the literature to survey existing methods;
2. Use the National Cancer Institute's TCGA (The Cancer Genome Atlas) and CPTAC (Clinical Proteomic Tumor Analysis Consortium) websites to identify patient groups for a retrospective study;
3. Feature clustering and extraction: algorithms and acceleration;
4. Correlation analysis: statistics and coding;
5. A workflow and a science gateway for subject-matter experts;
6. A joint proposal for an XSEDE allocation to host the planned gateway.
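Selecting qualified samples (item 2 above and the preprocessing step in the summary) can be sketched as matching TCGA barcodes across data types. This sketch rests on several assumptions:

- Every modality's sample IDs are full TCGA barcodes.
- The first three hyphen-separated fields (12 characters, e.g. `TCGA-AA-0001`) identify the patient.
- The first two digits of the fourth field give the sample type, where `01` means primary solid tumor.

Restricting to primary tumors and the exact matching rule are illustrative choices, not the project's documented criteria. The function names are hypothetical and the barcodes in the example are made up.

```python
from typing import Iterable, List, Set


def primary_tumor_patients(barcodes: Iterable[str]) -> Set[str]:
    """Map TCGA sample-level barcodes to 12-character patient IDs.

    Hypothetical helper: keeps only primary solid tumor samples
    (sample type code '01') and skips anything that is not a TCGA barcode.
    """
    patients = set()
    for code in barcodes:
        parts = code.strip().upper().split("-")
        if len(parts) < 4 or parts[0] != "TCGA":
            continue  # too short to carry a sample-type field, or not TCGA
        sample_type = parts[3][:2]  # e.g. '01A' -> '01', '11A' -> '11'
        if sample_type == "01":
            patients.add("-".join(parts[:3]))  # e.g. 'TCGA-AA-0001'
    return patients


def qualified_patients(*modalities: Iterable[str]) -> List[str]:
    """Patients with a primary tumor sample in every modality (sorted)."""
    sets = [primary_tumor_patients(m) for m in modalities]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    # Made-up barcodes for illustration only.
    mrna = ["TCGA-AA-0001-01A-11R-0001-07", "TCGA-AA-0002-11A-01R-0002-07"]
    proteomics = ["TCGA-AA-0001-01A", "TCGA-AA-0002-01A"]
    slides = ["TCGA-AA-0001-01Z-00-DX1", "TCGA-AA-0002-01Z-00-DX1"]
    print(qualified_patients(mrna, proteomics, slides))
```

Collapsing to patient IDs lets a tumor RNA aliquot, a proteomics sample, and a diagnostic slide from the same patient be treated as one case. If a modality could contain several primary tumor samples per patient, a stricter rule matching on the full sample field would be the safer design.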
Computational Resources: PSC Bridges
Contribution to Community:
Position Type: Learner
Training Plan:
1. Job submission on a supercomputer;
2. Parallelizing existing open-source code;
3. Reproducible workflows;
4. Dask.
Student Prerequisites/Conditions/Qualifications: CS: A.I., ML, data science, Python programming. Math: statistics, linear algebra. Some college-level biology classes.
Duration: Semester
Start Date: 09/01/2020
End Date: 12/31/2020