Project Info


Developing Robust Brain Imaging Genomics Data Mining Framework for Improved Cognitive Health

Hua Wang | huawang@mines.edu

The research objective of this project is to address the computational challenges of an innovative big data application in neuroinformatics. This project will study the problem of integrating multi-level data using two emerging key computational techniques: large-scale non-convex sparse learning models solved by linearly convergent algorithms, and multi-task, multi-dimensional data integration algorithms with linear computational cost.

More Information

Recent advances in multimodal brain imaging and in high-throughput genotyping and sequencing techniques provide exciting new opportunities to improve our understanding of brain structure and neural dynamics, their genetic architecture, and their influences on cognition and behavior. Research in the emerging fields of brain imaging genomics and human connectomics holds great promise for a systems biology of the brain that can better explain complex neurobiological systems, from genetic determinants to the interplay of brain structure, connectivity, function, and cognition. Critical Barrier: The unprecedented scale and complexity of these data have created critical computational bottlenecks that require new concepts and enabling tools. It remains a major challenge to develop systematic big data mining approaches that reveal complex relationships between the brain (e.g., up to 20 million voxels in 3T/7T/9.4T MR imaging systems) and the genome (3 billion base pairs). Additional challenges include how to seamlessly integrate data mining methods with prior knowledge to produce interpretable findings, and how to translate these methods into user-friendly, interactive software tools that optimally combine human expertise and machine intelligence to enable novel, contextually meaningful discoveries.
This project seeks to create large-scale, principled computational strategies and effective software tools that reveal sophisticated relationships among heterogeneous brain data, including genetic variations, multi-dimensional and longitudinal quantitative phenotypes, neural circuits, and outcomes. To realize the full potential of these data, the project addresses critical big data mining issues of scalability, efficiency, dimensionality, heterogeneity, complexity, and interactive visual exploration. Given the massive genomic, imaging, and other phenotypic data sets available to us and our expertise in integrating neuroimaging and genomics, neuroinformatics is an ideal application domain for the development, application, and validation of the proposed big data mining framework. Massive continuous phenotypic measures from neuroimaging data, fluid biomarkers, and cognitive scores have the potential to serve as useful intermediate traits on the chain of causality from genes to phenotypic outcomes. In this project, we will study principled, large-scale data mining models, coupled with rigorous theoretical foundations, data-intensive computing, and interactive visual exploration, to conduct the first comprehensive and integrative study of imaging genomics and connectomics. The success of this big data research project will support the BRAIN Initiative, a national research effort announced by the U.S. government to revolutionize our understanding of the human brain.
More specifically, together with the graduate students in the research team, the undergraduate students will perform the following research tasks:
1. implement novel large-scale non-convex sparse learning algorithms for identifying genetic risk factors from multiscale imaging genomics data;
2. identify genetic markers, together with their biological structures, from massive genome-wide SNP data to better capture the underlying gene-to-QT (quantitative trait) mechanism;
3. investigate large-scale non-convex sparse learning models by developing linearly convergent optimization algorithms for big data feature selection.
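As a rough illustration of tasks 1 and 3, one classical non-convex sparse learning algorithm is iterative hard thresholding (IHT) for least squares under an L0 constraint (at most k nonzero weights); under restricted strong convexity and smoothness conditions it is known to converge linearly. This is a minimal sketch, not the project's own method: it assumes a plain linear model, synthetic Gaussian features standing in for SNP or imaging measures, a known sparsity level k, and a fixed step size. The function names `hard_threshold` and `iht` and the demo data are hypothetical.

```python
import random


def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero the rest (non-convex L0 projection)."""
    keep = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]
    out = [0.0] * len(w)
    for j in keep:
        out[j] = w[j]
    return out


def iht(X, y, k, step=0.4, iters=200):
    """Iterative hard thresholding for min ||y - Xw||^2 / (2n) subject to ||w||_0 <= k."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        # Residual r = y - Xw.
        r = [y[i] - sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        # Gradient of the least-squares loss: -X^T r / n.
        grad = [-sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        # Gradient step followed by projection onto the set of k-sparse vectors.
        w = hard_threshold([w[j] - step * grad[j] for j in range(p)], k)
    return w


# Hypothetical demo: 100 samples, 20 features, 2 truly associated features (indices 3 and 11).
random.seed(0)
n, p = 100, 20
true_w = [0.0] * p
true_w[3], true_w[11] = 2.0, -1.5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_w[j] for j in range(p)) for i in range(n)]  # noiseless response

w_hat = iht(X, y, k=2)
print("selected features:", [j for j, v in enumerate(w_hat) if v != 0.0])
```

The hard-thresholding step is what makes the problem non-convex: instead of an L1 penalty, the weights are projected exactly onto vectors with at most k nonzeros, so feature selection and fitting happen in the same iteration. Real imaging genomics data would add noise, correlated (linkage disequilibrium) features, and multiple imaging traits, which this sketch ignores.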

Grand Engineering Challenge: Reverse-engineer the brain

Student Preparation


Qualifications

Students are expected to have taken CSCI 261 and 262 before joining this project. Having already taken CSCI 303, 358, 404, and 470 is helpful but not required.

Time Commitment

20-40 hours/month

Skills/Techniques Gained

1. The students will learn how to perform data processing and management.
2. The students will join my research team to conduct research on machine learning and data mining.
3. The students will take part in writing scientific papers on the results of this project.
4. The students will have the chance to work with my collaborators in medical schools.
5. The students will gain fundamental knowledge of medical image computing and learn how to apply machine learning and computational algorithms to problems in that field.
In short, after successfully completing the assigned research tasks in this project, the student is expected to be ready to pursue a graduate degree in machine learning, data mining, artificial intelligence, or, more broadly, computer science.

Mentoring Plan

1. One orientation meeting is planned at the beginning of the project. At this meeting, the undergraduate students will be introduced to the research team, the project, and the research culture of the faculty's research team.
2. Weekly technical seminars within the research team are planned. At each seminar, the undergraduate students will present a research paper relevant to the project and lead a discussion of it with the faculty and the graduate students in the research team.
3. Weekly professional development sessions within the research team are planned. At each session, the faculty or the graduate students in the research team will review the progress of the project and the recent research results, exchange ideas with the undergraduate students, and help them develop research skills, including algorithm development, experimental design, evaluation of scientific results, and paper writing.
4. A poster session will be held at the end of the project, in which the results of this project will be presented to the faculty's research team, the Computer Science Department, and the faculty's collaborators.