2021 Virtual Undergraduate Research Symposium

Multi-Instance Learning Methods for Cancer Detection in Histopathological Images

PROJECT NUMBER: 38 | AUTHOR: Lucia Saldana Barco, Computer Science

MENTOR: Hua Wang, Computer Science


Histopathology is the examination of tissue samples under a microscope to look for cells that might explain the presence of a disease. Histopathological images are fragments of tissue that have been digitized and can be used for cancer diagnosis, among other medical image analysis purposes. In our research, we proposed a weakly supervised multiple instance learning (MIL) method to determine which segments of tissue show signs of an abnormality. MIL is an area of machine learning in which training and testing data are organized into sets of instances known as bags. MIL is weakly supervised, meaning that labels are provided at the bag level rather than the instance level. In our research, each breast or colon cancer histopathological image is represented as a bag of patches. The bags (images) are labeled as either malignant or benign, while the instances (patches) are unlabeled. We focused on segmenting the histopathological images to obtain a predetermined number of patches per image. We then extracted from each patch the features that the MIL algorithm requires to predict the location of the malignant cells. With this work, we hope to help oncologists diagnose patients efficiently by examining the cells of a potentially cancerous tissue sample.
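
The bag-and-instance setup described above can be illustrated with a minimal Python sketch. It is not the project's actual algorithm, which the abstract does not specify. It assumes the standard MIL assumption (a bag is malignant if at least one of its patches is) and a hypothetical per-patch malignancy score between 0 and 1. The names `Bag`, `predict_bag`, and `most_suspicious_patch` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bag:
    """One histopathological image: unlabeled patches plus one image-level label."""
    patches: List[Sequence[float]]  # one feature vector per patch (instance)
    label: int                      # 1 = malignant, 0 = benign (bag level only)


def predict_bag(patch_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Assumed standard MIL rule: a bag is positive if any instance is positive.

    patch_scores are hypothetical per-patch malignancy scores in [0, 1].
    """
    return int(max(patch_scores) >= threshold)


def most_suspicious_patch(patch_scores: Sequence[float]) -> int:
    """Return the index of the patch with the highest malignancy score."""
    return max(range(len(patch_scores)), key=lambda i: patch_scores[i])
```

Under these assumptions, the image-level prediction comes from the highest-scoring patch, and that same patch indicates where in the tissue the suspected abnormality lies.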



Lucia Saldana Barco is a freshman pursuing a B.S. in Computer Science + Data Science at Colorado School of Mines. Lucia conducts research with the Machine Learning MInDS@Mines lab, under the mentorship of Dr. Hua Wang and Ph.D. student Lodewijk Brand. Her current focus lies in the application of machine learning algorithms to cancer histopathological images to locate cancerous cells. In the future, Lucia hopes to continue doing research in the area of health/bioinformatics.


  1. Hi Lucia, fantastic project! I’m curious, what made you decide to go with using RGB values to represent each patch? Are there other potential representations that you considered?

    • Thank you, Zoe! This is a great question. In this work, we chose to represent each patch as a 64 × 64-pixel image segment, so a single patch contains 4,096 pixels. Since each pixel's color is typically obtained by combining red, green, and blue intensities, it is common to represent an image as an array of its pixels' RGB values. We used the Python Imaging Library (PIL) to acquire these RGB values. Since the PFTAS method requires the images to be passed in as a 2D ndarray, this ended up being an effective approach. I have not yet considered or learned about alternatives, but I would be curious to learn about other potential representations. Thank you for your comment!

  2. Awesome work! Do you predict this will help doctors diagnose cancer sooner in a patient?

    • Hello Allie, thank you very much! I expect that research of this kind, especially the use of multiple instance learning algorithms, will help improve both the speed and accuracy of a cancer diagnosis. Often, patients are not diagnosed with cancer until it is too late to deal with the complications that may arise. It would be wonderful to see this work contribute to the development of new solutions!

      • So very cool, Lucia!
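
As a rough illustration of the patch representation discussed in the first reply above, the following Python sketch splits an RGB image array into 64 × 64 patches with NumPy. It is a sketch under stated assumptions, not the project's code: the non-overlapping tiling, the function name `extract_patches`, and the synthetic input image are all hypothetical, and the project may have selected its patches differently.

```python
import numpy as np

PATCH_SIZE = 64  # patch side length in pixels, as stated in the reply


def extract_patches(image: np.ndarray, patch_size: int = PATCH_SIZE) -> np.ndarray:
    """Split an H x W x 3 RGB array into non-overlapping square patches.

    Rows and columns that do not fill a whole patch are discarded.
    Returns an array of shape (num_patches, patch_size, patch_size, 3).
    """
    height, width, channels = image.shape
    rows, cols = height // patch_size, width // patch_size
    cropped = image[: rows * patch_size, : cols * patch_size]
    grid = cropped.reshape(rows, patch_size, cols, patch_size, channels)
    # Bring the two grid axes together, then flatten them into one patch axis.
    return grid.swapaxes(1, 2).reshape(-1, patch_size, patch_size, channels)


# Synthetic stand-in for a slide image. With Pillow, one could instead use
# np.asarray(Image.open(path).convert("RGB")), where `path` is hypothetical.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(130, 200, 3), dtype=np.uint8)
patches = extract_patches(image)
```

Each entry of `patches` holds the red, green, and blue intensities of the 4,096 pixels in one patch, which is the array form that a feature extractor can then consume.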
