EXASCALE LEARNING IN MEDICAL IMAGE DATA

PATTERN RECOGNITION SYSTEM AS A DIAGNOSIS SUPPORT TOOL

Digital medicine and artificial intelligence (AI) are starting to change clinical practice:

  • Digital histopathology, the automated analysis of biopsies or surgical tissue samples, helps make diagnosis faster and improves its results.
  • Images are captured by a high-resolution scanner and stored as Whole Slide Images (WSIs).
  • Huge amounts of information must be processed; AI, specifically machine learning (ML) and deep learning (DL), can help identify Regions of Interest (ROIs) on which to focus the analysis.

Summary

PROCESS will support medical practice in diagnosing patients by increasing:

  • Accuracy: better quality of predictions.
  • Efficiency: time savings (both in computational tasks and in diagnosis) and reduced effort for physicians.

Challenges

  1. Digital histopathology produces very large volumes of information to be analysed…
  2. Scanning everything (a brute-force approach) takes too long and consumes too many resources…
  3. On the other hand, artificial intelligence requires advanced analytical skills and a sophisticated infrastructure.

Objectives

The final goal for PROCESS in Exascale Learning in Medical Image Data is to improve algorithms for cancer detection through:

  • New image analysis algorithms for supervised and weakly supervised learning (ML and DL).
  • Distribution of training across multiple computational centres.
  • Better balance between performance and increased computational resources.
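
The source does not say which scheme is used to distribute training across centres. One common option is synchronous data-parallel training, sketched below under that assumption: each centre computes a gradient on its own data shard, the gradients are averaged, and every centre applies the same update. The linear model, the three centres, and the function names are hypothetical and chosen only to keep the example small (Python 3 with NumPy).

```python
import numpy as np


def local_gradient(weights, features, targets):
    """Mean-squared-error gradient computed on one centre's data shard."""
    residual = features @ weights - targets
    return 2.0 * features.T @ residual / len(targets)


def distributed_step(weights, shards, learning_rate=0.1):
    """One synchronous step: average the per-centre gradients, then update."""
    gradients = [local_gradient(weights, x, y) for x, y in shards]
    return weights - learning_rate * np.mean(gradients, axis=0)


rng = np.random.default_rng(0)
true_weights = np.array([1.5, -2.0])
features = rng.normal(size=(120, 2))
targets = features @ true_weights

# Three hypothetical centres, each holding an equal share of the data.
shards = list(zip(np.split(features, 3), np.split(targets, 3)))

weights = np.zeros(2)
for _ in range(200):
    weights = distributed_step(weights, shards)
```

With equally sized shards, the averaged gradient equals the gradient over the pooled data, so the centres reach the same model they would obtain by training in one place, without moving the raw data.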

Methodology

Techniques and procedures

  • Approach: deep learning for computer vision tasks, processing massive datasets (up to millions of samples) to detect patterns in the data.

  • Structure: Camnet, a three-layer software architecture:

   1st layer: Data pre-processing and patch extraction

   2nd layer: Local and distributed training

   3rd layer: Performance boosting and interpretability
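
As an illustration of what the first layer does, the sketch below tiles a slide into fixed-size patches and keeps only those that contain enough tissue. It is not Camnet's code: the function name, the thresholds, and the synthetic 8x8 array are assumptions made for the example. A real pipeline would read regions from a WSI file (for instance with Openslide) rather than from an in-memory array.

```python
import numpy as np


def extract_tissue_patches(slide, patch_size=4, min_tissue_fraction=0.5,
                           background_threshold=200):
    """Tile a greyscale slide array and keep patches with enough tissue.

    Pixels darker than `background_threshold` are treated as tissue (an
    assumption: stained tissue is darker than the bright glass background).
    Returns a list of ((row, col), patch) tuples.
    """
    patches = []
    height, width = slide.shape
    for row in range(0, height - patch_size + 1, patch_size):
        for col in range(0, width - patch_size + 1, patch_size):
            patch = slide[row:row + patch_size, col:col + patch_size]
            tissue_fraction = np.mean(patch < background_threshold)
            if tissue_fraction >= min_tissue_fraction:
                patches.append(((row, col), patch))
    return patches


# Synthetic 8x8 "slide": bright background with one dark 4x4 tissue block.
slide = np.full((8, 8), 255, dtype=np.uint8)
slide[4:8, 0:4] = 90
kept = extract_tissue_patches(slide)
```

Discarding background patches early is what keeps the later training layers from spending resources on empty glass, which is the brute-force cost noted under Challenges.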

Resources

Source of images: CAMELYON (large dataset for histopathology research, with more than 1000 tissue Whole Slide Images (WSIs))

Components/parts:

  a) Applications stored in a containerized repository available to user communities.
  b) Virtualization layer. 
  c) Data management (distributed data federation and metadata).
  d) Computing Management.

Applications/SW:

  • Currently used technologies: Python 2.7, TensorFlow 1.4.0, Caffe, Theano, Lasagne, DIGITS, MXNet, Keras 2.1.2, TFLearn, NumPy, SciPy, Pandas, Scikit-learn, OpenCV, OpenSlide, Matplotlib, Seaborn, scikit-image, h5py, PyTorch, OpenBLAS, cuDNN, FFmpeg, NLTK, Gensim, R, Weka.
  • Data Storage: NAS.
  • Data Processing: H5DS, Apache Spark, Hadoop.
  • Existing computing infrastructure: 8 GPUs.

Results

For the field of activity

A Pattern Recognition System as a Diagnosis Support Tool, consisting of:

  • A mechanism to run research algorithms in a sandboxed environment on data that is not allowed to be shared.
  • Better diagnosis models, allowing weakly supervised learning that does not require expensive manual annotation by physicians, typically required for deep learning approaches.
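
To make the idea of weak supervision concrete, the sketch below uses multiple-instance learning with max pooling, a common weakly supervised formulation. The source does not state which method the project uses, so this is an assumption. Only a slide-level label is needed: the slide score is the score of its most suspicious patch, and the loss is computed against that single label. The patch scores are invented for the example.

```python
import numpy as np


def slide_probability(patch_probabilities):
    """Max pooling: a slide is as suspicious as its most suspicious patch."""
    return float(np.max(patch_probabilities))


def slide_level_loss(patch_probabilities, slide_label):
    """Binary cross-entropy against the slide-level label only."""
    p = np.clip(slide_probability(patch_probabilities), 1e-7, 1 - 1e-7)
    return float(-(slide_label * np.log(p) + (1 - slide_label) * np.log(1 - p)))


# Hypothetical patch scores produced by some patch classifier.
tumour_slide = np.array([0.05, 0.10, 0.92, 0.08])
normal_slide = np.array([0.04, 0.07, 0.03, 0.06])
```

No patch in either slide carries its own annotation. The training signal comes only from whether the slide as a whole is labelled as containing tumour.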

For the entire project

  • Technical testing: testing of the architecture and tools on the pre-exascale computing infrastructure, with integrated GPU clusters, provided by PROCESS.
  • Analytical improvements: lessons learnt from weakly annotated data by applying unsupervised learning techniques to large datasets, made possible by the massive computation tested in the project, as well as the development of regression concept vectors as a technique for interpreting the deep learning models.
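
The source gives no details on regression concept vectors, so the sketch below shows only one way the idea can be read, under stated assumptions. A continuous concept measure is regressed on a layer's activations by least squares. The normalised coefficient vector gives a direction in activation space, and the model's sensitivity to the concept is the directional derivative of its output along that direction. The activations, the concept measure, and the linear "head" standing in for the network are all synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer activations for 200 patches (5 features each).
activations = rng.normal(size=(200, 5))

# Hypothetical continuous concept measure that depends mostly on feature 0.
concept = 3.0 * activations[:, 0] + 0.1 * rng.normal(size=200)

# Least-squares regression of the concept measure on the activations
# (the extra column of ones fits the intercept).
design = np.column_stack([activations, np.ones(len(activations))])
coefficients, *_ = np.linalg.lstsq(design, concept, rcond=None)
concept_vector = coefficients[:-1] / np.linalg.norm(coefficients[:-1])

# Toy stand-in for the network head: output = activations @ head_weights,
# so its gradient with respect to the activations is simply head_weights.
head_weights = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
sensitivity = float(head_weights @ concept_vector)
```

A large positive sensitivity suggests that the output rises as the concept measure increases. For a real network the gradient would come from automatic differentiation rather than from a fixed weight vector.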

References and outcomes

Partners: