Seg + conclusion
hbertrand committed Nov 10, 2018
1 parent 590ffce commit 06ac0de
Showing 7 changed files with 154 additions and 74 deletions.
62 changes: 36 additions & 26 deletions latex/chap_conclusion.tex
@@ -7,45 +7,55 @@ \chapter{Conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Summary of the contributions}

When work on this thesis began in early 2016, deep learning had already shown its worth on natural images, but contributions in medical imaging were still rare. There was also a severe lack of tools and of understanding of how to build an architecture adapted to a given problem, which naturally led us to the topic of hyper-parameter optimization. We summarize our work in this area below. Armed with a set of tools to quickly find good architectures, the second part of this thesis focused on applications; questions of transfer learning and template deformation, and their links with deep learning, were explored in this context.

\paragraph*{Hyper-parameter optimization.}
\begin{itemize}
\item An incremental Cholesky decomposition to reduce the cost of Bayesian optimization. Most of the computational cost of Bayesian optimization lies in the inversion of the Gaussian process' Gram matrix. We exploited a specificity in the structure of this matrix in the case of Bayesian optimization: each successive call adds new rows and columns while leaving the rest of the matrix unchanged. We showed that this property also holds for the underlying Cholesky decomposition, and how to compute the new decomposition faster when the previous one is available (a sketch of the update is given just after this list).
\item A limited comparison of the performance of random search and Bayesian optimization. We designed an experiment on a small hyper-parameter space to observe the behavior of random search and Bayesian optimization over many runs. Bayesian optimization found better models faster than random search in the best, average and worst cases. We showed that the Gaussian process quickly became a good predictor of model performance and that the worst models were picked last. Random search behaved in accordance with the theoretical bounds we derived.
\item A new hyper-parameter optimization method combining Hyperband and Bayesian optimization. We proposed a method combining the strengths of Hyperband and Bayesian optimization: model selection is done by Bayesian optimization, and model training follows the Hyperband scheme. Unfortunately, due to how the simultaneous selection of multiple models was handled, the method did not perform significantly better than Hyperband alone.
\end{itemize}
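To make the Cholesky update concrete, here is a minimal sketch in our own notation (not copied from the chapter): if the Gram matrix $K_n = L_n L_n^\top$ is extended by one evaluation, the new Cholesky factor follows from the old one with a single triangular solve,
\[
K_{n+1} =
\begin{pmatrix}
K_n & k \\
k^\top & \kappa
\end{pmatrix},
\qquad
L_{n+1} =
\begin{pmatrix}
L_n & 0 \\
l^\top & \sqrt{\kappa - l^\top l}
\end{pmatrix},
\qquad
l = L_n^{-1} k,
\]
so each new point costs $\mathcal{O}(n^2)$ instead of the $\mathcal{O}(n^3)$ of recomputing the factorization from scratch; the same block form covers adding several rows and columns at once.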

\paragraph*{A method to solve a classification problem of MRI field-of-view.}
Using a dataset of MRI volumes from a multitude of protocols and machines, we developed a neural network able to classify each slice of a volume into its anatomical region (six classes, such as head or pelvis). We improved this neural network by using Bayesian optimization to find a better architecture, providing a non-negligible performance boost. Even though the classification was done at the slice level, we showed that it could be used for robust region localization through a decision scheme maximizing the likelihood of each region.
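The decision scheme itself is described in the corresponding chapter; purely as an illustration, a hypothetical version of the idea can be written as a dynamic program that assumes the regions appear in a fixed anatomical order and picks the boundary slices maximizing the total log-likelihood (function and variable names below are ours, not the thesis code):

import numpy as np

def localize_regions(slice_log_probs):
    # slice_log_probs: (n_slices, n_regions) per-slice class log-probabilities,
    # with regions assumed to appear in a fixed order along the volume.
    n_slices, n_regions = slice_log_probs.shape
    # dp[i, r] = best log-likelihood of the first i slices split into the first r regions
    dp = np.full((n_slices + 1, n_regions + 1), -np.inf)
    dp[0, 0] = 0.0
    choice = np.zeros((n_slices + 1, n_regions + 1), dtype=int)
    for r in range(1, n_regions + 1):
        for i in range(1, n_slices + 1):
            for j in range(r - 1, i):  # region r-1 covers slices j .. i-1
                cand = dp[j, r - 1] + slice_log_probs[j:i, r - 1].sum()
                if cand > dp[i, r]:
                    dp[i, r] = cand
                    choice[i, r] = j
    # backtrack the boundaries: (region index, first slice, one past last slice)
    bounds, i = [], n_slices
    for r in range(n_regions, 0, -1):
        j = choice[i, r]
        bounds.append((r - 1, j, i))
        i = j
    return list(reversed(bounds))

# e.g. with hypothetical per-slice probabilities: localize_regions(np.log(probs))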

\paragraph*{A new transfer learning method and its application to the segmentation of the kidney in 3D ultrasound images.}
Working with a dataset of 3D ultrasound kidney images from two populations, we investigated transfer learning methods for the segmentation of the kidney from one population (healthy adults) to the other (sick children), for which fewer examples are available. This led us to develop a new transfer learning approach, based on adding layers to the pre-trained network that predict the parameters of geometric and intensity transformations.
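As a rough illustration of the idea only, and not of the architecture actually used in the thesis, such added layers could look as follows (PyTorch; all names, sizes and the choice of transformations are placeholders): a small head predicts affine and intensity parameters that are applied to the input before a frozen pre-trained segmentation network.

import torch
import torch.nn.functional as F

class TransformLayers(torch.nn.Module):
    # Hypothetical sketch: new layers predict a geometric (affine) and an
    # intensity (gain/bias) transformation applied before a frozen network.
    def __init__(self, pretrained_net):
        super().__init__()
        self.pretrained = pretrained_net
        for p in self.pretrained.parameters():
            p.requires_grad = False               # only the new layers are trained
        self.features = torch.nn.Sequential(
            torch.nn.Conv3d(1, 8, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool3d(1), torch.nn.Flatten(),
        )
        self.geo = torch.nn.Linear(8, 12)          # 3x4 affine matrix
        self.intensity = torch.nn.Linear(8, 2)     # intensity gain and bias
        # initialise both heads to the identity transformation
        torch.nn.init.zeros_(self.geo.weight)
        self.geo.bias.data = torch.eye(3, 4).flatten()
        torch.nn.init.zeros_(self.intensity.weight)
        self.intensity.bias.data = torch.tensor([1.0, 0.0])

    def forward(self, x):                          # x: (B, 1, D, H, W)
        f = self.features(x)                       # (B, 8)
        theta = self.geo(f).view(-1, 3, 4)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        x = F.grid_sample(x, grid, align_corners=False)
        gain, bias = self.intensity(f).unbind(dim=1)
        x = gain.view(-1, 1, 1, 1, 1) * x + bias.view(-1, 1, 1, 1, 1)
        return self.pretrained(x)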

\paragraph*{A segmentation method based on template deformation using deep learning.}
The use of shape priors is still uncommon in deep learning. We proposed a new segmentation method based on the \textit{implicit template deformation} framework that uses deep learning to predict the transformation to be applied to the template. While this work is still preliminary, we obtained competitive performance on the 3D US kidney segmentation task previously explored.
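Again purely as an illustration, one simple way to realize "predict the transformation applied to the template" is to regress a dense displacement field and warp a fixed binary template with it (hypothetical PyTorch sketch, not the implicit template deformation model of the chapter, which also involves a global transformation and shape constraints):

import torch
import torch.nn.functional as F

class TemplateDeformation(torch.nn.Module):
    # Hypothetical sketch: a small network regresses a displacement field
    # that warps a fixed binary template into the predicted segmentation.
    def __init__(self, template):                  # template: (1, 1, D, H, W)
        super().__init__()
        self.register_buffer("template", template)
        self.net = torch.nn.Sequential(
            torch.nn.Conv3d(1, 8, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv3d(8, 3, 3, padding=1),   # 3 displacement channels
        )

    def forward(self, image):                      # image: (B, 1, D, H, W)
        disp = self.net(image)                     # (B, 3, D, H, W)
        b, _, d, h, w = disp.shape
        identity = F.affine_grid(
            torch.eye(3, 4, device=image.device).expand(b, 3, 4),
            size=(b, 1, d, h, w), align_corners=False)
        grid = identity + disp.permute(0, 2, 3, 4, 1)
        return F.grid_sample(self.template.expand(b, -1, -1, -1, -1),
                             grid, align_corners=False)

A regularization term on the displacement field (or a constraint keeping the warped template close to the learned shape space) would be needed in practice to preserve the shape prior.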

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Future Work}

All the contributions listed in the previous section have immediate ways in which they could be improved, but those are discussed in their respective chapters. Here we take a longer-term perspective and ask what we would pursue with more time, taking into account how we believe the field will evolve.

\paragraph*{User interactions.}
An important aspect of the implicit template deformation framework is that it can integrate user interactions. After the template is deformed, a user can add or remove points from the segmentation. These are taken into account by the method, which produces a new, refined segmentation, and the process can be repeated as many times as required. This kind of user interaction is key in medical practice, but deep learning provides no easy way to incorporate it. Some attempts have been made, such as~\textcite{cicek2016MICCAI}, which provides a partial segmentation as an additional input to the segmentation network, but none of them forces the network to actually use the interactions. In our template deformation method, user input could be incorporated as additional constraints on the deformation field, while forcing the final segmentation to remain an acceptable shape.

\paragraph*{A shift in applications.}
Deep learning has fulfilled one of its promises: many of the tasks that were once difficult are now easily solvable. The classification task of MRI field-of-view we worked on is a good example. Once the dataset is ready, automated methods of hyper-parameter optimization can be used to find a very efficient model with little manual effort. As automation tools improve and become more accessible\footnote{See for example \href{https://cloud.google.com/automl/}{Google AutoML}.}, many tasks will stop requiring image processing expertise to be solved. The focus will move to more challenging tasks, where the difficulty can come from a lack of data (rare diseases, high variability) or from greater complexity: segmentation of small structures (due to the reliance of neural networks on pooling layers), registration, surgical planning, and so on.

\paragraph*{Multi-modality, multi-task learning.}
The idea of transfer learning is to share knowledge between more or less loosely connected problems. Expanding on this, we could imagine a multi-input, multi-output model that shares knowledge over many tasks and modalities at the same time. A first step in this direction could be the construction of modality-specific pre-trained models, a medical imaging equivalent of ImageNet pre-trained models. This requires that huge datasets be available per modality, but efforts in this direction are ongoing. For example, the NIH recently released an \href{https://nihcc.app.box.com/v/ChestXray-NIHCC}{X-Ray dataset} of over $100,000$ images that could be used to build a standard X-Ray pre-trained model. If ImageNet pre-trained models can improve performance on most medical tasks, it seems sensible that a more specific model would improve performance even further.

% All of the contributions presented can be developed further as we discuss in this section.

% \paragraph*{An incremental Cholesky decomposition to reduce the cost of Bayesian optimization.}
% Even though we proved the complexity gain, we didn't integrate the incremental decomposition into a Bayesian optimization framework. This would be the next step, and would allow measuring the time gained in average. The testing could be done on the limited CIFAR-10 hyper-parameter space on which we compared random search and Bayesian optimization. As the gain in time becomes more important with the number of models tested, it might be interesting to increase the hyper-parameter space to a couple thousand models.

% \paragraph*{A comparison of the performance of random search and Bayesian optimization.}
% Following the previous paragraph, extending the hyper-parameter space would yield insights into how Bayesian optimization behaves in larger spaces. This framework of testing could be used to observe the behaviour of other methods such as Hyperband. We observed models performance to be normally distributed, but the scope is limited to one hyper-parameter space on one task. To the best of our knowledge there is no theory on how to build good hyper-parameter spaces and understanding the relation between models, tasks and model performance would be of practical use to the use of hyper-parameter optimization methods.

% \paragraph*{A new hyper-parameter optimization method combining Hyperband and Bayesian optimization.}
% We have already described the flaws of our method: normalizing the acquisition function to transform it into a distribution from which we can draw multiple combinations resulted in a quasi-uniform distribution, completely negating the point of Bayesian optimization. Using a different strategy this combination method would perform better than either Hyperband or Bayesian optimization, as shown recently in~\textcite{falkner2018}.

% \paragraph*{A method to solve a classification problem of MRI field-of-view.}
% As the method presented gives robust results on our dataset, there is little need to improve it. It could be interesting however, to explore how to directly predict the boundaries of the regions using deep learning, instead of classifying each slice and using another method to obtain the regions.

% \paragraph*{A new transfer learning method and its application to the segmentation of the kidney in 3D ultrasound images.}
% The limitation of the transfer learning method as presented is that it is highly specific to the kidney segmentation problem described. While the concept of adding specific transformation layers is general, it needs to be validated on other problems.

% \paragraph*{A statistical shape model approach using deep learning.}
% <TODO after chapter is done>
2 changes: 1 addition & 1 deletion latex/chap_introduction.tex
@@ -46,7 +46,7 @@ \subsection{Deep Learning}
\begin{itemize}
\item Tensorflow (\textcite{tensorflow2015}) had just been released and was a nightmare to install; now it works everywhere, from desktop to smartphone, and has been cited in over 5000 papers.
\item The original GAN paper by Goodfellow \textit{et al.} was published in mid-2014 (\textcite{goodfellow2014}). In early 2016, GANs had the reputation of being incredibly difficult to train and unusable outside of toy datasets. Three years and 5000 citations later, GANs have been applied to a wide range of domains, including medical imaging.
\item The omnipresent U-Net architecture had just made its debut a few months earlier at MICCAI 2015 (\textcite{ronneberger2015MICCAI}).
\end{itemize}

In the context of medical imaging, deep learning also brings technical challenges. Two of the most common criticisms are the lack of interpretability of neural networks and their lack of robustness. These are barriers to the adoption of deep learning in clinical use, and resolving them would open the door to new tasks such as diagnosis or surgical intervention.