Commit 8ff53cd — "These v1" (i.e. "Thèse v1": thesis, version 1)
hbertrand committed Nov 14, 2018 · 1 parent 23d7b43
Showing 12 changed files with 168 additions and 159 deletions.
Binary file added latex/These_HadrienBertrand_v1.pdf
Binary file not shown.
28 changes: 8 additions & 20 deletions latex/chap_conclusion.tex
@@ -7,15 +7,15 @@ \chapter{Conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Summary of the contributions}

When this thesis began in early 2016, deep learning had already shown its worth on natural images, but contributions in medical imaging were rare. Tools and understanding of how to build architectures adapted to a specific problem were still lacking, which naturally led us to the topic of hyper-parameter optimization, as a way to avoid tedious architecture handcrafting and hyper-parameter fine-tuning.

Armed with a set of tools to quickly find good models, we devoted the second part of this thesis to applications. Questions of transfer learning and template deformation, and their link with deep learning, were explored in this context. They arose from the lack of data so common in medical imaging, and thus from the need to re-use the limited knowledge at our disposal as much as possible.

\paragraph*{Hyper-parameter optimization}
\begin{itemize}
\item \textbf{An incremental Cholesky decomposition to reduce the cost of Baye\-sian optimization.} Most of the computational cost of Bayesian optimization lies in the inversion of the Gaussian process' Gram matrix. We exploited a specificity in the structure of this matrix in the case of Bayesian optimization: each successive call adds new rows and columns while leaving the rest of the matrix unchanged. We showed that this property carries over to the underlying Cholesky decomposition, and how to compute the new decomposition faster when the previous one is available (a numerical sketch of this update is given after this list).
\item \textbf{A limited comparison of the performance of random search and Bayesian optimization.} We designed an experiment on a small hyper-parameter space to observe the behavior of random search and Bayesian optimization over many runs. Bayesian optimization found better models faster than random search in the best, average and worst cases. We showed that the Gaussian process quickly became a good predictor of model performance and that the worst models were picked last. Random search behaved in accordance with the theoretical bounds we derived (the classical quantile bound is recalled after this list).
\item \textbf{A new hyper-parameter optimization method combining Hyperband and Bayesian optimization.} We proposed a method combining the strengths of Hyperband and Bayesian optimization: model selection is done by Bayesian optimization, while model training follows the Hyperband scheme. Unfortunately, due to how the simultaneous selection of multiple models was handled, the method did not perform significantly better than Hyperband alone.
\end{itemize}
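
To make the first contribution concrete, here is a minimal numpy sketch of such an incremental update (the function name and interface are ours, for illustration only, not the thesis implementation). If $K_n = L L^T$ and a new evaluation point contributes kernel values $k$ against the old points and $\kappa$ against itself, extending the factor needs only one triangular solve:

\begin{verbatim}
import numpy as np
from scipy.linalg import solve_triangular

def chol_append(L, k_new, k_self):
    # L: (n, n) lower-triangular Cholesky factor of the current Gram matrix
    # k_new: (n,) kernel values between the new point and the n old points
    # k_self: kernel value of the new point with itself
    # Returns the (n+1, n+1) factor in O(n^2) instead of the O(n^3)
    # cost of re-factorizing the extended matrix from scratch.
    l = solve_triangular(L, k_new, lower=True)   # solve L l = k_new
    d = np.sqrt(k_self - l @ l)                  # Schur complement term
    n = L.shape[0]
    L_ext = np.zeros((n + 1, n + 1))
    L_ext[:n, :n] = L
    L_ext[n, :n] = l
    L_ext[n, n] = d
    return L_ext
\end{verbatim}

The result can be checked against \texttt{np.linalg.cholesky} applied to the full extended Gram matrix.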
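
For reference, the classical bound underlying the random search comparison (a standard result, stated here in our notation; the exact bounds derived in the chapter may differ): if $n$ hyper-parameter settings are drawn independently and uniformly at random, the probability that at least one falls in the top $q$-quantile of the search space is
\[
P(\text{hit}) = 1 - (1 - q)^n,
\]
so, for example, $n = 59$ random draws suffice to reach the top $5\%$ with roughly $95\%$ probability, since $1 - 0.95^{59} \approx 0.95$.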

\paragraph*{A method to solve a classification problem of MRI field-of-view.}
@@ -30,19 +30,19 @@ \section{Summary of the contributions}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Future Work}

All the contributions listed in the previous section have immediate ways they could be improved, but those are discussed in their respective chapters. Here we take a longer-term perspective and suggest future research directions, taking into account how we believe the field is going to evolve.

\paragraph*{User interactions.}
An important aspect of the implicit template deformation framework is that it can integrate user interactions. After the template is deformed, a user can add or remove points from the segmentation. Those are taken into account by the method, which produces a new refined segmentation. This process can be repeated as many times as required. This kind of user interaction is key in medical practice, but deep learning provides no easy way to incorporate it. While some attempts have been made, such as by~\textcite{cicek2016MICCAI}, who provide a partial segmentation as an additional input of the segmentation network, none of them forces the network to make use of these interactions. In our method of template deformation, user input could be incorporated as additional constraints on the deformation field, while forcing the final segmentation to still be an acceptable shape.
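
One hypothetical way such constraints could enter the objective (a sketch under our own assumptions, not the thesis' formulation): treat user clicks as soft inclusion/exclusion constraints on the deformed implicit template, here through hinge penalties.

\begin{verbatim}
import numpy as np

def user_click_penalty(phi, inside_pts, outside_pts, margin=1.0):
    # phi: callable mapping a (k, d) array of points to the signed
    #      values of the deformed implicit template (> 0 inside)
    # inside_pts / outside_pts: user clicks that must end up in / out
    # Each violated click contributes a hinge loss, pushing the
    # deformation field to respect the user's corrections.
    loss = 0.0
    if len(inside_pts) > 0:
        loss += np.maximum(0.0, margin - phi(np.asarray(inside_pts))).sum()
    if len(outside_pts) > 0:
        loss += np.maximum(0.0, margin + phi(np.asarray(outside_pts))).sum()
    return loss
\end{verbatim}

Added to the training or refinement objective, such a term penalizes deformations that disagree with the user, while the shape prior keeps the final segmentation acceptable.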

\paragraph*{A shift in applications.}
Deep learning has fulfilled one of its promises: many tasks that were once difficult are now easily solvable. The MRI field-of-view classification task we worked on is a good example. Once the dataset is ready, automated methods of hyper-parameter optimization can be used to find a very efficient model with little manual effort. As automation tools improve and become more accessible\footnote{See for example \href{https://cloud.google.com/automl/}{Google AutoML}.}, many tasks will stop requiring image processing expertise to be solved. The focus will move to more challenging tasks, where the difficulty can come from a lack of data (rare diseases, high variability) or from greater complexity, such as the segmentation of small structures (due to the reliance of neural networks on pooling layers), registration, or surgical planning.

\paragraph*{An integration of older methods.}
This thesis started out focused entirely on deep learning and how it could be used to solve medical imaging problems. By the end of it, we had started working on a hybrid approach combining deep learning and template deformation. Pre-deep-learning methods were tailored for medical tasks, and were mostly discarded after the success of deep learning. But deep learning was first developed for natural image applications, without accounting for the differences between natural and medical images. The ideas used in these older methods (such as the use of shape information) are still relevant, and we expect their integration with deep learning to move the field forward in the coming years.

\paragraph*{Multi-modality, multi-task learning.}
The idea of transfer learning is to share knowledge between more or less loosely connected problems. Expanding on this, we could imagine a multi-input multi-output model that shares knowledge over many tasks and modalities at the same time. A first step in this direction could be the construction of modality-specific pre-trained models, a medical imaging equivalent of ImageNet pre-trained models. This requires that huge datasets be available per modality, but efforts in this direction are ongoing. For example, the NIH recently released an \href{https://nihcc.app.box.com/v/ChestXray-NIHCC}{X-Ray dataset} of over $100,000$ images that could be used to build a standard X-Ray pre-trained model. If ImageNet pre-trained models can improve performance on most medical tasks, it seems likely that a more specific model would improve performance even more.
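
As a minimal sketch of what building such a model could look like (assuming PyTorch/torchvision; the 14 outputs correspond to the pathology labels of the released dataset, everything else is illustrative):

\begin{verbatim}
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Start from an ImageNet pre-trained backbone and re-purpose it
# for multi-label chest X-ray classification.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False              # freeze generic features
model.fc = nn.Linear(model.fc.in_features, 14)  # new task-specific head

optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()           # one sigmoid per label
\end{verbatim}

Once fine-tuned on the X-ray data (possibly unfreezing deeper layers), the resulting weights could serve as the X-Ray pre-trained starting point discussed above.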

\paragraph*{Differential privacy.}
Due to the sensitive nature of medical images, sharing them between research institutes or hospitals is difficult. A solution to this problem could come from differential privacy. It is a mathematical framework in which the output of an algorithm remains essentially the same whether or not the data of any one patient is included. This prevents an adversary from recovering patient information from the algorithm (it is possible to reconstruct training set images from computer vision systems, including deep neural networks, see for example~\textcite{fredrikson2015}). There has been some work to make deep neural networks differentially private (\textcite{abadi2016}), and this would allow institutions to release models trained on private data with strong guarantees of patient privacy.
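
The core mechanism used to train differentially private networks in~\textcite{abadi2016} is per-example gradient clipping followed by Gaussian noise. A schematic numpy sketch (it omits the privacy accountant, and all names and hyper-parameters are placeholders):

\begin{verbatim}
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0):
    # Clip each per-example gradient to L2 norm at most `clip`,
    # average them, then add Gaussian noise calibrated to the
    # clipping bound, so no single patient dominates the update.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, sigma * clip / len(clipped),
                             size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
\end{verbatim}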
@@ -53,17 +53,5 @@ \section{Future Work}
% \paragraph*{An incremental Cholesky decomposition to reduce the cost of Bayesian optimization.}
% Even though we proved the complexity gain, we didn't integrate the incremental decomposition into a Bayesian optimization framework. This would be the next step, and would allow measuring the time gained on average. The testing could be done on the limited CIFAR-10 hyper-parameter space on which we compared random search and Bayesian optimization. As the time gained grows with the number of models tested, it might be interesting to increase the hyper-parameter space to a couple thousand models.

% \paragraph*{A comparison of the performance of random search and Bayesian optimization.}
% Following the previous paragraph, extending the hyper-parameter space would yield insights into how Bayesian optimization behaves in larger spaces. This testing framework could be used to observe the behavior of other methods such as Hyperband. We observed model performance to be normally distributed, but the scope is limited to one hyper-parameter space on one task. To the best of our knowledge there is no theory on how to build good hyper-parameter spaces, and understanding the relation between models, tasks and model performance would be of practical value when applying hyper-parameter optimization methods.

% \paragraph*{A new hyper-parameter optimization method combining Hyperband and Bayesian optimization.}
% We have already described the flaws of our method: normalizing the acquisition function to transform it into a distribution from which we can draw multiple combinations resulted in a quasi-uniform distribution, completely negating the point of Bayesian optimization. With a different strategy, this combination method could perform better than either Hyperband or Bayesian optimization alone, as shown recently in~\textcite{falkner2018}.

% \paragraph*{A method to solve a classification problem of MRI field-of-view.}
% As the method presented gives robust results on our dataset, there is little need to improve it. It could be interesting, however, to explore how to directly predict the boundaries of the regions using deep learning, instead of classifying each slice and using another method to obtain the regions.

% \paragraph*{A new transfer learning method and its application to the segmentation of the kidney in 3D ultrasound images.}
% The limitation of the transfer learning method as presented is that it is highly specific to the kidney segmentation problem described. While the concept of adding specific transformation layers is general, it needs to be validated on other problems.

% \paragraph*{A statistical shape model approach using deep learning.}
% <TODO after chapter is done>