Edward Hallé-Hannan
Contact: edward.halle-hannan@polymtl.ca
Supervisors: Charles Audet, Sébastien Le Digabel
Mesh adaptive direct search optimization of categorical variables in deep learning and blackbox applications
Context and problem statement
Categorical variables, such as nationality or blood type, are often dismissed in the field of mathematical optimization. Indeed, these variables are not intrinsically ordered, which makes them harder to optimize. However, the rapidly growing field of deep learning has created a need to optimize such variables: most hyperparameters are categorical by nature. Hyperparameters are parameters that are set before any form of learning takes place. A typical example is the choice of optimizer, the method that minimizes the loss of the model for a given task. Decisions must therefore be made about which hyperparameters are best suited to treat a problem with a given model, and to maximize the performance of the model we seek to optimize these variables.

Deep learning models have no analytical representation (they are blackboxes), so the classical optimization approach based on calculus cannot be used. Moreover, evaluating a blackbox is typically a lengthy process, so inefficient algorithms such as grid search cannot be used in practice. On the other hand, there is an extensive algorithmic framework, that of direct-search methods, that can efficiently optimize continuous and/or integer variables. In this project, we wish to extend the framework of the MADS algorithm to categorical variables, in order to optimize hyperparameters in deep learning.
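As a concrete illustration, the following minimal Python sketch (all names are hypothetical) casts a training run as a blackbox objective over categorical hyperparameters, and shows why exhaustive approaches such as grid search quickly become impractical:

```python
# Minimal sketch (all names hypothetical): a deep-learning training run
# viewed as a blackbox objective over categorical hyperparameters.

def train_and_evaluate(optimizer: str, activation: str, learning_rate: float) -> float:
    """Blackbox objective: train a model and return its validation loss.

    Each call is expensive (a full training run), and no analytical
    expression or derivative of this function is available.
    """
    ...  # placeholder: train with the given hyperparameters, return the loss

# Categorical hyperparameters: finite, unordered sets of choices.
OPTIMIZERS = ["sgd", "adam", "rmsprop", "adagrad"]
ACTIVATIONS = ["relu", "tanh", "sigmoid"]

# Grid search over just these two categorical variables and, say, ten
# learning-rate values already requires 4 * 3 * 10 = 120 full training runs.
```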
This project is an extension of Dounia Lakhmiri's PhD project HyperNOMAD.
Objective and sub-objectives
The main objective of this project is to develop a method, based on the MADS algorithm, that can optimize hyperparameters that are categorical variables. If the project progresses well, we will also generalize the method to arbitrary categorical variables (SO5). The main objective comprises the following sub-objectives (SO):
- SO1: Develop a mathematical framework that generates a space in which hyperparameters that are categorical variables can be explored (see the sketch after this list).
- SO2: Model this mathematical space with a mesh-adaptive numerical space.
- SO3: Implement a direct-search algorithm that operates on this numerical space.
- SO4: Measure the performance of the algorithm on standard deep learning benchmark problems.
- SO5: If time allows, generalize the neighborhood framework to arbitrary categorical variables with a user-defined set of neighbors.
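To make SO1 more concrete, here is a minimal sketch, under assumed interfaces, of what a neighborhood routine for a categorical variable could look like. The complete neighborhood below is only one possible default, and `CUSTOM_NEIGHBORS` is a purely illustrative user-defined structure:

```python
from typing import Dict, List

def default_neighbors(value: str, categories: List[str]) -> List[str]:
    """Complete neighborhood: every other category is a neighbor of `value`."""
    return [c for c in categories if c != value]

# A problem-specific structure can restrict the neighborhood, e.g. only
# related optimizers are considered neighbors of one another.
CUSTOM_NEIGHBORS: Dict[str, List[str]] = {
    "sgd": ["adam"],
    "adam": ["sgd", "rmsprop"],
    "rmsprop": ["adam", "adagrad"],
    "adagrad": ["rmsprop"],
}
```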
Methodology
The project will start with a thorough literature review of previous research on the optimization of categorical variables (SO1) 1) 2) 3) 4). The first milestone is to develop a routine that mathematically defines the neighborhood (set of neighbors) of the hyperparameters that are categorical variables (SO1) 5). For a given point in the space, we will generate a neighborhood with a coarseness (mesh) and directions to explore (SO2, SO3) 6). We will start with a routine specific to the neighborhoods of hyperparameters. The mesh adaptive direct search algorithm (MADS) will then be applied to the numerical space, with a progressive barrier to handle constraints (SO3) 7) 8); a simplified sketch of such a poll step is given below. We will verify the method and measure its performance relative to other hyperparameter optimization methods (grid search, etc.) (SO4) 9). The test problems will be standard benchmarks such as CIFAR-10 and Fashion-MNIST (SO4). Lastly, if possible, we will generalize the routine that generates the neighborhoods of categorical variables to any problem, through a user-defined set of neighbors (SO5), and test the relative performance of this generalized method on blackbox applications.
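For illustration only, the following sketch shows how a single poll step could combine continuous mesh directions with categorical neighbor moves. It omits the adaptive mesh update and the progressive barrier, and every name is an assumption rather than the actual MADS or NOMAD interface:

```python
# Highly simplified sketch of one poll step on a mixed space, assuming the
# neighborhood structures sketched above. Real MADS adapts mesh and frame
# sizes and handles constraints with a progressive barrier; here we only
# illustrate how categorical neighbor moves combine with continuous poll
# directions. All names are hypothetical.
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[List[float], Dict[str, str]]  # (continuous part, categorical part)

def poll(x: Point,
         f: Callable[[List[float], Dict[str, str]], float],
         mesh_size: float,
         neighbors: Dict[str, Callable[[str], List[str]]]) -> Optional[Point]:
    """Return an improving point found by the poll, or None if the poll fails."""
    x_cont, x_cat = x
    incumbent = f(x_cont, x_cat)
    # Continuous poll: step along +/- coordinate directions scaled by the mesh.
    for i in range(len(x_cont)):
        for step in (mesh_size, -mesh_size):
            trial = list(x_cont)
            trial[i] += step
            if f(trial, x_cat) < incumbent:
                return trial, x_cat
    # Categorical poll: visit each neighbor of each categorical variable.
    for name, value in x_cat.items():
        for nbr in neighbors[name](value):
            trial_cat = dict(x_cat, **{name: nbr})
            if f(x_cont, trial_cat) < incumbent:
                return list(x_cont), trial_cat
    return None  # poll failure: the mesh would be refined and the poll retried
```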
Impact of research
The proposed method will reduce the time needed to optimize hyperparameters. The optimized hyperparameters will be linked to more compact neural networks (fewer parameters), meaning these models could be trained on smaller-scale hardware such as cellphones. Consequently, data could stay local, since the hardware of data centers would no longer be needed to train neural networks, thereby enhancing the security of personal data. If SO5 is achieved, the generalized method will contribute to the advancement of the MADS algorithm 9), which could benefit many engineering applications that require the optimization of categorical variables.
The main hyperparameter optimization method will be added to the open-source library HyperNOMAD, which is a sublibrary of the NOMAD package. The generalized method may be added to the open-source library NOMAD, which is the practical implementation of the MADS algorithm.