Section: New Results

Performance Modeling and Multi-Objective Optimization For the Cloud

We study cloud service models based on attaining users' performance objectives; these immediately lead to problems of multi-objective optimization.

Given different cost models, the optimizer searches a multi-dimensional space, computes execution plans that are not dominated by any other plan (known as Pareto plans), and explores meaningful tradeoffs between objectives to find the optimal plan for each analytical task. We focus on analytical tasks encoded as dataflow programs, as in the Hadoop and Spark systems. When such a dataflow program is submitted to the cloud, our goal is a multi-objective optimizer that automatically finds an execution plan meeting the user's specific performance objectives. Developing such an optimizer raises two major challenges: it needs cost models for running complex dataflow programs in the cloud, and it needs a new algorithmic foundation for multi-objective optimization across user-specific objectives.
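To illustrate the notion of Pareto plans used above, the following minimal sketch filters a set of candidate plans down to those not dominated by any other. The plan tuples (latency, cost) are hypothetical examples, not from the actual optimizer; lower is assumed better on every objective.

```python
def dominates(a, b):
    """Plan a dominates plan b if a is no worse on every objective
    and strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_plans(plans):
    """Keep only the plans not dominated by any other plan (the Pareto set)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans if q != p)]

# Hypothetical (latency, cost) estimates for five candidate execution plans:
plans = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
print(pareto_plans(plans))  # → [(10, 5), (12, 4), (8, 6)]
```

Plans (8, 7) and (9, 9) are pruned because (8, 6) is at least as good on both objectives; the remaining plans represent the meaningful tradeoffs among which the optimizer selects.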

We have worked on a performance model for the optimizer, used to build the skylines over the user objectives. We found that deep learning offers an incremental prediction framework (using an embedding architecture) and an online prediction framework (using an auto-encoder combined with a gradient-boosting regressor), neither of which is available with a baseline regressor approach. The online prediction framework trades some accuracy for avoiding retraining, since retraining naturally improves results; nevertheless, it showed acceptable generalization to unseen jobs. This work was carried out during the M2 internship of Khaled Zaouk [27] and continues through his PhD.