A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization

Authored By: Boyu Zhang, A. K. Qin, Hong Pan, Timos Sellis

Jan 26, 2024

In this blog post, we will explore a novel deep neural network (DNN) training framework, developed by Boyu Zhang, A. K. Qin, Hong Pan, and Timos Sellis at Swinburne University of Technology. The proposed framework addresses challenges in conventional DNN training paradigms through a combination of data sampling and multi-task optimization strategies.


The Issue with Conventional DNN Training Paradigms

Conventional DNN training paradigms typically rely on a single training set and a single validation set, obtained by partitioning the annotated data available for training. The training set is used to fit the model, while the validation set estimates the model's generalization performance as training progresses, which helps prevent overfitting.
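
For concreteness, here is a minimal sketch of this conventional single-split setup, written in PyTorch purely for illustration; the toy dataset, model, and hyperparameters are hypothetical placeholders, not taken from the paper.

```python
# Conventional paradigm: one fixed training/validation split of the annotated data.
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Toy stand-ins for the annotated dataset and a pre-specified network structure.
full_set = TensorDataset(torch.randn(1000, 20), torch.randint(0, 3, (1000,)))
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
train_set, val_set = random_split(full_set, [800, 200])   # one single split

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
best_acc, best_state = 0.0, None

for epoch in range(20):
    model.train()
    for x, y in DataLoader(train_set, batch_size=128, shuffle=True):
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

    # The single validation set estimates generalization and guards against overfitting.
    model.eval()
    with torch.no_grad():
        correct = sum((model(x).argmax(1) == y).sum().item()
                      for x, y in DataLoader(val_set, batch_size=256))
    acc = correct / len(val_set)
    if acc > best_acc:
        best_acc, best_state = acc, {k: v.clone() for k, v in model.state_dict().items()}
```

Everything here hinges on that single fixed split: if the held-out examples happen not to resemble the test data, the validation estimate is biased.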

However, two major drawbacks pose significant hurdles in this training process. First, the validation set cannot guarantee an unbiased estimate of generalization performance, because it may not match the distribution of the test data. Second, training a DNN amounts to solving a complex, non-convex optimization problem, which can become trapped in inferior local optima (undesirable training outcomes).

Introducing the Novel DNN Training Framework

To address these issues, the authors propose a new DNN training framework. It generates multiple pairs of training and validation sets from the original training set via random splitting, and trains a DNN model with the same pre-specified network structure on each pair. Useful knowledge obtained in one model's training process is transferred to the others through multi-task optimization. Finally, the model that achieves the best overall performance across the validation sets from all pairs is selected as the output.
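
The sketch below illustrates the outer loop of this idea, again in PyTorch and again with hypothetical placeholders. The paper's actual multi-task optimization mechanism is not detailed in this post, so the knowledge-transfer step here (gently pulling each model toward the average of all models' parameters) is only an illustrative stand-in, not the authors' method.

```python
# Illustrative outer loop: K random train/val pairs, one model per pair,
# periodic cross-task knowledge transfer, and final selection by average
# accuracy over ALL validation sets.
import copy
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def make_model():
    # Same pre-specified network structure for every training process.
    return torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))

full_set = TensorDataset(torch.randn(1000, 20), torch.randint(0, 3, (1000,)))

# 1) Multiple training/validation pairs via random splitting of the original training set.
K = 4
pairs = [random_split(full_set, [800, 200]) for _ in range(K)]

# 2) One model (and optimizer) per pair.
models = [make_model() for _ in range(K)]
optims = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
criterion = torch.nn.CrossEntropyLoss()

def accuracy(model, dataset):
    model.eval()
    with torch.no_grad():
        correct = sum((model(x).argmax(1) == y).sum().item()
                      for x, y in DataLoader(dataset, batch_size=256))
    return correct / len(dataset)

for epoch in range(20):
    # Train each model on its own training split.
    for (train_set, _), model, opt in zip(pairs, models, optims):
        model.train()
        for x, y in DataLoader(train_set, batch_size=128, shuffle=True):
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

    # 3) Knowledge transfer across training processes. NOTE: the paper uses
    # multi-task optimization for this; nudging every model toward the mean
    # of all models' parameters is just a simple stand-in for illustration.
    with torch.no_grad():
        states = [m.state_dict() for m in models]
        mean = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
        for s in states:
            for k in s:
                s[k].lerp_(mean[k], 0.1)

# 4) Output the model with the best performance across the validation sets of all pairs.
scores = [sum(accuracy(m, val_set) for _, val_set in pairs) / K for m in models]
best_model = copy.deepcopy(models[scores.index(max(scores))])
```

Because the K training processes only interact at the transfer step, they parallelize naturally, which is how the authors run the framework on a GPU cluster in the experiments described below.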

This framework offers two distinct advantages. First, it enhances training effectiveness by helping each model's training process escape from local optima. Second, it improves generalization performance through the implicit regularization that each training process imposes on the others.

Testing the Framework

To evaluate the proposed framework, the authors implemented it, parallelized it on a GPU cluster, and applied it to train several widely used DNN models. Results on multiple classification datasets of differing nature demonstrate the framework's superiority over conventional training paradigms.

Conclusion

Deep neural networks (DNNs) have been instrumental in achieving significant breakthroughs in multiple real-world applications. By addressing the common issues found in conventional DNN training paradigms, the novel framework discussed here has the potential to further enhance and streamline the DNN training process. This work is particularly significant given the increasing reliance on DNNs in a wide range of fields, from artificial intelligence and machine learning to medtech and fintech.