A New Strategy for Tuning ReLUs: Self-Adaptive Linear Units (SALUs)
Conference: ICMLCA 2021 - 2nd International Conference on Machine Learning and Computer Application
12/17/2021 - 12/19/2021 at Shenyang, China
Proceedings: ICMLCA 2021
Pages: 8 | Language: English | Type: PDF
Authors:
Zhang, Lonxiang; Ma, Xiaodong (College of Science, China Agricultural University, Beijing, China)
Zhang, Yan (College of Information and Electrical Engineering, China Agricultural University, Beijing, China)
Abstract:
The choice of nonlinear activation function assigned to the neurons, a hyperparameter of a deep learning model, can significantly affect the performance of a deep neural network. However, so far there is no quantitative principle to guide the choice of an appropriate activation function when designing a network. Many experiments suggest that deep neural networks can converge regardless of the chosen activation function, yet networks with different activation units require different training times and exhibit different performance, both of which are persistent concerns for researchers. In practice, the activation unit is chosen from prior knowledge before the network is constructed. The rectified linear unit (ReLU) usually yields faster training and higher accuracy than other traditional activation functions, but it also has well-known problems that researchers have tried to solve by designing new units, such as LReLU, PReLU, ELU, and SELU. These units are all based on ReLU and offer different ways to improve it. In this paper, we propose a new activation unit, the Self-Adaptive Linear Unit (SALU), which linearly combines several classic activation functions (including ReLU) that have been theoretically and experimentally proven to work. The linear combination coefficients are treated as variables learned by gradient back-propagation, so that an appropriate activation function emerges during training. Many currently proposed activation units, such as Leaky ReLU, PReLU, and ELU, can be regarded as special cases of this design. In certain extreme combinations, the new unit degenerates into the traditional function used as the primary function of the linear combination; thus, at worst, a network using our activation function performs the same as the homogeneous network using the primary activation function, such as ReLU. Based on the idea of weight sharing, we then propose the Safe Self-Adaptive Linear Unit (Safe-SALU), which further improves the generalization ability of networks. With the help of a first-order Taylor expansion, we show that our activation unit leads to faster learning in the early stage of training. We finally verify the benefits of this unit through experiments with different neural networks on different datasets.
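As a rough illustration of the idea summarized above (a sketch, not the authors' exact formulation), a SALU-style unit can be written as a learnable linear combination of fixed activation functions whose coefficients are trained by back-propagation. The basis functions chosen here, the layer-wise sharing of the coefficient vector, and the ReLU-dominant initialization are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SALU(nn.Module):
    """Sketch of a Self-Adaptive Linear Unit: a learnable linear
    combination of classic activation functions (the basis, coefficient
    sharing, and initialization below are illustrative assumptions)."""

    def __init__(self):
        super().__init__()
        # Basis of proven activation functions, including ReLU.
        self.basis = [F.relu, torch.tanh, F.elu, lambda x: x]
        # Combination coefficients, one per basis function, shared by
        # all neurons in the layer and learned by back-propagation.
        # Initialized so the unit starts out as plain ReLU.
        init = torch.zeros(len(self.basis))
        init[0] = 1.0
        self.alpha = nn.Parameter(init)

    def forward(self, x):
        # SALU(x) = sum_k alpha_k * f_k(x)
        return sum(a * f(x) for a, f in zip(self.alpha, self.basis))


# Minimal usage example: a small MLP with SALU as its activation.
net = nn.Sequential(nn.Linear(784, 128), SALU(), nn.Linear(128, 10))
```

Initializing the ReLU coefficient to one and the others to zero mirrors the degeneration property claimed in the abstract: at the start of training the unit behaves exactly like the primary activation (here ReLU), and gradient descent then adapts the mixture as training proceeds.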