Swish function
The swish function is a mathematical function defined as follows:

swish(x) = x · sigmoid(βx) = x / (1 + e^(−βx)),

where β is either a constant or a trainable parameter depending on the model. For β = 1, the function becomes equivalent to the Sigmoid-weighted Linear Unit (SiL) function used in reinforcement learning,[3][2] whereas for β = 0 the function turns into the scaled linear function f(x) = x/2.[2] As β → ∞, the sigmoid component approaches a 0–1 step function, so swish approaches the ReLU function. Thus, it can be viewed as a smooth function that nonlinearly interpolates between a linear function and the ReLU function.[2]
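A minimal NumPy sketch (not from the cited papers; the function name `swish` and the chosen β values are illustrative) showing these limiting cases:

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x) = x / (1 + exp(-beta * x))."""
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-5.0, 5.0, 11)

# beta = 1 recovers the Sigmoid-weighted Linear Unit (SiL).
sil = swish(x, beta=1.0)

# beta = 0 reduces exactly to the scaled linear function x / 2.
assert np.allclose(swish(x, beta=0.0), x / 2)

# For large beta the sigmoid factor approaches a 0-1 step function,
# so swish approaches ReLU (max(x, 0)).
relu_like = swish(x, beta=50.0)
print(np.max(np.abs(relu_like - np.maximum(x, 0.0))))  # close to 0
```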
Applications
In 2017, after performing experiments on ImageNet data, researchers from Google reported that using the function as an activation function in artificial neural networks improves performance compared to ReLU and sigmoid functions.[1][2] It is believed that one reason for the improvement is that the swish function helps alleviate the vanishing gradient problem during backpropagation.[4]
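A rough numerical illustration of that point, assuming β = 1 (the derivative formula follows from the product rule applied to x · sigmoid(βx); the helper names are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish_grad(x, beta=1.0):
    """Derivative of x * sigmoid(beta * x):
    sigmoid(beta * x) + beta * x * sigmoid(beta * x) * (1 - sigmoid(beta * x))."""
    s = sigmoid(beta * x)
    return s + beta * x * s * (1.0 - s)

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

# The sigmoid's own derivative is bounded by 0.25 and decays toward 0
# for large |x|, which is one source of vanishing gradients.
print(sigmoid(x) * (1.0 - sigmoid(x)))

# The swish derivative stays close to 1 for large positive inputs,
# so gradients passing through those units are attenuated less.
print(swish_grad(x))
```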
References
- Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (2017-10-16). "Swish: A Self-Gated Activation Function" (PDF). v1. Google Brain. arXiv:1710.05941v1. Archived (PDF) from the original on 2020-06-18. Retrieved 2020-06-18.
- Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (2017-10-27) [2017-10-16]. "Searching for Activation Functions" (PDF). v2. Google Brain. arXiv:1710.05941v2. Archived (PDF) from the original on 2020-06-18. Retrieved 2020-06-18.
- Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji (2017-11-02) [2017-02-23, 2017-02-10]. "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning" (PDF). v3. arXiv:1702.03118v3. Archived (PDF) from the original on 2020-06-18. Retrieved 2020-06-18.
- Serengil, Sefik Ilkin (2018-08-21). "Swish as Neural Networks Activation Function". Machine Learning, Math. Archived from the original on 2020-06-18. Retrieved 2020-06-18.