Activation functions might seem like a very small component in the grand scheme of hundreds of layers and millions of parameters in deep neural networks, yet their importance is paramount. Since the inception of perceptrons, activation functions have been a key component shaping the training dynamics of neural networks: they not only introduce the non-linearity that makes deep networks expressive, but they also affect how easily a network can be optimized. From the early days of the step function to the current default activation in most domains, ReLU, activation functions have remained a key area of research.

ReLU (Rectified Linear Unit) has been widely accepted as the default activation function for training deep neural networks because of its versatility across task domains and network types, as well as its extremely low computational cost (the formula is essentially $\max(0, x)$).

In this blog post, we take a look at a paper proposed in 2018 by Google Brain, titled "Searching for Activation Functions", which spurred a new wave of research into the role of different activation functions. The paper proposes a novel activation function called Swish, discovered using a Neural Architecture Search (NAS) approach, which showed significant improvements in performance over standard activation functions like ReLU or Leaky ReLU. This blog post is not based only on that paper, but also on another paper, published at EMNLP, titled "Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP Tasks", which essentially evaluates Swish empirically on various NLP-focused tasks. Note that we will discuss Swish itself and not the NAS method the authors used to discover it.
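To make the comparison concrete, here is a minimal NumPy sketch (my own illustration, not code from either paper) of the two activations: ReLU computes $\max(0, x)$, while Swish is defined in the Google Brain paper as $x \cdot \sigma(\beta x)$, with $\beta = 1$ being the most common setting.

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), applied element-wise
    return np.maximum(0.0, x)

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 is the usual default
    return x / (1.0 + np.exp(-beta * x))

# Quick comparison on a few sample inputs
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu :", relu(x))
print("swish:", swish(x))
```

Unlike ReLU, Swish is smooth and non-monotonic (it dips slightly below zero for small negative inputs), which is one of the properties the paper highlights.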