Geometric Understanding of Convolutional Neural Networks

Encoder-decoder networks based on the convolutional neural network (CNN) architecture have been used extensively in the deep learning literature thanks to their excellent performance on various inverse problems. Inspired by recent theoretical work on the generalizability, expressivity, and optimization landscape of neural networks, as well as the theory of deep convolutional framelets, here we provide a unified theoretical framework that leads to a better understanding of the geometry of encoder-decoder CNNs. Our unified framework shows that the encoder-decoder CNN architecture is closely related to a nonlinear frame representation using combinatorial convolution frames, whose expressivity increases exponentially with depth. We also demonstrate the importance of skip connections in terms of expressivity and the optimization landscape. As an extension of this geometric understanding, we show that a novel attention scheme combined with bootstrapping and subnetwork aggregation improves network expressivity with a minimal increase in complexity. In particular, the attention module is shown to provide a redundant representation and an increased number of piecewise linear regions, which improve the expressivity of the network. Thanks to the increased expressivity, the proposed network modification improves reconstruction performance. As a proof of concept, we provide several modifications of the popular neural network baseline U-Net, which is often used for image reconstruction. Experimental results show that the modified U-Net produces significantly better reconstruction results with a negligible increase in complexity.
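To give concrete intuition for the link between expressivity and the number of piecewise linear regions, the sketch below (a toy illustration with assumed dimensions and random weights, not the paper's construction) counts distinct ReLU activation patterns for a shallow versus a deeper network. Each distinct on/off pattern of the ReLUs corresponds to one linear region of the input space, so more patterns means a richer piecewise linear map:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def activation_pattern(x, weights):
    """Concatenated binary ReLU on/off pattern across all layers."""
    pattern = []
    h = x
    for W in weights:
        z = h @ W
        pattern.append(z > 0)  # which neurons fire in this layer
        h = relu(z)
    return tuple(np.concatenate(pattern))

def count_regions(weights, n_samples=5000, dim=2):
    """Estimate the number of linear regions by sampling inputs and
    counting the distinct activation patterns encountered."""
    xs = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    return len({activation_pattern(x, weights) for x in xs})

# One hidden layer of 8 ReLUs vs. three layers of 8 ReLUs
# (toy sizes chosen for illustration).
shallow = [rng.standard_normal((2, 8))]
deep = [rng.standard_normal((2, 8))] + \
       [rng.standard_normal((8, 8)) for _ in range(2)]

print("shallow regions:", count_regions(shallow))
print("deep regions:   ", count_regions(deep))
```

On typical random draws, the deeper network visits many more activation patterns than the shallow one, mirroring the abstract's claim that expressivity grows rapidly with depth; the attention and skip-connection modifications discussed above aim to increase this region count further without adding many parameters.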

Videos in this product

Geometric Understanding of Convolutional Neural Networks

00:26:43