How Much Self-Attention Do We Need? Trading Attention For Feed-Forward Layers

We propose simple architectural modifications to the standard Transformer with the goal of reducing its total state size (defined as the number of self-attention layers times the sum of the key and value dimensions, times the number of positions) without loss of performance.
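
As a rough illustration of the quantity the abstract defines, the following minimal sketch computes the total state size a standard Transformer must cache as keys and values during incremental decoding. All concrete numbers (layer count, key/value dimensions, context length) are hypothetical examples, not values taken from the paper.

# Sketch of the "total state size" from the abstract:
# number of self-attention layers x (key dim + value dim) x number of positions.
# All concrete values below are illustrative assumptions only.

def total_state_size(num_attention_layers: int, d_key: int, d_value: int, num_positions: int) -> int:
    """Number of key/value elements cached across all self-attention layers."""
    return num_attention_layers * (d_key + d_value) * num_positions

if __name__ == "__main__":
    # Hypothetical example: 12 self-attention layers, 512-dim keys and values, 1024-token context.
    print(total_state_size(num_attention_layers=12, d_key=512, d_value=512, num_positions=1024))
    # Trading some self-attention layers for feed-forward layers reduces this number,
    # because feed-forward layers keep no per-position key/value state.

Under this definition, replacing self-attention layers with feed-forward layers directly shrinks the cached state, which is the trade-off the title refers to.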