How Much Self-Attention Do We Need? Trading Attention For Feed-Forward Layers

We propose simple architectural modifications to the standard Transformer with the goal of reducing its total state size (defined as the number of self-attention layers times the sum of the key and value dimensions, times the number of positions) without loss of performance.
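
As a rough illustration of the quantity the abstract defines, the following minimal sketch computes the total state size a standard Transformer must cache as keys and values during incremental decoding. All concrete numbers (layer count, key/value dimensions, context length) are hypothetical examples, not values taken from the paper.

# Sketch of the "total state size" from the abstract:
# number of self-attention layers x (key dim + value dim) x number of positions.
# All concrete values below are illustrative assumptions only.

def total_state_size(num_attention_layers: int, d_key: int, d_value: int, num_positions: int) -> int:
    """Number of key/value elements cached across all self-attention layers."""
    return num_attention_layers * (d_key + d_value) * num_positions

if __name__ == "__main__":
    # Hypothetical example: 12 self-attention layers, 512-dim keys and values, 1024-token context.
    print(total_state_size(num_attention_layers=12, d_key=512, d_value=512, num_positions=1024))
    # Trading some self-attention layers for feed-forward layers reduces this number,
    # because feed-forward layers keep no per-position key/value state.

Under this definition, replacing self-attention layers with feed-forward layers directly shrinks the cached state, which is the trade-off the title refers to.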