How Much Self-Attention Do We Need? Trading Attention For Feed-Forward Layers
We propose simple architectural modifications to the standard Transformer with the goal of reducing its total state size (defined as the number of self-attention layers times the sum of the key and value dimensions, times position) without loss of performance.
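Reading that definition as a formula (a sketch only; the symbol names L, d_k, d_v, and T are illustrative and not taken from the paper), the total state size at decoding position T would be

    S_total = L \cdot (d_k + d_v) \cdot T

where L is the number of self-attention layers, d_k and d_v are the per-layer key and value dimensions, and T is the current position. In other words, the key/value state grows linearly in the number of attention layers, in the combined key and value width, and in the sequence position, so reducing any of these factors shrinks the state that must be kept during decoding.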