How Much Self-Attention Do We Need? Trading Attention For Feed-Forward Layers

We propose simple architectural modifications to the standard Transformer with the goal of reducing its total state size (defined as the number of self-attention layers times the sum of the key and value dimensions, times the number of positions) without loss of performance.
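
The total-state-size definition above is a simple product, illustrated by the minimal sketch below. The layer count, key/value dimensions, and sequence length are illustrative assumptions, not the paper's actual configuration.

```python
def total_state_size(num_attention_layers: int, d_key: int, d_value: int, num_positions: int) -> int:
    """Number of cached scalars needed to attend over num_positions tokens,
    following the definition: layers x (key dim + value dim) x positions."""
    return num_attention_layers * (d_key + d_value) * num_positions

# Hypothetical example: a 12-layer Transformer with 512-dim keys and values
# decoding a 1000-token sequence.
baseline = total_state_size(num_attention_layers=12, d_key=512, d_value=512, num_positions=1000)

# Replacing half of the self-attention layers with feed-forward layers shrinks
# the state proportionally, since feed-forward layers keep no per-position state.
reduced = total_state_size(num_attention_layers=6, d_key=512, d_value=512, num_positions=1000)

print(baseline, reduced)  # 12288000 vs. 6144000
```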