Attention and Transformer Networks Lecture
Nowadays, Artificial Intelligence drives scientific and economic growth worldwide. This is largely due to advances in Machine Learning (ML), notably in Deep Neural Networks (DNNs), which are essentially massive ‘learning by experience/examples’ systems. Their applications span and revolutionize almost every human activity:
- Autonomous Systems (cars, drones, vessels),
- Media Content and Art Creation (including fake data creation/detection), Social Media Analytics,
- Medical Imaging and Diagnosis,
- Financial Engineering (forecasting and analytics), Big Data Analytics,
- Broadcasting, Internet and Communications,
- Robotics/Control,
- Intelligent Human-Machine Interaction, Anthropocentric (human-centered) Computing,
- Smart Cities/Buildings and Assisted Living,
- Scientific Modeling and Analytics.
Several DNN advances and challenges hit the news almost every day, sparking discussions on AI ethics, privacy protection and the societal impact of AI.
This lecture begins by highlighting the limitations of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) in processing sequences, and then introduces Transformers, the breakthrough architecture that addresses these limitations.

The Transformer architecture is described in detail, with emphasis on its fundamental building blocks. These include positional encoding, which injects the sequential order of the input into its representation, and multi-headed self- and cross-attention mechanisms, which enable the model to capture dependencies between different elements of a sequence. The lecture also covers residual connections, which aid the smooth propagation of information and gradients through the network, and layer normalization, which stabilizes training and promotes efficient learning. It then examines the causal (masked) self-attention mechanism employed in the decoder, which enables the model to generate output sequences autoregressively. Lastly, the optimization algorithms used to train Transformers effectively are briefly discussed.

Overall, this lecture provides a comprehensive understanding of Transformers and their key components, highlighting their ability to overcome the limitations of traditional RNNs and CNNs in sequence processing tasks.
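To make the positional encoding concrete, the following is a minimal NumPy sketch of the sinusoidal scheme from the original Transformer paper (Vaswani et al., 2017), assuming an even model dimension; the function name and argument names are illustrative, not from the lecture:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (assumes d_model is even).

    Position pos, dimension pair i:
        PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
        PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = np.arange(seq_len)[:, None]           # shape (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]        # shape (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                # even dims: sine
    pe[:, 1::2] = np.cos(angles)                # odd dims: cosine
    return pe
```

Each position thus receives a unique, deterministic vector that is simply added to the token embeddings, giving the otherwise order-agnostic attention layers access to sequence order.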
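The attention mechanism itself, including the causal mask used in the decoder, can be sketched in a few lines. This is a single-head, unbatched illustration under the standard scaled dot-product formulation; multi-headed attention simply runs several such heads on learned projections of the inputs and concatenates the results:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention.

    Q: (L_q, d_k), K: (L_k, d_k), V: (L_k, d_v).
    causal=True masks future positions (decoder self-attention).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # (L_q, L_k) similarity scores
    if causal:
        # Forbid attending to positions after the query position.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ V, weights
```

With `causal=False` this computes self-attention (when Q, K, V come from the same sequence) or cross-attention (when Q comes from the decoder and K, V from the encoder); with `causal=True` each output position depends only on earlier positions, which is what permits autoregressive generation.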
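Finally, residual connections and layer normalization can be combined into the standard sublayer pattern. The sketch below shows the post-norm variant, LayerNorm(x + Sublayer(x)), with learnable gain/bias omitted for brevity; the helper names are illustrative:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def sublayer(x, fn):
    """Post-norm residual sublayer: LayerNorm(x + fn(x)).

    fn is any shape-preserving transform, e.g. an attention block
    or a position-wise feed-forward network.
    """
    return layer_norm(x + fn(x))
```

The residual path lets information (and gradients) bypass `fn` unchanged, while the normalization keeps activation statistics stable across deep stacks of such sublayers.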