Leonel Sousa received his PhD in electrical and computer engineering from the Instituto Superior Técnico (IST), Universidade de Lisboa (UL), Lisbon, Portugal, in 1996. He is currently a Full Professor and Chair of the Electrical and Computer Engineering Department at IST and a Senior Researcher with the Instituto de Engenharia de Sistemas e Computadores – Investigação e Desenvolvimento (INESC-ID), Lisbon, Portugal. He spent three months in Japan at the beginning of 2017 on a prestigious JSPS Invitation Fellowship for Research, and he was a Visiting Professor at Carnegie Mellon University (CMU) in the fall semester of 2017/2018. He has given more than 30 keynotes, invited talks, and tutorials. He has authored or co-authored more than 250 papers in international journals and conferences and has edited five special issues of international journals. As a professor, he has taught several undergraduate and graduate courses and supervised 15 PhD theses.
His research interests include computer architectures, parallel computing, computer arithmetic, and multimedia systems. Prof. Sousa is a Senior Member of the IEEE, a Fellow of the IET, and a Distinguished Scientist of the ACM. He has served on the organizing committees of several international conferences, and he is currently an Associate Editor and Editor-in-Chief of several renowned international journals, including two IEEE Transactions and IEEE Access. He has received several awards for the quality and impact of his scientific publications (DASIP, SAMOS, UL/Santander).
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
Email: las@inesc-id.pt
Phone: +351969737935
DVP term expires December 2021
Presentations
Modeling Performance and Energy-Efficiency of Multi-Cores: The Cache-Aware Roofline Approach and the Intel Advisor
As architectures evolve towards more complex multi-core designs, deciding which optimizations provide the best trade-off between performance and efficiency is becoming a prominent issue. To help in this decision process, this tutorial presents a set of fundamental Cache-aware Roofline Models (CARMs), which characterize the upper bounds of contemporary parallel architectures, namely multi-core CPU and GPU architectures, for performance, power, energy and energy-efficiency. These models evaluate how key micro-architectural aspects, such as accessing different functional units or different memory hierarchy levels, affect the attainable performance, power and energy-efficiency.
Recently, the performance CARM was integrated by Intel as a fully supported feature into their proprietary Intel Advisor software tool, and it is described as “an incredibly useful diagnosis tool (…) that developers can use to guide them (in the application optimization process), ensuring that they can squeeze the maximum performance out of their code with minimal time and effort.” The proposed models are also rigorously validated on different CPU and GPU architectures by relying on hardware counters and specifically developed performance/power monitoring tools. Experimental results show that the proposed models are highly accurate and that they provide more intuitive and useful guidelines than state-of-the-art approaches when characterizing real-world applications from standard benchmark suites.
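As a rough illustration of the roofline idea behind this tutorial, the sketch below computes an upper bound on attainable performance as the minimum of the peak compute rate and the product of memory bandwidth and arithmetic intensity, with one roof per memory hierarchy level. The peak and bandwidth figures and the function names are hypothetical values chosen for illustration, not measurements of any real processor and not the interface of Intel Advisor.

```python
# Minimal roofline sketch. All numbers below are hypothetical.
PEAK_GFLOPS = 100.0  # assumed peak compute throughput (GFLOP/s)

# Assumed sustainable bandwidth (GB/s) for each memory hierarchy level.
BANDWIDTH_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on performance for a given arithmetic intensity (flops/byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def roofs(intensity):
    """Return the performance bound for each memory level at this intensity."""
    return {
        level: attainable_gflops(intensity, PEAK_GFLOPS, bw)
        for level, bw in BANDWIDTH_GBS.items()
    }


if __name__ == "__main__":
    for level, bound in roofs(0.5).items():
        print(f"{level}: at most {bound} GFLOP/s")
```

With these assumed figures, a kernel at 0.5 flops/byte is compute-bound when its data is served from L1 or L2 and memory-bound when it is served from L3 or DRAM.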
Modular Arithmetic-based Circuits and Systems for Emerging Technologies and Applications: Deep Neural Networks and Cryptography
Energy efficiency and limited power consumption are key aspects for the next generation of integrated circuits and systems. Thus, together with the increase of performance, they should drive the design of new architectures and arithmetic units. Unconventional number systems, namely Residue Number Systems (RNSs), may hold the answer to these emerging challenges. RNS relies on the use of modular arithmetic to perform additions, subtractions and multiplications in parallel without any dependency between the RNS-digits, thus improving the energy efficiency. Due to a few limitations, such as conversion overheads and division, only recently have RNSs experienced a significant number of advances in their application to new domains, such as Deep Convolutional Neural Networks (DCNNs) and cryptography. In this talk, we present a state-of-the-art overview concerning the use of the RNS not only to improve the performance of public-key cryptographic algorithms but also to make them more resistant to attacks. The use of RNS for emerging post-quantum algorithms, namely the ones supporting lattice-based cryptosystems (LBCs), and for Fully Homomorphic Encryption (FHE) is also covered. The potential of RNS for the high-performance implementation of DCNNs is unveiled, and a novel hardware implementation of RNS-based matrix multiplication useful for implementing DCNNs is discussed.
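The digit-level independence described in this abstract can be illustrated with a minimal software sketch. It assumes a small, hypothetical set of pairwise-coprime moduli (7, 11, 13) and uses the Chinese Remainder Theorem for the conversion back to a conventional integer; the function names are illustrative and do not correspond to the hardware designs discussed in the talk.

```python
from math import prod

# Hypothetical moduli for illustration; they must be pairwise coprime.
MODULI = (7, 11, 13)  # dynamic range M = 7 * 11 * 13 = 1001


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue (RNS digit) per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Each digit is added independently, with no carries between digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Each digit is multiplied independently of the others."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reverse conversion via the Chinese Remainder Theorem (Python 3.8+)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return total % M


if __name__ == "__main__":
    a, b = to_rns(17), to_rns(23)
    print(from_rns(rns_add(a, b)), from_rns(rns_mul(a, b)))
```

Results are only correct while they stay below the dynamic range M, and the reverse conversion is the comparatively expensive step, which matches the conversion overhead mentioned in the abstract.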
Microprocessors/MCUs for Internet of Things
The Internet of Things (IoT) is a very active research topic that has no universal definition. It generally refers to scenarios where different machines, such as objects, sensors and everyday items, are connected and require minimal human intervention, that is, machine-to-machine (M2M) interfaces. In this talk, the state-of-the-art architectures of current microprocessors (MPUs) and microcontrollers (MCUs) will be analyzed, as well as the benchmarks for evaluating their performance and efficiency. This is quite challenging because, for the last 20 years, these devices were designed with a focus on performance while “neglecting” power/energy consumption, which is a very important aspect in IoT. Furthermore, the support for interconnection and communication with users, things and cloud services will also be discussed. Examples of commercial MPUs and MCUs will be provided, and the main research directions for developing future processing devices for the IoT will be outlined.