Organizers
Nicu Sebe, University of Trento (Italy); Alejandro (Alex) Jaimes, Yahoo! Research (Spain); and Hamid Aghajan, Stanford University (USA).
I. Synopsis
This tutorial will focus on technical analysis and interaction techniques formulated from the perspective of key human factors in a user-centered approach to developing multimedia systems. The tutorial will take a holistic view of the research issues and applications of Human-Centered Systems, focusing on three main areas:
- multimodal interaction: visual (body, gaze, gesture) and audio (emotion) analysis;
- image indexing and retrieval: user behavior, context modeling, cultural issues, and machine learning for user-centric approaches;
- multimedia data: conceptual analysis at different levels (feature, cognitive, and affective).
This full-day tutorial has two parts: the first half consists of presentations by the instructors, and the second half consists of practical workgroup activities.
II. Motivation
Multimedia and computer vision play a fundamental role in many new types of interfaces and application areas (multimodal and attentive interfaces, and applications such as surveillance, medicine, and art) in which humans play a central role. This implies that building multimedia systems (e.g., for human-computer interaction) lies at the crossroads of many research areas (psychology, artificial intelligence, pattern recognition, multimedia, etc.). Although many existing multimedia systems were designed with human use in mind, many of them are far from user friendly and are not rooted in real-world human needs (few existing systems can be considered “Human-Centered”). What are the current trends in computing, and what can the scientific and engineering community do to effect a change for the better? On the one hand, the fact that computers are quickly becoming integrated into everyday objects (ubiquitous and pervasive computing) implies that effective natural human-computer interaction is becoming critical (in many applications, users need to be able to interact with computers as naturally as they do in face-to-face human-human interaction). On the other hand, the wide range of applications that use multimedia, and the amount of multimedia content currently available, imply that building successful multimedia applications requires a deep understanding of multimedia content. The success of human-centered multimedia systems, therefore, depends highly on two joint aspects: (1) the way humans interact naturally with such systems (using speech and body language) to express emotion, mood, attitude, and attention, and (2) the human factors that pertain to multimedia data (human subjectivity, levels of interpretation).
In this tutorial, we take a holistic approach to developing human-centered multimedia systems. We aim to identify the important research issues, and to ascertain potentially fruitful future research directions in relation to the two aspects above. In particular, we introduce key concepts, discuss technical approaches and open issues in three areas:
- multimodal interaction: visual (body, gaze, gesture) and audio (emotion) analysis;
- image indexing and retrieval: user behavior, context modeling, cultural issues, and machine learning for user-centric approaches;
- multimedia data: conceptual analysis at different levels (feature, cognitive, and affective).
The focus of the tutorial, therefore, is on technical analysis and interaction techniques formulated from the perspective of key human factors in a user-centered approach to developing Human-Centered Multimedia Systems.
III. Benefits & List of Topics
This tutorial will enable the participants to understand key concepts, state-of-the-art techniques, and open issues in the areas described below. The tutorial will cover parts of the following topic areas:
- Vision for multimodal interaction: overview of techniques and state of the art in body tracking, gaze detection, and gesture recognition.
- Multimodal emotion recognition for affective retrieval and in affective interfaces: approaches to multimedia content analysis and interaction that use speech and facial expression recognition.
- Machine learning: adaptive multimodal interfaces and learning of visual concepts from user input for automatic detection and recognition (detection of scenes, objects, or events of interest).
- Multimodal fusion: technical approaches and issues in combining multiple media (e.g., audio-visual) for multimodal interaction and multimedia analysis.
- Multimedia indexing: an overview of how humans perceive, index, organize, and search multimedia content. Discussion of studies in art, psychology, library sciences, and the development of conceptual frameworks for computational frameworks.
- Human issues: the role of memory, subjectivity, culture, context, and examples of technical approaches to multimedia analysis and interaction that consider these factors.
- Applications: traditional and emerging application areas will be described with specific examples in smart conference room research, arts, interaction for people with disabilities, entertainment, and others.
IV. Intended Audience
The tutorial is intended for Ph.D. students, scientists, engineers, application developers, computer vision specialists and others interested in the areas of information retrieval and human-computer interaction. A basic understanding of image processing and machine learning is a prerequisite.
V. Tentative Schedule & Format
Duration: full-day
Format: the tutorial will consist of two parts:
- Morning session consisting of presentations by the organizers, during which discussion by the attendees will be encouraged.
- Afternoon session in which the participants will first be asked to break into groups and focus on a particular theme or application in Human-Centered Computing (HCC). At the end of the thematic break-out group sessions, each group will make a presentation. After this, new break-out groups will be formed, in which the goal will be to reflect on the processes or methodologies presented and discussed during the morning session. The outcomes of this second stage will also be presented.
A tentative program of the tutorial is as follows:
09:00 – 10:20 Morning session: presentations by the organizers
10:20 – 10:40 Break
10:40 – 12:00 Morning session: presentations by the organizers (continued)
14:00 – 15:20 Thematic break-out group meetings & presentations (3 to 4 groups, each given a scenario and a task)
15:20 – 15:40 Break
15:40 – 17:00 Process break-out groups (1 hour; 3 to 4 groups, each asked to “abstract” a process based on the presentations), followed by final presentations & summary (20 minutes)
VI. Materials
This tutorial has been specifically designed for the audience of ACM Multimedia. While the focus of the tutorial will be technical, we aim to give participants a broad view of research and important topics for developing Human-Centered Multimedia Systems. Materials will include an overview of technical approaches for Vision-Based Human-Computer Interaction (largely from our 2007 CVIU survey), as well as materials from numerous other sources (our ACM Multimedia 2006 paper, which discusses some of the sources we will use; articles from the IEEE Computer Special Issue on Human-Centered Computing that we co-edited; etc.).
VII. Related Tutorials
This tutorial differs from our previous tutorials (ICCV 2009, ICCV 2007, ACM MM 2007, etc.) in that we will include newer research results, demonstrations, and examples. We have also improved the structure and now propose a practical approach in the second half, in which the audience is asked to actively participate and contribute to the discussion on the subject of the tutorial.
VIII. Organizers and Backgrounds
Nicu Sebe, University of Trento, Italy, email: sebe@disi.unitn.it
Alejandro (Alex) Jaimes, Yahoo! Research, Spain, email: ajaimes@yahoo-inc.com
Bios
Nicu Sebe received his PhD degree in 2001 from the University of Leiden, The Netherlands. He is with the Faculty of Cognitive Sciences, University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human-computer interaction in computer vision applications. He is the author of Robust Computer Vision: Theory and Applications (Kluwer, April 2003) and of Machine Learning in Computer Vision (Springer, May 2005). He has been involved in the organization of major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, including as a General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG 2008), the ACM International Conference on Image and Video Retrieval (CIVR) 2007, and WIAMIS 2009, and as one of the initiators and a Program Co-Chair of the Human-Centered Multimedia track of the ACM Multimedia 2007 conference. He is the general chair of ACM CIVR 2010 and a track chair of ICPR 2010. He has served as guest editor for several special issues in IEEE Computer, Computer Vision and Image Understanding, Image and Vision Computing, Multimedia Systems, and ACM TOMCCAP. He has been a visiting professor at the Beckman Institute, University of Illinois at Urbana-Champaign, and in the Electrical Engineering Department, Darmstadt University of Technology, Germany. He was the recipient of a British Telecomm Fellowship. He is the co-chair of the IEEE Computer Society Task Force on Human-Centered Computing and is an associate editor of Machine Vision and Applications, Image and Vision Computing, Electronic Imaging, and the Journal of Multimedia.
Alejandro Jaimes is Senior Research Scientist at Yahoo! Research, where he is leading new initiatives at the intersection of web-scale data analysis and user experience, with a particular focus on cultural differences in the context of social media and user engagement (social network analysis, data mining, user modeling). Dr. Jaimes is the founder of the ACM Multimedia Interactive Art program, and he is Director at Large for the Arts for ACM SIG Multimedia. He is Industry Track chair for ACM Recommender Systems 2010, and was Industry Track chair for UMAP 2009, panels chair for KDD 2009, and Late-Breaking Results co-chair for WSDM 2009. He was program co-chair of ACM Multimedia 2008, co-editor of the IEEE Trans. on Multimedia Special Issue on Integration of Context and Content for Multimedia Management (2008), and a founding member of the IEEE CS Task Force on Human-Centered Computing. His work has led to over 60 technical publications in international conferences and journals, and to numerous contributions to MPEG-7. He has been granted several patents, and serves on the program committees of several international conferences (WWW, ACM Multimedia, CVPR, ICME, ICIP, CIVR, Creativity and Cognition, ICCV and ECCV Workshops on HCI, etc.). He has been an invited speaker at KDD 2009 and ECML-PKDD 2010 (Industry tracks), ACM Recommender Systems 2008 (panel), DAGM 2008 (keynote), the 2007 ICCV Workshop on HCI, and several others. Dr. Jaimes received a Ph.D. in Electrical Engineering (2003) and an M.S. in Computer Science (1997) from Columbia U. in NYC. Before joining Yahoo!, Dr. Jaimes managed the User Modeling and Data Mining group at Telefónica Research in Madrid. Prior to that, Dr. Jaimes was Scientific Manager at IDIAP-EPFL (Switzerland). He was also previously at Fuji Xerox (Japan), IBM TJ Watson (USA), IBM Tokyo Research Laboratory (Japan), Siemens Corporate Research (USA), and AT&T Bell Laboratories (USA).
Hamid Aghajan has been a professor of Electrical Engineering (consulting) at Stanford University since 2003, where he leads the Ambient Intelligence Research lab. Areas of research in his group include multi-camera networks and human interfaces for smart, vision-based reasoning environments, with application to smart homes, occupancy-based services, assisted living and well-being, smart meetings, and avatar-based communication and social interactions. Hamid is co-editor-in-chief of the Journal of Ambient Intelligence and Smart Environments. He has co-authored four edited volumes: Human-centric Interfaces for Ambient Intelligence; Multi-Camera Networks – Principles and Applications; Handbook of Ambient Intelligence and Smart Environments; and Behaviour Monitoring and Interpretation. He has been an editorial board member of the book series on Artificial Intelligence and Smart Environments by IOS Press, associate editor of Machine Vision and Applications, guest editor of the IEEE J-STSP special issue on Distributed Processing in Vision Networks, guest editor of the CVIU special issue on Multimodal Sensor Fusion, and guest editor of the IEEE Trans. on Multimedia special issue on Affective Multimodal Interfaces. Hamid was co-founder and technical co-chair of the first International Conference on Distributed Smart Cameras (ICDSC 2007), and general co-chair of ICDSC 2008. He has organized short courses on Human-centered Vision Systems and Multi-Camera Networks at CVPR 2007,
T10 – Human-Centered Multimedia Systems