From expressive and context-aware multimodal generation of digital humans to understanding the social cognition of real humans
Due to the rapid growth of virtual, augmented, and hybrid reality, together with spectacular advances in artificial intelligence, the ultra-realistic generation and animation of digital humans with human-like behaviors is becoming a major topic of interest. This complex endeavor requires modeling several elements of human behavior, including the natural coordination of multimodal behaviors (text, speech, face, and body) and the contextualization of behavior in response to interlocutors of different cultures and motivations. The challenges in this topic are thus twofold: generating and animating coherent multimodal behaviors, and modeling the expressivity and contextualization of the virtual agent with respect to human behavior, including understanding and modeling how virtual agents adapt their behavior to increase human engagement. The aim of this workshop is to connect traditionally distinct communities (e.g., speech, vision, cognitive neurosciences, social psychology) to elaborate and discuss the future of human interaction with human-like virtual agents. We expect contributions from the fields of signal processing, speech and vision, machine learning and artificial intelligence, perceptual studies, and cognitive science and neuroscience. Topics will range from multimodal generative modeling of virtual agent behaviors and speech-to-face and posture 2D and 3D animation to original research topics including style, expressivity, and context-aware animation of virtual agents. Moreover, controllable real-time virtual agent models can serve as state-of-the-art experimental stimuli and confederates for designing novel, groundbreaking experiments that advance the understanding of social cognition in humans. Finally, these virtual humans can be used to create virtual environments for medical purposes, including rehabilitation and training.
This workshop, organized by renowned scientists in complementary domains, aims to connect communities of research scientists interested in virtual agents and human-agent or human-human interactions, with impact on future technological innovations and fundamental knowledge of human social behavior. Specifically, participants from academia and industry will present and discuss current research trends and envision frontiers in the modeling and generation of human-like multimodal behavior and their application to the fields of neuroscience and social cognition. Further, bringing together the computational and cognitive neuroscience communities will enable knowledge exchange about human multimodal behavior and cognition, which in turn will help create virtual agents capable of natural, engaging, and seamless multimodal social behavior with real humans.
SIVA'23 is organized as a satellite workshop of the IEEE International Conference on Automatic Face and Gesture Recognition 2023
|Deadline for submission|September 12, 2022|
|Acceptance notification to authors|October 16, 2022|
|Camera-ready workshop paper due|October 31, 2022|
|Workshop|January 4 or 5, 2023|
Topics of interest include but are not limited to:
- Analysis and understanding of human multimodal behavior (speech, gesture, face)
- Creating datasets for the study and modeling of human multimodal behavior
- Coordination and synchronization of human multimodal behavior
- Analysis of style and expressivity in human multimodal behavior
- Cultural variability of social multimodal behavior
+ Modeling and generation
- Multimodal generation of human-like behavior (speech, gesture, face)
- Face and gesture generation driven by text and speech
- Context-aware generation of multimodal human-like behavior
- Modeling of style and expressivity for the generation of multimodal behavior
- Modeling paralinguistic cues for multimodal behavior generation
- Few-shot or zero-shot transfer of style and expressivity
- Weakly supervised adaptation of multimodal behavior to context
+ Psychology and Cognition
- Cognition of deep fakes and ultra-realistic digital manipulation of human-like behavior
- Social agents/robots as tools for capturing, measuring and understanding multimodal behavior (speech, gesture, face)
- Neuroscience and social cognition of real humans using virtual agents and physical robots
Submissions to SIVA'23 should have no substantial overlap with any other paper already submitted or published, or to be submitted during the SIVA'23 review period. All persons who have made any substantial contribution to the work should be listed as authors, and all listed authors should have made some substantial contribution to the work. All authors should be aware that the paper is submitted to SIVA'23.
Papers presented at the SIVA'23 workshop will be published as part of the proceedings of the main FG'2023 conference and should, therefore, follow the same presentation guidelines as the main conference. Workshop papers will be included in IEEE Xplore.
The reviewing process for SIVA'23 will be “single-blind”, which means that only the reviewers are anonymous.
The submission process will be handled through the CMT System.
Two submission formats are proposed:
- Short paper: 3 pages + 1 extra page for references
- Long paper: 6 to 8 pages including references
Short papers encourage submissions of early research in original emerging fields.
Long papers promote the presentation of strongly original contributions, position papers, or survey papers.
Papers that use formatting different from the FG 2023 LaTeX or Word templates, or that exceed the expected length, will be automatically removed from the reviewing process.
Supplementary material (images, video, etc.) may optionally be submitted with papers, but be sure to maintain anonymity, including the file properties or other hidden text. The supplemental material has a file size limit of 100MB. The supplemental materials will not be part of the conference proceedings, so they are only there to aid the reviewing process. Reviewers are not required to view the supplemental material (though most reviewers are likely to do so), so any information critical to understanding the work should be in the main paper.
All supplementary material must be self-contained and zipped into a single file. The following formats are allowed: avi, mp4, pdf, wmv.
- Prepare your manuscript as per the IEEE specification using the provided LaTeX or Word templates;
- Carefully proofread your submission;
- Submit to the SIVA'23 CMT system.
NOTE: List all authors during initial submission. Modifying the author list after the review process is not allowed.
Nicolas Obin, Ircam and Sorbonne Université
Nicolas Obin is an associate professor at the Faculty of Sciences and Engineering of Sorbonne Université and a research scientist in the Sound Analysis and Synthesis team at the Science and Technology for Sound and Music laboratory (Ircam, CNRS, Sorbonne Université). He received a PhD in computer science in 2011 for a thesis on the modeling of speech prosody and speaking style for text-to-speech synthesis, for which he obtained the best PhD thesis award from La Fondation Des Treilles the same year. Over the years he has developed a strong interest in behavior and communication between humans, animals, and robots. His main area of research is the structured generative modeling of complex human productions, with applications in neural speech synthesis and transformation, multimodal virtual agent animation, and humanoid robotics, and their use to study human cognition and biases.
Ryo Ishii, NTT Human Informatics Laboratories
Ryo Ishii received his M.S. degree in engineering from the Tokyo University of Agriculture and Technology and joined the NTT Corporation in 2008. He received his Ph.D. degree in informatics from Kyoto University in 2013. He was a visiting scholar at Carnegie Mellon University from 2019 to 2020. He is currently a distinguished research scientist at NTT Human Informatics Laboratories. His research interests are multimodal interaction and social signal processing. He is a member of IEICE (the Institute of Electronics, Information and Communication Engineers), JSAI (the Japanese Society for Artificial Intelligence), and HIS (the Human Interface Society).
Rachael E. Jack, University of Glasgow
Rachael E. Jack is Professor of Computational Social Cognition in the School of Psychology & Neuroscience at the University of Glasgow, Scotland. She is director of the ERC-funded FACESYNTAX laboratory, Lead of the Center for Social, Cognitive & Affective Neurosciences, and Lead of the Multimodal Social Interactions Group (MOSAIC). Jack’s research focuses on understanding human social communication, with specific expertise in modeling dynamic facial signals within and across cultures and their transfer to artificial agents. Jack is the recipient of several international awards and honors, including elected Fellow of the Association for Psychological Science (APS), the Spearman Medal of the British Psychological Society (BPS), and the Innovation Award from the Social & Affective Neuroscience Society (SANS). She also serves in several roles, including Associate Editor at Psychological Science, PC member (e.g., IEEE conferences), Chair of the APS Globalization Committee, and ERC Advanced Grant panel member.
Louis-Philippe Morency, Carnegie Mellon University
Louis-Philippe Morency is Associate Professor in the Language Technologies Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He was formerly research faculty in the Computer Science Department at the University of Southern California and received his Ph.D. degree from the MIT Computer Science and Artificial Intelligence Laboratory. His research focuses on building the computational foundations that enable computers to analyze, recognize, and predict subtle human communicative behaviors during social interactions. He has received several awards, including AI’s 10 to Watch by IEEE Intelligent Systems, the NetExplo Award in partnership with UNESCO, and 10 best paper awards at IEEE and ACM conferences. His research has been covered by media outlets such as the Wall Street Journal, The Economist, and NPR.
Catherine Pelachaud, CNRS - ISIR, Sorbonne Université
Catherine Pelachaud is Director of Research at CNRS in the ISIR laboratory, Sorbonne University. She received her PhD in Computer Graphics from the University of Pennsylvania, Philadelphia, USA, in 1991. Her research interests include socially interactive agents, nonverbal communication (face, gaze, and gesture), expressive behaviors, and socio-emotional agents. She has served on several organizing committees, such as AAMAS2022, ICMI2021, ICMI2020, CASA'19, IVA 2019, FG'19, and FG'17, as well as numerous workshops. She is the recipient of the ACM SIGAI Autonomous Agents Research Award 2015 and was honored with the title Doctor Honoris Causa of the University of Geneva in 2016. Her Siggraph’94 paper received the Influential Paper Award of IFAAMAS (the International Foundation for Autonomous Agents and Multiagent Systems).
Diversity, Equity & Inclusion
The format of this workshop will be hybrid, both online and onsite. This format is intended to accommodate travel restrictions and COVID-19 health precautions, to promote inclusion in the research community (travel costs are high, and online presentations will encourage research contributions from geographical regions that would normally be excluded), and to take ecological issues into account (e.g., CO2 footprint). The organizing committee is committed to paying attention to equality, diversity, and inclusivity in its choice of invited speakers. This effort extends from the organizing committee and the invited speakers to the program committee.