SIVA'23 - Workshop on Socially Interactive Human-like Virtual Agents

Program

The program schedule is now available at the following link.

Workshop description

From expressive and context-aware multimodal generation of digital humans to understanding the social cognition of real humans

Due to the rapid growth of virtual, augmented, and hybrid reality, together with spectacular advances in artificial intelligence, the ultra-realistic generation and animation of digital humans with human-like behaviors is becoming a major topic of interest. This complex endeavor requires modeling several elements of human behavior, such as the natural coordination of multimodal behaviors (text, speech, face, and body) and the contextualization of behavior in response to interlocutors of different cultures and motivations. The challenges in this topic are therefore twofold: first, the generation and animation of coherent multimodal behaviors; second, modeling the expressivity and contextualization of the virtual agent with respect to human behavior, including understanding and modeling how virtual agents can adapt their behavior to increase human engagement. The aim of this workshop is to connect traditionally distinct communities (e.g., speech, vision, cognitive neuroscience, social psychology) to elaborate on and discuss the future of human interaction with human-like virtual agents. We expect contributions from the fields of signal processing, speech and vision, machine learning and artificial intelligence, perceptual studies, and cognitive science and neuroscience. Topics will range from multimodal generative modeling of virtual agent behaviors and speech-driven 2D and 3D face and posture animation to original research topics such as style, expressivity, and context-aware animation of virtual agents. Moreover, controllable real-time virtual agent models can serve as state-of-the-art experimental stimuli and confederates in novel, groundbreaking experiments that advance our understanding of social cognition in humans. Finally, these virtual humans can be used to create virtual environments for medical purposes, including rehabilitation and training.

This workshop, organized by renowned scientists in complementary domains, aims to connect communities of researchers interested in virtual agents and in human-agent or human-human interaction, with impact on both future technological innovation and fundamental knowledge of human social behavior. Specifically, by bringing together participants from academia and industry, the workshop will offer a venue to present and discuss current research trends and to envision new frontiers in the modeling and generation of human-like multimodal behavior and its application to neuroscience and social cognition. Further, bringing together the computational and cognitive neuroscience communities will enable an exchange of knowledge about human multimodal behavior and cognition, which can in turn help create virtual agents capable of natural, engaging, and seamless multimodal social behavior with real humans.

SIVA'23 is organized as a satellite workshop of the IEEE International Conference on Automatic Face and Gesture Recognition 2023 (FG 2023) and the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023.

Keynote speakers

The psychological benefits of virtual human agents
Gale Lucas (Institute for Creative Technologies, University of Southern California)

Dr. Gale M. Lucas is a research assistant professor at the University of Southern California, affiliated with the Institute for Creative Technologies and the Departments of Computer Science, Civil Engineering, and Psychology. She obtained her PhD from Northwestern University in 2010. She works in behavioral science at the intersection of social science and engineering disciplines. Her research is in the areas of human-computer interaction, affective computing, trust in automation, and human-building interaction, and focuses on rapport, disclosure, trust, persuasion, and negotiation with virtual agents, social robots, and intelligent built environments.

To explore the benefits of virtual human agents with the ability to engage socially with users, this talk presents research comparing such agents both to non-social machines and to humans. Social agents have the potential to build rapport like humans (which non-social machines cannot do) while assuring anonymity (which humans cannot do). In this way, they may offer the “best of both worlds” in terms of psychological benefits, especially by helping users feel comfortable in situations where they would otherwise be afraid of being negatively evaluated. This has implications for user design and opens possibilities for future research.

On Challenges and Opportunities in Situated Language Interaction
Dan Bohus (Microsoft Research)


Dan Bohus is a Senior Principal Researcher in the Adaptive Systems and Interaction Group at Microsoft Research. His work centers on the study and development of computational models for physically situated spoken language interaction and collaboration. The long-term question that shapes his research agenda is how we can enable interactive systems to reason more deeply about their surroundings and to participate seamlessly in open-world, multiparty dialog and collaboration with people. Prior to joining Microsoft Research, Dan obtained his Ph.D. from Carnegie Mellon University.

Situated language interaction is a complex, multimodal affair that extends well beyond the spoken word. When interacting with each other, we use a wide array of verbal and non-verbal signals to resolve several problems in parallel: we manage engagement, coordinate turn-taking, recognize intentions, and establish and maintain common ground. Proximity and body pose, attention and gaze, head nods and hand gestures, as well as prosody and facial expressions, all play important roles in this process. Recent advances in deep learning on various perceptual tasks promise a more robust foundation for tracking these types of signals. Yet developing agents that can engage in fluid, natural interactions with people in physically situated settings requires not just detecting these signals, but also incrementally coordinating with people, in real time, on producing them. In this talk, using a few research vignettes from work we have done over the last decade at Microsoft Research, I will draw attention to some of the challenges and opportunities that lie ahead of us in constructing systems that understand the world around them and collaborate with people in physical space.

Paper Submission

Submissions to SIVA'23 should have no substantial overlap with any other paper that is already published, already submitted elsewhere, or to be submitted elsewhere during the SIVA'23 review period. All persons who have made a substantial contribution to the work should be listed as authors, and all listed authors should have made a substantial contribution to the work. All authors should be aware that the paper has been submitted to SIVA'23.

Papers presented at the SIVA'23 workshop will be published as part of the proceedings of the main FG 2023 conference and should therefore follow the same presentation guidelines as the main conference. Workshop papers will be included in IEEE Xplore.

The reviewing process for SIVA'23 will be “single-blind”, which means that reviewers remain anonymous to the authors, but the authors are not anonymous to the reviewers.

The submission process will be handled through the CMT System.

Two submission formats are proposed:

- Short paper: 3 pages + 1 extra page for references
- Long paper: 6 to 8 pages, including references

Short papers are intended for early-stage research in original, emerging fields.
Long papers are intended for strongly original contributions, position papers, or survey papers.

Papers that do not follow the FG 2023 LaTeX or Word templates, or that exceed the page limits, will be automatically removed from the reviewing process.

Supplementary material (images, video, etc.) may optionally be submitted with papers, but be sure to maintain anonymity, including in file properties and other hidden text. Supplementary material is limited to 100 MB. It will not be part of the conference proceedings and serves only to aid the reviewing process. Reviewers are not required to view the supplementary material (though most reviewers are likely to do so), so any information critical to understanding the work should be in the main paper.

All supplementary material must be self-contained and packaged as a single zip file. The following formats are allowed: avi, mp4, pdf, wmv.

Submission steps

- Prepare your manuscript according to the IEEE specification, using the provided LaTeX or Word templates.
- Carefully proofread your submission.
- Submit to the SIVA'23 CMT system.

NOTE: List all authors during initial submission. Modifying the author list after the review process is not allowed.

Important dates

Deadline for submission	September 12, 2022 (11:59 pm Pacific Time)
Extended deadline for paper submission	September 20, 2022 (11:59 pm Pacific Time)
Acceptance notification to authors	October 16, 2022
Camera-ready workshop paper due	October 31, 2022
Workshop	January 5, 2023

Organizers

Nicolas Obin, Ircam and Sorbonne Université

Nicolas Obin is an associate professor at the Faculty of Sciences and Engineering of Sorbonne Université and a research scientist in the Sound Analysis and Synthesis team at the Science and Technology for Sound and Music laboratory (Ircam, CNRS, Sorbonne Université). He received his PhD in computer science in 2011 for a thesis on the modeling of speech prosody and speaking style for text-to-speech synthesis, which won the best PhD thesis award from La Fondation Des Treilles in 2011. Over the years, he has developed a strong interest in behavior and communication among humans, animals, and robots. His main area of research is the structured generative modeling of complex human productions, with applications in neural speech synthesis and transformation, multimodal virtual agent animation, and humanoid robotics, as well as their use in studying human cognition and biases.

Ryo Ishii, NTT Human Informatics Laboratories


Ryo Ishii received his M.S. degree in engineering from the Tokyo University of Agriculture and Technology and joined NTT Corporation in 2008. He received his Ph.D. degree in informatics from Kyoto University in 2013. He was a visiting scholar at Carnegie Mellon University from 2019 to 2020. He is currently a distinguished research scientist at NTT Human Informatics Laboratories. His research interests are multimodal interaction and social signal processing. He is a member of IEICE (the Institute of Electronics, Information and Communication Engineers), JSAI (the Japanese Society for Artificial Intelligence), and HIS (the Human Interface Society).

Rachael E. Jack, University of Glasgow


Rachael E. Jack is Professor of Computational Social Cognition in the School of Psychology & Neuroscience at the University of Glasgow, Scotland. She is director of the ERC-funded FACESYNTAX laboratory, lead of the Center for Social, Cognitive & Affective Neurosciences, and lead of the Multimodal Social Interactions Group (MOSAIC). Jack’s research focuses on understanding human social communication, with specific expertise in modeling dynamic facial signals within and across cultures and transferring these models to artificial agents. Jack is a recipient of several international awards and honors, including election as Fellow of the Association for Psychological Science (APS), the Spearman Medal of the British Psychological Society (BPS), and the Innovation Award from the Social & Affective Neuroscience Society (SANS). She also serves in several roles, including Associate Editor at Psychological Science, PC member (e.g., IEEE conferences), Chair of the APS Globalization Committee, and ERC Advanced Grant panel member.

Louis-Philippe Morency, Carnegie Mellon University


Louis-Philippe Morency is Associate Professor in the Language Technologies Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He was formerly research faculty in the Computer Science Department at the University of Southern California and received his Ph.D. degree from the MIT Computer Science and Artificial Intelligence Laboratory. His research focuses on building the computational foundations that enable computers to analyze, recognize, and predict subtle human communicative behaviors during social interactions. He has received diverse awards, including AI’s 10 to Watch by IEEE Intelligent Systems, the NetExplo Award in partnership with UNESCO, and 10 best paper awards at IEEE and ACM conferences. His research has been covered by media outlets such as the Wall Street Journal, The Economist, and NPR.

Catherine Pelachaud, CNRS - ISIR, Sorbonne Université


Catherine Pelachaud is Director of Research at CNRS in the laboratory ISIR, Sorbonne University. She received her PhD in Computer Graphics from the University of Pennsylvania, Philadelphia, USA, in 1991. Her research interests include socially interactive agents, nonverbal communication (face, gaze, and gesture), expressive behaviors, and socio-emotional agents. She has served on several organizing committees, such as AAMAS2022, ICMI2021, ICMI2020, CASA'19, IVA 2019, FG'19, FG'17, and numerous workshops. She is a recipient of the ACM SIGAI Autonomous Agents Research Award 2015 and was awarded the title of Doctor Honoris Causa by the University of Geneva in 2016. Her SIGGRAPH'94 paper received the Influential Paper Award of IFAAMAS (the International Foundation for Autonomous Agents and Multiagent Systems).

Topics

Topics of interest include but are not limited to:

+ Analysis
- Analysis and understanding of human multimodal behavior (speech, gesture, face)
- Creating datasets for the study and modeling of human multimodal behavior
- Coordination and synchronization of human multimodal behavior
- Analysis of style and expressivity in human multimodal behavior
- Cultural variability of social multimodal behavior

+ Modeling and generation
- Multimodal generation of human-like behavior (speech, gesture, face)
- Face and gesture generation driven by text and speech
- Context-aware generation of multimodal human-like behavior
- Modeling of style and expressivity for the generation of multimodal behavior
- Modeling paralinguistic cues for multimodal behavior generation
- Few-shot or zero-shot transfer of style and expressivity
- Slightly-supervised adaptation of multimodal behavior to context

+ Psychology and Cognition
- Cognition of deep fakes and ultra-realistic digital manipulation of human-like behavior
- Social agents/robots as tools for capturing, measuring and understanding multimodal behavior (speech, gesture, face)
- Neuroscience and social cognition of real humans using virtual agents and physical robots

Diversity, Equity & Inclusion

The workshop will use a hybrid format, with both online and onsite participation. This format is intended to accommodate travel restrictions and COVID-19 health precautions, to promote inclusion in the research community (travel costs are high, and online presentations will encourage research contributions from geographical regions that would otherwise be excluded), and to reduce the workshop's environmental impact (e.g., its CO2 footprint). The organizing committee is committed to equality, diversity, and inclusivity in its choice of invited speakers. This commitment extends from the organizing committee and the invited speakers to the program committee.
