Shixia Liu
Tsinghua University
Lingyun Yu
Xi'an Jiaotong-Liverpool University
Siming Chen
Fudan University
14:00-17:00, August 19, 2025 (Tuesday)
Yunhai Ballroom 1, 3rd Floor, Westin Hotel
Many real-world scenarios require a remote expert to guide a local user in performing physical tasks, such as remote machine maintenance. Theories and systems have been developed to support this type of collaboration by augmenting the local user’s workspace with the expert’s hand gestures. In these systems, hand gestures are shared in different formats, such as raw hands, projected hands, digital representations of gestures, and sketches. However, the effects of combining these gesturing formats have not been fully explored or understood. We have therefore developed a series of systems that meet the needs of different real-world working scenarios using emerging and wearable technologies. In this talk, I will introduce innovative techniques and systems that we designed, developed, and evaluated to support remote guidance through augmented reality-based sharing of hand gestures.
Test-time adaptation (TTA) has emerged as a powerful approach for adapting pre-trained models to novel, unseen data distributions during inference, without requiring access to the original training data. Unlike traditional domain adaptation techniques that rely on source data, TTA operates in a source-free and unsupervised manner, updating the model on the fly using incoming test batches.
While TTA has been extensively studied in the context of natural images, its extension to other data modalities remains relatively unexplored. In this talk, we highlight recent advances in TTA for multimodal and 3D data. First, we demonstrate how a widely used Vision-Language Model (VLM), based on Contrastive Language-Image Pre-training (CLIP), can be adapted at test time to address a variety of domain shifts and corruptions in both image classification and segmentation tasks. We also introduce TTA techniques tailored for 3D point cloud data, improving robustness to modality-specific challenges such as occlusions, viewpoint variation, and background noise.
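For context, the sketch below illustrates the general source-free TTA loop described above, assuming a PyTorch classifier: it minimizes the entropy of predictions on each incoming test batch and updates only the normalization-layer parameters, in the spirit of methods such as TENT. The model, optimizer settings, and data loader are assumptions for illustration, not the speakers' exact techniques.

```python
# Minimal sketch of source-free, unsupervised test-time adaptation (TTA):
# minimize the entropy of predictions on each incoming test batch and update
# only the affine parameters of normalization layers (in the spirit of TENT).
# Model, optimizer settings, and loader are illustrative assumptions.
import torch
import torch.nn as nn


def collect_norm_params(model: nn.Module):
    """Return the affine parameters of normalization layers for adaptation."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            for p in (m.weight, m.bias):
                if p is not None:
                    p.requires_grad_(True)
                    params.append(p)
    return params


def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the softmax predictions over a batch."""
    log_probs = logits.log_softmax(dim=1)
    return -(log_probs.exp() * log_probs).sum(dim=1).mean()


def adapt_on_batch(model: nn.Module, x: torch.Tensor, optimizer) -> torch.Tensor:
    """One TTA step: predict on an unlabeled batch, minimize entropy, update."""
    logits = model(x)
    loss = entropy(logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach()


# Hypothetical usage on an unlabeled test stream:
# model = load_pretrained_classifier()   # trained on (now inaccessible) source data
# model.requires_grad_(False)            # freeze everything except norm parameters
# model.train()                          # norm layers use current test-batch statistics
# optimizer = torch.optim.SGD(collect_norm_params(model), lr=1e-3, momentum=0.9)
# for x in test_loader:                  # unlabeled batches arriving at inference time
#     preds = adapt_on_batch(model, x, optimizer).argmax(dim=1)
```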
Conventional interactive scientific visualization systems rely heavily on graphical user interfaces, which, while flexible, often present a steep learning curve, particularly when expressing complex analytical intent. Semantics-based methods offer an alternative by allowing users to specify goals and interests through natural language. The system interprets these queries, extracts relevant features from scalar or vector field data, and generates appropriate visualizations. This approach lowers the barrier to entry and improves analytical efficiency. However, challenges remain, including aligning semantics with data, expressing and extracting complex features, and selecting suitable visualization parameters. This talk presents our ongoing work on natural language interfaces for semantics-driven scientific visualization.
In fields such as weather forecasting, oceanographic research, and disaster early warning, ensemble data plays a crucial role. However, effectively communicating the various uncertainties present in ensemble data remains a significant challenge in the field of visualization. Currently, ensemble data visualization relies mainly on two-dimensional displays, which are limited by the constraints of visual channels. This can result in cognitive biases, visual clutter, and obscured information. This talk discusses stereoscopic visualization techniques for ensemble data that use depth cues to enhance the visual encoding and representation of multidimensional information, with the goal of improving the perception of, and reasoning about, uncertainty distributions.
In recent years, the widespread application of AI techniques has enabled the inspection, understanding, and prediction of the behaviors of individuals and groups within massive datasets, providing new insights into society and the world while optimizing social governance. However, the black-box nature of complex algorithms and models has limited users' understanding of the mechanisms and predictions of the models, thereby affecting trust in the models. This talk will address these challenges by exploring how visual analytics approaches can facilitate the comprehension of complex algorithms and models, with a particular emphasis on optimization processes and language models.
Agenda:
| Time | Title | Speaker |
|---|---|---|
| Moderator: Siming Chen | | |
| 14:00 - 14:10 | Opening | Siming Chen |
| 14:10 - 14:45 | Supporting AR-based Hand Gestures for Remote Guidance | Tony Huang |
| 14:45 - 15:20 | Beyond the Image: Test-Time Adaptation for Multimodal and Point Cloud Data | Christian Desrosiers |
| 15:20 - 15:40 | Coffee break | |
| Moderator: Lingyun Yu | | |
| 15:40 - 16:10 | Semantics-based Scientific Visualization | Jun Tao |
| 16:10 - 16:40 | Visualizing Multifaceted Forecast Uncertainty in Immersive Environments | Le Liu |
| 16:40 - 17:10 | Visual Analytics on Explainable AI: Case Studies on Analyzing Optimization Processes and Language Models | Yuxin Ma |
| 17:10 - 17:15 | Closing | Lingyun Yu |