Hui Huang
Shenzhen University
Mengyu Chu
Peking University
13:30 - 17:30, August 20, 2025 (Wednesday)
Donglai Hall, 2nd Floor, Westin Hotel
Attention layers play a critical role in generative models. In this talk, I will show that these layers capture rich semantic information, in particular semantic correspondences between elements within an image and across different images. Through several works, I will demonstrate that the rich representations learned by these layers can be leveraged for image manipulation, consistent image generation, and personalization. Additionally, I will discuss the challenges that arise, especially in scenarios involving complex prompts with multiple subjects, where issues such as semantic leakage during the denoising process can lead to inaccurate representations and poor generations.
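A minimal sketch of the mechanism this talk builds on, assuming a standard diffusion-style cross-attention layer (the projection matrices below are random stand-ins, not a trained model): the softmax attention weights between image patches (queries) and prompt tokens (keys) form a per-token spatial map, which is the kind of signal attention-based manipulation methods read out.

```python
import torch
import torch.nn.functional as F

def cross_attention_maps(patch_feats, token_feats, d_head=64):
    """patch_feats: (N_patches, d); token_feats: (N_tokens, d).
    Returns (N_tokens, N_patches): one spatial attention map per token."""
    # Random stand-ins for the trained W_q / W_k projections.
    w_q = torch.randn(patch_feats.shape[-1], d_head)
    w_k = torch.randn(token_feats.shape[-1], d_head)
    attn = F.softmax((patch_feats @ w_q) @ (token_feats @ w_k).T / d_head**0.5, dim=-1)
    return attn.T  # per-token heat map over image patches

maps = cross_attention_maps(torch.randn(256, 320), torch.randn(8, 320))
print(maps.shape)  # torch.Size([8, 256])
```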
Stylized content can be created either by stylizing existing photographs or by generating stylized images from scratch. Many methods have been proposed for both tasks, but they often struggle to balance content fidelity and artistic style, or more generally, to separate style from content. In this talk I will present two efforts that address these challenges by analyzing the sensitivity of networks and models to various aspects of generation and stylization: B-LoRA, a method that leverages LoRA (Low-Rank Adaptation) to implicitly separate the style and content components of a single image, and Conditional Balance, which allows fine-grained control over style and content in image generation.
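For readers unfamiliar with LoRA, here is a minimal sketch of the underlying adapter (standard low-rank adaptation, not the actual B-LoRA code): a frozen linear layer computes W x plus a trainable low-rank update (alpha/r) B A x. The abstract's point is that such adapters, fit to a single image, can implicitly separate its style and content components.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(320, 320))
print(layer(torch.randn(2, 320)).shape)  # torch.Size([2, 320])
```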
Between training and inference lies a growing class of AI problems that involve fast optimization of a pre-trained model for a specific inference task. Like System 2 in the "thinking fast and slow" model of cognitive processing, these are not pure feed-forward inference problems applied to a pre-trained model, because they involve non-trivial inference-time optimization beyond what the model was trained for; neither are they training problems, because they focus on a specific input. These compute-heavy inference workflows raise new challenges in machine learning and open opportunities for new types of user experiences and use cases. In this talk, I will describe several flavors of these new workflows in the context of text-to-image generative models, including recent work on image editing and on teaching models to count. I will also briefly discuss the generation of rare classes and future directions.
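A toy illustration of this workflow class (an assumption-laden sketch, not the speaker's method): the pre-trained model stays frozen, and a short gradient-based optimization runs at inference time over an input-specific variable, here a conditioning vector `c`, against a task loss.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 16)                 # stand-in for a frozen pre-trained model
for p in model.parameters():
    p.requires_grad = False

c = torch.randn(1, 16, requires_grad=True)      # per-input variable optimized at inference
target = torch.randn(1, 16)                     # stand-in for a task-specific objective
opt = torch.optim.Adam([c], lr=1e-2)

for step in range(200):                         # "System 2": spend compute before answering
    loss = F.mse_loss(model(c), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())                              # loss decreases while the model stays fixed
```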
Thin structures such as vessels or pipelines pose challenges due to discontinuities, bifurcations, sparse 3D data, and low contrast.
ThinGAT tackles fine-scale segmentation using a lightweight graph neural network with a modified attention mechanism and an edge smoothness loss (see the sketch below), preserving continuity and achieving state-of-the-art accuracy on medical benchmarks with only 961K parameters.
For reconstruction, we propose a sliding-box depth projection approach: local orthographic projections from multiple views enable precise recovery of thin geometry. Local reconstructions are fused into coherent 3D models, demonstrated on pulmonary artery CT data and industrial pipeline scans.
Together, these methods deliver compact, accurate, and generalizable solutions for thin structure segmentation and 3D reconstruction across medical and industrial domains.
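A hedged sketch of one ingredient, the edge smoothness loss mentioned above (our reading of the abstract, not the authors' code): penalizing prediction differences across graph edges encourages segmentations to stay continuous along thin, connected structures.

```python
import torch

def edge_smoothness_loss(logits, edge_index):
    """logits: (N, C) per-node class scores; edge_index: (2, E) graph edges."""
    src, dst = edge_index
    p = torch.softmax(logits, dim=-1)
    # Penalize disagreement between the two endpoints of every edge.
    return (p[src] - p[dst]).pow(2).sum(dim=-1).mean()

logits = torch.randn(100, 2)                    # 100 graph nodes, 2 classes
edges = torch.randint(0, 100, (2, 300))         # 300 random edges for illustration
print(edge_smoothness_loss(logits, edges))
```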
With millions of pre-trained models now available online, including Implicit Neural Representations (INRs) and Neural Radiance Fields (NeRFs), neural network weights have emerged as a rich new data modality. This talk explores treating these weights as structured data objects with inherent symmetries that can be exploited for learning. We present architectures that process weight spaces while preserving these symmetries, including our equivariant architectures for multilayer perceptron weights (ICML 2023) and Graph Metanetworks (GMN) (ICLR 2024), which extend this approach to diverse network architectures. We also discuss recent work on learning with Low-Rank Adaptations (LoRA) and processing neural gradients. This research enables novel approaches for analyzing and modifying neural networks, with applications spanning from INR manipulation and generation to weight pruning and model editing.
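The key symmetry can be shown in a few lines: permuting the hidden neurons of an MLP (the rows of the first weight matrix together with the matching columns of the second) changes the weights but not the function, so architectures that learn on weights should treat such weight settings as equivalent.

```python
import torch

W1, b1 = torch.randn(8, 4), torch.randn(8)      # layer 1 of a small MLP
W2, b2 = torch.randn(3, 8), torch.randn(3)      # layer 2
x = torch.randn(4)

perm = torch.randperm(8)                        # relabel the 8 hidden neurons
f  = W2 @ torch.relu(W1 @ x + b1) + b2
fp = W2[:, perm] @ torch.relu(W1[perm] @ x + b1[perm]) + b2
print(torch.allclose(f, fp, atol=1e-5))         # True: different weights, same function
```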
The rapid development of 3D vision has called for more efficient and intelligent approaches to point cloud understanding. This talk presents three complementary advances that collectively address the challenges of learning, reasoning, and adaptation in 3D point cloud processing. First, we propose a parsimonious tri-vector representation for efficient and expressive 3D shape generation, reducing computational cost without sacrificing quality. Second, we introduce PointLLM-R, a chain-of-thought guided reasoning framework that enhances 3D point cloud inference via structured multi-step prompts. Third, we tackle distribution shifts in 4D point cloud segmentation with an active test-time adaptation strategy that improves robustness under unseen scenarios. Together, these methods offer a unified and forward-looking perspective on efficient 3D point cloud learning across diverse tasks and conditions.
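As a hedged illustration of a parsimonious factorized 3D representation (a CP-style decomposition in the spirit of the abstract's tri-vector idea; the talk's actual formulation may differ): a dense N^3 grid is replaced by R triplets of N-vectors, i.e., O(3RN) parameters instead of O(N^3).

```python
import torch

N, R = 64, 8                                    # grid resolution, number of components
vx, vy, vz = torch.randn(R, N), torch.randn(R, N), torch.randn(R, N)

def field(i, j, k):
    # Value at voxel (i, j, k): sum_r vx[r, i] * vy[r, j] * vz[r, k].
    return (vx[:, i] * vy[:, j] * vz[:, k]).sum()

print(field(3, 10, 42))                         # 3*R*N values stand in for an N^3 grid
```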
We introduce Gaussian-enhanced Surfels (GESs), a bi-scale representation for radiance field rendering, wherein a set of 2D opaque surfels with view-dependent colors represents the coarse-scale geometry and appearance of scenes, and a few 3D Gaussians surrounding the surfels supplement fine-scale appearance details. The entirely sorting-free rendering of GESs not only achieves very fast frame rates but also produces view-consistent images, avoiding popping artifacts under view changes. Experimental results show that GESs advance the state of the art as a compelling representation for ultra-fast, high-fidelity radiance field rendering.
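A small worked example of why fully opaque primitives permit sorting-free rendering (an illustration of the principle, not the GES renderer): with opacity 1, a z-buffer keeps the nearest fragment regardless of draw order, whereas semi-transparent splats need back-to-front sorting for alpha compositing, and flips in that order under camera motion are what cause popping.

```python
fragments = [(0.7, 0.2), (0.3, 0.9), (0.5, 0.5)]  # (depth, color) per opaque surfel

zbuf, color = float("inf"), 0.0
for depth, c in fragments:          # any traversal order gives the same image
    if depth < zbuf:                # plain depth test, no sorting needed
        zbuf, color = depth, c
print(color)  # 0.9: the nearest surfel wins
```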
Recent breakthroughs in 3D generation have transformed content creation, yet key challenges remain in achieving precise control over scene composition and local geometry editing. In this talk, I will present two innovative solutions addressing these limitations. First, DIScene introduces a structured scene graph approach that distills 2D diffusion knowledge into 3D generation, enabling style-consistent object modeling with explicit interaction handling through node-edge representations. Second, RELATE3D tackles local editing challenges by decomposing 3D latent spaces into semantically meaningful components, facilitated by a novel Refocusing Adapter that enables part-level modifications through multimodal alignment. Together, these methods establish a comprehensive framework for controllable 3D content creation.
The production of 3D digital avatars is typically time-consuming and costly. With the increasing maturity of AIGC technology, however, using it to accelerate avatar production is becoming increasingly feasible. This project aims to develop a stylized 3D chat avatar system and explore its application in the production of game release materials. By combining advanced AIGC and computer graphics techniques, users will be able to easily create personalized, uniquely styled 3D talking avatars. We will delve into various techniques for creating stylized 3D avatars and achieving real-time animation.
Panel topic: Intelligence, Collaboration, and the Next Frontiers of Graphics
Panelists:
Prof. Hui Huang and the nine speakers
Moderator: Mengyu Chu, Peking University
Agenda:
| Time | Title | Speaker |
| --- | --- | --- |
| 13:30 - 13:40 | Opening | Shi-Min Hu |
| 13:40 - 14:00 | Attention and Semantic Control in Generative Models | Daniel Cohen-Or |
| 14:00 - 14:20 | Efficient Learning, Reasoning, and Adaptation for 3D Point Clouds | Chaoqi Chen |
| 14:20 - 14:40 | Style-Content Separation and Control in Images | Ariel Shamir |
| 14:40 - 15:00 | When Gaussian Meets Surfel: Ultra-fast High-fidelity Radiance Field Rendering | Tianjia Shao |
| 15:00 - 15:20 | A "System 2" in Visual Generative AI | Gal Chechik |
| 15:20 - 15:40 | Coffee break | |
| 15:40 - 16:00 | Controllable 3D Generation and Editing | Taijiang Mu |
| 16:00 - 16:20 | Learning Thin 3D Structure Reconstructions | Andrei Sharf |
| 16:20 - 16:40 | Stylized and Emotional Character Animation and Interaction | Ye Pan |
| 16:40 - 17:00 | Learning in Deep Weight Spaces Through Symmetries | Haggai Maron |
| 17:00 - 17:30 | Panel Discussion | |