Taku Komura joined the University of Hong Kong in 2020. Before joining HKU, he worked at the University of Edinburgh (2006-2020), City University of Hong Kong (2002-2006) and RIKEN (2000-2002). He received his BSc, MSc and PhD in Information Science from the University of Tokyo. His research has focused on data-driven character animation, physically-based character animation, crowd simulation, 3D modelling, cloth animation, anatomy-based modelling and robotics. Recently, his main research interests have been physically-based animation and the application of machine learning techniques to animation synthesis. He received a Royal Society Industry Fellowship (2014) and a Google AR/VR Research Award (2017). In 2024, he was ranked 15th in the AI 2000 Most Influential Scholars list in Computer Graphics.

https://www.cs.hku.hk/~taku
Physics-based Control for Human-Scene Interaction
In this talk, I will present our recent contributions to physics-based character control, which has a wide range of applications from computer games to humanoid robot control. Recent advances in deep reinforcement learning have significantly improved character controllability and expanded the range of movements humanoids can perform. Still, challenges remain in areas such as human-scene interaction (HSI) and high-level control.
In terms of human-scene interaction, current methods mainly focus on developing separate controllers, each specialized for a specific interaction task. This significantly hinders the ability to tackle a wide variety of challenging HSI tasks that require the integration of multiple skills, e.g., sitting down while carrying an object. To address this issue, we propose TokenHSI, a single, unified transformer-based policy that facilitates multi-skill unification and flexible adaptation. The key insight is to model humanoid proprioception as a separate, shared token and combine it with distinct task tokens via a masking mechanism. Such a unified policy enables effective knowledge sharing across skills, thereby facilitating multi-task training. Moreover, the policy architecture supports variable-length inputs, allowing learned skills to be flexibly adapted to new scenarios.
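To make the token-and-mask idea concrete, here is a minimal sketch of such a policy in PyTorch. It is an illustrative assumption rather than the TokenHSI implementation: the module names, dimensions, and the two example tasks ("sit", "carry") are invented, and only the structure (a shared proprioception token, per-task tokens, and an attention mask that switches skills on and off) follows the description above.

# Minimal sketch of a tokenized multi-task policy (PyTorch).
# All names, dimensions, and tasks are illustrative assumptions.
import torch
import torch.nn as nn

class TokenizedPolicy(nn.Module):
    def __init__(self, prop_dim, task_dims, d_model=256, n_actions=28):
        super().__init__()
        # Shared proprioception token: one encoder reused by every task.
        self.prop_encoder = nn.Linear(prop_dim, d_model)
        # One tokenizer per task observation (e.g., "sit", "carry").
        self.task_encoders = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in task_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, prop_obs, task_obs, active_tasks):
        # Build a variable-length token sequence: proprioception first,
        # then one token per task observation that is present.
        tokens = [self.prop_encoder(prop_obs).unsqueeze(1)]
        mask = [torch.zeros(prop_obs.size(0), 1, dtype=torch.bool,
                            device=prop_obs.device)]
        for name, obs in task_obs.items():
            tokens.append(self.task_encoders[name](obs).unsqueeze(1))
            # Masked-out task tokens are ignored by attention, so skills
            # can be toggled per sample without retraining the policy.
            mask.append(~active_tasks[name].view(-1, 1))
        seq = torch.cat(tokens, dim=1)        # (B, 1 + num_tasks, d_model)
        pad_mask = torch.cat(mask, dim=1)     # (B, 1 + num_tasks)
        h = self.transformer(seq, src_key_padding_mask=pad_mask)
        # Decode actions from the shared proprioception token.
        return self.action_head(h[:, 0])

# Usage: a batch of 2 humanoids; "sit" active for both, "carry" for one.
policy = TokenizedPolicy(prop_dim=64, task_dims={"sit": 16, "carry": 24})
actions = policy(
    torch.randn(2, 64),
    {"sit": torch.randn(2, 16), "carry": torch.randn(2, 24)},
    {"sit": torch.tensor([True, True]), "carry": torch.tensor([False, True])},
)
print(actions.shape)  # torch.Size([2, 28])

Because the token sequence has variable length, a new skill can in principle be supported by appending one more tokenizer while the shared proprioception pathway stays untouched; this is the flexible-adaptation property mentioned above.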
In terms of high-level control, we introduce SIMS, which seamlessly bridges high-level, script-driven intent with low-level control, enabling more expressive and diverse human-scene interactions. Specifically, we employ Large Language Models with Retrieval-Augmented Generation (RAG) to generate coherent and diverse long-form scripts, providing a rich foundation for motion planning. We also develop a versatile multi-condition physics-based control policy that leverages text embeddings from the generated scripts to encode stylistic cues while simultaneously perceiving environmental geometry and accomplishing task goals.
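The overall flow can be summarized in a short, runnable toy sketch. Everything below is a stand-in assumption: the word-overlap retrieval, the stubbed "LLM", the hash-style text embedding, and the dummy policy mirror only the shape of the pipeline (retrieve examples, generate a script, embed each segment, condition the controller), not the SIMS components themselves.

# Toy sketch of a RAG-to-controller pipeline; all components are stand-ins.
def retrieve_examples(query, corpus, k=2):
    """RAG retrieval step: rank reference scripts by word overlap."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda s: -len(q & set(s.lower().split())))
    return ranked[:k]

def generate_script(query, examples):
    """Stand-in for the LLM call: in SIMS an LLM, prompted with the query
    and the retrieved examples, writes a coherent long-form script; here
    we simply return a fixed segmentation."""
    return ["walk wearily to the sofa", "slump down onto it",
            "reach for the book on the table"]

def embed(text, dim=8):
    """Stand-in text embedding that encodes stylistic cues per segment."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.encode()):
        vec[i % dim] += ch / 255.0
    return vec

def policy(obs, style_embedding):
    """Multi-condition policy stub: acts on humanoid state plus scene
    geometry (obs) and the current segment's style embedding."""
    return [o + e for o, e in zip(obs, style_embedding)]

corpus = ["enter the room and sit", "carry a box upstairs",
          "tired person collapses onto sofa"]
query = "a tired person comes home and relaxes"
script = generate_script(query, retrieve_examples(query, corpus))
for segment in script:
    style = embed(segment)
    obs = [0.1] * 8                      # placeholder state + geometry
    action = policy(obs, style)          # one control step per segment
    print(f"{segment!r} -> action[:3] = {action[:3]}")

The design point the sketch illustrates is the separation of concerns: the LLM plans in text over long horizons, while the physics-based policy only ever consumes a per-segment embedding alongside its usual observations.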
Finally, I will discuss other ongoing research on deploying humanoid robots in domestic environments.