I obtained my MPhil degree in Computer Science and Engineering from The Hong Kong University of Science and Technology (HKUST), where I was advised by Prof. Sai-Kit Yeung. Prior to that, I earned my B.S. in Computer Science (Advanced Program) with highest distinction (GPA: 3.93/4.0) from the University of Science, VNU - HCM, under the guidance of Prof. Minh-Triet Tran.
My research centers on advancing multimodal understanding, grounded in a strong background in computer vision, particularly in detection and segmentation. I first delved into these areas during my MPhil at HKUST and later expanded my expertise as an AI Research Engineer at Huawei HKRC, where I contributed to the MindONE project, focusing on multimodal and multitask generation models.
Currently, my work targets the development of World Foundation Multimodal Models capable of discovering the underlying physical laws and causal relationships within multimodal data. To achieve this, my research is structured around two interconnected pillars:
- Egocentric Video Understanding (The Past & Present): Enabling AI to interpret human behavior and intent from continuous, first-person sensory data (such as video, sound, touch, and eye gaze). The goal is to build a grounded, queryable memory of events as they happen.
- Physics-Plausible Video Generation (The Future): Moving beyond standard generation to build predictive, causally consistent world models. This is the essential foundation for robust simulation and long-range planning.
News
- 2025.11: Paper Accepted! Excited to share that our paper "TransCues" was accepted to WACV 2026 after several attempts. Congratulations to Tuan-Anh and all the collaborators on this achievement!
- 2025.08: New Role! I have joined Fulbright University Vietnam as a Research Assistant in Computer Vision, where I am working on Physics-Plausible Video Generation.
- 2025.06: Best Poster Presentation Award at AVSTC - Thrilled to receive this prestigious joint recognition from the Australian Government, University of Technology Sydney (UTS), and PTIT for my research presentation, standing out among 100 participants! A fantastic validation of my research!
- 2024.11: New Role! Excited to announce that I've joined Huawei Hong Kong Research Center as an AI Research Engineer, where I'll be contributing to cutting-edge AIGC (AI-Generated Content) projects. Looking forward to this new journey!
- 2024.10: Paper Accepted! My first paper "VATEX" has been accepted to WACV 2025. Thrilled to share that this marks my debut as a co-first author in a top-tier computer vision conference!
Experience
- 2025.08 - Current: Research Assistant in Computer Vision, Fulbright University Vietnam
  - Conduct research on video diffusion models, focusing on the physical plausibility of video generation.
  - Host weekly meetings and research seminars for our CV group.
  - Assist junior students with capstone and thesis projects in computer vision.
- 2024.11 - 2025.04: AI Research Engineer at Huawei Hong Kong Research Center
  - Contributed to the MindONE project, focusing on multimodal and multitask generation models and optimizing them on the Ascend NPU.
- 2022.03 - 2022.07: Research Resident (Batch 7) at VinAI Research (now part of Qualcomm)
  - Conducted research on spatio-temporal tasks: video instance/panoptic segmentation and 4D point cloud panoptic segmentation.
- 2021.03 - 2021.11: Research Intern at VinBrain (since acquired by NVIDIA)
  - Worked on medical image processing and developed deep learning models for healthcare solutions.
Publications

Tuan-Anh Vu, Nguyen Truong Hai, Zheng Ziqiang, Binh-Son Hua, Qing Guo, Ivor Tsang, Sai-Kit Yeung
- TransCues introduces an efficient transformer-based segmentation architecture capable of handling transparent, reflective, and general objects. By proposing Boundary Feature Enhancement (BFE) and Reflection Feature Enhancement (RFE), we enable the model to better capture subtle details in both glass and non-glass regions, resulting in more accurate and robust segmentation.

Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung
- VATEX is a novel method for referring image segmentation that leverages vision-aware text features to improve text understanding. By decomposing language cues into object and context understanding, the model can better localize objects and interpret complex sentences, leading to significant performance gains.

SegTransVAE: Hybrid CNN - Transformer with Regularization for medical image segmentation
Hai Nguyen-Truong, Quan-Dung Pham, Nam Nguyen Phuong, Khoa NA Nguyen, Chanh DT Nguyen, Trung Bui, Steven QH Truong
- SegTransVAE is the first work to combine a hybrid CNN-Transformer architecture with a Variational Autoencoder (VAE) branch that reconstructs the input images jointly with segmentation.
Honors and Awards
Academic Excellence
- 2025 Best Poster Presentation Award at AVSTC - Joint recognition from Australian Government, University of Technology Sydney (UTS), and PTIT.
- 2024 UGC Research Travel Grant - HKUST Hong Kong
- 2022 - 2024 Postgraduate Scholarship (PGS) - HKUST Hong Kong
- 2022 Merit Award for Highest Distinction - President of Vietnam National University
- 2022 Third Prize, EURÉKA - Student Scientific Research Award, Vietnam
- 2021 Merit Award for Excellence in AI Research - Ho Chi Minh Government
- 2017, 2018, 2020, 2021 Odon Vallet Scholarship - Vietnam
Mathematical Achievement
- 2017 & 2018 Third Prize - Vietnamese Mathematics Olympiads
- 2015 & 2016 Gold Medal - April 30th Traditional Mathematics Olympiad
- 2016 Merit Award for Excellence - Minister of Education
- 2015 Merit Award - Ho Chi Minh Young Pioneer Organization
Challenges and Competitions
International Recognition
- 2023 & 2024 Runner-up - Maritime Computer Vision Workshop (WACV), Hawaii
- 2020 First Prize - MediaEval 2021 Sports Video Classification, Europe
National Achievements
- 2021 First Prize - Ho Chi Minh AI Challenge (Scene Text Recognition)
- 2021 Third Prize - AI4VN Hackathon (City Problem Classification)
- 2020 Runner-up - Zalo AI Challenge (Traffic Sign Detection)