Research & Publications

Exploring the intersection of Computer Vision, Multimodal Understanding, and Generative AI.

WACV 2026
TransCues

Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues

Tuan-Anh Vu, Nguyen Truong Hai, Zheng Ziqiang, Binh-Son Hua, Qing Guo, Ivor Tsang, Sai-Kit Yeung

TransCues introduces an efficient transformer-based segmentation architecture capable of handling transparent, reflective, and general objects. Its two proposed components, Boundary Feature Enhancement (BFE) and Reflection Feature Enhancement (RFE), help the model capture subtle details in both glass and non-glass regions, resulting in more accurate and robust segmentation.

Segmentation, Transformer, Transparent Objects

WACV 2025
VATEX

Vision-Aware Text Features in Referring Expression Segmentation: From Object Understanding to Context Understanding

Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

VATEX is a novel method for referring image segmentation that leverages vision-aware text features to improve text understanding. By decomposing language cues into object and context understanding, the model can better localize objects and interpret complex sentences, leading to significant performance gains.

Segmentation, Referring Expression, Multimodal

ISBI 2022
SegTransVAE

SegTransVAE: Hybrid CNN - Transformer with Regularization for medical image segmentation

Hai Nguyen-Truong, Quan-Dung Pham, Nam Nguyen Phuong, Khoa NA Nguyen, Chanh DT Nguyen, Trung Bui, Steven QH Truong

SegTransVAE is the first work to combine a hybrid CNN-Transformer architecture with a Variational Autoencoder (VAE) branch, which reconstructs the input images jointly with the segmentation task.

Medical Imaging, Segmentation, Transformer, VAE