MLiu - Mens et Manus

Email
mingxuan.liu@unitn.it
Location

Trento, Trentino-Alto Adige, Italy

About me

Ciao! I am a final-year PhD student in Deep Learning and Computer Vision at the University of Trento, under the joint supervision of Prof. Elisa RICCI and Prof. Zhun ZHONG.

Recently, I completed my research visit at the University of California, Los Angeles (UCLA), where I worked on automatic urban simulation scene creation from city-tour videos under the guidance of Prof. Bolei ZHOU.

From June 2023 to December 2024, I also spent 18 wonderful months as a Visiting Researcher at NAVER LABS Europe, exploring open-vocabulary object detection under the supervision of Gabriela CSURKA, Riccardo VOLPI, and Tyler L. HAYES, in the team led by Diane Larlus.

Before starting my PhD, I earned two Master’s degrees—Summa Cum Laude—from KTH Royal Institute of Technology (Sweden) in Intelligent Autonomous Systems and from the University of Trento (Italy) in Mechatronics Engineering. I also spent three years as an Innovation Engineer at SIEMENS Smart Infrastructure Division, designing IoT-based automation solutions.

🚨On the Job Market:🚨 I’ll be graduating in Spring 2026 — if you’re working on the future of practical open-world machines and multimodal foundation models, let’s chat!

The Research I Like & Do: Training open-world machines to see, understand, and reason about our chaotic visual, semantic, and physical world — Going Better and Wilder!

Knowledge Discovery

Uncovering structured and interpretable semantic and geometric knowledge from real-world visual data.
Open-vocabulary Recognition

Empowering users to freely define their vocabulary to localize, segment, and recognize objects of interest within image streams in a zero-shot workflow.
Vision and Language

Leveraging language as a medium to enhance vision tasks, delivering improved interpretability and interactivity in more real-world scenarios.
Urban Embodied AI Simulation

Parsing our chaotic 3D semantic world from city-tour videos and turning it into high-fidelity urban simulation worlds for embodied AI learning.

Things I'd like to share

[10/2025]: After nine wonderful and inspiring months at UCLA, our new work UrbanVerse — scaling urban simulation scenes for embodied AI — is out! Check out the project here!
[01/2025]: I'm excited to share that I joined the Zhou Lab at UCLA as a visiting researcher, where I'll have the privilege of being advised by Prof. Bolei ZHOU.
[10/2025]: Honored to be selected as Outstanding Reviewer for CVPR 2025!
[07/2024]: I received $5,000 in funding (as API credits) from OpenAI to support my research!
[05/2024]: One paper on incremental novel class discovery with large-scale pre-trained models is accepted as an Oral paper at ICPR 2024.
[05/2024]: Filed my first US Patent: "A Method for Using Semantic Hierarchy Trees to Increase the Robustness of Open-vocabulary Object Detection Models"!
[02/2024]: One paper on open-vocabulary object detection with semantic hierarchy (work done with NAVER LABS Europe) is accepted as a Highlight paper (2.8% acceptance rate) at CVPR 2024! Thanks to the team!
[01/2024]: One paper on discovering fine-grained semantic concepts with LLMs is accepted to ICLR 2024! See you in Vienna this May!

Selected Publications

UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos

Mingxuan Liu*, Honglin He*, Elisa Ricci, Wayne Wu, Bolei Zhou

Technical report, arXiv:2510.15018, 2025

Project Page Paper

Organizing Unstructured Image Collections using Natural Language

Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci

Technical report, arXiv:2410.05217, 2025

Project Page Paper

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu, Tyler L. Hayes, Gabriela Csurka, Elisa Ricci, Riccardo Volpi

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, Highlight paper, 2.8% acceptance rate), 2024

Paper Code

Democratizing Fine-grained Visual Recognition with Large Language Models

Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

International Conference on Learning Representations (ICLR), 2024

Paper Project Page Code Poster

Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery

Mingxuan Liu, Subhankar Roy, Zhun Zhong, Nicu Sebe, Elisa Ricci

International Conference on Pattern Recognition (ICPR, Oral paper), 2024

Paper Code

Class-incremental Novel Class Discovery

Subhankar Roy*, Mingxuan Liu*, Zhun Zhong, Nicu Sebe, Elisa Ricci

* denotes co-first authorship

European Conference on Computer Vision (ECCV), 2022

Paper Code Poster

Siemens RWG Control Platform Advanced Course and Practice

Jiaxin Han, Huixia Zhao, Kaixuan Zhang, Mingxuan Liu (Deputy Editor-in-Chief & Author)

China Electric Power Press (CEPP), 2022

ISBN: 9787519859947

Book

Tutorial book on an IoT-based building automation control platform, covering Programmable Logic Controllers (PLC), the Internet of Things (IoT), and cloud-based Software-as-a-Service (SaaS). Work carried out at SIEMENS.

Siemens Desigo CC Building Management System Software and Practice

Huixia Zhao, Jiaxin Han, Kaixuan Zhang, Chao Wang, Lin Feng, Jianqiao Feng, Jian Li, Mingxuan Liu (Author)

China Electric Power Press (CEPP), 2022

ISBN: 9787519853341

Book

Tutorial book about building automation management software (Siemens Desigo CC). Work carried out at SIEMENS.

The V-SLAM Hurdler: A Faster V-SLAM System using Online Semantic Dynamic-and-Hardness-aware Approximation

Mingxuan Liu

Digitala Vetenskapliga Arkivet (DiVA, Master Thesis), 2022

Work done at Ericsson Lund, Sweden.

Paper Code (owned by Ericsson, not available)

Project

ORB-SLAM3 Deployment on the Autonomous Underwater Vehicle (AUV) SAM in Simulation and the Real World

Mingxuan Liu

Work carried out at KTH Royal Institute of Technology, supervised by Prof. John Folkesson, 2021

Code

Hot Steel Plate Tracking and Rotation Angle Detection

Mingxuan Liu

Work carried out at the Technical University of Munich summer school for SMS Group, 2020

Code

Mini Cheetah Robotic Leg Design, and Kinematic and Dynamic Simulation

Mingxuan Liu

Work carried out at the University of Trento, supervised by Prof. Francesco Biral, 2020

Code

Virtual RGB-D Camera Unity Implementation for Point Cloud Generation

Mingxuan Liu

Work carried out at the University of Trento, 2020

Code

Community Service

ICLR: Reviewer'2025, 2026
NeurIPS: Reviewer'2024
ICML: Reviewer'2025
CVPR: Reviewer'2024, 2025
ICCV: Reviewer'2025
ECCV: Reviewer'2024
IJCV: Reviewer'2024

Resume

Education

University of Trento
Nov. 2022 — Present Trento, Trentino-Alto Adige, Italy
PhD student in Deep Learning and Computer Vision

Advisors: Elisa RICCI and Zhun ZHONG
University of California, Los Angeles
Jan. 2025 — Oct. 2025 Los Angeles, California, USA
Visiting Researcher on automatic urban simulation scene creation from city-tour videos

Advisor: Bolei ZHOU
KTH Royal Institute of Technology
Aug. 2021 — Jul. 2022 Stockholm, Sweden
Master's degree in Intelligent Autonomous Systems

Grade: A
University of Trento
Sep. 2020 — Aug. 2021 Trento, Trentino-Alto Adige, Italy
Master's degree in Mechatronics Engineering

Grade: 110L/110, Summa Cum Laude

Working Experience

NAVER LABS Europe
Jun. 2023 — Dec. 2024 Grenoble, Auvergne-Rhône-Alpes, France
Position: Visiting Researcher in the Visual Representation Learning Team

Advisors: Gabriela CSURKA, Riccardo VOLPI, and Tyler L. HAYES

Responsibilities:

(1) Improving the handling of novel classes in open-vocabulary and vocabulary-free object detection;

(2) Benchmarking Layout2Image diffusion models.
Ericsson
Jan. 2022 — Jun. 2022 Lund, Sweden
Position: Master Thesis Intern at the Department of Device Software Research

Topic: The V-SLAM Hurdler: A Faster V-SLAM System using Online Semantic Dynamic-and-Hardness-aware Approximation

(1) Developed approximate spatial computing for Simultaneous Localization and Mapping (SLAM) algorithms on AR/MR devices

(2) Investigated quantization methods for CNN-based object detection (YOLOv4) and instance segmentation (Mask R-CNN) algorithms

(3) Investigated the cloud-device cooperation mechanism of distributed Semantic SLAM based on approximate computing techniques
Siemens Co., Ltd.
Jul. 2017 – Jul. 2020 Beijing, China
Position: Innovation Engineer at Department of Innovation, Smart Infrastructure Division

Responsibilities:

(1) Conducted innovative application research, market analysis, and competitive product analysis in the building automation industry with Internet-of-Things (IoT) technology

(2) Designed software and hardware solutions for IoT-based applications and demonstrated them through zero-to-one product definition and development

(3) Defined features and functions for the innovative products based on the new use cases

(4) Cooperated with the internal R&D and production departments and third-party partners (OEM manufacturers, value-added partners, system integrators) to develop and deploy innovative products

(5) Managed and deployed pilot projects of the innovative IoT-based products

Patent

A Method for Using Semantic Hierarchy Trees to Increase the Robustness of OvOD Models
Mar. 2024 US Patent App. Status: Filed; under processing
Mingxuan Liu, Tyler L. Hayes, Gabriela Csurka, Elisa Ricci, Riccardo Volpi

Publication

UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Technical report, arXiv:2510.15018, 2025.
Mingxuan Liu, Honglin He, Elisa Ricci, Wayne Wu, Bolei Zhou
Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding
WACV, Early acceptance paper, Top 6.4%, 2026.
Jun Li, Che Liu, Wenjia Bai, Mingxuan Liu, Rossella Arcucci, Cosmin I. Bercea, Julia A. Schnabel, Elisa Ricci
Superpowering Open-Vocabulary Object Detectors for X-ray Vision using Large Language Models
ICCV, 2025.
Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu, Feng Xue, Daniel Cores, Nicu Sebe, Manuel Mucientes, Elisa Ricci
Organizing Unstructured Image Collections using Natural Language
Technical report, arXiv:2410.05217, 2025.
Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci
Test-time Vocabulary Adaptation for Language-driven Object Detection
ICIP, 2025.
Mingxuan Liu, Tyler L. Hayes, Massimiliano Mancini, Gabriela Csurka, Elisa Ricci, Riccardo Volpi
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
CVPR, Highlight paper, 2.8% acceptance rate, 2024.
Mingxuan Liu, Tyler L. Hayes, Gabriela Csurka, Elisa Ricci, Riccardo Volpi
Democratizing Fine-grained Visual Recognition with Large Language Models
ICLR, 2024.
Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci
Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery
ICPR, Oral paper, 2024.
Mingxuan Liu, Subhankar Roy, Zhun Zhong, Nicu Sebe, Elisa Ricci
Class-incremental Novel Class Discovery
ECCV, 2022.
Subhankar Roy*, Mingxuan Liu*, Zhun Zhong, Nicu Sebe, Elisa Ricci

* denotes co-first authorship
The V-SLAM Hurdler: A Faster V-SLAM System using Online Semantic Dynamic-and-Hardness-aware Approximation
DiVA, Master Thesis, 2022.
Mingxuan Liu
Siemens RWG Control Platform Advanced Course and Practice
China Electric Power Press (CEPP), 2022. ISBN: 9787519859947
Jiaxin Han, Huixia Zhao, Kaixuan Zhang, Mingxuan Liu (Deputy Editor-in-Chief & Author)
Siemens Desigo CC Building Management System Software and Practice
China Electric Power Press (CEPP), 2022. ISBN: 9787519853341
Huixia Zhao, Jiaxin Han, Kaixuan Zhang, Chao Wang, Lin Feng, Jianqiao Feng, Jian Li, Mingxuan Liu (Author)

Grant

OpenAI Researcher Access Program
Feb. 2024 Awarded $5,000 API credits
Role: Principal Investigator
Italian SuperComputing Resource Allocation – ISCRA
Jan. 2022 Awarded $30,000 (8,000 Nvidia DGX GPU Hours). Code: HP10C58YK9
Role: Principal Investigator

Technical Skill

Computer Vision: 90%
Deep Learning: 80%
Robotics: 70%
Natural Language Processing: 50%
Simulation (Kinematic, Dynamic, Sensor, Mobile Agents): 60%
Automatic Control: 65%

Language Skill

Beijing Dialect: 99%
Chinese: 90%
English: 80%
Italian: 5%

Hobby

Bodybuilding: 80%
Reading: 85%
Playing Piano: 40%
Hiking: 90%
Cooking: 95%

Publication

UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos

Mingxuan Liu, Honglin He, Elisa Ricci, Wayne Wu, Bolei Zhou

Technical report, arXiv:2510.15018, 2025

Project Page Paper
Knowledge to Sight: Reasoning over Visual Attributes via Knowledge Decomposition for Abnormality Grounding

Jun Li, Che Liu, Wenjia Bai, Mingxuan Liu, Rossella Arcucci, Cosmin I. Bercea, Julia A. Schnabel

The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV, Early acceptance paper, Top 6.4%), 2026

Project Page Paper
Superpowering Open-Vocabulary Object Detectors for X-ray Vision using Large Language Models

Pablo Garcia-Fernandez, Lorenzo Vaquero, Mingxuan Liu, Feng Xue, Daniel Cores, Nicu Sebe, Manuel Mucientes, Elisa Ricci

The International Conference on Computer Vision (ICCV), 2025

Project Page Paper
Organizing Unstructured Image Collections using Natural Language

Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci

Technical report, arXiv:2410.05217, 2025

Project Page Paper
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu, Tyler L. Hayes, Gabriela Csurka, Elisa Ricci, Riccardo Volpi

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, Highlight paper, 2.8% acceptance rate), 2024

Paper Code
Democratizing Fine-grained Visual Recognition with Large Language Models

Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

International Conference on Learning Representations (ICLR), 2024

Paper Project Page Code Poster
Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery

Mingxuan Liu, Subhankar Roy, Zhun Zhong, Nicu Sebe, Elisa Ricci

International Conference on Pattern Recognition (ICPR, Oral paper), 2024

Paper Code
Class-incremental Novel Class Discovery

Subhankar Roy*, Mingxuan Liu*, Zhun Zhong, Nicu Sebe, Elisa Ricci

* denotes co-first authorship

European Conference on Computer Vision (ECCV), 2022

Paper Code Poster
Siemens RWG Control Platform Advanced Course and Practice

Jiaxin Han, Huixia Zhao, Kaixuan Zhang, Mingxuan Liu (Deputy Editor-in-Chief & Author)

China Electric Power Press (CEPP), 2022

ISBN: 9787519859947

Book

Tutorial book on an IoT-based building automation control platform, covering Programmable Logic Controllers (PLC), the Internet of Things (IoT), and cloud-based Software-as-a-Service (SaaS). Work carried out at SIEMENS.
Siemens Desigo CC Building Management System Software and Practice

Huixia Zhao, Jiaxin Han, Kaixuan Zhang, Chao Wang, Lin Feng, Jianqiao Feng, Jian Li, Mingxuan Liu (Author)

China Electric Power Press (CEPP), 2022

ISBN: 9787519853341

Book

Tutorial book about building automation management software (Siemens Desigo CC). Work carried out at SIEMENS.
The V-SLAM Hurdler: A Faster V-SLAM System using Online Semantic Dynamic-and-Hardness-aware Approximation

Mingxuan Liu

Digitala Vetenskapliga Arkivet (DiVA, Master Thesis), 2022

Work done at Ericsson Lund, Sweden.

Paper Code (owned by Ericsson, not available)

Fun Cards

Where am I from?

I grew up in Beijing and lived there for 26 years. Funny thing: I've never been to the Great Wall :P
Why Trento?

Come for the mountain, stay for the people.
What are my sports?

Bodybuilding and basketball! I love bodybuilding and the feeling of precise control over my muscles. It is one of the few things where, through scientific discipline, you get exactly what you work for. I was a semi-professional bodybuilder during my bachelor's studies. In fact, I earned most of my snack money (aka dining-out and clubbing money) back then by coaching online and selling fitness plans.
What championships have I won?

When I was in elementary school, I won first place in the Beijing Fangshan District Toy 4WD Competition and went on to represent Fangshan District in the national Toy 4WD Competition.
What is my favorite cartoon?

Crayon Shin-chan, period.
Who is my favorite rapper?

2Pac is my spirit, Eminem is my life.
