
It's a normal day as an analyst at a social media company like Instagram or Rednote. You open your laptop to a million new images. Your task: figure out what content your users are uploading and what's trending. Usually, you sort these images by tagging them with known categories under predefined topics (Sports, Tech, Beauty...). But is that enough? What about the emerging topics you haven't thought of yet?
Meet X-Cluster. It automatically explores massive, unstructured image collections to discover meaningful, interpretable grouping criteria and organize the images for you: no predefined rules, no manual effort. It doesn't just sort the images; it uncovers new, hidden structure directly from the visual data. Just sit back, explore today's fresh insights, discover hidden opportunities, and stay ahead effortlessly.
Sounds great, right? Thanks!
Organizing unstructured visual data into semantic clusters is a key challenge in computer vision. Traditional deep clustering approaches produce a single partition of the data, while multiple clustering (MC) methods address this limitation by uncovering several distinct clustering solutions. The rise of large language models (LLMs) and multimodal LLMs has further enhanced MC by letting users specify clustering criteria in natural language. However, expecting users to manually define such criteria for large datasets, before they even understand the data, is impractical.
In this work, we introduce the task of Open-ended Semantic Multiple Clustering (OpenSMC), which aims to automatically discover clustering criteria from large, unstructured image collections and uncover interpretable substructures without requiring human input.
Our framework, X-Cluster: eXploratory Clustering, uses text as a proxy to concurrently reason over large image collections, discover partitioning criteria expressed in natural language, and reveal semantic substructures. To evaluate X-Cluster, we introduce the COCO-4c and Food-4c benchmarks, each containing four grouping criteria and ground-truth annotations. We apply X-Cluster to various real-world applications, such as discovering biases and analyzing social media image popularity, demonstrating its utility as a practical tool for organizing large, unstructured image collections and revealing novel insights. We will open-source our code and benchmarks for reproducibility and future research.
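To make the "text as a proxy" idea concrete, here is a minimal sketch of how such a pipeline could be wired together. Everything in it (the function names `caption_fn`, `propose_criteria_fn`, `assign_fn`, and the three-step flow) is an illustrative assumption, not the actual X-Cluster implementation; it only shows the general shape of captioning images, proposing criteria from the captions, and assigning images to named clusters per criterion.

```python
# A minimal, illustrative sketch of a "text as proxy" clustering pipeline.
# NOTE: function names, pipeline steps, and model choices here are assumptions
# for illustration only; they are NOT the actual X-Cluster implementation.

from collections import defaultdict
from typing import Callable


def discover_criteria_and_cluster(
    image_paths: list[str],
    caption_fn: Callable[[str], str],                      # e.g. a multimodal LLM describing one image
    propose_criteria_fn: Callable[[list[str]], list[str]], # e.g. an LLM proposing grouping criteria from captions
    assign_fn: Callable[[str, str], str],                  # e.g. an LLM naming a caption's cluster under a criterion
) -> dict[str, dict[str, list[str]]]:
    """Return a mapping {criterion: {cluster_name: [image_path, ...]}}."""
    # 1) Use text as a proxy for the images: describe every image once.
    captions = {path: caption_fn(path) for path in image_paths}

    # 2) Reason over the captions to propose partitioning criteria in natural
    #    language (e.g. "activity", "location"), with no predefined taxonomy.
    criteria = propose_criteria_fn(list(captions.values()))

    # 3) For each discovered criterion, assign every image to a named cluster.
    results: dict[str, dict[str, list[str]]] = {}
    for criterion in criteria:
        clusters: dict[str, list[str]] = defaultdict(list)
        for path, caption in captions.items():
            clusters[assign_fn(caption, criterion)].append(path)
        results[criterion] = dict(clusters)
    return results
```

In practice the three callables would wrap (multimodal) LLM calls; the sketch only captures the overall flow from raw images to multiple, criterion-specific, interpretable groupings.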