News
- 2025 May: Call for Papers and Submission Guidelines are now available.
- 2025 March: Website Launched: The MA-LLM25 workshop website is now live.
- 2025 March: Workshop Accepted! Our workshop has been officially accepted at ACM MM 2025.
Gaining insight into large multimedia collections is crucial in many domains. The emergence of Multimodal Large Language Models (MLLMs) has brought an unprecedented boost in the accuracy and applicability of multimedia analysis. The primary way of interacting with these models, however, is still text-based prompting or conversational agents. Text-based interaction adds an intermediate layer that obfuscates the underlying data, making it a cumbersome way to gain insight from multimedia data.
Multimedia analytics, on the other hand, combines techniques from multimedia analysis, visualization, and data mining to extract insights from large-scale multimedia collections. The synergistic interaction between expert and machine is crucial in this process, as it allows for expert-driven extraction of rich and diverse insights. Visualizations can enable such interaction by presenting scalable views of datasets, ranging from high-level summaries of collections to individual data points, compact summaries of results, and possible navigation directions for exploration. Interactive visualizations combined with multimodal conversational agents have the potential to significantly widen the communication channel between humans and MLLMs, yielding far more effective ways of gaining insight from the data.
Realizing this potential raises a number of questions for Multimedia Analytics, such as: