Leaderboard for Task 1 (TBD)
Rank | Team | Score |
1 | ? | ? |

Leaderboard for Task 2 (TBD)
Rank | Team | Score |
1 | ? | ? |
The Responsible Multimodal AI Challenge aims to foster advancements in the development of reliable and trustworthy multimodal AI systems by addressing two crucial tasks: i) multimodal hallucination detection and ii) multimodal factuality detection. These tasks are designed to highlight the challenges of, and encourage innovative solutions for, mitigating critical risks associated with generative multimodal AI. Task 1, Multimodal Hallucination Detection, focuses on identifying hallucinated content in AI-generated captions for images. Participants will analyze captions to detect objects, attributes, or relationships that are fabricated or unsupported by the visual input. Task 2, Multimodal Factuality Detection, emphasizes verifying the factuality of textual claims using both visual and contextual textual information. Participants will assess the factuality of claims in real-world scenarios. By addressing these tasks, the challenge seeks to promote the development of robust evaluation methodologies and algorithms that mitigate risks such as misinformation, bias, and errors in multimodal systems.
The challenge contains two subtasks.
Task 1: Multimodal Hallucination Detection.
The goal of this task is to detect hallucinated information in captions generated by multimodal AI systems. Participants are provided with an image and a caption generated by an AI system describing the image. Along with this, a predefined list of options is provided, where each option represents an object, attribute, or relationship mentioned in the caption. Participants must analyze the generated caption to determine which options in the list are hallucinated, meaning they are not supported or justified by the content of the given image. These hallucinations might include objects that do not appear in the image, attributes that do not match the visual characteristics, or relationships between objects that are incorrectly described. The task is treated as a multi-label classification problem, where multiple options in the list can simultaneously be hallucinations or non-hallucinations.
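To make the input/output structure concrete, a single Task 1 instance might look roughly like the sketch below; the field names and values are illustrative assumptions, not the official data schema.

```python
# Illustrative Task 1 instance; field names and values are assumptions, not the released schema.
task1_instance = {
    "image": "images/0001.jpg",  # input image
    "caption": "A red car is parked under a tree while a dog sleeps on its roof.",
    "options": [
        "red car",                 # object + attribute mentioned in the caption
        "tree",                    # object mentioned in the caption
        "dog sleeps on the roof",  # relationship mentioned in the caption
    ],
    # Multi-label target: one flag per option, 1 = hallucinated, 0 = supported by the image.
    "labels": [0, 0, 1],
}
```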
Task 2: Multimodal Factuality Detection.
This task focuses on verifying the factual accuracy of claims by analyzing multimodal inputs. Participants are given:
• A claim in textual form, which can be a news headline, a sentence from an article, or a social media post.
• An accompanying image related to the claim.
• Additional context in textual form, such as the full text of the news article, a related social media discussion, or other supplementary information.
Participants must determine the factual accuracy of the claim based on all the provided inputs. They need to assign one of four possible labels to the claim: "True", "False", "Partially True", or "Not Verifiable". This task is treated as a four-class classification problem, requiring participants to consider both the visual and textual evidence to assess the claim's factuality comprehensively.
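Analogously, a Task 2 instance could be represented as below; again, the field names and the example content are purely illustrative assumptions rather than the released format.

```python
# Illustrative Task 2 instance; field names and content are assumptions, not the released schema.
FACTUALITY_LABELS = ["True", "False", "Partially True", "Not Verifiable"]

task2_instance = {
    "claim": "City officials announced the bridge reopened to traffic yesterday.",
    "image": "images/bridge_reopening.jpg",  # accompanying image related to the claim
    "context": "Full text of the news article or a related social media discussion.",
    "label": "Partially True",               # one of the four factuality classes above
}
```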
The F1 score is computed using precision (P) and recall (R), which are calculated as follows:
P = TP / ( TP + FP )
R = TP / ( TP + FN )
F1 = 2 * P * R / ( P + R )
where TP, FP, and FN denote the true positives, false positives, and false negatives of the confusion matrix. In particular, when computing the micro-F1 score, TP corresponds to the number of predicted tuples that match exactly with those in the gold set.
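A minimal sketch of this micro-F1 computation, assuming predictions and gold annotations are compared as exact-match tuples pooled over the whole test set:

```python
def micro_f1(predicted, gold):
    """Micro-averaged F1 over all predicted vs. gold items, with exact matches counted as TP."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)   # predicted tuples that exactly match a gold tuple
    fp = len(pred_set - gold_set)   # predicted tuples with no exact match in the gold set
    fn = len(gold_set - pred_set)   # gold tuples the system failed to predict
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Hypothetical example: gold hallucinated options vs. a system's predictions.
gold = [("img_0001", "dog sleeps on the roof")]
pred = [("img_0001", "dog sleeps on the roof"), ("img_0001", "tree")]
print(micro_f1(pred, gold))  # P = 0.5, R = 1.0, so F1 ≈ 0.667
```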
We will use synthetic and real-world multimodal datasets for these tasks, with balanced distributions of various scenarios. The datasets cover diverse scenarios such as indoor, outdoor, social, and news contexts to ensure robust evaluation. Tables I and II report the dataset statistics.

Table I. Dataset statistics for Task 1.
Table II. Dataset statistics for Task 2.
To participate in the challenge, please first register by submitting the form.
Please submit your models and predicted results (as a JSON file named "results.json"), zipped into a single file. Participants should submit by emailing the file to kai.lk@u.nus.edu. We will review the submissions and publish the ranking here.
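The exact layout of "results.json" is not specified on this page; the sketch below only illustrates one plausible way to write predictions, with assumed field names and IDs, and should be adapted to whatever format is released with the data.

```python
import json

# Hypothetical predictions; field names and IDs are assumptions, not an official format.
predictions = [
    {"id": "task1_0001", "hallucinated_options": [2]},  # Task 1: indices of hallucinated options
    {"id": "task2_0001", "label": "Partially True"},     # Task 2: one of the four factuality labels
]

with open("results.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)
```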
Please note: the submission deadline is 11:59 p.m. (Anywhere on Earth) on the stated deadline date.
Registration Opens | February 01, 2025 |
Training Data Release | February 28, 2025 |
Challenge Result Submission Deadline | April 1, 2025 |
Leaderboard Release | April 10, 2025 |
Challenge Paper Submission Deadline | April 30, 2025 |
Kai Liu (National University of Singapore, Singapore)
Xudong Han (University of Sussex, UK)
Yanlin Li (National University of Singapore, Singapore)
Hao Li (Wuhan University, China)
Zheng Wang (Wuhan University, China)