The Third Run of the Automatic Minuting Shared Task (AutoMin) will be part of the SIGDIAL 2025 conference. We propose AutoMin 2025, the third edition of the biennial shared task on summarizing meetings into structured meeting minutes. We build upon our experience from the two previous AutoMins and add a new challenge to facilitate personalized access: answering questions about meeting content. As in the previous editions, the summarization task is run in English and Czech, while the new question-answering task will be run monolingually (English-only) and cross-lingually (Czech questions about English meetings). This challenge is highly relevant for assessing LLMs' ability to summarize and retrieve information from long contexts, equivalent in length to an hour-long meeting. A dedicated workshop is scheduled for August 28th, 2025, as part of SIGDIAL 2025.
Challenge Website: https://ufal.github.io/automin-2025/
We propose two main tasks; you may participate in either one (A or B) or in both.
Event | Date |
---|---|
Registration and CfP | From March 1st, 2025 |
Training and Dev Data Available | March 1st, 2025 |
Test Data Available | April 15th, 2025 |
System Run and Output Submission | April 29th, 2025 |
System Paper Submission | May 5th, 2025 |
Result Announcement | May 22nd, 2025 |
Notification for System Papers | June 2025 (TBA) |
Camera Ready for System Papers | TBA |
Workshop at SIGDIAL 2025 | August 28th, 2025 |
All participants should register using this Google form.
All teams will be required to submit a brief technical report describing their method. Details will be given on the challenge website soon.
The two main training data sources, covering the two domains of the test set, are:
For the ELITR Minuting Corpus, please use only the train and dev sets for training; do not use test or test2.
Aside from this, we recommend the following datasets for use in training, although their domains do not match ours:
In any case, please clearly describe which data was used in what way in your system paper. A comprehensive list of summarization datasets can be found here:
The data sources for Task B are:
Please use only the dev set for training and model selection; do not use test2. The ELITR-Bench repository describes the data, its JSON file structure, and how to run the evaluation code in detail. The data for the monolingual and cross-lingual tasks correspond to the data.zip and czech.zip archives, respectively, both of which can be opened with the password "utter".
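As a minimal sketch of unpacking the archives with Python's standard library (assuming they use classic ZipCrypto encryption, which `zipfile` supports; AES-encrypted archives would instead need an external tool such as 7-Zip):

```python
import zipfile

def extract_archive(archive_path, out_dir, password=None):
    """Extract a zip archive (optionally password-protected) and return its member names."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(out_dir, pwd=password.encode() if password else None)
        return zf.namelist()

# Illustrative usage for the Task B data (path and password as given above):
# extract_archive("data.zip", "elitr_bench_data", password="utter")
```

The returned member list is convenient for locating the JSON files described in the ELITR-Bench repository.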
We will use several quality criteria common in the evaluation of text produced by automatic language generation systems: adequacy, readability, grammaticality, and relevance. Unlike other similar tasks, textual coherence will not be taken into account, because we believe meeting minutes are not always supposed to have a coherent textual form. The manual evaluation will be carried out blindly by our annotators.
Additionally, we will launch a pilot evaluation via our ALIGNMEET tool. The evaluation will be based on the alignment between the transcript and minutes.
ROUGE will be the primary metric for automatic evaluation (ROUGE-1, ROUGE-2, ROUGE-L). Additionally, we will use BERTScore and/or BARTScore.
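To illustrate what the primary metric measures, here is a minimal, self-contained ROUGE-1 computation (unigram-overlap F1). This is only an explanatory sketch; official scoring will use a standard implementation rather than this simplified tokenization:

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference summary and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears in either side.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# e.g. rouge1_f1("the meeting discussed the budget", "the budget was discussed")
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 scheme over bigrams and longest common subsequences, respectively.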
For Task B, the evaluation will adhere to the methodology introduced in https://aclanthology.org/2025.coling-main.28/, employing large language models as automated judges to assess the quality of responses. Specifically, these models compare each system-generated answer to the human-crafted gold reference answer for the given query. Full details on Task B evaluation are available in the ELITR-Bench repository.
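The LLM-as-judge setup can be pictured as follows. The prompt wording below is purely an illustrative assumption, not the actual ELITR-Bench prompt; see the repository for the real evaluation code:

```python
def build_judge_prompt(question, gold_answer, system_answer):
    """Assemble an illustrative judging prompt that asks an LLM to compare
    a system-generated answer against the gold reference answer."""
    return (
        "You are evaluating answers to questions about a meeting.\n"
        f"Question: {question}\n"
        f"Reference answer: {gold_answer}\n"
        f"System answer: {system_answer}\n"
        "Rate how well the system answer matches the reference on a 1-10 scale, "
        "then briefly justify the score."
    )
```

The resulting prompt would be sent to the judge model, and the returned score aggregated over all queries.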
For further information about this task and dataset, please contact us at automin@ufal.mff.cuni.cz.