The Third Run of the Automatic Minuting Shared Task (AutoMin) will be part of the SIGDIAL 2025 conference. We propose AutoMin 2025, the third edition of the biennial shared task on summarizing meetings into structured meeting minutes. We build upon our experience from the two previous AutoMins and add a new challenge to facilitate personalized access: answering questions about meeting content. As in the previous editions, the summarization task is run in English and Czech, while the new question-answering task will be run monolingually (English-only) and cross-lingually (Czech questions about English meetings). This challenge is highly relevant for assessing LLMs' ability to summarize and retrieve information from long contexts, equivalent in length to an hour-long meeting. A dedicated workshop is scheduled for August 28th, 2025, as part of SIGDIAL 2025.
Challenge Website: https://ufal.github.io/automin-2025/
We propose two main tasks; you may participate in either one (A or B) or in both.
Event | Date |
---|---|
Registration and CfP | From March 1st, 2025 |
Training and Dev Data Available | March 1st, 2025 |
Test Data Available | April 15th, 2025 |
System Run and Output Submission | April 29th, 2025 |
System Paper Submission | May 5th, 2025 |
Result Announcement | May 22nd, 2025 |
Notification for System Papers | June 2025 (TBA) |
Camera Ready for System Papers | TBA |
Workshop at SIGDIAL 2025 | August 28th, 2025 |
All participants should register using this Google form.
All teams will be required to submit a brief technical report describing their method. Details will be given on the challenge website soon.
The two main training data sources, covering the two domains of the test set, are:
For the ELITR Minuting Corpus, please use only the train and dev sets for training; do not use test or test2.
Aside from this, we recommend the following datasets for use in training, although their domains do not match ours:
In any case, please clearly describe which data was used in what way in your system paper. A comprehensive list of summarization datasets can be found here:
The data sources for Task B are:
Please use only the dev set for training and model selection; do not use test2. The ELITR-Bench repository describes the data, its JSON file structure, and how to run the evaluation code in detail. The data for the monolingual and cross-lingual tasks correspond to the data.zip and czech.zip archives, respectively, both of which can be opened with the password "utter".
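As a minimal sketch of unpacking the archives with Python's standard library (assuming they use classic ZipCrypto encryption, which `zipfile` supports; AES-encrypted archives would instead need an external tool such as 7-Zip):

```python
import zipfile

def extract_archive(archive_path, out_dir, password=None):
    """Extract a zip archive (optionally password-protected) and return its member names."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(out_dir, pwd=password.encode() if password else None)
        return zf.namelist()

# Illustrative usage for the Task B data (path and password as given above):
# extract_archive("data.zip", "elitr_bench_data", password="utter")
```

The returned member list is convenient for locating the JSON files described in the ELITR-Bench repository.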
We will use several quality criteria common in the evaluation of text produced by automatic language generation systems: adequacy, readability, grammaticality, and relevance. Unlike other similar tasks, textual coherence will not be taken into account, because we believe meeting minutes are not always supposed to have a coherent textual form. The manual evaluation will be carried out blindly by our annotators.
Additionally, we will launch a pilot evaluation via our ALIGNMEET tool. The evaluation will be based on the alignment between the transcript and minutes.
ROUGE will be the primary metric for automatic evaluation (ROUGE-1, ROUGE-2, ROUGE-L). Additionally, we will use BERTScore and/or BARTScore.
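To illustrate what the primary metric measures, here is a minimal, self-contained ROUGE-1 computation (unigram-overlap F1). This is only an explanatory sketch; official scoring will use a standard implementation rather than this simplified tokenization:

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference summary and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears in either side.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# e.g. rouge1_f1("the meeting discussed the budget", "the budget was discussed")
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 scheme over bigrams and longest common subsequences, respectively.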
For Task B, the evaluation will adhere to the methodology introduced in https://aclanthology.org/2025.coling-main.28/, employing large language models as automated judges to assess the quality of responses. Specifically, these models compare each system-generated answer to the human-crafted gold reference answer for the given query. Full details on Task B evaluation are available in the ELITR-Bench repository.
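The LLM-as-judge setup can be pictured as follows. The prompt wording below is purely an illustrative assumption, not the actual ELITR-Bench prompt; see the repository for the real evaluation code:

```python
def build_judge_prompt(question, gold_answer, system_answer):
    """Assemble an illustrative judging prompt that asks an LLM to compare
    a system-generated answer against the gold reference answer."""
    return (
        "You are evaluating answers to questions about a meeting.\n"
        f"Question: {question}\n"
        f"Reference answer: {gold_answer}\n"
        f"System answer: {system_answer}\n"
        "Rate how well the system answer matches the reference on a 1-10 scale, "
        "then briefly justify the score."
    )
```

The resulting prompt would be sent to the judge model, and the returned score aggregated over all queries.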
For further information about this task and dataset, please contact us at automin@ufal.mff.cuni.cz.