Estonian First Encounter
Description
## Introduction
Within the project MINT (Multimodal Interaction – intercultural and technological aspects of video data collection, analysis, and use) we have collected a corpus of Estonian First Encounter dialogs. The goals of the MINT project are:
1. to create an Estonian multimodal video corpus covering various conversational activities,
2. to provide analysis and annotation of the data that contributes to the previous work on annotation standards, guidelines, and schemes,
3. to study multimodal signals, especially gesturing, in social communication and conversation management, and as indicators of the interlocutors’ engagement and synchrony in communicative activity,
4. to build (computational) models for coordinating and controlling interaction (e.g. taking turns, giving feedback) and for constructing shared understanding, and
5. to investigate techniques and means for automatic recognition of multimodal signals, especially gestures.
This project aims to identify the communicative behaviors that matter in a particular communicative situation in social interaction, and to analyze the choice and use of means of communication within this context. The first encounter dialogs engage participants who do not know each other in advance in an activity where their task is to chat and get acquainted with each other.
## Data collection
The original data was collected in Estonian, and it is annotated and analyzed using an annotation scheme that is commensurable with the annotations used in NOMCO, a Nordic cooperation project. Before the recording, each participant was given a short presentation of the project and the goals of the data collection. Participants were also asked to sign a consent form (in Estonian) that grants permission for their video data to be used for research purposes and to be shown to third parties without further permission.
The participants entered the recording environment through doors at opposite ends of the video setup room. If they arrived early and needed to wait for their turn, care was taken that the partners did not see each other before meeting in the experiment room. They were asked to proceed to a line marked on the floor, which ensured that both participants were approximately in the middle of the video camera views.
Three cameras were used: one recording each of the two partners. We used Sony HDR-XR550V cameras with three external Sony ECM-HW2 wireless microphones. The microphones were paired with the cameras so that each camera had its own audio track. For the video recording, we chose the full HD quality mode, although it turned out that standard quality would have been good enough. The camera views were cut, edited, and merged in Sony Vegas Pro 11, then synchronised and integrated into a single video providing a mosaic view of the situation.
## Dataset statistics and organization
We have a total of 23 participants (12 male and 11 female), with ages ranging from 21 to 61 years. The participants are native speakers of Estonian, and they are students or university employees. Each participant took part in two encounters, i.e. with two different partners. The corpus contains 23 encounters, and each encounter is about 8 minutes long. The encounters are balanced by gender pairing: 8 female-female encounters, 7 female-male encounters, and 8 male-male encounters.
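The totals above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the dataset: the constant and function names are hypothetical, and the figures are copied from the paragraph above. Because each encounter is only "about" 8 minutes long, the duration it derives is a rough estimate.

```python
# Figures copied from the dataset description; all names here are illustrative.
PARTICIPANTS = 23
ENCOUNTERS_PER_PARTICIPANT = 2
PARTICIPANTS_PER_ENCOUNTER = 2
APPROX_MINUTES_PER_ENCOUNTER = 8  # "about 8 minutes", so totals are approximate

ENCOUNTERS_BY_PAIRING = {
    "female-female": 8,
    "female-male": 7,
    "male-male": 8,
}


def expected_encounters(participants: int, per_participant: int, per_encounter: int) -> int:
    """Number of encounters implied by how often each participant takes part.

    Every encounter fills `per_encounter` participant slots, so the number of
    encounters is the total number of slots divided by that value.
    """
    total_slots = participants * per_participant
    if total_slots % per_encounter != 0:
        raise ValueError("participant slots do not divide evenly into encounters")
    return total_slots // per_encounter


implied_encounters = expected_encounters(
    PARTICIPANTS, ENCOUNTERS_PER_PARTICIPANT, PARTICIPANTS_PER_ENCOUNTER
)
total_encounters = sum(ENCOUNTERS_BY_PAIRING.values())
approx_total_minutes = total_encounters * APPROX_MINUTES_PER_ENCOUNTER

print(f"Encounters implied by participation: {implied_encounters}")
print(f"Encounters summed over pairings:     {total_encounters}")
print(f"Approximate total duration (min):    {approx_total_minutes}")
```

Both counts come out at 23 encounters, matching the stated corpus size, and the estimated total recording time is roughly three hours.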
## Contents and annotations
The dataset provides recordings in two basic modalities, audio and video, which are annotated in three layers:
* The transcriptions are provided in Estonian, together with English translations.
* The laughter annotations mark the laughing events of each speaker.
* The topic annotations, in English, specify the topic under discussion over the course of the conversation.
For more information about the structure of the dataset, see the Readme.txt included in the dataset.
Year of publication
2019
Authors
University of Helsinki - Publisher
Other information
Fields of science
Computer and information sciences; HUMANITIES; Languages
Language
English, Estonian
Open access
Permit required