README.txt file for the GENEA Challenge 2020 data release
Last updated 2021-08-03


Dataset contents:
This release of the GENEA Challenge 2020 data (folder "GENEA_Challenge_2020_data_release") comprises the following files and subfolders:
* README.txt: This file.
* Training_data (folder):
	- Audio (folder): Recorded audio of a speaking and gesticulating actor in WAV format.
	- Motion (folder): Time-aligned 3D upper-body motion capture data from the actor in BVH format.
	- Transcripts (folder): Time-aligned text transcriptions of the audio files. For privacy reasons, the transcriptions do not include references to real persons by names; these have been replaced by tokens.
* Test_data (folder):
	- Audio (folder): Recorded actor audio in WAV format, similar to the training data.
	- Motion (folder): Time-aligned 3D upper-body actor motion capture, similar to the training data. This test-set motion was not revealed to challenge participants and should not be used to train or tune motion-generation systems!
	- Transcripts (folder): Time-aligned text transcriptions of the test audio, similar to the training data.
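
Example (illustrative sketch, not part of the official release tooling): one way to pair the audio and motion files of a folder is to match them by file stem. This README does not spell out the file naming scheme, so the shared-stem convention is an assumption, and the function name and file names below are hypothetical.

```python
from pathlib import PurePath


def match_by_stem(audio_names, motion_names):
    """Return the sorted stems that occur in both file listings.

    ASSUMPTION: matching recordings share a file stem across folders;
    the README does not state the naming scheme.
    """
    audio_stems = {PurePath(name).stem for name in audio_names}
    motion_stems = {PurePath(name).stem for name in motion_names}
    return sorted(audio_stems & motion_stems)


# Hypothetical file names, for illustration only.
audio = ["clip_a.wav", "clip_b.wav", "clip_c.wav"]
motion = ["clip_a.bvh", "clip_c.bvh"]
print(match_by_stem(audio, motion))  # prints ['clip_a', 'clip_c']
```

In practice the two listings would come from the Audio and Motion subfolders, for example via pathlib.Path(...).iterdir(). Stems that appear in only one listing are silently dropped, which makes missing counterparts easy to overlook, so it may be worth comparing the result against the full listings.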


Remarks:
Although matching audio and motion files are sometimes of different lengths, t=0 refers to the same instant in both files, and the two stay synchronised until the end of the shorter file. Note that BVH files from challenge entries and video stimuli visualising speech and motion are provided in the public GENEA Workshop 2020 proceedings at https://zenodo.org/communities/genea2020/ and not in this data release.
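
Example (illustrative sketch under stated assumptions): because both files of a pair start at t=0, the span over which audio and motion are both available ends with the shorter file. The helpers below estimate each duration and take the minimum. They assume that the WAV files are uncompressed PCM readable by Python's standard "wave" module, and that the BVH files carry the usual "Frames:" and "Frame Time:" lines in their MOTION section. All function names are hypothetical.

```python
import re
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: number of sample frames / sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def bvh_duration_seconds(bvh_text):
    """Duration of a BVH clip: frame count * frame time, read from its header."""
    frames = re.search(r"^\s*Frames:\s*(\d+)", bvh_text, re.MULTILINE)
    frame_time = re.search(r"^\s*Frame Time:\s*([0-9.eE+-]+)", bvh_text, re.MULTILINE)
    if frames is None or frame_time is None:
        raise ValueError("no 'Frames:' / 'Frame Time:' lines found in BVH text")
    return int(frames.group(1)) * float(frame_time.group(1))


def common_duration_seconds(audio_seconds, motion_seconds):
    """Both streams start at t=0, so the shared span ends with the shorter one."""
    return min(audio_seconds, motion_seconds)
```

A caller would read the BVH file as text, pass it to bvh_duration_seconds, and then discard audio samples or motion frames that fall after common_duration_seconds(...).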


Licensing:
Since the GENEA Challenge 2020 data builds on the Trinity Speech-Gesture Dataset by Ferstl and McDonnell, the data is covered by the license agreement of that database, which you already signed in order to access this material and this README.txt.

As part of the license agreement, work that uses this data should cite the paper below, which is associated with the Trinity Speech-Gesture Dataset:

@inproceedings{ferstl2018investigating,
  title={Investigating the use of recurrent motion modelling for speech gesture generation},
  author={Ferstl, Ylva and McDonnell, Rachel},
  booktitle = {IVA '18 Proceedings of the 18th International Conference on Intelligent Virtual Agents},
  year={2018},
  month = {Nov},
  url = {https://trinityspeechgesture.scss.tcd.ie},
  month_numeric = {11}
}

For work that uses the challenge data but appears after the GENEA Workshop 2020, the challenge organisers additionally request that such work cite the organisers' main publication on the GENEA Challenge 2020. This is currently our IUI paper:

@inproceedings{kucherenko2021large,
  author = {Kucherenko, Taras and Jonell, Patrik and Yoon, Youngwoo and Wolfert, Pieter and Henter, Gustav Eje},
  title = {A Large, Crowdsourced Evaluation of Gesture Generation Systems on Common Data: {T}he {GENEA} {C}hallenge 2020},
  year = {2021},
  isbn = {9781450380171},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3397481.3450692},
  doi = {10.1145/3397481.3450692},
  booktitle = {26th International Conference on Intelligent User Interfaces},
  pages = {11--21},
  numpages = {11},
  keywords = {evaluation paradigms, conversational agents, gesture generation},
  location = {College Station, TX, USA},
  series = {IUI '21}
}


Credits:
The GENEA Challenge 2020 uses the Trinity Speech-Gesture Dataset collected by Ylva Ferstl and Rachel McDonnell. That data was further post-processed by a team at KTH Royal Institute of Technology. First, Taras Kucherenko converted the FBX files with motion data into BVH format and converted all audio files to 24-bit mono. Then, Simon Alexanderson and Jonas Beskow corrected misalignments between the audio and the motion that dropped frames had caused in several files. Finally, Taras Kucherenko, with the assistance of Jonatan Lindgren, transcribed the dataset, and Pieter Wolfert at Ghent University replaced all names of real persons in the transcriptions with tokens.


Additional information:
* Other GENEA Challenge 2020 materials: https://svito-zar.github.io/GENEAchallenge2020/
* The GENEA Workshop 2020 website: https://genea-workshop.github.io/2020/
* Main contact address of the GENEA 2020 organisers: genea-contact@googlegroups.com
* The Trinity Speech-Gesture Dataset: https://trinityspeechgesture.scss.tcd.ie/