Hello again! We've made some great progress with the new summer dataset and we think it might be useful as an out-of-sample test case for some of our other work. In fact, I'm writing up the results and I want to make sure I describe your experiments accurately.
Do you mind if I ask a few clarifying questions? We don't have any concerns about the data, but we want to include some basic descriptive statistics of the population in our write-up. People will be curious and I want to get it right! Your documentation has been useful so far, but I am still wondering about a few items:
- What would be the right way to cite you all? Should we point to the repository? The 2018 ACL paper (Zhang et al.) doesn't cover the newer data; is there an update in the works?
- The crowd workers on Yandex.Toloka: is this similar to Mechanical Turk? Were they paid? Did you use any attention checks, worker qualifications, etc. (English proficiency?) to select people?
- Similarly, what were the humans' instructions? Were they asked to chat for a set amount of time, or a set number of turns? Were they incentivized at all for their responses, or for finishing?
- The new single-question evaluation (on a five-point scale) is great, but do you have the word-for-word text of the question participants were asked?
- The last two items are "wants," not "needs"... You didn't collect timestamps for each turn in the transcripts, did you? Likewise, do you know which bot comes from which team? It's clear there are a finite number of bots having several conversations: some bots seem to always open with a very specific line (e.g. "i am a little tired from work"), while others break down in a consistent way (e.g. search for "Traceback (most recent call last):"), which we scrubbed. We'd be curious to map each bot back to its team. A rough sketch of both heuristics is below.
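
For concreteness, here's a minimal sketch of the two heuristics we mean, assuming a simplified, illustrative transcript format (each dialog as a list of `(speaker, text)` turns; this is not your actual schema):

```python
from collections import defaultdict

# Illustrative heuristics only; the dialog format here is assumed,
# not the dataset's real schema.

def group_by_opening_line(dialogs):
    """Cluster dialogs by the bot's first utterance, since some bots
    appear to open with a fixed line."""
    groups = defaultdict(list)
    for dialog in dialogs:
        bot_turns = [text for speaker, text in dialog if speaker == "bot"]
        if bot_turns:
            groups[bot_turns[0].strip().lower()].append(dialog)
    return groups

def has_traceback(dialog):
    """Flag dialogs where a bot crashed and leaked a Python stack trace."""
    return any("Traceback (most recent call last):" in text
               for _speaker, text in dialog)
```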