Friday, August 1, 2014

Some things I learned from running a big Mechanical Turk study

I'm not a big crowd researcher, but Mechanical Turk can be a great platform. The key words are "can be". It sounds great: pay a thousand people a dollar to do your survey, and for $1k, you have a huge amount of data overnight! But it's not really that simple. Here are some things I've learned. (there are a lot.)

Human/study design:
  • Read this. These folks have worked and researched on Turk a lot longer than I have. http://wiki.wearedynamo.org/index.php/Guidelines_for_Academic_Requesters
  • Pay people enough. This is maybe the #1 thing I hear on Turk forums, and the #1 piece of advice I can give to make the whole thing a good experience. These are people doing work. It's not just screwing around for fun. HITs are hard. Turking is hard. See also: Jeff Bigham's experiences Turking for a day.
  • $8/hour is a starting point. You're not getting by on $2/hour like you may have thought you could back in the early days. Or, if you're getting by on $2/hour, you are paying people sweatshop wages (and by the way, probably getting sweatshop level work). Why $8? People want to make about US minimum wage. It's a nice round benchmark, at least. It's still way cheaper than you could get it done any other way.
  • Pay more if you can. The US minimum wage is (adjusted for inflation) historically low. $8/hour is in no way a living wage. (also, it assumes that Turkers spend no time searching for tasks or any other overhead. Seattle recently voted to raise their wage to $15/hour. Maybe you can too. Related: Dynamo's Fair Payment page.
  • Turkers are good people. At least, the ones that you get for $8/hr are. They are not, for the most part, trying to scam you. Maybe 2% are. Accept that as a cost of doing business (that's what, 2 extra precious dollars?) and don't get too defensive about your task. 
  • 98%/1000 is a good threshold. That is, require turkers to have 98% acceptance and 1000 HITs completed. When we tried 95%, we got a few more rejects (though not dramatically more). When we restricted it to 5000 HITs completed, there were only about 300 qualified people who would do our task.
  • ACs (attention checks) are tough to get right. In our survey, we included three simple math questions: "what is two plus three?" etc. But then, even these are not perfect indicators of whether people are paying attention. We had 7-point likert scale answer options, so people would go try to click 5, and just miss and click 6 by mistake (or they'd be using a touchscreen or something). Also, dashing 15 minutes of work just because of one question seemed pretty cruel. I started accepting people if they got our ACs wrong, so long as they only missed one question and were just off by one. Relatedly:
  • Spell out exactly what will make you reject people. Our HIT had a list: "You will be rejected if..." This makes it much easier to deal with somewhat-angry Turkers who write to you. Many times, Turkers will get really mad if you reject them for something you didn't warn them about. I think that's a feature of mturk, not a bug.
  • Verifying they did surveys is mostly easy but not completely. You can't get their Turker number into your system. My standard approach is: you do our study, at the end we give you a number, then you enter the number into Mturk and we correlate your records with ours. About 1% of people don't understand this, or otherwise screw it up.
  • Reputation matters. If you're a crummy requester, folks can rate you on Turkopticon (TO) and talk about you on Turker Nation (TN). But if you're good, they'll rate you up on TO/TN, post you on Reddit's HITs Worth Turking For (hwtf), follow you with TurkAlert, and occasionally really get into it.
  • Engage. Get on Turker Nation and hwtf. (you can post your own HITs on hwtf, and in the requester forums on TN.) Respond to Turkers like you'd respond to people you hired to do a job, because they are people you hired to do a job.
  • Conflict is tough. You have only blunt weapons, and so do they. Because most HITs are accepted, and because the difference between being a 95% turker and a 98% turker is so big, every rejection really hurts workers. Also, they can trash you on sites like Turkopticon- not sure how much that matters, but it doesn't help. One disgruntled worker can make things difficult for you. So if someone's getting all angry at you, you're kind of incentivized to just pay them and get them off your back, before they go reviewing you all over the place.
  • Performance-based bonuses are good. We structured it as 30 cents base, plus 15 cents per Set you find. (our task was the game Set, where you try to find a bunch of sets of cards that fulfill certain criteria.) This meant that we ended up paying about $1.40 per person, but the best people would post in forums "I got $3.50 for this 15 minute HIT" or whatever. In general, most people ended up doing pretty well at the game, which I guess is a good sign that they were paying attention to it, which is good. We were worried that people would overlook our HIT because the base pay was low, but it was helpful that we could list it as a 30-cent HIT, and then say in the title "+ average $1.10 performance bonus!" (figure out that average value through pilot testing.)
  • If you let them, workers will repeat your HIT. We started off posting 40 assignments of a HIT at a time, but noticed that about 3/4 of our users each time would be repeaters. They use stuff like TurkAlert. So if you want not to have repeaters, make sure you just post one big HIT. (or use other 
Technical:
  • There's no official Python API, but boto is pretty good. Documentation can be sparse, but supplement it with the mturk API, and you can usually figure out what you need to do.
  • The main Manage Hits page is mostly garbage. The one you want for most things, especially if you use the API too, is Manage Hits Individually. Looks like they wanted to replace the MHI page with the MH page, because it's got shiny new progress bars and stuff, but the shiny bars don't update very fast. At least MHI is up to date. Also, you can see how much you bonused each worker on MHI, and you can email workers without bonusing them.
  • Except! Rejecting workers is best on the main MH page. It lets you easily republish those HITs to other people. (this is an option that you don't even get in the API. What a mess.)
  • Also: The only way to approve an Assignment that you previously rejected is via the download/upload CSV, which you get to through the main MH page. Yes, this is pretty wonky.
  • You can't change much about the HIT after you post it. But you can change the qualifications, and other minor details about the HIT. You can't change the price or the content. To change the qualifications: you have to use an API call that is sort of obscure: ChangeHitTypeOfHit. (first you have to register your new qualifications as a HitType by calling RegisterHitType.) Wah! If you need help with this, let me know, I've got a script I can send you.
  • You're debugging while you're gathering data. When someone says "your site failed and that's why my data isn't complete"... are you going to reject them? Over forty five cents? Are you that sure your site is working perfectly? Are you a fool?

1 comment:

  1. Hi--thanks for posting this and for answering the survey. There's good info here and in the WIKI link too. If you have anything else you want to share about MTurk please do keep in touch: ksheehan@uoregon.edu!

    ReplyDelete