Saturday, March 28, 2015

PhD Grind part 0.8 of N: picking a school

Man, we just had a bunch of students come through for open house, and I wish I could offer them some helpful advice, but we had only a brief time to actually get to know them.

Sometimes people ask, why did you pick CMU? In order, it was 1. students, 2. profs, 3. outcomes, 4. everything else was fine. I think that decision order is pretty ok, but maybe profs should be tied for #1, not sure.

But! In more detail:
1. Students: I just felt pretty well at home with the students here. Hard to say why. Ok, not much more detail here, other than to say: this is important! Both the existing students and the other incoming ones will be working with you for many years. Maybe hanging out outside school too, depending on the place. (We're pretty social at CMU.) You'll probably spend more time with them than with the professors. You get about half a minute to get to know them at open houses, so that's frustrating, but if you find you really like (or dislike) the students at one place in particular, take that into account.

2. Professors: best case, you find an advisor. Second best case, you find a group of potential advisors where you'd be happy to be advised by any of them. (more info on this decision) If you don't have either of those, don't go to that place. Your advisor relationship is suuuuper important. More important than your relationship with a typical boss. They will not only be your boss (so they can make your life easy or hard) but they will also be a source of inspiration, introduce you to other people, etc. So make sure you have a good idea of who your advisor will be before you sign the form.

One side note: if one place has your #1 top choice advisor, and another place has your #2 choice, go with #1, obviously. But if one place has your #1 top choice, and the other place has #2, #3, and #4, and you've never worked directly with any of them before, I'd say go with the 2-3-4 place. There's a lot of randomness that you don't figure out until you work with someone, and someone who seems perfect might just for whatever reason end up not a perfect match. So in that second case, go with the #2 advisor, and you've always got 3 and 4 if 2 doesn't work out.

3. Outcomes: notice what the graduating students are doing. If they're all getting the kind of sweet jobs you want to do, that's a good sign. If half are doing startups, and you want to do a startup, great! If they're all going to industry jobs, and you want to be a prof, not so great. If 2/3 of them are dropping out and sitting at home wallowing in depression, probably not so great either.

4. Everything else: these will probably not factor into your decision, but look out in case they do. City, stipend, classes, quals, teaching, office space, etc. Usually these will all be fine: the city is ok, the stipend is about the same, you have to take ~8-10 classes, you have to pass through some hazing ritual such as quals or comm talks or extra classes, you have to teach 2 classes, your office is a desk in some depressing basement, etc. These are all fine. But this is an area where there can be red flags: if you don't have a guaranteed stipend, if you have to teach every semester, if the city is super expensive and difficult to live in (lookin at you, Stanford) or in the middle of nowhere and depressing to you, if the hazing/quals make half the students quit, etc. If there are any red flags, they should factor into your decision as much as the above 3 points. Otherwise, don't worry about these things, they're fine.

So! If you've considered all these, go with your gut then, and you'll do fine. Also, if any current prospective students end up reading this and want to talk more, find my email (or twitter), drop me a line.

Friday, March 6, 2015

Not really research, but fun: Swot Perderder

I remember seeing stuff like this and this and thinking it was the funniest thing ever. I guess Know Your Meme categorizes it as "wurds", but to me, this will always be Swot Perderder, because this one in particular always made me crack up:


So I made a twitter bot that generates these:



I got a list of ~500 foods (harder than it may seem), used the CMU pronouncing dictionary to translate word -> phonemes, then mapped each phoneme to a randomly chosen letter that either matched it closely (s -> s) or not so closely (s -> zh).
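Here's a minimal sketch of that respelling step in Python. It is not the bot's actual code (that's in the repo linked below); it assumes a tiny hand-written stand-in for the CMU pronouncing dictionary (the real one is also packaged in NLTK as nltk.corpus.cmudict, with stress digits like IY1 on the vowels), and the PRONUNCIATIONS/SPELLINGS tables and the misspell function are hypothetical, made up for illustration.

```python
import random

# Hypothetical stand-in for the CMU Pronouncing Dictionary:
# word -> ARPAbet phonemes, with stress digits dropped.
PRONUNCIATIONS = {
    "sweet": ["S", "W", "IY", "T"],
    "potato": ["P", "AH", "T", "EY", "T", "OW"],
}

# Hypothetical spelling table: each phoneme gets a few spellings, from a
# close match ("s") to a not-so-close one ("zh"). Illustrative choices only.
SPELLINGS = {
    "S": ["s", "ss", "z", "zh"],
    "W": ["w", "wh", "oo"],
    "IY": ["ee", "ea", "i", "y"],
    "T": ["t", "tt", "d"],
    "P": ["p", "pp", "b"],
    "AH": ["u", "a", "uh", "er"],
    "EY": ["ay", "ei", "a"],
    "OW": ["o", "oe", "ow"],
}


def misspell(word: str, rng: random.Random) -> str:
    """Respell a word by picking a random spelling for each of its phonemes."""
    phonemes = PRONUNCIATIONS[word.lower()]
    return "".join(rng.choice(SPELLINGS[p]) for p in phonemes)


rng = random.Random()
print(misspell("sweet", rng), misspell("potato", rng))
```

Weighting the random choice toward the not-so-close spellings would be one knob for making the results funnier.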

That's all for now. It'd be neat to make it interactive someday, or make it a reddit bot or something, because the world needs more of these misspelled foods.

Also, it'd be neat to see which of these are favorited/retweeted/etc more, because then we could refine the rules to make them even funnier. Yes.

Edit: code's here: https://github.com/dantasse/swot_perderder

Thursday, February 19, 2015

Pittsburgh Tweets


Heyo. Here's where (the geotagged tweeters in) Pittsburgh tweeted in 2014.

Monday, February 9, 2015

CheeseHoods

CheeseHoods: a block of Swiss cheese as dense as your neighborhood.

CheeseHoods are blocks that look like Pittsburgh neighborhoods. They have holes in them like Swiss cheese. The denser the neighborhood is, in terms of dwellings per acre, the fewer holes it has.
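The post doesn't give the exact formula for turning density into holes, so here is an assumed linear version in Python: the sparsest neighborhood gets max_holes holes, the densest gets none, and everything in between is interpolated. The function name, max_holes, and the density range are all hypothetical.

```python
def hole_count(dwellings_per_acre: float, min_density: float,
               max_density: float, max_holes: int = 40) -> int:
    """Map dwelling density to a number of holes (assumed linear mapping).

    The least dense neighborhood gets max_holes holes, the densest gets none.
    """
    if max_density <= min_density:
        raise ValueError("max_density must exceed min_density")
    # Clamp to the observed range, then normalize to 0..1.
    d = min(max(dwellings_per_acre, min_density), max_density)
    t = (d - min_density) / (max_density - min_density)
    return round((1.0 - t) * max_holes)


# Made-up range: a neighborhood halfway between 1 and 21 dwellings/acre.
print(hole_count(11.0, min_density=1.0, max_density=21.0))
```

With those made-up numbers it prints 20, half of max_holes. A log scale might spread out the low-density neighborhoods better; linear is just the simplest choice.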

Why neighborhood blocks? I thought it'd be fun to have a jigsaw puzzle of Pittsburgh, really. Plus, just as playing with world maps helps kids learn country names, playing with your city might help you learn more about it.

Why the holes? Density is important. Jane Jacobs wrote about it as one of the four most important characteristics in creating vibrant neighborhoods. Dwelling density, in particular, is quite important; human density can just indicate overcrowding, but dwelling density indicates vitality. I was interested to explore what density looks and feels like in 3D. Putting holes through a neighborhood seemed like an easy way to do so. Plus, it gives the least dense neighborhoods a rather icky pock-marked feel, while thriving denser ones are pleasantly solid, so this gives a really visceral feel to "density is good".

Pictures: just Bloomfield, above. Below, all of Pittsburgh.

Central Oakland is a shining example of density, but you can see right through Greenfield (below).

Code's on github. Not embedding it here because there's a lot. I used some tools from another repo (get_dwelling_densities.py) plus some census data to calculate dwelling density for each neighborhood, then computed a json file of all the borders of each neighborhood and random hole locations (nghd_to_shape.py), then finally slurped those into a last script that created objects in Rhino (rhino_script.py). Play with 'em:

Bloomfield by dantasse on Sketchfab

Pittsburgh by dantasse on Sketchfab
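nghd_to_shape.py isn't shown here, so this is only a sketch of one plausible way to pick random hole locations inside a neighborhood border: rejection sampling over the border's bounding box, keeping points that pass an even-odd (ray casting) point-in-polygon test. It assumes each border is a single simple polygon in projected, planar coordinates (not raw lat/lon); point_in_polygon and random_holes are hypothetical names.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Even-odd rule: count crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_holes(poly: List[Point], count: int, rng: random.Random,
                 max_tries: int = 100_000) -> List[Point]:
    """Rejection-sample `count` hole centers uniformly inside the polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    holes: List[Point] = []
    tries = 0
    while len(holes) < count:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("polygon seems to have (almost) no area")
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if point_in_polygon(p, poly):
            holes.append(p)
    return holes


# An L-shaped "neighborhood" with made-up coordinates.
border = [(0, 0), (10, 0), (10, 5), (5, 5), (5, 10), (0, 10)]
print(random_holes(border, 5, random.Random(42)))
```

Rejection sampling wastes more draws on skinny, sprawling borders, but it keeps the holes uniformly spread, and for a few dozen holes per neighborhood the cost is negligible. Keeping holes from overlapping each other or the edge would need an extra distance check.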

Sunday, October 5, 2014

PhD grind part X of N: Results from a two week time diary

I'm TAing with Jen Mankoff now, who's something of an enthusiast for time management, so I'm trying to learn a few things from her. One thing she suggested was a "time diary" - just write down everything you do, so you can find out what's actually taking up your time. I did it for two weeks, and here's what I got per week (average of the two):

Research: 19 hrs
TAing: 12 hrs
Class: 4.8 hrs
Email/logistics: 7.6 hrs
Socializing: 6.7 hrs
Waste (checking the internet, etc): 4.8 hrs
Other (walking between buildings, lunches that didn't fit in other categories, fighting with the internet when it went down, WC breaks, etc): 2.5 hrs
Total: 57.2 hrs
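A quick tally of those weekly averages in Python; the numbers are copied from the list and the rest is arithmetic. As listed, the categories add up to 57.4 hours rather than 57.2, a small gap that probably comes from rounding the two-week averages, and "waste" works out to a bit over 8% of the week.

```python
# Weekly averages from the two-week time diary (hours per week).
diary = {
    "Research": 19.0,
    "TAing": 12.0,
    "Class": 4.8,
    "Email/logistics": 7.6,
    "Socializing": 6.7,
    "Waste": 4.8,
    "Other": 2.5,
}

total = sum(diary.values())
print(f"Sum of categories: {total:.1f} hrs")

# Largest categories first, with each one's share of the week.
for category, hours in sorted(diary.items(), key=lambda kv: -kv[1]):
    print(f"{category:16} {hours:5.1f} hrs  {hours / total:6.1%}")
```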

Things I learned from this:
  • I'm not doing so bad, part 1. I'm putting a lot of time into research. (These were the weeks before and after the CHI deadline, so "research" is a little higher than normal. So if you're reading this and you think "oh, I'm such a lazybones, I only work 50 hours a week", two things: 1. these are perhaps abnormally high weeks for me, and 2. count it out yourself, you may find you work more than you think.) But I'm at least putting a lot of time into "research", which is good.
  • I'm not doing so bad, part 2. There's not a ton I could cut out. I guess I could cut the "waste" time down, but I don't think I'll ever hit 100% efficiency anyway, so 90% seems not so bad. Maybe I could cut down email/logistics, but there'll always be a need for some of it. I guess I could cut down socializing, but that... seems wrong. This "socializing" is the all-important "networking" if you want to be super utilitarian about it - this all may further the all-important career. Attending department lunches or lab group meetings, meeting visiting profs, hanging out with PhD friends and chatting, whatever. And no, I don't actually think about it as "networking" while I'm doing it.
  • I thought class was a big time suck. Maybe not. (though, again, CHI weeks, I really pushed class out of the way. I spent a ton more time on class the week after I time-diaried.)
  • Grad school is a great environment to do research on anything you want... after you finish your required stuff. And there are ~35 hours of required stuff a week. ("research" counts all the time I spent on research, including filling out IRBs, etc., so even some of that wonderful 19hr chunk is not so wonderful.) Which means, if you're a 40 hour worker, you'll have 5 hours to do RESEARCH, and you'll be frustrated. If you're a 70 hour worker, you'll spend half your time doing RESEARCH, and it'll be great. ... Be warned.
Edit: related is this post where this guy Togelius comes to the exact same conclusion as me, but frames it as "increasing marginal utility." I guess it's an optimistic way to look at the same thing.

Or... I could also get a job writing dumb software for 35 hours a week and then do research in my spare time, and be equally effective. (and paid a lot more, and I get to crank out some dumb software in the meantime.) Well, except I wouldn't be equally effective; in those 35 hours, I get these 3 benefits:
1. I learn something from class, and something from teaching, and I guess something from filling out IRBs and stuff
2. being in the university gives me access to papers, conferences, research funds to run a study, etc.
3. I make actual friends while "networking".
1 is true, but I'd learn something from writing dumb software too. 2 is true but unfortunate; I mean, the system shouldn't exclude people who just don't happen to have a university affiliation. 3 is true, ok.
It still feels like a lot of waste and frustration for those three benefits.
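The "~35 hours of required stuff" argument is easy to put in numbers. This sketch treats those 35 hours as a fixed weekly overhead (a simplification; the figure is the post's own rough estimate) and computes what fraction of a work week is left for research. research_share is a hypothetical name.

```python
def research_share(weekly_hours: float, required_hours: float = 35.0) -> float:
    """Fraction of a work week left for self-directed research once the
    required stuff is done (0 if required work fills the whole week)."""
    if weekly_hours <= 0:
        raise ValueError("weekly_hours must be positive")
    return max(weekly_hours - required_hours, 0.0) / weekly_hours


for hours in (40, 50, 60, 70):
    print(f"{hours} hr week: {research_share(hours):.1%} free for research")
```

For 40, 50, 60, and 70 hour weeks this prints 12.5%, 30.0%, 41.7%, and 50.0%: every hour past the fixed overhead goes entirely to research, which is the "increasing marginal utility" framing.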

Sunday, August 31, 2014

PhD Grind Part 1 of N: advisor picking

Philip Guo posted a great guide called The PhD Grind, a memoir of his computer science grad school experience. Mostly just "here's what I did", not so much advice, but there's a bit of both. It was really helpful for me, as one more data point of what grad school can be like. I'd love it if I could publish a similar thing, and further help people who are going into this path.

(I'd have to preface it with a bunch of disclaimers, and the biggest one would be this: grad schools vary A LOT. Everything I say will be very relevant if you go to get a PhD at CMU in the HCII. If you're going to other schools in HCI or related fields, this will be about 90% relevant and accurate. If you're going to other schools in CS (non-HCI), this will be maybe 60% relevant. If you're going to grad school in another field, maybe 10%. Seriously, I have no idea what grad school in other fields is like, besides that a lot of them are broken and terrible and you should not go to them. Grad school in CS, particularly HCI, is one of the least broken types of grad school.)

Picking Advisors

Anyway, one thing I realized I've gained a lot of insight into is picking advisors. At CMU HCII, we got the first few weeks to meet with different advisors before we had to decide. (I think, if you're going to PhD school, especially in HCI/CS, you should have at least a pretty good idea of who your advisor will be before you accept an offer, considering how important it is. But anyway, CMU let us choose.)

When I was picking advisors, I didn't really know the questions to ask. I mostly asked, "I don't know, are they good? What are their pluses and minuses?" Most people would say yes, they're good, tell me a couple obvious pros/cons, and then say something about whether they're "hands on" or "hands off." This is mildly helpful, but imprecise; it lumps together a lot of different factors. It's also not very distinguishing, because most professors (at least here) are "mostly hands off". You'll also get some platitudes about how they're very supportive, and they care about their students, and etc. These are true too, but also not very distinguishing, because all our professors here are pretty great.

You want to ask questions that will distinguish between profs who are right for you and who are not! And you want to know more dimensions than good/bad, hands-on/hands-off. Here are things you should ask. Try to ask these in ways that don't imply a value judgment, because if there's a "good" and a "bad" option, people will almost always tell you the good one, because most advisor/student relationships are good. (if they're bad, they usually don't last very long.)

Ask the prof

  • What grants do you have, or what project will I work on? (unless you have a fellowship.) Whether they tell you immediately or not, you will have to be officially working on one main project, and they will have to fund that somehow. If you can avoid it, don't go in with a vague area of focus (like "ubiquitous computing") and plan to figure out the project later.
  • How big is your lab? By asking this, what you really want to figure out is: how much time do they have for you? And it can be fine if they have only a little time - depends on your style. Some people like to do their work and be left alone; if the advisor only meets with you once a week and rarely responds to emails, that can be enough. But some people like to work more collaboratively and meet/discuss/email more often. This is one segment of the hands-on/hands-off distinction. And I'd say having like 3-5 PhD students, plus a handful of Masters/undergrads/postdocs, so like 8-10 people total, seems like a medium sized lab.
  • How much do they like/dislike collaboration with other students? I get the sense that most profs like collaboration, but maybe they'll give you a clue: some really like it, and others are kind of "meh, it's okay" about it.
  • Do you anticipate any big life changes in the next 6ish years? They'll probably have a sabbatical year sometime in there. If they're pre-tenure, that review may come up halfway through your career; unlikely to be a problem, but if you're 3 years in and your prof doesn't get tenure (which means they get more or less fired), that might be tricky. Are they considering moving schools? (they probably won't tell you if they are, but worth a shot) Are they considering retiring?
  • How intimidating are you? Okay, don't ask this, but get a sense of it. Some profs (usually older ones) are more intimidating than others. You should probably feel a healthy respect, but not fear; that will hamper your work and life. Take note of this feeling, because it's not likely to change a lot.
  • Industry or academia? I mean, if you already know which path you want, ask the prof if they will be good at helping you get to that path. They will be pretty honest about this.

Ask the prof's current PhD students

  • How much will the prof shield you from funding? Profs all try to do that, but sometimes it works out better than others. Good way to ask it: "Have you ever had to work on a project you weren't super into, because of funding? Tell me how that went." If none of the students have, that's pretty good/lucky; if they all have, take note of that.
  • How often does the prof ask you for work-related things? (this is part of the hands-on/hands-off thing too.) Some profs bug you every day or two for something, big or small. Some are fine if you don't give them anything for a month. The micro-managey prof can be good if that motivates you to work better.
  • How much will they ask you to do other stuff besides your research? Group meetings, mentoring students, maintaining servers, organizing stuff, meeting with funders, etc.
  • How much do they take your feelings into account? (you might be able to tell this from interacting with the professor too) Some profs have a very academic, businesslike "let's not sugarcoat it, let's just argue to find the truth" kind of demeanor. Some profs are more, well, friendly. Again, not a good/bad; some students like to just talk shop and not be all touchy feely and can deal with blunt criticism. (it's okay if you don't like blunt criticism; make sure you find a more friendly professor then.) This goes along with, and is less important than, the "intimidating" thing above; the reason to ask the students too is just to see if they have mood shifts that make them affable most days but terrifying on days when there's bad news or something.
  • How available is the prof, if you need help? Will they answer an email within a day? A couple days? Will they meet with you more than your once a week meeting if you need it? Can they help you to find other help if you need it?
  • How much does the prof like/dislike collaboration with other students? Do they push collaboration? (this can be good or bad depending on how you like to work) Do they discourage collaboration? (it happens) Do they really try hard to build up a group dynamic in their lab, or is it all a bunch of people working more or less individually? (either can be good.)

Not sure who to ask, but it's good to figure it out

  • How ambitious are they? You'll probably have to determine for yourself if you're on the "ambitious" side (want a professor job at a top tier university, want to be in the news, want to be in TR35, etc) or the "balanced life" side (want an industry or a prof-at-a-not-so-big-name-school job, have other interests you want to continue pursuing outside school, like the 9-5ish life, have other constraints like you have to finish in N years due to some visa thing, etc). Your profs are all going to be super successful, but if they're in the news all the time or getting big fancy awards or on track to do so, they might be on the "ambitious" side. Also, if they're pre-tenure, they're likely to be more ambitious. Also: it's okay to be "balanced life." I am.

Friday, August 1, 2014

Some things I learned from running a big Mechanical Turk study

I'm not a big crowd researcher, but Mechanical Turk can be a great platform. The key words are "can be". It sounds great: pay a thousand people a dollar to do your survey, and for $1k, you have a huge amount of data overnight! But it's not really that simple. Here are some things I've learned. (there are a lot.)

Human/study design:
  • Read this. These folks have worked and researched on Turk a lot longer than I have. http://wiki.wearedynamo.org/index.php/Guidelines_for_Academic_Requesters
  • Pay people enough. This is maybe the #1 thing I hear on Turk forums, and the #1 piece of advice I can give to make the whole thing a good experience. These are people doing work. It's not just screwing around for fun. HITs are hard. Turking is hard. See also: Jeff Bigham's experiences Turking for a day.
  • $8/hour is a starting point. You're not getting by on $2/hour like you may have thought you could back in the early days. Or, if you're getting by on $2/hour, you are paying people sweatshop wages (and by the way, probably getting sweatshop level work). Why $8? People want to make about US minimum wage. It's a nice round benchmark, at least. It's still way cheaper than you could get it done any other way.
  • Pay more if you can. The US minimum wage is (adjusted for inflation) historically low. $8/hour is in no way a living wage. (Also, that figure assumes that Turkers spend no time searching for tasks or on any other overhead.) Seattle recently voted to raise their wage to $15/hour. Maybe you can too. Related: Dynamo's Fair Payment page.
  • Turkers are good people. At least, the ones that you get for $8/hr are. They are not, for the most part, trying to scam you. Maybe 2% are. Accept that as a cost of doing business (that's what, 2 extra precious dollars?) and don't get too defensive about your task. 
  • 98%/1000 is a good threshold. That is, require turkers to have 98% acceptance and 1000 HITs completed. When we tried 95%, we got a few more rejects (though not dramatically more). When we restricted it to 5000 HITs completed, there were only about 300 qualified people who would do our task.
  • ACs (attention checks) are tough to get right. In our survey, we included three simple math questions: "what is two plus three?" etc. But even these are not perfect indicators of whether people are paying attention. We had 7-point Likert scale answer options, so people would try to click 5, and just miss and click 6 by mistake (or they'd be using a touchscreen or something). Also, dashing 15 minutes of work just because of one question seemed pretty cruel. I started accepting people who got our ACs wrong, so long as they only missed one question and were just off by one. Relatedly:
  • Spell out exactly what will make you reject people. Our HIT had a list: "You will be rejected if..." This makes it much easier to deal with somewhat-angry Turkers who write to you. Many times, Turkers will get really mad if you reject them for something you didn't warn them about. I think that's a feature of mturk, not a bug.
  • Verifying they did surveys is mostly easy but not completely. You can't get their Turker number into your system. My standard approach is: you do our study, at the end we give you a number, then you enter the number into Mturk and we correlate your records with ours. About 1% of people don't understand this, or otherwise screw it up.
  • Reputation matters. If you're a crummy requester, folks can rate you on Turkopticon (TO) and talk about you on Turker Nation (TN). But if you're good, they'll rate you up on TO/TN, post you on Reddit's HITs Worth Turking For (hwtf), follow you with TurkAlert, and occasionally really get into it.
  • Engage. Get on Turker Nation and hwtf. (you can post your own HITs on hwtf, and in the requester forums on TN.) Respond to Turkers like you'd respond to people you hired to do a job, because they are people you hired to do a job.
  • Conflict is tough. You have only blunt weapons, and so do they. Because most HITs are accepted, and because the difference between being a 95% turker and a 98% turker is so big, every rejection really hurts workers. Also, they can trash you on sites like Turkopticon; I'm not sure how much that matters, but it doesn't help. One disgruntled worker can make things difficult for you. So if someone's getting all angry at you, you're kind of incentivized to just pay them and get them off your back, before they go reviewing you all over the place.
  • Performance-based bonuses are good. We structured it as 30 cents base, plus 15 cents per Set you find. (our task was the game Set, where you try to find a bunch of sets of cards that fulfill certain criteria.) This meant that we ended up paying about $1.40 per person, but the best people would post in forums "I got $3.50 for this 15 minute HIT" or whatever. In general, most people ended up doing pretty well at the game, which I guess is a good sign that they were paying attention to it, which is good. We were worried that people would overlook our HIT because the base pay was low, but it was helpful that we could list it as a 30-cent HIT, and then say in the title "+ average $1.10 performance bonus!" (figure out that average value through pilot testing.)
  • If you let them, workers will repeat your HIT. We started off posting 40 assignments of a HIT at a time, but noticed that each time, about 3/4 of our workers were repeaters. They use tools like TurkAlert. So if you don't want repeaters, make sure you post just one big HIT. (or use other

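The lenient attention-check rule described above (accept someone who missed at most one check, and only by one point) is simple enough to write down exactly, which also makes it easy to spell out in the "You will be rejected if..." list. A sketch in Python; the function name and the encoding of answers as integers on the 7-point scale are assumptions.

```python
from typing import Sequence


def passes_attention_checks(answers: Sequence[int],
                            expected: Sequence[int]) -> bool:
    """Accept if every check is right, or if exactly one is missed and that
    miss is off by one (e.g. 6 instead of 5 on a 7-point scale)."""
    if len(answers) != len(expected):
        raise ValueError("answers and expected must be the same length")
    misses = [(a, e) for a, e in zip(answers, expected) if a != e]
    if not misses:
        return True
    given, correct = misses[0]
    return len(misses) == 1 and abs(given - correct) == 1


print(passes_attention_checks([6, 7, 4], [5, 7, 4]))  # one near miss
```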
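The completion-code approach for verifying surveys might look like this in Python. It's a sketch under assumptions: codes are random hex strings from the standard secrets module, submissions arrive as a mapping from worker ID to whatever the worker pasted into MTurk (say, from a results download), and all function names are hypothetical. Each issued code can match only once, so a code passed around between workers doesn't verify twice, and unmatched entries (the ~1% who screw it up) go to a pile for manual review rather than automatic rejection.

```python
import secrets
from typing import Dict, Set, Tuple


def new_completion_code() -> str:
    """Random code shown to a participant on the survey's last page."""
    return secrets.token_hex(4).upper()  # 8 hex characters


def normalize(code: str) -> str:
    """Forgive stray whitespace and lowercase in pasted codes."""
    return "".join(code.split()).upper()


def match_submissions(
    issued: Set[str], submissions: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Match worker ID -> pasted code against the codes our survey issued.

    Each issued code can match only once. Returns (matched, unmatched);
    unmatched entries are for manual review, not automatic rejection.
    """
    remaining = {normalize(c) for c in issued}
    matched: Dict[str, str] = {}
    unmatched: Dict[str, str] = {}
    for worker_id, code in submissions.items():
        cleaned = normalize(code)
        if cleaned in remaining:
            remaining.discard(cleaned)
            matched[worker_id] = cleaned
        else:
            unmatched[worker_id] = code
    return matched, unmatched
```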
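The performance-based bonus structure, as arithmetic. Working in integer cents avoids floating-point rounding on money. The 30-cent base and 15-cent per-Set bonus come from the post; the function names are hypothetical, and the pilot numbers in the usage line are made up.

```python
from statistics import mean
from typing import Sequence

BASE_CENTS = 30     # base reward per HIT
PER_SET_CENTS = 15  # bonus per Set found


def payout_cents(sets_found: int) -> int:
    """Total pay for one worker: base reward plus per-Set bonus, in cents."""
    return BASE_CENTS + PER_SET_CENTS * sets_found


def advertised_bonus_cents(pilot_sets: Sequence[int]) -> float:
    """Average bonus over pilot workers, for a '+ average $X bonus' title."""
    return mean([PER_SET_CENTS * s for s in pilot_sets])


# Made-up pilot results: three workers found 2, 4, and 6 Sets.
print(advertised_bonus_cents([2, 4, 6]))
```

That usage line prints 60. With the post's own figures, an average payout of $1.40 means an average bonus of $1.10, or roughly 7 Sets per worker (110 / 15 ≈ 7.3).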
Technical:
  • There's no official Python API, but boto is pretty good. Documentation can be sparse, but supplement it with the mturk API, and you can usually figure out what you need to do.
  • The main Manage Hits page is mostly garbage. The one you want for most things, especially if you use the API too, is Manage Hits Individually. Looks like they wanted to replace the MHI page with the MH page, because it's got shiny new progress bars and stuff, but the shiny bars don't update very fast. At least MHI is up to date. Also, you can see how much you bonused each worker on MHI, and you can email workers without bonusing them.
  • Except! Rejecting workers is best on the main MH page. It lets you easily republish those HITs to other people. (this is an option that you don't even get in the API. What a mess.)
  • Also: The only way to approve an Assignment that you previously rejected is via the download/upload CSV, which you get to through the main MH page. Yes, this is pretty wonky.
  • You can't change much about the HIT after you post it. But you can change the qualifications, and other minor details about the HIT. You can't change the price or the content. To change the qualifications: you have to use an API call that is sort of obscure: ChangeHitTypeOfHit. (first you have to register your new qualifications as a HitType by calling RegisterHitType.) Wah! If you need help with this, let me know, I've got a script I can send you.
  • You're debugging while you're gathering data. When someone says "your site failed and that's why my data isn't complete"... are you going to reject them? Over forty-five cents? Are you that sure your site is working perfectly? Are you a fool?
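For changing qualifications on a live HIT (and the 98%/1000 threshold from earlier), here's a sketch using boto3, the successor to the boto library mentioned above; in boto3, RegisterHITType and ChangeHITTypeOfHIT show up as create_hit_type and update_hit_type_of_hit. The two qualification type IDs are MTurk's system qualifications for approval rate and number of approved HITs. quality_requirements and change_hit_qualifications are hypothetical names, and the client is passed in so the same code can be pointed at the requester sandbox first.

```python
from typing import Any, Dict, List

# MTurk system qualification type IDs.
PERCENT_APPROVED = "000000000000000000L0"      # % of assignments approved
NUMBER_HITS_APPROVED = "00000000000000000040"  # number of HITs approved


def quality_requirements(min_pct: int = 98,
                         min_hits: int = 1000) -> List[Dict[str, Any]]:
    """QualificationRequirements for an approval-rate / HIT-count threshold."""
    return [
        {"QualificationTypeId": PERCENT_APPROVED,
         "Comparator": "GreaterThanOrEqualTo",
         "IntegerValues": [min_pct],
         "ActionsGuarded": "Accept"},
        {"QualificationTypeId": NUMBER_HITS_APPROVED,
         "Comparator": "GreaterThanOrEqualTo",
         "IntegerValues": [min_hits],
         "ActionsGuarded": "Accept"},
    ]


def change_hit_qualifications(client, hit_id: str,
                              hit_type_params: Dict[str, Any],
                              min_pct: int = 98, min_hits: int = 1000) -> str:
    """Register a new HIT type with new qualifications, then move the HIT
    onto it. `client` is expected to be a boto3 MTurk client.

    hit_type_params must repeat the HIT's existing Title, Description,
    Reward (a string such as "0.30"), AssignmentDurationInSeconds, etc.
    Returns the new HIT type ID.
    """
    response = client.create_hit_type(
        QualificationRequirements=quality_requirements(min_pct, min_hits),
        **hit_type_params,
    )
    new_type_id = response["HITTypeId"]
    client.update_hit_type_of_hit(HITId=hit_id, HITTypeId=new_type_id)
    return new_type_id
```

With boto3 installed and credentials configured, the client would come from boto3.client("mturk", region_name="us-east-1"), plus an endpoint_url pointing at the requester sandbox while testing. The price and content still can't be changed this way, only the HIT type's settings.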