Saturday, April 30, 2016

Thesis proposed!

One step closer to graduation, and also a step closer to making a really neat and useful thing.

In a noun phrase: neighborhood guides, built from people's public social media posts.

In a sentence: I'll build guides to city neighborhoods out of people's public social media posts, to help people traveling find places to stay and places to hang out.

In a presentation: here! (11 MB PDF) If you would rather read a more boring document, you can do that too I guess, here.
Usually I think a presentation to give and a presentation to read should be two different things. The one you give should be more visual, with fewer words. But you can't trust a roomful of academics to listen to you, so you have to put the words on the slides too so they can read them and tune out. And the presentation PDF linked here includes my speaker notes, so it might be somewhat comprehensible.

If you don't want to read a long document or talk, here's a summary:

Tourism's changed over the years - people used to all want to relax ("sun and sand"), then some of them wanted to see sights ("cultural tourism"), and now some of them want to be more active in guiding their own experience and discovering a place themselves ("creative tourism"). Guidebooks are mostly aimed at cultural tourists ("here are the sights to see, here are the top N hotels to stay in, etc") while creative tourists want to know more about the neighborhoods.

I'm developing a model of what creative tourists want based on 24 interviews (so far). It looks like they want something like this:
- Aesthetic appeal
- The "Ideal Everyday" - a picture of everyday life but focused on when you're relaxed and can explore at your leisure
- Authenticity - ... whatever this means to you

So based on those six dimensions, I'm going to mash together crime statistics, census data, Walkscores, Flickr photo autotags, Yelp Third Place reviews, and Tweets to show you a guide of the neighborhood.

To help you narrow down your search (there are a lot of neighborhoods out there!), I'll start off with a comparison to neighborhoods you already know. So "I live in Pittsburgh, I'm going to San Francisco, show me a neighborhood that's like Bloomfield." And then it will show you the top N most similar neighborhoods, and why they're similar.
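Here's a rough sketch of the "show me a neighborhood that's like Bloomfield" idea, assuming each neighborhood has already been boiled down to a short list of numbers. The feature names, the values, the placeholder neighborhood names, and the choice of cosine similarity are all made up for illustration; the real guide may well compare neighborhoods differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is not similar to anything
    return dot / (norm_a * norm_b)


def most_similar(home_vector, candidates, n=3):
    """Return the n (name, score) pairs most similar to home_vector."""
    scored = [(name, cosine_similarity(home_vector, vec))
              for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical features, each scaled 0-1:
# (walkability, cafes per block, green space)
bloomfield = [0.8, 0.6, 0.3]
destination_neighborhoods = {
    "Neighborhood A": [0.9, 0.7, 0.3],
    "Neighborhood B": [0.2, 0.1, 0.9],
    "Neighborhood C": [0.7, 0.5, 0.4],
}

top = most_similar(bloomfield, destination_neighborhoods, n=2)
for name, score in top:
    print(name, round(score, 3))
```

The "why they're similar" part could come from looking at which individual features are closest, but that's left out here.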

More details in the paper and talk, but that's the idea.

Monday, March 7, 2016

Some neat things from CSCW 2016

Standard disclaimer: I saw only a slice of this conference, and probably remembered a slice of that. That said, I thought this stuff was cool:

Campus-Scale Mobile Crowd-Tasking: Deployment & Behavioral Insights by Thivya Kandappu et al
They deployed a system around their campus that would let people answer questions to help out the facilities people - is this restroom clean? is this vending machine stocked? etc. They tried out a couple different ways to group tasks. Here's what I thought was most exciting:
- well, first, that they did it at all, and had 80 people do 800 tasks
- second, that when it came to "push" (buzz you when there's a task nearby) vs "pull" (are there any tasks here?), the "super-agents" (the 25% of people who did 80% of the work) were less efficient in the pull case, but equally efficient in the push case.

On the bias: Self-esteem biases across communication channels during romantic couple conflict by Lauren Scissors and Darren Gergle
People who have lower self-esteem are likely to use technology to talk with romantic partners during conflicts, but that tends to make them assume the worst. I mean, I suspected this, but had no real reason to think it was true - this is cool evidence.

You get who you pay for: Impact of incentives on participation bias by Gary Hsieh and Rafal Kocielnik
Lottery rewards get people who are more open-to-change. Charity rewards get people who are more self-transcendence oriented. (Though usually they're less effective at getting people than fixed rewards.) Higher fixed reward: people might not care about the task as much.

"Constantly Connected" panel - Alex Pang, Gordon Bell, Melissa Mazmanian, and Mary Czerwinski talking about the issues around being constantly connected, for better and worse. This is a tough topic because it means a million things to a million people - and indeed, Gordon Bell seems to have been talking about something different than the other three. But Pang, Mazmanian, and Czerwinski had really interesting takes:
Pang: there's focus/concentration, then there's mind-wandering/rest. We should make space for both. Maybe our phones are eroding our capacity for focus, but maybe they're even eroding our mind-wandering.
Mazmanian: first, it's not an individual problem: "you're too stressed", "you should take a break from your phone", etc. Second, instead of "phones are good" or "phones are bad", look at the role that the smartphone is allowing you to play, and decide whether you want to be that person.
Czerwinski: we've done all this research with interruptions and context, when is it ok to interrupt something etc, but why isn't anyone using that?

Closing keynote by Mike Krieger of Instagram - just a series of straight-up things they learned building Instagram from zero to today.
- multiple identities per person -> interesting "finstas" (fake instagrams) and flexibility to express yourself in different ways
- not much follow-back pressure, really make it interest based
- require square photos because they look good and force a crop. Later relax it.
- The Future: explore the world through Instagram. That sounds fun.

Thursday, February 18, 2016

The N questions you always need to answer for any research project, all the time

Especially when it's a new project, people will always ask you a lot of questions:

- What's your Research Question? (similarly, what's your Hypothesis?)
- What's your Contribution?
- Why is it Research?
- Who will it help? (or, who cares?)
- What are you doing?
- What problem are you addressing?
- Is that a real/important problem?
- Why will this solve that problem?
- How is your work different than (any one of 1000 tangentially-related things)?
- How is it done today, and what's wrong with that?
- How will you do it?
- How will you evaluate it?

It really helps if you can answer them all, all the time. You will get instant cred and people will let you do your thing. Unfortunately, it's kind of like air bubbles in a plastic sheet: when you squeeze some of them out, others reappear. Like, if you nail down "how will you do it?" then people will ask "well if you know how to do it, then why is it research?" And if you narrow "Who cares?" down to a small subset, then people will ask "Is that an important problem?"

I'm going to order them from most to least important, in my mind: (note! I am not a grant funder.)

1. What are you doing? (Please, be concise. You get one sentence. Now try it in three sentences.)
2. What problem are you addressing? -- only if it's not obvious.
3. How is your work different than (100 closely-related things)? This is an important question if someone is actually bringing up something that they think is the same thing. This is not an important question if someone is just trying to sound clever.
4. Is this problem a real problem? -- downgraded because in HCI we solve lots of non-real-problems. And it's hard to tell what's a "real problem." If you mean "is it malaria?" then no, we're not solving malaria. You can always play problem-one-upmanship, and it's usually not a fun game to play.
5. Why will your work solve this problem? -- only worth asking if it's not obvious.
6. How is it done today, and what's wrong with that? -- downgraded because usually the answer is "it's not done today."
7. How will you evaluate it? -- again, sometimes it's hard to know until you do it.

These are sometimes not worth asking but people will anyway:
... 10. What's your Research Question or Hypothesis? -- This is valid for some kinds of research, like psychology. This is less valid in the more inventor-ish types of research. People will still ask it anyway.
11. How is your work different than (900 not-really-related things)? -- Sometimes people will ask this to try to sound smart.
12. How will you do it? -- if I knew, it wouldn't be research, would it? Still, people will ask this, and it helps to be able to wave your arms.

These are often not worth asking but people will anyway:
... 100. Why is it Research? -- Ugh. Academics love to ask this. Basically, why aren't you starting a company and doing this? And "because it's goddamn hard to start a company" or "this should be done but nobody wants to pay for it" don't count.
101. What's your Contribution? -- This is a thinly veiled version of "Why Is It Research?"

But yeah, I guess if you want to be good at research, answer all of them all the time.

Monday, February 8, 2016

Welcome to Domo

I've told this to a lot of people so I've decided to store it all in one place. This guide will range from super-basic to kinda-complicated, so apologies if it's obvious in parts, and apologies if you get lost in parts. ALSO, if you're reading this and you're not new to our group and/or server, then you may have some advice for me, and I'd appreciate it!

Domo is our Amazon Web Services server. It's named after this guy:

On Domo, we have some tweets in some cities: Pittsburgh since about fall 2014, about ten other cities since sometime in 2015. Instagrams in Pittsburgh since fall 2014 too. And some Flickr photos and other misc data sets.

We really only interact with Domo via terminal windows, so if that's not your forte, you may have some difficulty. To log in, use "ssh (your username on Domo)@(Domo's hostname)"
If you want to make it easier, you can open ~/.ssh/config and add an entry:

Host domo
    HostName (Domo's hostname)
    User (your username on Domo)

We store the tweets in PostgreSQL. If you've used other SQL databases, it's pretty similar, but not the same. Things to know about Postgres and our DB in particular:

  • psql tweet to connect to our database (which is called "tweet").
  • \d to list all relations (aka tables, kinda)
  • \d tablename to get more info about a certain table.
  • The tweets go in basically direct from the Twitter 1% public feed (using this script). They're all stored as text and integers except for some things that are "hstores" - basically key-value sets - and the "coordinates", which are stored using PostGIS as Points.
  • To query all tweets within an area: SELECT * FROM tweet_pgh WHERE ST_MakeEnvelope(-79.9, 40.44, -79.899, 40.441, 4326) ~ coordinates
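If you'd rather build that bounding-box query from a script, here's a minimal sketch. It only builds the SQL string and its parameters; it doesn't connect to anything. The function name and the table-name check are my own invention for illustration, and the %s placeholders assume a driver like psycopg2, which may not be what you end up using.

```python
def bbox_query(table, lon_min, lat_min, lon_max, lat_max, srid=4326):
    """Build a (sql, params) pair for rows whose point falls in a lon/lat box.

    ST_MakeEnvelope takes xmin, ymin, xmax, ymax, srid, so longitude comes
    before latitude.
    """
    # Table names can't be passed as query parameters, so only accept
    # simple names made of letters, digits, and underscores.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    if lon_min > lon_max or lat_min > lat_max:
        raise ValueError("box corners are out of order")
    sql = ("SELECT * FROM " + table +
           " WHERE ST_MakeEnvelope(%s, %s, %s, %s, %s) ~ coordinates")
    return sql, (lon_min, lat_min, lon_max, lat_max, srid)


# Same box as the example query above.
sql, params = bbox_query("tweet_pgh", -79.9, 40.44, -79.899, 40.441)
print(sql)
print(params)
```

With psycopg2 you would then pass both pieces to cursor.execute(sql, params), so the numbers never get pasted into the SQL by hand.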

Things to know about Domo:

  • Change your password right away. Do this by typing "passwd" after you SSH in.
  • Don't store things in your homedir! Our whole homedir partition only has about 8 GB. Obviously, that fills up fast. Store anything you can in /data - that has 1 TB. I might bug you sometimes to clean up your homedir if you end up using a lot of space.

When I add you to Domo, I'll tell you:

  • your username on Domo
  • your temporary password (change this as soon as possible)
  • Domo's hostname (not shown here so we get attacked as little as possible)

Tuesday, January 19, 2016

Emojis (and words) of Pittsburgh on the SUDS blog!

Hey, check it out. An article about work that Jennifer Chou and I did, on the Students for Urban Data Systems (CMU org) blog!

Saturday, December 19, 2015

First paper(s) accepted, and thesis proposal proposal proposed!

It's an exciting time here for this mid-level PhD Student.

First! I've had a paper accepted to CHI, the biggest Human-Computer Interaction conference. Paper publishing is nice. It means that people can read about what you're doing, it means four people think your work is worthwhile and well done, and it's a tangible mark of success.

It's called "Getting Users' Attention in Web Apps in Likable, Minimally Annoying Ways", the title is pretty self-explanatory, and it is but one more bit of sand in the large pile of research that's been done on notification systems. Still, web sites are not very good at this, and our paper offers one potential piece of a solution, so for that reason I am pretty happy to have done it. Baby steps.

I have Anupriya Ankolekar and especially Josh Hailpern to thank for the opportunity to intern at HP Labs, the guidance in running the study and writing the paper, and the moral support throughout the tumultuous process. Thanks, you two, for helping out a research neophyte.

Second! Another paper that I helped with has been accepted to CHI, and this one is with so many good friends and my fiancee Tatiana, so that's cool. "Mailing Archived Emails As Postcards: Probing the Value of Virtual Collections." This one's almost four years in the making, so thanks to Tati, Dave, and Jenny for gathering and analyzing data with me; Jason, Will, and John for advice along the way; and Beka, Will, and Dave for doing the writing heavy lifting. This was pretty epic, and I'm excited we can talk about it. Finally.

(I'll post both of the papers when stuff's more officially published.)

Third! Sorry if you are feeling down about your work and this feels like rubbing it in; I have been that student very many times (and will probably continue to be in the future) and I feel ya. I don't really mean to humblebrag, or even straight-up brag, so I've resisted facebooking or twittering any of this, but I'm really pretty relieved and excited to hit this milestone, and nobody reads this blog anyway. Plus, another data point for "hang in there, keep resubmitting your drafts, you'll win this game someday" I guess?

Fourth! I'm going to graduate someday. I've got my thesis proposal proposal done (meaning, I haven't done the proposal, but I've talked with most of my committee and figured out what my thesis proposal will be). Now all I have to do is run a study or two to plan the thing, do some background research, build a giant web app, do a lot more studies, etc etc etc, but you know, it's on the right track. I've got some great profs behind me, I'm stoked about the work, and I can do most of it while I'm with Tati. So.

Saturday, May 2, 2015

Roads Greenery Buildings

What is your neighborhood made of?

We don't interact with zoning or construction in our everyday lives. We just know that some places are more pleasant than others. We don't really see the effects of dedicating half our space to parking lots and roads. We sort of know that New York is denser than suburban Ohio, but how dense is it?

More pragmatically, you may be looking for a place to live in a new city. You like your neighborhood now, so you wouldn't mind a place that "feels like" it. Obviously, midtown Manhattan won't feel like Squirrel Hill, Pittsburgh, but what neighborhood would?

Roads Greenery Buildings is an attempt to partially answer that question.

Give it an address, and it will look up the place on Google Maps and Google Earth and tell you approximately how much of the nearby area is taken up by roads, green space, and buildings. You can look up a few places to compare. Here we see that my neighborhood (the third one) has more roads and buildings than Carnegie Mellon (the first one) - which makes sense; CMU is a college campus with some big lawns. My neighborhood is also a little greener than nearby Oakland.
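To give a feel for the arithmetic, here's a toy sketch. It assumes the area around an address has already been turned into a grid where every cell is labeled "road", "green", "building", or "other". That labeling step is the hard part and isn't shown; the grid below and the label names are invented for illustration and aren't taken from the actual tool.

```python
from collections import Counter

LABELS = ("road", "green", "building", "other")


def land_use_fractions(grid):
    """Return the share of cells carrying each label in a 2D grid."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty grid: nothing to report
    return {label: counts[label] / total for label in LABELS}


# Hypothetical 4x5 tile around an address.
tile = [
    ["road", "road", "building", "building", "green"],
    ["road", "building", "building", "green", "green"],
    ["road", "road", "building", "other", "green"],
    ["road", "building", "building", "building", "other"],
]

fractions = land_use_fractions(tile)
for label in LABELS:
    print(label, fractions[label])
```

Comparing two places is then just a matter of running this on each tile and putting the numbers side by side.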

Here's a comparison of some neighborhoods in San Francisco, based on some coffee shops I like. Haus Coffee (the first) is in the greenest area (24th St. in the Mission is full of trees) but greenery is in short supply all around. This is to be expected; it's a big city. I was surprised to find the Ritual Hayes Valley branch (#4) to have so many roads nearby, but on reflection, there are a couple of big boulevards right there. Meanwhile, the areas around Four Barrel (#2) and Saint Frank (#5) look the densest in terms of buildings.

This doesn't tell you everything, of course. The space calculations are imperfect, and there's no description of what the green space is (a highway median is less good than a nice park) or what the buildings are (a parking garage, a house, and an office skyscraper all get the same weight). But it's a start. I think of this (or, you know, the platonic ideal of this) as a peer to Walkscore: by no means the only tool that helps you understand a place, but one of many.

What's good? Depends on you, I guess, but I think this tool shows how places with more buildings tend to be more approachable and interesting, while green space often just makes things farther apart.

Try it out! (disclaimer: link worked as of May 2015; apologies if it's rotted since then.)

Hat tip to Andrew Alexander Price for the blog post that inspired this work. (More details.)