CHI 2014 highlights, 3rd and final

And, finally, a wrap-up of my favorites [0] from CHI paper talks I attended, following up on Part 1 and Part 2. We probably don’t do enough to call attention to other good things and people in our community, so this is a modest attempt at that [1].

I’ll start with a quick nod to former co-conspirator Xuan Zhao and her paper with Siân Lindley about Curation through use: understanding the personal value of social media. At a high level, the talk put the paper at the intersection of the Many Faces of Facebook paper and some of Will Odom’s stuff on digital possessions, but with a focus on the suitability of social media for personal archiving. I liked the exercise of creating a “digital keepsake” from social media content as a way to prime the pump, and some of the suggestions around identifying meaningfulness through use (à la Edit Wear and Read Wear [2]) felt fun. I also liked the design implication to use social media content to help people build narratives for self and others [3]: instead of “see friendship”, you might “show friendship”.

Next up was a pair of papers that approached asking for help from friends and neighbors from very different value positions.

The first was Estimating the social costs of friendsourcing by Jeff Rzeszotarski and Merrie Morris. They note that asking for help can impose a burden on receivers and perhaps, via privacy concerns, on askers, then study how people balance those costs with the potential gains in social capital from asking and answering questions. The experimental design was plausible, and the work related to parts of Munmun De Choudhury’s presentation around seeking stigmatized health information online (with Merrie and Ryen White).

The second was my favorite talk at CHI, by Victoria Bellotti [4] on behalf of the authors of Towards community-centered support for peer-to-peer service exchange: rethinking the timebanking metaphor. She took a critical look at the idea that favors might be converted into time-based currencies to trade for later favors, suggesting that the metaphor misses the social meaning associated with doing favors [5] while highlighting largely-negative constructs such as debt. She then proposed a number of design vignettes for emphasizing social values of exchange in the most energetic, fun way I’ve seen in a CHI talk in a couple of years [6].

I found the contrast fascinating, and both papers were thoughtful and well worked out. They were also in different sessions, so hopefully bringing them together here will encourage people in this space to read them on a long, lazy summer afternoon and think about how they come together.

I also enjoyed the talk on Alexandra Eveleigh and colleagues’ paper about Designing for Dabblers and Deterring Drop-outs in Citizen Science [7]. The high-level story is that since participation in citizen science (and other peer production systems) follows a power law, much activity is in the tail, the “dabblers”. Thus, you might design to target them, rather than power users. To do this, they went out and asked both high and low contributors about their motivations for participating and came up with a fine set of design ideas that target infrequent contributors. I resonate with this goal — SuggestBot [8] was originally designed to help Wikipedia newbies do more useful work more easily. It was hard to actually get it in front of new editors (who often never return to log in or edit, and if they do, may not have known enough about Wikipedia software to see SuggestBot’s posts). The paper suggests that requests in the moment — to “tempt them to complete ‘just another page’” — may be more effective as a general strategy for engaging the infrequent [9].

Finally, Amy Voida’s talk about Shared values/conflicting logics: working around e-government systems, a paper she did with several Irvine colleagues, gave me a couple of thoughts. First, the talk made clear that even when high-level values are shared between managers, designers, and workers around systems, the interpretations and instantiations of those values by the parties (“logics”) can lead to problems in practice. Not a totally new story [10] but it highlights the utility of design Processes [11] where communication might reduce the chance of this value drift. It also called out that designing for end user independence is not always appropriate. Even a perfectly capable user of the electronic application system might not be able to effectively get help from the government aid System. Instead of designing to reduce applicants’ reliance on workers, you could imagine a design that helps applicants and workers cooperate to complete applications, providing support for situations when applicants get stuck and really do need help from people who know how the System works.

That is pretty much it for the story of favorites, so let’s be done. But think about doing trip reports yourself and sharing them with the world. It’s good to recognize interesting work, useful for learning more about the community, smart for connecting to the people and work that you call out, and hopefully a service to other people who benefit from your experiences.

#30#

[0] This is a personal view based on my tastes and the pragmatics of session attendance; I’m sure there were lots of other cool things, while other people will have different papers that take them to their own happy places. Another reason for you to do your own trip reports.

[1] Which has the nice side effect of me learning about the community as I put it together.

[2] Still one of the most inspiring papers I’ve ever read.

[3] It’s somewhere between scrapbooking and a “social media mix tape”.

[4] Who, at the time I searched for her on Google Scholar, had exactly 9,000 citations. Soon she will be “over 9000”, as it were.

[5] As a borderline Aspergers kind of guy, when people come to me with problems, I also tend to focus on the problem, rather than the person and their needs around the problem. As you can imagine, this goes over great with my fiancee when she’s seeking support rather than solutions.

[6] Sadly, the paper didn’t have as many vignettes, and very few were visual. I wonder whether, if the paper had included the kind of napkin-sketch interfaces that were in the talk, it would have triggered the “and so does it work?” reactions that system papers often get at CHI.

[7] It’s very cool that they tapped into this “dark matter” of infrequent contributors; we often only study the large, the successful, the vocal, the frequent.

[8] Google search results say “You’ve visited this page many times”. Indeed I have.

[9] Related to this, one of our goals at the CeRI project is to give people feedback about the comments they submit to public civic discussions while they write them, in order to improve quality and engagement.

[10] It reminded me of the idea of “work to rule” as a deliberate way to cause conflict.

[11] In the same way that I am about to use “system” to mean a technological artifact and “System” to refer to a set of concerns, people, and interactions around that artifact, here I am thinking something a little higher-level than the process of just designing the artifact. Maybe participatory design is more like it.

How I review papers

Pernille Bjørn is spearheading a mentoring program for new reviewers as part of CSCW 2015, which I think is awesome. I am mentoring a couple of students, and I figured as long as I was talking to them about how I approach reviews I would share it with others as well [0].

The first question is how close to the deadline to do the review [1]. A lot of people do them near the deadline, partly because academics are somewhat deadline-driven. Also, in processes where there is some kind of discussion among reviewers or a response/rebuttal/revision from the authors, the less time that’s passed between your review and the subsequent action, the more context you will have.

However, I tend to do them as early as practicable given my schedule. I don’t like having outstanding tasks, and although PC members know that many reviews are last minute, it is still nervous-making [2]. I also don’t mind taking a second look at the paper weeks or even months later, in case my take on the paper has changed in the context of new things I’ve learned. And, for folks who are getting review assistance from advisors or mentors, getting the reviews done earlier is better so those people have time to give feedback [3].

I still print papers and make handwritten notes in the margins whenever I can, because I find I pay a little more attention to print than to a screen [4]. If I’m reading on a screen I’ll take notes in a text editor and save them. I usually read in some comfy place like a coffeeshop (pick your own “this is nice” place: your porch, the beach, a park, whatever) so that I start with good vibes about the paper and also reward myself a little bit for doing reviews [5]. Try to do your reading and reviewing when you’re in a neutral or better mood; it’s not so fair to the authors if you’re trying to just squeeze it in, or you’re miffed about something else.

What I typically do these days is read the paper and take lots of notes on it, wherever I see something that’s smart or interesting or confusing or questionable. Cool ideas, confusing definitions, (un)clear explanations of methods, strong justifications for both design and experimental choices, notation that’s useful or not, good-bad-missing related work, figures that are readable or not, helpful or not, clever turns of phrase, typos, strong and weak arguments, etc. Anything I notice, I note.

The notes are helpful for several reasons. First, actively writing notes helps me engage with the paper more deeply [6]. Second, those notes will be handy later on, when papers are discussed and authors submit responses, rebuttals, or revisions. Third, they can themselves be of benefit to authors (see below).

Fourth, taking notes allows me to let it sit for a couple of days before writing the review. Not too long, or else even with the notes I’ll start forgetting some of what was going on [7]. But taking a day or two lets initial reactions and impressions fade away — sometimes you have an immediate visceral reaction either good or bad, and that’s not so fair to the authors either.

Letting it sit also lets me start to sort out what the main contributions and problems are. I make _lots_ of notes and a review that’s just a brain dump of them is not very helpful for the program committee or other reviewers. So, after a couple of days, I look back over my notes and the paper, and write the review. People have a lot of different styles; my own style usually looks something like this [8]:

----

Summary:

2 sentences or so about the key points I’m thinking about when I’m making my recommendation. This helps the program committee, other reviewers, and authors get a feel for where things are going right away.

Main review:

1 paragraph description of paper’s goals and intended contributions. Here, I’m summarizing and not reviewing, so that other reviewers and authors feel comfortable that I’ve gotten the main points [9]. Sometimes you really will just not get it, and in those cases your review should be weighed appropriately.

1-2 paragraphs laying out the good things. This is important [10]. In a paper that’s rough, it’s still useful to talk about what’s done well: authors can use that info to know where they’re on the right track, plus it is good for morale to not just get a steady dose of criticism. In a medium or good paper, it’s important to say just what the good things are so that PC members can talk about them at the meeting and weigh them in decision-making. Sometimes you see reviews that have a high score but mostly list problems; these are confusing to both PC members and authors.

1 short paragraph listing out important problems. Smaller problems go in the “Other thoughts” section below; the ones that weigh most heavily in my evaluation are the ones that go here.

Then, one paragraph for each problem to talk about it: what and where the issue is, why I think it’s a problem. If I have suggestions on how to address it, I’ll also give those [12]. I try to be pretty sensitive about how I critique; I refer to “the paper” rather than “the authors”, and I watch out for phrasing that feels mean-spirited or could be taken the wrong way.

A concluding paragraph that expands on the summary: how I weighed the good and bad and what my recommendation is for the program committee. Sometimes I’ll suggest other venues that it might fit and/or audiences I think would appreciate it, if I don’t think it’ll get in [13]. I usually wish people luck going forward, and try to be as positive as I can for both good and less good papers.

Other thoughts:

Here I go through my notes, page by page, and list anything that I think the authors would benefit from knowing about how a reader processed their paper. I don’t transcribe every note but I do a lot of them; I went to the effort and so I’d rather squeeze all the benefit out of it that I can.

Scores:

Different venues ask for different kinds of ratings; for CSCW, there are multiple scales. The expertise scale runs from 4 (expert) to 1 (no knowledge). I try to be honest about expertise; if I am strong with both domain and methods, I’m “expert”; if I’m okay-to-strong with both, I’m “knowledgeable”; I try not to review papers where I feel weak in either domain or methods, but I will put a “passing knowledge” if I have to, and I try hard to turn down reviews where I’d have to say “no knowledge” unless the editor/PC member/program officer is explicitly asking me to review as an outsider.

The evaluation scales change a bit from year to year. This year, the first round scale is a five-pointer about acceptability to move on to the revise and resubmit [14]: definitely, probably, maybe, probably not, not. The way I would think about it is: given that authors will have 3 weeks or so to revise the paper and respond to review comments, will that revision have a good chance of getting an “accept” rating from me in a month? And, I’d go from there.

----

Again, not everyone writes reviews this way, but I find that it works pretty well for me, and for the most part these kinds of reviews appear to be helpful to PC members and authors, based on the feedback I’ve gotten. Hopefully it’s useful to you, and I (and other new reviewers) would be happy to hear your own stories and opinions about the process.

Just for fun, below the footnotes are the notes I took on three papers for class last semester. These are on published final versions of papers, so there are fewer negative things than would probably show in an average review. Further, I was noting for class discussions, not reviews, so the level of detail is lower than I’d do if I were reviewing (this is more what would show up in the “other thoughts” section). I don’t want to share actual reviews of actual papers in a review state since that feels a little less clean, but hopefully these will give a good taste.

#30#

[0] Note that many other people have also written and thought about reviewing in general. Jennifer Raff has a nice set of thoughts and links.

[1] Well, the first question is whether to do the review at all. Will you have time (I guesstimate 4 hrs/review on average for all the bits)? If no, say no. It’s okay. Are you comfortable reviewing this paper in terms of topic, methods, expertise? If no, say no.

[2] I was papers chair for WikiSym 2012 and although almost everything came in on time, the emphasis was on “on”.

[3] Doing your read early will also help you think about whether you really know enough about the related work to review the paper; when I was a student, I was pretty scared to review stuff outside my wheelhouse, and rightly so.

[4] Yes, I’m old. Plus, there’s some evidence that handwritten notes are better than typed.

[5] There’s a fair bit of literature about the value of positive affect. For example, Environmentally Induced Positive Affect: Its Impact on Self‐Efficacy, Task Performance, Negotiation, and Conflict.

[6] See the second half of [4].

[7] See the first half of [4].

[8] Yes, I realize this means that some people will learn that there’s a higher-than-normal chance that a given review is from me despite the shield of anonymity. I’m fine with that.

[9] Save things like “the authors wanted, but failed, to show X” for the critiquey bits (and probably say it more nicely than that, even there).

[10] Especially in CS/HCI, we’re known to “eat our own” in reviewing contexts [11]; program officers at NSF have told me that the average panel review in CISE is about a full grade lower than the average in other places like physics. My physicist friends would say that’s because they’re smarter, but…

[11] For instance, at CHI 2012, I was a PC member on a subcommittee. 800 reviews, total. 8 reviews gave a score of 5. That is, only 1 percent of reviewers would strongly argue that _any_ paper they read should be in the conference.

[12] Done heavy-handedly, this could come off as “I wish you’d written a different paper on a topic I like more or using a method I like more”. So I try to give suggestions that are in the context of the paper’s own goals and methods, unless I have strong reasons to believe the goals and methods are broken.

[13] There’s a version of this that’s “this isn’t really an [insert conference X] paper” that’s sometimes used to recommend rejecting a paper. I tend to be broader rather than narrower in what I’m willing to accept, but there are cases where the right audience won’t see the paper if it’s published in conference X. In those cases it’s not clear whether accepting the paper is actually good for the authors.

[14] I love revise and resubmit because it gives papers in the “flawed but interesting” category a chance to fix themselves; in a process without an R&R these are pretty hard to deal with.

Sharma, A., & Cosley, D. (2013, May). Do social explanations work?: studying and modeling the effects of social explanations in recommender systems. In Proceedings of the 22nd international conference on World Wide Web (pp. 1133-1144). International World Wide Web Conferences Steering Committee.
http://www.cs.cornell.edu/~danco/research/papers/sharma-explanations-www2013.pdf

I don’t know that we ever really pressed on the general framework, unfortunately.

It would have been nice to give explicit examples of social proof and interpersonal influence right up front; the “friends liked a restaurant” example is somewhere in between.

p. 2

This whole discussion of informative, likelihood, and consumption makes assumptions about the goals being served; in particular, it’s pretty user-focused. A retailer, especially for one-off customers (as in a tourism context), might be happy enough to make one sale and move on.

Should probably have drawn explicit parallels between likelihood/consumption and Bilgic and Mooney’s promotion and satisfaction.

A reasonable job of setting up the question of measuring persuasiveness from the general work (though I wish we’d explicitly compared that to Bilgic and Mooney’s setup). Also unclear that laying out all the dimensions from Tintarev really helped the argument here.

Models based on _which_ theories?

p. 3

Okay, I like the attempt to generalize across different explanation structures/info sources and to connect them to theories related to influence and decision-making.

Wish it had said “and so systems might show friends with similar tastes as well as with high tie strength” as two separate categories (though, in the CHI 06/HP tech report study, ‘friends’ beat ‘similar’ from what I remember).

Okay, mentioning that there might be different goals here.

“Reduce”, not “minimize”. You could imagine a version where you chose completely random artists and lied about which friends liked them… though that has other side effects as an experimental design (suppose, for instance, you chose an artist that a friend actually hated).

p. 4

Yeah, they kind of goofed by missing “similar friend”.

_Very_ loosely inspired. The Gilbert and Karahalios paper is fun.

Seeing all those little empty bins for ‘5’ ratings that start showing up in later figures was a little sad — I wish we’d have caught that people would want to move the slider, and done something else.

We never actually use the surety ratings, I think.

Overall this felt like a pretty clean, competent description of what happened. I wish we’d had a better strategy for collecting more data from the good friend conditions, but…

The idea of identifying with the source of the explanation was interesting to see (and ties back in some ways to Herlocker; one of the most liked explanations was a generic “MovieLens accurately predicts for you %N percent of the time” — in some ways, getting them to identify with the system itself).

p. 5

We kind of got away with not explaining how we did the coding here… probably an artifact of submitting to WWW where the HCI track is relatively new and there aren’t as many qualitative/social science researchers in the reviewing pool compared to CHI.

It’s a little inconsistent that we say that a person might be differently influenced by different explanations, but then go on to cluster people across all explanation types.

p. 6

Should have added a reminder in the caption, something like “the anomalous 5 (where the slider started)”.

Is 5 really a “neutral” rating on the scale we used? Did we have explicit labels for ratings?

I keep seeing a typo every page or so, and it makes me sad. “continous”

Constraining parameters in theoretically meaningful ways is a good thing to do. For instance, if a parameter shouldn’t change between conditions, the models should probably be constrained so it can’t change (it’s kind of cheating to let the models fit better by changing those kinds of params).

p. 7

We talk about parameters for “the user”, but then go on to study these in aggregate. Probably “okay” but a little sloppy.

We really should have removed 5 entirely and scaled down ratings above 5. It probably wouldn’t change things drastically, but it would be mathematically cleaner as well as closer to average behavior.

So, for instance, maybe we should have constrained the discernment parameter to be the same across all models.

Not sure I believe the bit about the receptiveness and variability scores together.

p. 8

There’s an alternate explanation for the clustering, which is that some people are just “ratings tightwads” who are uninterested in giving high ratings to something that they haven’t seen.

I’m only lukewarm about the idea of personalizing explanation type, mostly because I think it’ll take scads of data, more than most systems will get about most users.
The point that likelihood and consumption are different I do like (and that we ack Bilgic and Mooney in finding this as well); and I like the idea of trying to model them separately to support different goals even better (though that too has the “you need data” problem) — we come back to this in the discussion pretty effectively (I think).

p. 9

The discussion starts with a very pithy but useful high-level recap of the findings, which is usually a good thing; you’ve been going through details for a while and so it’s good to zoom back up to the bigger picture.

The flow isn’t quite right; the first and third section headers in the discussion are actually quite similar and so probably would be better moved together.

p. 10

Jamming all the stuff about privacy into the “acceptability of social explanation” part is up and down for me. It’s better than the gratuitous nod to privacy that a lot of papers have, but it’s not as good as having it woven throughout the discussion to give context (and, it’s not connected to theories around impression management, identity work, etc., in a way it probably should be). Some parallels to this year’s 6010 class, where we did a full week on implications of applying computation to personal data last week (and talked about it sometimes as we went along).

I really like that we clearly lay out limitations.

=====

Walter S. Lasecki, Jaime Teevan, and Ece Kamar. 2014. Information extraction and manipulation threats in crowd-powered systems. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (CSCW ’14). ACM, New York, NY, USA, 248-256. DOI=10.1145/2531602.2531733 http://doi.acm.org/10.1145/2531602.2531733

Unclear that you’d want Turkers doing surveillance camera monitoring, but okay.

I like that they test it in multiple contexts.

The intro kind of begs the question here, that the problem is labeling quickly.

The idea of algorithms that use confidence to make decisions (e.g., about classification, recommendation, when to get extra info) is a good general idea, assuming your algos generate reasonable confidence estimates. There was an AI paper a while ago about a crossword puzzle solving system that had a bunch of independent learners who reported confidence; the system that combined them used those confidences as weights and started to adjust them once it saw when learners would over- or under-estimate. Proverb: The Probabilistic Cruciaverbalist. It was a fun paper.

Okay, some concrete privacy steps, which is good.

I’m less convinced by the categorical argument that fully automated activity recognition systems are “more private” than semi-automated ones. Scaling up surveillance and having labels attached to you without human judgment are potential risks on the automated side.

p. 2
Blurring faces and silhouettes is better than nothing, but there’s a lot of “side leakage” similar to the “what about other people in your pics” kind that Jean pointed out last week: details of the room, stuff lying about, etc., might all give away clues.

I hope this paper is careful about the kinds of activities it claims the system works for, and the accuracy level. I’m happy if they talk about a set of known activities in a restricted domain based on what I’ve seen so far, but the intro is not being very careful about these things.

I usually like clear contribution statements but it feels redundant with the earlier discussion this time.

p. 3

Overall a fairly readable story of the related work and the paper’s doing a better-than-CHI-average job of talking about how it fits in.

p. 4

It’s less clear to me what a system might do with a completely novel activity label — I guess forward it, along with the video, to someone who’s in charge of the care home, etc. (?)

p. 5

I wonder if a version tuned to recognize group activities that didn’t carve the streams up into individuals might be useful/interesting.

One thing that this paper exemplifies is the idea of computers and humans working together to solve tasks in a way that neither can do alone. This isn’t novel to the paper — David McDonald led out on an NSF grant program called SoCS (Social-Computational Systems) where that was the general goal — but this is a reasonable example of that kind of system.

Oh, okay, I was wondering just how the on-demand force was recruited, and apparently it’s by paying a little bit and giving a little fun while you wait. (Worth, maybe, relating to the Quinn and Bederson motivations bit.)

You could imagine using a system like this to get activities at varying levels of detail and having people label relationships between the parts to build kinds of “task models” for different tasks (cf. Iqbal and Bailey’s mid-2000s stuff on interruption) — I guess they talk a little bit about this on p. 6.

I was confused by the description of the combining inputs part of the algorithm.

p. 6

The references to public squares, etc., add just a touch of Big Brother.

For some reason I thought the system would also use some of the video features in its learning, but at least according to the “Training the learning model” section it’s about activity labels and sensed tags. I guess thinking about ‘sequences of objects’ as a way to identify tags is reasonable, and maybe you won’t need the RFID tags as computer vision improves, but it felt like useful info was being left on the table.

Okay, the paper’s explicitly aware of the side information leak problem and has concrete thoughts about it. This is better than a nominal nod to privacy that often shows up in papers.

p. 7

I’m not sure the evaluation of crowd versus single labeler is that compelling to me. I guess it shows that redundancy is useful here, and that the crowd can compare with an expert, but it felt a little hollow.

p. 8

I’m not sure what it means to get 85% correct on average. Not really enough detail about some of these mini-experiments here.

Heh, the whole “5 is a magic number” for user testing thing shows up again here.

I’m guessing if the expert were allowed to watch the video multiple times they too could have produced more detailed labels. The expert thing feels kind of weak to me. And, on p. 9 they say the expert generated labels offline — maybe they did get to review it. Really not explained in enough detail to interpret this with confidence.

p. 9

The idea that showing suggestions helped teach people what the desirable kinds of answers were is interesting (parallels Sukumaran et al. 2011 CHI paper on doing this in discussion forums and Solomon and Wash 2012 CSCW paper on templates in wikis). In some ways the ESP game does this as well, but more implicitly.

The intent recognition thing is kind of mysterious.

p. 10

This paper needs a limitations section. No subtlety about broadening the results beyond these domains. Cool idea, but.

===========

Hecht, B., Hong, L., Suh, B., & Chi, E. H. (2011, May). Tweets from Justin Bieber’s heart: the dynamics of the location field in user profiles. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 237-246). ACM. http://extweb-prod.parc.com/content/attachments/tweets-from-justin.pdf

Okay, a paper that looks at actual practice around what we might assume is a pretty boring field and finds that it’s not that boring after all. It’s a little sad that traditional tools get fooled by sarcasm, though.

It’s always fun to read about anachronistic services (Friendster, Buzz, etc.)
I wonder if Facebook behavior is considerably different than Twitter behavior either because of social norms or because the location field lets you (maybe) connect to others in the same location.

This does a nice job of motivating the problem and the assumption here that location data is frooty and tooty.

On the social norms front, it would be cool to see to what extent appropriation of the location field for non-locations follows social networks (i.e., if my friends are from JB’s heart, will I come from Arnold’s bicep?)

p. 2

(And, on the location question, I haven’t read Priedhorsky et al.’s 2014 paper on location inference, but I think they use signal from a user’s friends’ behavior to infer that user’s location — which I guess the Backstrom et al. paper cited here does as well.)

Okay, they also do a reasonable job of explicitly motivating the “can we predict location” question, connecting it to the usefulness/privacy question that often comes up around trace data.

Nice explicit contribution statement. They don’t really talk about point 3 in the intro (I guess this is the “fooling” part of the abstract), but I’m guessing we’ll get there.

Another mostly-empty advance organizer, but then an explicit discussion of what the paper _doesn’t_ do, which is kind of cool (though maybe a little out of place in the intro — feels more like a place to launch future work from).

The RW is not as exciting here; so far it reads more like an annotated bibliography again. For instance, I wonder if the Barkhuus et al. paper would give fertile ground for speculating about the “whys” explicitly set aside earlier; saying that “our context is the Twittersphere” is not helpful.

At the end they try to talk about how it relates to the RW, but not very deeply.

p. 3

The fact that half the data can be classified as English is interesting — though I’m not sure I’m surprised it’s that little, or that much. (Which is part of why it feels interesting to me.)

Not sure I buy the sampling biases rationale for not studying geolocated info (after all, there are biases in who fills out profile info, too). This feels like one of those kinds of “reviewer chow” things where a reviewer was like “what about X” and they had to jam it in somewhere. (The “not studying reasons” para at the end of the intro had a little bit of that feel as well.)

10,000 entries… whoa.

p. 4

Hating on pie charts, although the info is interesting. It does make me want to say something more like “of people who enter something, about 80% of it is good” — it’s a little more nuanced.
The “insert clever phrase here” bit suggests that social practice and social proof really do affect these appropriation behaviors — and it is also cool that some themes emerge in them.

It’s tempting to connect the idea of “self report” from the other paper to the idea of “disclosing location” here. The other paper had self-disclosure as part of an experiment, which probably increases compliance — so really, part of the answer about whether data is biased is thinking about how it was collected. Not a surprising thing to say, I guess, but a really nice, clear example coming out here.

So, no info about coding agreement and resolution for the table 1 story. I’m also assuming the percents are of the 16% non-geographic population. Most of the mass isn’t here: I wonder what the “long tail” of this looks like, or if there are other categories (in-jokes, gibberish), etc. that would come out with more analysis.

The identity story is a cool one to come out, and intuitively believable. It would be cool, as with the other paper, to connect to literature that talks about the ways people think about place.

p. 5

So, the description of lat long tags as profiles is a little weird — it means we’re not really looking at 10K users. We’re looking at 9K. That’s still a lot of users, but this is something I’d probably have divulged earlier in the game.

I wonder if one of the big implications on geocoding is that geocoders should report some confidence info along the way so that programs (or people) can decide whether to believe them — oh, okay, they go on to talk about geoparsers as a way of filtering out the junk early.

p. 6

I’m not sure how to think about the implication to replace lat/long with a semantically meaningful location for profile location fields. My main reaction was like “why is Blackberry automatically adding precise location info to a profile field in the first place?”

The idea of machine-readable location + a custom vernacular label is interesting, but it’s more machinery — and it’s not clear that most people who had nonstandard info actually wanted to be locatable. The reverse implication, to force folks to select from a preset list of places if you want to reduce appropriation, seems fine if that’s your goal. There’s something in between that combines both approaches, where the user picks a point, a geocoder suggests possible place names, and they choose from those.

All of these are an “if your goal is X” set of suggestions, though, rather than an “and this is The Right Thing To Do” kind of suggestion, and it’s good that the paper explicitly ties recommendations to design goals. Sometimes papers make their design implications sound like the voice of God, prescribing appropriate behavior in a context-free manner.

Unclear how many locations people really need; multi-location use seemed pretty rare, and it’s unclear that you want to design for the rare case (especially if it might affect the common one).

It’s often good to look for secondary analysis you can do on data you collect, and study 2 is high-level connected to study 1. Further, I tend to like rich papers that weave stories together well. But here they feel a little distant, and unless there’s some useful discussion connecting them together later I wonder if two separate papers aimed at the appropriate audiences would have increased the impact here.

p. 7

I’m not sure why CALGARI is an appropriate algorithm for picking distinguishing terms. It feels related to just choosing maximally distinguishing terms from our playing with Naive Bayes earlier, and I also wonder if measures of probability inequality (Gini, entropy, etc.) would have more info than just looking at the max probability. (Though, these are likely to be correlated.)

Again, a pretty clear description of data cleaning and ML procedure, which was nice to see.

I’d probably call “RANDOM” either “PRIOR” or “PROPORTIONAL”, which feel more accurate (I’m assuming that UNIFORM also did random selection as it was picking its N users for each category).

p. 8

Also nice that they’re explaining more-or-less arbitrary-looking parameters (such as the selection of # of validation instances).

Note that they’re using validation a little differently than we did in class: their “validation” set is our “test” set, and they’re presumably building their Bayesian classifiers by splitting the training set into training and validation data.

So, what these classifiers are doing is looking at regional differences in language and referents (recall Paul talking about identifying native language by looking at linguistic style features). Based on p. 9’s table, it looks like referring to local things is more the story here than differences in language style.

p. 9

Not sure about the claims about being able to deceive systems by using a wider portfolio of language… it really is an analog of how spammers would need to defeat Naive Bayes classifiers.

It doesn’t really go back and do the work to tie study 1 and study 2 together for me.

CHI 2014, highlights part 2

And, we’re back for another six-pack of things I found interesting [1] in CHI 2014 papers, continuing my last post.

I really enjoyed Jina Huh’s talk about her paper with Wanda Pratt, “Weaving clinical expertise in online health communities”. Asking practitioners how they interpret peer-to-peer health conversations and how they might go about improving them was a good idea, and although the answers were often “well, I’d do the things I’d do if I were working with them”, knowing what clinicians might need to know and thinking about mechanisms to learn them is smart [2].

The next four all came from a session on critical design, which is not normally in my wheelhouse, but I had a prior interest in Will Odom’s paper because of the connections to Pensieve, so I took a chance that went well [3].

Melanie Feinberg’s talk about her paper with Daniel Carter and Julia Bullard called “Always somewhere, never there: using critical design to understand database interactions” [4] was a fun look at designing subversive navigation structures for a document collection to make commentary on both collections and taxonomies in general [5]. She’s a classification scholar and so expressed deep skepticism about folksonomies, but thinking about how to “read” subsets of a user-generated tag collection might be interesting design research along the lines she’s already on.

Jeff Bardzell presented a paper with Shaowen Bardzell and Erik Stolterman called “Reading critical designs: supporting reasoned interpretations of critical design” that might help those of us who are not steeped in critical design process critical analyses better [6]. My outsider takeaway is that their taxonomy of six high-level design features (topic, goal, form, etc.) crossed with four main flavors of critique (changing perspectives, reflectiveness, etc.) felt like a useful, relatively accessible tool for engaging (students) with critique [7].

Will Odom then talked about his ongoing work with memory, photographs, and slow technology from his (and several others’) paper “Designing for slowness, anticipation and re-visitation: a long term field study of the photobox”. Here, some of the major effects, such as people’s move from frustration with lack of control to acceptance, took several months to appear, and the idea that a slow technology like this might make things “special” is very cool. It would be interesting to probe both the role of long deployments in CHI [8] and the design spaces in which slow/imperturbable technology might be most interesting [9].

Steve Whittaker gave the last talk on behalf of Corina Sas and co-authors of their paper “Generating Implications for Design through Design Research”. Unpacking the ways people think about design implications [10]–what they are, how they get made, whether they’re good–made me want to run off and look at all the design implications I’ve written [11] in the past. Maybe design implications are the personas of fieldwork and user studies: they should help you think about the rich awesomeness that went into generating them but in practice often feel a little two-dimensional.

Finally for today, an upstate New York shout-out to Yang Wang from Syracuse, talking about “A field trial of privacy nudges for Facebook” with folks from CMU. Although one nudge, a post delay [12], is more about mitigating regret than privacy per se, the general plan of interfaces that help people reason about audiences, disclosure, and consequences feels solid [13]. Some people liked the interfaces and others didn’t, which is also fine: a feature that half the people can turn off and half the people benefit from is pretty awesome. And I found it generative, as their nudge of showing some audience members has a bunch of variations: chosen by user or computer, strong or weak ties, present or possible future contacts, etc.

And that’s all for now. Hope these were useful pointers and thoughts, and also that I finish the job later because my very favorite talk is still not listed. As before, feel free to share your own favorites; I always like hearing about good things.

— 30 —

[1] This was originally “liked most”, but I’m trying to give up on the superlative. There’s some cognitive difference between “Think of your fondest childhood memory” and “Think of a fond childhood memory”. One is a search task, and one is an invitation.

[2] Their goal of semi-automated moderation/support for conversation also directly ties to things the Cornell eRulemaking Initiative is looking to do to support public discourse more generally.

[3] Going back to the Ron Burt story from last time, one of his suggestions was to regularly go to unfamiliar conferences. The nice thing about CHI is that you can get the same effect by just going to unfamiliar sessions.

[4] I also was amused that searching for “Melanie Texas Databases” while writing this got me directly to her homepage. Yay, Google.

[5] In post-talk conversation with Melanie, Danyel Fisher pointed out the “Celestial Emporium of Benevolent Knowledge” as a cool counterpoint to the idea of taxonomies of any kind being universal, which I will in turn pass on to you.

[6] One of the frustrations of not having a design background but teaching some design courses is that I’ve had trouble finding materials to support doing critique and peer review. This paper from a critical side and some of Scott Klemmer’s stuff for peer evaluation of design both seem useful.

[7] It felt so much like a design space that I wondered if with small changes it could be used as a framework for critiquing designs more generally.

[8] We don’t do enough longitudinal studies. Just sayin’.

[9] I vote for email. I’ve already turned off notifications, but I’ve regularly pondered using/writing a tool that adds delays to mail that I both send and receive (what would the world be like if email were the pace of postal mail?).

[10] Snarkily, “the things you have to add to make CHI reviewers happy”, as eloquently critiqued by Paul Dourish in his Implications for Design paper.

[11] Badly.

[12] And, we’re back to slow technologies.

[13] Some of which I hope to be doing soon with Natalie Bazarova and Janis Whitlock if our grant proposal goes through. They say the third time is a charm…

CHI 2014, some highlights

So, another CHI in the books. I was feeling more anti-social than usual [1] this time around [2], so I wound up going to more talks than normal in search of ideas [3]. This year was pretty fertile [4], so I’m glad I made the choice. In this post, I’ll share some of my favorites with very brief notes; in some perfect worlds I’ll write longer bits about the most generative ones later.

Allison Woodruff had a nice talk about her paper “Necessary, unpleasant, and disempowering: reputation management in the internet age”. What can people, and CHI, do about the problem and pain of other people posting negative information about you on the internet? Right now, not much, hence the title. [5]

Aaron Halfaker, R. Stuart Geiger, and Loren Terveen had a nice talk on their paper about a Wikipedia tool called “Snuggle: designing for efficient socialization and ideological critique”. I liked Aaron’s socio-technical argument about how Wikipedians’ perceptions of the problem of vandalism in 2007 became reified in tools such as Huggle in ways that had strong negative consequences for helping to educate and socialize newcomers. [6]

Frank Bentley talked about a TOCHI paper he and many others did, “Health Mashups: Presenting statistical patterns between wellbeing data and context in natural language to promote behavior change”. This was cool: find interesting significant correlations between different streams of logged or sensed data, present them to people in simple English (“On weekends you’re happier”), and invite them to reflect on what it means. [7]

This was immediately followed by Eun Choe et al.’s “Understanding Quantified Selfers’ Practices in Collecting and Exploring Personal Data”. The paper was super-clever in using videos recorded at quantified self meetups as a qualitative data source and in focusing on extreme cases to gain insight; I learned that an amazing number of serious practitioners roll their own tools and visualizations to help them out here. [8]

I also liked Flavio Figueiredo et al.’s note “Does content determine information popularity in social media?: a case study of youtube videos’ content and their popularity”. It is one of several papers claiming that asking people what they think other people might like could quickly generate predictions of ultimate popularity. It also asked this at three levels of analysis: individual liking, willingness to share with friends, and general expected popularity. [9]

I have a little conflict of interest because I advise Liz Murnane, but her paper with Scott Counts on “Unraveling abstinence and relapse: smoking cessation reflected in social media” was a nice contrast to the “mine everything” atheoretical approach adopted by much quantitative social media research. She gave a strong talk about using domain knowledge and relevant theory to guide feature construction for detecting smoking cessation attempts, motivations, and outcomes in Twitter. There was a reasonable argument that this specific case could improve interventions, and I hope to see more projects adopt the general mindset/method that it’s useful to know something about what you’re mining before you start grunging around. [10]

This is starting to be a pretty long post, so I’m going to stop there for now and leave the second half of the conference for a hopeful sequel. Hope this was useful; if you have your own favorites to share, happy to get pointers to them as well. [11]

–30–

[1] Technically, I’m almost always just a little anti-social in crowd situations; it took me a long time to get comfortable with conferences, and my first experience ended with me walking away from the reception hall crying because I couldn’t make myself go in and sit at a table with 9 strangers for dinner.

[2] This is part of a more pervasive feeling that I’ve gone to too many conferences and that there’s too much travel in the game, which I’ve talked about before.

[3] There’s a huge set of things you do at a conference; Michael Ernst has a brief guide.

[4] My own experience was that I got more out of talks than conversations early in my academic life (though, see [1]), but the incremental value of talks gets a little smaller as you get older, especially if you hang out in your specialty, because you get exposed to less new stuff. Ron Burt did a CSCW 2013 keynote about going new places that followed from an earlier paper of his.

[5] My short form answer is that much of the problem is that you can find this stuff with search engines, so you might start with ranking algorithms that penalize negative information about people. This is one that would be worth a whole post about.

[6] The talk didn’t describe the use of Snuggle (here’s a link to the tool), and based on chatting with Aaron it’s in part because it’s been hard to get a critical mass of adopters. It’s hard to change hearts and minds.

[7] One other awesome thing about the talk was its explicit discussion of how lack of statistically significant differences in behavior did not mean that there weren’t cool things to learn. Too many papers have the opposite: statistically but not practically or interestingly significant results.

[8] It also highlighted the common first-timer’s mistake of tracking too much with not enough purpose; combined with the Bentley et al. paper, it highlights a key issue for supporting this kind of reflection: How might systems help people think about questions they’re interested in? As with Woodruff, this is a whole post (although Abi Sellen and Steve Whittaker have covered some of this ground already).

[9] Pointing out that social networks/diffusion research often ignores aspects of content when thinking about what’s likely to spread was a nice touch as well.

[10] I have pretty strong views on this, as you can tell.

[11] Self-promotion is okay if done tastefully.

A reference letter a year makes the doctor(ate) appear?

Writing a recent recommendation letter for Victoria Sosik had me looking back at the history of letters I’d written for her [1]. Doing this led me to propose a worthy goal for PhD students: try to need at least one recommendation letter a year from your advisor [2]. Why?

First, recommendation letters mean that you’re trying to achieve something good: a fellowship, an internship, a job, a scholarship, a doctoral consortium, an invited workshop [3]. They don’t have to be big: local (department, college, or school-level) recognition and support for research is worth finding too, and you should have these on your radar [4]. Being recognized is never a bad thing, and for ones that come with support, this gives you freedom and opportunities [5].

Second, it helps keep your advisor up on what you’re doing. In principle, we should be pretty good at this through regular contact and scheduled times for reflection on the bigger picture [6]. However, people get busy (or as Amy Bruckman points out, overcommit their time), go on sabbatical or leave, or fly under the radar for a while, so distance can form. Needing to write a letter is a forcing function for attention.

Third, it means there’s always a letter ready to go that’s pretty current. Opportunities come up, sometimes on short notice [7], and it’s normally awkward to ask someone to write a reference letter from scratch on short notice. A recent version can often be tweaked pretty fast.

I’d be interested to hear other folks’ takes, but my general advice to students would be to get crackin’ and start askin’ [8].

[1] Which is a fair number at this point as she prepares to head out the door.

[2] Sorry, advisors! Actually, though, as you’ll see, this is not such a bad thing for you.

[3] It turns out that the essays, plans, abstracts, goals, proposals, and other documents you have to write for these things are also often helpful for your thinking. Grant writing, in moderation, is like that for me, as was writing up the tenure docs.

[4] I try to shield students from the need to garner resources, especially early on, but finding resources–whether it’s money, material, sites, people, or support–that can help you get things done is part of the job, and it doesn’t hurt to get practice at this early on.

[5] There’s also a kind of moral suasion this should exert on your advisor: the more you help yourself, the more they should feel obliged to help you with ancillary things: travel costs, participant payments, tools and/or transcribers.

[6] I try to have everyone I work with or teach do at least one self-evaluation a semester. These are super lightweight but do remind people that they should take stock and give them occasions to make changes they see as valuable.

[7] Sometimes on longer notice, because you go back and forth, then make the call late in the game. FWIW, it doesn’t hurt to ask your advisor whether shooting for some X is a good idea earlier on.

[8] But ask in ways that make it easy, by including pointers to the thing you’re applying for, guidelines/criteria, your CV, a transcript, drafts of the docs you’re writing for it, your current website, and anything else you’ve got that’ll give people the information they need to write a better, faster, stronger letter.

The Incredible Evolving Research Statement

I wanted to follow up my “recent” post about research statements with a trace of how mine’s evolved over the last several years. I’m hoping this will help people at multiple career stages tell their own stories as well as give a concrete example of how one’s own story can change over time.

For my statement in fall of 2007, I was a year into my postdoc and writing my statement for my third time around on the job market. I wound up organizing it around five small themes that had emerged out of my work: “creating a better user experience in recommender systems”, “making recommendations for research papers”, “understanding how interfaces affect people’s contributions to communities”, “encouraging people to provide public goods in online communities”, and “exploring how technologies can support social interactions”.

This was a pretty reasonable chronological story of what I had done and how I had moved from algorithms toward interfaces, using the “understanding” part to help marry the algorithmic/interface work I had done with the move toward recommender and social science theory applied to growing online communities. It was fairly easy to tell, chronologically and intellectually accurate, and gave me a chance to talk about most of my work to that point–which at that point was a good thing.

My high level story of “help groups make sense of information” was probably too broad given my career stage and the work I’d done at the time, so that part kind of sucked; the coherence between the parts was also at a kind of surface level. I also had a future directions nod to “computational social science” at the end, because I was hanging out with the “Getting Connected: Social Science in the Age of Networks” group at Cornell’s Institute for Social Sciences. There were natural connections between my prior Wikipedia work and stuff in the air there, and it felt like a research direction that had legs.

Fast forward to late fall of 2010, when I’m writing my statement for my third-year review. At this point I’ve been advising students for a few years who are starting to take things in their own directions, I’m writing grants with new research themes around reflection on social media data, and I’m having lots of success collaborating with folks in the Networks group.

Thus, I’ve got a bigger story to tell, and I zoomed out quite a bit to an overarching theme of how we might think of work around big data, organized into three broad categories: “big data as a door” (that systems can unlock through modeling of users’ interests for recommendation and e-commerce), “big data as a window” (that researchers can observe and study human behavior through, primarily for theory-building), and “big data as a mirror” (that systems present back to the people who create the data to support understanding and reflection). I then talk about how I think cool work brings these views together and give a few brief examples of ongoing projects with stories about how they marry the perspectives.

I liked this telling for a number of reasons. In particular, I believe this: the big data story usually gets told from just one perspective and I do prefer work that marries them. It still aligns pretty well with the chronological and intellectual evolution of what I do. It also groups disparate work together, which is important because (as we will see in a future advising statement) I try hard to give students flexibility in research topics and make it my job to help it fit together.

For consumers, it lets people place me in ways that map well to their own background: Dan the recommender systems guy, Dan the computational social science guy, Dan the reflection on and reuse of social media guy. On the other hand, it tries to get away from particular systems: I don’t want to be “the MovieLens guy”, “the Pensieve guy”, or “the Wikipedia guy”; being known for studying one system feels a little narrow, even though Wikipedia research is super-broad and I know that some of my reputation is based on the Wikipedia work.

It doesn’t do a great job of talking about specific impacts, and as Jeff Hancock pointed out, among other problems it has way too much typography. It’s also still a little scattershot; the “next directions” in particular are kind of a mish-mash of the things that were going on at the time.

But it felt like a good story, good enough that the high level framing is still the backbone in my current draft of the statement for tenure. The themes are the same, although I reduced the emphasis on the metaphor of doors, windows, and mirrors, instead foregrounding the main point about who was doing the meaning-making. The “next directions” are also more coherent with prior work and are described in more detail, which felt strong. I also still try to give credit to collaborators and to contextualize things relative to other work, both of which I see as important.

There’s also value in deleting here. I took out much of the typography, leaving only a main idea per paragraph highlighted that tries to call out the primary impact or novelty in that chunk. I also left out pieces of the work that were hard to connect to the story or that have had lower impact so far. You don’t need to talk about everything, and trying to do so can make the statement confusing (and too long) for readers.

I’m pretty happy with it now, for now. I’m still not sure about the explicit attempt to demonstrate that the work has had impact at the end. It feels a little braggy and Geri Gay says (correctly I think) that this should be more detailed and that citation counts only go so far.

But it’s probably what I’m going to run with. Wish me luck and good luck in your own research statement writing endeavors.

Writing a Research Statement (for a Tenure Package)

tl/dr: Research statements should demonstrate that you have made or will make an impact through effective, clear storytelling about what you have done and how it connects to your research community. Careful organization and clear evidence of impact can help you make this case to the many different kinds of people who will read your statement.

One of the main docs you write as part of the tenure process is a research statement, and before revising mine, I wanted to spend some time thinking about what makes for an effective statement. We also write these during the job search and various other times during the career, so hopefully this post will have broad appeal. The thoughts below are based on my own thinking, talking with other professors, and looking at my own and other people’s past research statements for tenure [0].

We’ll start with a few key points up front. First, in line with the typical tenure and promotion criteria at research universities [1], a main goal of the statement is to demonstrate that your work has had, and will continue to have, an impact on your research community. So a glorified annotated bibliography of your work is not going to cut it. You need to talk about how your work fits into the broader conversation, why it’s interesting and exciting and important.

Second, as stated by Mor Naaman in a comment on my original tenure post, not everyone who reads your statement (or your dossier [2]) is going to be an expert in your field. So, a glorified annotated bibliography of your work is not going to cut it. Not only do you need to position your work in your community, you need to do this in a way that letter writers, your dean, and faculty across the university will appreciate.

Third, even for those who are experts, they’re not likely to be experts on you, meaning that your research statement has real impact on how and when people think about you [3]. So, a glorified… well, you get the picture, but the key insight here is that the research statement is telling a story about you just as much as it is about the research [4].

So, how do research statement writers go about accomplishing these goals? For the most part, what I saw was a lot of work around organizing the story and showing current impact in ways that were broadly accessible, but less on the questions of ‘so what’ and ‘what next’.

Organizing the story

Based on the statements I looked at, the general approach was to focus on some small number of broad topical themes that represent research questions or areas to which the person claims to have made key contributions. The work itself is used to illustrate the contributions, possibly with some sub-themes inside the area to help readers group the individual papers. Then, an overall story ties the areas together with some kind of bigger picture and/or longer-term research goals.

How broad the goals, themes, and sub-sections are depends in part on how long you’ve been in the game and how broad your interests are–which implies that your research statement will continue to evolve over time [5]. For instance, my fall 2007 job hunt statement and spring 2011 third year review statement are organized quite differently because I had another 3.5 years of deepening and broadening my work and thinking, both on specific projects and on how the different strands tied together [6]. (I wrote a bit about this evolution in “The Incredible Evolving Research Statement“, which is a reasonable companion to this post.)

Most of the statements were broadly chronological, especially within areas. I think this was, on balance, used to show the accumulation, evolution, and deepening of one’s work in an area. Some (including mine), but not all, were also chronological across the areas, which as a reader I saw as illustrating the person’s career arc. None was comprehensive, and some work was left out; instead, the statements focused on telling a more or less coherent story [7].

There are other ways to tell the story of your research besides chronology plus research areas. For instance, I could imagine talking about my own work as a grid where levels of analysis (individual, dyad, group/community) are on one axis and major research area/question (recommendation, user modeling, system-building, reflection) is on the other [8], then positioning work in the grid cells. This would be particularly useful for showing breadth across a couple of intersecting areas, maybe for highlighting interdisciplinarity. If I wanted to emphasize my techy/system-building bits, I could imagine organizing the statement around the systems that I’ve built, supervised, and studied along the way, with research questions emerging as themes that repeatedly occur across the systems [9]. But the overall story plus themes and chronological evolution model feels both fairly common and effective, and I do like the 2011 version a lot — so I’m likely to do an update but not a rework of it for the tenure package.

Showing (current) impact

Much of the discourse on this side focused on various forms of evidence that other people, mostly in the academic community, cared about the work.

Most folks worked in some mention of support for their work, notably grant funding. Funding is direct evidence that people think you and your work are interesting enough to spend money on [10]. Yes, this is in your CV, but so are many other things you’ll talk about in the statements, and yes, done to excess or done badly it could feel a little off-putting. But it is honest and valuable to acknowledge support and it is pretty easy to make it part of the story (e.g., “I received an NSF grant to help answer my questions around X”).

Likewise, everyone talked about collaborators and students they’ve worked with. Much as with grants, collaboration says people think you and your work are interesting enough to spend time on [11]. Further, to some extent we’re known by the company that we keep, and collaborating with good people reflects well on you. Again, done as an exercise in name-dropping this could be tedious, but again, it’s easy to work naturally into the conversation–and again, it’s a worthy and honest thing to point out that you had help along the way.

People also mentioned how the work connected to and through groups or workshops directly related to their research that they organized, led, and contributed to [11a]. To some extent, this overlaps with the service statement, but as with direct collaboration, if people are willing to band together with you it shows that people value the kinds of work that you do and see you as a positive influence.

Some folks talked about citations, h-indices, and other citation metrics. Citations are a proxy for attention, interest, and quality, both in the particular work being cited and in your reputation more generally (because well-known and -regarded people are more likely to come to mind). There are some problems with quantitative metrics of scholarly impact: differing practices and sizes across fields affect the numbers; not all citations are positive; to do it right you’d probably need to compare to peers’ citation activity; etc. But citations have some value as an indicator of impact [12]. It’s a little harder to weave this in naturally, though you can use the numbers to point out particularly impactful papers, or use the data to give an overview to make the case that your career as a whole has been noticed.

For the most part, those were the high points. I do want to point out that there are lots of other ways one might talk about making impact. I’ll pass the torch to Elizabeth Churchill’s discussion of impact more generally, which among other things riffs off of Judy Olson’s Athena award talk about the many paths to scholarly impact at CSCW 2012. The altmetrics movement is pushing on other ways to think about impact, and other folks such as danah boyd [13] and Johnny Lee have carved careers out of making impact beyond research papers. These kinds of impact are worth talking about. However, for all that academia is pretty liberal politically, it’s fairly conservative in how it measures impact, and so a diversified portfolio with a fair percentage invested in traditional impact measures is probably less risky.

The statements didn’t have so much to say about potential future impact and work directly. There was sometimes a discussion of the next questions on a current line of work, and sometimes the overarching research question was used to highlight a general next line or lines. I guess this makes sense, because our next research moves are shaped by resources, people, contexts, and events [14], but it was a little surprising given the ‘future continued potential’ part of the tenure evaluation process.

Likewise, there was not as much “so what” as there probably could have been. There were reasonable connections to other work at a high level [15], to help make novelty claims and make the ‘so what’ case within the field. But there was much less of an argument about why the work is important to do in the grand scheme of things. This may be in part an artifact of length restrictions (there’s not a formal limit, but most of the tenure-time ones seem to clock in around 4-5 pages plus references). Our values around academic freedom also probably help us out when folks in other fields look at our tenure cases, even if they don’t see obvious indicators of importance, and our external letter writers are probably close enough to our work to appreciate it for its own sake. But I was still surprised at how little this was addressed in our statements.

So, that’s it for now–I should probably stop writing about writing research statements and get on to the business at hand. It was, however, useful spending some time thinking about what might make for a good research statement and hopefully some of this thinking will help future fellow travelers out.

#30#

[0] Web search turns up a variety of other useful resources and perhaps I should have just read them rather than writing my own. However, spending some time writing and analyzing myself felt valuable, and most of those I did find seem to be tuned toward research statements for the graduating PhD seeking a job rather than tenure. Many also seem to have been generated by searching for other articles about writing research statements. That said, this article on research statements from Penn’s career services looked useful and had pointers to some examples. Oregon Academic Affairs also has some thoughtful slides on writing tenure statements, including the research statement.

[1] Here’s an example of promotion guidelines from Cornell’s College of Agriculture and Life Sciences.

[2] Also part of Cornell ADVANCE’s “Successful Tenure Strategies” document.

[3] I haven’t been on a tenure committee yet, because you don’t get to vote on tenure cases until you have it, but for faculty hiring a number of recommendation letters look a lot like the candidate’s research statement or dissertation proposal/outline. I am guessing similar effects will happen for tenure letter writers.

[4] John Riedl often gave me talk advice that a key takeaway, in addition to the main points, should be that you’re awesome (not via self-aggrandizing–not John’s style–but through being interesting and demonstrating competence). It seems apropos here as well.

[5] Dan Frankowski, a research scientist at GroupLens when I was there, once claimed that the main thing we learn in grad school is how to tell bigger and better stories about the work.

[6] I made a followup post about how these statements evolved with some behind-the-scenes thinking, but this is already a pretty long post in its own right.

[7] It is fine to leave side projects out. A piece of career/tenure advice I have received from multiple sources is that it’s good to become known as “the X guy” for some very small number of X’s (often 1). Thus, focusing on the coherent and compelling story of ($1 to Richard Hamming) You and Your Research is probably best. Your side stuff will be in your CV and your online portfolio, and if people care about them and/or they’ve had an impact, you’ll get to talk about them.

[8] Joe Konstan sometimes talks about the grid as a useful way to organize a research story. For instance, for a dissertation you might try different items on the axes (levels of analysis, research questions, time periods, systems, theories, etc.), and think about a research path that cuts across a column, a row, or (to sample the space) a diagonal. If I were to do this for my tenure case, it feels like most of the cells would be at least somewhat filled in.

[9] Unless you’re in a clearly systems-oriented area, though, focusing on systems runs the risk of pigeonholing you. You probably want to study recommender systems, not GroupLens; crisis informatics, not Katrina; collaboration, not Wikipedia; crowd work, not Mechanical Turk. I know that some people think of me as a “Wikipedia guy”, and that’s part of my story, but only part.

[10] The inverse is not true; if work isn’t funded, it still might be important and impactful. There are lots of ways to not get funding.

[11] Again, the inverse isn’t true; some disciplines and traditions value solo research more than my home area of HCI, and some people are just more comfy working alone and don’t seek collaborators.

[11a] Folks who are creating or colonizing quite new areas may find it useful to do a bunch of community-building through workshops, special issues, and the like to build and connect to fellow travelers.

[12] Here, unfortunately, the inverse is more plausible: you do want your work to be cited.

[13] Who has enough impact that, at least as I was writing this, if you typo her name to “danah body” Google will give you a “Did you mean: danah boyd”.

[14] FYI, although this is a true answer to the kind of “Where do you see yourself in N years?” question that you might get asked during a job interview, it is not a good answer. This I can attest to from personal experience.

[15] Not many citations though, which was a little surprising, because that could both help ground the work and suggest appropriate tenure letter writers.

Mary Flanagan and Eric Paulos SoCS critical (game) design notes

And, notes from Mary Flanagan and Eric Paulos’ keynote and tutorial from the second day of the SoCS workshop. Mary first.

Not all games are serious games: critical games, casual games, art games, silly games — the question here is about how play and games interact. Or, even more basic, how do we know when people (or animals) are playing? Openness, non-threatening, relative safety, …

Instead, thinking about important elements of play might be a fruitful way to help designers (“Intro to game design 101” part of the talk). So, for instance, thinking about what people are looking for in meaningful play: meaning/context, choice/inquiry, action/agency, outcomes/feedback, integration/experience, discernment/legibility. Or, thinking about ways to carve up the elements of the game itself: rules/mechanics, play/dynamics, and culture/meaning.

So, what does this mean if you want to design games with a purpose, that have social good or social commentary or social change as their goals? In particular, gamification has a connotation of making people do things, a la persuasive computing, that makes people who design games feel really awkward because play is generally voluntary. Are Girl Scout merit badges about “play”?

Buffalo: a game that juxtaposes pairs of words and requires players to give names that encompass both words (“contemporary” “robot”; “female” “scientist”; “multiracial” “superhero”; “Hispanic” “lawyer”), with the group of players agreeing that the names are appropriate. It’s designed to help us reflect on stereotypes and implicit biases, and increase or broaden our “social identity complexity” (ability to belong to/respect/know about multiple social groups).

Which is important, because apparently these kinds of biases and divisions show up really young. (Though, it makes you wonder how to design games — both practically and ethically — for the very young, to help address issues like that.) And, because the biases are floating around in your social computing users and in your social computing software.

A fair number of games and studies suggest that you can have fairly dramatic effects on the biases, at least in the short term. Long-term effects are less clear, though — how would we study that?

I ran out of battery so lost most of the last bit, but one notable element was a design claim that having a more diverse team leads to more diverse games and outputs. This rang true on its face, and has some support from our own experiences writing prompts for the Pensieve project; the team was largely white rich kids from the Northeast and the prompts reflected that, in ways that sometimes made users from different backgrounds sad. So, this seems like a pretty useful nugget to take away.

Now, Eric, on design and intervention, and design vs. design research. In particular, claims about design research: it tends to focus on situations with topical/theoretical potential; it embeds designers’ judgment, value, and biases; and the results hopefully speak back to the theory and topics chosen, broadening the scope of knowledge and possibilities, as well as perhaps improving or reflecting on design processes themselves.

He’s also advocating for a more risky approach and perspective on design, with the claim that we often are good at solving well-defined problems (“good grades”) but not so good at having ideas that might help us think about tough problems (“creative thinking”). Further, the harder the problem, the less we know about it.

Like Mary, Eric is talking to some extent about critical design, speculative prototyping that calls out assumptions, possibilities, and hypotheses. There’s a general critical design meme that you’re looking for strong opinions, which doesn’t seem necessary (suppose I reflect on assumption X and come to conclude I’m okay with it), but the general goal is to look outside of the normal ways that we look at a situation.

Now we’re going through a process that Phoebe Sengers talks about for thinking about design (and research) spaces: figure out what the core metaphors and assumptions of the field are, look at what’s left out, and then try to invert the goals to focus on what’s left out and leave out what’s a core assumption. Here we’re talking about this in the context of telepresence: what would it mean to think about telepresence not for work and fidelity but for fun and experience? One bit is an interesting parallel between early casual telepresence robots and, say, FaceTime.

Our next step is to think about the core assumptions of social computing (“find friends and connect”), what’s left out (“familiar strangers”), and inverting (“designing technology that exposes and respects the familiar stranger relationship”). Again, an impact claim here, that designs like Jabberwocky inspired things like Foursquare (with evidence that this is true, which is cool).

I wonder what the ratio of ‘successful’ to ‘unsuccessful’ or ‘more interesting’ to ‘less interesting’ critical design projects is. We’re now going through a large series of designs and it makes me wonder how many other designs we’re not talking about. Of course, you can say the same thing about research papers… and in both cases it would be really nice to see a little more of the sausage being made.

The large number of designs is also reminding me of the CHI 2013 keynote. Here we’re looking to illustrate the idea that looking at other design cultures might be useful for us, but the connections between particular designs and the underlying concepts/points/ideas are often not so clear: how do we make sense of these as a group? There’s always a tension in talks between wanting to show lots of cool stuff, making the connections between the pieces, and managing the audience’s capacity; again, this is not specific to designs (you see it in, say, job talks sometimes).

Now a discussion around the value of amateurs, DIY culture, and the idea that innovations often happen when people cross over these boundaries ($1 to Ron Burt, I think). It’s not clear that this follows from the set of designs we’ve talked about, but it’s a plausible and reasonable place and one that I try to live in a fair amount myself. There are costs to this — learning time and increased risk of failure — but I think that’s part of our game.

More heuristics around critical design, more than I expected, so kind of a giant list here:

  • Constraints (when is technology useful)
  • Questioning progress (what negative outcomes and lost elements arise)
  • Celebrate the noir/darkness (seek out unintended uses and effects)
  • Misuse technology (hacking/repurposing tech and intentionally doing things the wrong way)
  • Bend stereotypes (who are the ‘intended’ users)
  • Blend contexts (perhaps mainly physical and digital)
  • Be seamful (exploit the failure points of technology)
  • Tactful contrarianness (confidence in the value of inversions)
  • Embrace making problems (designs that cause or suggest issues, not resolve them)
  • Make it painful (versus easy or useful to use),
  • Read (not just papers), and
  • Be an amateur (a lover of what you do).

Noah Smith on NLP at SoCS

Continuing on the SoCS workshop, the afternoon session is a tutorial from Noah Smith at CMU about using NLP for socio-computational kinds of work.

Talking about how NLP people tend to make choices about models, algorithms, collection, and cleaning, rather than to look for “the right answers” (which are context-dependent), feels like a nice, fruitful way to discuss using NLP for socio-computational work. We’ll start with a discussion of document classification since that’s where Noah went.

Much of the story around NLP for document classification is thoughtful annotation/labeling of your data for the categories/attributes of interest: having good justifications, theories, and research questions that lead you to create appropriate categories for the text and goals you have. And, once you get that dataset created, share it — people love useful datasets and might help you with the work.

Likewise, thinking carefully about how to transform the texts to text features–word counts, stemming, bigrams/trigrams, defining word categories (a la Linguistic Inquiry and Word Count, or LIWC)–is important and requires a thoughtful balance of intuition and justification.
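To make the feature step concrete, here is a minimal sketch in Python (assuming scikit-learn is installed; the toy documents and the tiny “negative emotion” word list are made up for illustration, not a real LIWC lexicon):

```python
# Minimal sketch: turn raw texts into unigram/bigram counts plus a crude
# LIWC-style category feature. Toy data; the word list is illustrative only.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I really love this community",
        "I am so angry about this edit war"]

# Lowercased unigram and bigram counts.
vectorizer = CountVectorizer(ngram_range=(1, 2), lowercase=True)
X = vectorizer.fit_transform(docs)
print(X.shape)  # (2 documents, number of distinct uni/bigrams)

# A hand-rolled "negative emotion" feature: fraction of words in the category.
NEG_EMOTION = {"angry", "hate", "war", "sad"}

def category_fraction(text, category):
    words = text.lower().split()
    return sum(w in category for w in words) / max(len(words), 1)

print([category_fraction(d, NEG_EMOTION) for d in docs])
```

Each of these choices (lowercasing, n-gram range, which categories to count) is exactly the kind of decision Noah is saying you should be able to justify.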

Question: what are, for NLP folks, for CSCW folks, for social science folks, the “right” or “good” ways to justify choices of category schemes, labeling, feature construction, etc.?

One answer, around choice of ML algorithm, is to say “SVM performs a little better but you need to be able to talk about probabilities, so I’ll trade off a bit of performance for other kinds of interpretability”. And, especially if you choose a linear model from features to categories, the algorithms have relatively small (and predictable) kinds of differences — perhaps more noise than is worth optimizing on, versus spending your time on other stages that require more intuition/justification/art.
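As a rough illustration of that tradeoff (my sketch, not anything Noah showed; toy texts and labels, assuming scikit-learn), a linear SVM gives you margin scores while logistic regression gives you probabilities you can actually talk about:

```python
# Sketch of the interpretability tradeoff between a linear SVM and logistic
# regression. The texts and labels are toy placeholders, not real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

texts = ["great helpful answer", "spam spam buy now",
         "thanks for the useful fix", "cheap pills click here"]
labels = [1, 0, 1, 0]  # 1 = helpful, 0 = spam (made up)

X = CountVectorizer().fit_transform(texts)

svm = LinearSVC().fit(X, labels)
print(svm.decision_function(X[:1]))   # signed distance from the margin

logreg = LogisticRegression().fit(X, labels)
print(logreg.predict_proba(X[:1]))    # class probabilities you can report
```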

Another answer is that you should pick methods that you can talk sensibly about and that your community gets: if you can’t explain it at all, or to your community, you are in a world of hurt. Practical issues around tool choice that fit your research pipeline and skills and budget also matter.

Performance is only a piece of the tradeoff — and you really want to compare it on held out data. (You can be very careful about this by taking files with your test data and making them unreadable.) Likewise, you want to compare to a reasonable baseline; at the very least, against a “predict the most common class” zero-rule baseline. You might also think about the maximum expected performance, perhaps considering inter-coder agreement as an upper bound.
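A minimal sketch of that kind of evaluation (synthetic data, assuming scikit-learn): hold out a test set and compare your model against a majority-class “zero rule” baseline, keeping inter-coder agreement in mind as a rough ceiling:

```python
# Sketch: held-out evaluation against a majority-class baseline.
# Data here is synthetic; in practice you would also note inter-coder
# agreement as an informal upper bound on achievable accuracy.
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts = ["helpful detailed answer"] * 30 + ["useless spammy post"] * 10
labels = [1] * 30 + [0] * 10

X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression().fit(X_train, y_train)

print("zero-rule baseline:", accuracy_score(y_test, baseline.predict(X_test)))
print("model:             ", accuracy_score(y_test, model.predict(X_test)))
```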

Performance went bad: what went wrong? Not enough data, bad labels, meaningless features, home-grown algorithms and implementations, (perhaps) the wrong algorithm, not enough experience or insight into the domain, …

Parsing for parts of speech or entity recognition is like sharing a dinner. At dinner, the people around you will influence decisions on what to order; in NLP, the words nearby (and maybe some far away) might influence the classification of the word you’re looking at. The Viterbi algorithm for sequence labeling is a useful way to account for some of these dependencies.

Noah claims that this is going to be the next big idea from NLP that makes it big in the world of computational social science, because lots of important text analysis tasks, including part of speech tagging, entity recognition, and translation, can be modeled pretty well as sequence labeling problems. Further, the algorithms for this kind of structured prediction are more or less generalizations of standard ML classification algorithms.
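For readers who haven’t met it, here is a bare-bones Viterbi sketch for sequence labeling (the two tags and all the probabilities are toy values I made up; a real tagger would estimate them from labeled data):

```python
# Minimal Viterbi sketch for sequence labeling (e.g., POS tagging).
# All probabilities below are hand-picked toy values for illustration.
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations.

    start_p[s]     : log-probability of starting in state s
    trans_p[s][t]  : log-probability of moving from state s to state t
    emit_p[s][obs] : log-probability of state s emitting obs
    """
    # best[i][s] = best log-prob of any path ending in state s at position i
    best = [{s: start_p[s] + emit_p[s].get(observations[0], -1e9) for s in states}]
    back = [{}]
    for i in range(1, len(observations)):
        best.append({})
        back.append({})
        for t in states:
            # pick the previous state that maximizes the path probability
            prev, score = max(
                ((s, best[i - 1][s] + trans_p[s][t]) for s in states),
                key=lambda x: x[1],
            )
            best[i][t] = score + emit_p[t].get(observations[i], -1e9)
            back[i][t] = prev
    # backtrace from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy example: tag words as NOUN or VERB.
states = ["NOUN", "VERB"]
lp = math.log
start = {"NOUN": lp(0.7), "VERB": lp(0.3)}
trans = {"NOUN": {"NOUN": lp(0.3), "VERB": lp(0.7)},
         "VERB": {"NOUN": lp(0.8), "VERB": lp(0.2)}}
emit = {"NOUN": {"dogs": lp(0.6), "bark": lp(0.1)},
        "VERB": {"dogs": lp(0.1), "bark": lp(0.7)}}
print(viterbi(["dogs", "bark"], states, start, trans, emit))  # ['NOUN', 'VERB']
```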

That said, there are a lot of really tough problems, especially around more semantic goals such as predicting framings, where there’s some in-progress work that is dangerous to rely on but perhaps fun to play with, including some of Noah’s own group’s work.

I’m going to not cover the clustering side, because I need a little break from typing and thinking, but hopefully this was useful/interesting for some folks.

Bonus note: can you predict if a bill will make it out of committee or a paper gets cited? Yes, at least better than chance, according to their paper.

Leysia Palen on crisis informatics at SoCS 2013

Since attempting to do a trip report after CHI was such a disaster, and since spamming twitter with N+1 tweets about an event may be annoying to some twitter folks, I’m going to try a kind of bloggy summary of the keynotes at SoCS 2013, the PI meeting for the socio-computational systems program at NSF.

This one is from Leysia Palen about understanding the use of social media, and ICTs more generally, in disaster response. Observations below (the first few are basically copied tweets before I realized I could do this, so they are pretty short).

— Dan

The sociology of disaster talks about the convergence of resources, information, volunteers; the “social media in disaster” question then might talk about how ICTs both add to and cut across these areas.

How to know you’ve found important online resources in disaster? Claim: people will mention them on Twitter, at least once, so if you collect Twitter, you get a pretty good sample of the world.

Leysia pointing out that SoCS/data mining proposals might need a substantial software engineering (or database) bit to support data management. This seems interesting as a way to build new collaborations and techniques in the context of the kind of work lots of social media folks are doing.

Tweets about earthquakes can capture human experience as well as magnitude, and can supplement location data. http://earthquake.usgs.gov/earthquakes/dyfi/: “Did You Feel It” encourages people to report on their earthquake experiences. However, crowd tasks for disaster response can’t put people at risk: “Where’s the lava?” would be a bad app. More prosaically, broadcasting to people asking whether a road is open will lead to people converging there — possibly putting them at risk, and exposing the agency to liability.

Now we’re looking at people posting to Twitter about disaster, more or less intentionally and annotated with metadata, in a way that would let us think of it as data. Or journalism. Can we use it in real time/for situational awareness? Can it become useful data for longer term planning and policy?

So there’s a question about how both “spontaneous” (info people create anyways as part of responding to the disaster for their own reasons) and “solicited” (agency requests for specific info; apps like Did You Feel It or the Gulf Oil Spill app) arise, what they’re useful for, how they compare, what are the ethical and legal responsibilities around them, etc.

A model: “Generative”, initial tweets create raw material and report experience and conditions, mostly from locals. “Syncretic”/synthetic tweets use existing material to build out insight (example: looking at flood reports to predict future flooding). “Derivative” material, retweets and URL posting, is a kind of filter/recommender system, one that really helps deal with the bad actors/bad info problem because the useless stuff tends not to be generated. (And in particular, “official” material tends to be retweeted more often than other material.)

Talking about distance from and details of disaster experiences reminds me some of the opinion spam detection work from Myle Ott, Yejin Choi, Claire Cardie, Jeff Hancock at Cornell: http://arxiv.org/abs/1107.4557

The part of the talk about the recent Haitian disaster, and how online digital volunteers/digital activists built major bits of useful information infrastructure, including Haiti’s first really good maps (period), applications of them to the earthquake, and other infrastructure for distributing goods, was cool. There’s a huge question, though, about how and when these things work, and why. Further, part of the story was that people had been gathering before the earthquake with the goal of doing good, building capacity that was then ready to be turned toward an event such as the earthquake.

So, how do formal/existing organizations manage and work with social media? It’s hard, because on the ground they are so busy responding that social media is just not that important (not unlike, though more serious than, professors never updating their webpages because there is always something more important to do). Apart from questions of liability and practicality, there’s also just a question of voice: what are they trying to accomplish and to convey by being online? One strategy they use is to correct misinformation, and allow likely-good information to go by (but not endorse it, for the liability reasons).

It turns out that it’s hard to go the other way, too: it was hard for social media researchers just to _find_ the accounts and locations of, e.g., police and fire departments that responded to Sandy, in order to look at their communication with the public. And, there’s a whole parallel discussion about internal use of social media in (and between) these organizations.

Interesting audience question: how do we measure the impact of social media activity and content on the ground? Answer: not well; it’s pretty hard to do this. I guess you could look at retweets/likes, and someone else suggested using a summarization tool to help sift through tweets to find ones that are representative/important/meaningful.

I wonder if it would be useful to look at parallels between microlending dynamics and disaster response dynamics. It’s clearly not a perfect parallel, but some of the social dynamics around convergence might be interesting to poke at. Likewise, the general crowdsourcing literature and infrastructure might be a fun connection.