Neurohackademy 2018: A wrap-up

It’s become something of a truism in recent years that scientists in many fields find themselves drowning in data. This is certainly the case in neuroimaging, where even small functional MRI datasets typically consist of several billion observations (e.g., 100,000 points in the brain, each measured at 1,000 distinct timepoints, in each of 20 subjects). Figuring out how to store, manage, analyze, and interpret data on this scale is a monumental challenge–and one that arguably requires a healthy marriage between traditional neuroimaging and neuroscience expertise, and computational skills more commonly found in data science, statistics, or computer science departments.

In an effort to help bridge this gap, Ariel Rokem and I have spent part of our summer each of the last three years organizing a summer institute at the intersection of neuroimaging and data science. The most recent edition of the institute–Neurohackademy 2018–just wrapped up last week, so I thought this would be a good time to write up a summary of the course: what the course is about, who attended and instructed, what everyone did, and what lessons we’ve learned.

What is Neurohackademy?

Neurohackademy started its life in Summer 2016 as the somewhat more modestly-named Neurohackweek–a one-week program for 40 participants modeled on Astrohackweek, a course organized by the eScience Institute in collaboration with data science initiatives at Berkeley and NYU. The course was (and continues to be) held on the University of Washington’s beautiful campus in Seattle, where Ariel is based (I make the trip from Austin, Texas every year–which, as you can imagine, is a terrible sacrifice on my part given the two locales’ respective summer climates). The first two editions were supported by UW’s eScience Institute (and indirectly, by grants from the Moore and Sloan foundations). Thanks to generous support from the National Institute of Mental Health (NIMH), this year the course expanded to two weeks, 60 participants, and over 20 instructors (our funding continues through 2021, so there will be at least 3 more editions).

The overarching goal of the course is to give neuroimaging researchers the scientific computing and data science skills they need in order to get the most out of their data. Over the course of two weeks, we cover a variety of introductory and (occasionally) advanced topics in data science, and demonstrate how they can be productively used in a range of neuroimaging applications. The course is loosely structured into three phases (see the full schedule here): the first few days feature domain-general data science tutorials; the next few days focus on sample neuroimaging applications; and the last few days consist of a full-blown hackathon in which participants pitch potential projects, self-organize into groups, and spend their time collaboratively working on a variety of software, analysis, and documentation projects.

Who attended?

Admission to Neurohackademy 2018 was extremely competitive: we received nearly 400 applications for just 60 spots. This was a very large increase from the previous two years, presumably reflecting the longer duration of the course and/or our increased efforts to publicize it. While we were delighted by the deluge of applications, it also meant we had to be far more selective about admissions than in previous years. The highly interactive nature of the course, coupled with the high per-participant costs (we provide two weeks of accommodations and meals), makes it unlikely that Neurohackademy will grow beyond 60 participants in future editions, despite the clear demand. Our rough sense is that somewhere between half and two-thirds of all applicants were fully qualified and could have easily been admitted, so there’s no question that, for many applicants, blind luck played a large role in determining whether or not they were accepted. I mention this mainly for the benefit of people who applied for the 2018 course and didn’t make it in: don’t take it personally! There’s always next year. (And, for that matter, there are also a number of other related summer schools we encourage people to apply to, including the Methods in Neuroscience at Dartmouth Computational Summer School, Allen Institute Summer Workshop on the Dynamic Brain, Summer School in Computational Sensory-Motor Neuroscience, and many others.)

The 60 participants who ended up joining us came from a diverse range of demographic backgrounds, academic disciplines, and skill levels. Most of our participants were trainees in academic programs (40 graduate students, 12 postdocs), but we also had 2 faculty members, 6 research staff, and 2 medical residents (note that all of these counts include 4 participants who were admitted to the course but declined to, or could not, attend). We had nearly equal numbers of male and female participants (30F, 33M), and 11 participants came from traditionally underrepresented backgrounds. 43 participants were from institutions or organizations based in the United States, with the remainder coming from 14 different countries around the world.

The disciplinary backgrounds and expertise levels of participants are a bit harder to estimate for various reasons, but our sense is that the majority (perhaps two-thirds) of participants received their primary training in non-computational fields (psychology, neuroscience, etc.). This was not necessarily by design–i.e., we didn’t deliberately favor applicants from biomedical fields over applicants from computational fields–and primarily mirrored the properties of the initial applicant pool. We did impose a hard requirement that participants should have at least some prior expertise in both programming and neuroimaging, but subject to that constraint, there was enormous variation in previous experience along both dimensions–something that we see as a desirable feature of the course (more on this below).

We intend to continue to emphasize and encourage diversity at Neurohackademy, and we hope that all of our participants experienced the 2018 edition as a truly inclusive, welcoming event.

Who taught?

We were fortunate to be able to bring together more than 20 instructors with world-class expertise in a diverse range of areas related to neuroimaging and data science. “Instructor” is a fairly loose term at Neurohackademy: we deliberately try to keep the course non-hierarchical, so that for the most part, instructors are just participants who happen to fall on the high-experience tail of the experience distribution. That said, someone does have to teach the tutorials and lectures, and we were lucky to have a stellar cast of experts on hand. Many of the data science tutorials during the first phase of the course were taught by eScience staff and UW faculty kind enough to take time out of their other duties to help teach participants a range of core computing skills: Git and GitHub (Bernease Herman), R (Valentina Staneva and Tara Madhyastha), web development (Anisha Keshavan), and machine learning (Jake Vanderplas), among others.

In addition to the local instructors, we were joined for the tutorial phase by Kirstie Whitaker (Turing Institute), Chris Gorgolewski (Stanford), Satra Ghosh (MIT), and JB Poline (McGill)–all veterans of the course from previous years (Kirstie was a participant at the first edition!). We’re particularly indebted to Kirstie and Chris for their immense help. Kirstie was instrumental in helping a number of participants bridge the (large!) gap between using git privately, and using it to actively collaborate on a public project.

Chris shouldered a herculean teaching load, covering Docker, software testing, BIDS and BIDS-Apps, and also leading an open science panel. I’m told he even sleeps on occasion.

We were also extremely lucky to have Fernando Perez (Berkeley)–the creator of IPython and leader of the Jupyter team–join us for several days; his presentation on Jupyter (videos: part 1 and part 2) was one of the highlights of the course for me personally, and I heard many other instructors and participants share the same sentiment. Jupyter was a critical part of our course infrastructure (more on that below), so it was fantastic to have Fernando join us and share his insights on the fascinating history of Jupyter, and on reproducible science more generally.

As the course went on, we transitioned from tutorials focused on core data science skills to more traditional lectures focusing on sample applications of data science methods to neuroimaging data. Instructors during this phase of the course included Tor Wager (Colorado), Eva Dyer (Georgia Tech), Gael Varoquaux (INRIA), Tara Madhyastha (UW), Sanmi Koyejo (UIUC), and Nick Cain and Justin Kiggins (Allen Institute for Brain Science). We continued to emphasize hands-on interaction with data; many of the presenters during this phase spent much of their time showing participants how to work with programmatic tools to generate the kinds of results one might find in papers they’ve authored (e.g., Tor Wager and Gael Varoquaux demonstrated tools for neuroimaging data analysis written in Matlab and Python, respectively).

The fact that so many leading experts were willing to take large chunks of time out of their schedule (most of the instructors hung around for several days, facilitating extended interactions with participants) to visit with us at Neurohackademy speaks volumes about the kind of people who make up the neuroimaging data science community. We’re tremendously grateful to these folks for their contributions, and hope they’ll return to teach at future editions of the institute.

What did we cover?

The short answer is: see for yourself! We’ve put most of the slides, code, and videos from the course online, and encourage people to interact with, learn from, and reuse these materials.

Now the long(er) answer. One of the challenges in organizing scientific training courses that focus on technical skill development is that participants almost invariably arrive with a wide range of backgrounds and expertise levels. At Neurohackademy, some of the participants were effectively interchangeable with instructors, while others were relatively new to programming and/or neuroimaging. The large variance in technical skill is a feature of the course, not a bug: while we require all admitted participants to have some prior programming background, we’ve found that having a range of skill levels is an excellent way to make sure that everyone is surrounded by people who they can alternately learn from, help out, and collaborate with.

That said, the wide range of backgrounds does present some organizational challenges: introductory sessions often bore more advanced participants, while advanced sessions tend to frustrate newcomers. To accommodate the range of skill levels, we tried to design the course in a way that benefits as many people as possible (though we don’t pretend to think it worked great for everyone). During the first two days, we featured two tracks of tutorials at most times, with simultaneously-held presentations generally differing in topic and/or difficulty (e.g., Git/GitHub opposite Docker; introduction to Python opposite introduction to R; basic data visualization opposite computer vision).

Throughout Neurohackademy, we deliberately placed heavy emphasis on the Python programming language. We think Python has a lot going for it as a lingua franca of data science and scientific computing. The language is free, performant, relatively easy to learn, and very widely used within the data science, neuroimaging, and software development communities. It also helps that many of our instructors (e.g., Fernando Perez, Jake Vanderplas, and Gael Varoquaux) are major contributors to the scientific Python ecosystem, so there was a very high concentration of local Python expertise to draw on. That said, while most of our instruction was done in Python, we were careful to emphasize that participants were free to work in whatever language(s) they liked. We deliberately included tutorials and lectures that featured R, Matlab, or JavaScript, and a number of participant projects (see below) were written partly or entirely in other languages, including R, Matlab, JavaScript, and C.

We’ve also found that the tooling we provide to participants matters–a lot. A robust, common computing platform can spell the difference between endless installation problems that eat into valuable course time, and a nearly seamless experience that participants can dive into right away. At Neurohackademy, we made extensive use of the Jupyter suite of tools for interactive computing. In particular, thanks to Ariel’s heroic efforts (which built on some very helpful docs, similarly heroic efforts by Chris Holdgraf, Yuvi Panda, and Satra Ghosh last year), we were able to conduct a huge portion of our instruction and collaborative hacking using a course-wide JupyterHub allocation, deployed via Kubernetes, running on the Google Cloud. This setup allowed Ariel to create a common web-accessible environment for all course participants, so that, at the push of a button, each participant was dropped into a JupyterLab environment containing many of the software dependencies, notebooks, and datasets we used throughout the course. While we did run into occasional scaling bottlenecks (usually when an instructor demoed a computationally intensive method, prompting dozens of people to launch the same process in their pods), for the most part, our participants were able to drop into a running JupyterLab instance within seconds and immediately start interactively playing with the code being presented by instructors.

Surprisingly (at least to us), our total Google Cloud computing costs for the entire two-week, 60-participant course came to just $425 (roughly $3.50 per participant per week). Obviously, that number could have easily skyrocketed had we scaled up our allocation dramatically and allowed our participants to execute arbitrarily large jobs (e.g., preprocessing data from all ~1,200 HCP subjects). But we thought the limits we imposed were pretty reasonable, and our experience suggests that not only is JupyterHub an excellent platform from a pedagogical standpoint, but it can also be an extremely cost-effective one.

What did we produce?

Had Neurohackademy produced nothing at all besides the tutorials, slides, and videos generated by instructors, I think it’s fair to say that participants would still have come away feeling that they learned a lot (more on that below). But a major focus of the institute was on actively hacking on the brain–or at least, on data related to the brain. To this effect, the last 3.5 days of the course were dedicated exclusively to a full-blown hackathon in which participants pitched potential projects, self-organized into groups, and then spent their time collaboratively working on a variety of software, analysis, and documentation projects. You can find a list of most of the projects on the course projects repository (most link out to additional code or resources).

As one might expect given the large variation in participant experience, project group size, and time investment (some people stuck to one project for all three days, while others moved around), the scope of projects varied widely. From our perspective–and we tried to emphasize this point throughout the hackathon–the important thing was not what participants’ final product looked like, but how much they learned along the way. There’s always a tension between exploitation and exploration at hackathons, with some people choosing to spend most of their time expanding on existing projects using technologies they’re already familiar with, and others deciding to start something completely new, or to try out a new language–and then having to grapple with the attendant learning curve. While some of the projects were based on packages that predated Neurohackademy, most participants ended up working on projects they came up with de novo at the institute, often based on tools or resources they first learned about during the course. I’ll highlight just three projects here that provide a representative cross-section of the range of things people worked on:

1. Peer Herholz and Rita Ludwig created a new BIDS-app called Bidsonym for automated de-identification of neuroimaging data. The app is available from Docker Hub, and features not one, not two, but three different de-identification algorithms. If you want to shave the faces off of your MRI participants with minimal fuss, make friends with Bidsonym.

2. A group of eight participants ambitiously set out to develop a new “O-Factor” metric intended to serve as a relative measure of the openness of articles published in different neuroscience-related journals. The project involved a variety of very different tasks, including scraping (public) data from the PubMed Central API, computing new metrics of code and data sharing, and interactively visualizing the results using a d3 dashboard. While the group was quick to note that their work is preliminary, and has a bunch of current limitations, the results look pretty great–though some disappointment was (facetiously) expressed during the project presentations that the journal Nature is not, as some might have imagined, a safe house where scientific datasets can hide from the prying public.

3. Emily Wood, Rebecca Martin, and Rosa Li worked on tools to facilitate mixed-model analysis of fMRI data using R. Following a talk by Tara Madhyastha  on her Neuropointillist R framework for fMRI data analysis, the group decided to create a new series of fully reproducible Markdown-based tutorials for the package (the original documentation was based on non-public datasets). The group expanded on the existing installation instructions (discovering some problems in the process), created several tutorials and examples, and also ended up patching the neuropointillist code to work around a very heavy dependency (FSL).

You can read more about these 3 projects and 14 others on the project repository, and in some cases, you can even start using the tools right away in your own work. Or you could just click through and stare at some of the lovely images participants produced.

So, how did it go?

It went great!

Admittedly, Ariel and I aren’t exactly impartial parties–we wouldn’t keep doing this if we didn’t think participants get a lot out of it. But our assessment isn’t based just on our personal impressions; we have participants fill out a detailed (and anonymous) survey every year, and go out of our way to encourage additional constructive criticism from the participants (which a majority provide). So I don’t think we’re being hyperbolic when we say that most people who participated in the course had an extremely educational and enjoyable experience. Exhibit A is the stream of unsolicited public testimonials participants shared on twitter.

The organizers and instructors all worked hard to build an event that would bring people together as a collaborative and productive (if temporary) community, and it’s very gratifying to see those goals reflected in participants’ experiences.

Of course, that’s not to say there weren’t things we could do better; there were plenty, and we’ve already made plans to adjust and improve the course next year based on feedback we received. For example, some suggestions we received from multiple participants included adding more ice-breaking activities early on in the course; reducing the intensity of the tutorial/lecture schedule the first week (we went 9 am to 6 pm every day, stopping only for an hourlong lunch and a few short breaks); and adding designated periods for interaction with instructors and other participants. We intend to address these (and several other) recommendations in next year’s edition, and expect it to look slightly different from (and hopefully better than!) Neurohackademy 2018.

Thank you!

I think that’s a reasonable summary of what went on at Neurohackademy 2018. We’re delighted at how the event turned out, and are happy to answer questions (feel free to leave them in the comments below, or to email Ariel and/or me).

We’d like to end by thanking all of the people and organizations who helped make Neurohackademy 2018 a success: NIMH for providing the funding that makes Neurohackademy possible; the eScience Institute and staff for throwing their wholehearted support behind the course (particularly our awesome course coordinator, Rachael Murray); and the many instructors who each generously took several days (and in a few cases, more than a week!) out of their schedule, unpaid, to come to Seattle and share their knowledge with a bunch of enthusiastic strangers. On a personal note, I’d also like to thank Ariel, who did the lion’s share of the actual course directing. I mostly just get to show up in Seattle, teach some stuff, hang out with great people, and write a blog post about it.

Lastly, and above all else, we’d like to thank our participants. It’s a huge source of inspiration and joy to us each year to see what a group of bright, enthusiastic, motivated researchers can achieve when given time, space, and freedom (and, okay, maybe also a large dollop of cloud computing credits). We’re looking forward to at least three more years of collaborative, productive neurohacking!

In defense of Facebook

[UPDATE July 1st: I’ve now posted some additional thoughts in a second post here.]

It feels a bit strange to write this post’s title, because I don’t find myself defending Facebook very often. But there seems to be some discontent in the socialmediaverse at the moment over a new study in which Facebook data scientists conducted a large-scale–over half a million participants!–experimental manipulation on Facebook in order to show that emotional contagion occurs on social networks. The news that Facebook has been actively manipulating its users’ emotions has, apparently, enraged a lot of people.

The study

Before getting into the sources of that rage–and why I think it’s misplaced–though, it’s worth describing the study and its results. Here’s a description of the basic procedure, from the paper:

The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether exposure to emotional content led people to post content that was consistent with the exposure—thereby testing whether exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion. People who viewed Facebook in English were qualified for selection into the experiment. Two parallel experiments were conducted for positive and negative emotion: One in which exposure to friends’ positive emotional content in their News Feed was reduced, and one in which exposure to negative emotional content in their News Feed was reduced. In these conditions, when a person loaded their News Feed, posts that contained emotional content of the relevant emotional valence, each emotional post had between a 10% and 90% chance (based on their User ID) of being omitted from their News Feed for that specific viewing.
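
To make the quoted design a bit more concrete, here is a minimal sketch of the kind of filtering logic it describes, assuming (purely for illustration) that each user’s omission probability is derived by hashing their user ID into the 10%–90% range. The function names, the hashing scheme, and the `is_emotional` predicate (which stands in for a check against whichever emotional valence the user’s condition targets) are all my own placeholders, not anything taken from the paper or from Facebook.

```python
import hashlib
import random

def omission_probability(user_id):
    """Deterministically map a user ID to an omission probability in [0.10, 0.90).

    The paper only says the chance was "based on their User ID"; this hashing
    scheme is a hypothetical stand-in for however that was actually computed.
    """
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    fraction = int(digest, 16) / float(16 ** len(digest))  # roughly uniform in [0, 1)
    return 0.10 + 0.80 * fraction

def filter_news_feed(user_id, posts, is_emotional):
    """Drop each post of the targeted valence with the user's fixed omission probability."""
    p_omit = omission_probability(user_id)
    return [post for post in posts
            if not is_emotional(post) or random.random() >= p_omit]
```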

And here’s their central finding:

What the paper’s key figure shows is that, in the experimental conditions, where negative or positive emotional posts are censored, users produce correspondingly more positive or negative emotional words in their own status updates. Reducing the number of negative emotional posts users saw led those users to produce more positive, and fewer negative words (relative to the unmodified control condition); conversely, reducing the number of presented positive posts led users to produce more negative and fewer positive words of their own.

Taken at face value, these results are interesting and informative. For the sake of contextualizing the concerns I discuss below, though, two points are worth noting. First, these effects, while highly statistically significant, are tiny. The largest effect size reported had a Cohen’s d of 0.02–meaning that eliminating a substantial proportion of emotional content from a user’s feed had the monumental effect of shifting that user’s own emotional word use by two hundredths of a standard deviation. In other words, the manipulation had a negligible real-world impact on users’ behavior. To put it in intuitive terms, the effect of condition in the Facebook study is roughly comparable to a hypothetical treatment that increased the average height of the male population in the United States by about one twentieth of an inch (given a standard deviation of ~2.8 inches). Theoretically interesting, perhaps, but not very meaningful in practice.
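
For anyone who wants to check the height analogy, the arithmetic is just the definition of Cohen’s d: a mean difference expressed in standard deviation units.

```python
# Cohen's d is a mean difference in standard deviation units, so the implied
# shift is simply d times the SD of the quantity you care about.
d = 0.02                      # largest effect size reported in the paper
sd_height_inches = 2.8        # approximate SD of adult male height in the US
print(d * sd_height_inches)   # 0.056 inches, i.e. about one twentieth of an inch
```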

Second, the fact that users in the experimental conditions produced content with very slightly more positive or negative emotional content doesn’t mean that those users actually felt any differently. It’s entirely possible–and I would argue, even probable–that much of the effect was driven by changes in the expression of ideas or feelings that were already on users’ minds. For example, suppose I log onto Facebook intending to write a status update to the effect that I had an “awesome day today at the beach with my besties!” Now imagine that, as soon as I log in, I see in my news feed that an acquaintance’s father just passed away. I might very well think twice about posting my own message–not necessarily because the news has made me feel sad myself, but because it surely seems a bit unseemly to celebrate one’s own good fortune around people who are currently grieving. I would argue that such subtle behavioral changes, while certainly responsive to others’ emotions, shouldn’t really be considered genuine cases of emotional contagion. Yet given how small the effects were, one wouldn’t need very many such changes to occur in order to produce the observed results. So, at the very least, the jury should still be out on the extent to which Facebook users actually feel differently as a result of this manipulation.

The concerns

Setting aside the rather modest (though still interesting!) results, let’s turn to look at the criticism. Here’s what Katy Waldman, writing in a Slate piece titled “Facebook’s Unethical Experiment”, had to say:

The researchers, who are affiliated with Facebook, Cornell, and the University of California–San Francisco, tested whether reducing the number of positive messages people saw made those people less likely to post positive content themselves. The same went for negative messages: Would scrubbing posts with sad or angry words from someone’s Facebook feed make that person write fewer gloomy updates?

The upshot? Yes, verily, social networks can propagate positive and negative feelings!

The other upshot: Facebook intentionally made thousands upon thousands of people sad.

Or consider an article in The Wire, quoting Jacob Silverman:

“What’s disturbing about how Facebook went about this, though, is that they essentially manipulated the sentiments of hundreds of thousands of users without asking permission (blame the terms of service agreements we all opt into). This research may tell us something about online behavior, but it’s undoubtedly more useful for, and more revealing of, Facebook’s own practices.”

On Twitter, the reaction to the study has been similarly negative. A lot of people appear to be very upset at the revelation that Facebook would actively manipulate its users’ news feeds in a way that could potentially influence their emotions.

Why the concerns are misplaced

To my mind, the concerns expressed in the Slate piece and elsewhere are misplaced, for several reasons. First, they largely mischaracterize the study’s experimental procedures–to the point that I suspect most of the critics haven’t actually bothered to read the paper. In particular, the suggestion that Facebook “manipulated users’ emotions” is quite misleading. Framing it that way tacitly implies that Facebook must have done something specifically designed to induce a different emotional experience in its users. In reality, for users assigned to the experimental condition, Facebook simply removed a variable proportion of status messages that were automatically detected as containing positive or negative emotional words. Let me repeat that: Facebook removed emotional messages for some users. It did not, as many people seem to be assuming, add content specifically intended to induce specific emotions. Now, given that a large amount of content on Facebook is already highly emotional in nature–think about all the people sharing their news of births, deaths, break-ups, etc.–it seems very hard to argue that Facebook would have been introducing new risks to its users even if it had presented some of them with more emotional content. But it’s certainly not credible to suggest that replacing 10% – 90% of emotional content with neutral content constitutes a potentially dangerous manipulation of people’s subjective experience.

Second, it’s not clear what the notion that Facebook users’ experience is being “manipulated” really even means, because the Facebook news feed is, and has always been, a completely contrived environment. I hope that people who are concerned about Facebook “manipulating” user experience in support of research realize that Facebook is constantly manipulating its users’ experience. In fact, by definition, every single change Facebook makes to the site alters the user experience, since there simply isn’t any experience to be had on Facebook that isn’t entirely constructed by Facebook. When you log onto Facebook, you’re not seeing a comprehensive list of everything your friends are doing, nor are you seeing a completely random subset of events. In the former case, you would be overwhelmed with information, and in the latter case, you’d get bored of Facebook very quickly. Instead, what you’re presented with is a carefully curated experience that is, from the outset, crafted in such a way as to create a more engaging experience (read: keeps you spending more time on the site, and coming back more often). The items you get to see are determined by a complex and ever-changing algorithm that you make only a partial contribution to (by indicating what you like, what you want hidden, etc.). It has always been this way, and it’s not clear that it could be any other way. So I don’t really understand what people mean when they sarcastically suggest–as Katy Waldman does in her Slate piece–that “Facebook reserves the right to seriously bum you out by cutting all that is positive and beautiful from your news feed”. Where does Waldman think all that positive and beautiful stuff comes from in the first place? Does she think it spontaneously grows wild in her news feed, free from the meddling and unnatural influence of Facebook engineers?

Third, if you were to construct a scale of possible motives for manipulating users’ behavior–with the global betterment of society at one end, and something really bad at the other end–I submit that conducting basic scientific research would almost certainly be much closer to the former end than would the other standard motives we find on the web–like trying to get people to click on more ads. The reality is that Facebook–and virtually every other large company with a major web presence–is constantly conducting large controlled experiments on user behavior. Data scientists and user experience researchers at Facebook, Twitter, Google, etc. routinely run dozens, hundreds, or thousands of experiments a day, all of which involve random assignment of users to different conditions. Typically, these manipulations aren’t conducted in order to test basic questions about emotional contagion; they’re conducted with the explicit goal of helping to increase revenue. In other words, if the idea that Facebook would actively try to manipulate your behavior bothers you, you should probably stop reading this right now and go close your account. You also should definitely not read this paper suggesting that a single social message on Facebook prior to the last US presidential election may have single-handedly increased national voter turnout by as much as 0.6%. Oh, and you should probably also stop using Google, YouTube, Yahoo, Twitter, Amazon, and pretty much every other major website–because I can assure you that, in every single case, there are people out there who get paid a good salary to… yes, manipulate your emotions and behavior! For better or worse, this is the world we live in. If you don’t like it, you can abandon the internet, or at the very least close all of your social media accounts. But the suggestion that Facebook is doing something unethical simply by publishing the results of one particular experiment among thousands–and in this case, an experiment featuring a completely innocuous design that, if anything, is probably less driven by a profit motive than most of what Facebook does–seems kind of absurd.

Fourth, it’s worth keeping in mind that there’s nothing intrinsically evil about the idea that large corporations might be trying to manipulate your experience and behavior. Everybody you interact with–including every one of your friends, family, and colleagues–is constantly trying to manipulate your behavior in various ways. Your mother wants you to eat more broccoli; your friends want you to come get smashed with them at a bar; your boss wants you to stay at work longer and take fewer breaks. We are always trying to get other people to feel, think, and do certain things that they would not otherwise have felt, thought, or done. So the meaningful question is not whether people are trying to manipulate your experience and behavior, but whether they’re trying to manipulate you in a way that aligns with or contradicts your own best interests. The mere fact that Facebook, Google, and Amazon run experiments intended to alter your emotional experience in a revenue-increasing way is not necessarily a bad thing if in the process of making more money off you, those companies also improve your quality of life. I’m not taking a stand one way or the other, mind you, but simply pointing out that without controlled experimentation, the user experience on Facebook, Google, Twitter, etc. would probably be very, very different–and most likely less pleasant. So before we lament the perceived loss of all those “positive and beautiful” items in our Facebook news feeds, we should probably remind ourselves that Facebook’s ability to identify and display those items consistently is itself in no small part a product of its continual effort to experimentally test its offering by, yes, experimentally manipulating its users’ feelings and thoughts.

What makes the backlash on this issue particularly strange is that I’m pretty sure most people do actually realize that their experience on Facebook (and on other websites, and on TV, and in restaurants, and in museums, and pretty much everywhere else) is constantly being manipulated. I expect that most of the people who’ve been complaining about the Facebook study on Twitter are perfectly well aware that Facebook constantly alters its user experience–I mean, they even see it happen in a noticeable way once in a while, whenever Facebook introduces a new interface. Given that Facebook has over half a billion users, it’s a foregone conclusion that every tiny change Facebook makes to the news feed or any other part of its websites induces a change in millions of people’s emotions. Yet nobody seems to complain about this much–presumably because, when you put it this way, it seems kind of silly to suggest that a company whose business model is predicated on getting its users to use its product more would do anything other than try to manipulate its users into, you know, using its product more.

Why the backlash is deeply counterproductive

Now, none of this is meant to suggest that there aren’t legitimate concerns one could raise about Facebook’s more general behavior–or about the immense and growing social and political influence that social media companies like Facebook wield. One can certainly question whether it’s really fair to expect users signing up for a service like Facebook’s to read and understand user agreements containing dozens of pages of dense legalese, or whether it would make sense to introduce new regulations on companies like Facebook to ensure that they don’t acquire or exert undue influence on their users’ behavior (though personally I think that would be unenforceable and kind of silly). So I’m certainly not suggesting that we give Facebook, or any other large web company, a free pass to do as it pleases. What I am suggesting, however, is that even if your real concerns are, at bottom, about the broader social and political context Facebook operates in, using this particular study as a lightning rod for criticism of Facebook is an extremely counterproductive, and potentially very damaging, strategy.

Consider: by far the most likely outcome of the backlash Facebook is currently experiencing is that, in future, its leadership will be less likely to allow its data scientists to publish their findings in the scientific literature. Remember, Facebook is not a research institute expressly designed to further understanding of the human condition; it’s a publicly-traded corporation that exists to create wealth for its shareholders. Facebook doesn’t have to share any of its data or findings with the rest of the world if it doesn’t want to; it could comfortably hoard all of its knowledge and use it for its own ends, and no one else would ever be any wiser for it. The fact that Facebook is willing to allow its data science team to spend at least some of its time publishing basic scientific research that draws on Facebook’s unparalleled resources is something to be commended, not criticized.

There is little doubt that the present backlash will do absolutely nothing to deter Facebook from actually conducting controlled experiments on its users, because A/B testing is a central component of pretty much every major web company’s business strategy at this point–and frankly, Facebook would be crazy not to try to empirically determine how to improve user experience. What criticism of the Kramer et al article will almost certainly do is decrease the scientific community’s access to, and interaction with, one of the largest and richest sources of data on human behavior in existence. You can certainly take a dim view of Facebook as a company if you like, and you’re free to critique the way they do business to your heart’s content. But haranguing Facebook and other companies like it for publicly disclosing scientifically interesting results of experiments that it is already constantly conducting anyway–and that are directly responsible for many of the positive aspects of the user experience–is not likely to accomplish anything useful. If anything, it’ll only ensure that, going forward, all of Facebook’s societally relevant experimental research is done in the dark, where nobody outside the company can ever find out–or complain–about it.

[UPDATE July 1st: I’ve posted some additional thoughts in a second post here.]

estimating the influence of a tweet–now with 33% more causal inference!

Twitter is kind of a big deal. Not just out there in the world at large, but also in the research community, which loves the kind of structured metadata you can retrieve for every tweet. A lot of researchers rely heavily on twitter to model social networks, information propagation, persuasion, and all kinds of interesting things. For example, here’s the abstract of a nice recent paper on arXiv that aims to  predict successful memes using network and community structure:

We investigate the predictability of successful memes using their early spreading patterns in the underlying social networks. We propose and analyze a comprehensive set of features and develop an accurate model to predict future popularity of a meme given its early spreading patterns. Our paper provides the first comprehensive comparison of existing predictive frameworks. We categorize our features into three groups: influence of early adopters, community concentration, and characteristics of adoption time series. We find that features based on community structure are the most powerful predictors of future success. We also find that early popularity of a meme is not a good predictor of its future popularity, contrary to common belief. Our methods outperform other approaches, particularly in the task of detecting very popular or unpopular memes.

One limitation of much of this body of research is that the data are almost invariably observational. We can build sophisticated models that do a good job predicting some future outcome (like meme success), but we don’t necessarily know that the “important” features we identify carry any causal influence. In principle, they could be completely epiphenomenal–for example, in the study I linked to, maybe the community structure features are just a proxy for some other, causally important, factor (e.g., whether the content of a meme has sufficiently broad appeal to attract attention from many different kinds of people). From a predictive standpoint, this may not matter much; if your goal is just to passively predict whether a meme is going to be successful or not, it’s irrelevant whether or not the features you’re using are doing causal work. On the other hand, if you want to actively design memes in such a way as to maximize their spread, the ability to get a handle on causation starts to look pretty important.

How can we estimate the direct causal influence of a tweet on the downstream popularity of a meme? Here’s a simple and (I suspect) very feasible idea in two steps:

  1. Create a small web app that allows any existing Twitter user to register via Twitter authentication. On signing up, a user has to specify just one (optional) setting: the proportion of their intended retweets they’re willing to withhold. Let’s call this the Withholding Fraction (WF).
  2. Every time (or at least some of the time) a registered user wants to retweet a particular tweet*, they do so via the new web app’s interface (which has permission to post to the user’s Twitter account) instead of whatever interface they’re currently using. The key is that the retweet isn’t just obediently passed along; instead, the target tweet is retweeted successfully with probability (1 – WF), and randomly suppressed from the user’s stream with probability WF (a minimal sketch of this logic follows below).
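
Here’s a minimal sketch of what that decision logic might look like on the server side. The `post_retweet` and `log_decision` callables, and the attributes on the `user` object, are placeholders for whatever Twitter client and storage layer the app would actually use; nothing here refers to an existing API.

```python
import random

def handle_retweet_request(user, tweet_id, post_retweet, log_decision):
    """Pass the retweet through with probability (1 - WF); withhold it otherwise.

    The decision is logged either way: the withheld retweets are the
    randomly-assigned-to-fail observations the downstream analysis needs.
    """
    withheld = random.random() < user.withholding_fraction
    log_decision(user_id=user.id, tweet_id=tweet_id, withheld=withheld)
    if not withheld:
        post_retweet(user, tweet_id)
    return withheld
```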

Doing this  would allow the community to very quickly (assuming rapid adoption, which seems reasonably likely) build up an enormous database of tweets that were targeted for retweeting by an active user, but randomly assigned to fail with some known probability. Researchers would then be able to directly quantify the causal impact of individual retweets on downstream popularity–and to estimate that influence conditional on all of the other standard variables, like the retweeter’s number of followers, the content of the tweet, etc. Of course, this still wouldn’t get us to true experimental manipulation of such features (i.e., we wouldn’t be manipulating users’ follower networks, just randomly omitting tweets from users with different followers), but it seems like a step in the right direction**.
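
And here is a correspondingly minimal sketch of the downstream analysis, assuming the app’s log ends up as a table with one row per intended retweet, the randomized withholding flag, a covariate or two, and a later measure of the target tweet’s popularity (the file and column names are made up for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical log produced by the app: one row per intended retweet,
# with columns like withheld, followers, and later_retweets.
log = pd.read_csv("retweet_decisions.csv")
log["withheld"] = log["withheld"].astype(int)

# Because withholding was randomized, the coefficient on `withheld` estimates the
# causal effect of suppressing a single retweet on the tweet's later popularity,
# here conditioned on the retweeter's follower count.
model = smf.ols("later_retweets ~ withheld + followers", data=log).fit()
print(model.summary())
```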

I figure building a barebones app like this would take an experienced developer familiar with the Twitter OAuth API just a day or two. And I suspect many people (myself included!) would be happy to contribute to this kind of experiment, provided that all of the resulting data were made public. (I’m aware that there are all kinds of restrictions on sharing assembled Twitter datasets, but we’re not talking about sharing firehose dumps here, just a restricted set of retweets from users who’ve explicitly given their consent to have the data used in this way.)

Has this kind of thing already been done? If not, does anyone want to build it?

 

* It doesn’t just have to be retweets, of course; the same principle would work just as well for withholding a random fraction of original tweets. But I suspect not many users would be willing to randomly eliminate a proportion of their original content from the firehose.

** If we really wanted to get close to true random assignment, we could potentially inject selected tweets into random users’ streams based on selected criteria. But I’m not sure how many tweeps would consent to have entirely random retweets published in their name (I probably wouldn’t), so this probably isn’t viable.

The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch

Over the past two years, my scientific computing toolbox has been steadily homogenizing. Around 2010 or 2011, my toolbox looked something like this:

  • Ruby for text processing and miscellaneous scripting;
  • Ruby on Rails/JavaScript for web development;
  • Python/Numpy (mostly) and MATLAB (occasionally) for numerical computing;
  • MATLAB for neuroimaging data analysis;
  • R for statistical analysis;
  • R for plotting and visualization;
  • Occasional excursions into other languages/environments for other stuff.

In 2013, my toolbox looks like this:

  • Python for text processing and miscellaneous scripting;
  • Ruby on Rails/JavaScript for web development, except for an occasional date with Django or Flask (Python frameworks);
  • Python (NumPy/SciPy) for numerical computing;
  • Python (Neurosynth, NiPy etc.) for neuroimaging data analysis;
  • Python (NumPy/SciPy/pandas/statsmodels) for statistical analysis;
  • Python (MatPlotLib) for plotting and visualization, except for web-based visualizations (JavaScript/d3.js);
  • Python (scikit-learn) for machine learning;
  • Excursions into other languages have dropped markedly.

You may notice a theme here.

The increasing homogenization (Pythonification?) of the tools I use on a regular basis primarily reflects the spectacular recent growth of the Python ecosystem. A few years ago, you couldn’t really do statistics in Python unless you wanted to spend most of your time pulling your hair out and wishing Python were more like R (which is a pretty remarkable confession, considering what R is like). Neuroimaging data could be analyzed in SPM (MATLAB-based), FSL, or a variety of other packages, but there was no viable full-featured, free, open-source Python alternative. Packages for machine learning, natural language processing, and web application development were only just starting to emerge.

These days, tools for almost every aspect of scientific computing are readily available in Python. And in a growing number of cases, they’re eating the competition’s lunch.

Take R, for example. R’s out-of-the-box performance with out-of-memory datasets has long been recognized as its achilles heel (yes, I’m aware you can get around that if you’re willing to invest the time–but not many scientists have the time). But even people who hated the way R chokes on large datasets, and its general clunkiness as a language, often couldn’t help running back to R as soon as any kind of serious data manipulation was required. You could always laboriously write code in Python or some other high-level language to pivot, aggregate, reshape, and otherwise pulverize your data, but why would you want to? The beauty of packages like plyr in R was that you could, in a matter of 2 – 3 lines of code, perform enormously powerful operations that could take hours to duplicate in other languages. The downside was the intensive learning curve associated with learning each package’s often quite complicated API (e.g., ggplot2 is incredibly expressive, but every time I stop using ggplot2 for 3 months, I have to completely re-learn it), and having to contend with R’s general awkwardness. But still, on the whole, it was clearly worth it.

Flash forward to The Now. Last week, someone asked me for some simulation code I’d written in R a couple of years ago. As I was firing up RStudio to dig around for it, I realized that I hadn’t actually fired up RStudio for a very long time prior to that moment–probably not in about 6 months. The combination of NumPy/SciPy, MatPlotLib, pandas, and statsmodels had effectively replaced R for me, and I hadn’t even noticed. At some point I just stopped dropping out of Python and into R whenever I had to do the “real” data analysis. Instead, I just started importing pandas and statsmodels into my code. The same goes for machine learning (scikit-learn), natural language processing (nltk), document parsing (BeautifulSoup), and many other things I used to do outside Python.
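
To give a flavor of what that switch looks like in practice, here’s the kind of split-apply-combine plus regression workflow that used to send me back to R, done entirely in pandas and statsmodels (the file and column names are made up for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("subjects.csv")   # hypothetical dataset with group, condition, rt, age columns

# plyr-style split-apply-combine: mean reaction time for each group x condition cell
summary = df.groupby(["group", "condition"])["rt"].mean().unstack()

# ...and an R-like formula interface for the regression model
fit = smf.ols("rt ~ condition + age", data=df).fit()
print(summary)
print(fit.summary())
```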

It turns out that the benefits of doing all of your development and analysis in one language are quite substantial. For one thing, when you can do everything in the same language, you don’t have to suffer the constant cognitive switch costs of reminding yourself, say, that Ruby uses blocks instead of comprehensions, or that you need to call len(array) instead of array.length to get the size of an array in Python; you can just keep solving the problem you’re trying to solve with as little cognitive overhead as possible. Also, you no longer need to worry about interfacing between different languages used for different parts of a project. Nothing is more annoying than parsing some text data in Python, finally getting it into the format you want internally, and then realizing you have to write it out to disk in a different format so that you can hand it off to R or MATLAB for some other set of analyses*. In isolation, this kind of thing is not a big deal. It doesn’t take very long to write out a CSV or JSON file from Python and then read it into R. But it does add up. It makes integrated development more complicated, because you end up with more code scattered around your drive in more locations (well, at least if you have my organizational skills). It means you spend a non-negligible portion of your “analysis” time writing trivial little wrappers for all that interface stuff, instead of thinking deeply about how to actually transform and manipulate your data. And it means that your beautiful analytics code is marred by all sorts of ugly open() and read() I/O calls. All of this overhead vanishes as soon as you move to a single language.

Convenience aside, another thing that’s impressive about the Python scientific computing ecosystem is that a surprising number of Python-based tools are now best-in-class (or close to it) in terms of scope and ease of use–and, in virtue of C bindings, often even in terms of performance. It’s hard to imagine an easier-to-use machine learning package than scikit-learn, even before you factor in the breadth of implemented algorithms, excellent documentation, and outstanding performance. Similarly, I haven’t missed any of the data manipulation functionality in R since I switched to pandas. Actually, I’ve discovered many new tricks in pandas I didn’t know in R (some of which I’ll describe in an upcoming post). Considering that pandas considerably outperforms R for many common operations, the reasons for me to switch back to R or other tools–even occasionally–have dwindled.
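
As a quick illustration of the scikit-learn point: every estimator in the package exposes the same fit/predict/score interface, so training and evaluating a fairly powerful classifier takes only a handful of lines (the built-in iris data is used here just to keep the example self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# The same three steps work for essentially every scikit-learn estimator:
# instantiate, fit, evaluate.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[::2], y[::2])              # train on every other sample...
print(clf.score(X[1::2], y[1::2]))   # ...and score on the held-out half
```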

Mind you, I don’t mean to imply that Python can now do everything anyone could ever do in other languages. That’s obviously not true. For instance, there are currently no viable replacements for many of the thousands of statistical packages users have contributed to R (if there’s a good analog for lme4 in Python, I’d love to know about it). In signal processing, I gather that many people are wedded to various MATLAB toolboxes and packages that don’t have good analogs within the Python ecosystem. And for people who need serious performance and work with very, very large datasets, there’s often still no substitute for writing highly optimized code in a low-level compiled language. So, clearly, what I’m saying here won’t apply to everyone. But I suspect it applies to the majority of scientists.

Speaking only for myself, I’ve now arrived at the point where around 90 – 95% of what I do can be done comfortably in Python. So the major consideration for me, when determining what language to use for a new project, has shifted from “what’s the best tool for the job that I’m willing to learn and/or tolerate using?” to “is there really no way to do this in Python?” By and large, this mentality is a good thing, though I won’t deny that it occasionally has its downsides. For example, back when I did most of my data analysis in R, I would frequently play around with random statistics packages just to see what they did. I don’t do that much any more, because the pain of having to refresh my R knowledge and deal with that thing again usually outweighs the perceived benefits of aimless statistical exploration. Conversely, sometimes I end up using Python packages that I don’t like quite as much as comparable packages in other languages, simply for the sake of preserving language purity. For example, I prefer Rails’ ActiveRecord ORM to the much more explicit SQLAlchemy ORM for Python–but I don’t prefer it enough to justify mixing Ruby and Python objects in the same application. So, clearly, there are costs. But they’re pretty small costs, and for me personally, the scales have now clearly tipped in favor of using Python for almost everything. I know many other researchers who’ve had the same experience, and I don’t think it’s entirely unfair to suggest that, at this point, Python has become the de facto language of scientific computing in many domains. If you’re reading this and haven’t had much prior exposure to Python, now’s a great time to come on board!

Postscript: In the period of time between starting this post and finishing it (two sessions spread about two weeks apart), I discovered not one but two new Python-based packages for data visualization: Michael Waskom’s seaborn package–which provides very high-level wrappers for complex plots, with a beautiful ggplot2-like aesthetic–and Continuum Analytics’ bokeh, which looks like a potential game-changer for web-based visualization**. At the rate the Python ecosystem is moving, there’s a non-zero chance that by the time you read this, I’ll be using some new Python package that directly transliterates my thoughts into analytics code.
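
For the curious, here’s the sort of thing seaborn makes nearly effortless: a grouped scatter plot with separate per-group regression fits, in a single call (this uses one of seaborn’s example datasets, which the library fetches on first use):

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")   # example dataset seaborn downloads on first use
# One call: scatter plot of tip vs. bill, split by smoker status, with per-group fits
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips)
plt.show()
```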

 

* I’m aware that there are various interfaces between Python, R, etc. that allow you to internally pass objects between these languages. My experience with these has not been overwhelmingly positive, and in any case they still introduce all the overhead of writing extra lines of code and having to deal with multiple languages.

** Yes, you heard right: web-based visualization in Python. Bokeh generates static JavaScript and JSON for you from Python code, so  your users are magically able to interact with your plots on a webpage without you having to write a single line of native JS code.