“Open Source, Open Science” Meeting Report – March 2015

[The report below was collectively authored by participants at the Open Source, Open Science meeting, and has been cross-posted in other places.]

On March 19th and 20th, the Center for Open Science hosted a small meeting in Charlottesville, VA, convened by COS and co-organized by Kaitlin Thaney (Mozilla Science Lab) and Titus Brown (UC Davis). People working across the open science ecosystem attended, including publishers, infrastructure non-profits, public policy experts, community builders, and academics.
Open Science has emerged into the mainstream, primarily due to concerted efforts from various individuals, institutions, and initiatives. This small, focused gathering brought together several of those community leaders. The purpose of the meeting was to define common goals, discuss common challenges, and coordinate on common efforts.

We had good discussions about several issues at the intersection of technology and social hacking including badging, improving standards for scientific APIs, and developing shared infrastructure. We also talked about coordination challenges due to the rapid growth of the open science community. At least three collaborative projects emerged from the meeting as concrete outcomes to combat the coordination challenges.

A repeated theme was how to make the value proposition of open science more explicit. Why should scientists become more open, and why should institutions and funders support open science? We agreed that incentives in science are misaligned with practices, and we identified particular pain points and opportunities to nudge incentives. We focused on providing information about the benefits of open science to researchers, funders, and administrators, and emphasized reasons aligned with each stakeholders’ interests. We also discussed industry interest in “open”, both in making good use of open data, and also in participating in the open ecosystem. One of the collaborative projects emerging from the meeting is a paper or papers to answer the question “Why go open?” for researchers.

Many groups are providing training for tools, statistics, or workflows that could improve openness and reproducibility. We discussed methods of coordinating training activities, such as a training “decision tree” defining potential entry points and next steps for researchers. For example, Center for Open Science offers statistics consulting, rOpenSci offers training on tools, and Software Carpentry, Data Carpentry, and Mozilla Science Lab offer training on workflows. A federation of training services could be mutually reinforcing and bolster collective effectiveness, and facilitate sustainable funding models.

The challenge of supporting training efforts was linked to the larger challenge of funding the so-called “glue” – the technical infrastructure that is only noticed when it fails to function. One such collaboration is the SHARE project, a partnership between the Association of Research Libraries, its academic association partners, and the Center for Open Science. There is little glory in training and infrastructure, but both are essential elements for providing knowledge to enable change, and tools to enact change.

Another repeated theme was the “open science bubble”. Many participants felt that they were failing to reach people outside of the open science community. Training in data science and software development was recognized as one way to introduce people to open science. For example, data integration and techniques for reproducible computational analysis naturally connect to discussions of data availability and open source. Re-branding was also discussed as a solution – rather than “post preprints!”, say “get more citations!” Another important realization was that researchers who engage with open practices need not, and indeed may not want to, self-identify as “open scientists” per se. The identity and behavior need not be the same.

A number of concrete actions and collaborative activities emerged at the end, including a more coordinated effort around badging, collaboration on API connections between services and producing an article on best practices for scientific APIs, and the writing of an opinion paper outlining the value proposition of open science for researchers. While several proposals were advanced for “next meetings” such as hackathons, no decision has yet been reached. But, a more important decision was clear – the open science community is emerging, strong, and ready to work in concert to help the daily scientific practice live up to core scientific values.

Authors
[Authors are listed in reverse alphabetical order; order does not denote relative contribution.]

  1. Tal Yarkoni, University of Texas at Austin
  2. Kara Woo, NCEAS
  3. Andrew Updegrove, Gesmer Updegrove and ConsortiumInfo.org
  4. Kaitlin Thaney, Mozilla Science Lab
  5. Jeffrey Spies, Center for Open Science
  6. Courtney Soderberg, Center for Open Science
  7. Elliott Shore, Association of Research Libraries
  8. Andrew Sallans, Center for Open Science
  9. Karthik Ram, rOpenSci and Berkeley Institute for Data Science
  10. Min Ragan-Kelley, IPython and UC Berkeley
  11. Brian Nosek, Center for Open Science and University of Virginia
  12. Erin C, McKiernan, Wilfrid Laurier University
  13. Jennifer Lin, PLOS
  14. Amye Kenall, BioMed Central
  15. Mark Hahnel, figshare
  16. C. Titus Brown, UC Davis
  17. Sara D. Bowman, Center for Open Science

the Neurosynth viewer goes modular and open source

If you’ve visited the Neurosynth website lately, you may have noticed that it looks… the same way it’s always looked. It hasn’t really changed in the last ~20 months, despite the vague promise on the front page that in the next few months, we’re going to do X, Y, Z to improve the functionality. The lack of updates is not by design; it’s because until recently I didn’t have much time to work on Neurosynth. Now that much of my time is committed to the project, things are moving ahead pretty nicely, though the changes behind the scenes aren’t reflected in any user-end improvements yet.

The github repo is now regularly updated and even gets the occasional contribution from someone other than myself; I expect that to ramp up considerably in the coming months. You can already use the code to run your own automated meta-analyses fairly easily; e.g., with everything set up right (follow the Readme and examples in the repo), the following lines of code:

dataset = cPickle.load(open('dataset.pkl', 'rb'))
studies = get_ids_by_expression("memory* &~ ("wm|working|episod*"), threshold=0.001)
ma = meta.MetaAnalysis(dataset, studies)
ma.save_results('memory')

…will perform an automated meta-analysis of all studies in the Neurosynth database that use the term ‘memory’ at a frequency of 1 in 1,000 words or greater, but don’t use the terms wm or working, or words that start with ‘episod’ (e.g., episodic). You can perform queries that nest to arbitrary depths, so it’s a pretty powerful engine for quickly generating customized meta-analyses, subject to all of the usual caveats surrounding Neurosynth (i.e., that the underlying data are very noisy, that terms aren’t mental states, etc.).

Anyway, with the core tools coming along, I’ve started to turn back to other elements of the project, starting with the image viewer. Yesterday I pushed the first commit of a new version of the viewer that’s currently on the Neurosynth website. In the next few weeks, this new version will be replacing the current version of the viewer, along with a bunch of other changes to the website.

A live demo of the new viewer is available here. It’s not much to look at right now, but behind the scenes, it’s actually a huge improvement on the old viewer in a number of ways:

  • The code is completely refactored and is all nice and object-oriented now. It’s also in CoffeeScript, which is an alternative and (if you’re coming from a Python or Ruby background) much more readable syntax for JavaScript. The source code is on github and contributions are very much encouraged. Like most scientists, I’m generally loathe to share my code publicly because I think it sucks most of the time. But I actually feel pretty good about this code. It’s not good code by any stretch, but I think it rises to the level of ‘mostly sensible’, which is about as much as I can hope for.
  • The viewer now handles multiple layers simultaneously, with the ability to hide and show layers, reorder them by dragging, vary the transparency, assign different color palettes, etc. These features have been staples of offline viewers pretty much since the prehistoric beginnings of fMRI time, but they aren’t available in the current Neurosynth viewer or most other online viewers I’m aware of, so this is a nice addition.
  • The architecture is modular, so that it should be quite easy in future to drop in other alternative views onto the data without having to muck about with the app logic. E.g., adding a 3D WebGL-based view to complement the current 2D slice-based HTML5 canvas approach is on the near-term agenda.
  • The resolution of the viewer is now higher–up from 4 mm to 2 mm (which is the most common native resolution used in packages like SPM and FSL). The original motivation for downsampling to 4 mm in the prior viewer was to keep filesize to a minimum and speed up the initial loading of images. But at some point I realized, hey, we’re living in the 21st century; people have fast internet connections now. So now the files are all in 2 mm resolution, which has the unpleasant effect of increasing file sizes by a factor of about 8, but also has the pleasant effect of making it so that you can actually tell what the hell you’re looking at.

Most importantly, there’s now a clean, and near-complete, separation between the HTML/CSS content and the JavaScript code. Which means that you can now effectively drop the viewer into just about any HTML page with just a few lines of code. So in theory, you can have basically the same viewer you see in the demo just by sticking something like the following into your page:

 viewer = Viewer.get('#layer_list', '.layer_settings')
 viewer.addView('#view_axial', 2);
 viewer.addView('#view_coronal', 1);
 viewer.addView('#view_sagittal', 0);
 viewer.addSlider('opacity', '.slider#opacity', 'horizontal', 'false', 0, 1, 1, 0.05);
 viewer.addSlider('pos-threshold', '.slider#pos-threshold', 'horizontal', 'false', 0, 1, 0, 0.01);
 viewer.addSlider('neg-threshold', '.slider#neg-threshold', 'horizontal', 'false', 0, 1, 0, 0.01);
 viewer.addColorSelect('#color_palette');
 viewer.addDataField('voxelValue', '#data_current_value')
 viewer.addDataField('currentCoords', '#data_current_coords')
 viewer.loadImageFromJSON('data/MNI152.json', 'MNI152 2mm', 'gray')
 viewer.loadImageFromJSON('data/emotion_meta.json', 'emotion meta-analysis', 'bright lights')
 viewer.loadImageFromJSON('data/language_meta.json', 'language meta-analysis', 'hot and cold')
 viewer.paint()

Well, okay, there are some other dependencies and styling stuff you’re not seeing. But all of that stuff is included in the example folder here. And of course, you can modify any of the HTML/CSS you see in the example; the whole point is that you can now easily style the viewer however you want it, without having to worry about any of the app logic.

What’s also nice about this is that you can easily pick and choose which of the viewer’s features you want to include in your page; nothing will (or at least, should) break no matter what you do. So, for example, you could decide you only want to display a single view showing only axial slices; or to allow users to manipulate the threshold of layers but not their opacity; or to show the current position of the crosshairs but not the corresponding voxel value; and so on. All you have to do is include or exclude the various addSlider() and addData() lines you see above.

Of course, it wouldn’t be a mediocre open source project if it didn’t have some important limitations I’ve been hiding from you until near the very end of this post (hoping, of course, that you wouldn’t bother to read this far down). The biggest limitation is that the viewer expects images to be in JSON format rather than a binary format like NIFTI or Analyze. This is a temporary headache until I or someone else can find the time and motivation to adapt one of the JavaScript NIFTI readers that are already out there (e.g., Satra Ghosh‘s parser for xtk), but for now, if you want to load your own images, you’re going to have to take the extra step of first converting them to JSON. Fortunately, the core Neurosynth Python package has a img_to_json() method in the imageutils module that will read in a NIFTI or Analyze volume and produce a JSON string in the expected format. Although I’m pretty sure it doesn’t handle orientation properly for some images, so don’t be surprised if your images look wonky. (And more importantly, if you fix the orientation issue, please commit your changes to the repo.)

In any case, as long as you’re comfortable with a bit of HTML/CSS/JavaScript hacking, the example/ folder in the github repo has everything you need to drop the viewer into your own pages. If you do use this code internally, please let me know! Partly for my own edification, but mostly because when I write my annual progress reports to the NIH, it’s nice to be able to truthfully say, “hey, look, people are actually using this neat thing we built with taxpayer money.”