January 7, 2016

Open Research Data?

This blog post aims to compile a definition of Open Research Data, to highlight the implications Open Notebook Science has for scientific practices, and to discuss how reputation regimes in academia are currently unable to keep pace with developments in research practices. It was started in the context of the university seminar "Open Science, the better science?" with Katja Mayer. As such, one characteristic of the text is that it fulfills the formal and content-related requirements of the course. Second, the text also aims to be an experiment in synchronous reviewing. It has therefore been published at an early stage and will be completed and edited over time. This is why I would like to invite you to comment openly on the text and to start an open synchronous review experiment here.


What actually is Open Research Data? An open definition

Open Research Data, i.e. the sharing of data that has been produced in scientific practices with the purpose of informing a research outcome published in an academic journal, has been around since the mid-20th century. With the rise of the World Wide Web it has been heavily discussed in and around academia since the 1990s, and it is often described as bearing huge potential for fostering innovation processes in science and society. Hence we first need to turn our attention to what is supposed to be so special about the sharing of research data.
Data in this context cannot be specified as being of any distinctive type: it can be quantitative (e.g. raw and processed measurements, statistical data, …) as well as qualitative material (e.g. protocols, field notes, transcripts, narratives) or audio-visual content. Hence it is the ways in which data is generated, distributed and (re-)used that differentiate Open Science from more traditional models of scientific deliberation that do not actively adopt practices of public disclosure of research data.

Access, Share, (Re-)Use

First it is important to highlight that all of the discussions and initiatives revolving around Open Research Data have to be understood as part of the Open Science movement. Sharing here means less the sharing of specific bits of data between individuals or within teams than providing Open Access to digitally stored research data through Information and Communications Technology (ICT) and enabling the re-use of research data for scientific, societal but also commercial purposes. The guiding principle of sharing research data not only with peers but openly with the world is that ‘publicly funded research data should be openly available to the maximum extent possible’ (Arzberger, Schroeder, Beaulieu, Bowker, Casey, & Laaksonen, 2004, p. 136). This is what makes the sharing of research data in the context of Open Data (see also http://linkeddata.org/) so different from forms of consensual sharing of bits of data between individuals or research teams. Data that is made digitally accessible and available free of supplementary monetary charge may be used not only for its original scientific purpose but may also be (re-)used without the explicit consent of its generator(-s). What at first sight seems quite straightforward as a principle, fostering equality and transparency in research, turns out to be much more complex as soon as we leave the idealized realm of definitions and turn our attention towards actual practices of sharing and re-using research data.

11 operating principles

Back in 2004, Arzberger et al. formulated 11 operating principles characterizing Open Research Data (Arzberger, Schroeder, Beaulieu, Bowker, Casey, Laaksonen, et al., 2004), which I would like to briefly take up and comment on here.

1.    Openness
As already stated above, data issuing from publicly funded research should be accessible free of cost as a resource that (ideally) can be fully downloaded from the WWW. "Free of cost" of course assumes that potential (re-)users have equal access to ICT, which then remains the only financial gatekeeper, but still an all too often limiting one.

2.    Transparency of access and active dissemination
Data, just like any kind of content on the Web, needs to be stored in such a way that it is findable and, as a second prerequisite, needs to be actively promoted in order to be found. Hence we have to assume that those researchers or institutions that are most prolific in promoting themselves and strategically disseminating the data they produce will be the most visible and the most (re-)used. It will thus be interesting to observe on a larger scale whether some channels of dissemination are more promising in terms of rates of reuse, or whether reuse correlates with the institutional and/or individual reputation of the original data generators.

3.    Assignment and assumption of formal responsibility
As in any domain of our western social coexistence, Open Science sharing practices rely strongly on formal roles and responsibilities, defined by and through the processes of academic and non-academic knowledge production and data generation, in order to be functional.

4.    Professionalism
Best practice models and codes of conduct for sharing and (re-)using research data still have to be established internationally and nationally. This can be observed very well in the multitude of (mainly national) Open Science initiatives aiming at formalizing Open Research Data practices that have arisen over the past years.

5.    Interoperability
In order to make data accessible, technical and software protocols need to be established and made transparent. This is one of the biggest problems within the Open Research Data movement, as research data often needs to be run on proprietary software or is of such a scale that it requires specific large-scale ICT facilities (cf. CERN, whose practice of disseminating research data is at the same time an excellent and a somewhat perverse example, if we have a look at the first web page ever published to the WWW by Tim Berners-Lee).

6.    Quality
Quality of data is maybe the most difficult characteristic to pinpoint. How can we qualify data as being of higher or lower quality beyond verifying formal criteria for the description of (meta-)data and protocol standards? In my very personal view, quality of data is at least as hard to grasp and determine as the aesthetic qualities of data might be. The question is: do we need to define what ultimately good data is, or will it be sufficient to assess whether data collection processes were appropriate for the contexts in which the data has been used or is projected to be used? If these types of questions are able to tackle some issues of reproducibility and verification/falsification within research practices, what then is the situation for the re-use of data? The issues I see here are differing data standards across disciplines and research fields, leading to questions such as: how can misinformed (re-)uses of data between disciplines be prevented, and who or what is to blame for misuse: data generators, poor description and documentation of data standards, investigators re-using original research data, or the quality of the data?

7.    Operational Efficiency
Operational efficiency is probably the characteristic that is most prominently positioned in many contemporary discussions revolving around Open Data: the underlying rationale is, quite straightforwardly, the optimization of returns (in terms of data, research outcomes, etc.) per unit of capital expenditure. Hence it is less connected to research practices than to economics and the (re-)monetization of science funding. This is not objectionable per se, but considering the rhetoric of enhancing democracy and equality, in science and society, through open data initiatives and worldwide free-of-cost access to research outputs and data (e.g. Fecher & Friesike, 2014), it leaves us with a bitter aftertaste.

8.    Flexibility
The notion of flexibility, as introduced by Arzberger et al., does not refer to flexibility of data definitions or mixed usage of data across research fields, as we might assume, but reflects much more down-to-earth issues of data management and data access regimes. For a few decades now, ICT has enabled the quick and relatively flawless exchange of bits of information between geographically separated researchers (and people in general). Thus we have to assume that researchers will be much more likely to share data and collaborate with peers from their specific research fields than with those from their own institution or country. Hence the notion of flexibility refers here to flexible modes and rules of electronic data exchange that are limited neither by national nor by institutional barriers. This stands in striking tension with the position taken by Arzberger et al. that national governments should take the lead in promoting Open Data initiatives.

9.  Property
To whom should research data generated in publicly funded research activities belong? Is it the individual researchers, the research institutions involved, the public, or even the individuals under investigation, as in the case of medical trials and data gathered in psychology, the social sciences or the humanities? If we take Open Science principles seriously, it obviously has to be the public. But how then can we share collaborative ownership? Do the investigators who gather the data have a relatively higher share of ownership in it, or should those taxpayers who pay most hold the largest shares? These questions may seem relatively blunt when considered apart from issues of legality, accountability and the distribution of reputation related to the production and (re-)distribution of research data, but when put into these contexts they gain enormously in relevance and importance.

10.  Legality
Just as is the case for scholarly publishing, questions of legality arise once research data is intended to be published and re-used. These not only comprise issues of intellectual property and rights of re-use of published data but also involve ethical issues and privacy rights with regard to data on individuals or groups (e.g. medical trials, narrative and survey data gathered for social science purposes) as well as concerns of national security or the protection of trade secrets. Legal restrictions related to these issues should certainly be taken seriously by researchers, even if they may keep academics from making their research data openly available.

11.  Accountability
As far as I am concerned, the principle stipulated here is heavily intertwined with operational efficiency (see above). In a nutshell, accountability in academia can be broken down to practices of making scientific practices measurable and countable (e.g. Marilyn Strathern, 2000). Hence academic activities are translated into figures that may be compared against the figures for peers around the world on the one hand, and against expenditures on the other.

How is Data sharing related to reward system(-s) in academia?

In order to introduce some notions and rationales that underlie open scientific cultures, let me take up the concepts of democracy that Fecher & Friesike (2014, pp. 25–32) describe as incentives for and in Open Science. The main objective of the democratic school of Open Science lies in making research outputs better and more equally available to the publics and to academia itself. This stands in opposition, but not in contradiction, to the public school of Open Science as delineated by Fecher and Friesike (2014, pp. 19–25), which is concerned with opening research processes to the publics rather than with presenting scientific achievements to them. It is therefore not surprising that the major tools of the democratic school of Open Science are located in the domain of scientific communication, documentation and publication, well known and much discussed under the umbrella terms of Open Access (publishing) and Open Data (initiatives). The guiding concept of openness informing the democratic school is a seemingly straightforward, policy-driven process that aims at making scientific outcomes better and more equitably available, by making scientific achievements commonly available and usable in the case of academic publishing and re-usable in the case of Open Data initiatives, and that has to be understood as enhancing our prior experience of scientific knowledge production. Nevertheless, we have to scrutinize the rationales and motivations that actually underlie this concept of openness in contemporary academia in order to gain a more balanced and detailed picture. According to Fecher and Friesike (2014, pp. 29–32), Open Access publishing, i.e. persistent equal access to scientific output, can be regarded as a driver of development in science, especially in those areas of the world that we, in our western view of the realms of contemporary academia, deem to be underdeveloped.
Second, many commentators on Open Science promote Open Access as a convenient antagonist to classic subscription-based publishing models. Next we have to turn our attention towards Open Data initiatives, which according to Fecher and Friesike (2014, pp. 26–29; see also Murray-Rust, 2008) are guided by a research(-er)-driven rationale, meaning that access or non-access to research data, to be (re-)used in research contexts different from the one it was initially designed for, lies in the interest of the researcher(-s) concerned rather than in the interest of publishing houses or commercial and non-commercial data providers. Even if we deem the opportunities that the sharing of research data, along with providing access to research outcomes, may offer for and in research practices to be limited, especially in the SSH (Fecher & Friesike, 2014, p. 26; Fink, 2000), we have to agree with Nowotny, Scott and Gibbons (2003) that democratic rationales for and in Open Science are tightly bound to actual research practices and intertwined with rationales of propagating Open Science, e.g. metrics and infrastructural debates.

So what are the incentives to actively share original research data? Altruism, Mertonian ethos of science, reproducibility of results, academic reputation?
Whereas practices in the making of science have changed, and are still changing, in favor of more open practices, this is far less the case for the regimes of academic qualification and reward distribution. Hence we have to thoroughly scrutinize how Open Science and the sharing of research data are connected to modes of redistribution of academic qualification, status and prestige.
In order to enable an active collaborative discussion on issues related to attribution of individual credit in regimes of academic reputation vs. Open practices in science let me propose a game to you.  

Open Methods / Open Lab Notebooks

It should be self-evident that when speaking of Open Research Data we should not limit ourselves to publishing final and processed data, but that the actual generation of data and the documentation of research protocols should also be publicly accessible for reuse. The best known and most adopted strategy for doing so is Open Notebook Science.
More to come…


Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., & Laaksonen, L. (2004). Promoting Access to Public Research Data for Scientific, Economic, and Social Development. Data Science Journal, 3(29 November 2004), 135–152. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=

Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., … Wouters, P. (2004). Science and government. An international framework to promote access to data. Science (New York, N.Y.), 303(5665), 1777–8. doi:10.1126/science.1095958

Fecher, B., & Friesike, S. (2014). Open Science. One term, five schools of thought. In Opening Science. The Evolving Guide on How the Web is Changing Research, Collaboration and Scholarly Publishing (pp. 213–224). doi:10.1007/978-3-319-00026-8

Fecher, B., Friesike, S., Hebing, M., Linek, S., & Sauermann, A. (2015). A Reputation Economy: Results from an Empirical Survey on Academic Data Sharing. SSRN Electronic Journal. doi:10.2139/ssrn.2568693

Murray-Rust, P. (2008). Open Data in Science. Nature Precedings.

Nowotny, H., Scott, P., & Gibbons, M. (2003). Introduction: ‘Mode 2’ Revisited: The New Production of Knowledge. Minerva, 41(3), 179–194. http://doi.org/10.1023/A:1025505528250

Strathern, M. (2003). Audit Cultures: Anthropological Studies in Accountability, Ethics and the Academy. Routledge.


  1. Was this classification of issues taken up by the OECD?

    1. I am not too sure what you mean by classification of issues?

    2. I am not sure, but the commenter might refer to this document: http://www.oecd.org/sti/sci-tech/oecdprinciplesandguidelinesforaccesstoresearchdatafrompublicfunding.htm

    3. Yes, I think it was a sort of preparatory work for OECD's principles and guidelines for data sharing from the same year (cited above)

  2. ‘publicly funded research data should be openly available to the maximum extent possible’ That is the point that matters and if taken seriously could be a good start in creating a new way to look at it as common achievements.

  3. Steve, you might also want to use parts of your assignments for this blog post, e.g. from the first assignment on the thought-schools...

  4. Good points Steve. For me the reward structure is the biggest issue to be dealt with. Humans can respond very powerfully to incentives, and I think some degree of recognition will be vital. Most people don't only work for the greater good, after all.

    1. exactly, if you don't get credit, why invest your time and resources? Producing another citable publication definitely pays off much more

    2. Agreed! Thus we will have to scrutinize in future whether the ways in which credit is attributed in science still suit the ways in which we envision the Science of the future to be shaped. In my estimation there is a vital gap between Open Science policies and the ways reputation is self-reflexively redistributed in contemporary academia.

  5. Another point is that in academic "reputation regimes", novelty and originality are encouraged and valued much more than ensuring or proving the reproducibility of claimed results. This might also be the reason why initiatives such as the Reproducibility project are mostly organized as a large collaboration, where tasks are broken down into small chunks that are feasible for people to contribute their time to on a voluntary basis.

  6. The 11 principles give a very good overview of what is important. It would be interesting to pick out the Open notebook topic and start a discussion on that. Are there some really good examples where the openness of lab notebooks was the starting point for further research? It would be good to hear more about the experiences and results of this approach.

    1. thank you Walter I will look up some interesting examples and create a new post, where we can hopefully discuss on how Open Notebook Science could be a game changer

  7. Okay I will post some reflections on reward structure and issues related to the incompatibility of ways in which reputation is (re-)distributed and collaborative/open science practices soon

  8. looking forward to that! KM

  9. Thanks for the neatly arranged overview of the 11 principles of Arzberger et al. (2004).

    I want to share here that there is one issue that always comes up when I think about open data. Unfortunately this issue is not addressed in the 11 operating principles of Arzberger et al.
    I do think that there could be a lot of problems when openly shared data is re-used. In my opinion the re-usability of open data is strongly connected to the way the data is edited and how carefully the context of its production is described. I admittedly do not have a lot of experience of what open data sets look like and therefore do not know to what extent this kind of information is given, but I strongly believe that certain background information about data is absolutely necessary for its re-use. For that reason I wonder why this issue is not mentioned in Arzberger et al.'s operating principles. In my opinion comprehensive background information for a data set needs to be considered in operating principles for open data.

    1. Yes Bernhard I completely agree and I really should have emphasised this when reflecting notions of quality of research data. I really think that notions of quality can't be applied to data but only to its descriptors and descriptions, which should indicate what purposes the data was generated for and what was the setting in which it was produced.