This blog post aims to compile a definition of Open Research Data, to highlight the implications Open Notebook Science has for scientific practices, and to discuss why reputation regimes in academia are, for the moment, unable to keep pace with developments in research practices. It was started in the context of the university seminar "Open Science, the better science?" with Katja Mayer. As such, one characteristic of the text is that it fulfills the formal and content-related requirements of the course. Second, the text also aims to be an experiment in synchronous reviewing. It was therefore initially published at an early stage and will be completed and edited over time. This is why I would like to invite you to comment openly on the text and start an open synchronous review experiment here.
What actually is Open Research Data? An open definition
Open Research Data, i.e. the sharing of data that has been produced in scientific practices with the purpose of informing a research outcome published in an academic journal, was introduced in the mid-20th century and has been heavily discussed in and around academia since the 1990s, with the rise of the World Wide Web. It is often described as bearing huge potential for fostering innovation processes in science and society. Hence we first need to turn our attention to what is supposed to be so special about the sharing of research data.
Data in this context cannot be specified as being of any distinctive type: it can be quantitative (e.g. raw and processed measurements, statistical data), qualitative (e.g. protocols, field notes, transcripts, narratives) or audio-visual. Hence it is the ways in which data is generated, distributed and (re-)used that distinguish Open Science from more traditional models of scientific deliberation that do not actively adopt practices of public disclosure of research data.
Access, Share, (Re-)Use
First, it is important to highlight that all of the discussions and initiatives revolving around Open Research Data have to be understood as part of the Open Science movement. Sharing here therefore means less the sharing of specific bits of data between individuals or within teams than providing Open Access to digitally stored research data through Information and Communications Technology (ICT) and enabling the re-use of research data for scientific, societal and also commercial purposes. The guiding principle of sharing research data not only with peers but openly with the world is that ‘publicly funded research data should be openly available to the maximum extent possible’ (Arzberger, Schroeder, Beaulieu, Bowker, Casey, & Laaksonen, 2004, p. 136).
And this is what makes the sharing of research data in the context of Open Data (see also http://linkeddata.org/) so different from forms of consensual sharing of bits of data between individuals or research teams. Data that is made digitally accessible and available free of additional monetary charge may be used not only for its original scientific purpose but may also be (re-)used without the explicit consent of its generator(s). What at first sight seems to be a quite straightforward principle that fosters equality and transparency in research turns out to be much more complex as soon as we leave the idealized realm of definitions and turn our attention towards actual practices of sharing and re-using research data.
11 operating principles
Back in 2004, Arzberger et al. formulated 11 operating principles characterizing Open Research Data (Arzberger, Schroeder, Beaulieu, Bowker, Casey, Laaksonen, et al., 2004), which I would like to briefly take up and comment on here.
1. Openness
As already stated above, data resulting from publicly funded research should be accessible free of cost as a resource that (ideally) can be fully downloaded from the WWW. Being free of cost of course assumes that potential (re-)users have equal access to ICT, which has to be considered the only remaining, but still all too often limiting, financial gatekeeper.
Data, just like any kind of content on the Web, needs to be stored in such a way that it is findable and, as a second prerequisite, needs to be actively promoted in order to be found. Hence we have to assume that those researchers or institutions that are most prolific in promoting themselves and strategically disseminating the data they produce will be the most visible and the most (re-)used. It will thus be interesting to observe in future, on a larger scale, whether some channels of dissemination are more promising in terms of rates of reuse, or whether reuse correlates with the institutional and/or individual reputation of the original data generators.
3. Assignment and assumption of formal responsibility
As in any domain of our Western social coexistence, Open Science sharing practices rely strongly on formal roles and responsibilities, defined by and through the processes of academic and non-academic knowledge production and data generation, in order to be functional.
4. Professionalism
Best practice models and codes of conduct for sharing and (re-)using research data still have to be established internationally and nationally. This can be observed very well in the multitude of (mainly national) Open Science initiatives aiming at formalizing Open Research Data practices that have arisen over the past years.
5. Interoperability
In order to make data accessible, technical and software protocols need to be established and made transparent. This is one of the biggest problems within the Open Research Data movement, as research data often needs to be run on proprietary software or is of such a scale that it requires specific large-scale ICT facilities (cf. CERN’s practice of disseminating research data, which is at the same time an excellent and a somewhat perverse example if we look at the first ever web page, published to the WWW by Tim Berners-Lee).
6. Quality
Quality of data is perhaps the most difficult characteristic to pinpoint. How can we qualify data as being of higher or lower quality beyond verifying formal criteria for the description of (meta-)data and protocol standards? In my very personal view, the quality of data is at least as hard to grasp and determine as the aesthetic qualities of data might be. The question is: do we need to define what ultimately good data is, or will it be sufficient to assess whether data collection processes have been appropriate for the contexts in which the data has been used or is projected to be used? If these types of questions are able to tackle some issues of reproducibility and verification/falsification within research practices, what then is the situation for the re-use of data? The issues I see here are differing data standards across disciplines and research fields, leading to questions such as: how can misinformed (re-)uses of data across disciplines be prevented, and who or what is to blame for misuse: data generators, poor description and documentation of data standards, investigators re-using original research data, or the quality of the data?
7. Operational Efficiency
Operational efficiency is probably the characteristic that is most prominently positioned in many contemporary discussions revolving around Open Data, i.e. the underlying rationale is quite straightforwardly the optimization of returns (in terms of data, research outcomes, etc.) per unit of capital expenditure. Hence, it is less connected to research practices than to economics and the (re-)monetization of science funding. This is not objectionable per se, but if we consider the rhetoric of enhancing democracy and equality in science and society through open data initiatives and worldwide free-of-cost access to research outputs and data (e.g. Fecher & Friesike, 2014), it leaves a bitter aftertaste.
8. Flexibility
The notion of flexibility, as introduced by Arzberger et al., does not refer to flexibility of data definitions or mixed usage of data across research fields, as we might assume, but reflects much more down-to-earth issues of data management and data access regimes. For a few decades now, ICT has enabled quick and relatively smooth exchange of bits of information between geographically separated researchers (and people in general). Thus, we have to assume that researchers will be much more likely to share data and collaborate with peers from their specific research fields than with peers from their own institution or country. Hence, the notion of flexibility refers here to flexible modes and rules of electronic data exchange that are not limited by national or institutional barriers, which stands in striking tension with the argument made by Arzberger et al. that national governments should take the lead in promoting Open Data initiatives.
9. Property
To whom should research data generated in publicly funded research activities belong? Is it the individual researchers, the research institutions involved, the public, or even the individuals under investigation in the case of medical trials and data gathered in psychology, the social sciences or the humanities? If we take Open Science principles seriously, it obviously has to be the public. But how, then, can we share collaborative ownership? Do the investigators concerned with gathering the data have a relatively higher share of ownership in it, or should those taxpayers who pay the most hold the largest shares? These questions may seem relatively blunt when detached from issues of legality, accountability and distribution of reputation related to the production and (re-)distribution of research data, but once put into these contexts they gain considerably in relevance and importance.
10. Legality
Just as is the case for scholarly publishing, questions of legality arise once research data is intended to be published and re-used. These comprise not only issues of intellectual property and rights of re-use of published data but also ethical issues and privacy rights with regard to data on individuals or groups (e.g. medical trials, narrative and survey data gathered for social science purposes), as well as concerns of national security or the protection of trade secrets. Legal restrictions related to these issues should certainly be taken seriously by researchers, even if they may keep academics from making their research data openly available.
11. Accountability
As far as I am concerned, the principle stipulated here is heavily intertwined with operational efficiency (see above). In a nutshell, accountability in academia can be broken down to practices of making scientific practices measurable and countable (e.g. Strathern, 2000). Academic activities are thus translated into figures that may be computed against the figures for peers around the world on the one hand, and against expenditures on the other.
How is Data sharing related to reward system(-s) in academia?
In order to introduce some notions and rationales that underlie open scientific cultures, let me introduce the concepts of democracy that Fecher & Friesike (2014, pp. 25–32) describe as incentives for and in Open Science. The main objective of the democratic school of Open Science lies in making research outputs better and more equally available to the publics and to academia itself. This stands in opposition, but not in contradiction, to the public school of Open Science as delineated by Fecher and Friesike (2014, pp. 19–25), which is concerned with opening research processes to the publics rather than with presenting scientific achievements to them. As such, it is not surprising that the major tools of the democratic school of Open Science are located in the domain of scientific communication, documentation and publication, well known and much discussed under the umbrella terms of Open Access (publishing) and Open Data (initiatives).
The guiding concept of openness informing the democratic school of Open Science is a seemingly straightforward, policy-driven process aiming at making scientific outcomes better and more equitably available, through the proxies of making scientific achievements commonly available and usable in the case of academic publishing and re-usable in the case of Open Data initiatives, and it is meant to be understood as enhancing our prior experience of scientific knowledge production. Nevertheless, we have to scrutinize the rationales and motivations that actually underlie this concept of openness in contemporary academia in order to gain a more balanced and detailed picture. According to Fecher and Friesike (2014, pp. 29–32), Open Access publishing, i.e. persistent equal access to scientific output, can be regarded as a driver for development in science, especially in those areas of the world that we, in our Western view of the realms of contemporary academia, deem to be underdeveloped.
Moreover, many commentators on Open Science promote Open Access as a convenient antagonist to classic subscription-based publishing models. We then have to turn our attention towards Open Data initiatives, which according to Fecher and Friesike (2014, pp. 26–29; see also Murray-Rust, 2008) are guided by a research(er)-driven rationale, meaning that access or non-access to research data, which may be (re-)used in research contexts different from those they were initially designed for, lies in the interest of the researcher(s) concerned rather than in the interest of publishing houses or commercial and non-commercial data providers. Even if we may deem the opportunities that the sharing of research data, along with providing access to research outcomes, may offer for and in research practices to be limited, especially in the SSH (Fecher & Friesike, 2014, p. 26; Fink, 2000), we have to agree with Nowotny, Scott and Gibbons (2003) that democratic rationales for and in Open Science are tightly bound to actual research practices and intertwined with rationales for propagating Open Science, e.g. metrics and infrastructural debates.
So what are the incentives to actively share original research data? Altruism, Mertonian ethos of science, reproducibility of results, academic reputation?
Whereas practices in the making of science have been changing, and continue to change, in favor of more open practices, this is far less the case for the regimes of academic qualification and reward distribution. Hence we have to scrutinize thoroughly how Open Science and the sharing of research data are connected to modes of redistribution of academic qualification, status and prestige.
In order to enable an active, collaborative discussion of issues related to the attribution of individual credit in regimes of academic reputation vs. open practices in science, let me propose a game to you.
Open Methods / Open Lab Notebooks
It should be self-evident that when speaking of Open Research Data we should not limit ourselves to publishing final and processed data, but that the actual generation of data and the documentation of research protocols should also be publicly accessible for reuse. The best-known and most widely adopted strategy for doing so is Open Notebook Science.
More to come…
Literature
Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., & Laaksonen, L. (2004). Promoting Access to Public Research Data for Scientific, Economic, and Social Development. Data Science Journal, 3, 135–152. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.9016&rep=rep1&type=pdf
Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., … Wouters, P. (2004). Science and government. An international framework to promote access to data. Science, 303(5665), 1777–1778. doi:10.1126/science.1095958
Fecher, B., & Friesike, S. (2014). Open Science. One term, five schools of thought. In Opening Science. The Evolving Guide on How the Web is Changing Research, Collaboration and Scholarly Publishing (pp. 213–224). doi:10.1007/978-3-319-00026-8
Fecher, B., Friesike, S., Hebing, M., Linek, S., & Sauermann, A. (2015). A Reputation Economy: Results from an Empirical Survey on Academic Data Sharing. SSRN Electronic Journal. doi:10.2139/ssrn.2568693
Murray-Rust, P. (2008). Open Data in Science. Nature Precedings.
Nowotny, H., Scott, P., & Gibbons, M. (2003). Introduction: ‘Mode 2’ Revisited: The New Production of Knowledge. Minerva, 41(3), 179–194. doi:10.1023/A:1025505528250
Strathern, M. (2003). Audit Cultures: Anthropological Studies in Accountability, Ethics and the Academy. Routledge.
Was this classification of issues taken up by the OECD?
I am not too sure what you mean by classification of issues?
I am not sure, but the commenter might refer to this document: http://www.oecd.org/sti/sci-tech/oecdprinciplesandguidelinesforaccesstoresearchdatafrompublicfunding.htm
Yes, I think it was a sort of preparatory work for the OECD's principles and guidelines for data sharing from the same year (cited above).
‘publicly funded research data should be openly available to the maximum extent possible’ That is the point that matters, and if taken seriously it could be a good start in creating a new way of looking at research data as common achievements.
Steve, you might also want to use parts of your assignments for this blog post, e.g. from the first assignment on the schools of thought...
Good points, Steve. For me the reward structure is the biggest issue to be dealt with. Humans can respond very powerfully to incentives, and I think some degree of recognition will be vital. Most people don't only work for the greater good, after all.
Exactly. If you don't get credit, why invest your time and resources? Producing another citable publication definitely pays off much more.
Agreed! Thus we will have to scrutinize in future whether the ways in which credit is attributed in science still suit the ways in which we envision the science of the future to be shaped. In my estimation, there is a vital gap between Open Science policies and the ways reputation is self-reflexively redistributed in contemporary academia.
Another point is that in academic "reputation regimes", novelty and originality are encouraged and valued much more than ensuring or proving the reproducibility of claimed results. This might also be the reason why initiatives like the Reproducibility Project are mostly organized as large collaborations in which tasks are broken down into small chunks, so that it is feasible for people to contribute their time on a voluntary basis.
The 11 principles give a very good overview of what is important. It would be interesting to pick out the Open Notebook topic and start a discussion on that. Are there some really good examples where the openness of lab notebooks was the starting point for further research? It would be good to hear more about the experiences and results of this approach.
Thank you, Walter. I will look up some interesting examples and create a new post, where we can hopefully discuss how Open Notebook Science could be a game changer.
Okay, I will soon post some reflections on the reward structure and on issues related to the incompatibility between the ways in which reputation is (re-)distributed and collaborative/open science practices.
Looking forward to that! KM
Thanks for the neatly arranged overview of the 11 principles of Arzberger et al. (2004).
I want to share here that there is one issue that always comes up when I think about open data. Unfortunately, this issue is not addressed in the 11 operating principles of Arzberger et al.
I do think that there could be a lot of problems when openly shared data is re-used. In my opinion, the re-usability of open data is strongly connected to the way the data is edited and how neatly the context of its production is described. I admittedly do not have a lot of experience of what open data sets look like and therefore do not know to what extent this kind of information is given, but I strongly believe that certain background information about data is absolutely necessary for its re-use. For that reason I wonder why this issue is not mentioned in Arzberger et al.'s operating principles. In my opinion, comprehensive background information for a data set needs to be considered in operating principles for open data.
Yes Bernhard I completely agree and I really should have emphasised this when reflecting notions of quality of research data. I really think that notions of quality can't be applied to data but only to its descriptors and descriptions, which should indicate what purposes the data was generated for and what was the setting in which it was produced.