Conference article

A PID is a promise: Versioning with persistent identifiers

Martin Matthiesen
CSC – IT Center for Science, Espoo, Finland

Ute Dieckmann
University of Helsinki, Helsinki, Finland

Published in: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:11, p. 103-112

Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present the process of updating a dataset that is referenced by persistent identifiers (PIDs). The dataset is available in two variants: as a download and via an online web interface. During the update, we had to fundamentally rethink how we wanted to use PIDs and version numbering. We also reflect on how to assign PIDs effectively when a large dataset undergoes only minor changes. We discuss the roles of different types of PIDs, of metadata, and of access locations.

Keywords

Data Curation, Persistent Identifiers, Landing Pages, Metadata, Citation, PID Granularity
