OpenCitations Meta: Conclusion, Acknowledgements, and References

3 Jun 2024


(1) Arcangelo Massari, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;

(2) Fabio Mariani, Institute of Philosophy and Sciences of Art, Leuphana University, Lüneburg, Germany;

(3) Ivan Heibi, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;

(4) Silvio Peroni, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy;

(5) David Shotton, Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom.

6. Conclusion

This article detailed the methodology used to develop OpenCitations Meta, a database that stores and delivers bibliographic metadata for all publications involved in the OpenCitations Indexes. This process involves two main phases: (1) an automatic curation analysis aimed at deduplicating entities, correcting errors and enriching information, and (2) a data conversion to RDF, while keeping track of changes and provenance in RDF.

Information about new publications is continuously being added to Crossref, DataCite, and PubMed, and we will develop procedures to ingest these new metadata into OpenCitations Meta in a regular and timely manner. Furthermore, work is already underway to ingest bibliographic metadata from the Japan Link Center and the OpenAIRE Research Graph, and other sources will be included as our human and computational resources permit. OpenCitations Meta will thus continue to grow.

OpenCitations Meta has three major benefits. First, the use of OMIDs (OpenCitations Meta Identifiers) for all stored entities enables OpenCitations Meta to act as a mapping hub for publications that may have more than one external persistent identifier (PID), for example a journal article described in Crossref with a DOI (Digital Object Identifier) and the same publication described in PubMed with a PMID (PubMed Identifier), while also making it possible to characterise citations involving resources lacking any external PIDs. Consequently, the second benefit is that OpenCitations Meta allows citations in the OpenCitations Indexes to be described as OMID-to-OMID, disambiguating citations between documents with different identifier schemes, e.g. represented as DOI-to-DOI on Crossref and PMID-to-PMID on PubMed. Third, OpenCitations Meta speeds up search operations to retrieve metadata on publications involved in the citations stored in the OpenCitations Indexes, since these metadata are now kept in-house, rather than being retrieved by on-the-fly API calls to external resources.
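The mapping-hub idea can be illustrated with a minimal Python sketch. It is not the OpenCitations implementation: the `MappingHub` class, its method names, the `omid:<n>` placeholder syntax, and the example DOIs and PMIDs are all assumptions made for illustration. It only shows how one internal identifier per entity lets a DOI-to-DOI citation and a PMID-to-PMID citation collapse to the same pair.

```python
from itertools import count


class MappingHub:
    """Toy in-memory hub mapping external PIDs to one internal identifier."""

    def __init__(self) -> None:
        self._counter = count(1)
        self._pid_to_omid: dict[str, str] = {}

    def register(self, *pids: str) -> str:
        """Return the internal ID shared by `pids`, minting one if none is known.

        Calling it with no PIDs mints an ID for a resource that has no
        external identifier at all.
        """
        known = {self._pid_to_omid[p] for p in pids if p in self._pid_to_omid}
        if len(known) > 1:
            # Merging two already-registered entities is out of scope here.
            raise ValueError(f"PIDs belong to different entities: {sorted(known)}")
        # Placeholder ID syntax (an assumption), not the real OMID format.
        omid = known.pop() if known else f"omid:{next(self._counter)}"
        for pid in pids:
            self._pid_to_omid[pid] = omid
        return omid

    def resolve_citation(self, citing_pid: str, cited_pid: str) -> tuple[str, str]:
        """Rewrite a PID-to-PID citation as a pair of internal IDs."""
        return self._pid_to_omid[citing_pid], self._pid_to_omid[cited_pid]


hub = MappingHub()
# The same two articles, each known under a DOI and a PMID (invented values).
hub.register("doi:10.1000/example.a", "pmid:111")
hub.register("doi:10.1000/example.b", "pmid:222")

# One citation, recorded DOI-to-DOI by one source and PMID-to-PMID by another.
from_dois = hub.resolve_citation("doi:10.1000/example.a", "doi:10.1000/example.b")
from_pmids = hub.resolve_citation("pmid:111", "pmid:222")
print(from_dois == from_pmids)  # True
```

Because both lookups return the same pair, the two source records can be recognised as a single citation, and the lookup itself is a local dictionary access rather than a call to an external API.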

Future challenges will be to develop a disambiguation system for people lacking an ORCID identifier, to improve the quality of the existing metadata, to enhance search operations and storage efficiency, and to add metadata fields for abstracts, funder IDs, funding information, and institutional identifiers, populating them where these metadata are available from our sources.

Finally, an interface will be implemented and made available to trusted domain experts to permit direct real-time manual curation of metadata held by OpenCitations Meta. Such a system will track changes and provenance, will preserve the delta between different versions of each entity, and will retain information such as the agent responsible for the change, the primary source, and the date. In this way, we will strive to make OpenCitations Meta not only comprehensive but also an accurate and fully open and reusable source of bibliographic metadata to which members of the scholarly community can directly contribute.
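As a rough illustration of what such change tracking involves, the following Python sketch records, for every edit, the delta together with the responsible agent, the primary source, and the date. It is an assumption-laden toy: `Snapshot`, `TrackedEntity`, and `state_before` are hypothetical names, and plain dictionaries stand in for the RDF data that OpenCitations Meta actually stores.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One recorded change to an entity."""
    agent: str           # who was responsible for the change
    source: str          # primary source of the new values
    timestamp: datetime  # when the change was made
    removed: dict        # field -> value that was overwritten
    added: dict          # field -> value that was written


@dataclass
class TrackedEntity:
    data: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, changes: dict, agent: str, source: str,
               when: Optional[datetime] = None) -> None:
        """Apply `changes` and store only the delta, not a full copy."""
        added = {k: v for k, v in changes.items()
                 if k not in self.data or self.data[k] != v}
        if not added:
            return  # nothing actually changed, so no snapshot is created
        removed = {k: self.data[k] for k in added if k in self.data}
        self.data.update(added)
        self.history.append(Snapshot(
            agent, source, when or datetime.now(timezone.utc), removed, added))

    def state_before(self, n_back: int) -> dict:
        """Rebuild the entity as it was `n_back` snapshots ago."""
        if not 0 <= n_back <= len(self.history):
            raise ValueError("n_back exceeds the recorded history")
        state = dict(self.data)
        # Undo the most recent deltas first.
        for snap in reversed(self.history[len(self.history) - n_back:]):
            for key in snap.added:
                state.pop(key, None)
            state.update(snap.removed)
        return state


entity = TrackedEntity()
entity.update({"title": "Open Citatons", "year": "2020"},
              agent="importer", source="Crossref")
entity.update({"title": "Open Citations"},
              agent="curator", source="manual edit")
print(entity.state_before(1)["title"])  # Open Citatons
```

Storing deltas rather than full copies keeps the history compact while still allowing any earlier version to be reconstructed by undoing snapshots in reverse order.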

7. Acknowledgements

This work has been partially funded by the European Union’s Horizon 2020 Research and Innovation Program under grant agreement No 101017452 (OpenAIRE-Nexus Project).

References


Abramatic, J.-F., Di Cosmo, R., & Zacchiroli, S. (2018). Building the universal archive of source code. Communications of the ACM, 61 (10), 29–31.

Atzori, C., Bardi, A., Manghi, P., & Mannocci, A. (2017). The OpenAIRE Workflows for Data Management [Series Title: Communications in Computer and Information Science]. In C. Grana & L. Baraldi (Eds.), Digital Libraries and Archives (pp. 95–107). Springer International Publishing.

Auer, S., Oelen, A., Haris, M., Stocker, M., D’Souza, J., Farfar, K. E., Vogt, L., Prinz, M., Wiens, V., & Jaradeh, M. Y. (2020). Improving Access to Scientific Literature with Knowledge Graphs. Bibliothek Forschung und Praxis, 44 (3), 516–529.

DCMI Usage Board. (2020). DCMI Metadata Terms. Retrieved July 16, 2021, from 20/

Brase, J. (2009). DataCite - A Global Registration Agency for Research Data. 2009 Fourth International Conference on Cooperation and Promotion of Information Resources in Science and Technology, 257–261. https: //

Brase, J. (2010). Datacite - A Global Registration Agency for Research Data. SSRN Electronic Journal.

Carroll, J. J., Bizer, C., Hayes, P., & Stickler, P. (2005). Named graphs, provenance and trust. Proceedings of the 14th international conference on World Wide Web - WWW ’05, 613. 1060835

Daquino, M., & Peroni, S. (2019). OCO, the OpenCitations Ontology. Retrieved September 4, 2021, from

Daquino, M., Peroni, S., & Shotton, D. (2020). The OpenCitations Data Model [Artwork Size: 836876 Bytes Publisher: figshare], 836876 Bytes. https: //

Dhakal, K. (2019). Unpaywall. Journal of the Medical Library Association, 107 (2).

European Commission. Directorate General for Research and Innovation. (2016). Realising the European open science cloud: First report and recommendations of the Commission high level expert group on the European open science cloud. Publications Office. Retrieved October 17, 2022, from

Falco, R., Gangemi, A., Peroni, S., Shotton, D., & Vitali, F. (2014). Modelling OWL Ontologies with Graffoo [Series Title: Lecture Notes in Computer Science]. In V. Presutti, E. Blomqvist, R. Troncy, H. Sack, I. Papadakis, & A. Tordai (Eds.), The Semantic Web: ESWC 2014 Satellite Events (pp. 320–325). Springer International Publishing. 1007/978-3-319-11955-7_42

Fricke, S. (2018). Semantic Scholar. Journal of the Medical Library Association, 106 (1).

Garcia, A., Lopez, F., Garcia, L., Giraldo, O., Bucheli, V., & Dumontier, M. (2018). Biotea: Semantics for Pubmed Central. PeerJ, 6, e4201. https: //

Gentile, A. L., & Nuzzolese, A. G. (2015). cLODg-Conference Linked Open Data Generator. ISWC (Posters & Demos).

Gil, Y., Cheney, J., Groth, P., Hartig, O., Miles, S., Moreau, L., & Silva, P. (2010). Provenance XG Final Report [Type: W3C.]. http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/

Gorraiz, J., Melero-Fuentes, D., Gumpenberger, C., & Valderrama-Zurián, J.-C. (2016). Availability of digital object identifiers (DOIs) in Web of Science and Scopus. Journal of Informetrics, 10 (1), 98–109. 10.1016/j.joi.2015.11.008

Haak, L. L., Fenner, M., Paglione, L., Pentz, E., & Ratner, H. (2012). ORCID: A system to uniquely identify researchers. Learned Publishing, 25 (4), 259–264.

Hammond, T., Pasin, M., & Theodoridis, E. (2017). Data integration and disintegration: Managing Springer Nature SciGraph with SHACL and OWL. ISWC (Posters, Demos & Industry Tracks).

Hara, M. (2020). Introduction of Japan Link Center (JaLC) [Artwork Size: 2213661 Bytes Publisher: ORCID], 2213661 Bytes. 23640/07243.12469094.V1

Heibi, I., Peroni, S., & Shotton, D. (2019a). Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal [arXiv: 1902.02534]. arXiv:1902.02534 [cs]. Retrieved September 15, 2021, from

Heibi, I., Peroni, S., & Shotton, D. (2019b). Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. Scientometrics, 121 (2), 1213–1228.

Hendricks, G., Tkaczyk, D., Lin, J., & Feeney, P. (2020). Crossref: The sustainable source of community-owned scholarly metadata. Quantitative Science Studies, 1 (1), 414–427.

ICite, Hutchins, B. I., & Santangelo, G. (2022). iCite Database Snapshots (NIH Open Citation Collection) [Publisher: The NIH Figshare Archive]. https: //

Koivunen, M.-R., & Miller, E. (2001). Semantic Web Activity [Edition: W3C Volume: 11 02].

Lammey, R. (2020). Solutions for identification problems: A look at the Research Organization Registry. Science Editing, 7 (1), 65–69. 10.6087/kcse.192

Lebo, T., Sahoo, S., & McGuinness, D. (2013). PROV-O: The PROV Ontology [Place: PROV-O Volume: 04 30]. Retrieved July 16, 2021, from http: //

Maloney, C., Sequeira, E., Kelly, C., Orris, R., & Beck, J. (2013). PubMed Central. In The NCBI Handbook.

Manghi, P., Manola, N., Horstmann, W., & Peters, D. (2010). An Infrastructure for Managing EC Funded Research Output: The OpenAIRE Project. Grey Journal (TGJ), 6 (1).

Massari, A., & Heibi, I. (2022). How to structure citations data and bibliographic metadata in the OpenCitations accepted format. Proceedings of the Workshop on Understanding LIterature references in academic full TExt, 3220.

Massari, A., & Peroni, S. (2022). Performing live time-traversal queries via SPARQL on RDF datasets [Publisher: arXiv Version Number: 2]. https: //

Mora-Cantallops, M., Sánchez-Alonso, S., & García-Barriocanal, E. (2019). A systematic literature review on Wikidata. Data Technologies and Applications, 53 (3), 250–268.

Morrison, H. (2017). Directory of Open Access Journals (DOAJ). The Charleston Advisor, 18 (3), 25–28.

Nielsen, F. Å., Mietchen, D., & Willighagen, E. L. (2017). Scholia, Scientometrics and Wikidata. In E. Blomqvist, K. Hose, H. Paulheim, A. Lawrynowicz, F. Ciravegna, & O. Hartig (Eds.), The Semantic Web: ESWC 2017 Satellite Events - ESWC 2017 Satellite Events, Portorož, Slovenia, May 28 - June 1, 2017, Revised Selected Papers (pp. 237–259). Springer.

Nuzzolese, A. G., Gentile, A. L., Presutti, V., & Gangemi, A. (2016). Semantic web conference ontology-a refactoring solution. European semantic web conference, 84–87.

OpenCitations. (2022). COCI CSV dataset of all the citation data. https://doi.org/10.6084/M9.FIGSHARE.6741422.V18

OpenCitations. (2023a). OpenCitations Meta CSV dataset of all bibliographic metadata.

OpenCitations. (2023b). OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information. FIGSHARE.21747536.V3

Pelgrin, O., Galárraga, L., & Hose, K. (2021). Towards fully-fledged archiving for RDF datasets (A.-C. Ngonga Ngomo, M. Saleem, R. Verborgh, M. I. Ali, & O. Hartig, Eds.). Semantic Web Journal, 12 (6), 903–925.

Peroni, S., & Shotton, D. (2018). Open Citation: Definition [Artwork Size: 95436 Bytes Publisher: figshare], 95436 Bytes. FIGSHARE.6683855.V1

Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1 (1), 428–444.

Peroni, S., Shotton, D., & Vitali, F. (2012). Scholarly publishing and linked data: Describing roles, statuses, temporal and contextual extents. Proceedings of the 8th International Conference on Semantic Systems - I-SEMANTICS ’12, 9.

Persiani, S., Daquino, M., & Peroni, S. (2022). A Programming Interface for Creating Data According to the SPAR Ontologies and the OpenCitations Data Model [Series Title: Lecture Notes in Computer Science]. In P. Groth, M.-E. Vidal, F. Suchanek, P. Szekley, P. Kapanipathi, C. Pesquita, H. Skaf-Molli, & M. Tamper (Eds.), The Semantic Web (pp. 305–322). Springer International Publishing. 1007/978-3-031-06981-9_18

Pranckutė, R. (2021). Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World. Publications, 9 (1), 12.

Priem, J., Piwowar, H. A., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts [arXiv: 2205.01833]. CoRR, abs/2205.01833. 2205.01833

European Organization For Nuclear Research, & OpenAIRE. (2013). Zenodo: Research. Shared. [Publisher: CERN].

Sigurdsson, S. (2020). The future of arXiv and knowledge discovery in open science. Proceedings of the First Workshop on Scholarly Document Processing, 7–9.

Sikos, L. F., & Philp, D. (2020). Provenance-Aware Knowledge Representation: A Survey of Data Models and Contextualized Knowledge Graphs. Data Science and Engineering, 5 (3), 293–316. https://doi.org/10.1007/s41019-020-00118-0

Subramanian, S., King, D., Downey, D., & Feldman, S. (2021). S2AND: A Benchmark and Evaluation System for Author Name Disambiguation. 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 170–179.

Tanon, T. P., Vrandecic, D., Schaffert, S., Steiner, T., & Pintscher, L. (2016). From Freebase to Wikidata: The Great Migration. In J. Bourdeau, J. Hendler, R. Nkambou, I. Horrocks, & B. Y. Zhao (Eds.), Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11 - 15, 2016 (pp. 1419–1428). ACM. https: //

The Europe PMC Consortium. (2015). Europe PMC: A full-text literature database for the life sciences and platform for innovation. Nucleic Acids Research, 43 (D1), D1042–D1048.

Tillett, B. (2005). What is FRBR? A conceptual model for the bibliographic universe. The Australian Library Journal, 54 (1), 24–30. https://doi.org/10.1080/00049670.2005.10721710

Vision, T. (2010). The Dryad Digital Repository: Published evolutionary data as part of the greater data ecosystem. Nature Precedings. https://doi.org/10.1038/npre.2010.4595.1

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., . . . Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3 (1), 160018. 1038/sdata.2016.18

Wolf, M., & Wicksteed, C. (1997). Date and Time Formats. Retrieved May 9, 2022, from

Zhang, Z., Nuzzolese, A. G., & Gentile, A. L. (2017). Entity Deduplication on ScholarlyData [Series Title: Lecture Notes in Computer Science]. In E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler, & O. Hartig (Eds.), The Semantic Web (pp. 85–100). Springer International Publishing.

This paper is available on arXiv under a CC 4.0 DEED license.