diff --git a/_posts/openalex_document_types/openalex_document_types_2024.Rmd b/_posts/openalex_document_types/openalex_document_types_2024.Rmd
index 0a497cd..9a65bbd 100644
--- a/_posts/openalex_document_types/openalex_document_types_2024.Rmd
+++ b/_posts/openalex_document_types/openalex_document_types_2024.Rmd
@@ -17,6 +17,7 @@ author:
date: "`r Sys.Date()`"
output: distill::distill_article
bibliography: literature.bib
+preview: distill-preview.png
css: vis.css
---
diff --git a/_posts/openalex_document_types/openalex_document_types_2024.html b/_posts/openalex_document_types/openalex_document_types_2024.html
index 8e35ade..f871fef 100644
--- a/_posts/openalex_document_types/openalex_document_types_2024.html
+++ b/_posts/openalex_document_types/openalex_document_types_2024.html
@@ -140,7 +140,7 @@
@@ -5612,8 +5612,8 @@
Our updated analysis demonstrated a noticable improvement of the classification of document types in OpenAlex when comparing it to Scopus and Web of Science.
diff --git a/docs/index.html b/docs/index.html
index 8565b8a..0c64c56 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -2174,137 +2174,137 @@
Our updated analysis demonstrated a noticable improvement of the classification of document types in OpenAlex when comparing it to Scopus and Web of Science.
diff --git a/docs/posts/posts.json b/docs/posts/posts.json
index c1a331c..b98efcc 100644
--- a/docs/posts/posts.json
+++ b/docs/posts/posts.json
@@ -20,9 +20,11 @@
"date": "2024-09-04",
"categories": [],
"contents": "\n\n\n\n\n\n\n\n\n\nIn June 2024, we submitted an analysis of publication and document types in OpenAlex in comparison with the proprietary databases Web of Science and Scopus and the open data sources Semantic Scholar and PubMed (Haupka et al. 2024).\nWe found substantial differences between these databases: While Web of Science and Scopus provided a comprehensive set of document types to describe works published in journals, OpenAlex supported only a comparably limited number of types.\nNotably, OpenAlex lacked a distinction between research articles and reviews, which can be crucial when calculating citation indicators.\nIn line with related studies (Alperin et al. 2024), we also observed discrepancies in the number of publications when restricting to certain document types.\nMeanwhile, in late May and late July 2024, OpenAlex introduced extended approaches to obtain publication and document types.\nAmong the four new categories were preprints and reviews. Using PubMed, OpenAlex identified approximately 4 million journal articles as editorials, erratum, letters, preprints, reviews, or retractions.\nOf course, we wanted to know how these improvements affect our findings.\nWe therefore re-applied our approach to the recent changes.\nUsing works published in journals between 2012 and 2022, we demonstrate that OpenAlex’s recent changes provide a more nuanced set of document types to refine scholarly works.\nHowever, the comparison with Web of Science and Scopus reveals that there remain considerable differences.\nData and Methods\nFollowing our preprint, we performed a pairwise comparison of journal publications indexed in OpenAlex with the Web of Science and Scopus published 2012 to 2022.\nTo investigate changes made in OpenAlex, we furthermore compared data from the OpenAlex July 2024 and August 2023 snapshots.\nScopus and Web of Science data were retrieved from the German Competence Network of Bibliometrics, using the April 2024 snapshots.\nWeb of 
Science data retrieval comprised the Core Collection.\nWe matched items between the databases by DOI after normalisation to lowercase.\nOverall, the intersection of OpenAlex and Scopus covered 24,704,172 and the intersection of OpenAlex and Web of Science covered 21,775,771 records.\nThen, we categorised works based on their document type information into two categories: research discourse and editorial discourse.\nThe research discourse category now also includes publications of type “preprint”, which was added to OpenAlex in May 2024.\nThe mapping tables used for reclassifying the document types can be found in the appendix of Haupka et al. (2024).\nFindings\nFigure 1 illustrates OpenAlex document type changes in comparison with Scopus.\nBefore the introduction of the more nuanced set of document types, OpenAlex tagged\n24,559,634 items (99.42%) as articles, which reduced to 22,132,347 (89.59%).\nScopus tagged 20,777,473 items (84.11%) as article.\nOpenAlex assigned the type review to 1,511,172 items (6.12%), whereas Scopus to 1,776,555 items (7.19%).\n\n\n\n\nFigure 1: Comparison of OpenAlex and Scopus for publication years 2012-2022\n\n\n\nFigure 2 illustrates the same for the comparison of OpenAlex with Web of Science.\nHere, OpenAlex tagged 21,673,833 items (99.53%) as articles before the introduction of the more nuanced set of document types and 19,500,710 (89.55%) after.\nIn Web of Science 17,266,997 items (79.29%) were tagged as articles.\nThe document type review is assigned to 1,362,290 items (6.26%) by OpenAlex, whereas Web of Science tagged 1,242,472 items (5.71%) as such.\n\n\n\n\nFigure 2: Comparison of OpenAlex and Web of Science for publication years 2012-2022\n\n\n\nOverall, Figures 1 and 2 demonstrate that even after the introduction of a more nuanced set of document types, OpenAlex still tags a higher proportion of items as articles than the commercial data sources.\nThe difference between the proportions of items tagged as articles is, however, 
slightly more pronounced in the comparison of OpenAlex with Web of Science.\nScopus tags a higher proportion of items as reviews and both Scopus and Web of Science still tag more items as editorial content than OpenAlex.\nIn sum, 340,998 (Scopus) and 656,366 (Web of Science) items are tagged as editorial/editorial material or letters in Scopus and Web of Science, respectively, while tagged as articles in OpenAlex.\nWhen grouping the document types into the two categories research discourse and editorial discourse, we found that even after the introduction of a more nuanced set of document types in OpenAlex, the proportion of items labelled as editorial discourse is still about 3% lower compared to Scopus and Web of Science, as shown in the tables below.\n\n\n\n\n\n\n\n\n\n\n\n\nDiscussion and Outlook\nOur updated analysis demonstrated a noticable improvement of the classification of document types in OpenAlex when comparing it to Scopus and Web of Science.\nCompared to data from 2023, the discrepancy in the classification of items has decreased slightly.\nThis indicates a convergence of the classification system in OpenAlex towards those from proprietary databases, with an enhanced coverage of reviews and editorial materials.\nIn addition, the rule-based string matching for recognising paratexts introduced and revised by OpenAlex resulted in more texts being categorised as editorial material than before.\nHowever, the results also show that the curation of document types has not yet been finalised.\nConclusively, we would like to point out that there is no correct classification system per se.\nRather different classification systems applied by the database operators can bring advantages and disadvantages.\nIn Semantic Scholar and PubMed, for example, publications are labelled as clinical studies and case reports, which in Scopus, Web of Science and OpenAlex are predominantly assigned to the document type article.\nA differentiation of these publications has the 
potential to increase the quality of bibliometric surveys in the analysed databases.\nAlso, the results from this analysis are only partially comparable with the results from our preprint, as in the preprint we worked with a more restrictive set that included publications from Semantic Scholar and PubMed.\nFunding\nThis work is funded by the Bundesministerium für Bildung und Forschung (BMBF) project KBOPENBIB (16WIK2301E). We acknowledge the support of the German Competence Center for Bibliometrics.\n\n\n\nAlperin, Juan Pablo, Jason Portenoy, Kyle Demes, Vincent Larivière, and Stefanie Haustein. 2024. “An Analysis of the Suitability of OpenAlex for Bibliometric Analyses.” arXiv. https://doi.org/10.48550/arXiv.2404.17663.\n\n\nHaupka, Nick, Jack H. Culbert, Alexander Schniedermann, Najko Jahn, and Philipp Mayr. 2024. “Analysis of the Publication and Document Types in OpenAlex, Web of Science, Scopus, PubMed and Semantic Scholar.” https://arxiv.org/abs/2406.15154.\n\n\n\n\n",
- "preview": {},
- "last_modified": "2024-09-04T15:23:41+02:00",
- "input_file": {}
+ "preview": "posts/openalex_document_types/distill-preview.png",
+ "last_modified": "2024-09-04T15:29:30+02:00",
+ "input_file": "openalex_document_types_2024.knit.md",
+ "preview_width": 1416,
+ "preview_height": 1250
},
{
"path": "posts/oalex_oa_status/",
diff --git a/docs/search.json b/docs/search.json
index b1ce80d..9e9a672 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -5,21 +5,21 @@
"title": "About us",
"author": [],
"contents": "\n\nContents\nAbout this Blog!\nJournal publications\nTheses\nSoftware\nThird-party funded projects\n\nAbout this Blog!\nWelcome to our Blog! Here, you’ll find insights from our work as Data Analysts in the domain of scholarly communication. With this blog, we want to engage with the broader community about how to support data-driven workflows and decision-making around scholarly communication with R.\nWe are based at the Göttingen State and University Library, one of the largest academic libraries in Germany. We are using R-based tools in our everyday work and contribute to R package developments and training activities. In this blog, you’ll find news and case-studies around:\nOpen Access and Open Science Analytics\nR Packages making use of open databases and helping us in our work\nR Tools for interactive visualizations and dashboard developments\nR-related training and outreach activities\nWe want to thank Maëlle Salmon for encouraging us to start a blog about our work. As a technical framework for the blog, we are using Distill for R Markdown, a new web publishing format optimized for scientific and technical writing.\nDr. Anne Hobert, Nick Haupka, Najko Jahn\nJournal publications\nWe also publish in scholarly journals about our work.\nJahn, N. (2024). How open are hybrid journals included in transformative agreements? https://arxiv.org/abs/2402.18255\nCulbert, J., Hobert, A., Jahn, N., Haupka, N., Schmidt, M., Donner, P., Mayr, P. (2024). Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus. https://arxiv.org/abs/2401.16359\nTaubert, N., Hobert, A., Jahn, N., Bruns, A., & Iravani, E. (2024). Understanding differences of the OA uptake within the German University landscape (2010–2020): Part 2—repository-provided OA. Scientometrics. https://doi.org/10.1007/s11192-024-05003-5\nFraser, N., Hobert, A., Jahn, N., Mayr, P., & Peters, I. (2023). 
No deal: German researchers’ publishing and citing behaviors after Big Deal negotiations with Elsevier.\nQuantitative Science Studies, 4(2), 325–352. https://doi.org/10.1162/qss_a_00255\nTaubert, N., Hobert, A., Jahn, N., Bruns, A., & Iravani, E. (2023). Understanding differences of the OA uptake within the German university landscape (2010–2020): Part 1—journal-based OA. Scientometrics, 128(6), 3601–3625. https://doi.org/10.1007/s11192-023-04716-3\nHaupka, N., Jahn, N., & Hobert, N. (2022). Praxisbericht Big Scholarly Data an der SUB Göttingen. LIBREAS. Library Ideas, 41 (2022). https://libreas.eu/ausgabe41/haupka/\nJahn, N., Matthias, L., & Laakso, M. (2022). Toward transparency of\nhybrid open access through publisher‐provided metadata: An article‐level\nstudy of Elsevier. Journal of the Association for Information Science\nand Technology, 73(1), 104–118. https://doi.org/10.1002/asi.24549\nJahn, N., Held, M., Walter, H., Haupka, N., & Hillenkötter, K. (2022).\nHOAD: Data Analytics für mehr Transparenz bei\nOpen-Access-Transformationsverträgen. ABI Technik, 42(1), 64–69.\nhttps://doi.org/10.1515/abitech-2022-0007\nStisser, A., Jahn, N., & Schmidt, B. (2022). Stand und Perspektiven bibliometriegestützter Open-Access-Services an Universitäten in Deutschland. Bibliothek Forschung und Praxis, 46(2), 275–283. https://doi.org/10.1515/bfp-2021-0098\nHobert, A., Jahn, N., Mayr, P., Schmidt, B., & Taubert, N. (2021). Open\naccess uptake in Germany 2010–2018: adoption in a diverse research\nlandscape. Scientometrics, 126(12), 9751–9777.\nhttps://doi.org/10.1007/s11192-021-04002-0\nLaakso, M., Matthias, L., & Jahn, N. (2021). Open is not forever: A\nstudy of vanished open access journals. Journal of the Association for\nInformation Science and Technology, 72(9), 1099–1112.\nhttps://doi.org/10.1002/asi.24460 (JASIST Best Paper Award 2022. Featured in Nature,\nNature,\nScience, CNN,\nDLF)\nJahn, N., Hobert, A., & Haupka, N. (2021). 
Entwicklung und Typologie des\nDatendiensts Unpaywall. Bibliothek Forschung und Praxis, 45(2),\n293–303. https://doi.org/10.1515/bfp-2020-0115\nMatthias, L., Jahn, N., & Laakso, M. (2019). The Two-Way Street of Open\nAccess Journal Publishing: Flip It and Reverse It. Publications, 7(2),\n23. https://doi.org/10.3390/publications7020023\nTheses\nHaupka, N. (2021). Analyse der Entwicklung des Open Access-Discovery-Services Unpaywall seit 2018 [Bachelor Thesis, Hochschule Hannover]. https://doi.org/10.25968/opus-1899\nSoftware\nR-Packages (selection):\nJahn, N. europepmc: R Interface to the Europe PubMed Central RESTful Web Service. https://CRAN.R-project.org/package=europepmc | https://docs.ropensci.org/europepmc/\nChamberlain, S., Zhu, H., Jahn, N., Boettiger, C., Ram, K. rcrossref: Client for Various ‘CrossRef’ ‘APIs’. https://CRAN.R-project.org/package=rcrossref https://docs.ropensci.org/rcrossref/\nJahn, N (2022). roadoi: Find Free Versions of Scholarly Publications via Unpaywall. https://CRAN.R-project.org/package=roadoi | https://docs.ropensci.org/roadoi/.\nDashboards (selection):\nHybrid Open Access Dashboard (HOAD). 
See our blog post: https://www.coalition-s.org/blog/introducing-the-hybrid-open-access-dashboard-hoad/\nmetacheck: Open Access Metadata Compliance Checker\nOpen Access uptake in Germany 2010-2018: Interactive Supplement\nThird-party funded projects\nBMBF\nKompetenznetzwerk Bibliometrie: Komparative Analyse und Kuratierung Deutscher Metadaten in Offenen Bibliometriedaten, Teilprojekt: Bereitstellung und Analyse Dokumenttypen\nKompetenznetzwerk Bibliometrie, Teilprojekt: Datenergänzung: Open-Access-Nachweise\nindi:oa - Verantwortungsbewusste Bewertung und Qualitätssicherung von Open-Access Publikationen mittels bibliometrischer Indikatoren (concluded)\nOAUNI - Entwicklung und Einflussfaktoren des Open-Access-Publizierens an Universitäten in Deutschland (concluded)\nDFG\nOA-Datenpraxis: Datenpraxis zur Gestaltung der Open-Access-Transformation - Analyse, Empfehlung, Training & Vernetzung\nHybrid OA Dashboards: Mehrwertorientierte Analytics-Anwendungen zur Förderung der Kostentransparenz bei Transformationsverträgen (concluded)\nEuropean Commission\nOn-Merrit (concluded)\nOpenAIRE Nexus (concluded)\n\n\n\n",
- "last_modified": "2024-09-04T15:24:55+02:00"
+ "last_modified": "2024-09-04T15:27:43+02:00"
},
{
"path": "data.html",
"title": "Open Scholarly Data @ SUB Göttingen - Overview",
"author": [],
"contents": "\n\nContents\nStatus Crossref\nStatus Unpaywall\nStatus Openalex\n\nWe use Google Big Query to work with large open scholarly data. Our main data sources are Unpaywall, Crossref and OpenAlex.\nAn overview of our data warehouse including procedures to load the data into BigQuery can be found below.\nAnyone can view and query our publicly available Open Scholarly Data warehouse on BigQuery with a Google Cloud Computing account. Note that Google will charge you for the number of bytes processed by each query (currently $ 6.25 per 1 TB).\nStatus Crossref\nCurrent Snapshot (cr_instant)\n\nSnapshot\nFile\nTable\nSchema\nProcedure\nLast Changed\nCoverage\nNumber of rows\n2024/07\nall.json.tar.gz\ncr_instant.snapshot\nschema_crossref.json\nRepo\n13.08.2024\n2013-2024\n49.288.254\n\nHistorical Snapshots (cr_history)\n\nSnapshot\nFile\nTable\nSchema\nProcedure\nLast Changed\nCoverage\nNumber of rows\n2018/04\nall.json.tar.gz\ncr_history.cr_apr18\nschema_crossref.json\nRepo\n20.02.2022\n2013-2018\n16.766.035\n2019/04\nall.json.tar.gz\ncr_history.cr_apr19\nschema_crossref.json\nRepo\n29.10.2021\n2013-2019\n20.715.644\n2020/04\nall.json.tar.gz\ncr_history.cr_apr20\nschema_crossref.json\nRepo\n29.10.2021\n2013-2020\n25.334.525\n2021/04\nall.json.tar.gz\ncr_history.cr_apr21\nschema_crossref.json\nRepo\n29.10.2021\n2013-2021\n30.579.119\n2022/04\nall.json.tar.gz\ncr_history.cr_apr22\nschema_crossref.json\nRepo\n14.05.2022\n2013-2022\n35.939.195\n2023/04\nall.json.tar.gz\ncr_history.cr_apr23\nschema_crossref.json\nRepo\n07.05.2023\n2013-2023\n41.767.461\n2024/04\nall.json.tar.gz\ncr_history.cr_apr24\nschema_crossref.json\nRepo\n07.05.2024\n2013-2024\n47.709.184\n\nStatus Unpaywall\nCurrent Snapshot (upw_instant)\n\nSnapshot\nFile\nTable\nSchema\nProcedure\nLast Changed\nCoverage\nNumber of rows\n2022/03\nunpaywall_snapshot_2022-03-09T083001.jsonl.gz\nupw_instant.snapshot\nbq_schema_mar22.json\nRepo\n14.03.2022\n2008-2022\n67.424.819\n\nHistorical Snapshots 
(upw_history)\n\nSnapshot\nFile\nTable\nSchema\nProcedure\nLast Changed\nCoverage\nNumber of rows\n2018/03\nunpaywall_snapshot_2018-03-29T113154.jsonl.gz\nupw_history.upw_Mar18_08_20\nbq_schema_mar18.json\nRepo\n29.10.2021\n2008-2018\n36.557.043\n2019/02\nunpaywall_snapshot_2019-02-21T031509.jsonl.gz\nupw_history.upw_Feb19_08_19\nbq_schema_feb19.json\nRepo\n10.11.2021\n2008-2019\n42.143.979\n2020/02\nunpaywall_snapshot_2020-02-25T115244.jsonl.gz\nupw_history.upw_Feb20_08_20\nbq_schema_feb20.json\nRepo\n30.10.2021\n2008-2020\n49.717.710\n2021/02\nunpaywall_snapshot_2021-02-18T160139.jsonl.gz\nupw_history.upw_Feb21_08_21\nbq_schema_feb21.json\nRepo\n29.10.2021\n2008-2021\n58.437.927\n2022/03\nunpaywall_snapshot_2022-03-09T083001.jsonl.gz\nupw_history.upw_Mar22_08_22\nbq_schema_mar22.json\nRepo\n14.03.2022\n2008-2022\n67.424.819\n\nStatus Openalex\n\nSnapshot\nDirectory\nTable\nSchema\nProcedure\nLast Changed\nCoverage\nNumber of rows\n2024-08-29\nauthors/\nopenalex.authors\nschema_openalex_author.json\nRepo\n04.09.2024\nAll\n95.724.450\n2024-08-30\nfunders/\nopenalex.funders\nschema_openalex_funders.json\nRepo\n04.09.2024\nAll\n32.437\n2024-08-30\ninstitutions/\nopenalex.institutions\nschema_openalex_institutions.json\nRepo\n04.09.2024\nAll\n109.259\n2024-08-30\npublishers/\nopenalex.publishers\nschema_openalex_publishers.json\nRepo\n04.09.2024\nAll\n10.250\n2024-08-30\nsources/\nopenalex.sources\nschema_openalex_sources.json\nRepo\n04.09.2024\nAll\n254.515\n2024-08-26\ntopics/\nopenalex.topics\nschema_openalex_topics.json\nRepo\n04.09.2024\nAll\n4.516\n2024-08-29\nworks/\nopenalex.works\nschema_openalex_work.json\nRepo\n04.09.2024\nAll\n258.602.038\n\n\n\n\n",
- "last_modified": "2024-09-04T15:24:57+02:00"
+ "last_modified": "2024-09-04T15:27:45+02:00"
},
{
"path": "index.html",
"title": "Blog | Scholarly Communication Analytics with R",
"author": [],
"contents": "\n\n\n\n",
- "last_modified": "2024-09-04T15:24:58+02:00"
+ "last_modified": "2024-09-04T15:27:47+02:00"
}
],
"collections": ["posts/posts.json"]
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index ff1390e..8f6e7a8 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -14,7 +14,7 @@