Commit

update data tab
naustica committed Sep 4, 2024
1 parent df72565 commit 8811e77
Showing 5 changed files with 33 additions and 33 deletions.
14 changes: 7 additions & 7 deletions data.Rmd
@@ -77,12 +77,12 @@ Anyone can view and query our publicly available [Open Scholarly Data warehouse

| Snapshot | Directory | Table | Schema | Procedure | Last Changed | Coverage | Number of rows |
|------------|---------------|-----------------------|-----------------------------------|-----------|--------------|-----------|----------------------|
-| 2024-07-30 | authors/ | [openalex.authors](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_author.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 95.079.815 |
-| 2024-07-31 | funders/ | [openalex.funders](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_funders.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 32.437 |
-| 2024-07-31 | institutions/ | [openalex.institutions](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_institutions.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 108.832 |
-| 2024-07-31 | publishers/ | [openalex.publishers](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_publishers.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 10.247 |
-| 2024-07-31 | sources/ | [openalex.sources](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_sources.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 254.530 |
-| 2024-07-29 | topics/ | [openalex.topics](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_topics.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 4.516 |
-| 2024-07-30 | works/ | [openalex.works](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_work.json | [Repo](https://github.com/naustica/openalex) | 07.08.2024 | All | 257.748.845 |
+| 2024-08-29 | authors/ | [openalex.authors](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_author.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 95.724.450 |
+| 2024-08-30 | funders/ | [openalex.funders](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_funders.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 32.437 |
+| 2024-08-30 | institutions/ | [openalex.institutions](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_institutions.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 109.259 |
+| 2024-08-30 | publishers/ | [openalex.publishers](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_publishers.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 10.250 |
+| 2024-08-30 | sources/ | [openalex.sources](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_sources.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 254.515 |
+| 2024-08-26 | topics/ | [openalex.topics](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_topics.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 4.516 |
+| 2024-08-29 | works/ | [openalex.works](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex) | schema_openalex_work.json | [Repo](https://github.com/naustica/openalex) | 04.09.2024 | All | 258.602.038 |

:::
38 changes: 19 additions & 19 deletions docs/data.html
@@ -2481,74 +2481,74 @@ <h2 id="status-openalex">Status Openalex</h2>
</thead>
<tbody>
<tr class="odd">
-<td>2024-07-30</td>
+<td>2024-08-29</td>
<td>authors/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.authors</a></td>
<td>schema_openalex_author.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
-<td>95.079.815</td>
+<td>95.724.450</td>
</tr>
<tr class="even">
-<td>2024-07-31</td>
+<td>2024-08-30</td>
<td>funders/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.funders</a></td>
<td>schema_openalex_funders.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
<td>32.437</td>
</tr>
<tr class="odd">
-<td>2024-07-31</td>
+<td>2024-08-30</td>
<td>institutions/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.institutions</a></td>
<td>schema_openalex_institutions.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
-<td>108.832</td>
+<td>109.259</td>
</tr>
<tr class="even">
-<td>2024-07-31</td>
+<td>2024-08-30</td>
<td>publishers/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.publishers</a></td>
<td>schema_openalex_publishers.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
-<td>10.247</td>
+<td>10.250</td>
</tr>
<tr class="odd">
-<td>2024-07-31</td>
+<td>2024-08-30</td>
<td>sources/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.sources</a></td>
<td>schema_openalex_sources.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
-<td>254.530</td>
+<td>254.515</td>
</tr>
<tr class="even">
-<td>2024-07-29</td>
+<td>2024-08-26</td>
<td>topics/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.topics</a></td>
<td>schema_openalex_topics.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
<td>4.516</td>
</tr>
<tr class="odd">
-<td>2024-07-30</td>
+<td>2024-08-29</td>
<td>works/</td>
<td><a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1ssubugoe-collaborative!2sopenalex">openalex.works</a></td>
<td>schema_openalex_work.json</td>
<td><a href="https://github.com/naustica/openalex">Repo</a></td>
-<td>07.08.2024</td>
+<td>04.09.2024</td>
<td>All</td>
-<td>257.748.845</td>
+<td>258.602.038</td>
</tr>
</tbody>
</table>
2 changes: 1 addition & 1 deletion docs/posts/posts.json
@@ -21,7 +21,7 @@
"categories": [],
"contents": "\n\n\n\n\n\n\n\n\n\nIn June 2024, we submitted an analysis of publication and document types in OpenAlex in comparison with the proprietary databases Web of Science and Scopus and the open data sources Semantic Scholar and PubMed (Haupka et al. 2024).\nWe found substantial differences between these databases: While Web of Science and Scopus provided a comprehensive set of document types to describe works published in journals, OpenAlex supported only a comparably limited number of types.\nNotably, OpenAlex lacked a distinction between research articles and reviews, which can be crucial when calculating citation indicators.\nIn line with related studies (Alperin et al. 2024), we also observed discrepancies in the number of publications when restricting to certain document types.\nMeanwhile, in late May and late July 2024, OpenAlex introduced extended approaches to obtain publication and document types.\nAmong the four new categories were preprints and reviews. Using PubMed, OpenAlex identified approximately 4 million journal articles as editorials, erratum, letters, preprints, reviews, or retractions.\nOf course, we wanted to know how these improvements affect our findings.\nWe therefore re-applied our approach to the recent changes.\nUsing works published in journals between 2012 and 2022, we demonstrate that OpenAlex’s recent changes provide a more nuanced set of document types to refine scholarly works.\nHowever, the comparison with Web of Science and Scopus reveals that there remain considerable differences.\nData and Methods\nFollowing our preprint, we performed a pairwise comparison of journal publications indexed in OpenAlex with the Web of Science and Scopus published 2012 to 2022.\nTo investigate changes made in OpenAlex, we furthermore compared data from the OpenAlex July 2024 and August 2023 snapshots.\nScopus and Web of Science data were retrieved from the German Competence Network of Bibliometrics, using the April 2024 snapshots.\nWeb of 
Science data retrieval comprised the Core Collection.\nWe matched items between the databases by DOI after normalisation to lowercase.\nOverall, the intersection of OpenAlex and Scopus covered 24,704,172 and the intersection of OpenAlex and Web of Science covered 21,775,771 records.\nThen, we categorised works based on their document type information into two categories: research discourse and editorial discourse.\nThe research discourse category now also includes publications of type “preprint”, which was added to OpenAlex in May 2024.\nThe mapping tables used for reclassifying the document types can be found in the appendix of Haupka et al. (2024).\nFindings\nFigure 1 illustrates OpenAlex document type changes in comparison with Scopus.\nBefore the introduction of the more nuanced set of document types, OpenAlex tagged\n24,559,634 items (99.42%) as articles, which reduced to 22,132,347 (89.59%).\nScopus tagged 20,777,473 items (84.11%) as article.\nOpenAlex assigned the type review to 1,511,172 items (6.12%), whereas Scopus to 1,776,555 items (7.19%).\n\n\n\n\nFigure 1: Comparison of OpenAlex and Scopus for publication years 2012-2022\n\n\n\nFigure 2 illustrates the same for the comparison of OpenAlex with Web of Science.\nHere, OpenAlex tagged 21,673,833 items (99.53%) as articles before the introduction of the more nuanced set of document types and 19,500,710 (89.55%) after.\nIn Web of Science 17,266,997 items (79.29%) were tagged as articles.\nThe document type review is assigned to 1,362,290 items (6.26%) by OpenAlex, whereas Web of Science tagged 1,242,472 items (5.71%) as such.\n\n\n\n\nFigure 2: Comparison of OpenAlex and Web of Science for publication years 2012-2022\n\n\n\nOverall, Figures 1 and 2 demonstrate that even after the introduction of a more nuanced set of document types, OpenAlex still tags a higher proportion of items as articles than the commercial data sources.\nThe difference between the proportions of items tagged as articles is, however, 
slightly more pronounced in the comparison of OpenAlex with Web of Science.\nScopus tags a higher proportion of items as reviews and both Scopus and Web of Science still tag more items as editorial content than OpenAlex.\nIn sum, 340,998 (Scopus) and 656,366 (Web of Science) items are tagged as editorial/editorial material or letters in Scopus and Web of Science, respectively, while tagged as articles in OpenAlex.\nWhen grouping the document types into the two categories research discourse and editorial discourse, we found that even after the introduction of a more nuanced set of document types in OpenAlex, the proportion of items labelled as editorial discourse is still about 3% lower compared to Scopus and Web of Science, as shown in the tables below.\n\n\n\n\n\n\n\n\n\n\n\n\nDiscussion and Outlook\nOur updated analysis demonstrated a noticable improvement of the classification of document types in OpenAlex when comparing it to Scopus and Web of Science.\nCompared to data from 2023, the discrepancy in the classification of items has decreased slightly.\nThis indicates a convergence of the classification system in OpenAlex towards those from proprietary databases, with an enhanced coverage of reviews and editorial materials.\nIn addition, the rule-based string matching for recognising paratexts introduced and revised by OpenAlex resulted in more texts being categorised as editorial material than before.\nHowever, the results also show that the curation of document types has not yet been finalised.\nConclusively, we would like to point out that there is no correct classification system per se.\nRather different classification systems applied by the database operators can bring advantages and disadvantages.\nIn Semantic Scholar and PubMed, for example, publications are labelled as clinical studies and case reports, which in Scopus, Web of Science and OpenAlex are predominantly assigned to the document type article.\nA differentiation of these publications has the 
potential to increase the quality of bibliometric surveys in the analysed databases.\nAlso, the results from this analysis are only partially comparable with the results from our preprint, as in the preprint we worked with a more restrictive set that included publications from Semantic Scholar and PubMed.\nFunding\nThis work is funded by the Bundesministerium für Bildung und Forschung (BMBF) project KBOPENBIB (16WIK2301E). We acknowledge the support of the German Competence Center for Bibliometrics.\n\n\n\nAlperin, Juan Pablo, Jason Portenoy, Kyle Demes, Vincent Larivière, and Stefanie Haustein. 2024. “An Analysis of the Suitability of OpenAlex for Bibliometric Analyses.” arXiv. https://doi.org/10.48550/arXiv.2404.17663.\n\n\nHaupka, Nick, Jack H. Culbert, Alexander Schniedermann, Najko Jahn, and Philipp Mayr. 2024. “Analysis of the Publication and Document Types in OpenAlex, Web of Science, Scopus, PubMed and Semantic Scholar.” https://arxiv.org/abs/2406.15154.\n\n\n\n\n",
"preview": {},
-"last_modified": "2024-09-04T15:14:30+02:00",
+"last_modified": "2024-09-04T15:23:41+02:00",
"input_file": {}
},
{