Skip to content

Latest commit

 

History

History
30 lines (19 loc) · 1.14 KB

dataset.md

File metadata and controls

30 lines (19 loc) · 1.14 KB
layout title permalink
page
OpenBioLink2021 Challenge
/dataset/

📁 Dataset

The OpenBioLink2021 Dataset is a highly challenging benchmark dataset containing about 4.5 million high quality biomedical facts from various renowned biomedical knowledge bases. The dataset was split randomly with a ratio of 90-5-5.

# Train # Valid # Test # Entities # Relations
4,192,002 186,301 180,964 180,992 28

The dataset can be downloaded from Zenodo: KGID_HQ_DIR.zip or loaded with the provided python dataloader module, which is further documented here. Please make sure that you get the dataset from one of the two sources, as other versions of OpenBioLink may differ.

{% highlight python %}

from openbiolink.obl2021 import OBL2021Dataset

dl = OBL2021Dataset()

train = dl.training # torch.tensor of shape(num_train,3) valid = dl.validation # torch.tensor of shape(num_val,3)

{% endhighlight %}