Commit

Added Data for Good at Meta.
Signed-off-by: Dean Wampler <[email protected]>
deanwampler committed Jan 31, 2025
1 parent 71f7f78 commit 9595bef
Showing 3 changed files with 25 additions and 13 deletions.
1 change: 1 addition & 0 deletions docs/about.markdown
@@ -47,6 +47,7 @@ These Alliance member organizations are contributing to OTDI in various ways. In
* [Common Crawl Foundation](https://commoncrawl.org/){:target="commoncrawl"}
* [Hugging Face](https://huggingface.co){:target="huggingface"}
* [IBM](https://ibm.com){:target="ibm"}
* [Meta](https://meta.com){:target="meta"}
* [Pleias](https://pleias.fr/){:target="pleias"}
* [ServiceNow](https://www.servicenow.com/){:target="servicenow"}

28 changes: 19 additions & 9 deletions docs/catalog.markdown
@@ -24,6 +24,24 @@ Here is the current list of datasets, organized by owner.

> **BETA:** This is a provisional list of datasets. We are not yet validating datasets against our draft [requirements]({{site.baseurl}}/dataset-requirements).
## BrightQuery

[BrightQuery](https://brightquery.ai/){:target="bq"} ("BQ") provides proprietary financial, legal, and employment information on private and public companies derived from regulatory filings and disclosures. BQ proprietary data is used in capital markets for investment decisions, banking and insurance for KYC & credit checks, and enterprises for master data management, sales, and marketing purposes. In addition, BQ provides public information consisting of clean and standardized statistical data from all the major government agencies and NGOs around the world, and is doing so in partnership with the source agencies. BQ public datasets will be published in OTDI spanning all topics: economics, demographics, healthcare, crime, climate, education, sustainability, etc. Much of the data will be tabular (i.e., structured) time series data, as well as unstructured text.

_More specific information is coming soon._

## Common Crawl Foundation

[Common Crawl Foundation](https://commoncrawl.org/){:target="ccf"} is working on tagged and filtered crawl subsets for English and other languages.

_More specific information is coming soon._

## Meta

[Data for Good at Meta](https://dataforgood.facebook.com/dfg/){:target="dfg"} empowers partners with privacy-preserving data that strengthens communities and advances social issues. Data for Good is helping organizations respond to crises around the world and supporting research that advances economic opportunity.

There are 220 datasets available. See [Meta's page](https://data.humdata.org/organization/meta){:target="humdata"} at the [Humanitarian Data Exchange](https://data.humdata.org/){:target="humdata"} for the full list of datasets.

## PleIAs

Domain-specific, clean datasets.
@@ -64,15 +82,7 @@ The training dataset for the [SemiKong](https://www.semikong.ai/){:target="semik
| :---------------- | :-------------- | :------- | :--------- |
| **SemiKong** | An open model training dataset for semiconductor technology | [Hugging Face](https://huggingface.co/datasets/pentagoniac/SemiKong_Training_Datset){:target="semikong-dataset"} | 2024-09-01 |

## Coming Soon

In addition to the above organizations, the following are collaborating with us on additional datasets to be published soon.

| Organization | Kind |
| :--------------- | :------- |
| [BrightQuery](https://brightquery.ai/){:target="bq"} | BrightQuery ("BQ") provides proprietary financial, legal, and employment information on private and public companies derived from regulatory filings and disclosures. BQ proprietary data is used in capital markets for investment decisions, banking and insurance for KYC & credit checks, and enterprises for master data management, sales, and marketing purposes. In addition, BQ provides public information consisting of clean and standardized statistical data from all the major government agencies and NGOs around the world, and is doing so in partnership with the source agencies. BQ public datasets will be published in OTDI spanning all topics: economics, demographics, healthcare, crime, climate, education, sustainability, etc. The data will in general be tabular time series. (TBD) |
| [Common Crawl Foundation](https://commoncrawl.org/){:target="ccf"} | Tagged and filtered crawl subsets for English and other languages |

## Your Contributions?

To expand this catalog, we [welcome contributions]({{site.baseurl}}/contributing).

9 changes: 5 additions & 4 deletions docs/index.markdown
@@ -14,11 +14,12 @@ has_children: true
> **News:**
>
> * January 31, 2025: Added [Data for Good at Meta]({{site.baseurl}}/catalog/#meta) datasets.
> * January 23, 2025: The initiative [Steering Committee]({{site.baseurl}}/about/#steering-committee) is established.
> * December 11, 2024: Added [ServiceNow](https://www.servicenow.com/){:target="sn"} datasets.
> * November 20, 2024: [BrightQuery](https://brightquery.com/){:target="bq"} joins the AI Alliance and the Open Trusted Data Initiative: [LinkedIn announcement](https://www.linkedin.com/posts/jose-plehn_brightquery-is-proud-to-now-be-a-member-of-activity-7265516443742478338-xjIz/?utm_source=share&utm_medium=member_desktop){:target="bq-li"}.
> * November 4, 2024: [pleias](https://pleias.fr){:target="pleias"} joins the AI Alliance and the Open Trusted Data Initiative: [LinkedIn announcement](https://www.linkedin.com/posts/pleias_pleias-joins-the-ai-alliance-to-co-lead-open-ugcPost-7259263514542796800-Uphx/){:target="pleias-li"}.
> * October 15, 2024: [Common Crawl Foundation](https://commoncrawl.org/){:target="ccf"} joins the AI Alliance and the Open Trusted Data Initiative.
> * December 11, 2024: Added [ServiceNow]({{site.baseurl}}/catalog/#servicenow) datasets.
> * November 20, 2024: [BrightQuery]({{site.baseurl}}/catalog/#brightquery) joins the AI Alliance and the Open Trusted Data Initiative: [LinkedIn announcement](https://www.linkedin.com/posts/jose-plehn_brightquery-is-proud-to-now-be-a-member-of-activity-7265516443742478338-xjIz/?utm_source=share&utm_medium=member_desktop){:target="bq-li"}.
> * November 4, 2024: [PleIAs]({{site.baseurl}}/catalog/#pleias) joins the AI Alliance and the Open Trusted Data Initiative: [LinkedIn announcement](https://www.linkedin.com/posts/pleias_pleias-joins-the-ai-alliance-to-co-lead-open-ugcPost-7259263514542796800-Uphx/){:target="pleias-li"}.
> * October 15, 2024: [Common Crawl Foundation]({{site.baseurl}}/catalog/#common-crawl-foundation) joins the AI Alliance and the Open Trusted Data Initiative.
> **Tip:** Use the search box at the top of this page to find specific content.
