- Datasets - Meta - Wikimedia
Various places that have Wikimedia datasets, and tools for working with them. Also, you can now store table and map data using Commons Datasets, and use them from all wikis via Lua and Graphs.
- Wikipedia Dataset on Hugging Face: Structured Content for AI/ML
Wikimedia Enterprise has released an early beta dataset to Hugging Face for the general public to freely use and provide feedback for future improvements. The dataset is sourced from our Snapshot API, which delivers bulk database dumps, aka snapshots, of Wikimedia projects; in this case, Wikipedia in the English and French languages.
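As a minimal sketch of how such a snapshot might be consumed, assuming the Hugging Face `datasets` library and the `wikimedia/wikipedia` repository name with the `20231101.en` config (check the dataset card for current snapshot names):

```python
from itertools import islice


def take(stream, n):
    """Return the first n records from any iterable stream."""
    return list(islice(stream, n))


if __name__ == "__main__":
    # Third-party dependency: pip install datasets
    from datasets import load_dataset

    # Stream the English Wikipedia snapshot instead of downloading it whole.
    # "20231101.en" is one published config name; newer snapshots may exist.
    wiki = load_dataset("wikimedia/wikipedia", "20231101.en",
                        split="train", streaming=True)
    for article in take(wiki, 3):
        print(article["title"])
```

Streaming avoids materializing the multi-gigabyte dump on disk, which matters for a corpus of this size.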
- Find Open Datasets and Machine Learning Projects | Kaggle
Download Open Datasets on 1000s of Projects + Share Projects on One Platform. Explore Popular Topics Like Government, Sports, Medicine, Fintech, Food, More. Flexible Data Ingestion.
- Data.gov Home - Data.gov
Today, Data.gov is nearing 300,000 datasets and dataset collections in the catalog, harvested from over 100 organizations, and counts over a million monthly pageviews from people like you, looking to discover that information.
- Using a Word2Vec model pre-trained on wikipedia
I need to use gensim to get vector representations of words, and I figure the best thing to use would be a word2vec model that's pre-trained on the English Wikipedia corpus. Does anyone know where …
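A sketch of the workflow the question is after, using gensim's bundled downloader. The `glove-wiki-gigaword-100` vectors are GloVe rather than word2vec, but they were trained on Wikipedia text and load as the same `KeyedVectors` type, so the query API is identical; the pure `cosine` helper shows what the similarity queries compute:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


if __name__ == "__main__":
    # Third-party dependency: pip install gensim
    import gensim.downloader as api

    # Downloads the vectors on first use (~130 MB) and returns KeyedVectors.
    kv = api.load("glove-wiki-gigaword-100")
    print(cosine(kv["king"], kv["queen"]))
    print(kv.most_similar("king", topn=3))
```

`kv.similarity("king", "queen")` computes the same cosine internally; the helper is just the formula spelled out.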
- WikiBio Dataset - Papers With Code
This dataset gathers 728,321 biographies from English Wikipedia. It aims at evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).
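The (infobox, first paragraph) pairing is what data-to-text models consume. A hedged sketch, assuming the dataset is mirrored on the Hugging Face Hub under the id `wiki_bio` (check the hub for the current canonical name); the `linearize_infobox` helper shows one common way to flatten an infobox's header/content columns into a model-ready string:

```python
def linearize_infobox(headers, contents):
    """Flatten parallel infobox columns into a single delimited string."""
    return " | ".join(f"{h} : {c}" for h, c in zip(headers, contents))


if __name__ == "__main__":
    # Third-party dependency: pip install datasets
    from datasets import load_dataset

    # "wiki_bio" is assumed to be the Hub id for this dataset.
    ds = load_dataset("wiki_bio", split="train", streaming=True)
    first = next(iter(ds))
    # Inspect the record layout rather than hard-coding nested field names.
    print(sorted(first.keys()))
```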
- [2012.14919] WikiTableT: A Large-Scale Data-to-Text Dataset for …
In this work, we cast generating Wikipedia sections as a data-to-text generation task and create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata.
- DavidGrangier/wikipedia-biography-dataset - GitHub
This dataset gathers 728,321 biographies from Wikipedia. It aims at evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).