Popular articles on Datasets: 55 results - Hatena Bookmark

1 - 40 of 55

Because the tag search returned few matches, title search results are shown instead.

There are 55 entries about Datasets. Related tags include machine learning, dataset, and data. Popular entries include "Papers with Code - Machine Learning Datasets".

Papers with Code - Machine Learning Datasets
- 66 users
- paperswithcode.com
- Learning
- 2021/02/03
CIFAR-10 (Canadian Institute for Advanced Research, 10 classes) The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but no
litagin/moe-speech · Datasets at Hugging Face
- 36 users
- huggingface.co
- Technology
- 2024/01/24
Not-For-All-Audiences This repository has been marked as containing sensitive content and may contain potentially harmful and sensitive information.
Datasets on Papers With Code: a dataset listing site that even shows popularity
- 31 users
- atmarkit.itmedia.co.jp
- Technology
- 2021/06/30
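Datasets on Papers With Code: a dataset listing site that even shows popularity (AI and Machine Learning Dataset Dictionary). This article introduces Datasets on Papers With Code, where datasets can be found efficiently. Each dataset's page shows the tasks the dataset is suited to, the best-performing models, papers with code, data loaders for each library, the dataset's popularity trend, and more. A very useful new dataset listing site appeared recently, so I would like to introduce it. What is Datasets on Papers With Code? Do you know the site "Papers With Code"? It presents, in ranking form, the "machine learning models" that currently perform best on various tasks (for example, image classification and text generation), the "papers with code" that have many stars, and so on; it is free and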
GitHub - google-research/deduplicate-text-datasets
- 25 users
- github.com/google-research
- Technology
- 2021/07/20
Datasets on Hugging Face: a site providing datasets for natural language processing
- 24 users
- atmarkit.itmedia.co.jp
- Technology
- 2021/11/01
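The site is in English, but a detailed explanation should not be needed. As a brief overview, the right side lists popular datasets in order of download count. Besides keyword search, you can filter with the options on the left: [Task Categories] (a broad classification of problem types), [Tasks] (more specific problem types), [Languages], [Multilinguality], [Sizes] (data size), and [Licenses]. People doing machine learning often struggle with the question of which dataset to use, and this ranked view should be a very useful reference. Contents of each dataset page: clicking a dataset name in Figure 1 (for example, wikitext) opens the page shown in Figure 2. This too can be grasped intuitively, so I do not think a detailed explanation is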
GitHub - BandaiNamcoResearchInc/Bandai-Namco-Research-Motiondataset: This repository provides motion datasets collected by Bandai Namco Research Inc
- 24 users
- github.com/BandaiNamcoResearchInc
- Technology
- 2022/04/28
This repository provides motion datasets collected by Bandai Namco Research Inc. Find here for a README in Japanese. There is a long-standing interest in making diverse stylized motions for games and movies that pursue realistic and expressive character animation; however, creating new movements that include all the various styles of expression using existing methods is difficult. Due to this, Mot
- 3d
- open data
- cg
- 3DCG
- github
Free public datasets for COVID-19 | Google Cloud Blog
- 21 users
- cloud.google.com
- Technology
- 2020/03/31
COVID-19 public datasets: supporting organizations in their pandemic response. See how organizations have used the BigQuery COVID-19 public dataset for research, healthcare, and more. By Johanna Katz • 5-minute read These datasets remove barriers and provide access to critical information quickly and easily, eliminating the need to search for and onboard large data files. Researchers can access the
- data
- google
- covid-19
- read later
Nerfgun3/bad_prompt · Datasets at Hugging Face
- 17 users
- huggingface.co
- Technology
- 2022/11/18
Negative Embedding / Textual Inversion Idea The idea behind this embedding was to somehow train the negative prompt as an embedding, thus unifying the basis of the negative prompt into one word or embedding. Side note: Embedding has proven to be very helpful for the generation of hands! :) Usage To use this embedding you have to download the file as well as drop it into the "\stable-diffusion-webui
LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS | LAION
- 13 users
- laion.ai
- Technology
- 2022/08/24
LAION-5B: A NEW ERA OF OPEN LARGE-SCALE MULTI-MODAL DATASETS by: Romain Beaumont, 31 Mar, 2022 We present a dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M, previously the biggest openly accessible image-text dataset in the world - see also our NeurIPS2022 paper. See our update on the LAION-5B dataset. Large image-text models like ALIGN, BASIC, Turing Bletchly, FLO
CC-100: Monolingual Datasets from Web Crawl Data
- 13 users
- data.statmt.org
- Technology
- 2020/11/02
This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus comprises monolingual data for 100+ languages and also includes data for romanized languages (indicated by *_rom). This was constructed using the urls and paragraph indices provided by the CC-Net repository by processing January-December 2018 Commoncrawl snapshots. Each file comprises documents separated b
- natural language processing
- dataset
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
- 12 users
- arxiv.org
- Learning
- 2023/03/06
In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from ra
Overture Maps Foundation Releases General Availability of its Open Maps Datasets – Overture Maps Foundation
- 9 users
- overturemaps.org
- Learning
- 2024/07/26
New data on 200 million+ addresses added in alpha release. SAN FRANCISCO, Calif., July 24, 2024: The Overture Maps Foundation, a collaborative effort to enable current and next-generation interoperable open map services and products, today announced the General Availability (GA) of several of its global open maps datasets, paving the way for new and expanded use cases for a wide range of commercia
- map
kunishou/databricks-dolly-15k-ja · Datasets at Hugging Face
- 7 users
- huggingface.co
- Technology
- 2023/04/13
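Virgin Australia began operating on August 31, 2000 as Virgin Blue, with two aircraft on a single route. Virgin Australia (Virgin Australia Airlines Pty Ltd) is an Australia-based airline and the largest by fleet size to carry the Virgin brand. It began service on August 31, 2000 as Virgin Blue, operating two aircraft on a single route. After the collapse of Ansett Australia in September 2001, it rapidly established its position in the Australian domestic market. It has since grown to serve 32 cities in Australia directly, with hubs in Brisbane, Melbourne, and Sydney.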
- LLM
- dataset
elyza/ELYZA-tasks-100 · Datasets at Hugging Face
- 7 users
- huggingface.co
- Technology
- 2023/08/29
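Some people do behave outrageously, though.)\n\nRead the passage above, choose from the options below the one that best fits the blank, and explain your reasoning.\n\n- だから (therefore)\n- また (also)\n- むしろ (rather)\n- もちろん (of course)"},"output":{"kind":"string","value":"The option that fits the blank in the passage is 'もちろん' (of course).\n\nAround the blank, the author argues that social order is maintained by people's voluntary will, and anticipates and raises in advance the counterargument that 'some people behave outrageously,' so 'もちろん' (of course) is appropriate."},"eval_aspect":{"kind":"string","value":"- If the wrong option is chosen: -4 points\n- If the reasoning is off the mark: -2 points\n- If the explanation does not state that a counterargument is being anticipated: -1 point\n"}}},{"rowIdx":7,"cells":{"input":{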
GitHub - finos/perspective: A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
- 6 users
- github.com/finos
- Technology
- 2023/02/05
wiki40b | TensorFlow Datasets
- 6 users
- www.tensorflow.org
- Technology
- 2020/09/26
izumi-lab/llm-japanese-dataset · Datasets at Hugging Face
- 5 users
- huggingface.co
- Technology
- 2023/05/23
Welcome to "abc ~the first~"! Now then, how many letters are there in total in the alphabet that begins A, B, C...?
hotchpotch/JQaRA · Datasets at Hugging Face
- 5 users
- huggingface.co
- Society
- 2024/03/04
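From around the beginning of 2016, more and more politicians and celebrities were forced to resign or suspend their activities because of scoops in Shukan Bunshun, and the magazine came to be feared as the "Bunshun Cannon" (文春砲). Since Suzuki became editor-in-chief in 2004, it has relentlessly criticized the Yomiuri Shimbun and Tsuneo Watanabe, chairman of the Yomiuri Shimbun Group Honsha, but Yomiuri has filed lawsuits seeking damages and published apologies, and the magazine has lost most of them. Unlike other major publishers, it is largely free of the influence of Johnny & Associates, and from 1999 to 2000 it reported allegations that the agency's president, Johnny Kitagawa, had sexually abused children (Johnny's Jr. trainees). Overseas media such as The New York Times and The Observer followed up on the story, and it was also taken up in the National Diet, causing repercussions at home and abroad.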
GitHub - roapi/roapi: Create full-fledged APIs for slowly moving datasets without writing a single line of code.
- 5 users
- github.com/roapi
- Technology
- 2022/01/12
- rust
- OSS
- database
- data
- API
- read later
Sharing new research, models, and datasets from Meta FAIR
- 5 users
- ai.meta.com
- Technology
- 2024/10/19
Sharing new research, models, and datasets from Meta FAIR Today, Meta FAIR is publicly releasing several new research artifacts in support of our goal of achieving advanced machine intelligence (AMI) while also supporting open science and reproducibility. The work we’re sharing today includes Meta Segment Anything 2.1 (SAM 2.1), an update to our popular Segment Anything Model 2 for images and video
- Meta
A roundup of Japanese datasets available in huggingface datasets
- 5 users
- tech.yellowback.net
- Technology
- 2021/12/07
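Number of entries: Dataset size: the value of dataset.dataset_size, i.e. the size after the compressed files are extracted. Download size: the value of dataset.download_size, i.e. the size to be downloaded. cc100 Number of entries: 458,387,942 Dataset size: 82,042,212,602 Download size: 15,916,192,184 Streaming: not supported. The original dataset appears to be a text file in which blocks are separated by blank lines, but the datasets library appears to load it with one line per sample. dataset = load_dataset('cc100', lang='ja', split='train') cc100 sample {'id': '0', 'text': '午後から雨が心配だったので遠出はせず、『ふれ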
GitHub - huggingface/datasets: 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
- 5 users
- github.com/huggingface
- Technology
- 2020/05/29
🤗 Datasets is a lightweight library providing two main features: one-line dataloaders for many public datasets: one-liners to download and pre-process any of the major public datasets (image datasets, audio datasets, text datasets in 467 languages and dialects, etc.) provided on the HuggingFace Datasets Hub. With a simple command like squad_dataset = load_dataset("squad"), get any of these datase
I tried image recognition of "the moon and a soft-shelled turtle" with PyTorch (using torchvision.datasets.ImageFolder, the counterpart of Keras's flow_from_directory) - Qiita
- 4 users
- qiita.com/ryryry
- Technology
- 2020/04/21
Sharing new research, models, and datasets from Meta FAIR
- 4 users
- ai.meta.com
- Technology
- 2024/06/19
For more than a decade, Meta’s Fundamental AI Research (FAIR) team has focused on advancing the state of the art in AI through open research. As innovation in the field continues to move at a rapid pace, we believe that collaboration with the global AI community is more important than ever. Maintaining an open science approach and sharing our work with the community help us stay true to our goal o
- Meta
- machine learning
Hosting your Models and Datasets on Hugging Face Spaces using Streamlit
- 4 users
- huggingface.co
- Technology
- 2021/10/18
Hosting your Models and Datasets on Hugging Face Spaces using Streamlit Showcase your Datasets and Models using Streamlit on Hugging Face Spaces Streamlit allows you to visualize datasets and build demos of Machine Learning models in a neat way. In this blog post we will walk you through hosting models and datasets and serving your Streamlit applications in Hugging Face Spaces. Building demos for
GitHub - robvanvolt/DALLE-datasets: This is a summary of easily available datasets for generalized DALLE-pytorch training.
- 4 users
- github.com/robvanvolt
- Technology
- 2021/05/24
- dataset
- *read later
Datasets
- 4 users
- www.bigcode-project.org
- Technology
- 2023/08/09
Opt-out requests submitted by 09.02.2023 were excluded from this version of the dataset as well as initially flagged malicious files (not exhaustive). Datasets and data governance tools released by BigCode: The Stack: Exact deduplicated version of The Stack. The Stack dedup: Near deduplicated version of The Stack (recommended for training). The Stack issues: Collection of GitHub issues. The Stack Metad
- read later
OpenAssistant/oasst1 · Datasets at Hugging Face
- 4 users
- huggingface.co
- Technology
- 2023/04/16
'Jew' or 'rabbi'"},"role":{"kind":"string","value":"assistant"},"lang":{"kind":"string","value":"en"},"review_count":{"kind":"number","value":3,"string":"3"},"review_result":{"kind":"string","value":"true"},"deleted":{"kind":"string","value":"false"},"rank":{"kind":"number","value":1,"string":"1"},"synthetic":{"kind":"string","value":"false"},"model_name":{"kind":"null"},"detoxify":{"kind":"string
- dataset
GitHub - llm-jp/text2dataset: Easily turn large English text datasets into Japanese text datasets using open LLMs.
- 4 users
- github.com/llm-jp
- Technology
- 2024/10/30
Well-tuned Simple Nets Excel on Tabular Datasets
- 4 users
- arxiv.org
- Technology
- 2021/06/23
Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performance of neural networks lies in rethinking the joint and simultaneous application of a large set of modern regularizati
Object Detection Datasets
- 3 users
- public.roboflow.com
- Technology
- 2020/07/07
Roboflow hosts free public computer vision datasets in many popular formats (including CreateML JSON, COCO JSON, Pascal VOC XML, YOLO v3, and Tensorflow TFRecords). For your convenience, we also have downsized and augmented versions available. If you'd like us to host your dataset, please get in touch.
TensorFlow Datasets: The Bad Parts
- 3 users
- www.determined.ai
- Technology
- 2020/08/01
TLDR: TensorFlow’s tf.data API is a popular approach to loading data into deep learning models. Although tf.data has a lot of powerful features, it is built around sequential access to the underlying data set. This design makes it difficult to efficiently shuffle large data sets, to shard data when doing distributed training, and to implement fault-tolerant training. We argue that random access sh
bigcode/the-stack · Datasets at Hugging Face
- 3 users
- huggingface.co
- Technology
- 2022/10/29
Terms of Use for The Stack. The Stack dataset is a collection of source code in over 300 programming languages. We ask that you read and acknowledge the following points before using the dataset: The Stack is a collection of source code from repositories with various licenses. Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including a
- dataset
Read "A Method to Anonymize Business Metrics to Publishing Implicit Feedback Datasets" (Recsys 2020) - 糞糞糞ネット弁慶
- 3 users
- repose.hatenadiary.jp
- Learning
- 2021/01/03
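Paper: a paper by Gunosy and RIKEN AIP. When a company wants to publish the implicit feedback data it holds, there are three goals: to hide business metrics such as the number of active users, revenue, and average click count; to ensure fairness; and to reduce population bias. Here the public dataset is built by sampling users from the logs, and the problem is formulated as estimating a sampling weight w for each user. To hide business metrics, the method takes the Wasserstein distance between the post-sampling click-count distribution and a specified distribution (such as Zipf); for fairness, the KL divergence between the post-sampling distribution of user attributes and the uniform distribution; and as a countermeasure against population bias, sampli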
- read later
PandaSet Open Datasets - Scale
- 3 users
- scale.com
- Lifestyle
- 2020/05/28
Scene #1, Scene #2, Scene #3, Scene #4, Scene #5, Scene #6, Scene #7, Scene #8
- dataset
GitHub - NVIDIA-Merlin/NVTabular: NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
- 3 users
- github.com/NVIDIA-Merlin
- Technology
- 2022/11/20
NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte scale datasets and train deep learning (DL) based recommender systems. It provides high-level abstraction to simplify code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library. NVTabular is a component of NVIDIA Merlin, an open source framework for build
GitHub - langfuse/langfuse: 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
- 3 users
- github.com/langfuse
- Technology
- 2023/08/30
Reindex, Transform, and Aggregate datasets using pandas Library
- 3 users
- www.ejable.com
- Technology
- 2023/10/01
Most of the time, the dataset we will get from the business will be dirty and cannot be used straight forward to train machine learning models. Therefore, we must treat the dataset and bring it to the desired form to input it into an algorithm. This tutorial discusses reindexing, transforming, and aggregating datasets in Pandas. What are Reindexing, Transforming, and Aggregating? Reindexing, transf
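Shota Imai@えるエル on Twitter: "Following Paper With Code, which had already become an infrastructure-like site for the machine learning community, it seems that 'Paper with Datasets' (the name is what it is called on Twitter; the site itself is the same), which also compiles the datasets used in papers, has been created… https://t.co/suec5vimBx"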
- 3 users
- twitter.com/ImAI_Eruel
- Technology
- 2021/04/16
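Following Paper With Code, which had already become an infrastructure-like site for the machine learning community, it seems that "Paper with Datasets" (the name is what it is called on Twitter; the site itself is the same), which also compiles the datasets used in papers, has been created… https://t.co/suec5vimBx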
GitHub - spiceai/spiceai: A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets across databases, data warehouses, and data lakes.
- 3 users
- github.com/spiceai
- Technology
- 2021/09/08
- OSS
- tool