[B! serde] manboubird's Bookmarks

manboubird id:manboubird

manboubird's bookmarks about serde (27)

https://airflow.apache.org/docs/apache-airflow/stable/dag-serialization.html
manboubird 2020/11/06
airflow

dag

serde

dagSerialization
Apache Iceberg - Apache Iceberg™
What is Apache Iceberg™? Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Expressive SQL Iceberg supports flexible SQL commands to merge new data, update existing rows, an
manboubird 2020/07/02
apacheIceberg

schemaEvolution

serde
Making Sense of Performance in Data Analytics Frameworks | USENIX
Kay Ousterhout, University of California, Berkeley; Ryan Rasti, University of California, Berkeley, International Computer Science Institute, and VMware; Sylvia Ratnasamy, University of California, Berkeley; Scott Shenker, University of California, Berkeley, and International Computer Science Institute; Byung-Gon Chun, Seoul National University
There has been much research devoted to improving the
manboubird 2020/05/25
paper

nsdi

comparizon

serde

bigData
Kafka with AVRO vs., Kafka with Protobuf vs., Kafka with JSON Schema
Kafka serialisation schemes: playing with AVRO, Protobuf, JSON Schema in Confluent Streaming Platform. The code for these examples is available at https://github.com/saubury/kafka-serialization Apache Avro has been the default Kafka serialisation mechanism for a long time. Confluent just updated their Kafka streaming platform with additional support for serialising data with Protocol buffers (or
manboubird 2020/05/06
jsonSchema

Kafka

avro

serde

schemaManagement
GitHub - google/flatbuffers: FlatBuffers: Memory Efficient Serialization Library
manboubird 2020/03/28
flatbuffers

serde

game

google
Introducing SLOG: Cheating the low-latency vs. strict serializability tradeoff
manboubird 2019/11/24
paper

research

serde

vldb
Demystifying Apache Arrow Flight
manboubird 2019/08/29
apacheArrow

serde
Michelangelo PyML: Introducing Uber's Platform for Rapid Python ML Model Development
October 23, 2018 / Global. As a company heavily invested in AI, Uber aims to leverage machine learning (ML) in product development and the day-to-day management of our business. In pursuit of this goal, our data scientists spend considerable amounts of time prototyping and validating powerful new types of ML model
manboubird 2019/01/01
uber

machineLearning

dataPlatform

pyMl

dataLake

dataModeling

serde

mlOps

deploy

devOps
GitHub - cloudpipe/cloudpickle: Extended pickling support for Python objects
cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library. cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data. Among other things, cloudpickle supports pickling for lambda functions along with functions and classes d
manboubird 2018/11/24
python

cloudpickle

pickle

serde
Zarr-Python — zarr 2.16.1 documentation
manboubird 2018/06/25
zarr

python

serde

compression
Page Not Found
Sorry, but the page you were trying to view does not exist — perhaps you can try searching for it below.
manboubird 2018/05/21
rfc

spec

comparizon

serde

json

avro

csv
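If You Want to Handle Big Data on the "Cheap," MessagePack + LZ4 Works Nicely [Showdown with Databases] - Qiita
What format do you use to store big data when you work with it? By "big data" I mean JSON of a few GB to a few tens of GB (lol). Use a NoSQL database like MongoDB? Excellent. Use JSON in PostgreSQL? Also very good. Here, we step outside the database framework and consider handling big data easily and **cheaply (this is the key point)**, centered on the "file system." So this method is not the fastest; it is a "cheap" approach that an individual can casually try when they just want to play around a little. If you are doing this at a company, you should use a proper database. Please read on with that premise in mind (it is a bit long). The file system here means plain files such as text files and Zip archives. Because they are just files, the indexes that databases are good at do not apply, and searching and joining are weak, and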
manboubird 2018/02/25
messagePack

lz4

compression

comparizon

serde
CBOR — Concise Binary Object Representation | Overview
RFC 8949: Concise Binary Object Representation. “The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.” JSON data model: CBOR is based on the wildly successful JSON data model: numbers, strings, arrays, maps (called objects in JSON)
manboubird 2018/02/25
rfc

cbor

serde

json
GitHub - ultrajson/ultrajson: Ultra fast JSON decoder and encoder written in C with Python bindings
manboubird 2018/02/25
ultraJson

python

serde
How Uber Engineering Evaluated JSON Encoding and Compression Algorithms to Put the Squeeze on Trip Data
For compression, we put three lossless and widely accepted libraries to the test: Snappy, zlib, and Bzip2 (BZ2). Snappy aims to provide high speeds and reasonable compression. BZ2 trades speed for better compression, and zlib falls somewhere between them. Testing: Our goal was to find the combination of encoding protocol and compression algorithm with the most compact result at the highest speed. We teste
manboubird 2018/02/25
uber

json

compression

messagePack

serde

comparizon

zlib
What is PFA for? · PFA
manboubird 2017/10/02
pfa

machineLearning

serde
File Format Benchmark - Avro, JSON, ORC & Parquet
This document summarizes a benchmark study of file formats for Hadoop, including Avro, JSON, ORC, and Parquet. It found that ORC with zlib compression generally performed best for full table scans. However, Avro with Snappy compression worked better for datasets with many shared strings. The document recommends experimenting with the benchmarks, as performance can vary based on data characteristic
manboubird 2016/10/29
slide

avro

serde

comparizon

hadoopSummit

parquet

orcFile

json

schemaManagement
Assorted Ways to Process JSON Data in Hive (Intermediate) - Qiita
manboubird 2015/04/14
hive

udf

json

serde
Home
Test Platform: OS: Mac OS X; JVM: Oracle Corporation 11.0.19; CPU: 2.6 GHz 6-Core Intel Core i7; os-arch: Darwin Kernel Version 21.6.0; Cores (incl HT): 12. Disclaimer Th
manboubird 2013/12/02
jvmSerializers

java

jvm

lib

serde
GitHub - jghoman/haivvreo: Hive + Avro. Serde for working with Avro in Hive
manboubird 2013/05/11
hive

serde

avro

linkedIn