[SPARK-25390] Data source V2 API refactoring - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels: None
Target Version/s: 3.0.0

Description

It is currently unclear how the data source v2 API should be abstracted. The abstraction should either be unified across batch and streaming, or be similar for both with a well-defined difference between them. It should also cover catalog and table.

An example of the abstraction:

batch: catalog -> table -> scan
streaming: catalog -> table -> stream -> scan

We should refactor the data source v2 API according to this abstraction.
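
The two chains in the example can be illustrated with a minimal sketch. This is not Spark's API: the class and method names (`Catalog.load_table`, `Table.new_scan`, `Table.new_stream`, `Stream.next_scan`) are hypothetical and chosen only to mirror the layering. The assumption made here is that a stream is the layer that tracks read progress and hands out one bounded scan per micro-batch, which is where the streaming path differs from the batch path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scan:
    """A bounded read over a fixed list of rows (hypothetical)."""
    rows: List[dict]

    def read(self) -> List[dict]:
        return list(self.rows)


@dataclass
class Stream:
    """Tracks a read position and hands out one Scan per micro-batch (hypothetical)."""
    rows: List[dict]
    offset: int = 0

    def next_scan(self, max_rows: int) -> Scan:
        batch = self.rows[self.offset:self.offset + max_rows]
        self.offset += len(batch)
        return Scan(batch)


@dataclass
class Table:
    name: str
    rows: List[dict] = field(default_factory=list)

    def new_scan(self) -> Scan:
        # batch path: table -> scan
        return Scan(self.rows)

    def new_stream(self) -> Stream:
        # streaming path: table -> stream
        return Stream(self.rows)


@dataclass
class Catalog:
    tables: Dict[str, Table] = field(default_factory=dict)

    def load_table(self, name: str) -> Table:
        return self.tables[name]


catalog = Catalog({"events": Table("events", [{"id": 1}, {"id": 2}, {"id": 3}])})

# batch: catalog -> table -> scan
batch_rows = catalog.load_table("events").new_scan().read()

# streaming: catalog -> table -> stream -> scan (one scan per micro-batch)
stream = catalog.load_table("events").new_stream()
first = stream.next_scan(max_rows=2).read()
second = stream.next_scan(max_rows=2).read()
```

In this sketch both paths share the catalog and table layers and end in the same `Scan` type, so the only streaming-specific piece is the `Stream` object in between.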

Issue Links

blocks: SPARK-25186 Stabilize Data Source V2 API (In Progress)

Sub-Tasks

1.	data source V2 API refactoring (batch read)	Resolved	Wenchen Fan
2.	data source v2 API refactor (batch write)	Resolved	Wenchen Fan
3.	Remove SaveMode from data source v2 API	Resolved	Wenchen Fan
4.	data source V2 API refactoring (micro-batch read)	Resolved	Wenchen Fan
5.	data source V2 API refactoring (continuous read)	Resolved	Wenchen Fan
6.	data source v2 API refactor: streaming write	Resolved	Wenchen Fan
7.	remove streaming output mode from data source v2 APIs	Resolved	Wenchen Fan
8.	create StreamingWrite at the beginning of streaming execution	Resolved	Wenchen Fan
9.	data source v2 API improvement	Resolved	Wenchen Fan
10.	Add DataSourceV2 capabilities to check support for batch append, overwrite, truncate during analysis.	Resolved	Ryan Blue
11.	Add DataSourceV2 capabilities for streaming	Resolved	Wenchen Fan
12.	move data source v2 API to catalyst module	Resolved	Wenchen Fan
13.	table capability to skip the output column resolution	Resolved	Wenchen Fan
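
Sub-task 10 refers to checking write support during analysis. A rough sketch of that idea, under the assumption that a table declares a set of capabilities and that the check runs before any work is scheduled, might look like the following. All names here (`Capability`, `CapableTable`, `check_write`, `UnsupportedOperation`) are hypothetical and are not taken from Spark.

```python
from enum import Enum, auto


class Capability(Enum):
    """Hypothetical capability flags a table may declare."""
    BATCH_APPEND = auto()
    BATCH_OVERWRITE = auto()
    TRUNCATE = auto()


class UnsupportedOperation(Exception):
    """Raised when a table lacks the capability an operation needs."""


class CapableTable:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = frozenset(capabilities)


def check_write(table, required):
    # Analysis-time check: fail before any job is run.
    if required not in table.capabilities:
        raise UnsupportedOperation(
            f"Table {table.name} does not support {required.name}"
        )


append_only = CapableTable("logs", {Capability.BATCH_APPEND})

# Passes silently: the table declares append support.
check_write(append_only, Capability.BATCH_APPEND)

# Rejected up front: the table does not declare truncate support.
try:
    check_write(append_only, Capability.TRUNCATE)
    rejected = False
except UnsupportedOperation:
    rejected = True
```

The point of the design is that an unsupported operation surfaces as an early, descriptive error rather than as a failure partway through a write.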

People

Assignee: Wenchen Fan

Reporter: Wenchen Fan

Votes: 0

Watchers: 14

Dates

Created: 10/Sep/18 05:35

Updated: 12/Dec/22 18:10

Resolved: 09/Oct/19 05:26