Conversation
Question: Does this change anything with regard to I/O? For example, I notice with … How did you test?

Answer: I tried a pandas DataFrame with the pa.large_string dtype; it is interpreted as a normal string. It shouldn't affect input as large_string, since we convert it to a normal string in our I/O. Screenshots: screen/763yZzyywmEVWbR (destination table: screen/4u6scyqtczXSAK8)
It was part of my debugger step-through of #371, but it's possible I accidentally changed something in the I/O path there.
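For reference, a minimal sketch of the round-trip check described above, assuming a configured bigframes session (the column name and values are illustrative):

import pandas as pd
import pyarrow as pa

import bigframes.pandas as bpd

# A pandas column declared as Arrow large_string; per the comment above,
# the I/O path should normalize it to a plain string dtype on read.
pdf = pd.DataFrame(
    {"col": pd.array(["a", "b"], dtype=pd.ArrowDtype(pa.large_string()))}
)
bdf = bpd.read_pandas(pdf)
print(bdf.dtypes)  # expected: string[pyarrow], not large_string[pyarrow]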
  if any(i.field_type == "JSON" for i in table.schema if i.name in schema.names):
      warnings.warn(
-         "Interpreting JSON column(s) as StringDtype. This behavior may change in future versions.",
+         "Interpreting JSON column(s) as StringDtype and pyarrow.large_string. This behavior may change in future versions.",
Shouldn't we just say "... as pyarrow.large_string"? StringDtype here is not relevant anymore, no?
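If so, the message might read something like this (hypothetical wording, not a committed change):

warnings.warn(
    "Interpreting JSON column(s) as pyarrow.large_string. "
    "This behavior may change in future versions.",
)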
  T1 AS (
      SELECT *,
-         JSON_OBJECT(
+         TO_JSON_STRING(JSON_OBJECT(
Do we need an explicit TO_JSON_STRING? Doesn't the change in ibis_types.py::ibis_dtype_to_bigframes_dtype take care of this post-read?
if isinstance(ibis_dtype, ibis_dtypes.JSON):
    # Post-read, BigQuery JSON columns map straight to the BigFrames JSON dtype.
    ...
    return bigframes.dtypes.JSON_DTYPE
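An illustrative sketch (not the actual bigframes source; names beyond those quoted above are assumptions) of how that post-read mapping would make an explicit TO_JSON_STRING redundant for typing purposes:

import ibis.expr.datatypes as ibis_dtypes
import pandas as pd
import pyarrow as pa

# Assumed stand-in for bigframes.dtypes.JSON_DTYPE, per the PR description below.
JSON_DTYPE = pd.ArrowDtype(pa.large_string())

def ibis_dtype_to_bigframes_dtype(ibis_dtype):
    # If ibis already reports the column as JSON, the dtype mapping alone
    # selects the JSON dtype; no TO_JSON_STRING is needed in the SQL.
    if isinstance(ibis_dtype, ibis_dtypes.JSON):
        return JSON_DTYPE
    raise NotImplementedError(f"unhandled ibis dtype: {ibis_dtype}")

print(ibis_dtype_to_bigframes_dtype(ibis_dtypes.JSON()))  # large_string[pyarrow]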
Patches the JSON output as the BigQuery JSON type instead of STRING, which avoids the JSON destination-table error in b/381148539 and unblocks the Multimodal integrations.
Our current implementation converts JSON to STRING entirely when reading into BigFrames. This change uses pd.ArrowDtype(pa.large_string()) as the BigFrames dtype and pa.large_string() as the pyarrow dtype, passes that information through BigFrames to determine the output type, and parses the string back to JSON at the end.
It is a workaround on top of the existing JSON-as-STRING workaround. We shall rework JSON support once pyarrow brings in a native JSON type.
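A minimal sketch of that pass-through, with assumed helper names (the real logic lives in the bigframes I/O and SQL-compilation paths):

import pandas as pd
import pyarrow as pa

# JSON is carried inside BigFrames as large_string, since pandas/pyarrow
# have no native JSON type yet.
JSON_DTYPE = pd.ArrowDtype(pa.large_string())

def read_json_values(values):
    # Read path: keep the JSON text, but remember its JSON-ness via the dtype.
    return pd.Series(values, dtype=JSON_DTYPE)

def output_sql(column, dtype):
    # Write path (hypothetical helper): parse the string back to BigQuery
    # JSON so the destination table gets a JSON column, not STRING.
    if dtype == JSON_DTYPE:
        return f"PARSE_JSON({column}) AS {column}"
    return column

s = read_json_values(['{"a": 1}'])
print(output_sql("col", s.dtype))  # PARSE_JSON(col) AS col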