Conversation
Question: Does this change anything with regard to I/O? For example, I notice with … How did you test?

Answer: I tried a pandas DataFrame with the pa.large_string dtype; it is interpreted as a normal string. It shouldn't affect input as large_string, since we convert it to a normal string in our I/O. Screenshots: screen/763yZzyywmEVWbR (destination table: screen/4u6scyqtczXSAK8)
It was part of my debugger step-through of #371, but it's possible I accidentally changed something in the I/O path there.
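For reference, a minimal sketch of the round-trip check described above, assuming a configured bigframes session (the column name and values are illustrative):

import pandas as pd
import pyarrow as pa

import bigframes.pandas as bpd

# A pandas column declared as Arrow large_string; per the comment above,
# the I/O path should normalize it to a plain string dtype on read.
pdf = pd.DataFrame(
    {"col": pd.array(["a", "b"], dtype=pd.ArrowDtype(pa.large_string()))}
)
bdf = bpd.read_pandas(pdf)
print(bdf.dtypes)  # expected: string[pyarrow], not large_string[pyarrow]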
  if any(i.field_type == "JSON" for i in table.schema if i.name in schema.names):
      warnings.warn(
-         "Interpreting JSON column(s) as StringDtype. This behavior may change in future versions.",
+         "Interpreting JSON column(s) as StringDtype and pyarrow.large_string. This behavior may change in future versions.",
Shouldn't we just say "... as pyarrow.large_string"? StringDtype here is not relevant anymore, no?
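If so, the message might read something like this (hypothetical wording, not a committed change):

warnings.warn(
    "Interpreting JSON column(s) as pyarrow.large_string. "
    "This behavior may change in future versions.",
)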
  T1 AS (
      SELECT *,
-         JSON_OBJECT(
+         TO_JSON_STRING(JSON_OBJECT(
Do we need an explicit TO_JSON_STRING? Doesn't the change in ibis_types.py::ibis_dtype_to_bigframes_dtype take care of this post-read?
if isinstance(ibis_dtype, ibis_dtypes.JSON):
    # Post-read, BigQuery JSON columns map straight to the BigFrames JSON dtype.
    ...
    return bigframes.dtypes.JSON_DTYPE
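An illustrative sketch (not the actual bigframes source; names beyond those quoted above are assumptions) of how that post-read mapping would make an explicit TO_JSON_STRING redundant for typing purposes:

import ibis.expr.datatypes as ibis_dtypes
import pandas as pd
import pyarrow as pa

# Assumed stand-in for bigframes.dtypes.JSON_DTYPE, per the PR description below.
JSON_DTYPE = pd.ArrowDtype(pa.large_string())

def ibis_dtype_to_bigframes_dtype(ibis_dtype):
    # If ibis already reports the column as JSON, the dtype mapping alone
    # selects the JSON dtype; no TO_JSON_STRING is needed in the SQL.
    if isinstance(ibis_dtype, ibis_dtypes.JSON):
        return JSON_DTYPE
    raise NotImplementedError(f"unhandled ibis dtype: {ibis_dtype}")

print(ibis_dtype_to_bigframes_dtype(ibis_dtypes.JSON()))  # large_string[pyarrow]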
Patches the JSON output as the BigQuery JSON type instead of STRING, which avoids the JSON destination-table error in b/381148539 and unblocks the Multimodal integrations.
Our current implementation converts JSON to STRING entirely when reading into BigFrames. This change uses pd.ArrowDtype(pa.large_string()) as the BigFrames dtype and pa.large_string() as the pyarrow dtype, passes that information through BigFrames to determine the output type, and parses the string back to JSON at the end.
It is a workaround on top of the existing JSON-as-STRING workaround. We shall rework JSON support once pyarrow brings in a native JSON type.
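A minimal sketch of that pass-through, with assumed helper names (the real logic lives in the bigframes I/O and SQL-compilation paths):

import pandas as pd
import pyarrow as pa

# JSON is carried inside BigFrames as large_string, since pandas/pyarrow
# have no native JSON type yet.
JSON_DTYPE = pd.ArrowDtype(pa.large_string())

def read_json_values(values):
    # Read path: keep the JSON text, but remember its JSON-ness via the dtype.
    return pd.Series(values, dtype=JSON_DTYPE)

def output_sql(column, dtype):
    # Write path (hypothetical helper): parse the string back to BigQuery
    # JSON so the destination table gets a JSON column, not STRING.
    if dtype == JSON_DTYPE:
        return f"PARSE_JSON({column}) AS {column}"
    return column

s = read_json_values(['{"a": 1}'])
print(output_sql("col", s.dtype))  # PARSE_JSON(col) AS col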