
feat!: Enable reading JSON data with dbjson extension dtype #1139

Merged
tswast merged 6 commits into main from main_chelsealin_readdbjsontype
Jan 23, 2025

@chelsea-lin chelsea-lin commented Nov 7, 2024

feat!: Enable reading JSON data with dbjson extension dtype (#1139)

This change updates how we handle JSON data types read from BigQuery.

Previously, BigQuery JSON types were treated as generic large strings within our system. To improve accuracy and functionality, we now map them to a dedicated JSON data type (db_dtypes.JSONType or db_dtypes.JSONArrowType for pyarrow).

While this provides a more appropriate representation of JSON data, it's important to note that this feature is still in preview and may evolve.

Release-As: 1.34.0

  • Fixes internal issue 377764399
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 377764399 🦕
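
The motivation in the description above (JSON handled as a generic string versus as a dedicated JSON type) can be illustrated with a minimal standard-library sketch. It does not use BigQuery DataFrames or `db_dtypes` and makes no claim about how either library compares or stores values; the `normalize` helper is hypothetical and exists only for this illustration.

```python
import json

# Two textually different encodings of the same JSON document.
raw_a = '{"a": 1, "b": [1, 2]}'
raw_b = '{"b":[1,2],"a":1}'

# Treated as generic strings, the two values compare as different.
strings_equal = raw_a == raw_b

# Parsed as JSON, they compare as the same document.
parsed_equal = json.loads(raw_a) == json.loads(raw_b)


def normalize(raw: str) -> str:
    """Hypothetical helper: compact, key-sorted serialization of a JSON string."""
    return json.dumps(json.loads(raw), separators=(",", ":"), sort_keys=True)
```

With a string representation, whitespace and key order leak into comparisons unless every value is normalized first, as `normalize` does here; a JSON-aware type lets the type itself carry that meaning.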

@chelsea-lin chelsea-lin requested a review from tswast November 7, 2024 21:24
@product-auto-label product-auto-label bot added the "size: s" (Pull request size is small.) and "api: bigquery" (Issues related to the googleapis/python-bigquery-dataframes API.) labels Nov 7, 2024
@chelsea-lin chelsea-lin changed the title feat: Enable reading/writing JSON data with dbjson extension dtype feat: Enable reading JSON data with dbjson extension dtype Nov 7, 2024
@tswast tswast changed the title feat: Enable reading JSON data with dbjson extension dtype feat!: Enable reading JSON data with dbjson extension dtype Nov 8, 2024

@tswast tswast left a comment

Very cool!

Let's make sure we mark this as a "breaking change" in our release notes (https://github.com/googleapis/release-please/blob/main/README.md#how-should-i-write-my-commits)

Since it's a breaking change for a preview feature, we shouldn't bump to 2.0 though. Let's use the Release-As footer in the commit message to make sure we do a 1.x release. https://github.com/googleapis/release-please/blob/main/README.md#how-do-i-change-the-version-number
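
As a rough illustration of the two commit-message conventions mentioned in this comment (the `!` marker for a breaking change and the `Release-As` footer), here is a minimal sketch. It is not release-please's parser: the regular expressions and the `inspect_commit` function are assumptions made for this example, and the sketch ignores other conventions such as a `BREAKING CHANGE:` footer.

```python
import re

# Header shape assumed for this sketch: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^)]*)\))?(?P<bang>!)?: (?P<desc>.+)$"
)
# Footer line of the form "Release-As: <version>".
FOOTER = re.compile(
    r"^Release-As:\s*(?P<version>\S+)\s*$", re.MULTILINE | re.IGNORECASE
)


def inspect_commit(message: str) -> dict:
    """Hypothetical helper: report the type, breaking marker, and Release-As version."""
    lines = message.splitlines()
    header = HEADER.match(lines[0]) if lines else None
    footer = FOOTER.search(message)
    return {
        "type": header.group("type") if header else None,
        "breaking": bool(header and header.group("bang")),
        "release_as": footer.group("version") if footer else None,
    }


message = """feat!: Enable reading JSON data with dbjson extension dtype (#1139)

This change updates how we handle JSON data types read from BigQuery.

Release-As: 1.34.0
"""
info = inspect_commit(message)
```

For the commit message used in this pull request, the sketch reports a breaking `feat` commit pinned to version 1.34.0.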

@chelsea-lin chelsea-lin force-pushed the main_chelsealin_readdbjsontype branch from 7c81975 to 48ca926 Compare November 13, 2024 19:43
@product-auto-label product-auto-label bot added the "size: m" (Pull request size is medium.) label and removed the "size: s" (Pull request size is small.) label Nov 13, 2024
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_readdbjsontype branch from 48ca926 to 2707038 Compare January 22, 2025 18:22
@product-auto-label product-auto-label bot added the "size: l" (Pull request size is large.) label and removed the "size: m" (Pull request size is medium.) label Jan 22, 2025
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_readdbjsontype branch from 2707038 to 6e1aacc Compare January 22, 2025 19:39
@chelsea-lin chelsea-lin marked this pull request as ready for review January 22, 2025 19:40
@chelsea-lin chelsea-lin requested a review from a team as a code owner January 22, 2025 19:40
@chelsea-lin chelsea-lin requested review from a team and jialuoo January 22, 2025 19:40
@chelsea-lin chelsea-lin requested a review from tswast January 23, 2025 18:18
Comment on lines -206 to -211
# b/381148539
def test_json_in_struct():
    df = bpd.read_gbq(
        "SELECT STRUCT(JSON '{\\\"a\\\": 1}' AS data, 1 AS number) as struct_col"
    )
    assert df["struct_col"].struct.field("data")[0] == '{"a":1}'

Should we keep / update this test, instead? I'd like to make sure we avoid regressions since I believe this was added to make sure we can work with some AI/ML/ObjectRef features.

@chelsea-lin (author) replied:

Yes, I moved this test to test_dataframe_io.py. I also added similar tests for both struct and array.

Co-authored-by: Tim Sweña (Swast) <swast@google.com>
@tswast tswast merged commit f672262 into main Jan 23, 2025
22 checks passed
@tswast tswast deleted the main_chelsealin_readdbjsontype branch January 23, 2025 22:33
shuoweil pushed a commit that referenced this pull request Jan 24, 2025
This change updates how we handle JSON data types read from BigQuery.

Previously, BigQuery JSON types were treated as generic large strings within our system. To improve accuracy and functionality, we now map them to a dedicated JSON data type (db_dtypes.JSONType or db_dtypes.JSONArrowType for pyarrow).

While this provides a more appropriate representation of JSON data, it's important to note that this feature is still in preview and may evolve.

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Release-As: 1.34.0
Labels

api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.)
size: l (Pull request size is large.)
