feat: Implement ST_LENGTH geography function#1791
Conversation
This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters. The implementation includes: - A new operation `geo_st_length_op` in `bigframes.operations.geo_ops`. - The user-facing function `st_length` in `bigframes.bigquery._operations.geo`. - Exposure of the new operation and function in relevant `__init__.py` files. - Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs. The function behaves as per the BigQuery ST_LENGTH documentation: - Returns 0 for POINT, MULTIPOINT, and empty GEOGRAPHYs. - Returns the perimeter for POLYGON and MULTIPOLYGON. - Returns the total length for LINESTRING and MULTILINESTRING. - For GEOMETRYCOLLECTION, sums the lengths/perimeters of its constituent linestrings and polygons.
This commit adds a `length` property to the `GeoSeries` class. Accessing this property will raise a `NotImplementedError`, guiding you to utilize the `bigframes.bigquery.st_length()` function instead. This change includes: - The `length` property in `bigframes/geopandas/geoseries.py`. - A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed.
This commit applies a user-provided patch that includes: - Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`. - Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`. - Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`. - Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`. Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment.
This commit introduces the `use_spheroid` parameter to the `ST_LENGTH`
geography function, aligning it more closely with the BigQuery
ST_LENGTH(geography_expression[, use_spheroid]) signature.
Key changes:
- `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass
that accepts `use_spheroid` (defaulting to `False`). A check is
included to raise `NotImplementedError` if `use_spheroid` is `True`,
as this is the current limitation in BigQuery.
- The Ibis compiler implementation for `geo_st_length_op` in
`bigframes.core.compile.scalar_op_compiler.py` has been updated
to accept the new `GeoStLengthOp` operator type.
- The user-facing `st_length` function in
`bigframes.bigquery._operations.geo.py` now includes the
`use_spheroid` keyword argument.
- The docstring for `st_length` has been updated to match the
official BigQuery documentation, clarifying that only lines contribute
to the length (points and polygons result in 0 length), and
detailing the `use_spheroid` parameter. Examples have been
updated accordingly.
- Tests in `tests/system/small/bigquery/test_geo.py` have been
updated to:
- Reflect the correct behavior (0 length for polygons/points).
- Test calls with both default `use_spheroid` and explicit
`use_spheroid=False`.
- Verify that `use_spheroid=True` raises a `NotImplementedError`.
Note: System tests requiring Google Cloud authentication were not
re-executed for this specific commit due to environment limitations
identified in previous steps. The changes primarily affect the operator
definition, function signature, and client-side validation, with the
core Ibis compilation logic for length remaining unchanged.
This commit refactors the ST_LENGTH implementation to correctly
pass the `use_spheroid` parameter to BigQuery by using Ibis's
`ibis_udf.scalar.builtin('ST_LENGTH', ...)` function.
Key changes:
- `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side
`NotImplementedError` for `use_spheroid=True` (raised in
`__post_init__`) has been removed. BigQuery DataFrames will now
pass this parameter directly to BigQuery.
- `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`:
The implementation now always uses
`ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)`
instead of `x.length()`. This ensures the `use_spheroid`
parameter is included in the SQL generated for BigQuery.
- `tests/system/small/bigquery/test_geo.py`:
- The test expecting a client-side `NotImplementedError` for
`use_spheroid=True` has been removed.
- A new test `test_st_length_use_spheroid_true_errors_from_bq`
has been added. This test calls `st_length` with
`use_spheroid=True` and asserts that an exception is raised
from BigQuery, as BigQuery itself currently only supports
`use_spheroid=False` for the `ST_LENGTH` function.
- Existing tests for `st_length` were already updated in a
previous commit to reflect that only line geometries contribute
to the length, and these continue to verify behavior with
`use_spheroid=False`.
This change ensures that BigQuery DataFrames accurately reflects BigQuery's
`ST_LENGTH` capabilities concerning the `use_spheroid` parameter.
This commit refactors the ST_LENGTH geography operation to use an Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with the pattern exemplified by other built-in functions like ST_DISTANCE when a direct Ibis method with all necessary parameters is not available. Key changes: - A new `st_length` function is defined in `bigframes/core/compile/scalar_op_compiler.py` using `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's `ST_LENGTH(geography, use_spheroid)` function. - The `geo_length_op_impl` in the same file now calls this `st_length` Ibis UDF, replacing the previous use of `op_typing.ibis_function`. - The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and the user-facing `st_length` function in `bigframes/bigquery/_operations/geo.py` remain unchanged from the previous version, as they correctly define the operation's interface and parameters. This change provides a cleaner and more direct way to map the BigQuery DataFrames operation to the specific BigQuery ST_LENGTH SQL function signature, while maintaining the existing BigQuery DataFrames operation structure. The behavior of the `st_length` function, including its handling of the `use_spheroid` parameter and error conditions from BigQuery, remains the same.
This commit refactors the system tests for the `st_length` geography function in `tests/system/small/bigquery/test_geo.py`. The numerous individual test cases for different geometry types have been combined into a single, comprehensive test function `test_st_length_various_geometries`. This new test uses a single GeoSeries with a variety of inputs (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, None/Empty) and compares the output of `st_length` (with both default and explicit `use_spheroid=False`) against a pandas Series of expected lengths. This consolidation improves the conciseness and maintainability of the tests for `st_length`. The test for `use_spheroid=True` (expecting an error from BigQuery) remains separate.
This commit fixes an ImportError caused by an incorrect name being used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`. When `geo_st_length_op` (a variable) was replaced by the dataclass `GeoStLengthOp`, the import and `__all__` list in this `__init__.py` file were not updated. This commit changes the import from `.geo_ops` to correctly import `GeoStLengthOp` and updates the `__all__` list to export `GeoStLengthOp`.
|
@chelsea-lin This is mostly generated via jules.google.com/ but I patched some of the tests to make it work. I think this now ready for review. |
| ) -> bigframes.series.Series: | ||
| """Returns the total length in meters of the lines in the input GEOGRAPHY. | ||
|
|
||
| If geography_expression is a point or a polygon, returns zero. If |
There was a problem hiding this comment.
Update geography_expression into series
| bigframes.series.Series: | ||
| Series of floats representing the lengths in meters. | ||
| """ | ||
| series = series._apply_unary_op(ops.GeoStLengthOp(use_spheroid=use_spheroid)) |
There was a problem hiding this comment.
IIUC from the docstring, we don't support for a True value of use_spheroid. Can you please throw a NotImplementedError here?
There was a problem hiding this comment.
btw, why we cannot support a True value of use_spheroid here?
There was a problem hiding this comment.
This is a server-side limitation: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length
I'd rather avoid any client-side checks, as maybe someday the server side will implement this feature.
I'm not sure why they have the parameter at all if it's not implemented, to be honest.
|
Looks like this broke some other functions. I need to take a closer look: |
This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters.
The implementation includes:
geo_st_length_opinbigframes.operations.geo_ops.st_lengthinbigframes.bigquery._operations.geo.__init__.pyfiles.The function behaves as per the BigQuery ST_LENGTH documentation:
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕