docs: add python code sample to multiple timeseries forecasting#531

Merged

tswast merged 8 commits into main from Stephanie446

May 6, 2024

Contributor

DevStephanie commented Mar 27, 2024 •

edited by shobsi

BEGIN_COMMIT_OVERRIDE
docs: Add python code sample for multiple forecasting time series (#531)
END_COMMIT_OVERRIDE

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕


          docs: add python code sample to multiple timeseries forecasting

DevStephanie requested a review from a team as a code owner

March 27, 2024 18:13

DevStephanie requested review from a team, junyazhang and nicain

March 27, 2024 18:13

snippet-bot bot commented Mar 27, 2024 •

Here is the summary of changes.

You are about to add 2 region tags.

samples/snippets/create_multiple_timeseries_forecasting_model.py:19, tag bigquery_dataframes_bqml_arima_multiple_step_2_visualize
samples/snippets/create_multiple_timeseries_forecasting_model.py:72, tag bigquery_dataframes_bqml_arima_multiple_step_3_fit

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

Refresh this comment

product-auto-label bot added the size: m label


          Merge branch 'main' into Stephanie446

product-auto-label bot added api: bigquery samples labels

Your Name added 2 commits

March 27, 2024 13:14


          docs: add python code sample to multiple timeseries forecasting

25e8d63


          Merge branch 'Stephanie446' of https://github.com/googleapis/python-bigquery-dataframes into Stephanie446

37341e5


Contributor Author

DevStephanie commented Mar 27, 2024

Corrected region tags, will be updated on next PR!


          Merge branch 'main' into Stephanie446

1d62110

junyazhang requested review from GarrettWu and ashleyxuu

March 28, 2024 17:36

tswast reviewed

GarrettWu reviewed

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

+                  # Start by selecting the data you'll use for training. `read_gbq_table` accepts
+                  # either a SQL query or a table ID. Since this example selects from multiple
+                  # tables via a wildcard, use SQL to define this data. Watch issue
+                  # https://github.com/googleapis/python-bigquery-dataframes/issues/169

Contributor

GarrettWu Apr 1, 2024

Already fixed

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

+                  # https://github.com/googleapis/python-bigquery-dataframes/issues/169
+                  # for updates to `read_gbq_table` to support wildcard tables.
+                  df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])

Contributor

GarrettWu Apr 1, 2024

If you don't need filters, there's no need to pass them in.

Contributor Author

DevStephanie May 6, 2024

ok, will correct.

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

+                  # https://github.com/googleapis/python-bigquery-dataframes/issues/169
+                  # for updates to `read_gbq_table` to support wildcard tables.
+                  df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])

Contributor

GarrettWu Apr 1, 2024

Why use `read_gbq_table` instead of the more general `read_gbq` API?

samples/snippets/create_multiple_timeseries_forecasting_model.py

+                  num_trips = features.groupby(["date"], as_index=False).count()
+                  model = forecasting.ARIMAPlus()
+                  X = num_trips["date"].to_frame()

Contributor

GarrettWu Apr 1, 2024

to_frame() not needed.

Contributor Author

DevStephanie May 6, 2024

ok, will update that.

Contributor Author

DevStephanie May 6, 2024

Looks like we might need .to_frame() because without it, I see

AttributeError                            Traceback (most recent call last)
Cell In[12], line 18
     15 X = num_trips["date"]
     16 y = num_trips["num_trips"]
---> 18 model.fit(X, y)
     19 # The model.fit() call above created a temporary model.
     20 # Use the to_gbq() method to write to a permanent location.
     22 model.to_gbq(
     23     your_model_id,  # For example: "bqml_tutorial.sample_model",
     24     replace=True,
     25 )

File ~/python-bigquery-dataframes/bigframes/ml/base.py:163, in SupervisedTrainablePredictor.fit(self, X, y)
    158 def fit(
    159     self: _T,
    160     X: Union[bpd.DataFrame, bpd.Series],
    161     y: Union[bpd.DataFrame, bpd.Series],
    162 ) -> _T:
--> 163     return self._fit(X, y)

File ~/python-bigquery-dataframes/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
     42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
     43     add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)

File ~/python-bigquery-dataframes/bigframes/ml/forecasting.py:218, in ARIMAPlus._fit(self, X, y, transforms)
    197 def _fit(
    198     self,
    199     X: Union[bpd.DataFrame, bpd.Series],
    200     y: Union[bpd.DataFrame, bpd.Series],
    201     transforms: Optional[List[str]] = None,
    202 ):
    203     """Fit the model to training data.
    204 
    205     Args:
   (...)
    216         ARIMAPlus: Fitted estimator.
    217     """
--> 218     if X.columns.size != 1:
    219         raise ValueError(
    220             "Time series timestamp input X must only contain 1 column."
    221         )
    222     if y.columns.size != 1:

File ~/python-bigquery-dataframes/bigframes/series.py:1062, in Series.__getattr__(self, key)
   1053     raise AttributeError(
   1054         textwrap.dedent(
   1055             f"""
   (...)
   1059         )
   1060     )
   1061 else:
-> 1062     raise AttributeError(key)

AttributeError: columns
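
The traceback shows `ARIMAPlus._fit` reading `X.columns.size`, which a Series does not have, so `fit` needs a single-column DataFrame. A minimal sketch of that conversion follows. `as_single_column_frame`, `FakeFrame`, and `FakeSeries` are hypothetical names used only for illustration; the stand-in classes exist so the sketch runs without BigQuery access and are not part of bigframes.

```python
def as_single_column_frame(data):
    """Return `data` as a DataFrame-like object with exactly one column.

    Hypothetical helper: relies only on duck typing, so it accepts any
    object exposing pandas-style `columns` or `to_frame()`.
    """
    # Series-like objects have no `columns` attribute; wrap them in a frame.
    if not hasattr(data, "columns"):
        data = data.to_frame()
    if len(data.columns) != 1:
        raise ValueError("Time series input must contain exactly 1 column.")
    return data


# Minimal stand-ins (assumptions, not bigframes classes) so the sketch runs
# without BigQuery access.
class FakeFrame:
    def __init__(self, columns):
        self.columns = list(columns)


class FakeSeries:
    def __init__(self, name):
        self.name = name

    def to_frame(self):
        return FakeFrame([self.name])


X = as_single_column_frame(FakeSeries("date"))        # Series -> one-column frame
y = as_single_column_frame(FakeFrame(["num_trips"]))  # already a frame, unchanged
```

In the sample itself the same effect comes from keeping the explicit call, `X = num_trips["date"].to_frame()`, before `model.fit(X, y)`.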

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

+                      your_model_id,  # For example: "bqml_tutorial.sample_model",
+                      replace=True,
+                  )
+                  # ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY',

Contributor

GarrettWu Apr 1, 2024

Why is all the code below commented out?

tswast requested changes

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

Comment on lines +9 to +14

+                  # either a SQL query or a table ID. Since this example selects from multiple
+                  # tables via a wildcard, use SQL to define this data. Watch issue
+                  # https://github.com/googleapis/python-bigquery-dataframes/issues/169
+                  # for updates to `read_gbq_table` to support wildcard tables.
+                  df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])

Contributor

tswast Apr 1, 2024

This is just reading a regular table. We don't need to explain wildcard tables.

Also, the default filter is no filters, so we don't need to pass that in.

Suggested change

-                  # either a SQL query or a table ID. Since this example selects from multiple
-                  # tables via a wildcard, use SQL to define this data. Watch issue
-                  # https://github.com/googleapis/python-bigquery-dataframes/issues/169
-                  # for updates to `read_gbq_table` to support wildcard tables.
-                  df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])
+                  # either a SQL query or a table ID.
+                  df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips")

Contributor Author

DevStephanie Apr 1, 2024

OK, I'll remove the filter from this section, since it doesn't need one, and remove the `read_gbq_table` explanation.

Contributor Author

DevStephanie May 6, 2024

Resolved
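
For reference, a minimal sketch of the read once this suggestion is applied. The function name `load_citibike_trips` and the `TABLE_ID` constant are illustrative and not part of the sample. The import sits inside the function only so the sketch can be defined without bigframes installed; calling it requires bigframes and Google Cloud credentials.

```python
TABLE_ID = "bigquery-public-data.new_york.citibike_trips"


def load_citibike_trips():
    """Load the public Citi Bike trips table as a BigQuery DataFrame."""
    # Lazy import: an assumption made for this sketch, so that defining the
    # function does not require bigframes to be installed.
    import bigframes.pandas as bpd

    # No `filters` argument: the default already applies no filters.
    # `bpd.read_gbq(TABLE_ID)` also accepts a table ID and is the more
    # general entry point mentioned earlier in the review.
    return bpd.read_gbq_table(TABLE_ID)
```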

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

		@@ -0,0 +1,106 @@
		def test_multiple_timeseries_forecasting_model(random_model_id):
		# [START bigquery_dataframes_bqml_create_data__set]

Contributor

tswast Apr 1, 2024

As discussed in our 1:1, these region tags need to be globally unique, so let's borrow from the URL of the tutorial (https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial) to construct these.

For example:

Suggested change

-                  # [START bigquery_dataframes_bqml_create_data__set]
+                  # [START bigquery_dataframes_bqml_arima_multiple_step_2_visualize]

Contributor Author

DevStephanie Apr 1, 2024

yes, region tags will follow URL.

Contributor Author

DevStephanie May 6, 2024

Corrected
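
The region-tag problems raised in this thread (a tag reused for a second block, or an END that does not match its START) can be caught locally before pushing. The sketch below is a hypothetical helper, not snippet-bot's implementation, and it only checks a single file; global uniqueness across repositories still has to come from the naming scheme. The `check_region_tags` name and the accepted tag characters are assumptions.

```python
import re
from collections import Counter
from typing import List

# Matches comment lines such as "# [START some_tag]" or "# [END some_tag]".
_TAG_RE = re.compile(r"#\s*\[(START|END)\s+([A-Za-z0-9_]+)\]")


def check_region_tags(source: str) -> List[str]:
    """Return a list of human-readable problems found in `source`."""
    problems = []
    open_tags = []      # tags that have a START but no END yet
    starts = Counter()  # how many times each tag has been started
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _TAG_RE.search(line)
        if not match:
            continue
        kind, tag = match.groups()
        if kind == "START":
            starts[tag] += 1
            if starts[tag] > 1:
                problems.append(f"line {lineno}: duplicate START for {tag}")
            open_tags.append(tag)
        elif tag in open_tags:
            open_tags.remove(tag)
        else:
            problems.append(f"line {lineno}: END without START for {tag}")
    problems.extend(f"unclosed tag: {tag}" for tag in open_tags)
    return problems


sample = """\
# [START bigquery_dataframes_bqml_arima_multiple_step_2_visualize]
df = ...
# [END bigquery_dataframes_bqml_arima_multiple_step_2_visualize]
"""
print(check_region_tags(sample))  # prints []
```

A malformed END such as `create_data__set(1)` does not match the pattern, so its START is reported as unclosed.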

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

Comment on lines +16 to +18

		# [END bigquery_dataframes_bqml_create_data__set(1)]

		# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast]

Contributor

tswast Apr 1, 2024

These two region tags can be combined since they are both for step 2.

Contributor Author

DevStephanie Apr 1, 2024

ok, makes sense! is "/ " or "&" the preference?

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

+                  # Use the to_gbq() method to write to a permanent location.
+                  model.to_gbq(
+                      your_model_id,  # For example: "bqml_tutorial.sample_model",

Contributor

tswast Apr 1, 2024

Let's stay consistent with the SQL with regards to suggested model ID. See step 3: https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial#arima-single-model

Suggested change

-                      your_model_id,  # For example: "bqml_tutorial.sample_model",
+                      your_model_id,  # For example: "bqml_tutorial.nyc_citibike_arima_model",

Contributor Author

DevStephanie Apr 1, 2024

OOH, good catch! Yes, will correct this.

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated

Comment on lines +92 to +94

+                  # ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY',
+                  # max_time_series_length=3, min_time_series_length=20,
+                  # time_series_length_fraction=1.0, trend_smoothing_window_size=-1)

Contributor

tswast Apr 1, 2024

Not sure what this comment is from. Let's remove it.

Suggested change

-                  # ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY',
-                  # max_time_series_length=3, min_time_series_length=20,
-                  # time_series_length_fraction=1.0, trend_smoothing_window_size=-1)

samples/snippets/create_multiple_timeseries_forecasting_model.py Outdated


		# [END bigquery_dataframes_bqml_visualize_time_series_to_forecast]

		# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast]

Contributor

tswast Apr 1, 2024

As discussed in our 1:1, EXPLAIN_FORECAST isn't implemented yet in bigframes. Please file an issue if you haven't already and remove this sample for now.

Contributor Author

DevStephanie Apr 1, 2024

Yes, will remove and file request.

DevStephanie and others added 2 commits

April 1, 2024 17:47


          Merge branch 'main' into Stephanie446

b4a9c22


          fix: if setting recursion limit fails, still succeed the import

cec173a

DevStephanie requested a review from a team as a code owner

May 6, 2024 20:49


          Merge branch 'main' into Stephanie446

58d08ae

tswast approved these changes

Contributor

tswast left a comment

Thanks! Will wait for e2e tests with code samples to pass before merging.

tswast enabled auto-merge (squash)

May 6, 2024 21:20

tswast merged commit 16866d2 into main

tswast deleted the Stephanie446 branch

May 6, 2024 22:41

release-please bot mentioned this pull request

chore(main): release 1.5.0 #645

Merged

blunderbuss-gcf bot assigned ashleyxuu

Contributor

tswast commented May 7, 2024

e2e tests failed with

FAILED tests/system/large/ml/test_core.py::test_bqml_e2e - AssertionError: Da...
FAILED tests/system/large/ml/test_ensemble.py::test_xgbregressor_default_params
FAILED tests/system/large/ml/test_pipeline.py::test_pipeline_random_forest_classifier_fit_score_predict
FAILED tests/system/large/ml/test_pipeline.py::test_pipeline_xgbregressor_fit_score_predict

which appear to be unrelated.

Reviewers

GarrettWu left review comments

tswast approved these changes

nicain: awaiting requested review (code owner, automatically assigned from googleapis/python-samples-reviewers)

junyazhang: awaiting requested review

ashleyxuu: awaiting requested review

Labels

api: bigquery, samples, size: m