docs: add python code sample to multiple timeseries forecasting#531
docs: add python code sample to multiple timeseries forecasting#531
Conversation
|
Here is the summary of changes. You are about to add 2 region tags.
This comment is generated by snippet-bot.
|
|
Corrected region tags, will be updated on next PR! |
| # Start by selecting the data you'll use for training. `read_gbq_table` accepts | ||
| # either a SQL query or a table ID. Since this example selects from multiple | ||
| # tables via a wildcard, use SQL to define this data. Watch issue | ||
| # https://github.com/googleapis/python-bigquery-dataframes/issues/169 |
| # https://github.com/googleapis/python-bigquery-dataframes/issues/169 | ||
| # for updates to `read_gbq_table` to support wildcard tables. | ||
|
|
||
| df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) |
There was a problem hiding this comment.
If you don't need filters, no need to pass in.
There was a problem hiding this comment.
ok, will correct.
| # https://github.com/googleapis/python-bigquery-dataframes/issues/169 | ||
| # for updates to `read_gbq_table` to support wildcard tables. | ||
|
|
||
| df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) |
There was a problem hiding this comment.
Why use read_gbq_table instead of more general API read_gbq?
| num_trips = features.groupby(["date"], as_index=False).count() | ||
| model = forecasting.ARIMAPlus() | ||
|
|
||
| X = num_trips["date"].to_frame() |
There was a problem hiding this comment.
ok, will update that.
There was a problem hiding this comment.
Looks like we might need .to_frame() because without it, I see
AttributeError Traceback (most recent call last)
Cell In[12], line 18
15 X = num_trips["date"]
16 y = num_trips["num_trips"]
---> 18 model.fit(X, y)
19 # The model.fit() call above created a temporary model.
20 # Use the to_gbq() method to write to a permanent location.
22 model.to_gbq(
23 your_model_id, # For example: "bqml_tutorial.sample_model",
24 replace=True,
25 )
File ~/python-bigquery-dataframes/bigframes/ml/base.py:163, in SupervisedTrainablePredictor.fit(self, X, y)
158 def fit(
159 self: _T,
160 X: Union[bpd.DataFrame, bpd.Series],
161 y: Union[bpd.DataFrame, bpd.Series],
162 ) -> _T:
--> 163 return self._fit(X, y)
File ~/python-bigquery-dataframes/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
43 add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)
File ~/python-bigquery-dataframes/bigframes/ml/forecasting.py:218, in ARIMAPlus._fit(self, X, y, transforms)
197 def _fit(
198 self,
199 X: Union[bpd.DataFrame, bpd.Series],
200 y: Union[bpd.DataFrame, bpd.Series],
201 transforms: Optional[List[str]] = None,
202 ):
203 """Fit the model to training data.
204
205 Args:
(...)
216 ARIMAPlus: Fitted estimator.
217 """
--> 218 if X.columns.size != 1:
219 raise ValueError(
220 "Time series timestamp input X must only contain 1 column."
221 )
222 if y.columns.size != 1:
File ~/python-bigquery-dataframes/bigframes/series.py:1062, in Series.__getattr__(self, key)
1053 raise AttributeError(
1054 textwrap.dedent(
1055 f"""
(...)
1059 )
1060 )
1061 else:
-> 1062 raise AttributeError(key)
AttributeError: columns
| your_model_id, # For example: "bqml_tutorial.sample_model", | ||
| replace=True, | ||
| ) | ||
| # ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY', |
There was a problem hiding this comment.
Why code below are all commented out?
| # either a SQL query or a table ID. Since this example selects from multiple | ||
| # tables via a wildcard, use SQL to define this data. Watch issue | ||
| # https://github.com/googleapis/python-bigquery-dataframes/issues/169 | ||
| # for updates to `read_gbq_table` to support wildcard tables. | ||
|
|
||
| df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) |
There was a problem hiding this comment.
This is just reading a regular table. We don't need to explain wildcard tables.
Also, the default filter is no filters, so we don't need to pass that in.
| # either a SQL query or a table ID. Since this example selects from multiple | |
| # tables via a wildcard, use SQL to define this data. Watch issue | |
| # https://github.com/googleapis/python-bigquery-dataframes/issues/169 | |
| # for updates to `read_gbq_table` to support wildcard tables. | |
| df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) | |
| # either a SQL query or a table ID. | |
| df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips") |
There was a problem hiding this comment.
Ok, will edit the filter but removing from this section that doesn't have one and removing the read gbp table explanation.
| @@ -0,0 +1,106 @@ | |||
| def test_multiple_timeseries_forecasting_model(random_model_id): | |||
| # [START bigquery_dataframes_bqml_create_data__set] | |||
There was a problem hiding this comment.
As discussed in our 1:1, these region tags need to be globally unique, so let's borrow from the URL of the tutorial (https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial) to construct these.
For example:
| # [START bigquery_dataframes_bqml_create_data__set] | |
| # [START bigquery_dataframes_bqml_arima_multiple_step_2_visualize] |
There was a problem hiding this comment.
yes, region tags will follow URL.
| # [END bigquery_dataframes_bqml_create_data__set(1)] | ||
|
|
||
| # [START bigquery_dataframes_bqml_visualize_time_series_to_forecast] |
There was a problem hiding this comment.
These two region tags can be combined since they are both for step 2.
There was a problem hiding this comment.
ok, makes sense! is "/ " or "&" the preference?
| # Use the to_gbq() method to write to a permanent location. | ||
|
|
||
| model.to_gbq( | ||
| your_model_id, # For example: "bqml_tutorial.sample_model", |
There was a problem hiding this comment.
Let's stay consistent with the SQL with regards to suggested model ID. See step 3: https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial#arima-single-model
| your_model_id, # For example: "bqml_tutorial.sample_model", | |
| your_model_id, # For example: "bqml_tutorial.nyc_citibike_arima_model", |
There was a problem hiding this comment.
OOH, good catch! Yes, will correct this.
| # ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY', | ||
| # max_time_series_length=3, min_time_series_length=20, | ||
| # time_series_length_fraction=1.0, trend_smoothing_window_size=-1) |
There was a problem hiding this comment.
Not sure what this comment is from. Let's remove it.
| # ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY', | |
| # max_time_series_length=3, min_time_series_length=20, | |
| # time_series_length_fraction=1.0, trend_smoothing_window_size=-1) |
|
|
||
| # [END bigquery_dataframes_bqml_visualize_time_series_to_forecast] | ||
|
|
||
| # [START bigquery_dataframes_bqml_visualize_time_series_to_forecast] |
There was a problem hiding this comment.
As discussed in our 1:1, EXPLAIN_FORECAST isn't implemented yet in bigframes. Please file an issue if you haven't already and remove this sample for now.
There was a problem hiding this comment.
Yes, will remove and file request.
tswast
left a comment
There was a problem hiding this comment.
Thanks! Will wait for e2e tests with code samples to pass before merging.
|
e2e tests failed with which appear to be unrelated. |
BEGIN_COMMIT_OVERRIDE
docs: Add python code sample for multiple forecasting time series (#531)
END_COMMIT_OVERRIDE
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕