Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content
This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

bug 1431820: use Django middleware for handling ETag/Last-Modified #4647

Merged
merged 3 commits into from
Feb 7, 2018
Merged

bug 1431820: use Django middleware for handling ETag/Last-Modified #4647

merged 3 commits into from
Feb 7, 2018

Conversation

escattone
Copy link
Contributor

@escattone escattone commented Jan 31, 2018

This PR does the following:

  • Sets the USE_ETAGS setting to True. This enables the ETag functionality within Django's CommonMiddleware, which includes both computing/adding the ETag header to all responses (even responses to non-GET requests) as well as handling conditional GET requests made using the IF-NONE-MATCH header (the IF-UNMODIFIED-SINCE header is not handled here). The fact that the ETag header is added to all responses is a shortcoming of Django 1.8. Another shortcoming is that 304 ("Not Modified") responses due to an ETag match clear most of the headers, so headers like Cache-Control, ETag, Last-Modified, and Vary are lost. I don't think this will cause any issues, but I'm not certain. When we move to Django 1.11, the USE_ETAGS setting should be deleted, and this deprecated functionality within CommonMiddleware will no longer be used.

  • Adds Django's ConditionalGetMiddleware to the list of MIDDLEWARE_CLASSES. In Django 1.8, this adds handling of the IF-NONE-MATCH and IF-UNMODIFIED-SINCE headers, but does not actually generate and add the ETag header to the response (that's done by CommonMiddleware as explained above). When we move to Django 1.11, this middleware will assume full responsibility for generating/handling the ETag, and will be free of the shortcomings mentioned in the first point above (properly preserving some headers for 304 responses, and only generating/handling ETag headers for GET requests).

  • Removes the few last_modified and etag view decorators that were used. The etag decorator is fully replaced by the middleware above (but of course still valid and available for those endpoints that have a faster way to generate the ETag value), while the ConditionalGetMiddleware only replaces the conditional handling portion of the last_modified decorator (the Last-Modified header must still be generated and added to the response by an endpoint).

  • Removes header checks in tests for 304 responses due to an ETag match (see first point above).

@@ -273,4 +273,3 @@ def test_raw_file_if_modified_since(client, settings, file_attachment):
assert 'Cache-Control' in response
assert 'public' in response['Cache-Control']
assert 'max-age=900' in response['Cache-Control']
assert 'Vary' not in response
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the Vary header was now in the response with a value of Accept-Encoding. I'm not sure if it's worth checking any of the headers when the response is a 304.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'm OK if this now varies by encoding, not worth worrying about.

# Last-Modified header). Django's ConditionalGetMiddleware, uses both the ETag
# and Last-Modified headers to handle conditional GET requests.
#
# IMPORTANT NOTE: When we move to Django 1.11, the USE_ETAGS setting is no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing can be in TODO:

@codecov-io
Copy link

codecov-io commented Feb 3, 2018

Codecov Report

Merging #4647 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4647      +/-   ##
==========================================
+ Coverage   95.32%   95.32%   +<.01%     
==========================================
  Files         260      260              
  Lines       23534    23538       +4     
  Branches     1688     1692       +4     
==========================================
+ Hits        22434    22438       +4     
  Misses        888      888              
  Partials      212      212
Impacted Files Coverage Δ
kuma/attachments/tests/test_views.py 100% <ø> (ø) ⬆️
kuma/wiki/tests/test_views_code.py 100% <100%> (ø) ⬆️
kuma/core/tests/test_middleware.py 100% <100%> (ø) ⬆️
kuma/core/middleware.py 87.61% <100%> (+1.02%) ⬆️
kuma/settings/common.py 92.54% <100%> (+0.02%) ⬆️
kuma/wiki/tests/test_views_document.py 99.27% <100%> (+0.02%) ⬆️
kuma/attachments/views.py 85% <100%> (-0.72%) ⬇️
kuma/wiki/views/code.py 100% <100%> (ø) ⬆️
kuma/wiki/views/document.py 90.31% <100%> (-0.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27b77dc...04bbe0e. Read the comment docs.

@escattone
Copy link
Contributor Author

escattone commented Feb 3, 2018

I feel like I stepped into a Django 1.8 minefield with this PR, but I think I've survived.

The second commit of this PR does several things:

  • It fixes an issue in Django 1.8 with the way the GZipMiddleware and CommonMiddleware/ConditionalGetMiddleware fail to play together nicely when using the ETag header for conditional requests. Django's GZipMiddleware actually mangles the ETag header (inserts ;gzip into the quoted ETag value) when compressing, effectively eliminating the possibility of 304 or 412 responses for subsequent conditional requests.
  • It improves all of the existing ETag-based conditional tests by checking that the ETag header returned from a gzipped request (the normal case) will return a 304 or 412 for subsequent conditional IF_NONE_MATCH or IF_MATCH requests. All of these tests will now fail if the GZipMiddleware and CommonMiddleware/ConditionalGetMiddleware fail to play nicely together.
  • It paves the way for a smooth transition to Django 1.11. When moving to Django 1.11, do the following:
    • Delete the USE_ETAGS setting.
    • Delete the kuma.core.middleware.GZipMiddleware class and its associated tests.
    • Replace kuma.core.middleware.GZipMiddleware with Django's GZipMiddleware in settings.MIDDLEWARE_CLASSES assuming MIDDLEWARE_CLASSES continues to be used, or the equivalent middleware function if settings.MIDDLEWARE takes its place.
    • That's it. All of the etag-based conditional tests have been modified such that they should work with Django 1.11 as well.

Copy link
Contributor

@jwhitlock jwhitlock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @escattone! I like getting back on the standard Django code, and I appreciate your research into the Django 1.8 → 1.11 transition.

I've noted a few nits, and some stuff that would be appropriate as follow-on PRs or commits, your choice.

@@ -273,4 +273,3 @@ def test_raw_file_if_modified_since(client, settings, file_attachment):
assert 'Cache-Control' in response
assert 'public' in response['Cache-Control']
assert 'max-age=900' in response['Cache-Control']
assert 'Vary' not in response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'm OK if this now varies by encoding, not worth worrying about.

HttpResponseRedirect,
HttpResponseForbidden,
HttpResponsePermanentRedirect,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer hanging indents, but aligned indents are more common in the Kuma code. So, this is a nit out of responsibility. I look forward to the day when style linting is closer to the top of the priority list.

Copy link
Contributor Author

@escattone escattone Feb 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had always used hanging indents until my last job, where we started using aligned indents because they enabled future edits to only affect one line (but I don't always like the look of it). I'm not consistent though, since I personally didn't do it long enough to make it habit. With MDN I'm happy to settle on either style, and if you prefer the hanging indent style, let's go with that. It's a greater part of my muscle and visual memory anyway!

@@ -59,6 +54,7 @@ def stream_raw_file(*args):
response['Content-Length'] = rev.file.size
except OSError:
pass
response['Last-Modified'] = convert_to_http_date(rev.created)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 conversion to convert_to_http_date. I'm guessing that the ETag skips streaming responses, so this is still needed.

convert_to_utc can't handle ambiguous dates:

import datetime
from kuma.attachments.utils import convert_to_utc
convert_to_utc(datetime.datetime(2017, 11, 5, 1, 8, 42))

That feels like follow-on work on bug 1173189, and moving date manipulations into kuma/core/utils.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about streaming responses (we could have removed this code that sets the Last-Modified header if it wasn't a streaming response), and good catch about content_to_utc not handling ambiguous dates! I would never have thought of that until it failed. I guess the solution is to follow what you did for the publication date for DocumentsFeed, and assume DST if the timezone is naive? I'll do that in a follow-on PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was on my mind because of PR #4621. I'd use the same strategy of assuming DST and being half-wrong rather than raise an ISE.

The function may be slightly different because that one converts to server timezone, while this one wants UTC. Maybe! I'd have to look at get_current_timezone vs get_default_timezone.

Getting to USE_TZ=1 is on my list, and I think centralizing the timezone stuff in kuma.core.utils would be a good step. This stuff may be a shared function, or two close functions in the same file

@@ -93,14 +102,6 @@ def test_code_sample(code_sample_doc, constance_config, client, settings):
HTTP_IF_NONE_MATCH=current_etag
)
assert response.status_code == 304
assert response['Access-Control-Allow-Origin'] == '*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: test Access-Control-Allow-Origin in a headless test (different PR)

The content in this document's current revision contains multiple HTML
elements with an "id" attribute (or "sections"), and also has a length
greater than or equal to 200, which meets the compression threshold of
the GZipMiddleware.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 200-character minimum why SECTION5 was added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I added SECTION4 solely for that reason.

return response

return render_code_sample(request)
return render(request, 'wiki/code_sample.html', data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 so much cleaner

* use new kuma.core.middleware.GZipMiddleware until Django 1.11
* add tests for kuma.core.middleware.GZipMiddleware
* improve tests of etag-based conditional GET requests to ensure
  that middleware play well with each other (tests correctly fail
  when using django.middleware.gzip.GZipMiddleware instead of
  kuma.core.middleware.GZipMiddleware)
@escattone escattone merged commit 509df1d into mdn:master Feb 7, 2018
@escattone escattone deleted the use-etag-middleware-1431820 branch February 7, 2018 22:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants