Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

escape_gt_in_attrs #201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
da2x opened this issue Aug 5, 2015 · 5 comments
Open

escape_gt_in_attrs #201

da2x opened this issue Aug 5, 2015 · 5 comments

Comments

@da2x
Copy link

da2x commented Aug 5, 2015

Why is there no escape_gt_in_attrs counterpart of escape_lt_in_attrs? Optimistic regex tag-soup readers do negative lookaheads for > to determine the closing point of a tag. They’re just as important to escape in atts as < for compatibility with everything.

Authors can’t work around this because <elm attr="&lt;example&gt;"> is always turned into <elm attr="&lt;example>">

@gsnedders
Copy link
Member

escape_lt_in_attrs is needed for some legacy browsers to parse the output correctly (old Gecko and WebKit) when they are unquoted, hence why it exists (and probably should cease to exist as a distinct option once #11 is fixed).

I'm not sure I really want to start adding hacks for regexp parsers given they're almost certainly going to be broken by some reasonable output. Is it anything particularly notable or what is it?

@cjerdonek
Copy link

cjerdonek commented May 4, 2017

FYI, one more data point on this. I just discovered that html-tidy has a built-in limitation where it will error out if it uncovers more than 10 of the following characters in an attribute value: \n, <, and >. So escaping only left-angle brackets (and even right-angle brackets, too) isn't enough for passing to html-tidy.

Here is the issue I just filed about it: htacg/tidy-html5#542

If it's straightforward to add a custom filter to escape characters like this, I assume that would be enough.

@gsnedders
Copy link
Member

Maybe we should make escaping pluggable, though I wonder what the perf cost of that would be… This isn't the first thing it'd help with.

@cjerdonek
Copy link

cjerdonek commented May 6, 2017

For sure, I think it can be done. It's mainly a question of the API you want and which aspects you want pluggable.

All of the behavior we're discussing in this issue happens in these three lines:

v = v.replace("&", "&amp;")
if self.escape_lt_in_attrs:
    v = v.replace("<", "&lt;")

One possible API would be to have an instance attribute that stores the simple escapes. It could be a list of pairs (old, new) with name and default value like so--

self.attribute_replacements = [
    ("&", "&amp;")
].

Then the three lines up top would become:

for args in self.attribute_replacements:
    v = v.replace(*args)

This approach wouldn't impact performance in any noticeable way.

@gsnedders
Copy link
Member

FWIW: my view at this point is adding escape_gt_in_attrs is fine, but someone should write a few tests for it (mostly looking at how it combines with quote_attrs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants