SLA.CKERS.ORG
HA.CKERS SLACKING
sla.ckers.org web application security lab forums
Q and A for any cross site scripting information. Feel free to ask away. 
[Theory] Standards to thwart browser quirks
Posted by: Ambush Commander (IP Logged)
Date: January 24, 2007 11:02PM

I've been doing a little thinking on this subject, and while nothing comes to mind right now, I'd like to field the question to you guys.

Some XSS depends on browser quirks: strange parsing behaviors that result in code that shouldn't be valid becoming valid and wreaking havoc. In these cases, no amount of reading a W3C spec will help you: you'll have to field test each browser and figure out where the trick works.

However, I have come to believe that as long as you adhere to the W3C spec, these mysterious behaviors stop manifesting. For example:

<IMG """><SCRIPT>alert("XSS")</SCRIPT>">

...is not, by any stretch, valid HTML. By just submitting this to a parser that tries to make the HTML standards-compliant (like Tidy), you would end up with:

<IMG><SCRIPT>alert("XSS")</script>&gt;

...which is far more palatable to XSS filters. Another example:

<a href="hello@hello.com" a=`"a'style="a; style='<script'></a><script src=http://ha.ckers.org/xss.js?

...becomes (when subjected to Tidy):

<a href="hello@hello.com" a="`"></a>

I am then tempted to say that as long as you keep the output standards-compliant, you don't have to worry about any of the malformed input attacks. At this point, it's simply worrying about extra "functionality" of HTML (which shouldn't be a problem if you're whitelisting).
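To make the idea concrete, here is a minimal sketch in Python of "parse, whitelist, re-serialize". It is not Tidy and not HTML Purifier; the whitelist, the allowed URL schemes, and the names `WhitelistSerializer` and `clean` are assumptions made up for illustration. The output is built only from whitelisted tags, quoted and escaped attributes, and escaped text, and every opened tag gets closed.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration: tag name -> allowed attribute names.
ALLOWED = {"a": {"href"}, "b": set(), "i": set(), "p": set(), "img": {"src", "alt"}}
VOID = {"img"}  # elements that never take a closing tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")  # relative URLs are dropped too


class WhitelistSerializer(HTMLParser):
    """Re-emit only whitelisted markup, always quoted, escaped and closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # > 0 while inside a dropped <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # drop the tag itself; its text content is still escaped below
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED[tag]:
                continue
            if name in ("href", "src") and not value.lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=True))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # close whatever the author left open
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def clean(raw):
    serializer = WhitelistSerializer()
    serializer.feed(raw)
    return serializer.result()


print(clean('<a href=http://example.com/x>Link!'))
```

Because every "<" in the output comes either from a whitelisted tag or from escaped text, the result no longer depends on how a browser would have guessed at the malformed input. Python's parser does not parse exactly like any browser, though, so treat this as a sketch of the principle rather than a vetted filter.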

Comments?

HTML Purifier - Standards Compliant HTML filtering

Re: [Theory] Standards to thwart browser quirks
Posted by: jungsonn (IP Logged)
Date: January 25, 2007 07:50AM

Mkay, that sounds reasonable.

I have never been in a situation where I was forced to let users input HTML, but aside from standards, or just relying on them, I guess whitelisting 'basic' markup should solve a lot. Many sites allow JavaScript; I have no idea why any user needs to have JavaScript, as on MySpace for instance, or why they are allowed to modify the full header. I think if I built a community space, I could let users input HTML, but only the basics that are W3C compliant, and then indeed tidy it up and put it in a large <div></div> only. A lot can be done with basic HTML and external CSS. So I'm a little in doubt about what to think of this yet.



Edited 1 time(s). Last edit at 01/25/2007 07:51AM by jungsonn.

Re: [Theory] Standards to thwart browser quirks
Posted by: hasse (IP Logged)
Date: January 25, 2007 03:17PM

If you're making a community maybe using something like bbcode is a good idea?

Re: [Theory] Standards to thwart browser quirks
Posted by: Ambush Commander (IP Logged)
Date: January 25, 2007 04:10PM

What I'm trying to say, I guess, is that even though you're allowed to use something like <a href=foo.txt>Link!</a>, you should try to fix it so it has quotes anyway. You're allowed to have stray quotes outside of attribute delimiters, but escape them anyway with &quot;. You're allowed to omit closing </li> tags, but close them anyway.

So standards compliance is *necessary* for security, not just a nice optional thing for those web-standards hicks (and yes, I'm one of them).

And yes, we're considering that we need to allow clients to submit HTML. I never liked BBCode: I've called it another completely inconsistent (and of course widely used) markup language that came about because developers were too lazy to parse HTML right. Still, BBCode is nice because it's forgiving: you don't have to worry about escaping square brackets that aren't part of a real tag, and you're forced to explicitly define a good syntax.

Re: [Theory] Standards to thwart browser quirks
Posted by: jungsonn (IP Logged)
Date: January 25, 2007 06:05PM

Hm, maybe off-topic, but I found something really weird. I was busy on a project using Ajax calls to pull content into a page. Really, I'm not lying: the page I pulled contained PHP and some test JavaScript, and guess what? When that file is run through the XHR, the script is in the source code, but it won't execute on the page. *_* Really weird stuff, because it should execute. Anyhow, that was a problem when developing, but now I think I've got a clever way to totally block all javascript_with_javascript for some HTML purify thingy, because it only allows HTML and parsed SQL queries.

It's weird, I know.



Edited 1 time(s). Last edit at 01/25/2007 06:10PM by jungsonn.

Re: [Theory] Standards to thwart browser quirks
Posted by: rsnake (IP Logged)
Date: January 25, 2007 06:14PM

@Ambush Commander - I think you have an okay concept, but there are still situations that even that is too strict for. Not to put too fine a point on it but there are situations where legally you cannot modify the user's content. You can remove it, block it, etc... but you cannot modify it. Trust me, it happens.

So yes, in certain situations that would work, but the other disadvantage is that it can make changes to the content that the user didn't mean. Maybe they meant for the malformed HTML to do something specific, but by tidying it you make it do something slightly different. Although benign you've hurt your consumer's experience. Tough problem.

@Jungsonn - can you elaborate on that last bit there?

Re: [Theory] Standards to thwart browser quirks
Posted by: Ambush Commander (IP Logged)
Date: January 25, 2007 06:22PM

@RSnake
> there are situations where legally you cannot modify the user's content. You can remove it, block it, etc... but you cannot modify it.

This problem would be present in any sort of HTML filter. If this is indeed the case, you'll have to stick to plain text, or stuff it on another domain (sort of like Google Blogger).

> It can make changes to the content that the user didn't mean. Maybe they meant for the malformed HTML to do something specific, but by tidying it you make it do something slightly different. Although benign you've hurt your consumer's experience.

Yeah, that's very tough. I've already had this sort of problem where the DOM extension would randomly drop whitespace or where character entities would be annihilated because HTML Purifier de-escaped them to prevent XSS.

However, I believe these concerns are mitigated by two things: first of all, most of the time users don't use malformed HTML but just use deprecated tags. For most of them it's easy to transparently do a replacement that "looks" the same to the user (although, in reality, the source changed). Second, I'm a strong believer in the caching of filtered text, so that the original is always available. If a filter does something weird, they can always go back and edit to make things work. Most HTML newbies do things this way: type something, see if it works, if it doesn't, try something else. Forget reading the manual!
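A minimal sketch of that second point, assuming an in-memory store: the original text is kept verbatim and only the filtered rendering is cached, so the user can always get their own source back for editing. The names `PostStore`, `save`, `render` and `source_for_editing` are hypothetical, and `html.escape` stands in for a real filter.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Dict


@dataclass
class PostStore:
    """Keeps the user's original text and a cache of the filtered output."""

    filter_fn: Callable[[str], str]
    originals: Dict[int, str] = field(default_factory=dict)
    rendered: Dict[int, str] = field(default_factory=dict)

    def save(self, post_id: int, raw: str) -> None:
        self.originals[post_id] = raw      # never modified
        self.rendered.pop(post_id, None)   # invalidate the stale cache entry

    def render(self, post_id: int) -> str:
        # Filter lazily and cache, so the filter runs once per edit, not per view.
        if post_id not in self.rendered:
            self.rendered[post_id] = self.filter_fn(self.originals[post_id])
        return self.rendered[post_id]

    def source_for_editing(self, post_id: int) -> str:
        return self.originals[post_id]


# Stand-in filter for the example: escape everything.
store = PostStore(filter_fn=escape)
store.save(1, "<b>hi</b>")
print(store.render(1))              # what visitors are served
print(store.source_for_editing(1))  # what the author sees when editing
```

A side benefit of this layout is that upgrading the filter only requires clearing the cache; nothing the user wrote is lost.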

If they made the malformed HTML for a very specific reason, it probably was for XSS. ;-)

@Jungsonn: Agreed with RSnake. What are you saying?

Edited 1 time(s). Last edit at 01/25/2007 06:23PM by Ambush Commander.

Re: [Theory] Standards to thwart browser quirks
Posted by: rsnake (IP Logged)
Date: January 25, 2007 06:48PM

Yes, it applies to any filter, but I'm talking about allowing anything that isn't malicious (all the steps of filtering, but if you find something worth filtering you reject it). So the trick is to not reject anything unless you absolutely find something worth rejecting. By not filtering you don't risk the modification of user input.
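As a rough sketch of "reject, never modify", assuming Python and a deliberately tiny blacklist: the checker only reads the input and returns reasons to reject it. If the list is empty, the caller stores the submission byte for byte. The blacklist contents and the names `RejectingChecker` and `reasons_to_reject` are made up for illustration and are nowhere near complete.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical, incomplete blacklist for illustration only.
BLOCKED_TAGS = {"script", "iframe", "object", "embed", "meta", "link", "base"}
BLOCKED_SCHEMES = ("javascript:", "vbscript:", "data:")


class RejectingChecker(HTMLParser):
    """Collects reasons to reject; never rewrites the input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.reasons: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self.reasons.append(f"blocked tag <{tag}>")
        for name, value in attrs:
            # Strip all whitespace so "java\nscript:" is still recognized.
            compact = "".join((value or "").split()).lower()
            if name.startswith("on"):
                self.reasons.append(f"event handler attribute {name}")
            elif compact.startswith(BLOCKED_SCHEMES):
                self.reasons.append(f"blocked URL scheme in {name}")


def reasons_to_reject(raw: str) -> List[str]:
    checker = RejectingChecker()
    checker.feed(raw)
    checker.close()
    return checker.reasons


submission = '<p style="color:red">hi</p>'
if not reasons_to_reject(submission):
    stored = submission  # accepted verbatim, nothing was changed
```

Note that this inherits both weaknesses of the approach: the blacklist has to track every browser feature, and the checker's parser has to agree with every browser's parser about what the markup means.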

- RSnake
Gotta love it. http://ha.ckers.org

Re: [Theory] Standards to thwart browser quirks
Posted by: Ambush Commander (IP Logged)
Date: January 25, 2007 06:53PM

Isn't what you're proposing a blacklist? Aren't those fundamentally flawed (due to the fact that HTML is flexible and is always growing)?

Re: [Theory] Standards to thwart browser quirks
Posted by: rsnake (IP Logged)
Date: January 25, 2007 06:58PM

Correct! But unfortunately there is no other way with the caveats I've given:

a) cannot be modified by law
b) must contain HTML (not bbcode or any variant of HTML)
c) must allow arbitrary styles and anything else non-malicious
d) can be rejected outright legally

What other options do you have? I agree my proposal is ridiculous and the caveats seem impossible to protect against, but I have seen this exact requirement before (MySpace is a great example - minus the legal issues). I was just making your life harder, that's all. :)

Re: [Theory] Standards to thwart browser quirks
Posted by: Ambush Commander (IP Logged)
Date: January 25, 2007 08:12PM

Well, the legal issues regarding modification of the HTML present the biggest problems, otherwise MySpace would have been completely overrun by now! ;-)

With such draconian legal issues, draconian actions would probably have to be implemented. What I would do is solve the problem client-side. Think about email clients: they need to be able to render a diverse range of HTML without compromising the user of the mail client. Thus, they won't execute any JavaScript and they'll refrain from retrieving external resources, i.e. anything that we would normally protect against by removing it from the text. Just implement something similar and block any other browser.

If you have a spare domain lying around, and you don't mind users DOSing each other with image crashes or infinite JavaScript loops, just stuff the unsafe HTML there. They won't be able to steal cookies since there are no cookies to steal, and the threat is downgraded to that of visiting a random website on the web.

The final proposal requires the most work: create a parser that parses the document while keeping track of the original HTML, verbatim. The parser must be able to programmatically recognize all well-defined malformed HTML like <a href=url.html>. If it runs across a particular corrupt string of HTML, i.e. one that doesn't match its whitelist of "good" corrupt strings, it rejects outright. At the same time, it builds a DOM, which would be identical across all major browsers that the filter supports. This consistency across browsers depends on the ability of the parser to recognize when a corrupt string of HTML would be interpreted differently by different browsers. The DOM would also be created by inlining the HTML into a sandbox HTML document that emulates real-world conditions, so that later on, when analyzing the DOM, we could determine whether the user maliciously broke out of their container to wreak havoc on the document.

After finishing the parsing process successfully, it would then traverse the DOM and validate all the tags and attributes. Blacklisted content would immediately result in failure, and if you want to be lenient you would let everything else through and ensure that your blacklist of browser features is thoroughly up-to-date. Preferably you'd perform a code audit of all open-source browsers for hidden "features". You could also maintain a whitelist, and, for added leniency, use some fuzzy text matching algorithms to detect when a user made a typo.

If the DOM validation process was successful, the HTML is good. It's two stages:

1. Generate a DOM from the document, using well-known behaviors for malformed HTML and rejecting too-badly-formed HTML
2. Parse the DOM for legit features that need to be blocked
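
A toy sketch of those two stages in Python, under heavy assumptions: stage 1 builds a small tree and rejects structure it does not recognize, tolerating only a short list of "well-known" omissions (unquoted attributes, missing </li> and </p>); stage 2 walks the tree looking for blocked features. All names and lists here (`TreeBuilder`, `accept`, `OPTIONAL_END`, `BLOCKED`) are hypothetical, text nodes are ignored, and nothing here models real cross-browser differences.

```python
from html.parser import HTMLParser


class Rejected(Exception):
    pass


VOID = {"br", "img", "hr"}
OPTIONAL_END = {"li", "p"}  # omitted end tags we choose to tolerate (assumption)
BLOCKED = {"script", "iframe", "object", "embed"}  # hypothetical stage-2 blacklist


class Node:
    def __init__(self, tag, attrs=None):
        self.tag, self.attrs, self.children = tag, dict(attrs or []), []


class TreeBuilder(HTMLParser):
    """Stage 1: build a tree, rejecting markup outside the tolerated forms."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # An open <li> or <p> is implicitly closed by the next one.
        if tag in OPTIONAL_END and self.stack[-1].tag == tag:
            self.stack.pop()
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        if tag not in VOID:
            raise Rejected(f"self-closing syntax on <{tag}>")
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID:
            raise Rejected(f"end tag for void element </{tag}>")
        # Implicitly close tolerated elements sitting above the match.
        while (len(self.stack) > 1 and self.stack[-1].tag != tag
               and self.stack[-1].tag in OPTIONAL_END):
            self.stack.pop()
        if len(self.stack) == 1 or self.stack[-1].tag != tag:
            raise Rejected(f"unexpected </{tag}>")
        self.stack.pop()

    def finish(self):
        self.close()
        while len(self.stack) > 1 and self.stack[-1].tag in OPTIONAL_END:
            self.stack.pop()
        if len(self.stack) > 1:
            raise Rejected(f"unclosed <{self.stack[-1].tag}>")
        return self.root


def validate(node):
    """Stage 2: walk the tree and reject blocked features."""
    for child in node.children:
        if child.tag in BLOCKED:
            raise Rejected(f"blocked element <{child.tag}>")
        if any(name.startswith("on") for name in child.attrs):
            raise Rejected(f"event handler on <{child.tag}>")
        validate(child)


def accept(raw):
    builder = TreeBuilder()
    builder.feed(raw)
    validate(builder.finish())
    return raw  # accepted input is returned verbatim, never rewritten


print(accept("<ul><li>one<li>two</ul>"))
```

The hard part described above, knowing which malformed strings every supported browser turns into the same DOM, is exactly what this sketch leaves out; the tolerated list would have to come from profiling real browsers.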

Back to reality: what we could start to do is profile inconsistencies among the parsers of the major browsers in how they transform regular HTML into DOMs, from simple things like missing tags to full-fledged angled-bracket, un-equal-signed attribute frenzies.

But... if any company is stupid enough to go this route, they'll probably get it all wrong. In that case, security by obscurity is always a good second defense. ;-)

Re: [Theory] Standards to thwart browser quirks
Posted by: jungsonn (IP Logged)
Date: January 26, 2007 05:41AM

Oh well; I wrote a quick write-up item last night about the find/problem,

called: Blocking XSS with Ajax.

I think I know the reason it works, too: because of the way Ajax uses asynchronous calls, and without the JavaScript/browser engine re-rendering, it cannot execute. Anyway, that's what I reasoned. I could be completely off; if so, please let me know. What I do know from testing is that any JavaScript can't execute anymore.

Instead of clogging up this thread (different subject), you can read it here if you want: [www.jungsonnstudios.com]


