Cenzic 232 Patent
Paid Advertising
sla.ckers.org is
ha.ckers sla.cking
Sla.ckers.org
Whether this is about ha.ckers.org, sla.ckers.org or some other project you are interested in or want to talk about, throw it in here to get feedback. 
Go to Topic: PreviousNext
Go to: Forum ListMessage ListNew TopicSearchLog In
Pages: 12Next
Current Page: 1 of 2
"Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 05, 2010 01:03PM

Hi all,

I've been rolling my own php framework recently and initially thought about using an existing html filtering solution such as htmLawed. But curiosity got the better of me and I decided to try writing my own instead.

I was hoping a few people might be good enough to give the demo a try to see how it holds up.

>> http://allowhtml.com/demo/

There's a link to the source code in the demo as well, which is running the default settings (and allowing the "style" attribute, which I wouldn't do normally).

Any comments / feedback appreciated. :) I was thinking about releasing it under LGPL (hence the domain name), but wanted to see if it's up to scratch first.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: thornmaker
Date: February 05, 2010 02:59PM

</textarea><script>alert(0)</script>

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 05, 2010 03:48PM

Not another! Jeez
Lets see if I can break it without looking at the code or the demo

<img src="1"onerror=alert(1)>

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: thornmaker
Date: February 05, 2010 04:50PM

@gareth haha, you'll have to try a bit harder then that. great xss fu does not translate into x-ray vision

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 05, 2010 04:55PM

Ah, yes...must remember not to leave raw input in the input textarea (doh!). Thanks thornmaker!

Might have to look at the demo next time Gareth. ;)



Edited 1 time(s). Last edit at 02/05/2010 04:56PM by sjdev86.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 08, 2010 03:48AM

I was a little unfair, it seems pretty good. Reminds me of htmlpurifier very much in the way it does things. I like your whitelists and list of allowed tags. Here are a couple of points

Good points
-Nice whitelists
-Good tags/attribute lists
-Well organised code

Bad points
-Auto decodes e.g. <a href="%22" becomes <a href="&quot;"
-You use mbstring http://www.securityfocus.com/bid/32948
-Your CSS text checker is very weak, looking for "alert" won't help anyone

Here is the bypass:-
<div style="-ms-behaviour:x">

IE8 supports -ms-behaviour too

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]



Edited 1 time(s). Last edit at 02/08/2010 03:50AM by Gareth Heyes.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Anonymous User
Date: February 08, 2010 05:24AM

I like the filter too - didn't manage to find anything during a quick look.

@Gareth: I cannot get any code execution to work with -ms-behavior - my test base is:

1<l style="behavior:url(#default#time2)" onbegin="alert(1)"> //works
2<l style="-ms-behavior:url(#default#time2)" onbegin="alert(2)"> // doesn't work
3<l style="-ms-behaviour:url(#default#time2)" onbegin="alert(3)"> // doesn't work

Any ideas?

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 08, 2010 05:31AM

@mario

IE needs to be in IE8 mode with a doctype or IE8 meta

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Anonymous User
Date: February 08, 2010 05:40AM

@Gareth Thx :)

Works like a charm (no circumvention - just for documentation sake)
<meta http-equiv="X-UA-Compatible" content="IE=8">
1<l style="-ms-behavior:url(#default#time2)" onbegin="alert(1)">



Edited 1 time(s). Last edit at 02/08/2010 05:45AM by .mario.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 08, 2010 07:31AM

Thanks for the feedback, Gareth.

I'll build in a whitelist for the style attribute (always figured I'd have to if I was going to allow it). I've got mbstring dotted around the place in various components I'm building, so that's going to need re-thinking.

I'm interested in the decoding issue - I was under the (false?) impression that it was a good idea to try and decode the input as much as possible, then check for and neutralise any resulting evil characters afterwards. Bad idea in general, or just in regards to "urlencoded" characters like %22?



Edited 1 time(s). Last edit at 02/08/2010 07:33AM by sjdev86.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 08, 2010 08:00AM

@sjdev86

Yeah whitelist each property and only allow the ones you know about rather than blacklist.

The reason that decode is a mistake is because your filter is potentially creating new vectors by converting. I'd recommend you inspect but do not convert that way you'll avoid potential issues in future like this for example:-
<a href="%256aavascript%2522">test</a>

Your filter is performing a auto decode of urlencoded data, after that it is then encoding it with html entities. Some vectors include html entities and also can function with double urlencoding I'd recommend you only leave input as it is supplied or remove it if it is dangerous. IMHO Designing a filter like this involves thinking what could potentially break not what breaks currently.

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 08, 2010 08:13AM

The decoding function should catch double encoding (since it continues to cycle until nothing changes), however I agree that your approach seems the more sensible one.

I'll re-configure it for a pass / fail approach - if it passes all checks, leave input as is (still run it through htmlspecialchars once passed?), otherwise remove that data entirely.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 08, 2010 08:51AM

@sjdev86

If you are comparing the input continually then I guess that would be ok to run through htmlspecialchars but try use whitelist wherever possible even in attributes.

I'd also be tempted to avoid certain characters completely like for example just because the HTML spec says you can use x and unlimited characters does it mean you should? Specifications are fine for making things easy to understand and implement but should be ignored whenever their definition makes it easier to exploit your system.

I like your code, if it is improved and you take this approach I'll definitely recommend it after mario, sirdarckcat and thornmaker have broken it first though :)

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 08, 2010 10:52AM

Gives me something to aim for then, doesn't it. ;)

I was just playing around with the decoding options. So far I'm taking the approach that if passing the attribute through the decoding function changes the value in any way, then it's presumed to be bad input and the attribute is removed (otherwise further checks are carried out).

The only situation I can think of where a legitimate user entering html might fall foul of this is if they've copy & pasted a urlencoded link into the attribute value. Other than that, there's no reason for any encoded data to ever end up in an attribute (that I can think of).

<a href="%256aavascript%2522">test</a> BECOMES <a>test</a>

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 08, 2010 01:22PM

@sjdev86

That's a really good change, double urlencoded data would most probably be bad input

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Anonymous User
Date: February 08, 2010 04:42PM

Damn - almost gotcha... :)

<b style="color:red;background:url(/bla&#x29&#x3b;x:&#x65;xprEssio/n(write(1));">000</b>

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 08, 2010 05:53PM

Getting a bit too close there.

Looks like php's DOMDocument automatically decoded the hex entities, which rendered the encoding check useless. Better get that css whitelist up.

EDIT: initial attempt at css whitelist now in place



Edited 1 time(s). Last edit at 02/08/2010 06:57PM by sjdev86.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 09, 2010 12:08PM

Alright, I'm happy with it now.

The only thing I haven't addressed is mbstring, as I'm still thinking about the best alternative (any thoughts on dealing with the charset welcome).

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 09, 2010 03:40PM

You really need to stop autodecoding everything

http://pastebin.ca/1791900

Also your css value whitelist sucks
<div style="color:(')(')(')(')(')(')(')''">test</div>

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]



Edited 1 time(s). Last edit at 02/09/2010 03:52PM by Gareth Heyes.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 09, 2010 05:12PM

Thanks for the hole picking. I'll get on to re-factoring / tightening up the character whitelist (and the decoding has been given the boot from plain text output).

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 10, 2010 04:17AM

I'd take a look at how the css rules work in anti-samy, it's pretty good:-
http://i8jesus.com:9080/AntiSamyDemoWebApp/test.jsp

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 10, 2010 06:33AM

Anti-samy does look good. I've refined the attribute filtering somewhat, although I haven't gone as far as producing a rule for every different attribute / style value.

At this point, something like <div style="color:'''';">test</div> will still get through the value whitelist (allow letters, numbers, spaces and # % ' , - . _ characters), but I wouldn't have thought that would be exploitable (which is my primary concern for now)?



Edited 2 time(s). Last edit at 02/10/2010 06:35AM by sjdev86.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 11, 2010 04:03AM

@sjdev86

Not exploitable currently true but IMO you should do strict whitelists and only allow values that conform to it. A global whitelist should come before it which you do which is pretty cool.

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 11, 2010 10:27AM

I suppose that one alternative would be to hook into the anti-samy policy file, using xpath to find the approriate rules for the attribute or property. The input value could then be matched against the resulting regex / literal rules.

EDIT: current demo now using anti-samy



Edited 4 time(s). Last edit at 02/16/2010 10:05AM by sjdev86.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 13, 2010 02:09PM

Is anyone able to verify whether the mbstring function "mb_detect_encoding" is vulnerable to the buffer overflow vulnerability? I don't currently have access to anything below php 5.29.

http://www.securiteam.com/unixfocus/6X00P0ANFM.html seems to suggest that it is one of the functions that should be "safe in their nature".



Edited 1 time(s). Last edit at 02/13/2010 07:37PM by sjdev86.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Kyo
Date: February 14, 2010 06:30AM

not an exploit (still tinkering), but i find it odd that it doesn't allow title for img

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: rvdh
Date: February 14, 2010 05:47PM

LMAO I can see where this thread is heading.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 15, 2010 03:17AM

@rvdh

C'mon give it a shot I know you want to :)

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: sjdev86
Date: February 16, 2010 10:05AM

I've now switched everything over to use the anti-samy policy file.

Still a couple more things to implement (eg. cross-referencing shorthand lists), but it's working very nicely (IMO!) so far.

Options: ReplyQuote
Re: "Yet another html filter" (allowHTML)
Posted by: Gareth Heyes
Date: February 16, 2010 01:45PM

@sjdev86

Wow very nice I will test soon

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Options: ReplyQuote
Pages: 12Next
Current Page: 1 of 2


Sorry, only registered users may post in this forum.