<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>&quot;Yet another html filter&quot; (allowHTML)</title>
        <description>Hi all,

I've been rolling my own php framework recently and initially thought about using an existing html filtering solution such as htmLawed. But curiosity got the better of me and I decided to try writing my own instead.

I was hoping a few people might be good enough to give the demo a try to see how it holds up.

&gt;&gt; http://allowhtml.com/demo/

There's a link to the source code in the demo as well, which is running the default settings (and allowing the &quot;style&quot; attribute, which I wouldn't do normally).

Any comments / feedback appreciated. :) I was thinking about releasing it under LGPL (hence the domain name), but wanted to see if it's up to scratch first.</description>
        <link>http://sla.ckers.org/forum/read.php?12,33287,33287#msg-33287</link>
        <lastBuildDate>Sat, 18 May 2013 20:31:11 -0500</lastBuildDate>
        <generator>Phorum 5.2.15a</generator>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,34267#msg-34267</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,34267#msg-34267</link>
            <description><![CDATA[@sirdarckcat - thanks, I spotted some of those vectors coming through recently. I didn't realise that php's DOMDocument automatically placed cdata tags in between script / style tags (which caused malicious html coming afterwards to be allowed through).<br />
<br />
I need to go over that in more detail (since the fix I've implemented is crude). I'm certainly not going to claim it's &quot;safe&quot; in its current form. The only aim is to keep improving it, based on the feedback / attacks from you guys!]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Fri, 23 Apr 2010 12:02:21 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,34246#msg-34246</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,34246#msg-34246</link>
            <description><![CDATA[Lots of bypasses by a couple of friends and users of another forum!<br />
https://foro.elhacker.net/nivel_web/cyh_bypass_de_filtros_de_xss-t289955.0.html<br />
<br />
They are fixed now, but I don't think it's very safe atm.<br />
<br />
Greetings!!]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>Projects</category>
            <pubDate>Thu, 22 Apr 2010 08:21:39 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,34204#msg-34204</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,34204#msg-34204</link>
            <description><![CDATA[@sirdarckcat - better you didn't come across it too early, fewer holes to pick!<br />
<br />
I think we can safely say that better character checking of style property names is needed (whitelist now in place) to guard against #1. Blacklisting style comments is all I can imagine for #2.]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 19 Apr 2010 14:21:08 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,34081#msg-34081</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,34081#msg-34081</link>
            <description><![CDATA[@sirdarckcat  <br />
<br />
Nice ones :)]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Thu, 08 Apr 2010 06:40:52 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,34077#msg-34077</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,34077#msg-34077</link>
            <description><![CDATA[btw, thanks guys.. there's a new filter and no one told me :(<br />
<br />
background:url(/*this-is-a-comment-on-IE);background-image:url(still-a-comment*/);<br />
<br />
CSS is not easy dude :P]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>Projects</category>
            <pubDate>Thu, 08 Apr 2010 01:45:35 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,34076#msg-34076</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,34076#msg-34076</link>
            <description><![CDATA[nice try, good luck next time<br />
<br />
&lt;div style=&quot;xss=\000065xpression(confirm(1))!: url('xD');&quot;&gt;hola&lt;/div&gt;<br />
<br />
greetings!!]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>Projects</category>
            <pubDate>Thu, 08 Apr 2010 01:42:49 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33675#msg-33675</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33675#msg-33675</link>
            <description><![CDATA[I'd literally just sat down to work on the url filtering (I wasn't happy with using a single regex for them, so decided to bypass the anti-samy rule), when I saw some new entries coming through in the demo logs - I thought it might be you. ;)<br />
<br />
I've made some small updates as a result (good timing on your part) and will test the new entries against the default regex as well for comparison.<br />
<br />
I have found type-hinting to be increasingly useful as time has gone by.]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 02 Mar 2010 15:23:51 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33673#msg-33673</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33673#msg-33673</link>
            <description><![CDATA[Nice design, I've checked out the code too and it's much better. Tried a few things and it seemed to catch them. The &lt;a href&gt; check seems a bit weak though; I was almost able to sneak a vector through.<br />
<br />
Good work! I'll keep my eye on this one. BTW I like the fact you use type hinting! Wish more devs did this]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 02 Mar 2010 14:34:30 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33532#msg-33532</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33532#msg-33532</link>
            <description><![CDATA[The shorthand lists in antisamy are now allowed for (eg. background can take on background-image values as well). A couple of very minor issues encountered so far (the default rules not allowing everything I would have expected), but it is a very small minority:<br />
<br />
ALLOWED<br />
<br />
&lt;div style=&quot;background:url(test.png);&quot;&gt;test&lt;/div&gt;<br />
<br />
NOT ALLOWED<br />
<br />
&lt;div style=&quot;background:url('test.png');&quot;&gt;test&lt;/div&gt;<br />
&lt;div style=&quot;background:#FFF url(test.png);&quot;&gt;test&lt;/div&gt;]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Sat, 20 Feb 2010 18:40:09 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33477#msg-33477</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33477#msg-33477</link>
            <description><![CDATA[@sjdev86<br />
<br />
Wow very nice I will test soon]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 16 Feb 2010 13:45:52 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33466#msg-33466</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33466#msg-33466</link>
            <description><![CDATA[I've now switched everything over to use the anti-samy policy file.<br />
<br />
Still a couple more things to implement (eg. cross-referencing shorthand lists), but it's working very nicely (IMO!) so far.]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 16 Feb 2010 10:05:14 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33438#msg-33438</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33438#msg-33438</link>
            <description><![CDATA[@rvdh<br />
<br />
C'mon give it a shot I know you want to :)]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 15 Feb 2010 03:17:51 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33431#msg-33431</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33431#msg-33431</link>
            <description><![CDATA[LMAO I can see where this thread is heading.]]></description>
            <dc:creator>rvdh</dc:creator>
            <category>Projects</category>
            <pubDate>Sun, 14 Feb 2010 17:47:18 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33424#msg-33424</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33424#msg-33424</link>
            <description><![CDATA[not an exploit (still tinkering), but i find it odd that it doesn't allow title for img]]></description>
            <dc:creator>Kyo</dc:creator>
            <category>Projects</category>
            <pubDate>Sun, 14 Feb 2010 06:30:18 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33423#msg-33423</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33423#msg-33423</link>
            <description><![CDATA[Is anyone able to verify whether the mbstring function &quot;mb_detect_encoding&quot; is affected by the buffer overflow vulnerability? I don't currently have access to anything below php 5.2.9.<br />
<br />
http://www.securiteam.com/unixfocus/6X00P0ANFM.html seems to suggest that it is one of the functions that should be &quot;safe in their nature&quot;.]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Sat, 13 Feb 2010 14:09:40 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33388#msg-33388</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33388#msg-33388</link>
            <description><![CDATA[I suppose that one alternative would be to hook into the anti-samy policy file, using xpath to find the appropriate rules for the attribute or property. The input value could then be matched against the resulting regex / literal rules.<br />
<br />
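A rough sketch of that lookup idea, written in Python rather than the filter's own PHP. The policy below is a hypothetical, heavily simplified file loosely modelled on an anti-samy style policy; its element names, the rules in it and the is_allowed helper are assumptions for illustration, not the real policy or the allowHTML code:<br />

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified policy; element and attribute names are
# assumptions for illustration, not the real anti-samy schema.
POLICY_XML = """
<policy>
  <common-regexps>
    <regexp name="colorCode" value="#[0-9a-fA-F]{3}(?:[0-9a-fA-F]{3})?"/>
  </common-regexps>
  <css-rules>
    <property name="color">
      <literal-list>
        <literal value="red"/>
        <literal value="blue"/>
      </literal-list>
      <regexp-list>
        <regexp name="colorCode"/>
      </regexp-list>
    </property>
  </css-rules>
</policy>
"""

POLICY = ET.fromstring(POLICY_XML)


def is_allowed(prop, value):
    """Return True only if value matches a literal or regex rule for prop."""
    # Restrict the property name before placing it in the XPath expression.
    if not re.fullmatch(r"[a-z-]+", prop):
        return False
    rule = POLICY.find(f"./css-rules/property[@name='{prop}']")
    if rule is None:
        return False  # unknown property: reject rather than guess
    literals = {lit.get("value") for lit in rule.findall("./literal-list/literal")}
    if value.lower() in literals:
        return True
    # Regex rules are referenced by name and resolved in the shared section.
    for ref in rule.findall("./regexp-list/regexp"):
        name = ref.get("name")
        pattern = POLICY.find(f"./common-regexps/regexp[@name='{name}']")
        if pattern is not None and re.fullmatch(pattern.get("value"), value):
            return True
    return False
```

With this toy policy, is_allowed("color", "#FFF") passes while is_allowed("color", "''''") and any property missing from the policy are rejected, since a value has to match a whole rule rather than just a character set.<br />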
EDIT: current demo now using anti-samy]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Thu, 11 Feb 2010 10:27:43 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33383#msg-33383</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33383#msg-33383</link>
            <description><![CDATA[@sjdev86 <br />
<br />
Not exploitable currently, true, but IMO you should use strict whitelists and only allow values that conform to them. A global whitelist should come before that, which you already do, which is pretty cool.]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Thu, 11 Feb 2010 04:03:44 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33372#msg-33372</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33372#msg-33372</link>
            <description><![CDATA[Anti-samy does look good. I've refined the attribute filtering somewhat, although I haven't gone as far as producing a rule for every different attribute / style value.<br />
<br />
At this point, something like &lt;div style=&quot;color:'''';&quot;&gt;test&lt;/div&gt; will still get through the value whitelist (allow letters, numbers, spaces and # % ' , - . _ characters), but I wouldn't have thought that would be exploitable (which is my primary concern for now)?]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Wed, 10 Feb 2010 06:33:58 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33365#msg-33365</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33365#msg-33365</link>
            <description><![CDATA[I'd take a look at how the css rules work in anti-samy, it's pretty good:-<br />
http://i8jesus.com:9080/AntiSamyDemoWebApp/test.jsp]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Wed, 10 Feb 2010 04:17:50 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33339#msg-33339</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33339#msg-33339</link>
            <description><![CDATA[Thanks for the hole picking. I'll get on to re-factoring / tightening up the character whitelist (and the decoding has been given the boot from plain text output).]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 09 Feb 2010 17:12:59 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33332#msg-33332</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33332#msg-33332</link>
            <description><![CDATA[You really need to stop autodecoding everything<br />
<br />
http://pastebin.ca/1791900<br />
<br />
Also your css value whitelist sucks<br />
&lt;div style=&quot;color:(')(')(')(')(')(')(')''&quot;&gt;test&lt;/div&gt;]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 09 Feb 2010 15:40:32 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33328#msg-33328</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33328#msg-33328</link>
            <description><![CDATA[Alright, I'm happy with it now.<br />
<br />
The only thing I haven't addressed is mbstring, as I'm still thinking about the best alternative (any thoughts on dealing with the charset welcome).]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Tue, 09 Feb 2010 12:08:27 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33316#msg-33316</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33316#msg-33316</link>
            <description><![CDATA[Getting a bit too close there.<br />
<br />
Looks like php's DOMDocument automatically decoded the hex entities, which rendered the encoding check useless. Better get that css whitelist up.<br />
<br />
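The same effect can be reproduced outside php. This is only an analogue using Python's standard html.parser (not DOMDocument itself), with a made-up StyleCollector class and a cut-down test vector: the parser hands back attribute values with character references already decoded, so a keyword check on the raw markup and one on the parsed value can disagree:<br />

```python
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Collect style attribute values exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "style" and value is not None:
                self.styles.append(value)


# Cut-down vector: the leading "e" of the keyword is hidden as a hex entity.
raw = '<b style="x:&#x65;xpression(write(1))">000</b>'

collector = StyleCollector()
collector.feed(raw)
collector.close()
parsed_style = collector.styles[0]

# The raw markup does not contain the keyword, the parsed value does.
raw_hit = "expression" in raw.lower()
parsed_hit = "expression" in parsed_style.lower()
```

Any check for encoded input therefore has to run on the markup as supplied, before a parser gets the chance to decode it.<br />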
EDIT: initial attempt at css whitelist now in place]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 17:53:31 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33315#msg-33315</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33315#msg-33315</link>
            <description><![CDATA[Damn - almost gotcha... :)<br />
<br />
<pre class="bbcode">&lt;b style=&quot;color:red;background:url(/bla&amp;#x29&amp;#x3b;x:&amp;#x65;xprEssio/n(write(1));&quot;&gt;000&lt;/b&gt;</pre>]]></description>
            <dc:creator>Anonymous User</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 16:42:29 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33314#msg-33314</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33314#msg-33314</link>
            <description><![CDATA[@sjdev86  <br />
<br />
That's a really good change, double urlencoded data would most probably be bad input]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 13:22:19 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33313#msg-33313</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33313#msg-33313</link>
            <description><![CDATA[Gives me something to aim for then, doesn't it. ;)<br />
<br />
I was just playing around with the decoding options. So far I'm taking the approach that if passing the attribute through the decoding function changes the value in any way, then it's presumed to be bad input and the attribute is removed (otherwise further checks are carried out).<br />
<br />
The only situation I can think of where a legitimate user entering html might fall foul of this is if they've copy &amp; pasted a urlencoded link into the attribute value. Other than that, there's no reason for any encoded data to ever end up in an attribute (that I can think of).<br />
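A minimal sketch of that pass / fail rule, in Python rather than the filter's own PHP. The function names are hypothetical and the decoder is assumed to be one round of URL decoding followed by HTML entity decoding, which may differ from what the real filter does:<br />

```python
import html
from typing import Dict
from urllib.parse import unquote


def decode_once(value: str) -> str:
    """Apply one round of URL decoding followed by HTML entity decoding."""
    return html.unescape(unquote(value))


def passes_encoding_check(value: str) -> bool:
    """Pass only if decoding leaves the value completely unchanged."""
    return decode_once(value) == value


def filter_attributes(attrs: Dict[str, str]) -> Dict[str, str]:
    """Drop attributes whose value looks encoded; keep the rest as supplied."""
    return {
        name: value
        for name, value in attrs.items()
        if passes_encoding_check(value)
    }
```

Values that survive are returned untouched, so this step only decides whether an attribute stays; the further whitelist checks would still run afterwards.<br />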
<br />
&lt;a href=&quot;%256aavascript%2522&quot;&gt;test&lt;/a&gt; BECOMES &lt;a&gt;test&lt;/a&gt;]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 10:52:20 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33312#msg-33312</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33312#msg-33312</link>
            <description><![CDATA[@sjdev86<br />
<br />
If you are comparing the input continually then I guess it would be ok to run it through htmlspecialchars, but try to use a whitelist wherever possible, even in attributes.<br />
<br />
I'd also be tempted to avoid certain characters completely. For example, just because the HTML spec says you can use x and unlimited characters, does it mean you should? Specifications are fine for making things easy to understand and implement, but should be ignored whenever their definition makes it easier to exploit your system.<br />
<br />
I like your code. If it is improved and you take this approach, I'll definitely recommend it, after mario, sirdarckcat and thornmaker have broken it first though :)]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 08:51:35 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33311#msg-33311</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33311#msg-33311</link>
            <description><![CDATA[The decoding function should catch double encoding (since it continues to cycle until nothing changes), however I agree that your approach seems the more sensible one.<br />
<br />
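For reference, the cycling described above looks roughly like this. It is a Python sketch with a hypothetical decode_fully helper and an assumed round cap, not the actual php function:<br />

```python
import html
from urllib.parse import unquote


def decode_fully(value: str, max_rounds: int = 10) -> str:
    """Decode repeatedly until a round changes nothing (or the cap is hit)."""
    for _ in range(max_rounds):
        # One round: URL decoding first, then HTML entity decoding.
        decoded = html.unescape(unquote(value))
        if decoded == value:
            break
        value = decoded
    return value
```

Run on %256aavascript%2522, the first round gives %6aavascript%22 and the second gives javascript followed by a double quote, so the double encoding is caught. The result is best used for inspection only, not written back to the output.<br />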
I'll re-configure it for a pass / fail approach - if it passes all checks, leave input as is (still run it through htmlspecialchars once passed?), otherwise remove that data entirely.]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 08:13:25 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33310#msg-33310</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33310#msg-33310</link>
            <description><![CDATA[@sjdev86<br />
<br />
Yeah whitelist each property and only allow the ones you know about rather than blacklist. <br />
<br />
The reason decoding is a mistake is that your filter is potentially creating new vectors by converting. I'd recommend you inspect but do not convert; that way you'll avoid potential issues in future, like this for example:<br />
&lt;a href=&quot;%256aavascript%2522&quot;&gt;test&lt;/a&gt;<br />
<br />
Your filter is performing an auto-decode of urlencoded data, and after that it is encoding it with html entities. Some vectors include html entities and can also function with double urlencoding. I'd recommend you either leave input as it is supplied or remove it if it is dangerous. IMHO designing a filter like this involves thinking about what could potentially break, not what breaks currently.]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 08:00:12 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?12,33287,33309#msg-33309</guid>
            <title>Re: &quot;Yet another html filter&quot; (allowHTML)</title>
            <link>http://sla.ckers.org/forum/read.php?12,33287,33309#msg-33309</link>
            <description><![CDATA[Thanks for the feedback, Gareth.<br />
<br />
I'll build in a whitelist for the style attribute (always figured I'd have to if I was going to allow it). I've got mbstring dotted around the place in various components I'm building, so that's going to need re-thinking.<br />
<br />
I'm interested in the decoding issue - I was under the (false?) impression that it was a good idea to try and decode the input as much as possible, then check for and neutralise any resulting evil characters afterwards. Bad idea in general, or just in regards to &quot;urlencoded&quot; characters like %22?]]></description>
            <dc:creator>sjdev86</dc:creator>
            <category>Projects</category>
            <pubDate>Mon, 08 Feb 2010 07:31:53 -0600</pubDate>
        </item>
    </channel>
</rss>
