sla.ckers.org is
ha.ckers sla.cking
Sla.ckers.org
Script obfuscation, filter evasion, IDS/IPS/WAF bypassing... this is where it should live. Because this topic is too big to live anywhere else. Phj33r! 
Javascript compression contest
Posted by: Gareth Heyes
Date: August 15, 2009 06:25AM

So mario suggested a new contest based on some experiments I was doing:-

http://www.thespanner.co.uk/2009/08/15/javascript-compression-with-unicode-characters/

The idea is to compress JavaScript as small as possible without lookup tables. The contest: can you shrink both the compressed output and the decoding function? Here are the rules:-

1. No lookup tables
2. Code can't be static; it must be generated from whatever input you supply.

Have fun :)

------------------------------------------------------------------------------------------------------------
"People who say it cannot be done should not interrupt those who are doing it.";
labs : [www.businessinfo.co.uk]
blog : [www.thespanner.co.uk]
Hackvertor : [hackvertor.co.uk]



Edited 1 time(s). Last edit at 08/15/2009 09:16AM by Gareth Heyes.

Re: Javascript compression contest
Posted by: Anonymous User
Date: August 15, 2009 07:44AM

Here's my (invalid but working) approach:

Compressor:
for([l='length',d='',b='alert(1)'],i=0;i<b[l];i++)c=b[i].charCodeAt()-32+'',d+=c[l]<2?0+c:c;for([e=d.match(/\d{4}/g),d=''],i=0;i<e[l];i++)d+=String.fromCharCode(e[i])

Decompressor:
for(b='',i=0;i<[a=d][x=32,0].length;i++)with(a[i].charCodeAt()+'')b+=String.fromCharCode(x+parseInt(match(/\d\d/)[0],10),x+parseInt(match(/\d\d$/)[0],10));eval(b)

It's not about length all the time, Gareth. Mine is a tiny bit longer - but compresses 8 chars to 4 and not 8 to 5 :)
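
For anyone following along, here is a minimal, ungolfed sketch of the decimal pair-packing idea from the post above. It is not the posted code: the function names are made up, it uses arithmetic instead of regex matching, and it assumes the input is printable ASCII (codes 32 to 126), padding odd-length input with a trailing space.

```javascript
// Sketch only: packDecimal/unpackDecimal are illustrative names, not from the thread.
// Assumption: input is printable ASCII (char codes 32..126).
function packDecimal(src) {
  var digits = '';
  for (var i = 0; i < src.length; i++) {
    var n = src.charCodeAt(i) - 32;            // shift into 0..94 so it fits in two digits
    if (n < 0 || n > 94) throw new Error('unsupported character at index ' + i);
    digits += (n < 10 ? '0' : '') + n;         // always two decimal digits
  }
  if (src.length % 2) digits += '00';          // pad odd input with a space (32 - 32 = 0)
  var out = '';
  for (var j = 0; j < digits.length; j += 4) {
    // Four digits (two source characters) become one character code, 0..9494.
    out += String.fromCharCode(parseInt(digits.slice(j, j + 4), 10));
  }
  return out;
}

function unpackDecimal(packed) {
  var out = '';
  for (var i = 0; i < packed.length; i++) {
    var n = packed.charCodeAt(i);
    // Integer division and remainder recover the two halves,
    // even when the leading digit was a zero.
    out += String.fromCharCode(Math.floor(n / 100) + 32, n % 100 + 32);
  }
  return out;
}
```

With this, packDecimal('alert(1)') is 4 characters long and unpackDecimal gives the original string back. The character count halves, but that says nothing yet about the byte count on the wire.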

Re: Javascript compression contest
Posted by: Gareth Heyes
Date: August 15, 2009 07:53AM

@mario

What would you suggest for the rules then?
There has to be a goal. As Andrea pointed out, my code doesn't actually compress anything: each Unicode character occupies two bytes in JavaScript, so the actual byte count increases. So I guess you are winning :)

Re: Javascript compression contest
Posted by: Anonymous User
Date: August 15, 2009 08:13AM

I like the first two rules but the third one doesn't make too much sense. If we make a compression contest it should probably be about the compression ratio - just thinking :)

Basically I am just curious how you guys would solve the problem with entity usage beyond &#9999; :P

Re: Javascript compression contest
Posted by: sirdarckcat
Date: August 15, 2009 11:17PM

this breaks your program gareth:
<@jspack_0>cake<@/jspack_0>

try copy-pasting that hehe :P your algo makes null chars

greetz!!

--------------------------------
http://sirdarckcat.blogspot.com/ http://www.sirdarckcat.net/ http://foro.elhacker.net/ http://twitter.com/sirdarckcat



Edited 3 time(s). Last edit at 08/15/2009 11:32PM by sirdarckcat.

Re: Javascript compression contest
Posted by: sirdarckcat
Date: August 15, 2009 11:32PM

we should make

function d(){}

to decompress and

function c(){}

to compress

:)

mario's would be:

function c(b){var c,d;for([l='length',d=''],i=0;i<b[l];i++)c=b[i].charCodeAt()-32+'',d+=c[l]<2?0+c:c;for([e=d.match(/\d{4}/g),d=''],i=0;i<e[l];i++)d+=String.fromCharCode(e[i]);return d}

function d(d){for(b='',i=0;i<[a=d][x=32,0].length;i++)with(a[i].charCodeAt()+'')b+=String.fromCharCode(x+parseInt(match(/\d\d/)[0],10),x+parseInt(match(/\d\d$/)[0],10));eval(b)}

Also, we should use something to test our algos.

I realized mario's could be improved using toSource() tab-compression while testing it with: http://foro.elhacker.net/cake.js so I'll suggest using it as test suite. It has everything, from ternary operators, to normal loops, E4X, etc..

Andrea's statement is correct: our code will be encoded in UTF-8 (or whatever charset is in use), so it will grow just as he says. I think unicode is in no way usable for code compression.

code rewriting and lookup tables are our best approach, and for code rewriting there's already YUI compressor / js-minify / etc.. and for lookup tables there's dean edwards packer.. we could try to beat them.

Greetings!!

This code: http://pastebin.com/f2f0269b6 combines a simple generic lookup table with code rewriting (based on the toSource() compression). To improve performance it uses a static lookup table, but if we are concerned about the size it can be transformed into a dynamic lookup table with a best-fit, gzip-like algorithm.

It can be improved since spidermonkey's decompiler generates code that can be improved a lot... like

x = 1;

function x () {

etc..

with JSReg regexes I think this can be transformed to a veeery good compressor.. wouldn't it?




Edited 3 time(s). Last edit at 08/16/2009 01:32AM by sirdarckcat.

Re: Javascript compression contest
Posted by: Anonymous User
Date: August 16, 2009 06:12AM

Yep - mine can be broken with whitespace and newlines etc. - also multibyte characters cannot be compressed etc. Very very alpha and just for fun. But yes - with some more time spent on the algorithms - some tables and better support for multibyte characters it could really be useful - maybe :)

If it used unescape() instead of String.fromCharCode() we could use the hexadecimal table index - not the decimal causing problems above 99 (that's why I used the 32 - pretty lame). Then the script would have to detect multibyte characters and flag them - like with a trailing zero creating 5 characters instead of 4.

<edit>
This should get rid of the 32 issue and allow the whole single-byte range from 0-255 (no multibyte support so far though). I think regex-based whitespace compression makes more sense - thoughts?

for([l='length',d='',b='alert(1);        alert(2);'.replace(/[\n\r\t]|\s{4,}/gm,'')],
i=0;i<b[l];i++)c=b[i].charCodeAt().toString(16),
d+=c[l]<2?0+c:c;for([e=d.match(/\w{4}/g),d=''],
i=0;i<e[l];i++)d+=String.fromCharCode(parseInt(e[i],16))

//result: &#24940;&#25970;&#29736;&#12585;&#15201;&#27749;&#29300;&#10290;&#10555;

And here's the corrected decompressor:
for(b='',i=0;i<[a=d][0].length;i++)with(a[i].charCodeAt().toString(16))
b+=String.fromCharCode(parseInt(match(/\w\w/)[0],16),parseInt(match(/\w\w$/)[0],16));eval(b)
</edit>
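
One thing to watch in the hex decompressor above: toString(16) drops leading zeros, so a pair whose first byte is below 0x10 (a tab, say) no longer yields four hex digits and the two regex matches overlap. A minimal sketch of a decoder that pads back to four digits, checked against the result line above. The function name unpackHex is made up for illustration, and it returns the string instead of calling eval().

```javascript
// Sketch only: unpackHex is an illustrative name, not from the thread.
function unpackHex(packed) {
  var out = '';
  for (var i = 0; i < packed.length; i++) {
    // Pad to four hex digits so 0x0961 decodes as "09" + "61", not "96" + "61".
    var hex = ('000' + packed.charCodeAt(i).toString(16)).slice(-4);
    out += String.fromCharCode(parseInt(hex.slice(0, 2), 16), parseInt(hex.slice(2), 16));
  }
  return out;
}

// The nine character codes from the "result" line above.
var codes = [24940, 25970, 29736, 12585, 15201, 27749, 29300, 10290, 10555];
var packed = String.fromCharCode.apply(null, codes);
var plain = unpackHex(packed); // 'alert(1);alert(2);'
```

Odd-length input still needs a padding convention on the compressor side; this sketch only covers decoding.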



Edited 3 time(s). Last edit at 08/16/2009 07:00AM by .mario.

Re: Javascript compression contest
Posted by: sirdarckcat
Date: August 16, 2009 08:13AM

hmmm... dudes. this is not compressing anything! unicode is in fact a way of saying
"hey! let's make everything twice its original size", and actually sometimes it can be 16 times its original size.

the simplest way of doing the unicode trick would be simply doing a bad-unicode emulation:

"hola".replace(/[\x01-\xFF]{1,2}/g,function(_){var u=_.charCodeAt(),d=_[1]?_.charCodeAt(1):0;return String.fromCharCode((u<<8)+d)});

and decoding:

"\u686F\u6C61".replace(/./g,function(_){var d=_.charCodeAt();return String.fromCharCode(d>>8,d%256);});

That would be reducing everything exactly by half.

but still... if you see this correctly, in reality, every Unicode char would be using (on real byte-transfer over the interwebs and in memory) either 2 or 3 bytes on UTF-8 representation.

you can check using encodeURI();

// before "compression"
"hola".length==4;
// after "compression"
"\u686F\u6C61".length==2;
// real string length on UTF-8
encodeURI("\u686F\u6C61").match(/%/g).length==6;

all our "compressed" strings are in reality bigger: 6 bytes instead of 4 in the example above, i.e. 50% more.
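
A small helper makes the comparison above repeatable for any string. This is only a sketch: utf8Bytes is a made-up name, and it relies on encodeURIComponent emitting one %XX escape per UTF-8 byte (it throws a URIError on lone surrogates, so it is not meant for arbitrary packed output).

```javascript
// Sketch only: utf8Bytes is an illustrative helper, not from the thread.
// Each %XX escape stands for one UTF-8 byte; every unescaped character is one byte too.
function utf8Bytes(s) {
  return encodeURIComponent(s).replace(/%[0-9A-F]{2}/g, '_').length;
}

var original = 'hola';
var squeezed = '\u686F\u6C61';

var report = {
  originalChars: original.length,      // 4
  originalBytes: utf8Bytes(original),  // 4
  squeezedChars: squeezed.length,      // 2
  squeezedBytes: utf8Bytes(squeezed)   // 6
};
```

So the character count halves while the UTF-8 byte count goes from 4 to 6.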

Greetz!!

Re: Javascript compression contest
Posted by: Gareth Heyes
Date: August 17, 2009 09:21AM

Ok been messing with this instead of high unicode characters:-

http://www.businessinfo.co.uk/labs/hackvertor/hackvertor.php#PEBqc2NvbXByZXNzMTI4XzA%2BJ3Rlc3QnKyJ0ZXN0ZXIiPEAvanNjb21wcmVzczEyOF8wPg%3D%3D

Basically I match pairs of characters and reduce them to one. Storing the reference of the character in a low ascii character with an offset to the next character up to 10. This could be used to compress small strings or variable concatenations. It isn't perfect however as the resulting offsets will be different when the length is modified or if the compressed characters occur very close to each other.

charsCompressed = {'"':1,"'":11,"+":21}
'aa'

becomes:-
0x14aa

Re: Javascript compression contest
Posted by: Gareth Heyes
Date: August 19, 2009 03:50AM

I've been having fun trying to find different ways to compress stuff :) I tried to keep an open mind and ignore the exact compression techniques used by other algorithms.

So here are my ideas:-

Pair compress
-------------
Using pairs of characters and then a unicode character to define the repetition. This works really well when characters are repeated next to each other for example:-

''''''''''''''''''''''''''''''''''''''''  
Becomes:-
&#800;5

The first character indicates the ' character and the 5 indicates the amount.
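
A rough sketch of the run-length idea, for illustration only. The exact Hackvertor format isn't spelled out here, so the layout below is an assumption: a marker character (U+0320, i.e. code 800, borrowed from the example), then the repeated character, then one character whose code is the run length. It also assumes the marker never occurs in the input, and caps runs at 255 so longer runs are split into several tokens.

```javascript
// Sketch only: rleEncode/rleDecode and the token layout are assumptions, not Hackvertor's format.
var MARK = '\u0320'; // assumed never to appear in the input

function rleEncode(src) {
  if (src.indexOf(MARK) !== -1) throw new Error('input contains the marker character');
  // A run of 4..255 identical characters collapses to three characters:
  // marker, the character itself, and a character whose code is the run length.
  return src.replace(/([\s\S])\1{3,254}/g, function (run, ch) {
    return MARK + ch + String.fromCharCode(run.length);
  });
}

function rleDecode(packed) {
  // Every marker is followed by exactly two characters: the repeated one and the count.
  return packed.replace(/\u0320([\s\S])([\s\S])/g, function (token, ch, count) {
    return new Array(count.charCodeAt(0) + 1).join(ch);
  });
}
```

Runs shorter than four characters are left alone, since a three-character token would not save anything.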

Map compress
------------
Here I use repeated combinations in succession or elsewhere in the document. So for example:-

function x() {}
function() {}

Becomes:-
&#500;function&#500;
&#500;() {}

The replacement maps are placed at the top of the compressed document.
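
To make the map idea concrete, here is a minimal sketch under stated assumptions, not the Hackvertor implementation. The phrases to replace are passed in by the caller (picking good ones is the hard part and is not covered), each phrase gets a private-use character starting at U+E000, and the map is written at the top, separated from the body by U+F8FF. All names and the layout are made up for illustration, and the input is assumed to contain no private-use characters.

```javascript
// Sketch only: names, marker characters and header layout are assumptions.
var SEP = '\uF8FE';   // separates phrases in the header
var END = '\uF8FF';   // ends the header
var BASE = 0xE000;    // first replacement token

function mapCompress(src, phrases) {
  var privateUse = /[\uE000-\uF8FF]/;
  if (privateUse.test(src)) throw new Error('input uses private-use characters');
  var body = src;
  for (var i = 0; i < phrases.length; i++) {
    if (!phrases[i] || privateUse.test(phrases[i])) throw new Error('bad phrase at index ' + i);
    // Replace every occurrence of the phrase with its one-character token.
    body = body.split(phrases[i]).join(String.fromCharCode(BASE + i));
  }
  return phrases.join(SEP) + END + body;   // map first, then the rewritten body
}

function mapDecompress(packed) {
  var cut = packed.indexOf(END);
  var phrases = packed.slice(0, cut).split(SEP);
  var body = packed.slice(cut + 1);
  for (var i = 0; i < phrases.length; i++) {
    // Expand each token back into its phrase.
    body = body.split(String.fromCharCode(BASE + i)).join(phrases[i]);
  }
  return body;
}
```

For the two-line example above with the phrases 'function' and '() {}', the packed string comes out shorter in characters than the input, and mapDecompress restores it exactly. The earlier point about UTF-8 byte counts applies to the token characters here as well.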

I've also added a compression section to Hackvertor because it's useful for js compression and obfuscation. Unicode compression is particularly handy with persistent XSS when you have a limited number of characters.

Examples of them are available here:-
http://www.businessinfo.co.uk/labs/hackvertor/hackvertor.php#PEBqc2NvbXByZXNzMzBfMD4ndGVzdCc8QC9qc2NvbXByZXNzMzBfMD4KCjxAanNtYXBjb21wcmVzc18xPmZ1bmN0aW9uIHgoKXt9IApmdW5jdGlvbiB5KCl7fTxAL2pzbWFwY29tcHJlc3NfMT4KCjxAcGFpcmNvbXByZXNzXzI%2BYWFhYWFhYWFhYWFhYWFhYWFhYTxAL3BhaXJjb21wcmVzc18yPg%3D%3D

BTW these are experimental at the moment and are not finished in any way :)




Edited 1 time(s). Last edit at 08/19/2009 03:51AM by Gareth Heyes.
