Log Analysis

A log is a file that records the events that occur while an operating system or other software runs [1]. It may capture any activity: a single keystroke, the complete record of communication between two machines, system errors, inter-process communication, update events, server activity, client sessions, browsing history, and so on. Logs provide good insight into the state of a system at any instant, and their analytical and statistical study can help manage systems and mine useful knowledge about users. Log data is voluminous, grows at a very fast rate, and varies in structure across applications, usage patterns, and servers. It therefore possesses the key characteristics of Big Data: volume, velocity, variety, and value.
Analytical study of logs supports accurate interpretation of the current state of a system, prediction of upcoming states, and selection of suitable reactive measures for a given scenario. With such a diverse and rich body of information, statistical analysis can monitor system performance and trigger proactive improvements without human intervention. The screenshot in Figure-1, showing 195 log files on a Windows system, gives a sense of how diverse and rich the information they contain is.
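As a concrete illustration, the short Python sketch below tallies events per severity level in a plain-text log. The file name system.log and the line format "<date> <time> <LEVEL> <message>" are assumptions made for the example, not a fixed standard; real logs vary widely in structure, as noted above.

import re
from collections import Counter

# Assumed line format: "<date> <time> <LEVEL> <message>",
# e.g. "2015-03-02 14:07:11 ERROR disk quota exceeded".
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\w+)\s+(.*)$")

def summarize(path):
    """Count log events per severity level (INFO, WARN, ERROR, ...)."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = LINE_RE.match(line.strip())
            if match:
                counts[match.group(3).upper()] += 1
    return counts

if __name__ == "__main__":
    # "system.log" is an assumed file name for demonstration.
    for level, n in summarize("system.log").most_common():
        print(f"{level:8s} {n}")

A tally like this is the simplest form of the statistical monitoring the text describes; the same loop could just as easily bucket events by hour to expose the growth rate discussed next.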
Logs can be classified into various categories based on the type of activity they monitor, the source type, and the type of information they reveal. A few such classifications are depicted in Figure-2.
Each log has its own structure and its own way of presenting information, but they share one common trait: the data grows at a comparable rate in each case.
A great deal of statistical work has already been done on log analysis in past years. Application of time series analysis to transaction log...

... middle of paper ...

...data mining are Anomaly Detection, Machine Learning, Clustering, Classification, Regression, Summarization, etc.
Key directions in data mining are:
a. Feature Extraction: Looks for the most extreme examples of a phenomenon and represents the data by those examples [10]. It may be based on similarity or on frequent itemsets.
b. Statistical Modeling: Decides which distribution a data set follows. It eases descriptive analysis and provides a foundation for predictive analysis (a minimal sketch applying this idea to log event counts follows this list).
c. Machine Learning: Uses data to train an algorithm, and proves effective when we have little prior knowledge of what we are looking for.
d. Computational Approaches: Considers the algorithmic-complexity aspect of data mining. It may rely on exact computation or on approximation.
e. Summarization: Provides an overview of the data using techniques such as clustering and cumulative-probability concepts (see the second sketch after this list).
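To make direction (b) concrete, and to connect it to the anomaly detection mentioned earlier, here is a minimal Python sketch that fits the simplest of statistical models (sample mean and standard deviation) to per-minute event counts and flags minutes whose z-score exceeds 3. The counts and the threshold are illustrative assumptions, not a prescribed method.

from statistics import mean, stdev

# Illustrative per-minute event counts, as might be extracted from a server log.
counts = [52, 48, 50, 51, 49, 53, 47, 50, 180, 49, 52, 48]

mu = mean(counts)
sigma = stdev(counts)

# Flag any minute whose count deviates from the mean by more than
# 3 standard deviations (the spike of 180 events is caught here).
for minute, c in enumerate(counts):
    z = (c - mu) / sigma
    if abs(z) > 3:
        print(f"minute {minute}: count={c}, z-score={z:.1f} -> anomaly")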
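Similarly, as a sketch of direction (e), the snippet below summarizes raw log messages by collapsing numeric fields into a placeholder and grouping identical templates. This is a deliberately crude stand-in for the clustering techniques the text refers to, and the sample messages are invented for illustration.

import re
from collections import Counter

# Invented sample messages; in practice these would be read from a log file.
messages = [
    "connection from 10.0.0.5 port 51234",
    "connection from 10.0.0.9 port 44321",
    "disk usage at 91 percent",
    "disk usage at 97 percent",
    "connection from 10.0.0.5 port 51240",
]

def template(msg):
    """Replace numeric fields with '<*>' so similar messages collapse into one group."""
    return re.sub(r"\d+", "<*>", msg)

# Group messages by template and print the most frequent patterns first,
# yielding a compact overview of what the log contains.
for tpl, n in Counter(template(m) for m in messages).most_common():
    print(f"{n}x {tpl}")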
