Thursday, 4 August 2011

Apache Http Access Log Parser



What Is Log Parser?
Log Parser is a small Java application that parses Apache HTTP access log files and creates a corresponding TSV (tab-separated values) file.
It provides an easily readable view of the log data.
The generated TSV file can also be imported into a database and queried for analysis.
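
To make the idea concrete, here is a minimal sketch of turning one access log line into a TSV row. It is not the code inside logparser.jar; the class name AccessLogToTsv, the method toTsv and the assumption that the log uses the Apache Common Log Format are all mine, purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual Log Parser implementation.
final class AccessLogToTsv {
    // Assumes Common Log Format: host ident user [time] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    /** Returns the fields of one log line joined by tabs, or null if the line does not match. */
    static String toTsv(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (i > 1) {
                sb.append('\t');
            }
            sb.append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        // Prints the seven fields separated by tab characters.
        System.out.println(toTsv(line));
    }
}
```

Each captured field becomes one column, so the resulting row can be loaded into a spreadsheet or a database table without further splitting.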




How To Run Log Parser?


2. For Windows:
a. Open a command prompt (Windows --> Run --> cmd)
b. Go to the folder where you have saved logparser.jar
c. Run the command java -jar logparser.jar inputFileName (inputFileName is the fully qualified name of the log file, e.g. C:\logs\test-access.log)

  For Linux/Unix:
a. Go to the folder where you have saved logparser.jar
b. Run the command java -jar logparser.jar inputFileName (inputFileName is the fully qualified name of the log file, e.g. temp/logs/test-access.log)

3. The output file is generated in the output folder inside the folder where logparser.jar was placed. The file is named httpLog-yyyymmdd.tsv.



Prerequisites To Run The Jar:
1. A JRE (minimum version 1.5) should be installed on the machine, and the java command should be available on the PATH

Alternatively, build the jar from source. The code is located at https://github.com/nehasaxena/log-parser



Please give your feedback. This will help me to improve the log parser.

Sunday, 30 January 2011

Remove jsessionID from URL (java)




How is a session maintained in Java applications?

A session can be maintained using one of the following mechanisms:

1. Cookies – The server generates a name-value pair that the web browser stores on the user's computer. The browser sends the information stored in the cookie back to the server each time it accesses the application, so that the server can recognize that the new request is part of an existing session.
2. URL rewriting – The server appends the session ID, as the jsessionid parameter, to all the URLs present on the page returned to the browser. When the user clicks on any of the URLs on this page, the browser sends jsessionid back to the server, and the server recognizes that this request is part of an existing session (e.g. http://www.mytestdomain.com;jsessionid=390018FF5697193A9EFF4EC43B3695B3?param1=value1). The HttpServletResponse methods encodeURL() and encodeRedirectURL() can be used to implement this.
3. Hidden variables – Hidden form fields can also be used to maintain a session, although strictly speaking this is a request-based mechanism rather than a session-based one.
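
To show where the session ID ends up under URL rewriting, here is a rough sketch in plain Java. It only imitates the visible result of encodeURL() and is not the servlet container's implementation; the class SessionUrlSketch and the method appendSessionId are hypothetical names I made up for this illustration.

```java
// Hypothetical illustration of the URL shape produced by URL rewriting.
final class SessionUrlSketch {
    /** Inserts ";jsessionid=<id>" after the path, before any query string or fragment. */
    static String appendSessionId(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = query;
        }
        if (fragment >= 0 && fragment < cut) {
            cut = fragment;
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        System.out.println(appendSessionId(
            "http://www.mytestdomain.com?param1=value1",
            "390018FF5697193A9EFF4EC43B3695B3"));
    }
}
```

The session ID sits between the path and the query string, which is why the same page gets a different URL for every session.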

Find out more about maintaining sessions on the Oracle website.


Why is jsessionid appended to some URLs even when cookies are enabled?
If cookies are disabled in the browser, or no session cookie is present, and URLs are being encoded, jsessionid is appended to the URL.
Note that even when cookies are enabled, if URLs are being encoded, the Java application appends jsessionid to all the URLs in the response to the first request. This happens because, when the first request arrives, the server does not yet know whether cookies are enabled in the browser.


Why do we need to remove jsessionid from URLs?
Distinct resources in your application should be identifiable by distinct URLs. Here are a few advantages of doing this:
1. More effective search engine optimization
2. Easier to enable caching based on URL
3. Cleaner, more user-friendly URLs


How to enable cookies for maintaining sessions on a Tomcat server?
Cookies are enabled by default on a Tomcat server. They can be turned off by setting cookies="false" on the <Context> element defined for your application.


How/where to write rules to remove jsessionid?

1. Add a URLRewrite module to your project:
a. Entry in pom.xml to include the dependency on the URLRewrite module:
        <dependency>
            <groupId>org.tuckey</groupId>
            <artifactId>urlrewrite</artifactId>
            <version>3.0.4</version>
        </dependency>

b. Entry in web.xml:
    <filter>
        <filter-name>UrlRewriteFilter</filter-name>
        <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
        <init-param>
            <param-name>logLevel</param-name>
            <param-value>WARN</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>UrlRewriteFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

c. Add urlrewrite.xml to the WEB-INF directory

Find more information on adding the URL rewrite module for Java on tuckey.org


2. Add the following outbound rules to the urlrewrite.xml file to ensure that the jsessionid parameter is removed from every outgoing URL:
    <outbound-rule encodefirst="true">
        <note>Remove jsessionid from embedded urls - for urls WITH query parameters</note>
        <from>^/(.*);jsessionid=.*[?](.*)$</from>
        <to encode="false">/$1?$2</to>
    </outbound-rule>


    <outbound-rule encodefirst="true">
        <note>Remove jsessionid from embedded urls - for urls WITHOUT query parameters</note>
        <from>^/(.*);jsessionid=.*[^?]$</from>
        <to encode="false">/$1</to>
    </outbound-rule>
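
To check what the two patterns above actually match, they can be tried out in plain Java. This is only an approximation using java.util.regex and replaceFirst(); the filter's own matching may differ in detail, and the class JsessionidRulesSketch and the method strip are hypothetical names used just for this experiment.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for the two <from> patterns; not part of UrlRewriteFilter.
final class JsessionidRulesSketch {
    // Same expressions as the <from> elements of the two outbound rules.
    private static final Pattern WITH_QUERY =
        Pattern.compile("^/(.*);jsessionid=.*[?](.*)$");
    private static final Pattern WITHOUT_QUERY =
        Pattern.compile("^/(.*);jsessionid=.*[^?]$");

    /** Applies both rules in order and returns the URL without jsessionid. */
    static String strip(String url) {
        String result = WITH_QUERY.matcher(url).replaceFirst("/$1?$2");
        return WITHOUT_QUERY.matcher(result).replaceFirst("/$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("/products/list;jsessionid=390018FF5697193A9EFF4EC43B3695B3?param1=value1"));
        System.out.println(strip("/products/list;jsessionid=390018FF5697193A9EFF4EC43B3695B3"));
    }
}
```

Note that both patterns start with ^/, so in this sketch they only apply to URLs that begin with a slash; a URL without jsessionid passes through unchanged.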