Monday, February 11, 2013

Map Reduce Simplified

Yes, it is about parallel and distributed computing. There are tons of web pages, books, articles and diagrams that explain Map Reduce with nice buzzwords; here is the most simplified explanation.

Let's take a real-life example.

1. The company CEO called all the Program Managers: "I need the total effort spent this month by noon." The Program Managers replied, "No problem, sir." Why are the Program Managers not worried? Because they are going to distribute the task :-)

2. Each Program Manager called their Project Managers, asking for the effort spent so far.

3. Each Project Manager pulled up the effort sheet and provided it to their Program Manager.

4. The Program Managers compiled the received sheets into one file and sent it to the CEO.

5. The company CEO collated all the files and calculated the total effort spent.

Each person broke the task into smaller tasks. The Program Manager was required to provide the effort spent, so he mapped that task onto smaller tasks for his Project Managers. This is MAP.

On receiving the data from their Project Managers, the Program Managers compiled it back into a single output. This is REDUCE.

Now let's zoom out and summarize how Map Reduce applies to distributed and parallel computing. Each node splits its task into smaller tasks and hands them out (it MAPs its given task). Each node then receives the results and combines them (REDUCE) to generate the required output.
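
The CEO example can be sketched in a few lines of Java. This is only an illustration of the idea using plain Java streams, not a real Map Reduce framework; the class name EffortMapReduce, the program names and the effort numbers are all made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EffortMapReduce {

    // A Program Manager combines the effort sheets of their projects
    // into a single number (the REDUCE step at the program level).
    static int reduceProgram(List<Integer> projectEfforts) {
        return projectEfforts.stream().mapToInt(Integer::intValue).sum();
    }

    // The CEO hands the task out to every program (MAP), possibly in
    // parallel, and then adds up what comes back (REDUCE).
    static int totalEffort(Map<String, List<Integer>> programs) {
        return programs.values()
                .parallelStream()
                .mapToInt(EffortMapReduce::reduceProgram)
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical effort (in hours) reported per project.
        Map<String, List<Integer>> programs = new LinkedHashMap<>();
        programs.put("Program A", Arrays.asList(120, 80));
        programs.put("Program B", Arrays.asList(200, 40, 60));

        System.out.println(totalEffort(programs)); // prints 500
    }
}
```

Because each program is summed independently, the work can be spread over as many workers as there are programs, which is the whole point of the pattern.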


Sunday, January 27, 2013

Apache Roller Getting Started

If you are thinking of setting up a blog website, Apache Roller is a quick and simple way to do so. Below are the steps for getting started. These instructions are also in the documentation provided by Apache, but hopefully the following steps will make it very easy for you.

Here you go!

1. Download Apache Roller

http://roller.apache.org/download.cgi#roller50

2. Install Tomcat

http://tomcat.apache.org/download-70.cgi

3. Download MySQL

http://dev.mysql.com/downloads/

4. Unzip the Apache Roller zip file and copy roller-5.0.1-tomcat.war from
\roller-weblogger-5.0.1-for-tomcat.zip\roller-weblogger-5.0.1-tomcat\webapp to the Tomcat webapps folder.

5. Create the MySQL database. Don't worry about the schema; it will be created automatically later.

mysql -u root -p
password: *****

mysql> create database rollerdb;
mysql> grant all on rollerdb.* to scott@'%' identified by 'tiger';
mysql> grant all on rollerdb.* to scott@localhost identified by 'tiger';

If the database does not exist, you will get the following error on the web page:

[com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown database 'rollerdb'


6. Create a file named roller-custom.properties in the Tomcat lib folder. Here is the default content:

installation.type=auto
mediafiles.storage.dir=/usr/local/rollerdata/mediafiles
search.index.dir=/usr/local/rollerdata/searchindex
log4j.appender.roller.File=/usr/local/rollerdata/roller.log
database.configurationType=jdbc
database.jdbc.driverClass=com.mysql.jdbc.Driver
database.jdbc.connectionURL=jdbc:mysql://localhost:3306/rollerdb?autoReconnect=true&useUnicode=true&characterEncoding=utf-8&mysqlEncoding=utf8
database.jdbc.username=scott
database.jdbc.password=tiger
mail.configurationType=properties
mail.hostname=smtp-server.example.com
mail.username=scott
mail.password=tiger

If you don't add this file, you will get the following error on startup:

Roller Weblogger: No customer properties found in classpath
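
Before starting Tomcat, it can help to check that the file really contains the settings Roller needs. Below is a minimal sketch of such a check. The class name RollerConfigCheck and the choice of which keys count as "required" are my own assumptions based on the listing above; this is not part of Roller.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class RollerConfigCheck {

    // Keys taken from the sample file above; treating them as mandatory
    // is an assumption made for this sketch.
    static final List<String> REQUIRED = Arrays.asList(
            "installation.type",
            "database.configurationType",
            "database.jdbc.driverClass",
            "database.jdbc.connectionURL",
            "database.jdbc.username",
            "database.jdbc.password");

    // Returns the required keys that are absent or blank.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // In practice, load roller-custom.properties from the Tomcat lib
        // folder; an inline sample keeps the sketch self-contained.
        String sample = "installation.type=auto\n"
                + "database.configurationType=jdbc\n";
        Properties props = new Properties();
        props.load(new StringReader(sample));

        System.out.println("Missing keys: " + missingKeys(props));
    }
}
```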

7. Download the jars for the mail API and the MySQL JDBC driver and copy them to the Tomcat lib directory:

mail.jar
mysql-connector-java-5.1.22-bin.jar

I came across the following error, and copying mail.jar to the lib directory fixed it:

SEVERE: Error listenerStart
Jan 28, 2013 11:03:29 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/roller-5.0.1-tomcat] startup failed due to previous errors
Jan 28, 2013 11:03:29 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/roller-5.0.1-tomcat] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak.

8. Start Tomcat and open the following URL:

http://localhost:8080/roller-5.0.1-tomcat/

9. If you have followed all the steps, you should see a screen asking whether to create the database tables.


10. Click on "Yes - create tables now". Once the tables are created, you will see the following confirmation message:

Database tables were created successfully as you can see below.
Database tables are present and up-to-date. Click here to complete the installation process and start using Roller.

Click on "here".

11. You should now see the welcome message:

"Welcome to Roller"

12. Follow the on-screen steps to create a user and a blog.

Hopefully you will find it very simple to set up, and have fun getting your blog site running in a few minutes.

Saturday, August 13, 2011

HTML5 Cool Features

HTML5 has come up with a lot of features, but the coolest of them are MathML and SVG support. I tried a few quick examples; they make web design more fun:




If you don't see a green circle, a red rectangle and a blue line, then your browser does not support HTML5. Visit the following URL to check your browser:

http://html5test.com/

Good luck with rich GUIs.



Thursday, June 16, 2011

View Drools Generated Code

When we execute rules, Java code is generated behind the scenes, and that generated code is what actually gets executed. To see this Java code, you need to configure the path where Drools should store the code it generates.

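import java.io.File;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.compiler.PackageBuilderConfiguration;

// Dump the Java source that Drools generates into the XYZ directory (Drools 5.x packages)
PackageBuilderConfiguration configuration = new PackageBuilderConfiguration();
configuration.setDumpDir(new File("XYZ"));
final KnowledgeBuilder kbuilder = KnowledgeBuilderFactory
        .newKnowledgeBuilder(configuration);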

Set XYZ to the directory path where you would like Drools to store the generated code, e.g. XYZ = "c:/drools/codegen".

When you run your program, the Java classes generated for the .drl files are stored at the path XYZ.

This generated code is really helpful for understanding the behavior in a lot of situations.

Wednesday, October 27, 2010

JSON Optimizing for Faster Rendering

Fast rendering is one of the primary goals of every web application. JSON is the preferred choice in most AJAX-based applications because it is a lightweight data interchange format.

Consider a scenario where you have to read data from the database and render it on the UI. Typically the steps are:

a. Send the request data to the server using AJAX calls
b. At the server end, invoke the business layer
c. The business layer calls the database layer
d. Data returned from the database layer is populated into business objects
e. Build JSON objects from the business objects and send them back to the client
f. The client uses JavaScript libraries to render the data on the UI

If everything works fine for you, you can stop reading right now, because you are achieving the required goal in the best way.

But in the real world this might not work for you because of performance issues at one or another of the steps above. Here are a few performance tips. They might suit some scenarios and be the worst suggestion in others:

1. Store JSON objects in the database, read them and send them to the UI as they are. This is the fastest way because it avoids building JSON objects from business objects on every read, but it raises the question of how to query the data.

2. Store the JSON object in one of the columns in the database, query on the other columns and return the column containing the JSON object.

These tips might be useful in some scenarios. Also use JSON APIs to add methods that generate JSON objects from Java objects.

Hopefully they help some of you, though they will not fit every case.

Good luck with faster UIs.
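
Tip 2 can be sketched as follows. This is only an in-memory illustration: the list stands in for a database table, the Employee class and its fields are made up, and the hand-written toJson() escapes only quotes and backslashes, so a real application would use a proper JSON library instead.

```java
import java.util.ArrayList;
import java.util.List;

public class JsonColumnSketch {

    /** Hypothetical business object that knows how to render itself as JSON. */
    static class Employee {
        final int id;
        final String name;
        final String department;

        Employee(int id, String name, String department) {
            this.id = id;
            this.name = name;
            this.department = department;
        }

        String toJson() {
            return "{\"id\":" + id
                    + ",\"name\":\"" + escape(name) + "\""
                    + ",\"department\":\"" + escape(department) + "\"}";
        }
    }

    /** One stored row: a plain column to query on, plus the ready-made JSON column. */
    static class Row {
        final String department;
        final String json;

        Row(String department, String json) {
            this.department = department;
            this.json = json;
        }
    }

    // Stands in for the database table.
    private final List<Row> table = new ArrayList<>();

    /** Escapes backslashes and double quotes only; enough for this sketch. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** The JSON is built once, at write time. */
    void save(Employee employee) {
        table.add(new Row(employee.department, employee.toJson()));
    }

    /** Query on the plain column and return the stored JSON untouched. */
    String findByDepartment(String department) {
        List<String> hits = new ArrayList<>();
        for (Row row : table) {
            if (row.department.equals(department)) {
                hits.add(row.json);
            }
        }
        return "[" + String.join(",", hits) + "]";
    }

    public static void main(String[] args) {
        JsonColumnSketch store = new JsonColumnSketch();
        store.save(new Employee(1, "Asha", "QA"));
        store.save(new Employee(2, "Ravi", "Dev"));

        System.out.println(store.findByDepartment("QA"));
        // prints [{"id":1,"name":"Asha","department":"QA"}]
    }
}
```

The read path never touches a business object, which is where the time is saved. The price is that the JSON column has to be rebuilt whenever the underlying data changes.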

Friday, October 22, 2010

Drools - An overview

For Java-based applications, the most challenging part has always been maintaining the business logic. Pick any application that you find complex and ask yourself how complex it will be moving forward; the answer will always be n times more.

What do we do? Drools comes to the rescue as a rule engine.

Drools provides a mechanism:

a. To write business logic in simple English
b. That is easy to maintain and very simple to extend
c. To reuse logic by defining keywords in a DSL file and using them in a DSLR file

But be careful: nothing comes free. Everything has a cost in terms of memory and time.

Use Drools if you really have:

a. Business logic that is getting cluttered with multiple if conditions because of the variety of scenarios
b. A growing demand for more complexity
c. Frequent business logic changes (even 1 - 2 times a year counts as frequent)
d. Servers with enough memory, as it is a memory-hungry tool that provides performance at the cost of memory
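
To give a feel for what moving logic out of nested if conditions means, here is a toy sketch in plain Java. It is not Drools and does not use any Drools API; the ToyRules class, the rule names and the thresholds are invented for illustration. The idea is only that each rule is a separate piece of data (a condition plus a name) that can be added or changed without touching the code that evaluates the rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class ToyRules {

    /** A rule is a name plus a condition on the order amount. */
    static class Rule {
        final String name;
        final IntPredicate when;

        Rule(String name, IntPredicate when) {
            this.name = name;
            this.when = when;
        }
    }

    // Hypothetical rules; adding one does not touch fire() below.
    static final List<Rule> RULES = Arrays.asList(
            new Rule("bulk-discount", amount -> amount >= 1000),
            new Rule("free-shipping", amount -> amount >= 500));

    /** Returns the names of all rules whose condition matches. */
    static List<String> fire(int orderAmount) {
        List<String> fired = new ArrayList<>();
        for (Rule rule : RULES) {
            if (rule.when.test(orderAmount)) {
                fired.add(rule.name);
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(fire(1200)); // prints [bulk-discount, free-shipping]
        System.out.println(fire(600));  // prints [free-shipping]
    }
}
```

A real rule engine adds a rule language, pattern matching over many facts and efficient re-evaluation on top of this, which is where the memory cost mentioned above comes from.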

Choosing a technology stack is a big decision for the lifecycle of an application, so evaluate both the pros and the cons. If they fit your application's requirements, go for it, because Drools is one of the easiest Java-based rule engines to use and plug in.

Some programmers might find it cumbersome, as we are used to looking at the code that is being executed and love to debug it, understand it and see what is actually happening. The latest version, Drools 5.0, addresses this by providing JMX support and functionality to see the generated Java code.

Good luck with your rule engine!

Friday, February 20, 2009

Class Data Sharing

Class data sharing (CDS), a feature introduced in J2SE 5.0, reduces the startup time for Java
programming language applications.

When the JRE is installed on 32-bit platforms using the Sun provided installer, the installer loads a set of
classes from the system jar file into a private internal representation, and dumps that representation to a file,
called a "shared archive". Class data sharing is not supported in Microsoft Windows 95/98/ME.

During subsequent JVM invocations, the shared archive is memory-mapped in, saving the cost of loading those
classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes.

The primary motivation for including CDS in the 5.0 release is the decrease in startup time it provides.
CDS produces better results for smaller applications because it eliminates a fixed cost: that of loading
certain core classes. The smaller the application relative to the number of core classes it uses, the
larger the saved fraction of startup time.


The footprint cost of new JVM instances has been reduced in two ways. First, a portion of the shared archive,
currently between five and six megabytes, is mapped read-only and therefore shared among multiple JVM processes.
Previously this data was replicated in each JVM instance. Second, since the shared archive contains class data
in the form in which the Java Hotspot VM uses it, the memory which would otherwise be required to access the
original class information in rt.jar is not needed. These savings allow more applications to be run concurrently
on the same machine.

Regenerating Shared Archive

To regenerate the shared archive, use the following command:

java -Xshare:dump

Diagnostic information will be printed as the archive is generated.

Manually Controlling Class Data Sharing

The class data sharing feature is automatically enabled when conditions allow it to be used. The following command
line options are present primarily for diagnostic and debugging purposes and may change or be removed in future
releases.

-Xshare:off
Disable class data sharing.

-Xshare:on
Require class data sharing to be enabled. If it could not be enabled for various reasons, print an error message and exit.

-Xshare:auto
The default; enable class data sharing whenever possible.
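
To see from inside a program whether sharing is in effect, one option is to look at the java.vm.info system property. The sketch below assumes a HotSpot VM, where this property typically reads something like "mixed mode, sharing" when the shared archive is mapped. Treat the string match as an observation about HotSpot's output rather than a documented contract; the class name CdsCheck is made up.

```java
public class CdsCheck {

    /** Returns true when the VM info string reports that class data sharing is active. */
    static boolean reportsSharing(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        // On HotSpot this is the same text shown at the end of `java -version`.
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("CDS active: " + reportsSharing(vmInfo));
    }
}
```

Running it once with -Xshare:off and once with -Xshare:auto is a quick way to compare the two modes.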