Avoiding Garbage Collection?

I ran across an article recently discussing Terracotta's new release of their BigMemory product. At first I thought it was rather intriguing, until I got to the end of the article and read the following: "Whilst the target market for BigMemory is mainly people who are not ready to build a fully distributed architecture, the product does work with the distributed cache as well as the single node." Yes, they are essentially admitting that it's for small applications or, in not so many words, targeting those applications which are not built for scalability. In other words, grow the application up rather than out. Using this new BigMemory product, the idea is that you'll reduce the number of JVMs you are running (virtual or not) and increase the memory usage per JVM. This is like going backwards from distributed architectures to monolithic ones and not something to approach lightly. Perhaps there is a happy middle ground (or perhaps BigMemory is more the marketing of a product to address a small niche than something we should all take seriously)? Being skeptical, I wanted to share some thoughts on this approach...

The advantage claimed for BigMemory is that it doesn't need to allocate and deallocate blocks of memory in the usual way: it lives in RAM and can be thought of as one big pre-allocated chunk that the product manages itself. The whole premise of the JVM's garbage collection is that once memory blocks are allocated and deallocated repeatedly over time, the JVM needs a way to clean those blocks up. BigMemory appears to simply allocate whatever size chunk of memory you specify and manage it internally, hiding the resulting fragmentation from the garbage collector. Moreover, what happens when the application runs out of available memory? Back to square one - distributing (putting app tuning, memory leaks, etc. aside). Furthermore, BigMemory only appears to solve the GC issue for ehCache - GC will still occur in the rest of your application regardless (you can shrink the space allocated to the JVM heap as you increase the space allocated to ehCache, meaning the JVM's GC will not take as long since it has less space to clean).

Back to the statement from the article: growing the application up rather than out has its own consequences that must be considered from an architectural deployment standpoint. You would need to ask yourself a few important questions, namely what to do for failover. If the quick/easy answer is to have another box running the same stack, then you've essentially tip-toed into the distributed camp already, slight as it may be. Suppose now you need to synchronize the data across these two boxes so that there is no downtime in case of a failover. You'll need to integrate some data management tool into your architecture. At this point you are back to making configuration adjustments to handle distributing requests between multiple systems. So now we're left with a distributed architecture, each node having its own capability of avoiding GC (and thus avoiding full stops during GC runs), yet each also having to be configured for data management for failover. Interestingly enough, once load becomes greater than the application can handle, another node will need to be inserted into the mix, leading us right back down the path of distributed architectures in all their glory.

At the end of the day, Terracotta's new BigMemory product appears to address a narrow market, which they themselves seem to admit. I would ignore the additional marketing hype attempting to make one think it's the "next big thing" in software deployment (it's not). BigMemory seems to have solved a problem ehCache had, but as a whole it appears to be a solution in search of a problem in the marketplace. Having said this, the idea behind BigMemory is an interesting one and I look forward to seeing metrics comparing it against standard GC. For now I'll be sticking to distributed architectures and utilizing data grids.

Project Packaging: by-layer vs. by-feature

For some, the topic of project source code packaging is taken personally. Much agonizing typically takes place when new projects are started regarding the structuring of the project and its source code. For existing projects, these decisions are already in place and no fundamental restructuring needs to be dealt with. For those projects just getting started, or those refactoring their project structure and/or source code packaging, this post is for you.

In my experience, approximately 95% of the projects I've worked on were of the package-by-layer variety. The reason, I think, is due to what we've all learned about design. Designing a software project and modeling it in UML brings a more structured approach to the process. This rigidness stems from the UML package diagram, whereby the architect or project lead is left with laying out the project package structure for the application. This structure is generally composed in a top-down manner and starts with the company's domain name reversed (com.shaunchilders.* in the case of this site). From here, most designers will model around MVC and a tier-based pattern: action, servlet, controller, form, domain, view, dao, business, services, etc. Below, I hope to present this as not necessarily "wrong", but "not good enough", and give reasons why...

In the package-by-layer approach, the code packaging is structured such that the highest levels of packaging reflect tiers or layers, for example:
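A hypothetical layout (package names are illustrative only, using this site's domain) might look like:

```
com.shaunchilders.action
com.shaunchilders.controller
com.shaunchilders.dao
com.shaunchilders.domain
com.shaunchilders.form
com.shaunchilders.service
com.shaunchilders.view
```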


In this approach, each package contains classes that are only very loosely related to one another. This typically means we end up with a project that has a large number of unrelated classes grouped together. This should be a "code smell" from the standpoint that it completely ignores one of the key Java capabilities: package-level (default) access alongside public, protected and private. Your project will never be able to take advantage of this feature of Java (likewise for .NET). Furthermore, this leads to low cohesion and low modularity while at the same time introducing high coupling among different packages. In this approach, it is extremely difficult and time consuming to pull out a piece of logic or functionality, as no one is sure what other code will be affected.

In the package-by-feature approach, each package represents a feature (or use case) of the application. All classes related to a particular feature or feature set are grouped together, which allows for taking advantage of proper class access from other classes. This approach also encourages proper use of "code to interfaces" techniques, thereby making the application very modular and component-based. In this approach, it is clear what each package represents, for example:
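A hypothetical by-feature layout (package names are illustrative only) might look like:

```
com.shaunchilders.account
com.shaunchilders.billing
com.shaunchilders.catalog
com.shaunchilders.search
```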


As you can see, the package names correspond to high level aspects of a problem domain or use cases and all support classes needed for a given feature can be found within its package. This also allows for easy deletion (or addition) of features from a system since you can simply delete the directory. In this approach it is very likely that a class or classes in one package will depend on a class or classes from another package to complete its functionality and this is perfectly acceptable. The only thing you need to be sure of is that you don't introduce a circular dependency whereby package A depends on package B, which depends on package A. If this is the case, then you should reconsider your features and possibly combine them into one abstract feature set.

Recommended Approach
Do a mixture of both. At the high-level use packaging by feature to group similar things together and then at the sub-level, continue to use the traditional package-by-layer approach, whereby you might have a package structure that looks like this:
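A sketch of such a hybrid layout (package names are illustrative only) might be:

```
com.shaunchilders.billing
com.shaunchilders.billing.citi
com.shaunchilders.billing.paypal
com.shaunchilders.billing.dao
com.shaunchilders.billing.dao.hibernate
com.shaunchilders.billing.domain
```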


In the scenario above it is clear that there exists a Billing feature and that it provides support for both Citi and PayPal and that it also interfaces with a database via Hibernate.

Separating Domain (model) from Presentation (view)

In this day and age of websites and software development, things change at a rapid-fire pace. With these changes no doubt come modifications to the presentation of the website/application.

Hibernate, Caching and SQL

If you've ever struggled with using Hibernate for your persistence layer while also needing to write your own SQL, then you have no doubt run into some problems - more likely still if you were using the Hibernate second-level cache mechanism. As it turns out, Hibernate's second-level cache cannot be used in conjunction with a custom SQL query - you'll end up debugging your application to track down a seemingly unrelated ERROR thrown from Hibernate.

Proper SQL with Hibernate

Let's take the example of using Hibernate to retrieve some data. A normal program listing for retrieving a Content object would be similar to the following:
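A minimal sketch (the Content class, id property and sessionFactory are assumed to be mapped and configured elsewhere):

```java
// Retrieve a Content object by id using an HQL query on a Hibernate Session.
Session session = sessionFactory.openSession();
try {
    Content content = (Content) session
            .createQuery("from Content c where c.id = :id")
            .setParameter("id", contentId)
            .uniqueResult();
} finally {
    session.close();
}
```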

And to make this query cacheable by Hibernate to speed up retrieval next time, you would use the setCacheable() method:
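Continuing the sketch above, the only change is the setCacheable() call (the query cache must also be enabled in your Hibernate configuration):

```java
// Mark the query as cacheable so the query cache can serve
// repeat executions without hitting the database.
Content content = (Content) session
        .createQuery("from Content c where c.id = :id")
        .setParameter("id", contentId)
        .setCacheable(true)
        .uniqueResult();
```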

So far so good, but now suppose you want to determine the highest rated piece of content in your table.

Custom SQL Queries
Since doing calculations (sums) against data is much faster in the database than programmatically, we'd like to have our calculations specified in the SQL. In doing so, we must pay attention to the way Hibernate handles such things. Normally, Hibernate wouldn't care about SQL since you can just use the session.createSQLQuery() method. The "gotcha" is that Hibernate doesn't like the use of aliases, and you must account for this. In our example, let's suppose we have Content that references another table for tabulating the number of hits, ContentHits.

What we want to do is sum up the count and weight of each against the total number of counts. Ignoring the actual calculation we're performing, it would be nice to be able to use an alias for the summed value. This is possible using pure SQL, but not possible using pure SQL through Hibernate. Hibernate balks at the JOINed reference to the alias name. For example, the following causes an exception in Hibernate:
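Assuming hypothetical column names on the Content and ContentHits tables, a query along these lines (note the Total alias referenced in the ORDER BY) is valid SQL but trips Hibernate up:

```sql
SELECT c.*, SUM(ch.hit_count * ch.weight) / SUM(ch.hit_count) AS Total
FROM Content c
LEFT JOIN ContentHits ch ON ch.content_id = c.id
GROUP BY c.id
ORDER BY Total DESC
```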

The above query is completely valid SQL - we're simply performing a left join to pull the sum of the number of hits from the second table, ContentHits. The problem when running it through Hibernate is that it throws an exception because it doesn't recognize the alias on the left join side of the query. The following is what Hibernate expects and will work properly with:
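The same query with the ORDER BY on the alias removed (column names are again hypothetical):

```sql
SELECT c.*, SUM(ch.hit_count * ch.weight) / SUM(ch.hit_count) AS Total
FROM Content c
LEFT JOIN ContentHits ch ON ch.content_id = c.id
GROUP BY c.id
```

The ordering by the summed value then has to be performed in application code after the results come back.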

Notice the lack of the ORDER BY Total reference for the summed value. Due to this limitation in Hibernate, the user is forced to pull back the Content data and the total sum value and then perform the ordering programmatically. This is not efficient, as it means the end user must now implement some sort of ordering logic in the persistence layer (or business tier) when the database can do it much faster. On top of this, the application can't take advantage of Hibernate caching when performing any custom SQL queries.

Internationalization (i18n) For All

In today's Web 2.0 world almost any good web portal of a global company must provide the ability to "localize" content for its users. While there are numerous ways to determine the country (and language) of the user (proxies aside), the focus of this post will be what to do to make a website recognize and adjust the language displayed.

Recognizing the User Language

The first step in localizing your website is to identify the language of your users. This can be done by inspecting the user agent string sent by the device making the request into your website. A user agent is of the form below:
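An older-style Firefox user agent string, for example, embeds the locale directly:

```
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
```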

You can see in the user agent string above that the country and language are part of the string (en-US). In programming terms, every country and language has an associated two-letter code (ISO 3166 for countries, ISO 639 for languages) that can be used to identify it.

Storing the User Language

After identifying the user agent and the subsequent country and language values, you need to store this information in your application, preferably in the user session scope (so that you can reference it on subsequent requests without another lookup). How you design your application to store this information is up to the reader; however, in this post we'll do so using a web filter (another recommended approach is aspect-oriented programming). In our example, we'll have a LocaleFilter object that will pull the localized information from the request and store it in a Locale object in the session.
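A sketch of such a filter (class and attribute names are illustrative; request.getLocale() lets the servlet container parse the request headers for us):

```java
// Servlet filter that derives the user's Locale once per session
// and caches it under a session attribute for later lookups.
public class LocaleFilter implements Filter {

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpSession session = request.getSession(true);
        if (session.getAttribute("userLocale") == null) {
            // The container resolves the preferred Locale from the request
            session.setAttribute("userLocale", request.getLocale());
        }
        chain.doFilter(req, res);
    }

    public void init(FilterConfig config) throws ServletException { }

    public void destroy() { }
}
```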

Making Multiple Languages Available

In order to now make your website localized and take advantage of multiple languages you must make use of the stored Locale object. There are two steps involved: 1) create the different language files; 2) retrieve the values for each display element on your website page.

To create the different language files you need to create a context_language_country.properties file - for example, Welcome_en_US.properties. This file would hold the English text for the welcome page and is packaged in your application deployment file on the CLASSPATH. The contents are in the format of key/value pairs:
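A hypothetical Welcome_en_US.properties (keys and messages are illustrative):

```
welcome.title=Welcome
welcome.greeting=Hello, {0}!
welcome.intro=Thank you for visiting our site.
```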

An example of a Spanish language file (Welcome_es_ES.properties):
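The same illustrative keys, translated:

```
welcome.title=Bienvenido
welcome.greeting=Hola, {0}!
welcome.intro=Gracias por visitar nuestro sitio.
```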

Now in order to use this information, you will need to have logic for pulling the Locale information from the session for each user when it's time to render a website page and referencing the proper context file. You can do so in the following manner:
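A sketch using the JDK's ResourceBundle (the "userLocale" session attribute matches the hypothetical filter; "Welcome" is the bundle base name, resolved against Welcome_language_COUNTRY.properties on the classpath):

```java
// Resolve the stored Locale and look up page text from the matching bundle.
Locale locale = (Locale) session.getAttribute("userLocale");
if (locale == null) {
    locale = Locale.getDefault(); // fall back if the filter hasn't run
}
ResourceBundle bundle = ResourceBundle.getBundle("Welcome", locale);
String title = bundle.getString("welcome.title");
```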

Obviously, this can be cleaned up considerably, but the main point is to show the reader how simple internationalization, and thus localization, can be when developing a website.

Action-based or Component-based MVC?

Which type of MVC framework should you use: action-based or component-based? This is a very opinionated topic, to say the least, with many pros and cons on both sides of the argument. I'm going to provide some insight from my experience into this area and hopefully keep the discussion to facts rather than market hype ("fluff" as I call it).

The Case for Action-Based MVC Frameworks
Action-based MVC frameworks were introduced around the time of Struts (some would argue it was Struts that brought about the term). The basic premise of the action-based design is that each user request into the application is considered an "action". Each action has a mapping and a flow from the request, to the object responsible for handling it, to the view representing the action's result, looking like this:
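Schematically (with illustrative, Struts-style names), the flow for a single action is:

```
browser request (/login.do)
    -> action mapping (struts-config.xml)
        -> LoginAction.execute()
            -> ActionForward
                -> result view (login.jsp)
```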

This design gives us a very fine-grained flow and keeps the application web tier very simple in terms of keeping up with which request is handled by which object(s) and presented with which view. The benefits of this are many, including ease of development, quick ramp-up time for new developers, and easy testing using a browser. The drawbacks include, among other minor issues, that the MVC configuration becomes very large for any site consisting of more than a few pages. Another drawback is the lack of "collecting" common actions into single objects (each action has its own object) - so if you had 30 actions, you would likewise have 30 action-mapping configurations along with 30 objects.

The Case for Component-Based MVC Frameworks
Component-based MVC frameworks provide a coarse-grained design and were born out of the desire for many architects to "collect" actions related to the same type of data together into groups. This means that if you have actions for 'login', 'logout' and 'getAccount' that these could be grouped together into one component called 'AccountController', for example. This controller object would then handle all account-related actions. The benefits of this design are: easy testing; small project footprint at the action level (actions essentially are methods); simplified configuration. The drawbacks of this design include: having to read code to determine presentation view (could be handled in configuration); multiple developers potentially working on same component at same time. The following is an overview of component-based design flow:
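Schematically (again with illustrative names), the component-based flow groups the actions behind one controller:

```
browser request (/account/login)
    -> AccountController.login()
        -> view resolver
            -> login view
```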

My Recommendation
Whether using an MVC framework or handling MVC on your own, I suggest that you first decide how big the project could potentially become. Of course no one has a crystal ball; however, if the application is a storefront displaying products and handling credit card payments, then it's small enough that action-based will be sufficient. On the other hand, if the application is an online portal then most likely you should use the component-based design. It has been my personal experience that most applications are better off using the component-based design. I have worked with many clients and their employees where the decision of which type of MVC framework to use was based purely on whichever one they had experience with - this is absolutely the worst way to design an application. Application design should be done by someone with no reservations as to which tools to use - rather, design the application with the best tool for the job. Deciding on the best tool for the application has nothing to do with how much experience one has with said tool.

JAAS Security Simplified

As discussed previously, security is a huge part of a software application and should be designed and incorporated into any application from the start. One of the primary concerns of architects when it comes to security is vendor lock-in. No one wants to be locked into a particular type of technology for numerous reasons, including slow bug fixes, irregular release schedules, etc. Fortunately there is an alternative: the Java Authentication and Authorization Service (JAAS). Most of you have probably heard about JAAS but have never used it, and thus ended up designing your own security infrastructure. As credit to the standard, even the Spring Framework's (the most popular application framework) security component (formerly Acegi Security) can use JAAS at its core. JAAS is the Java standard and is fairly simple to use.

Here I'll focus on declarative security (as opposed to programmatic, which would be hard-coded into our classes). The first thing to do is define a class in your application that implements the javax.security.auth.spi.LoginModule interface, which will force you to implement a few methods. One of the methods you must implement is login(), which is used to pull the login information from the form on your website and log the user in. From here, you can load whatever user information you need from your database and store it in the user session or application. Let's look at an example of this method:
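A sketch of login() (the user field, UserDao and its findByCredentials method are hypothetical placeholders for your own lookup logic; callbackHandler is the handler JAAS passes to initialize()):

```java
// Obtain the submitted credentials via the callback handler and
// authenticate them against the database.
public boolean login() throws LoginException {
    NameCallback nameCallback = new NameCallback("username");
    PasswordCallback passwordCallback = new PasswordCallback("password", false);
    try {
        callbackHandler.handle(new Callback[] { nameCallback, passwordCallback });
    } catch (Exception e) {
        throw new LoginException(e.getMessage());
    }

    String username = nameCallback.getName();
    char[] password = passwordCallback.getPassword();

    // Hypothetical DAO lookup against the database
    user = userDao.findByCredentials(username, new String(password));
    if (user == null) {
        throw new FailedLoginException("Invalid username or password");
    }
    return true; // success; commit() will populate the Subject
}
```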

In the example above you can see that we pull the username and password and then authenticate against the database. Next, the commit() method takes the user object and sets up the JAAS javax.security.auth.Subject and java.security.Principal objects to be used throughout the application to authenticate and/or authorize access against. To finish up the login module logic, we implement the logout() method to remove our user from the application.

Now, to put this all together and make use of our JAAS security implementation, we need to make our application server aware of the LoginModule to authenticate against. We do this by creating a configuration file called 'jaas.config'. One thing to note: each application server has vendor-specific ways of configuring JAAS; however, I prefer to keep everything vendor-neutral so that my application can be moved among different application servers without issues, thus I recommend using the 'jaas.config' configuration file. In Tomcat, place this file in your /tomcat/conf folder.
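A minimal 'jaas.config' (the application name and LoginModule class are illustrative; the name is what the application refers to when it creates a LoginContext or realm):

```
MyAppLogin {
    com.shaunchilders.security.DatabaseLoginModule required;
};
```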

Lastly, we need to tell our application server about our login page that has our login form. Specify this login form in the 'web.xml' file as follows:
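A standard FORM-based login configuration (page names and realm name are illustrative):

```xml
<login-config>
    <auth-method>FORM</auth-method>
    <realm-name>MyAppLogin</realm-name>
    <form-login-config>
        <form-login-page>/login.jsp</form-login-page>
        <form-error-page>/loginError.jsp</form-error-page>
    </form-login-config>
</login-config>
```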

That's it. Now deploy your application and log in (assuming you have the logic in place to access your database).

AJAX using DWR

Developing user interfaces for applications is always a tedious task - not because the technology is difficult to work with, but because the "what it's supposed to look like" requirements are subject to the opinions of one or more folks. These folks will change their minds, and thus you, as the developer, will have to change your work. In a perfect world, the requirements would be locked down and never changed. One thing you can do is design your application web tier to be flexible enough to absorb the changes that are inevitable.

In this post I'd like to show you how you can design your application to make use of the latest web trends (AJAX) while at the same time avoiding getting caught up in the different "skins" that will display your application. In other words, the presentation pages can change hundreds of times and your application functionality will handle all of the changes without any rework from you (you merely change the data being displayed and its location in the presentation as the pages change).

Let's start with the choice of AJAX. AJAX gives us the ability to make a web-based application appear as though it were a desktop client even though it's running in a browser - the page-refresh drawback is removed. AJAX is particularly useful for handling data that is primarily read-only, although it can also be used to send updates to the backend. Now, we can write our own Javascript functions to perform the AJAX handling for us, or we can save lots of time and leverage one of the number of AJAX libraries that are freely available. Currently DWR is my favorite. DWR stands for Direct Web Remoting and provides a nice wrapper around the lower-level remoting complexities of HTTP so that Java developers can quickly solve their business problems.

Making use of DWR is fairly easy and allows us to focus on Java. First of all, we'll need to configure our application to know about DWR. We start with configuring our 'web.xml' file for DWR:
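The standard DWR servlet registration (this sketch assumes DWR 2.x; the debug flag enables DWR's test pages and should be off in production):

```xml
<servlet>
    <servlet-name>dwr-invoker</servlet-name>
    <servlet-class>org.directwebremoting.servlet.DwrServlet</servlet-class>
    <init-param>
        <param-name>debug</param-name>
        <param-value>true</param-value>
    </init-param>
</servlet>
<servlet-mapping>
    <servlet-name>dwr-invoker</servlet-name>
    <url-pattern>/dwr/*</url-pattern>
</servlet-mapping>
```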

Next, we just write Java code for a typical web request, knowing that we're returning data to be used in Javascript for display on a web page. Let's suppose we had a restaurant lookup service that returned addresses for restaurants near a given geo location. We'll have a Restaurant object:
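A sketch of the data holder (field names are illustrative and would match whatever your lookup service provides):

```java
// Simple bean returned to the browser via DWR's bean converter,
// which exposes the getter values to the client-side Javascript.
public class Restaurant {

    private String name;
    private String address;
    private double latitude;
    private double longitude;

    public Restaurant(String name, String address, double latitude, double longitude) {
        this.name = name;
        this.address = address;
        this.latitude = latitude;
        this.longitude = longitude;
    }

    public String getName() { return name; }
    public String getAddress() { return address; }
    public double getLatitude() { return latitude; }
    public double getLongitude() { return longitude; }
}
```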

Then we'll define a method for looking up the nearest restaurants for the given location:
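A sketch of the lookup service (the class, DAO and method names are hypothetical; the DAO and the search radius stand in for your own query logic):

```java
// Service object exposed to the browser through DWR.
public class RestaurantManager {

    private RestaurantDao restaurantDao; // injected by Spring (assumption)

    public List<Restaurant> findNearest(double latitude, double longitude) {
        // Delegate to the persistence layer; DWR marshals the resulting
        // List<Restaurant> into a Javascript array of objects.
        return restaurantDao.findWithinRadius(latitude, longitude, 5.0);
    }

    public void setRestaurantDao(RestaurantDao restaurantDao) {
        this.restaurantDao = restaurantDao;
    }
}
```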

Notice above that we return a java.util.List of Restaurant objects. DWR allows for this and lets you retrieve data from these objects in the Javascript. To do this we need to tell DWR which of our classes will be accessible from the client side and what, if any, objects it needs to know how to convert. Simply place a 'dwr.xml' configuration file in your project's /WEB-INF directory:
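A sketch of the configuration (the bean name, package and DTD shown assume DWR 2.x and the hypothetical classes above):

```xml
<!DOCTYPE dwr PUBLIC "-//GetAhead Limited//DTD Direct Web Remoting 2.0//EN"
    "http://getahead.org/dwr/dwr20.dtd">
<dwr>
    <allow>
        <!-- Expose the Spring-managed bean to Javascript as 'RestaurantManager' -->
        <create creator="spring" javascript="RestaurantManager">
            <param name="beanName" value="restaurantManager"/>
        </create>
        <!-- Tell DWR how to marshal our return types -->
        <convert converter="collection" match="java.util.Collection"/>
        <convert converter="bean" match="com.shaunchilders.restaurant.Restaurant"/>
    </allow>
</dwr>
```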

In the above configuration we have told DWR about the java.util.Collection interface (for our list) and our Restaurant object. We've also told DWR which object will handle the method calls and how it should be referenced in the Javascript (it's called 'RestaurantManager'). Finally, we've told DWR that our application uses the Spring Framework to manage our objects - DWR integrates nicely with Spring.

To put this all together and make it work we just need to write the client-side Javascript to interact with our Java objects:
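A sketch of the page script (function and element names are illustrative; RestaurantManager is the stub DWR generates, included on the page via a script tag pointing at /dwr/interface/RestaurantManager.js):

```javascript
// Build an HTML fragment from the list DWR hands back to us.
function renderRestaurants(restaurants) {
  var html = '';
  for (var i = 0; i < restaurants.length; i++) {
    html += restaurants[i].name + ' - ' + restaurants[i].address + '<br/>';
  }
  return html;
}

// DWR calls are asynchronous: the extra last argument is the callback
// that receives the marshalled java.util.List as a Javascript array.
function showNearest(latitude, longitude) {
  RestaurantManager.findNearest(latitude, longitude, function(restaurants) {
    document.getElementById('restaurantList').innerHTML = renderRestaurants(restaurants);
  });
}
```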

I'll leave it to you to write the HTML page that will contain the Javascript and the corresponding CSS for styling.