Monday, December 29, 2008

One RIA Framework to Rule Them All

I took some time on this holiday break to start experimenting with Silverlight and Flex, neither of which I have any experience with. While I was toying around with them I started thinking about the dizzying number of RIA framework options.

With Silverlight I really liked developing in Visual Studio; it was intuitive and fast. I also liked the fact that I could use XAML to mark up the layout, then back it up with C# for the event handling.

Flex looks good too. I especially like the fact that you can mark up your presentation in an XML format (MXML), similar to XAML. I am not a fan of ActionScript, though; I will have to see how easy the IDE makes editing the code when I get that far.

Then there is GWT. Well, I liked that so much that I wrote a book about it. I like the fact that I can use my editor of choice (Eclipse), code in Java, and that it is open and free.

I once used Ext-JS on a large project, and although the result looked great, the process was extremely painful. When I was done the code was completely unmaintainable.

I have also dealt with Scriptaculous, Prototype, DWR, and other tools over the years, but those were for smaller projects that probably can't be considered RIAs.

Then there is Wicket, Echo2, blah, blah, blah... so many to choose from!

So which is best? I am partial to GWT, mostly because I know it the best, and it is free and has a great community. On the other hand, I could get used to Silverlight (although the cost can be prohibitive).

If you have an opinion I would like to hear it.

Wednesday, December 24, 2008

Developing Applications More Efficiently

Your boss walks up to you and says, "we need to be more efficient". This seems reasonable; efficiency lowers costs, and assuming you don't sacrifice quality, who in their right mind would argue that this is a bad thing? But now think about it a little. How can you improve efficiency without being able to measure it? And how do you go about measuring efficiency when it comes to building software? That is what I want to discuss today.

The opening sentence in Tom DeMarco's Controlling Software Projects says, "You can't control what you can't measure". Robert L. Glass counters this in Facts and Fallacies of Software Engineering arguing that this is a fallacy. Both are well respected practitioners and authors in the field. So who is right?

Digging deeper into metrics you find the word "subjective" used a lot. DeMarco, in his book The Deadline, even says, "don't sweat the units - while you're waiting to achieve objective metrifications, use subjective units". And then there is the cost of gathering these metrics. Robert Glass notes in his book that studies at NASA-Goddard found the cost of gathering and processing metrics to be 7 to 9 percent of the project. Even though there are those who say that metrics provide a valuable benefit, from the outside it feels like they are losing efficiency in order to measure a subjective value.

I am not dismissing the value of metrics, but I am arguing that they might not always be the best approach. For small teams working on small projects it may be difficult or impossible to define a "function point", one of the popular metrics used for analysis. Perhaps there is an alternative. What if you measured efficiency by the perceived value of changes in procedures? It is still subjective, but it does not carry the overhead of other methods.

I think it may help to understand my thinking if I provide a concrete example. Let's say that most of the work you do is small browser-based applications, and each of them is constructed in a similar manner. They typically all use Hibernate and Spring, use the same security framework, always use Velocity for generating emails, use JNDI, use external properties for configuration, and share a dozen other similarities. Now let's say that you do this often enough to know that it takes about 45 minutes to set up a new project, and that you will probably build 20 or so applications like this in the next year. So we could reasonably put a number of 15 hours per year spent on setup, and that number is much higher if you need to teach others how to do the setup. Based on this, if you spent a couple of hours building a prototype project, or perhaps a couple of them, you have become more efficient.

Obviously a savings of 15 hours per year isn't spectacular, but that is one small change. What if you could make a small change like this every week? All of a sudden we are talking about nearly 800 hours per year, or roughly five months of developer time. Again, we are talking about a small team, so this is considerable. And your small team didn't get any smaller, so the gain is compounded because the team can now complete more projects.
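To make the arithmetic above concrete, here is a back-of-the-envelope sketch using the article's example numbers (45 minutes of setup saved on each of roughly 20 projects, and one similar small improvement per week). The class and method names are purely illustrative.

```java
// Back-of-the-envelope model for the savings described above.
// All numbers are the article's examples, not measured data.
public class EfficiencySketch {

    // Convert a per-task savings into hours saved per year.
    public static double hoursSavedPerYear(double minutesSavedPerTask,
                                           int tasksPerYear) {
        return minutesSavedPerTask * tasksPerYear / 60.0;
    }

    public static void main(String[] args) {
        // One improvement: a prototype project that removes 45 minutes
        // of setup from each of ~20 new projects per year.
        double oneChange = hoursSavedPerYear(45, 20);

        // One similar small improvement per week, each worth about as much.
        double allChanges = oneChange * 52;

        System.out.println(oneChange);   // 15.0
        System.out.println(allChanges);  // 780.0 -- roughly five developer-months
    }
}
```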

Again, I am not dismissing the gathering of metrics, I am just looking at the problem from a different perspective. Instead of measuring output I am measuring the savings due to added efficiencies. If I can introduce measurable efficiencies without degrading product quality, I don't need to know what my output is in order to know that things are more efficient.

Author's Note: Thought is an ongoing process. I can't conclude with any certainty that my theories are accurate. Accuracy can only be measured by real world use. If you have different conclusions, either in theory or practice, I would like to hear them.

Saturday, December 13, 2008

Spring Security + Splunk = Security Monitoring

For many web sites Spring Security provides the framework to secure the site from unauthorized personnel. This is a good start, but it won’t let you know when someone is testing your security to see if maybe they can sneak in. That is where Splunk comes in. Splunk is a tool that analyzes your log files in real time, allowing reports and alerts to be generated. So if Splunk is to be the proverbial security guard in your security system, how do you get the data from Spring Security into Splunk? That is the topic of this article.

The Goal

We need to give our virtual security guard some instructions. First and foremost, the guard must look out for unwanted behavior and alert us. Let’s set a threshold of 20 failed login attempts in a five-minute period; when the guard sees 20 such failures, it should send us an email. Of course we can tweak this formula to fit a specific site's needs. For example, we could raise the number of failed login attempts for high-traffic sites, or have Splunk execute a script instead of emailing, which might do something like send an IM message.

A second goal is to have our guard keep a record of the guests that enter our site. We can use this information later for various things. For example, if someone broke something at 2am, we can check our records to see who was using the site at that time. Or perhaps we want to see usage patterns; we might want to know when the sales staff uses the web site so that we can plan maintenance. Or maybe we just had some layoffs and forgot to disable the former employees' web accounts. We could use this log to verify that they didn’t use the site.

The Setup

We need to add a few things to our Java application in order to log security traffic. We assume that you are already using Spring Security, but if not, this is covered by their online documentation. What we need to add are a few jar files.

If you are using Maven, add these dependencies.


<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-aop</artifactId>
  <version>2.5.6</version>
</dependency>

<dependency>
  <groupId>org.aspectj</groupId>
  <artifactId>aspectjrt</artifactId>
  <version>1.6.1</version>
</dependency>

<dependency>
  <groupId>org.aspectj</groupId>
  <artifactId>aspectjweaver</artifactId>
  <version>1.6.1</version>
</dependency>

If you are using Ivy, add these.


<dependency org="org.springframework" name="spring-aop" rev="2.5.6"/>
<dependency org="org.aspectj" name="aspectjrt" rev="1.6.1"/>
<dependency org="org.aspectj" name="aspectjweaver" rev="1.6.1"/>

If you need to download the jars and add them to the project manually, you can get them here. Note that Maven and Ivy will automatically download the dependencies of these three jar files, so if you add them manually you will also need to add their dependencies per the instructions on their project sites.

Spring Downloads
http://www.springsource.org/download

AspectJ Downloads
http://www.eclipse.org/aspectj/downloads.php

The Strategy

Spring Security is a framework meant to make it very easy to set up security, and in many cases the entire website can be secured without any coding. The facade that Spring Security provides is a blessing, but it makes it difficult to alter its behavior to add the logging we need. That is where Aspect-Oriented Programming (AOP) comes in. You don’t need to know anything about AOP to implement this solution, but if you would like to know more, there is a pretty good AOP introduction on Wikipedia.

Within Spring Security there is an interface AuthenticationProvider, and that interface defines a method that interests us named authenticate. When you use Spring Security, no matter how you allow a user to log in to your site, the request needs to pass through one of the classes that implement this interface. This is ideal for us, because by using AOP we will be able to intercept these method calls and log the result to a file.

For the actual logging we will use commons-logging, a generic logging API that acts as a facade for whatever logging API you already use. If you have not used commons-logging before, the online documentation will provide details for connecting to your preferred logging system (Log4J, JDK, etc). You should already have this jar in your project because it is a dependency of the Spring AOP jar.

The format we use to log the data needs to be easy for Splunk to parse so that we can get the most benefit out of this data. Fortunately Splunk can parse name/value pairs out of the box, so that is what we will use. The format will begin with a timestamp, followed by semicolon-delimited name/value pairs, so that it looks something like this.


[2008-12-10 12:34:08,526] [AUTH] app=my-website;auth=pass;user=rhanson
[2008-12-10 17:52:31,846] [AUTH] app=my-website;auth=fail;user=joehacker

Besides the timestamp it says "[AUTH]". This is the name of the logger we use, and having the logger show up in the logs allows us to filter for these lines when we create our Splunk search. It also includes the application name. This allows us to search for authentication attempts for a specific application, assuming you need to log authentication events across multiple applications.
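To show just how mechanical this format is to parse (Splunk does it for us out of the box), here is a small sketch. The AuthLineParser class is hypothetical and only illustrates the layout shown above; nothing like it is needed in the real solution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal parser for the "[timestamp] [AUTH] k=v;k=v" lines shown above.
// Purely an illustration of the format; Splunk handles this parsing for us.
public class AuthLineParser {

    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        // Everything after the last "] " is the name/value payload.
        String payload = line.substring(line.lastIndexOf("] ") + 2);
        for (String pair : payload.split(";")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0], kv[1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "[2008-12-10 12:34:08,526] [AUTH] app=my-website;auth=pass;user=rhanson";
        Map<String, String> f = parse(line);
        System.out.println(f.get("auth")); // pass
        System.out.println(f.get("user")); // rhanson
    }
}
```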

Ok, enough setup, let’s get to the code.

Coding the Aspect

The entire code of the class that intercepts the authenticate() calls is below. Throughout I have added markers (e.g. /*[1]*/), which I explain following the code. So take a quick look at the code then read the explanation.


package org.roberthanson.loggers;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.aspectj.lang.ProceedingJoinPoint;
import org.springframework.security.Authentication;

public class AuthLogger
{
    private String appName;
    private Log authLog = LogFactory.getLog("AUTH");

    /*[1]*/

    /**
     * Constructor.
     * @param appName name of the application
     */
    public AuthLogger (String appName)
    {
        this.appName = appName;
    }

    /*[2]*/

    private void logAuthPass (String userName)
    {
        if (authLog.isInfoEnabled()) {
            authLog.info("app=" + appName + ";auth=pass;user=" + userName);
        }
    }

    private void logAuthFail (String userName)
    {
        if (authLog.isInfoEnabled()) {
            authLog.info("app=" + appName + ";auth=fail;user=" + userName);
        }
    }

    /*[3]*/

    /**
     * Wraps a call to Spring Security's
     * AuthenticationProvider.authenticate().
     */
    public Object logAuth (ProceedingJoinPoint call) throws Throwable
    {
        Authentication result;
        String user = "UNKNOWN";

        try { /*[3a]*/
            Authentication auth = (Authentication) call.getArgs()[0];
            user = auth.getName();
        }
        catch (Exception e) {
            // ignore
        }

        try { /*[3b]*/
            result = (Authentication) call.proceed();
        }
        catch (Exception e) {
            logAuthFail(user);
            throw e;
        }

        if (result != null) { /*[3c]*/
            logAuthPass(user);
        }

        return result;
    }
}

Our class has a constructor [1], which takes the application name as an argument. This allows us to set the name, the only variable that might change, in the Spring configuration file, making it easy to reuse this class.

Next we define two methods [2], logAuthPass() and logAuthFail(). This is the code that actually logs to commons-logging. Both methods take the name of the user attempting authentication as their sole argument. If for some reason commons-logging is not appropriate for your project, this should be the only piece of code you would need to change.
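For example, if you wanted to swap commons-logging for the JDK's built-in java.util.logging, those two methods might look something like the sketch below. This is an assumed variant, not part of the original solution; the message format is kept identical so the Splunk searches still work.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// A sketch of the same two logging methods written against java.util.logging
// instead of commons-logging. Only these methods would change; the rest of
// AuthLogger stays the same.
public class JdkAuthLogger {

    private final String appName;
    private final Logger authLog = Logger.getLogger("AUTH");

    public JdkAuthLogger(String appName) {
        this.appName = appName;
    }

    // Builds the same semicolon-delimited message used throughout the article.
    String buildMessage(String result, String userName) {
        return "app=" + appName + ";auth=" + result + ";user=" + userName;
    }

    void logAuthPass(String userName) {
        if (authLog.isLoggable(Level.INFO)) {
            authLog.info(buildMessage("pass", userName));
        }
    }

    void logAuthFail(String userName) {
        if (authLog.isLoggable(Level.INFO)) {
            authLog.info(buildMessage("fail", userName));
        }
    }

    public static void main(String[] args) {
        System.out.println(new JdkAuthLogger("my-website")
            .buildMessage("pass", "rhanson"));
        // prints app=my-website;auth=pass;user=rhanson
    }
}
```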

Last is the method that actually intercepts the authentication call [3], the logAuth() method. It receives a join-point object as its sole argument. If you aren’t familiar with AOP speak, this is essentially the information required to call the method that was intercepted. What has happened is that the original authenticate() call was never made; instead, Spring intercepted it and passed the call information to logAuth().

In logAuth() we first [3a] inspect the argument of the intercepted call, namely an Authentication object. We can get this from the join-point object and cast it, then extract the username from it. Just in case something goes wrong, we catch any exception that might occur. This should not happen, but you know the Boy Scout motto.

Next we use the join-point object [3b] to execute the method that was intercepted to get the result of the call. The behavior of authenticate() is to throw an exception on an authentication failure. So we trap the exception and log a failure. On success we hold on to the resulting object so that we can return it as the result of the call.

The last thing we need to do is [3c] test for a null return value. A null result does not mean the authentication succeeded, but it does not mean it failed either. Per the Spring Security documentation, a null result means the AuthenticationProvider was unable to answer the question of whether the user is authenticated. Spring Security may then call other AuthenticationProviders until it can find an answer.
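To illustrate that null-result contract, here is a simplified model of a provider chain. These are hypothetical stand-in interfaces, not the real Spring Security classes, but they show the same "abstain with null, try the next provider" behavior.

```java
import java.util.Arrays;
import java.util.List;

// A simplified, hypothetical model of the provider chain described above --
// NOT the real Spring Security classes. A provider that cannot decide
// returns null, and the manager moves on to the next provider.
public class ProviderChainSketch {

    interface Provider {
        // Returns the authenticated user name, or null when this
        // provider cannot answer the question at all.
        String authenticate(String user, String password);
    }

    // Provider that always abstains (e.g. it doesn't know this user).
    static final Provider ABSTAINS = new Provider() {
        public String authenticate(String user, String password) {
            return null;
        }
    };

    // Provider that can answer; accepts anything (illustration only).
    static final Provider ANSWERS = new Provider() {
        public String authenticate(String user, String password) {
            return "authenticated:" + user;
        }
    };

    static String authenticate(List<Provider> providers, String user, String password) {
        for (Provider p : providers) {
            String result = p.authenticate(user, password);
            if (result != null) {
                return result; // first provider with an answer wins
            }
            // null means "can't say" -- fall through to the next provider
        }
        return null; // no provider could answer
    }

    public static void main(String[] args) {
        System.out.println(authenticate(
            Arrays.asList(ABSTAINS, ANSWERS), "rhanson", "secret"));
        // prints authenticated:rhanson
    }
}
```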

Now that we know how it works, it is time that we configure Spring to use it. The configuration file below does just that.


<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:aop="http://www.springframework.org/schema/aop"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
    http://www.springframework.org/schema/aop
    http://www.springframework.org/schema/aop/spring-aop-2.5.xsd">

  <bean id="authLogger" class="org.roberthanson.loggers.AuthLogger">
    <constructor-arg value="my-website" />
  </bean>

  <aop:config>
    <aop:aspect id="authLoggerAspect" ref="authLogger">

      <aop:pointcut id="authPointcut"
        expression="execution(public * org.springframework.security.providers.AuthenticationProvider.authenticate(..))" />

      <aop:around pointcut-ref="authPointcut" method="logAuth" />

    </aop:aspect>
  </aop:config>

</beans>

It is hard to explain what all of this does without using AOP speak, but basically it says that Spring should intercept calls to AuthenticationProvider.authenticate() and call AuthLogger.logAuth() instead. The only variable that you need to change is the constructor argument passed in the AuthLogger bean definition, which is the name of your application.

Notice that we didn’t change any controller code, or even change your existing Spring Security configuration. The whole point of AOP is to separate concerns, and to not mix things like logging code with your application code.

The only thing left is to configure logging.

Configuring Logging

I am not a master of all logging tools, so I am going to stick with Log4J, which is perhaps the most popular Java logging tool available today.

For the purposes of this article I am using Log4J’s XML-based configuration, but you could use the properties based-configuration if you like. And if you already use Log4J, you will want to merge this into your existing configuration file.


<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">

  <appender name="security-file" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/usr/local/tomcat/logs/security.log"/>
    <param name="Append" value="true"/>
    <param name="MaxFileSize" value="2MB"/>
    <param name="MaxBackupIndex" value="5"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="[%d] [%c] %m%n"/>
    </layout>
  </appender>

  <category name="AUTH" additivity="false">
    <level value="info"/>
    <appender-ref ref="security-file"/>
  </category>

</log4j:configuration>

I don’t want to get into the line-by-line details, but there are a few things to point out. First, the conversion pattern uses "[%d]" at the beginning of the line, which is the timestamp. In testing I noticed that when the timestamp is not at the beginning of the line, Splunk will sometimes see an "event" as spanning multiple lines instead of treating each line as a separate event, losing searchable data. The timestamp is followed by the logger name "%c" and the message "%m". I prefer to log security events to a separate file, which allows me to specify a source type of "security" in Splunk. This lets you conduct searches against "security" files across all of your applications.

Searching and Charting

Once you start logging events and having them ingested into Splunk, it should be fairly straightforward to report on them. I will provide a few examples of things you can do.


sourcetype="security" daysago=7 auth=fail

This searches for authentication failures in the last seven days. The query assumes that when you added the security log file you specified its source type as "security". As mentioned, this is useful if you want to search the security logs across multiple log file locations at the same time.


"[AUTH]" sourcetype="security" daysago=7 auth=fail

This is the same query, but it also looks for the name of the logger, "[AUTH]". This is useful if you want to use the security log for other events as well, such as when an admin runs as another user (like su in Unix). You could log those with a "[RUNAS]" logger in the same security log.


sourcetype="security" daysago=7 auth=fail | timechart count(_raw) by app

This query takes the original query and charts it. Best viewed as a bar chart, it shows a count of authentication failures per day, split by application. Adding this to a security dashboard that you review each day (hopefully) gives you a visual indication if there is some hanky-panky going on.


sourcetype="security" auth=fail minutesago=5

All of this reporting should be supplemented with alerting. In Splunk, take this search and create a saved search from it. On the Schedule and Alert tab (assuming you used the pop-up), have it run on a schedule every five minutes. Then specify that it should alert when the number of events is greater than 20, and enter your email address, or better yet your SMS email address. You may also want it to send the events; it will do this as a CSV file. But be cautious: if someone is trying to break into your site, this file could get large.

This concludes the article. If you tried this out and made your own tweaks, leave a comment; you may be helping out someone else. And as always, whether you liked, hated, or have questions about this article, constructive comments are welcome.

Happy Splunking.

Thursday, December 04, 2008

Log Analytics: Splunk

Sure, I use logs to troubleshoot problems, but since the rise of JavaScript-based traffic analysis tools like Omniture and Google Analytics I never gave them much thought as an analytics source. I must say that I have sorely underestimated the power of logging.

I say this because of a tool that I have begun using called Splunk. If I had to characterize Splunk, I would say that it is a data analysis tool, with the ability to generate reports and alerts based on events.

The data could be an Apache log file, the output of the ps command, or a configuration file that you need to watch for changes. Input sources can come via files, TCP/UDP ports, and other mechanisms.

The analysis of the data is done via a search, though the search language is quite complex. A search allows you to target a specific data source, or a class of sources (like all of your access logs), extract fields from the log entries using a regex (or just split key/value pairs), transform the data, and then chart it as a line, pie, bar, or other type of chart.

Here are some specific examples of what I have been doing with it.

Response Times - With a small aspect I was able to wrap all of my controller methods (and some repository methods too) and have them dump response-time data to a file. The data looks like this.

app=my-app;class=org.example.Demo;method=doSearch;response=348
app=my-app;class=org.example.Demo;method=doSearch;response=654
app=my-app;class=org.example.Demo;method=doSearch;response=439

All of this data is logged using commons-logging, and put into a separate performance.log file to separate it out from other logging. The result is that I can create a search for any class/method, and generate a chart of min/avg/max response times for that method. This allows me to spot any degradation that might occur over time (as the DB gets larger), or during certain parts of the day (when the network is congested). I can then set up an alert so that if the response times exceed a threshold, Splunk will send me an email (or execute a script).
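The actual timing was done with a Spring aspect, much like the AuthLogger above. As a rough stand-in, here is a plain-Java sketch showing how a call can be timed and logged in that same name/value format; the class and method names are illustrative only.

```java
// A plain-Java stand-in for the timing aspect described above. The real
// setup wraps controller methods with Spring AOP; this sketch just shows
// how a call can be timed and a log line built in the same format.
public class ResponseTimeSketch {

    // Builds the "app=...;class=...;method=...;response=..." line.
    static String formatLine(String app, String className,
                             String method, long millis) {
        return "app=" + app + ";class=" + className
             + ";method=" + method + ";response=" + millis;
    }

    // Times a task and returns the log line that would be written.
    static String timeCall(String app, String className, String method,
                           Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        long elapsed = System.currentTimeMillis() - start;
        return formatLine(app, className, method, elapsed);
    }

    public static void main(String[] args) {
        String line = timeCall("my-app", "org.example.Demo", "doSearch",
            new Runnable() {
                public void run() {
                    // stand-in for the real controller work
                    for (int i = 0; i < 1000000; i++) { }
                }
            });
        System.out.println(line);
    }
}
```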

Change Control - Simply put, I can have Splunk monitor a configuration file, and it will log any changes to that file. I can then use Splunk's diff command to see the actual change. Again, I can have it email me when a change occurs.

Service Unavailability - Because all of my Java application servers sit behind an Apache proxy, I can monitor the Apache logs and look for proxy errors (501, 503, 504). Splunk is able to alert me when these events occur because it can parse the logs and look specifically for these error codes. With Apache this is super easy because Splunk already knows how to parse an Apache log in a standard format. This can be more effective than monitoring that uses polling, because polling will only tell you the service is down at the moment the poll occurred, whereas Splunk can alert me even if only a single hit against the site caused a proxy error.

There is much more to Splunk, and I still have much to learn, but I hope this helps provide some insight as to its capabilities.