My Pages

Monday, November 26, 2012

Functional programming by example

Functional programming is becoming a really big deal. Languages like Ruby and Python have gained quite a bit of popularity and introduce some functional concepts, while languages like Erlang, Haskell, and Lisp, despite decent developer bases, still sit in the "it's just really hard to write in" bucket. Java has been a powerful force in the software community over the past decade, and the JVM has become a great, portable way of distributing software. On the JVM, even more languages with functional concepts have appeared: JRuby, Jython, Groovy, and Scala have opened the door of functional programming to many more people. Let's start by looking at a list of concepts that make up a functional language.
  • First-class and higher-order functions
  • Pure functions
  • Recursion
  • Strict versus non-strict evaluation
  • Statements
  • Pattern Matching
  • Immutable Variables
We'll look at some of these concepts, how we can use them to our advantage in real-world software development, and code examples along the way. The list above is not exhaustive; in fact, the last four are concepts that I wholeheartedly believe are key parts of functional programming. Some of these concepts can also be achieved in imperative languages, so we'll talk about that where it applies. Without further ado, let's get started!

First-class and higher-order functions

This sounds like two different concepts, but really they are intertwined. Functions are first-class when they are values in their own right: you can store them in variables, send them to other functions, or return them from functions. Functions that take or return other functions are, incidentally, exactly what higher-order functions are. You can see this with closures in Groovy, such as below.
def incBy1 = { x -> x + 1 }
So now we have incBy1, and it is a function. The cool thing here is that we can send it to another function to execute. But why would we do this? Well, let's assume we have a function whose job is to perform some operation on an integer and print the result. Not very useful, right? Don't worry, it'll become more understandable in a minute. Here's that function.
def printResultingNumberOperation(val, fn) {
        println(fn(val))
}
Now we just need to put it all together and see it running.
printResultingNumberOperation(1, incBy1)
Which results in 2, which really isn't that awesome. But in essence this is a perfect example of first-class functions. Notice that incBy1 is its own variable, so we can also just execute the variable directly.
println incBy1(1)
Which also outputs 2; probably even less awesome than our previous example. Oh noes! But never fear, let's take it up a notch. printResultingNumberOperation is actually a higher-order function. Why? Because it takes a function as a parameter; remember, higher-order functions are those that take or return a function. Now, remember incBy1? Let's do some abstraction there: what if we wanted an incBy2 or incBy3? Oh noes, copy and paste! Do not do this; copy and paste is evil. Why is it evil? Because if you have a bug in incBy1, you inherently have the same bug in incBy2 and incBy3! You'll just remember to update them all, right? Wrong, that never happens; I promise you, incBy2 and incBy3 will drift away from the updates to incBy1, and sadness will ensue. So let's abstract incBy to take a number. Check this out.
def incBy(num) {
        { x -> x + num }
}
Many of you might be asking, "What are you doing?" Let's break this down. We are defining a function incBy that takes a single argument, num. And what does it return? It returns a function (making incBy a higher-order function), and that function takes a parameter x, adds x to num, and returns the result. It's important to understand that num is "closed over," resulting in a "closure": the returned function keeps num in scope even after incBy has returned. If that makes very little sense, let's see how to use it, which should make things clearer.
println incBy(2)
But wait, this prints out some garbage like temp$_incBy_closure1@5815338! Right; remember I said incBy returns a function, so you're just seeing its toString. So what can we do with that? Execute it; remember from above, it's a function that takes a single argument. Let's see what happens when we call it with 2 (hint: it should return 4).
println incBy(2)(2)
And what do we get? 4. Surprise: calling incBy with 2 gave us a function that computes x + 2, and since we executed that with 2, we got 2 + 2, ending up with 4 as our return. So how is this useful? Remember printResultingNumberOperation, which takes a number and a function to apply to it? How about we use incBy there!
printResultingNumberOperation(2, incBy(2))
And what do we get? 4 again; amazing and life altering! The real point is that I can now write reusable code, since I can abstract large chunks of my functions. This is one of the core features a functional language must implement; if it does not have first-class functions, it is not a functional language.
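For comparison, here's a minimal sketch of the same idea in Scala, which later examples in this post also use (the names here are my own):
def printResultingNumberOperation(v : Int, fn : Int => Int) = println(fn(v))

// incBy returns a closure over num
def incBy(num : Int) : Int => Int = { x => x + num }

printResultingNumberOperation(2, incBy(2)) // prints 4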

Pure functions

Pure functions are functions that have no side effects: they compute their result purely from their parameters. This allows the system to perform more optimizations, either by inlining code or even caching results (since the same input always produces the same output). To understand pure functions, we must first understand side effects. Let's look at some examples of side effects.
  • Output
  • Class Mutators
  • Parameter Mutators (really bad!)

Output

This one is pretty straightforward: if a method performs output, such as a log write or a database write, it's never pure. Why? Because writing data to the console or to the database affects the outside world every time the method is called; the log or database write is a side effect of calling the method.

Class Mutators

This ties closely into immutable variables, so I'm not going to get into immutability until later. But the idea is simple: if you have a .setX(x) method that does exactly "this.x = x", then that method has a side effect; it mutates the current object. What's the alternative that keeps the function pure? Have .setX(x) create a new object, setting X during creation. Again, this ties into immutable variables, so I won't go too far into it here.
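To make the contrast concrete, here's a small Scala sketch (my own example) of a mutating setter versus a pure one that returns a new instance:
// Impure: setX mutates the receiver in place
class MutablePoint(var x : Int) {
  def setX(newX : Int) { x = newX }
}

// Pure: withX builds a new Point and leaves the original untouched
case class Point(x : Int) {
  def withX(newX : Int) : Point = copy(x = newX)
}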

Parameter Mutators (really bad!)

This one is really bad and also ties into immutable variables. Frankly, this is one of those things I choke on when I see someone violate it. If you pass in a variable and then change that parameter inside the function, you are doing it all wrong! Let's look at an example of this atrocity, again in Groovy.
class Me {
        String fname

        def setFname(str) {
                this.fname = str
        }
}

new Me().setFname("Test")
So why is this so bad? Well, this is a simple example that reflects a larger problem. What if I didn't actually want to change that Me? Why is the function changing an object handed to it at all? Imagine if I had the following example.
class Me {
        String fname
}
def checkName(m, f) {
        if(m.fname != f) {
                m.fname = f
        }
}
def m = new Me()
checkName(m, "Test")
println(m.fname)
Why is this bad? Because checkName is not clear about what it's doing: it actually changes the object I send in, which means I cannot trust my object after passing it in. What should we have done instead?
class Me {
        String fname
}
def compareName(m, f) {
        if(m.fname != f) {
                new Me(fname: f)
        } else {
                m
        }
}
def m = compareName(new Me(), "Test")
println(m.fname)
So why is this better? Because I know the Me I'm sending in cannot change. I can pass a brand-new Me or one that has already been initialized, and I'm assured I still have the original. Again, this goes back to immutable variables, but the important point is that we want pure functions: functions that have no side effects.

Recursion

This is one of my favorite topics, not just because it makes people's brains explode, but because most software developers are so scared of it that they refuse to use it. The simple definition: a function that calls itself to perform a loop. So why does everyone find it difficult? Because you have to get the end cases right. The best thing to do is define your end cases as soon as you begin writing your recursive function. But that isn't everything; everyone has, at some point, written a recursive function that recurses too deeply. Let's look at an example of recursion in Scala: a function that sums the integers from 1 to i.
def sumTo(i : Int) : Int = {
  if(i <= 0) { 
    0 
  } else { 
    i + sumTo(i - 1) 
  }
}
Notice that we have defined our end case (i <= 0) and performed general recursion. If I call this with 10, I get back 55; all seems non-problematic until we try some large number. Let's say we want the sum of the first 10,000 integers. What happens? java.lang.StackOverflowError; oh noes, I can't do that! So how do we get around this? We use a technique called tail recursion. It keeps the recursive definition in the code but converts it to iteration at compile time. Why bother? Because recursion is one of the best ways to implement an algorithm (at least in my opinion), allowing much better and much more concise code. So what is tail recursion? Simply put, the call to the function (from within the function) must not require any further operation after it returns. Let's look at an example (again in Scala).
def sumTo(i : Int, acc : Int = 0) : Int = {
  if(i <= 0) { 
    acc 
  } else { 
    sumTo(i - 1, i + acc) 
  }
}
What happens if we call this with 10? We get 55; yay, it still works. So what happens if we try 10,000? We get the ridiculous number 50,005,000, and it didn't crash! Notice that the call to sumTo relies on nothing further once it executes. Since the compiler understands this, it converts the recursion into an iterative loop. This lets us write recursive code while keeping immutability: instead of a for loop with a counter that must mutate over time, we pass the accumulated state back into each call of the function. Most functional languages support this kind of tail call; Lisp/Scheme, Scala, and Erlang do, and even many C compilers perform the optimization. In other languages, such as Groovy, it is not directly supported; instead you use trampolining. I won't get into it deeply, but essentially a driver function repeatedly invokes the recursive function until it hits the end case, "trampolining" between the actual function and the driver that maintains its state.
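Scala can even verify the optimization for you: the standard @tailrec annotation makes compilation fail if the marked call is not actually in tail position. A minimal sketch:
import scala.annotation.tailrec

@tailrec
def sumTo(i : Int, acc : Int = 0) : Int = {
  if(i <= 0) { acc } else { sumTo(i - 1, i + acc) } // in tail position, so this compiles
}

sumTo(10000) // 50005000, no StackOverflowError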

Strict versus non-strict evaluation

Strict evaluation is also called eager evaluation; non-strict is also called lazy evaluation. When we think about languages, we normally think about defining a variable eagerly.
def x = 10
This means that x is eagerly defined. So if we did something like the below, we would expect x to be computed immediately.
def x = 10 * 10
So now x is defined as 100. But what if we didn't want it evaluated immediately? What if that initialization were extremely costly? We could change it into a method call so that it's only computed when needed, but then calling it multiple times means computing it multiple times. Certain languages, like Groovy (shown below) and Scala (shown after that), let you declare a variable lazy. This means the variable is not actually computed until you use it.
class Me {
        @Lazy def o = [x()]

        static def x() {
                println("X Called")
                1
        }

}
println("Create")
def me = new Me()
println("Done Creating")
me.o.size()
println("Complete")
What is our output?
Create
Done Creating
X Called
Complete
Notice how "X Called" doesn't actually happen until we've actually called anything about o. Otherwise we do not execute the evaluation of o. Scala does the same kind of thing as shown below.
def x() = { 
  println("X called")
  1 
}
println("Create")
lazy val v = List(x())
println("Done Creating")
v
println("Complete")
Which gives us the exact same output.
Create
Done Creating
X Called
Complete
Again, this type of functionality is really useful for deferring a large computation or execution that defines a variable until it's actually necessary.
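Scala also supports non-strict evaluation at function boundaries through by-name parameters; here's a small sketch (my own example):
// thunk is passed by name: it is evaluated only when used, and not at all otherwise
def onlyIf(cond : Boolean)(thunk : => Int) : Option[Int] = {
  if(cond) { Some(thunk) } else { None }
}

def expensive() : Int = { println("computing"); 42 }

onlyIf(false)(expensive()) // prints nothing; expensive() never runs
onlyIf(true)(expensive())  // prints "computing" and yields Some(42)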

Statements

Statements are key to functional programming. At first glance this seems like a trivial property of a language, but once you get into immutable variables it becomes very important. The idea is that every statement is really an expression: everything that occurs has (or should have) a return value of some type. Take, for example, an if statement; in functional programming, an if should always return a value. Let's look at a quick example in Scala.
println(if(true) { 10 } else { 20 })
println(if(false) { 10 } else { 20 })
This gives us the output.
10
20
As we can see, the if statement itself actually has a return value. This is much like a ternary expression; you know, the one that looks like this:
(true)?10:20
We also know that the last statement in a block is the return value of that block. Check out this example of a block in Scala.
{
  val v = 10
  v + 20
}
This returns 30, since v is 10 and 10 + 20 is, ta-da, 30, which was the return of the last statement in the block. The reason statements matter so much is that we can use the return of a function, an if statement, or a block directly to build an object. While this sounds crazy, it actually becomes simpler to understand the code over time. Let's see an example in Scala below.
class Test(str : String, length : Int) {}
Let's say we want a method that generates a Test object, and if null is passed in, it should send back a Test with a blank string and a length of zero. How would we normally do this in an imperative manner?
def NewTest(str : String) = {
  val _str = if(str == null) {
    ""
  } else {
    str
  }
  new Test(_str, _str.length)
}
This is a good example of using a statement's return value, but let's try to remove the unnecessary variable.
def NewTest(str : String) = {
  if(str == null) {
    new Test("", 0)
  } else {
    new Test(str, str.length)
  }
}
Now this is really nasty; we have to maintain two different branches that each create a new Test. So let's make the components of Test be statements.
def NewTest(str : String) = {
  new Test(
    if(str == null) {
      ""
    } else {
      str
    }, 
    if(str == null) {
      0
    } else {
      str.length
    }
  )
}
So notice that the creation of Test is defined only once; it is composed of statements that build its components. If we were to extend the if statement with more branches, it would probably be best to pull those out into their own functions. Let's say, for example, that we just want to handle the null case in one place; maybe we can use a higher-order function here?
def NewTest(str : String) = {
  def handleStr[T](op : String => T) : T = {
    op( if(str == null) { "" } else { str })
  }
  new Test(handleStr(x=>x), handleStr(x=>x.length))
}
This works out really well because we can just modify handleStr to do any extra checking in the future. And notice that we never store anything in a variable; instead we let handleStr deal with the edge cases for us.

Pattern Matching

Pattern matching is one of those topics that people either understand or completely miss the boat on. There are plenty of uses for pattern matching, and we'll look at a few of them in this section, using examples in Scala.

Basic Functionality

We're going to start with something extremely reminiscent of a switch statement: matching a boolean against a true case, a false case, and an everything-else case.
true match {
  case true => "We're True"
  case false => "We're False"
  case _ => "Something that was not true or false"
}
From here, we end up with the string "We're True". Now, if we assume we only care whether the match was true, and lump everything else together as false, we can do this instead.
true match {
  case true => "We're True"
  case _ => "Was false or something else"
}
Now what about a numerical value?
0 match {
  case 0 => true
  case _ => false
}
We now have a way to do simple matches that determine whether our value is 0 or something else. This might not seem very interesting, but let's see what happens when we have a String and want to make sure it's valid (non-null).
"string" match {
  case null => ""
  case str : String => str
}
Now why is this important? Because (remember, we want to use as few variables as possible) you can match on a function's return value and handle the string without ever storing it.
stringOperation("string") match {
  case null => ""
  case str : String => str
}
As we see, if stringOperation returns null, we end up with a valid, operable blank string; if we get a real string, we just return it. So now we have the very basics of pattern matching.
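As an aside, idiomatic Scala usually pushes the null out entirely with Option, which pattern matches just as nicely; a small sketch (my own example):
// Option.apply turns null into None, so the match cannot miss a case
def normalize(s : String) : String = Option(s) match {
  case Some(str) => str
  case None => ""
}

normalize(null)     // ""
normalize("string") // "string"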

Extracting Attributes

One of the general uses for pattern matching is extracting certain attributes from classes, and it's really useful when matching against lists. We'll look at list examples for now; to start, we'll take the head element off of a basic list.
List(1, 2, 3) match {
  case List() => -1
  case x :: xs => x
}
Here we first look for an empty list; if we get one, we return -1. If it's not an empty list, we extract the first element (the head) and return it. This usage of "::" is an unapply, which performs the extraction: it pulls the head element into the variable x and the rest of the list into xs, both of which become available to the right of the =>, the body of the case. Now, what's really cool is that we can extract more than once!
List(1, 2, 3) match {
  case List() => -1
  case x ::  y :: xs => y
}
Now we expect to get 2, since we extracted 1 into x, 2 into y, and List(3) into xs. But looking at this, there is clearly a missing case: the single-element list, x :: Nil (which could also be written List(x)), matches neither pattern. On compilation we get "warning: match is not exhaustive!", which lets us know we should extend our matches so we don't fall off the end. Now for the mind-bending part: you can actually use literals to pin down certain parts of the match.
List(1, 2, 3) match {
  case List() => -1
  case 1 ::  x :: 3 :: xs => x
}
And from this we get 2; notice how we extracted the 2nd element into the variable x while using literals to indicate exactly where we wanted to rip it out from. These are some of the more general usages of pattern matching; next we'll look at case classes and how they are used to pass messages.
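The same unapply mechanism behind "::" can be written by hand; here's a minimal sketch of a custom extractor (the names are my own):
// An extractor object: the match succeeds only for user@domain strings
object Email {
  def unapply(s : String) : Option[(String, String)] = s.split("@") match {
    case Array(user, domain) => Some((user, domain))
    case _ => None
  }
}

"jane@example.com" match {
  case Email(user, domain) => user + " at " + domain
  case _ => "not an email"
}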

Case Classes

One of the major selling points of pattern matching is the ability to destructure objects themselves. In Scala this is done with case classes, which we'll look into now. The basic concept is that a case class can be used to extract attributes from an object. Let's look at a very simple example.
case class MyObj(str : String, len : Int)
new MyObj("Foo", "Foo".length) match {
  case MyObj(str, len) => println(str + "@" + len)
}
As we can see, we match on the object and extract all of its attributes at once. So the big question: why do we care, when we can access the attributes on the object anyway? Well, here's the thing: we can use inheritance and let the match choose behavior based on the child class.
trait MyTrait
case class MyObj1(str : String, len : Int) extends MyTrait
case class MyObj2(num : Long) extends MyTrait
def exec(in : MyTrait) : Long = in match {
  case MyObj1(_, len) => len.toLong
  case MyObj2(n) => n
}
So what happens when we call it? More specifically, how does calling it with MyObj1 differ from calling it with MyObj2? Let's look at MyObj1 first.
exec(new MyObj1("Foo", "Foo".length))
So what do we get? We get 3 as a return. Why? Because the match succeeded on the case MyObj1(_, len)! So what happens if we call it again with MyObj2?
exec(new MyObj2(22))
This gives us 22, which we kind of expected by now. This means we can extract the attributes of an object as it enters a function, or choose which code to execute based on the type of the object passed in; again, all without having to store any state.
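One refinement worth knowing about: if you seal the trait, the compiler checks your match for exhaustiveness, just like the list warning we saw earlier. A sketch assuming the same classes as above:
sealed trait MyTrait
case class MyObj1(str : String, len : Int) extends MyTrait
case class MyObj2(num : Long) extends MyTrait

// Dropping the MyObj2 case here would now produce "match is not exhaustive"
def exec(in : MyTrait) : Long = in match {
  case MyObj1(_, len) => len.toLong
  case MyObj2(n) => n
}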

Immutable Variables

This is a concept that is not new to programming, nor is it specific to functional programming. Think about this statement from C.
const char *str = "MyString";
What does this mean? It means the string str points to cannot be modified after being set (the pointer itself could still be reassigned; const char * const str would lock down both). Why does this matter? Personally, I think it goes to the very heart of what functional programming is. When we think about functional programming, we think of returns from functions being sent directly as parameters to other functions. If we think of functional programming like this, then we can assume a return from a function is never changed before it is passed into another function. And if that is true, then we can, for the most part, do without variables entirely.
But of course, there are cases when a returned value is needed in several places, and we must store it so that we aren't redoing the calculation that produced it multiple times. In doing so, we should not make it possible to modify the variable in its transitory state. Think about it: if we had multiple threads, and each of those threads touched a variable that IS mutable, then one thread could be modifying the variable at the same time another thread is trying to read it.
So now, let's think about this example: if I know that my object A cannot be modified (all of its components are immutable), then I also know that I can pass it to any method or function and know for a fact that it cannot change. This means I can safely make multiple calls (if possible) against that variable concurrently.
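Here's a small Scala sketch of the idea (my own example): an "update" is really the construction of a new value, so concurrent readers can never observe a half-modified object.
case class Account(owner : String, balance : Long)

// "Modification" is really construction: the original is untouched
val a  = Account("alice", 100)
val a2 = a.copy(balance = a.balance + 50)

println(a)  // Account(alice,100)
println(a2) // Account(alice,150)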

Summary

As a quick summary, functional programming is all about purity: pure functions, and data that cannot mutate but only produce new instances containing the changed data. This style of programming ensures that bugs do not come from concurrent modifications, or from variables being modified where they shouldn't be. It also means that function results can be cached much more easily, especially with pure functions. Overall, functional programming allows developers to be more expressive, which in turn makes the code more intuitive for others picking up the product.
I hope people find this interesting and that it helps them better understand what exactly functional programming means and how to start working in it. Remember, just because a language isn't set up to be functional (languages like Java or C) doesn't mean you can't apply the same concepts. Although it will be much more painful and difficult than in an actual functional language (think implementing higher-order functions with interfaces and anonymous classes; the Comparator interface and its usage is an example), it still has the same effect of good function reuse!

Thursday, November 22, 2012

Lift, a developer's introspective

I've been using Lift for about six months now; we used it to rewrite one of our internal C/Gtk applications. This is a look back at our use of Lift in enterprise software. In this post, I'm going to try to answer some general questions I had going in, as well as some general questions that have come up from others working on the project.
  • Was it a new technology? How was the adoption of it?
  • How is view first for large systems?
  • How do snippets work out?
  • Did you keep mapper or switch?
  • How did you do unit tests?
  • Did you end up using the RESTful framework?
  • How did sitemap hold up with authentication?
So buckle up and let's get started on some of these questions.

Was it a new technology? How was the adoption of it?

Well, Lift itself had already been around for a little while, and Scala for a bit longer than that. I got started with Scala during my M.S. at DePaul and was fascinated by the functional aspect of it. By that time, Scala had already reached version 2.8.x, so again, it had been around for a bit. Both Lift and Scala were new technologies that we added to our stack.
I had written one small application in Scala previously; it was important because I was able to make changes with very minimal work. My boss at the time loved this, mainly because he could ask for something and I could pump out the work in a few minutes. I had spoken with him (and his boss) before about rewriting our entire C/Gtk application base as web applications. This ended up being my way in: I offered to do a simple rewrite using this framework and language, then went in, did about 40% of the work, and came back to show him that we could rewrite it with ease. He agreed, and had me write up some documentation to ask the team who would eventually take the product over from us whether Lift + Scala were OK.
I got a chance to work with some of the people on that team before it made its final decision. Each person told me they wished they could work in Scala all the time; not only that, each of them found working in Lift wonderful. We had to adopt Maven (it was either that or go with ant/ivy) since sbt was ruled out, and it was great how quickly our Maven project plugged into IntelliJ. We actually started in ScalaIDE, until we found that an index was performed every time you hit save, which eventually caused Eclipse to run out of threads for generating those indexes. It was a known bug, but we didn't have a choice at that point, so I switched us over to IntelliJ. Of course, IntelliJ has its own issues, but for the most part I haven't seen many of them.
Overall, the adoption has been fantastic. The RestHelper is just mind-bogglingly simple, using pattern matching to create understandable RESTful resources. My co-workers came from a Spring background, so this was a very different way of looking at services; but once they began to understand how it was set up, it proved much simpler and much easier. As I'll mention later, we switched to Squeryl for the ORM, and everyone freaked out about how SQL-ish the commands were, making it easy to write the SQL statements while keeping type safety. XML being a native datatype was also very nice for creating the services. The one downside was that Record did NOT have an .asXml method, which meant that if you wanted to serialize objects from the database you had to create the XML manually. I figured this would be a pain, so I decided to extend Record to do it for me (using the same approach that .asJson uses). I submitted a request for this to the Lift group but haven't heard anything back.
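To give a feel for it, here's a rough sketch of the general shape of a RestHelper service (the path and payload here are made up for illustration):
import net.liftweb.http.rest.RestHelper

object UsersRest extends RestHelper {
  serve {
    // GET /api/users, answered with XML built right in the match
    case "api" :: "users" :: Nil Get _ => <users><user>jane</user></users>
  }
}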

How is view first for large systems?

This is an interesting question, and I'm still not totally sure about it. I've seen some really awesome capabilities from performing multiple layers of embeds. I've also seen the need to emit XML (XHTML) from within the snippet calls, which is actually really awesome: you get verified XML that is substituted in as it gets embedded. One thing I did on a newer project was to call into a snippet with a template XHTML section; since Lift does not modify the original XML on a bind call, I can then re-bind for each of the records I want to apply the template to. It's really amazing what can be accomplished once you realize Lift tries to keep to the immutable-variable idea.
Regardless, having multiple levels of embeds and no business logic within the views means that modifying layouts is simple; the business logic is pushed back into the snippets. One of the things we did was use a "loggedin" snippet call. At first I thought, "oh man, I'm introducing business logic into the view." But actually I'm not, because the logic of how "loggedin" is processed is kept in the snippet. Some people do this kind of thing in PHP, Grails, or Rails with something like the line below, and that might be fine at first; but eventually session.user.loggedIn becomes session.user.loggedIn || session.user.isAdmin.
<% if(session.user.loggedIn) { %>

How do snippets work out?

Snippets end up working the same way taglibs work in Grails, and they work out really nicely. I find it useful to treat them as template fillers. In some instances I use them to generate other XHTML (for example, I have code that decides when to emit certain JavaScript functions, since they're optional based on whether the user is logged in), but on the whole I use them to populate the pages themselves. In some instances I use them to do Lift-y things like Lift callbacks for JavaScript executions. When we started this project we all came from an MVC background, which meant we created snippets such as "*Show" or "*Index." If I could go back, I would have treated the snippets as truly encapsulated pieces. If I had to give anyone one piece of advice about using Lift, it would be to encapsulate behavior into reusable snippets rather than writing one snippet per page!

Did you keep mapper or switch?

We started with Mapper, until we hit a record with a composite key. Mapper fought with me a bit, but I finally wrangled it. Then I ran into another table where we had a composite key plus an auto-increment field that was NOT part of the PK. This is a completely insane design and I wish I could change it; however, that is not within my power. Mapper just completely failed here (and rightly so; if you have an AI field it should be your PK :( ). So I went through and looked at a couple of different ORM choices. The first was, obviously, Hibernate (our organization already used it); I put it on the back burner, mainly because I've heard of pains with Hibernate transactions/sessions, and it isn't pure Scala. So I started looking at pure Scala options. The first I came across was Squeryl; I checked it out, and it seemed pretty easy to embed into Lift, though I ran across some issues defining AI fields that are not part of the primary key. Still, it was much simpler to use than Mapper, and since we were not writing CRUD-style applications it made more sense, especially since it has its own DSL reminiscent of SQL itself. The final one I looked at was Circumflex. I really liked the syntax, especially how the object definitions look exactly like SQL create statements.
Regardless, we switched to Squeryl, which took a little work to accomplish, but on the whole it ended up being really awesome to work with: the ability to write queries that actually feel like queries, yet are type safe and still use prepared statements. My one complaint, as I mentioned above, was that using Record meant we did not get an ".asXml" much like the existing ".asJson" option. So I went in and created it myself as a trait. It was really nice being able to do a ".map(_ asXml)" and send the result back as a RestHelper response. I created a GitHub pull request for this and also pinged the Lift Google Group to see if it could be added. Hopefully they do; I think it would be a great complement to the RestHelper for XML REST services.

How did you do unit tests?

We ended up using ScalaTest, and we incorporated Cobertura for code coverage. However, as anyone using Cobertura with Scala seems to find, you cannot get more than about 50% branch coverage. We have over 90% line coverage and about 850 unit tests, which is just awesome for the small amount of code we've actually written for the application itself. We had to use the JUnit runner interface, since the maven-scalatest-plugin was not available at the time and is still in a beta of sorts. My suggestion is to stick with ScalaTest via the JUnit runner; that way you can plug into tools like Jenkins and get unit test information into your reports.

Did you end up using the RESTful framework?

We did; as a matter of fact, we used the jqGrid plugin quite a bit, so we made massive use of the RestHelper RESTful framework. It was fantastic how easy it was to create new resources, and more specifically how simple it was to add them. We ended up creating a list in our Boot class containing all of the REST objects; we then did a foreach over that list and added each one, wrapped with our guard (to protect our REST services from unauthenticated users), to the stateful dispatch. I would suggest this to anyone doing stateful-dispatch REST services that require authentication: keep a list of them, and foreach over it to apply the guard and register each one. Adding a new service is then just adding it to the list.
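A sketch of that Boot wiring (withAuthGuard and the service objects are hypothetical stand-ins for your own guard and RestHelper objects):
import net.liftweb.http.LiftRules

// In Boot: register every REST service, wrapped in the auth guard
val restServices = List(UsersRest, OrdersRest, ReportsRest)
restServices.foreach { svc =>
  LiftRules.dispatch.append(withAuthGuard(svc)) // the stateful dispatch table
}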

How did sitemap hold up with authentication?

The sitemap itself was a little awkward to understand at first. Once you get the hang of "oh, after the slash is the filename without the .html appended," it's pretty straightforward. The interesting thing about the SiteMap is that if you create it manually, you can specify a partial function to perform a check for users. I will say: if you do this, remember that a few pages shouldn't require authentication! Your login page, your logout page, your primary error page, and your page-not-found page; make sure your partial function is checking for those!

Summary

If I had to do it again here is a list of things that I would've done differently.
  • Started with Squeryl instead of Mapper.
  • Proper encapsulation in my snippets.
  • Used the HTML5 parser rather than the XHTML parser.
  • Done less in pure JS and a bit more Lift-ing.
  • More lazy loading and parallel loading.
  • Used less Scala shorthand (merely for other developers coming in).
I'm probably forgetting some, but these are the immediate ones that stand out as lessons I learned writing a Lift web application. I hope others can take something away from this to make their transition to Lift a bit easier. If you have questions, I'll do my best to answer them if I've run across them in the past; and if you have experiences of your own, let me know!
I would like to say thanks to the Lift Google group for being open with me when I had questions!

Thursday, July 26, 2012

MySQL/JDBC DateTime woes

The Situation

At my current company we store lots of timestamps; more specifically, they are usually global timestamps, so clearly we needed to store the field in UTC. Why? Because we have dozens of SQL servers, which means we need a way to ensure that no matter where the servers are, the timestamps are always readable and understandable. Given this, it has always been our practice to store times in UTC. The biggest player here is that JDBC attempts to convert timestamps for you, so that you are always in local time. But what happens when you don't want to convert to local time? For example, say you have two databases (for failover reasons) and two separate webapp servers that read from those databases (again for failover, except that the database primary is not bound to the web application primary). You could have two sites, one in India and one in Indiana, with the India web app server reading from the Indiana database, so the conversion is never a constant thing. To add difficulty to the situation, assume a group within your organization has enough pull to insist that these times remain in UTC, since that is what they are used to dealing with.

The Theory

Our solution starts by stating that the data in the database must always be in UTC, and we'll assume the database servers themselves are set up in UTC. So let's digest the situation: there are two parts. The first is storing the timestamp in UTC. The second is retrieving the timestamp and converting it back into UTC from local time (remember, JDBC is going to convert it to local for us).

How to store the timestamps in UTC?

The first part is knowing that there are two ways we can get a timestamp to store. The first is creating a timestamp representing now; for this we can just use a Calendar and store that. Simple enough; remember it will be converted over, so it comes out as now in UTC. The second is creating a timestamp based on input from a user. Here is where the difficulty lies: let's assume our user wants to store "2012-02-02 00:00:00", based not on local time but on UTC. We can accomplish this in two ways, both involving a SimpleDateFormat object to parse the time. The first way is to append the "z" format symbol so the timezone is read from the input; the key is to simply append the "UTC" string when parsing.
SimpleDateFormat obj = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z");
Date date_in_utc = obj.parse(str + " UTC");
So what is the other option? Well just set the timezone for the formatter of course.
SimpleDateFormat obj = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
obj.setTimeZone(TimeZone.getTimeZone("UTC"));
Date date_in_utc = obj.parse(str);
So why would I go with either of these options? They both have their pluses and minuses. The first option gives us flexibility: say we normally use UTC but want to allow users to specify their own timezones; here we have that ability. But it's slower, since we construct the SimpleDateFormat and append the "UTC" string on every parse. The second option means we can construct the SimpleDateFormat once, with its UTC timezone set, and avoid any string appends when we perform the conversion.

How to get the timestamps in UTC?

Remember, we said the timestamp is always stored in UTC and JDBC will convert it to the local timezone. Since we know the time we read back is in local time, how do we deal with it? Let's just look at display, and assume we're going to display the time in UTC. How do we do it? The same way as in the previous step, except instead of parse we call format(), as shown below.
SimpleDateFormat obj = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
obj.setTimeZone(TimeZone.getTimeZone("UTC"));
String date_in_utc = obj.format(date); // date is the java.util.Date read via JDBC
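Putting the two halves together, here's a small sketch of the round trip (written in Scala for brevity, but the calls are the same java.text/java.util ones used above):
import java.text.SimpleDateFormat
import java.util.TimeZone

val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
fmt.setTimeZone(TimeZone.getTimeZone("UTC"))

// Store: parse the user's input as UTC
val date = fmt.parse("2012-02-02 00:00:00")

// Display: format the Date back out as UTC
println(fmt.format(date)) // 2012-02-02 00:00:00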

Thursday, July 5, 2012

I can't get anything done (how you're doing more than you think)

I can't get anything done

I feel this almost every day now. I just feel like I cannot accomplish anything during the day. To be honest, I have three or four different projects going at any given time, and every day I go home feeling like I haven't actually made any progress. Most of my time is spent either answering questions from people or teaching them concepts and ideas that I've learned over time. Most people stop here and get angry that they just aren't accomplishing anything; they start writing blogs and tweets about how they aren't getting anything done. The thing is, you're actually doing more than you think!

How you're doing more than you think

Obviously there are certain instances where this isn't the case; but if you're like me, answering questions and helping other people learn concepts and ideas you've picked up, you're actually doing more than you think. Many people understand the concept of the "force multiplier": if you train N people to work at 50% of your capacity, you are effectively performing N * 0.5 of your own capacity in extra work. This is pretty straightforward, but it's important to reflect on it and realize what it means.

Why am I writing this?

I think many times we, especially as developers, get caught up in the idea that we must be programming to accomplish anything. Yet sometimes it's the knowledge we disperse that makes us more valuable than just churning out large chunks of code all the time. The next time you think you're not getting anything done, take a minute, think about everything else you did, and reflect on the fact that you're getting more done than you think!

Monday, May 7, 2012

Why Service Oriented Architecture Doesn't Work Internally?

I know I can hear a bunch of clicky keyboards starting to clack as tons of readers compose responses on their Model-M keyboards to tell me I'm a complete moron. That may be; but before you do, let me at least make it through my point. I should add one caveat here: I'm not saying SOA doesn't work, period. I'm making the point that using SOA for your internal applications does not work. I'll start by explaining my current situation, where we are converting to SOA and how it is proving a complete failure for our internal software. Next, I'll explain why SOA does not work internally and how the problems SOA attempts to fix have been superseded. Finally, I'll describe a better architecture, one that does contain SOME SOA for external clients while allowing faster response times for applications.


As promised, first I will explain my current situation. We have MySQL data storage on the backend and 100+ applications pounding on it at once. Our MySQL storage is sharded quite a bit, comprising 63 hosts, 33 of which are replicated in a master-master scenario, with an average of about 166 databases on each host. That's a ton of databases, and a ton of data on the backend; on average there are about 10,000,000 queries per day across all the databases and hosts. Many of the databases could be normalized further (or normalized for the first time) and streamlined to take better advantage of what MySQL does well.


Now that we understand the architecture, let's look at how SOA is being implemented. We're beginning to abstract many of the old database queries into RESTful service calls; this is one of the newer and "better" architectures. It all sounds good and dandy, so let's examine what SOA is supposed to accomplish. SOA is implemented to abstract the data layer; this allows the data layer to change without having to change the applications talking to the database, including performing DDL or DML changes without affecting the applications.


So why is SOA failing for us? It's fairly simple: we are trying to emulate our MySQL queries in a mid-tier application. Why is this bad? Easy: we aren't in the business of making a SQL implementation, much less a high-performance one. Why is that the problem? Well, think about what an SOA does: many times you need to build XML or JSON output from the MySQL rows returned, and in some instances also perform validation or extra data extrapolation. So when we pull down 30,000 rows, our service layer might choke on the memory requirements. Fine, how about we just return the first 300? Well, then we need to sort; the sort happens in SQL, but how do we get more than the first 300? And what happens if we don't want the default sort order? You can see how this quickly becomes a giant mess: you begin to create a DSL (Domain Specific Language) built from a DTD in order to do exactly what SQL has already done. And what if you need to do a count? Oh, there is no count; instead you perform your RESTful GET and then count every element. But how much data was processed just to receive a single integer?


Let's look at the count(*) example; the millisecond figures below are simplistic numbers chosen purely for illustration.

Step                                 Time in ms
Read all elements into SQL memory    0.01
Send all elements to mid-tier        0.01
Sort data on mid-tier                0.02
Transform mid-tier results to XML    0.03
Send XML to client                   0.02
Count all XML elements               0.03
Total                                0.12

Now let's assume the query runs against an InnoDB table in MySQL, which means a count(*) must count every row in the table; here is the same count done directly in the database.

Step                                 Time in ms
Read all elements counting as we go  0.01
Send result to client                0.01
Total                                0.02

Clearly the first approach is not effective, so let's look at another example to see when it fares better. Assume our client wants all records (200) from a table containing tuples that are 120 bytes per record.

Step                                 Time in ms
Read all elements into SQL memory    0.01
Send all elements to mid-tier        0.01
Sort data on mid-tier                0.02
Transform mid-tier results to XML    0.03
Send XML to client                   0.02
Total                                0.09

Now let's look at this if we just queried the database directly.

Step                                 Time in ms
Read all elements into SQL memory    0.01
Send all elements to client          0.01
Total                                0.02

Wow, looks like there is quite a large difference here. This is one of the many problems we are facing attempting to do SOA. We are also facing memory pressure. For example, take our 200 records at 120 bytes per record: that comes out to 24,000 bytes, not really a huge deal. But what happens when 100 clients perform these same types of queries, each asking to search by a different variable, so caching gets us nothing? Now we're at 2.4MB of data being served, and that doesn't include the extra XML overhead that must be sent to the clients, the database handles, etc. If we increase from 200 records to, say, 500, the storage needed grows to 6MB. Going forward, given a steady rate of record creation, we will eventually run out of space as we try to cache our records while trying to provide fast access to data.


Our issues stem from exactly this: we've seen our service layer choke and die under client load. Hundreds of queries per second, each looking for hundreds of records, cause these services to fall over; not because the services are written poorly, but because they are not designed for high performance the way a SQL database such as Oracle or MySQL is. Now let's look at the decoupling aspect, and at some of the problems SOA is supposed to fix compared with connecting to a data store directly.


Many proponents of SOA say the biggest thing you get is decoupling the data storage from the application. But let me ask you this: how often are you changing your data store? And even if you are, isn't that the point of JDBC, to abstract the SQL connection itself? Not to mention that if you do change it, you'll be updating your service layer anyway, so you're not actually saving anything. And when you make DDL changes, what do you need to do? Update the application layer to take advantage of the new fields. If you have an SOA, what do you need to do? Update the service layer to take advantage of the new fields. Granted, the service layer can provide default field values so the application doesn't need to be updated on DDL deployment; but if we used default values in the database, don't we end up with the same functionality?


Now here's where, in my opinion, SOA starts to really make sense: the SOA can run checks before the data store, ensuring the data is valid. That makes complete sense, right? We can ensure our data is always valid. Unfortunately, this same functionality exists in a database; this is the purpose of things like domains and triggers. Again, this functionality already exists in the data layer and is designed to perform faster than anything we can program ourselves. But what about combining tables and showing data in a concise manner while decoupling the underlying tables from each other? For example, maybe we have a customer table containing some information we want to bring into another table. If we implement SOA, we can query both tables and return a data type that continues to look like our original table. But data stores have this taken care of as well; they are called views, and they allow exactly this same type of functionality.


So what is a better architecture? Clearly connecting directly to the database is not a good option in all cases; more specifically, you don't want a web UI doing direct database queries from some AJAX client. What you really want to do is create a library that contains all of the functions that perform the appropriate database queries. All of your applications (including your web app) should include the library and make the calls directly. Note: when I say web app, I am talking only about the back end, not the front-end HTML/JavaScript portion of the web UI. At this point, you can take that library and wrap it in its own service layer to allow external clients to perform queries as they see necessary. The idea is that you don't decrease performance and UX for your application (and subsequently your clients), yet you still provide some functionality externally to clients who may want to roll their own UX. This gives the high performance required, since you have functions that connect directly to your data storage, and yet you can expose whichever functions you want externally. It also means the service layer can be updated at the same time your application is updated. So where do AJAX calls fit in? That is where your SOA is necessary: you need the SOA to allow external connections.
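Here's a tiny sketch of that layering (all names are hypothetical): internal applications include the library and call it directly, while the service layer is just a thin wrapper over the same functions for external clients.
// The shared data-access library: internal apps include this and call it directly
object CustomerLib {
  def customerCount() : Long = {
    // e.g. runs SELECT COUNT(*) FROM customers inside the database
    ???
  }
}

// The external service layer: a thin wrapper exposing selected library functions
object CustomerService {
  def handleCountRequest() : String =
    <count>{ CustomerLib.customerCount() }</count>.toString
}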


The idea is simple: treat data requests as either internal or external. If they are internal, include your library and perform the query through the built-in functions. If they are external, only allow the query through the SOA. Here are some examples of each type.

Internal

  • In-house application
  • Service layer itself

External

  • 3rd party internal applications
  • Clients who want an API

I know it seems counterintuitive, but sometimes we have to step back and ask why a specific architecture is better, not just assume it's better because it "seems cool" or because everyone else is moving to it.

Thursday, January 12, 2012

Headphone Woes

One of my favorite things to do while coding is listen to music. That's most likely a topic I will cover in a later post; for now, just note that my collection is filled with almost everything. Regardless, I've fallen into a very bad situation with my current headphones. After lasting five years, my pair of Bose over-ear headphones has fallen apart. The rubber surround has completely disintegrated, leaving me with headphones that look like they've gone through a war, the foam inside the surround fully exposed. Needless to say, I need to replace my headphones at this point.



Of course, I'm very picky about my headphones and how my music sounds; I play guitar and am very particular about my tone being either exactly what I want or as close to the original recording as possible. Most audiophiles like their music reproduced in specific ways: as close to the source as possible, with flat equalization, wide frequency response, and so on. Me, on the other hand (and I think this is one of the things that makes me a fun musician): I just want it to sound good. I don't really care whether the treble was completely even with the bass when the record was made. Instead, I just want a very enjoyable experience; I like completely zoning out to my music (both when I'm playing and when I'm coding) and just letting my mind be wholly encompassed.

Of course, this doesn't work so well once my headphones die; as such, it's time to buy some new ones! At home, I have a pair of Beats by Dr. Dre Studios.



That's $300 for a pair of damn headphones, which is definitely expensive; but until you actually hear these things, you will not understand just how well they take you to your own little happy place. The Bose headphones I previously had were about $100, and the price has now increased to $150. The problem is, I'm not sure I want to spend that amount on another set of headphones that will last five years.

So, what other headphone choices do I have? The thing I've seen take over is earbuds, but I've never been a huge fan of them: they're always uncomfortable in my ear, and they're not as easy to pop out as headphones when I need to come back to reality at work. There are other headphone makers, such as Sony, who make a decent pair. The headphones I had for my entire college career, and into my professional career before I got the Bose, were some simple $15 behind-the-neck headphones, and they were fabulous. They worked, it was easy to slip one ear off, and they were fairly comfortable, with a pretty good bass response. For $15 you really couldn't beat them! Plus, they fit into my backpack; I could crush them and they just stayed alive for years.

Now the question: if I buy the $150 headphones, I know they will last five years at most (because I don't believe the quality has gone up), which means they will cost about $30 per year. If I go with the behind-the-neck headphones, I can replace them twice a year and still end up spending, over time, the same amount as the one-time cost of the Bose. So the big question is: are the behind-the-neck as good as the Bose? My answer is no; the Bose are fantastic at blocking out all the noise around me (even though they are not noise cancelling).

Sony does have some other headphones that are closer to what I'm looking for (over-ear), which also include noise cancelling. I've tried the on-ear version of Sony's noise cancelling and was quite underwhelmed by the overall sound quality.



So I'm not sure I'd end up with better quality than those on-ear headphones. What about Sennheisers? Well, I would be all over Sennheisers if I were running a music studio and needed to hear exactly what the instruments were putting out, with absolutely no equalization in my headphones.

I've been seeing Skullcandy around, but I have yet to actually explore them. My first thought is that I dislike them: their over-ear headphones have a triangular look and feel, and when I've tried them on it always feels really odd on my head, so I don't think I'd be comfortable in them for an extended period of time. So this leaves me with the dilemma: do I buy some Sonys that may or may not underwhelm me with their sound; some cheap Sonys that I'll be replacing in the near future; the exact same type I have now at an increased price; or do I spring for an extra $50 and get some on-ear Beats that I know will sound good, even though I'm not a huge fan of on-ear headphones?

EDIT 1/16/2012:
Broke down and bought a pair of the Beats Solo HD headphones. They have great sound quality, and although they have no noise cancelling and are not over-ear, they do a good job of drowning out background noise. The on-ear fit is not that uncomfortable either; they're actually pretty small and come in a nice carrying case. Looking forward to another 5+ years of headphone usage.