Persistent Counters in SimpleDB

I’ve already discussed the new consistency features of Amazon SimpleDB. One of the things people have wished for in SimpleDB was a way to manage a universal counter, something similar to an auto-incrementing primary key in MySQL. The consistency features allow clients to implement such a thing very easily. The following is an algorithm;

Read value
Write value+1, but only if the previous value is what we just read
If write failed, increment value and try again
else return new value

To make it easy for Java programmers to do this with typica, I’ve added a Counter class. Usage is very simple as you can see by this example;

SimpleDB sdb = new SimpleDB("AccessId", "SecretKey");
Domain dom = sdb.createDomain("MyDomain");
Counter c = new Counter(dom, "counter1");
for (int i=0; i<20; i++) {
	System.err.println("next val = "+c.nextValue());
}

This code creates a counter and initializes it if there isn’t a current value. It uses a Iterator-like interface, but there is no test for next value because there always is one. The Counter object is stateless, so it relies totally on SimpleDB for its value. This will work very well on multiple app servers, all relying on the same counter for unique values.

To avoid this blog getting out of date, I won’t include the counter code here, rather you can browse it in SVN.

Code has been checked into SVN r311. I’ll update this post once the new version of typica is released which includes this.

For those seeking a more pythonic version, have a look here.

Eventually Consistent, or Immediately with SimpleDB

Amazon SimpleDB has been a service that provides a schema-less data store with some fairly simple query abilities. One of the catches has always been that when you put a piece of data in, you might not get it back in a query right away. That time delay is generally very short (like < 1 second), but there are no guarantees of this. The cause of this goes back to the fundamental tradeoffs in highly available and redundant systems, such as those Amazon builds. Werner Vogels does a pretty good job of laying out the tradeoffs in his "Eventually Consistent” blog post, and others he links to. Essentially, it’s the CAP theorem, which talks about how you can have only 2 of Consistency, Availability or Partitioning (which gets at redundancy).
Using SimpleDB has required an understanding of how inconsistent results will affect your application. Mostly, it has been important that the application never rely on data being there immediately. This can cause problems when trying to give the user completely up to date information.

SimpleDB now supports consistent read, put and delete. There is a cost for consistency, which is potentially higher latency. Let’s take a look at the new features.
The simplest improvement is in the Select and GetAttributes calls. Supplying the “ConsistentRead=true” parameter ensures consistent data is returned. Now, SimpleDB is an option for storing application state. A regular Put can be used and consistent read will get the current state, always.
What is far more interesting is what has been done with put and delete. PutAttributes has some optional parameters that define a condition that must be met to allow the put to continue. In the request, you can define an expected value for some attribute, or specify that the attribute simply must exist. One application for consistent put is a counter. Imagine an item that has counter attribute. To increment the counter, simply read the value, then do a conditional put, specifying the new value, but only if the old value is set. The request will fail if another writer got there first. A retry loop is required, as in this pseudo-code

value = read(counter);
while (put counter=value+1, if counter==value fails) {
    value = read(counter);
}

The same can happen for the delete operation. In a future post, I’ll talk about how to use typica to access these new features from Java. (added! https://coderslike.us/2010/03/09/persistent-counters-in-simpledb/)

A Unique Method of Authenticating against App-Managed Userlist

I have a project that uses Amazon’s SimpleDB service for data storage. Being a Java programmer, I have become fond of using JPA (Java Persistence Architecture) implementations. In some cases, I’ve used EclipseLink, but more recently I’ve been playing with SimpleJPA. This is a partial JPA implementation on top of SimpleDB. The benefits include writing value objects with minimal annotations to indicate relationships.

Anyway, enough about why I do it. Since my user list is also stored in JPA entities, I’d like to tie this into the container managed authentication. The web app I’m writing is being deployed to tomcat and so realms are used to define a authentication provider. Tomcat provides several realms that hook into a JDBC Database, JAAS, JNDI Datasource and more. In my case, I wanted to rely in data access via JPA. Before discussing the challenges, I should point out that in a Java web app container, there are different class loaders to contend with. The container has its own classloader, and each web application has its own. My application obviously contains all of the supporting jars for SimpleJPA and my value objects. Since authentication is being handled by the container, it doesn’t have access to my app’s classloader. So, I’d need to deploy about 12 jar files into the tomcat/lib directory to make them available to the container. One of those contains my value objects and could change in the future. I don’t think that’s a very nice deployment strategy (deploying a war, and then a separate jar for each software update).

To solve this problem, I had to come up with a way to write my own Realm with as few dependencies on my application as possible. What I came up with is a socket listener, running on a dedicated socket, within my web application. It only accepts connections from localhost, so it is not likely to become a security concern. The socket listener receives a username and returns username,password,role1,role2,… as a string. That is the contract between my web application and the authentication realm. The realm interfaces with the socket listener and uses that to get information about the user trying to authenticate, which is converts to the object format used within realms in tomcat.

The code for the socket listener is fairly simple;

package org.scalabletype.util;

import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.ServerSocket;
import java.net.UnknownHostException;

import javax.persistence.EntityManager;
import javax.persistence.Query;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import org.scalabletype.data.DataHelper;
import org.scalabletype.data.User;

/**
 * This class listens on a port, receives a username, looks up user record, then responds with data.
 */
public class AuthServer extends Thread {
	private static Log logger = LogFactory.getLog(AuthServer.class);
	public static final int AUTH_SOCKET = 2000;

	public AuthServer() { }

	public void run() {
		while (!isInterrupted()) {
			try {
				ServerSocket ss = new ServerSocket(AUTH_SOCKET);
				while (!isInterrupted()) {
					Socket sock = ss.accept();
					try {
						// confirm connection from localhost only
						InetAddress addr = sock.getInetAddress();
						if (addr.getHostName().equals("localhost")) {
							// get user to authenticate
							InputStream iStr = sock.getInputStream();
							byte [] buf = new byte[1024];
							int bytesRead = iStr.read(buf);
							String username = new String(buf, 0, bytesRead);
							logger.info("username to authenticate:"+username);

							// fetch user from JPA
							EntityManager em = DataHelper.getEntityManager();
							Query query = em.createQuery("select object(o) from User o where o.username = :name");
							query.setParameter("name", username);
							User usr = (User)query.getSingleResult();

							// return user data, or nothing
							OutputStream oStr = sock.getOutputStream();
							logger.info("got connection, going to respond");
							if (usr != null) {
								StringBuilder ret = new StringBuilder();
								ret.append(usr.getUsername());
								ret.append(",");
								ret.append(usr.getPassword());
								ret.append(",");
								ret.append(usr.getAuthGroups());
								oStr.write(ret.toString().getBytes());
							}
							oStr.flush();
						}
						sock.close();
					} catch (Exception ex) {
						logger.error("Some problem handling the request", ex);
					}
				}
			} catch (Exception ex) {
				logger.error("problem accepting connection. will keep going.", ex);
			}
		}
	}
}

The socket listener needs to be invoked when the web application is initialized and a ServletContextListener is a good place to do that;

public class ScalableTypeStarter implements ServletContextListener {
	private AuthServer auth;

	public void contextInitialized(ServletContextEvent evt) {
		// init data persistence layer
		DataHelper.initDataHelper(evt.getServletContext());

		// start authorization socket listener
		auth = new AuthServer();
		auth.start();
	}

	public void contextDestroyed(ServletContextEvent evt) {
		if (auth != null) {
			auth.interrupt();
			auth = null;
		}
	}
}

Here is the code for my realm, which is packaged by itself into a jar, and deployed (once) into the tomcat/lib directory.

package org.scalabletype.util;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.UnknownHostException;
import java.security.Principal;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.catalina.Group;
import org.apache.catalina.Role;
import org.apache.catalina.User;
import org.apache.catalina.UserDatabase;
import org.apache.catalina.realm.GenericPrincipal;
import org.apache.catalina.realm.RealmBase;

/**
 * This realm authenticates against user data via the socket listener.
 *
 */
public class UserRealm extends RealmBase {
	public static final int AUTH_SOCKET = 2000;

    protected final String info = "org.scalabletype.util.UserRealm/1.0";
    protected static final String name = "UserRealm";

    /**
     * Return descriptive information about this Realm implementation and
     * the corresponding version number, in the format
     * <code>&lt;description&gt;/&lt;version&gt;</code>.
     */
    public String getInfo() {
        return info;
    }

    /**
     * Return <code>true</code> if the specified Principal has the specified
     * security role, within the context of this Realm; otherwise return
     * <code>false</code>. This implementation returns <code>true</code>
     * if the <code>User</code> has the role, or if any <code>Group</code>
     * that the <code>User</code> is a member of has the role. 
     *
     * @param principal Principal for whom the role is to be checked
     * @param role Security role to be checked
     */
    public boolean hasRole(Principal principal, String role) {
        if (principal instanceof GenericPrincipal) {
            GenericPrincipal gp = (GenericPrincipal)principal;
            if(gp.getUserPrincipal() instanceof User) {
                principal = gp.getUserPrincipal();
            }
        }
        if (!(principal instanceof User) ) {
            //Play nice with SSO and mixed Realms
            return super.hasRole(principal, role);
        }
        if ("*".equals(role)) {
            return true;
        } else if(role == null) {
            return false;
        }
        User user = (User)principal;
        UserInfo usr = findUser(user.getFullName());
        if (usr == null) {
            return false;
        } 
        for (String group : usr.groups) {
			if (role.equals(group)) return true;
		}
        return false;
    }
		
    /**
     * Return a short name for this Realm implementation.
     */
    protected String getName() {
        return name;
    }

    /**
     * Return the password associated with the given principal's user name.
     */
    protected String getPassword(String username) {
        UserInfo user = findUser(username);

        if (user == null) {
            return null;
        } 

        return (user.password);
    }

    /**
     * Return the Principal associated with the given user name.
     */
    protected Principal getPrincipal(String username) {
        UserInfo user = findUser(username);
        if(user == null) {
            return null;
        }

        List roles = new ArrayList();
        for (String group : user.groups) {
            roles.add(group);
        }
        return new GenericPrincipal(this, username, user.password, roles);
    }

	private UserInfo findUser(String username) {
		UserInfo user = new UserInfo();
		try {
			Socket sock = new Socket("localhost", AUTH_SOCKET);
			OutputStream oStr = sock.getOutputStream();
			oStr.write(username.getBytes());
			oStr.flush();
			InputStream iStr = sock.getInputStream();
			byte [] buf = new byte[4096];
			int len = iStr.read(buf);
			if (len == 0) {
				return null;
			}
			String [] data = new String(buf, 0, len).split(",");
			user.username = data[0];
			user.password = data[1];
			ArrayList<String> groups = new ArrayList<String>();
			for (int i=2; i<data.length; i++) {
				groups.add(data[i]);
			}
			user.groups = groups;
		} catch (UnknownHostException ex) {
			ex.printStackTrace();
		} catch (IOException ex) {
			ex.printStackTrace();
		}
		return user;
	}

	class UserInfo {
		String username;
		String password;
		List<String> groups;
	}
}

The web app’s context.xml contains this line to configure the realm;

<Realm className="org.scalabletype.util.UserRealm" resourceName="ScalableTypeAuth"/>

Amazon SimpleDB now available in the EU region

Amazon has just announced support for SimpleDB in their European data center. That means applications running in the EU will have lower latency when accessing SimpleDB. That’s good news for SimpleDB adoption. The EU SimpleDB is a totally separate version of the service as with S3 and EC2. So, you simply use the EU endpoint (sdb.eu-west-1.amazonaws.com) and it is business as usual.

The QueryTool now has built in support for region selection, which should make it easier to test queries and export data from both places. It is available for download here

Typica is ready for the EU. Simply create the SimpleDB object with the new EU endpoint (instead of the default US endpoint);

SimpleDB sdb = new SimpleDB(accessId, secretKey, true, “sdb.eu-west-1.amazonaws.com“);


QueryTool now exports data

There are some things I’ve been wanting in the QueryTool, so I just threw them in. Here’s the list;

  • proxy support (add proxy host and port to the command line args)
  • sortable results table (via column header click)
  • export result data to CSV file
  • scrollable query scratchpad

Screen shot 2009-09-18 at 10.00.01 AM

The tool can be downloaded here

To run it, type;

java -jar QueryTool1.2.jar <AccessId> <SecretKey> [ProxyHost] [ProxyPort]

QueryTool 1.1, even more to like!

Screen shot 2009-08-05 at 4.29.39 PM

A number of you have requested some new features, and some of them have made it into this new release. You can now add and delete domains from within the tool. Also, clicking on results lets you copy the cell or row into the clipboard. This later feature can be really handy for putting item names into queries. So, check it out! Once you download the jar, you run it by using this command;

java -jar QueryTool1.1.jar <accessId> <secretKey>

Query Tool for Amazon SimpleDB

Although these projects are never really finished, I can say I’ve completed version 1 of the Amazon SimpleDB Query Tool. This is built on top of a new SimpleDB API that will be part of an upcoming typica release. The code currently resides in a branch, but will hopefully get merges into trunk in the next few weeks.

OK, more about the tool. It was built to provide a convenient way to test queries. That’s it. Towards that end, there is a list of features I included that really met a need for me.

  • flexible query workspace (scratch pad)
  • run query on the line where the cursor is
  • allow domain selection via the UI
  • display domain metadata
  • show results from several queries at once
  • show box usage and other stats

Now, the moment you’ve all been waiting for, a screenshot!

querytool

Right now, the code is still in SVN, so if you’d like to run it, you’ll need to check out the branch and build it.  If you get that far, to run it, you can use this command, “ant test.main -Dclass=QueryTool -Dargs=”<access id> <secret key>”

For an official release, I’ll make an executable jar, so you’d run “java -jar QueryTool.jar <access id> <secret key>”