How to build a local NAS backed by Amazon S3

A previous post talked about my need for some local, reliable storage in my home. That project led to investigating some other options. Since I’m a big fan of Amazon S3, it seemed like something I should involve in my storage solution. The Elastician (Mitch Garnaat) and I bought the same hardware and are working through the setup together. Here’s the rundown of the hardware including costs;

Cooler Master Elite 360 m-ATX ATX Mid/Mini Tower Case with 350-Watt Power Supply RC-360-KKR1 $56.97
Gigabyte Core 2 Quad/Intel G41/DDR2/A&V&GbE/MATX/DualBIOS Motherboard GA-G41M-ES2L $56.99
Intel Pentium E5300 2.6GHz 2M L2 Cache 800MHz LGA775 Desktop Processor $66.99
Corsair XMS2 4 GB (2 X 2 GB) PC2-6400 800 MHz 240-PIN DDR2 Dual-Channel Memory Kit – TWIN2X4096-6400C5 $94.99
Western Digital 1 TB Caviar Green SATA Intellipower 64 MB Cache Bulk/OEM Desktop Hard Drive WD10EARS $54.49 * 2
Kingston DataTraveler 112 – 8 GB USB 2.0 Flash Drive DT112K/8GBCL (Black) $13.93 * 2
RadioShack® Molex® to SATA Power Cable $2.99

My previous post discusses the hardware and some of the choices in more detail. Here’s a picture of the inside of the case once things were assembled. The observant among you will notice that one of the drives doesn’t have power. That’s because the case power supply didn’t have 2 SATA power connectors and the adapter cable was still on order when this picture was taken. I’ll also point out that this case isn’t ideal for mounting several 3.5″ drives. With adapters I can fit 4 in there, true, but shopping around for a case more to my liking is something I’d do differently next time.

Thinking about the software to run on the NAS led to looking at several projects, including FreeNAS and OpenFiler. We decided to go with something we’re familiar with: Ubuntu. Ubuntu has instructions on their download page for creating a bootable flash drive. I tried the Mac OS X method and failed, so I resorted to the tool from pendrivelinux.com on the family Windows box. Their Universal USB Installer works well and created good, bootable flash drives every time.

Creating a Bootable Flash Drive

I tried the Ubuntu Server download, but that seems to be geared towards jumpstarting a server install vs running right off the flash drive. The Ubuntu Desktop was much more to my liking.

To get things going, I needed to connect a mouse, keyboard and monitor. Once I configured the BIOS to boot from USB HDD, it recognized the bootable flash drive and started bringing Ubuntu up. It seemed to take “forever” to boot. I hit “escape” to watch the console and found that it was timing out on the floppy drive, which I don’t have. I went into the BIOS settings to let it know there wasn’t a floppy drive attached and boot time went WAY down!

I let the desktop come up, but since this is an install image, changes made aren’t saved. Having the 2nd flash drive comes in very handy now! Plug it into another USB port before proceeding. Select the “System”->”Administration” menus, then the “Install Ubuntu…” option. A few steps in the install wizard require special mention. On step 4, select “erase and use the entire disk”, and select your flash drive (not one of the hard drives!). In step 5, after you’ve entered the required information, select “log in automatically”, which will help when running headless later. Now the most critical part: step 7 has an “advanced” button you need to click. Make sure you select the proper device, because it defaults to /dev/sda (the first hard drive). You need to select /dev/sdd, the last device connected (the target flash drive). Let the install proceed and you’ll have a bootable Ubuntu image we can start configuring.
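If you’re ever unsure which device is which, a quick check from a terminal on the live desktop removes the guesswork. This is just a sanity check I’d suggest; device names like /dev/sdd will vary with what’s plugged in;

# list the attached disks; the two 1 TB drives show up first (/dev/sda, /dev/sdb),
# the install flash drive and the blank target flash drive come after them
sudo fdisk -l

# the size column makes it easy to tell an 8 GB flash drive from a 1 TB disk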

Remote Desktop for Administration

Once it was up, I could use the desktop and configure Remote Desktop. After playing with the default VNC server, I decided it was the wrong option: it didn’t run unless I had a monitor attached. I did some digging and found that tightVNC is a popular alternative. There are a few steps to getting it installed and running at boot, detailed here.
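I won’t repeat that whole guide, but the gist is something like this (a sketch from memory, so the package name and the startup hook may differ slightly on your Ubuntu release);

# install the server package and run it once to set a VNC password
sudo apt-get install tightvncserver
tightvncserver :1

# one way to start it at boot: add a line like this to /etc/rc.local (before the exit 0),
# substituting your own username
su - yourusername -c '/usr/bin/tightvncserver :1'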

For another means of access, it’s a good idea to install ssh (“apt-get install openssh-server”).

Configuring the RAID

The Disk Utility also has a menu option to configure the RAID. It uses mdadm, but I heard some folks talking about using lvm. Linux Mag has an article that talks about both. I decided to go with the built-in option.

Run “apt-get install mdadm” in a terminal window. You can then use “Disk Utility” (on the “System”->”Administration” menu). One thing I noticed is that if you play around with the RAID config or do your own partitioning of the drives, the RAID wizard isn’t really happy about using those drives. If that’s the case, select each drive and then “Format Drive”. Select the “Don’t Partition” option to reset the drive state. You’ll find that you can now select the drives in the RAID setup wizard.

I’ve set the drives up in a RAID 0 config. Prior to doing this, I did a performance test on a single drive and got an average read rate of 84MB/sec. Once the RAID was configured and formatted, I ran the same performance test and got a read rate of 155MB/sec, which is approaching double the speed! Now that’s what I was hoping for!
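For reference, the command line equivalent of what the wizard does looks roughly like this (a sketch assuming the two data drives are /dev/sda and /dev/sdb, each with a single partition, as on my box);

# build a 2-drive RAID 0 array from the first partition on each disk
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1

# put an ext4 filesystem on the array
sudo mkfs.ext4 /dev/md0

# quick read benchmarks, the same idea as the Disk Utility test
sudo hdparm -t /dev/sda
sudo hdparm -t /dev/md0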

To get the RAID started at boot time, edit the /etc/mdadm/mdadm.conf file and replace the existing “DEVICE” line with these 2 lines;

DEVICE /dev/sda1 /dev/sdb1
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1 auto=yes

Next, run “dpkg-reconfigure mdadm” and accept the defaults. Thanks to goldfisch.at for the help.

Now, to get it mounted, add this line to /etc/fstab;

/dev/md0	/media/RAID	ext4	rw,nosuid,nodev,uhelper=udisks	1	2

I might have been able to say “defaults” in that options column, but I took what was there when I mounted the RAID manually using the disk utility.
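Either way, you can create the mount point and try the fstab entry without rebooting;

# create the mount point, then mount everything listed in /etc/fstab
sudo mkdir -p /media/RAID
sudo mount -a

# confirm the array landed where expected
df -h /media/RAID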

Sharing the Storage

Initially, I’m setting up Samba to share with my household machines. I found this article at ubuntu.com to help me. I’m concerned with privacy, not because I don’t trust my family, but because I plan on backing up my laptop and I don’t want others messing with my files.

I created a “data” directory on the RAID drive. Right-click on that folder and select “Sharing Options”. That brings up a dialog, and when you click “Share this folder”, you’ll get prompted to install some packages (do it!). I discovered that I needed to use “smbpasswd” to set the share password. I’ll probably need to do this for each user I create to access the RAID.
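From a terminal, the equivalent steps look something like this (a sketch; “dave” is just a placeholder for whichever user you’re granting access);

# install samba if you skipped the GUI prompt
sudo apt-get install samba

# each user who will access the share needs a samba password
# (the account must already exist on the box)
sudo smbpasswd -a dave

# pick up changes to users or share definitions
sudo /etc/init.d/samba restart    # or: sudo restart smbd, depending on the release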

The Amazon S3 Backup

For the Amazon S3 backup part, we’ve tossed around a number of different options. S3sync isn’t bad, but it doesn’t allow for threaded uploads, and there’s the issue of how often to kick it off. We asked, “what about running an S3-based filesystem and doing a RAID 1 on top of that and the RAID 0 local drives?”. That might be OK, but how about traffic control? What block size do we use, and what penalty do we pay for a larger block size when storing small files? Where do we store the local cache? Do we even want a local cache since we have a local disk array? Along those lines, we looked at S3Backer and others.

What is the solution when you don’t really think the available options are great? Write your own! We think we can write a daemon tied into file system notification (pyinotify) and use boto for the S3 part. Stay tuned… I smell another open source project!

Amazon Simple Notification Service

Amazon has just come out with yet another service to help build your app on AWS. Their Simple Notification Service is a pub/sub setup where you create topics and users can subscribe to them. Delivery is via a “push” mechanism, so subscribers won’t need to poll for new messages. Output can be one of several protocols: http, https, email, email-json or sqs. While the e-mail output can be useful for things like notifying users watching a comment or blog post, the other options are clearly geared towards consumption by other software. Imagine the http options being used to implement a web service callback. SQS is clearly helpful for building loosely coupled services in the cloud. Now, SNS can help feed into those services.

SNS overview

For more information, visit the SNS documentation. Jeff Barr also does an excellent job of describing SNS at the AWS blog.

typica now supports SNS. Subversion contains the latest code. A release will be coming shortly. (check this space for updates)

Here’s an example of how to use typica to create a topic, subscribe, send a message, then unsubscribe and remove the topic;

NotificationService sns = new NotificationService(props.getProperty("aws.accessId"), props.getProperty("aws.secretKey"));
Result<String> ret = sns.createTopic("TestTopic");
String topicArn = ret.getResult();
System.err.println("topicArn: "+topicArn);

sns.subscribe(topicArn, "email", "dkavanagh@gmail.com");
System.out.println("Waiting till subscription is confirmed.");
System.out.println("Check your e-mail, confirm, then press <return>");
System.in.read();

List<SubscriptionInfo> subs = sns.listSubscriptionsByTopic(topicArn, null).getItems();
String subArn = subs.get(0).getSubscriptionArn();
System.err.println("subscriptionArn: "+subArn);
String TEST_MSG = "Hello from SNS via typica!";	// any message body will do
sns.publish(topicArn, TEST_MSG, "[SNS] testing...");

sns.unsubscribe(subArn);
sns.deleteTopic(topicArn);

Persistent Counters in SimpleDB

I’ve already discussed the new consistency features of Amazon SimpleDB. One of the things people have wished for in SimpleDB is a way to manage a universal counter, something similar to an auto-incrementing primary key in MySQL. The consistency features allow clients to implement such a thing very easily. The following is the algorithm;

Read value
Write value+1, but only if the previous value is what we just read
If write failed, increment value and try again
else return new value

To make it easy for Java programmers to do this with typica, I’ve added a Counter class. Usage is very simple, as you can see from this example;

SimpleDB sdb = new SimpleDB("AccessId", "SecretKey");
Domain dom = sdb.createDomain("MyDomain");
Counter c = new Counter(dom, "counter1");
for (int i=0; i<20; i++) {
	System.err.println("next val = "+c.nextValue());
}

This code creates a counter and initializes it if there isn’t a current value. It uses an Iterator-like interface, but there is no test for a next value because there always is one. The Counter object is stateless, so it relies entirely on SimpleDB for its value. This will work very well on multiple app servers, all relying on the same counter for unique values.

To avoid this blog getting out of date, I won’t include the counter code here, rather you can browse it in SVN.

Code has been checked into SVN r311. I’ll update this post once the new version of typica is released which includes this.

For those seeking a more pythonic version, have a look here.

Eventually Consistent, or Immediately with SimpleDB

Amazon SimpleDB is a service that provides a schema-less data store with some fairly simple query abilities. One of the catches has always been that when you put a piece of data in, you might not get it back in a query right away. That time delay is generally very short (like < 1 second), but there are no guarantees. The cause of this goes back to the fundamental tradeoffs in highly available and redundant systems, such as those Amazon builds. Werner Vogels does a pretty good job of laying out the tradeoffs in his “Eventually Consistent” blog post, and others he links to. Essentially, it’s the CAP theorem, which says you can have only 2 of Consistency, Availability and Partition tolerance (which gets at redundancy).
Using SimpleDB has required an understanding of how inconsistent results will affect your application. Mostly, it has been important that the application never rely on data being there immediately. This can cause problems when trying to give the user completely up to date information.

SimpleDB now supports consistent reads as well as conditional puts and deletes. There is a cost for consistency, which is potentially higher latency. Let’s take a look at the new features.
The simplest improvement is in the Select and GetAttributes calls. Supplying the “ConsistentRead=true” parameter ensures consistent data is returned. Now, SimpleDB is an option for storing application state: a regular put can be used, and a consistent read will always get the current state.
What is far more interesting is what has been done with put and delete. PutAttributes has some optional parameters that define a condition that must be met for the put to proceed. In the request, you can define an expected value for some attribute, or specify that the attribute simply must exist. One application for conditional put is a counter. Imagine an item that has a counter attribute. To increment the counter, simply read the value, then do a conditional put with the new value, but only if the attribute still holds the value you read. The request will fail if another writer got there first. A retry loop is required, as in this pseudo-code;

value = read(counter);
while (put counter=value+1, if counter==value fails) {
    value = read(counter);
}

The same can be done with the delete operation. In a future post, I’ll talk about how to use typica to access these new features from Java. (added! https://coderslike.us/2010/03/09/persistent-counters-in-simpledb/)

Creating your own AccessId and SecretKey

When building your own web services, security has to be one of the concerns. Having worked with Amazon Web Services extensively over the past several years, I’ve become quite familiar with the Query API security features. I wrote a Java client for many of their services, so I had to understand the workings of the various signing methods.
On a recent project, I decided to implement the same version 2 signing that Amazon uses on most of their web services. I thought I’d be able to leverage my typica client code, and its signing code, to help validate requests on the server side. Before any of this could happen, I needed a way to generate my own access id and secret key. Shown below is some code that I came up with (using some Eucalyptus code as a reference).

import java.security.MessageDigest;
import java.security.SecureRandom;
import java.security.Security;

import org.apache.commons.codec.binary.Base64;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public class Credentials {
	static {
		Security.addProvider( new BouncyCastleProvider( ) );
	}

	public static String genAccessId(String userName) {
		try {
			byte[] userBytes = userName.getBytes();
			MessageDigest digest = MessageDigest.getInstance("SHA224", "BC");
			digest.update(userBytes);
			byte [] digestBytes = digest.digest();
			return new String(Base64.encodeBase64(digestBytes)).replaceAll( "\\p{Punct}", "" ).substring(0, 20);
		} catch (Exception ex) {	// catch those security exceptions
			System.err.println("Error with provider : "+ex.getMessage());
			System.exit(-2);
		}
		return null;
	}

	public static String genSecretKey(String userName) {
		try {
			byte[] userBytes = userName.getBytes();
			MessageDigest digest = MessageDigest.getInstance("SHA256", "BC");
			digest.update(userBytes);
			SecureRandom random = new SecureRandom();
			random.setSeed(System.currentTimeMillis());
			byte[] randomBytes = random.generateSeed(userBytes.length);
			digest.update(randomBytes);
			byte [] digestBytes = digest.digest();
			return new String(Base64.encodeBase64(digestBytes)).substring(0, 40);
		} catch (Exception ex) {	// catch those security exceptions
			System.err.println("Error with provider : "+ex.getMessage());
			System.exit(-2);
		}
		return null;
	}
}

When creating a user on the server, I generate an access id and secret key based on the username. A new secret key can be generated at any time, using that username. The access id shouldn’t be changed (as a rule).

A Unique Method of Authenticating against an App-Managed User List

I have a project that uses Amazon’s SimpleDB service for data storage. Being a Java programmer, I have become fond of using JPA (Java Persistence API) implementations. In some cases I’ve used EclipseLink, but more recently I’ve been playing with SimpleJPA, a partial JPA implementation on top of SimpleDB. The benefits include writing value objects with minimal annotations to indicate relationships.

Anyway, enough about why I do it. Since my user list is also stored in JPA entities, I’d like to tie this into container-managed authentication. The web app I’m writing is deployed to Tomcat, and so realms are used to define an authentication provider. Tomcat provides several realms that hook into a JDBC database, JAAS, a JNDI DataSource and more. In my case, I wanted to rely on data access via JPA. Before discussing the challenges, I should point out that in a Java web app container there are different class loaders to contend with. The container has its own classloader, and each web application has its own. My application obviously contains all of the supporting jars for SimpleJPA and my value objects. Since authentication is handled by the container, it doesn’t have access to my app’s classloader. So, I’d need to deploy about 12 jar files into the tomcat/lib directory to make them available to the container. One of those contains my value objects and could change in the future. I don’t think that’s a very nice deployment strategy (deploying a war, and then a separate jar for each software update).

To solve this problem, I had to come up with a way to write my own Realm with as few dependencies on my application as possible. What I came up with is a socket listener, running on a dedicated port within my web application. It only accepts connections from localhost, so it is not likely to become a security concern. The socket listener receives a username and returns “username,password,role1,role2,…” as a string. That is the contract between my web application and the authentication realm. The realm talks to the socket listener and uses it to get information about the user trying to authenticate, which it converts to the object format used within Tomcat realms.

The code for the socket listener is fairly simple;

package org.scalabletype.util;

import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.ServerSocket;
import java.net.UnknownHostException;

import javax.persistence.EntityManager;
import javax.persistence.Query;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import org.scalabletype.data.DataHelper;
import org.scalabletype.data.User;

/**
 * This class listens on a port, receives a username, looks up user record, then responds with data.
 */
public class AuthServer extends Thread {
	private static Log logger = LogFactory.getLog(AuthServer.class);
	public static final int AUTH_SOCKET = 2000;

	public AuthServer() { }

	public void run() {
		while (!isInterrupted()) {
			ServerSocket ss = null;
			try {
				ss = new ServerSocket(AUTH_SOCKET);
				while (!isInterrupted()) {
					Socket sock = ss.accept();
					try {
						// confirm connection from localhost only
						InetAddress addr = sock.getInetAddress();
						if (addr.getHostName().equals("localhost")) {
							// get user to authenticate
							InputStream iStr = sock.getInputStream();
							byte [] buf = new byte[1024];
							int bytesRead = iStr.read(buf);
							String username = new String(buf, 0, bytesRead);
							logger.info("username to authenticate:"+username);

							// fetch user from JPA
							EntityManager em = DataHelper.getEntityManager();
							Query query = em.createQuery("select object(o) from User o where o.username = :name");
							query.setParameter("name", username);
							User usr = (User)query.getSingleResult();

							// return user data, or nothing
							OutputStream oStr = sock.getOutputStream();
							logger.info("got connection, going to respond");
							if (usr != null) {
								StringBuilder ret = new StringBuilder();
								ret.append(usr.getUsername());
								ret.append(",");
								ret.append(usr.getPassword());
								ret.append(",");
								ret.append(usr.getAuthGroups());
								oStr.write(ret.toString().getBytes());
							}
							oStr.flush();
						}
						sock.close();
					} catch (Exception ex) {
						logger.error("Some problem handling the request", ex);
					}
				}
			} catch (Exception ex) {
				logger.error("problem accepting connection. will keep going.", ex);
			} finally {
				// close the listening socket so the port can be re-bound if we loop around
				if (ss != null) {
					try { ss.close(); } catch (IOException ioe) { }
				}
			}
		}
	}
}

The socket listener needs to be invoked when the web application is initialized and a ServletContextListener is a good place to do that;

public class ScalableTypeStarter implements ServletContextListener {
	private AuthServer auth;

	public void contextInitialized(ServletContextEvent evt) {
		// init data persistence layer
		DataHelper.initDataHelper(evt.getServletContext());

		// start authorization socket listener
		auth = new AuthServer();
		auth.start();
	}

	public void contextDestroyed(ServletContextEvent evt) {
		if (auth != null) {
			auth.interrupt();
			auth = null;
		}
	}
}

Here is the code for my realm, which is packaged by itself into a jar, and deployed (once) into the tomcat/lib directory.

package org.scalabletype.util;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.UnknownHostException;
import java.security.Principal;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.catalina.Group;
import org.apache.catalina.Role;
import org.apache.catalina.User;
import org.apache.catalina.UserDatabase;
import org.apache.catalina.realm.GenericPrincipal;
import org.apache.catalina.realm.RealmBase;

/**
 * This realm authenticates against user data via the socket listener.
 *
 */
public class UserRealm extends RealmBase {
	public static final int AUTH_SOCKET = 2000;

    protected final String info = "org.scalabletype.util.UserRealm/1.0";
    protected static final String name = "UserRealm";

    /**
     * Return descriptive information about this Realm implementation and
     * the corresponding version number, in the format
     * <code>&lt;description&gt;/&lt;version&gt;</code>.
     */
    public String getInfo() {
        return info;
    }

    /**
     * Return <code>true</code> if the specified Principal has the specified
     * security role, within the context of this Realm; otherwise return
     * <code>false</code>. This implementation returns <code>true</code>
     * if the <code>User</code> has the role, or if any <code>Group</code>
     * that the <code>User</code> is a member of has the role. 
     *
     * @param principal Principal for whom the role is to be checked
     * @param role Security role to be checked
     */
    public boolean hasRole(Principal principal, String role) {
        if (principal instanceof GenericPrincipal) {
            GenericPrincipal gp = (GenericPrincipal)principal;
            if(gp.getUserPrincipal() instanceof User) {
                principal = gp.getUserPrincipal();
            }
        }
        if (!(principal instanceof User) ) {
            //Play nice with SSO and mixed Realms
            return super.hasRole(principal, role);
        }
        if ("*".equals(role)) {
            return true;
        } else if(role == null) {
            return false;
        }
        User user = (User)principal;
        UserInfo usr = findUser(user.getFullName());
        if (usr == null) {
            return false;
        } 
        for (String group : usr.groups) {
			if (role.equals(group)) return true;
		}
        return false;
    }
		
    /**
     * Return a short name for this Realm implementation.
     */
    protected String getName() {
        return name;
    }

    /**
     * Return the password associated with the given principal's user name.
     */
    protected String getPassword(String username) {
        UserInfo user = findUser(username);

        if (user == null) {
            return null;
        } 

        return (user.password);
    }

    /**
     * Return the Principal associated with the given user name.
     */
    protected Principal getPrincipal(String username) {
        UserInfo user = findUser(username);
        if(user == null) {
            return null;
        }

        List roles = new ArrayList();
        for (String group : user.groups) {
            roles.add(group);
        }
        return new GenericPrincipal(this, username, user.password, roles);
    }

	private UserInfo findUser(String username) {
		UserInfo user = new UserInfo();
		try {
			Socket sock = new Socket("localhost", AUTH_SOCKET);
			OutputStream oStr = sock.getOutputStream();
			oStr.write(username.getBytes());
			oStr.flush();
			InputStream iStr = sock.getInputStream();
			byte [] buf = new byte[4096];
			int len = iStr.read(buf);
			if (len <= 0) {	// no data (or a closed connection) means no such user
				return null;
			}
			String [] data = new String(buf, 0, len).split(",");
			user.username = data[0];
			user.password = data[1];
			ArrayList<String> groups = new ArrayList<String>();
			for (int i=2; i<data.length; i++) {
				groups.add(data[i]);
			}
			user.groups = groups;
		} catch (UnknownHostException ex) {
			ex.printStackTrace();
		} catch (IOException ex) {
			ex.printStackTrace();
		}
		return user;
	}

	class UserInfo {
		String username;
		String password;
		List<String> groups;
	}
}

The web app’s context.xml contains this line to configure the realm;

<Realm className="org.scalabletype.util.UserRealm" resourceName="ScalableTypeAuth"/>

Amazon EC2 – Boot from EBS and AMI conversion

Amazon recently announced an important new feature for their Elastic Compute Cloud. Previously, each instance was based on an image that could be a maximum of 10 GB in size. So, each machine you brought up could have a root partition up to 10 GB in size and additional storage would need to be added in other ways. The size restriction alone is somewhat limiting. Amazon has not only addressed that, but given users some other very powerful abilities.

Now, you can define an image in an EBS snapshot. That means the size of your root partition can be as large as 1 TB. Yes, that’s 100 times larger than the old 10 GB limit. Beyond the obvious benefit of having larger images, you can also stop instances. Stopping an instance is different from terminating an instance. The distinction is important because stopping an instance is very much like hitting the “pause” button. It doesn’t take a lot to realize that pausing a running instance and being able to start it up again later is very powerful! Instances also tend to boot faster off EBS. As you might expect, if you create a really large volume for a root partition (like 100s of GBs), it will take longer to come up. That’s just because it takes longer to create larger volumes than smaller ones.
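Stop and start are exposed through the regular API tools, so “pausing” an instance looks something like this (i-xxxxxxxx stands in for your actual instance id);

# stop an EBS-backed instance; the root volume sticks around
ec2-stop-instances i-xxxxxxxx

# pick up where you left off later (note that the public IP usually changes
# unless you're using an Elastic IP)
ec2-start-instances i-xxxxxxxx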

Let’s go further and look at how powerful it is to have snapshots as the basis for images. Because you can create EBS volumes from a snapshot, you can mount a volume based on your snapshot (which represents your image) and make modifications to it! This is immensely helpful when trying to make changes to an image. Previously, it was somewhat more awkward to modify an image: you actually had to boot it up and run it. But now, even if there is an error that prevents it from running properly, you can access the image storage and make changes. Very useful!
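As a quick illustration, patching an image offline goes something like this with the standard API tools (ids, zone and device names are placeholders);

# create a volume from the snapshot behind the AMI, in the same zone as a helper instance
ec2-create-volume --snapshot snap-xxxxxxxx -z us-east-1a

# attach it to the helper instance and mount it
ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf
sudo mount /dev/sdf /mnt/image

# ...fix whatever needs fixing under /mnt/image, then unmount, detach,
# snapshot the volume and register the new snapshot as an AMI
sudo umount /mnt/image
ec2-detach-volume vol-xxxxxxxx
ec2-create-snapshot vol-xxxxxxxx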

Of course, judging by the number of public AMIs out there, there are a great number of images backed by S3 that people will want to convert. Towards this end, I came up with a script to convert AMIs from the old style to the new. Here’s the Cliffs Notes version.

Use an instance in the same region as your image to do the following,

  • download the image bundle to the ephemeral store
  • unbundle the image (resulting in a single file)
  • create a temporary EBS volume in the same availability zone as the instance
  • attach the volume to your instance
  • copy the unbundled image onto the raw EBS volume
  • mount the EBS volume
  • edit /etc/fstab on the volume to remove the ephemeral store mount line
  • unmount and detach the volume
  • create a snapshot of the EBS volume
  • register the snapshot as an image, and you’re done!

During the private beta for this feature, I created an AMI to handle all of this, so you boot the AMI with a set of parameters and it does the dirty work. The script uses the standard API and AMI tools that Amazon supplies. I’ll roll that out on the public cloud shortly.

Here’s the interesting portion of the script (parsing arguments and setting up environment variables for the tools have been omitted):

Using the AMI ID, get the manifest name and architecture
AMI_DESC=`$EC2_HOME/bin/ec2dim |grep $AMI_ID`
MANIFEST=`echo $AMI_DESC | awk '{ print $3 }'`
ARCH=`echo $AMI_DESC | awk '{ print $7 }'`
MANIFEST_PATH=`dirname $MANIFEST`/
MANIFEST_PREFIX=`basename $MANIFEST |awk -F. '{ print $1 }'`

Download the bundle to /mnt

echo grabbing bundle $MANIFEST_PATH $MANIFEST_PREFIX
/usr/local/bin/ec2-download-bundle -b $MANIFEST_PATH -a $ACCESS_ID -s $SECRET_KEY -k pk.pem -p $MANIFEST_PREFIX -d /mnt

Unbundle the image into a single (rather large) file.


echo unbundling, this will take a while
/usr/local/bin/ec2-unbundle -k pk.pem -m /mnt/$MANIFEST_PREFIX.manifest.xml  -s /mnt -d /mnt

Create an EBS volume, 10 GB. This size is used because it is the largest size for an S3-based AMI. Using the launch options I show at the end of this article, you can increase that at run time. Notice that the availability zone comes from instance metadata. We must wait until the volume is created before moving on.


ZONE=`curl http://169.254.169.254/latest/meta-data/placement/availability-zone`
VOL_ID=`$EC2_HOME/bin/ec2addvol -s 10 -z $ZONE | awk '{ print $2 }'`
STATUS=creating
while [ $STATUS != "available" ]
do
echo volume $STATUS, waiting for volume create...
sleep 3
STATUS=`$EC2_HOME/bin/ec2dvol $VOL_ID | awk '{ print $5 }'`
done

Attach the volume

INST_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
$EC2_HOME/bin/ec2attvol $VOL_ID -i $INST_ID -d $EBS_DEV

Here’s where we turn the image into a real volume, using our old friend “dd”

echo copying image to volume, this will also take a while
dd if=/mnt/$MANIFEST_PREFIX of=$EBS_DEV

Mount the volume and remove the ephemeral store entry from /etc/fstab. This is required because “Boot from EBS” doesn’t use the ephemeral store by default.

mount $EBS_DEV /perm
cat /perm/etc/fstab |grep -v mnt >/tmp/fstab
mv /perm/etc/fstab /perm/etc/fstab.bak
mv /tmp/fstab /perm/etc/

Then, unmount and detach the volume. We’re nearly there.

umount /perm
$EC2_HOME/bin/ec2detvol $VOL_ID -i $INST_ID

Create a snapshot and wait for it to complete.

SNAP_ID=`$EC2_HOME/bin/ec2addsnap $VOL_ID -d "created by createAMI.sh" | awk '{ print $2 }'`
# now, wait for it
STATUS=pending
while [ $STATUS != "completed" ]
do
echo snapshot $STATUS, waiting for snap complete...
sleep 3
STATUS=`$EC2_HOME/bin/ec2dsnap $SNAP_ID | awk '{ print $4 }'`
done

Finally, delete the volume and register the snapshot


$EC2_HOME/bin/ec2delvol $VOL_ID
$EC2_HOME/bin/ec2reg -s $SNAP_ID -a $ARCH -d $DESCR -n $MANIFEST_PREFIX

To run your AMI with a larger root partition, use a command like this (which specifies 100GB);
  ec2-run-instances --key <KEYPAIR> --block-device-mapping /dev/sda1=:100 <AMI_ID>

Amazon Relational Database Service

I’m pretty excited about this new service from Amazon. They’ve taken a lot of the pain out of running a relational database in the cloud. Specifically, they now support managed instances running MySQL. Amazon RDS handles provisioning, operating and scaling your database. Much like Elastic Load Balancing and Auto Scaling did for the application tier, RDS does for the database tier.

Amazon RDS provides APIs and command line tools to manage your database, removing complicated scripts and much of the “muck” that was somewhat tricky before. There are additional CloudWatch metrics that give information about the state of the database, such as the number of open connections.

There are 4 main groups of commands with RDS.

  • High Level Instance Management
  • Database Configuration
  • Security Group Management
  • Backup and Restore Services

Initially, you might create a db instance and authorize access to an existing EC2 security group (perhaps the one for your application tier auto scaling group). Going further, you can get more sophisticated about configuring the database. You can configure a parameter group and set the types of things you’d have configured in your my.cnf file. You can also add storage while the database is running. Finally, you’ll want to make use of snapshots to make backups of your database.
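To give a flavor of the command line tools, a first pass might look something like this. This is a hedged sketch; I’m writing the flags from memory of the RDS tools, so check them against the documentation before relying on them;

# create a small MySQL instance with 10 GB of storage
rds-create-db-instance mydb --engine MySQL5.1 --db-instance-class db.m1.small \
    --allocated-storage 10 --master-username admin --master-user-password secret

# let instances in an existing EC2 security group reach the database
rds-authorize-db-security-group-ingress default \
    --ec2-security-group-name webtier --ec2-security-group-owner-id 123456789012

# take a snapshot backup when you want one
rds-create-db-snapshot mydb --db-snapshot-identifier mydb-snap-1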

To help with monitoring, Amazon RDS provides more than the additional CloudWatch metrics. They’ve added the ability to track access details, so you can request events related to instances and security groups for the past 2 weeks.

To support relational databases in EC2, there are 2 new instance types, m2.2xlarge and m2.4xlarge, which are both high-memory with higher I/O. This is great news and dovetails nicely with Amazon RDS (not by mistake, either).

  • High-Memory Double Extra Large Instance 34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform
  • High-Memory Quadruple Extra Large Instance 68.4 GB of memory, 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform

I think they’ve done a good job of making database deployment and management easier in their cloud! I’m considering adding support to the typica Java client. I’d appreciate feedback to help with that decision.

Some folks have been questioning the relevancy of SimpleDB in light of RDS. I think Mitch does a great job on elastician.com of discussing that topic. I have to agree that the scale issue still applies. Some applications can live with the limitations of SimpleDB and gain the advantage of massive scale. That is something that RDS cannot provide. Amazon RDS does give a good set of management APIs for running MySQL in the cloud, and people shouldn’t expect more than that.

Amazon SimpleDB now available in the EU region

Amazon has just announced support for SimpleDB in their European data center. That means applications running in the EU will have lower latency when accessing SimpleDB, which is good news for SimpleDB adoption. The EU SimpleDB is a totally separate instance of the service, as with S3 and EC2. So, you simply use the EU endpoint (sdb.eu-west-1.amazonaws.com) and it is business as usual.

The QueryTool now has built-in support for region selection, which should make it easier to test queries and export data from both places. It is available for download here.

Typica is ready for the EU. Simply create the SimpleDB object with the new EU endpoint (instead of the default US endpoint);

SimpleDB sdb = new SimpleDB(accessId, secretKey, true, "sdb.eu-west-1.amazonaws.com");


QueryTool now exports data

There are some things I’ve been wanting in the QueryTool, so I just threw them in. Here’s the list;

  • proxy support (add proxy host and port to the command line args)
  • sortable results table (via column header click)
  • export result data to CSV file
  • scrollable query scratchpad


The tool can be downloaded here.

To run it, type;

java -jar QueryTool1.2.jar <AccessId> <SecretKey> [ProxyHost] [ProxyPort]