Connecting to the Eucalyptus Community Cloud with typica

Eucalyptus recently announced a public “cloud” sandbox known as Eucalyptus Community Cloud. It is a place where you can kick the tires to some degree and since they support a subset of the Amazon EC2 API, you can generally point EC2 tools at the ECC. This post will deal with using typica to interact with the ECC from within your Java software.

First thing to do is follow the ECC link above and create an account. If you already have an account to get into the Eucalyptus forums, you can login and apply for an ECC account. Once you get a confirmation e-mail and confirm the account, you’ll be able to login and get your access id and secret key. To do that, visit the ECC, login and select “show keys”, which reveal the QueryID (access id) and Secret Key. While you’re hear, you should also download credentials. This gives you a zip that contains something we’ll need later.


Jec2 ec2 = new Jec2(props.getProperty("aws.accessId"), props.getProperty("aws.secretKey"), true, "ecc.eucalyptus.com", 8773);
ec2.setResourcePrefix("/services/Eucalyptus");

Let me explain this code. The first line creates a new Jec2 object, that is configured to talk to the ECC. The “props” variable came from reading a property file containing the access id and secret key. The next parameter specifies SSL. Then, you pass the hostname for the ECC and the port it uses. After that, it would be business as usual. The EC2 sample code demonstrates some normal operations, and the API docs give a more complete picture.

When running the code, there’s a special option you’ll need as compared to using typica to talk to AWS. Since Eucalyptus clouds are generally installed with self signed SSL certs, you’ll need to specify a file that came with that credentials download in your java options. If you don’t do this, you’ll likely see a “SSLPeerUnverifiedException: peer not authenticated” error.


$ java ... -Djavax.net.ssl.trustStore=<path to files from credentials zip>/jssecacerts ... TestJec2

Advertisements

Amazon EC2 – Boot from EBS and AMI conversion

Amazon recently announced an important new feature for their Elastic Compute Cloud. Previously, each instance was based on an image that could be a maximum of 10 GB in size. So, each machine you brought up could have a root partition up to 10 GB in size and additional storage would need to be added in other ways. The size restriction alone is somewhat limiting. Amazon has not only addressed that, but given users some other very powerful abilities.

Now, you can define an image in an EBS snapshot. That means the size of your root partition can be as large as 1 TB. Yes, that’s 100 times larger than the old 10 GB limit. Beyond the obvious benefit of having larger images, you can also stop instances. Stopping an instance is different than terminating an instance. The distinction is important because stopping an instance is very much like hitting the “pause” button. It doesn’t take a lot to realize that pausing a running instance and being able to start it up again later is very powerful! Instances tend to boot faster off EBS. As  you might expect, if you create a really large volume for a root partition (like 100s of GBs), it will take longer to come up. That’s just because it takes longer to create larger volumes than smaller ones.

Let’s go further and look at how powerful it is to have snapshots as the basis for images. By having a snapshot that you can create EBS volumes from, that means you can mount a volume, based on your snapshot (which represents your image) and make modifications to it! This is immensely helpful when trying to make changes to an image. Previously, it was somewhat more awkward to modify an image. You actually had to boot it up and run it. But now, even if there is an error that prevents proper running, you can access the image storage and make changes. Very useful!

Of course judging by the number of public AMIs out there, there are a great number of images backed by S3 that people will want to convert. Towards this end, I came up with a script to convert AMIs from the old to the new style. Here’s the cliff’s notes version.

Use an instance in the same region as your image to do the following,

  • download the image bundle to the ephemeral store
  • unbundle the image (resulting in a single file)
  • create a temporary EBS volume in the same availability zone as the instance
  • attach the volume to your instance
  • copy the unbundled image onto the raw EBS volume
  • mount the EBS volume
  • edit /etc/fstab on the volume to remove the ephemeral store mount line
  • unmount and detach the volume
  • create a snapshot of the EBS volume
  • register the snapshot as an image, and you’re done!

During the private beta for this feature, I created an AMI to handle all of this, so you boot the AMI with a set of parameters and it does the dirty work. The script uses the standard API and AMI tools that Amazon supplies. I’ll roll that out on the public cloud shortly.

Here’s the interesting portion of the script (parsing arguments and setting up environment variable for the tools has been omitted) :

Using the AMI ID, get the manifest name and architecture
AMI_DESC=`$EC2_HOME/bin/ec2dim |grep $AMI_ID`
MANIFEST=`echo $AMI_DESC | awk '{ print $3 }'`
ARCH=`echo $AMI_DESC | awk '{ print $7 }'`
MANIFEST_PATH=`dirname $MANIFEST`/
MANIFEST_PREFIX=`basename $MANIFEST |awk -F. '{ print $1 }'`

Download the bundle to /mnt

echo grabbing bundle $MANIFEST_PATH $MANIFEST_PREFIX
/usr/local/bin/ec2-download-bundle -b $MANIFEST_PATH -a $ACCESS_ID -s $SECRET_KEY -k pk.pem -p $MANIFEST_PREFIX -d /mnt

Unbundle the image into a single (rather large) file.


echo unbundling, this will take a while
/usr/local/bin/ec2-unbundle -k pk.pem -m /mnt/$MANIFEST_PREFIX.manifest.xml  -s /mnt -d /mnt

Create an EBS volume, 10 GB. This size is used because that is the largest size for an S3 based AMI. Using launch options I show at the end of this article, you can increase that at run time. Notice, the availability zone comes from instance metadata. We must wait till the volume is created before moving on.


ZONE=`curl http://169.254.169.254/latest/meta-data/placement/availability-zone`
VOL_ID=`$EC2_HOME/bin/ec2addvol -s 50 -z $ZONE | awk '{ print $2 }'`
STATUS=creating
while [ $STATUS != "available" ]
do
echo volume $STATUS, waiting for volume create...
sleep 3
STATUS=`$EC2_HOME/bin/ec2dvol $VOL_ID | awk '{ print $5 }'`
done

Attach the volume

INST_ID=`curl http://169.254.169.254/latest/meta-data/instance-id`
$EC2_HOME/bin/ec2attvol $VOL_ID -i $INST_ID -d $EBS_DEV

Here’s where we turn the image into a real volume, using our old friend “dd”

echo copying image to volume, this will also take a while
dd if=/mnt/$MANIFEST_PREFIX of=$EBS_DEV

Mount the volume and remove ephemeral store entry from /etc/fstab. This is required because “Boot from EBS” doesn’t use the ephemeral store by default.

mount $EBS_DEV /perm
cat /perm/etc/fstab |grep -v mnt >/tmp/fstab
mv /perm/etc/fstab /perm/etc/fstab.bak
mv /tmp/fstab /perm/etc/

Then, unmount and detach the volume. We’re nearly there.

umount /perm
$EC2_HOME/bin/ec2detvol $VOL_ID -i $INST_ID

Create a snapshot and wait for it to complete.

SNAP_ID=`$EC2_HOME/bin/ec2addsnap $VOL_ID -d "created by createAMI.sh" | awk '{ print $2 }'`
# now, wait for it
STATUS=pending
while [ $STATUS != "completed" ]
do
echo volume $STATUS, waiting for snap complete...
sleep 3
STATUS=`$EC2_HOME/bin/ec2dsnap $SNAP_ID | awk '{ print $4 }'`
done

Finally, delete the volume and register the snapshot


$EC2_HOME/bin/ec2delvol $VOL_ID
$EC2_HOME/bin/ec2reg -s $SNAP_ID -a $ARCH -d $DESCR -n $MANIFEST_PREFIX

To run your AMI with a larger root partition, use a command like this (which specifies 100GB);
  ec2-run-instances –key <KEYPAIR> –block-device-mapping /dev/sda1=:100 <AMI_ID>

Amazon Relational Database Service

logo_awsI’m pretty excited about this new service from Amazon. They’ve taken a lot of the pain out of running a relational database in the cloud. Specifically, they now support managed instances running MySQL. Amazon RDS handles provisioning, operating and scaling your database. Much like Elastic Load Balancing and Auto Scaling did for the application tier, RDS does for the database tier.

Amazon RDS provides APIs and command line tools to manage your database, removing complicated scripts and much of the “muck” that was somewhat tricky before. There are additional Cloud Watch fields that give additional information about the state of the database, such as # open connections.

There are 4 main groups of commands with RDS.

  • High Level Instance Management
  • Database Configuration
  • Security Group Management
  • Backup and Restore Services

Initially, you might create a db instance, authorize access to an existing EC2 security group (perhaps for your application tier auto scaling group). Going further, you can get more sophisticated about configuring the database. You can configure a parameter group and set the types of things you’d have configured in your my.cnf file. You can also add storage while the database is running. Finally, you’ll want to make use of snapshots to make backups of your database.

To help monitoring, Amazon RDS provides more than additional CloudWatch parameters. They’ve added the ability to track access details, so you can request events related to instances and security groups for the past 2 weeks.

To support relational databases in the EC2, there are 2 new instance types, m2.2xlarge and m2.4xlarge which are both high memory and higher I/O. This is great news and dovetails nicely with Amazon RDS (not by mistake either).

  • High-Memory Double Extra Large Instance 34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25EC2 Compute Units each), 850 GB of instance storage, 64-bit platform
  • High-Memory Quadruple Extra Large Instance 68.4 GB of memory, 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform

I think they’ve done a good job of making database deployment and management easier in their cloud! I’m considering adding support to the typica java client. I’d appreciate feedback to help with that decision.

Some folks have been questioning the relevancy of SimpleDB in light of RDS. I think Mitch does a great job on elastician.com of discussing that topic. I have to agree that the scale issue still applies. Some applications can live with the limitations of SimpleDB and gain the advantage of massive scale. That is something that RDS cannot provide. Amazon RDS does give a good set of management APIs for running MySQL in the cloud, and people shouldn’t expect more than that.

Amazon CloudWatch with Java/typica

Recently, Amazon announced that it’s CloudWatch service went into public beta. I’ve been involved with the private beta of this and the Elastic Load Balancing and Auto Scaling services. I’ve just completed testing of the CloudWatch monitoring service APIs in typica and thought I’d share some of what has been added.

First of all, the Jec2 class has 2 new methods, monitorInstances(..) and unmonitorInstances(..). They do exactly what you’d expect by turning monitoring on or off for one or more instances. What I think more people will use is the new flag on LaunchConfiguration to enable monitoring when you launch an instance. Also, if you describe instances, you’ll get the monitoring status back now also.

The real CloudWatch APIs are in their own package. I did this because it seems like while they are initially released for EC2, they are written to allow monitoring other service also (hence the namespace parameter). The new API has only two methods. The first lets you list the metrics you can query in the second call. To do this, you can use some code like this;

Monitoring mon = new Monitoring(props.getProperty(“aws.accessId”), props.getProperty(“aws.secretKey”));
List<Metric> metrix = mon.listMetrics();
for (Metric m : metrix) {
System.out.println(“name = “+m.getName()+”:”+m.getNamespace());
for (Dimension dim : m.getDimensions()) {
System.out.println(”   “+dim.getName()+”: “+dim.getValue());
}
}
Monitoring mon = new Monitoring(accessId, secretKey);
List<Metric> metrix = mon.listMetrics();
for (Metric m : metrix) {
	System.out.println("name = "+m.getName()+":"+m.getNamespace());
	for (Dimension dim : m.getDimensions()) {
		System.out.println("   "+dim.getName()+": "+dim.getValue());
	}
}
Here is some of the output (trucated because there is a lot more);
     [java] name = NetworkIn:AWS/EC2
     [java] name = NetworkOut:AWS/EC2
     [java]    ImageId: ami-85d037ec
     [java] name = NetworkOut:AWS/EC2
     [java] name = DiskWriteBytes:AWS/EC2
     [java]    InstanceType: m1.small
     [java] name = CPUUtilization:AWS/EC2
     [java]    InstanceType: m1.large
     [java] name = DiskWriteBytes:AWS/EC2
     [java]    InstanceType: m1.large
     [java] name = DiskReadOps:AWS/EC2
     [java]    InstanceId: i-1de3a674
     [java] name = DiskWriteOps:AWS/EC2
     [java]    InstanceType: m1.small
     [java] name = DiskReadOps:AWS/EC2
     [java]    ImageId: ami-24fa86b
     [java] name = DiskReadOps:AWS/EC2
     [java]    InstanceId: i-51423838

Once you have an instance or an image you’d like to monitor, you can use some code like this to fetch the data;

List<Statistics> stats = new ArrayList<Statistics>();
stats.add(Statistics.AVERAGE);

Map<String, String> dimensions = new HashMap<String, String>();
// can be InstanceId, InstanceType, ImageId
dimensions.put("ImageId", "ami-85d037ec");

Date end = new Date();	// that means now
end = new Date(end.getTime() + 3600000*5); // need to adjust for GMT
Date start = new Date(end.getTime() - 3600000*24);	// 1 days ago
MetricStatisticsResult result = mon.getMetricStatistics(
				60,	// must be multiple of 60
				stats,	// see above
				"AWS/EC2",
				dimensions,
				start,	// start of interval
				end,	// end of interval
				// can be NetworkIn, NetworkOut, DiskReadOps,
				// DiskWriteOps, DiskReadBytes, DiskWriteBytes,
				// CPUUtilization
				"CPUUtilization",
				StandardUnit.PERCENT,
				null);
System.out.println("metrics label = "+result.getLabel());
for (Datapoint dp : result.getDatapoints()) {
	System.out.println(dp.getTimestamp().getTime().toString()+
			" samples:"+dp.getSamples()+" "+dp.getAverage()+" "+dp.getUnit());
}
It can be useful monitor by ImageId when you’re running a pool of servers (like with the auto scaling service). I’ve tried to include comments within the code that indicate appropriate values because it can get complicated..
     [java] metrics label = CPUUtilization
     [java] Fri May 22 10:56:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 11:42:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:55:00 EDT 2009 samples:1.0 1.54 Percent
     [java] Fri May 22 12:41:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 13:10:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 10:09:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:51:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:40:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 10:07:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 13:41:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 10:34:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:01:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 10:17:00 EDT 2009 samples:1.0 0.39 Percent
     [java] Fri May 22 11:39:00 EDT 2009 samples:1.0 1.15 Percent
     [java] Fri May 22 10:06:00 EDT 2009 samples:1.0 0.38 Percent
     [java] Fri May 22 12:10:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:09:00 EDT 2009 samples:1.0 0.76 Percent
     [java] Fri May 22 13:46:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 10:39:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:11:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:03:00 EDT 2009 samples:1.0 1.15 Percent
     [java] Fri May 22 11:32:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 10:44:00 EDT 2009 samples:1.0 0.0 Percent
     [java] Fri May 22 12:45:00 EDT 2009 samples:1.0 0.0 Percent
This code is available in typica SVN as of r265. Look for typica release 1.6 which will contain CloudWatch, ElasticLoadBalancing and AutoScaling once a little more testing has been completed.

directEC2 is available in the AppStore

I’m pleased to announce that the application I wrote to manage Amazon EC2 instances from an iPhone or iPod touch is now in the AppStore. This application is the first version of what will become a very feature rich management console. Here is a quick run-down of the features;

  • Manage images, instances, volumes, snapshots and more.
  • Maintain launch configurations to quickly spin up more servers
  • Check server status, console output
  • Multiple account support
  • Access all regions
  • Create, attach volumes
  • Backup and restore with snapshots
  • Shake navigation aid
That last item is something I came up with to solve the problem of being several levels deep in the application navigation. To simplify returning to the top level, the application responds to shake and resets the navigation. This turns out to be pretty handy and I hope other applications adopt this feature.
Get the app here: iTunes App Store
Here are some screenshots to entice you;
img_0020img_0009img_0010img_0008img_0011

Amazon announces Elastic MapReduce

I think this slipped out a day early, but here is what came across in the DevPay documentation;

On April 2, 2009, AWS announced the release of Amazon Elastic MapReduce, a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3. For more information, go to http://aws.amazon.com/elasticmapreduce.

Here is the link to the official docs: http://aws.amazon.com/elasticmapreduce

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. Amazon Elastic MapReduce lets you focus on crunching or analyzing your data without having to worry about time-consuming set-up, management or tuning of Hadoop clusters or the compute capacity upon which they sit.

I do wonder if they’re trying to move up the food chain a bit much. With this service, it is clear that they are walking over some people who have set up a business doing this for customers already. It feels l bit like what Microsoft used to do. They had special knowledge and APIs into the guts of the OS and could so things better than the competition once they decided to go there. Similar things could happen here. I feel better about Amazon in general, but it is a slippery slope!