OpsWorks Introduction: Creating an NFS Server

by Shawn Bower

OpsWorks is an AWS service that provides configuration management for EC2 instances.  The service relies on Chef and allows users to create and upload their own cookbooks.  For the purposes of this blog post we will assume familiarity with Chef; if you are new to Chef, please check out their documentation.  The first thing we need to do is set up our cookbooks.  First, let's get the cookbook for NFS.

[Screenshot: downloading the nfs cookbook]

Cool, now we have the cookbook for NFS.  When using OpsWorks you have to specify all the cookbooks in the root of the repository, so we will need to move into the nfs directory, resolve all of its dependencies, and move them to the root.

[Screenshot: resolving the nfs cookbook dependencies]

Now that we have the nfs cookbook with all of its dependencies, we can configure our server.  There are two options for getting this to OpsWorks: we can either upload a zip of the root folder to S3 or push the contents to a git repository.  For this demonstration, let's push what we have to a GitHub repository.  Now we can log in to the AWS console and navigate to OpsWorks.  The first step is to create a new stack.  A stack represents a logical grouping of instances; it could be an application or a set of applications.  When creating our stack we won't want to call it NFS, as it's likely that our NFS server will be only one piece of the stack.  We will want to use the latest version of Chef, which is Chef 12 (the Chef 11 stack is being phased out).  In order to use our custom cookbooks we will select "Yes" for custom Chef cookbooks and add our GitHub repository.

[Screenshot: creating the stack in AWS OpsWorks]
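For reference, the same stack settings can also be applied through the OpsWorks API.  Here is a rough sketch using the Node.js aws-sdk; the ARNs, region, and repository URL are placeholders for your own values, not the ones used in this post.

// Sketch: create a stack on Chef 12 with a custom cookbook source (placeholder values).
var AWS = require('aws-sdk');
var opsworks = new AWS.OpsWorks({region: 'us-east-1'});

opsworks.createStack({
  Name: 'cool-stack',
  Region: 'us-east-1',
  ServiceRoleArn: 'arn:aws:iam::123456789012:role/aws-opsworks-service-role',           // placeholder
  DefaultInstanceProfileArn: 'arn:aws:iam::123456789012:instance-profile/aws-opsworks-ec2-role', // placeholder
  ConfigurationManager: {Name: 'Chef', Version: '12'},   // use the Chef 12 stack
  UseCustomCookbooks: true,
  CustomCookbooksSource: {
    Type: 'git',
    Url: 'https://github.com/your-org/your-cookbooks.git' // placeholder repository
  }
}, function (err, data) {
  if (err) { return console.error(err); }
  console.log('Created stack', data.StackId);
});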

 

One of the first things we will want to do is allow our IAM users access to these machines.  We can import users from IAM and control their access to the stack as well as their access to the instances we will create.  Each user can set a public SSH key to use for access to instances in this stack.

[Screenshot: editing a user under Users in AWS OpsWorks]
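If you prefer to script this step, something along these lines should set a user's public key and grant SSH and sudo access on the stack; the user ARN, key, and stack ID are placeholders, and the user profile must already have been imported from IAM.

// Sketch: give an imported IAM user SSH/sudo access to the stack (placeholder values).
var AWS = require('aws-sdk');
var opsworks = new AWS.OpsWorks({region: 'us-east-1'});
var userArn = 'arn:aws:iam::123456789012:user/your-user';   // placeholder IAM user

opsworks.updateUserProfile({
  IamUserArn: userArn,
  SshPublicKey: 'ssh-rsa AAAA... user@example'              // placeholder public key
}, function (err) {
  if (err) { return console.error(err); }
  opsworks.setPermission({
    StackId: 'your-stack-id',                               // placeholder
    IamUserArn: userArn,
    AllowSsh: true,    // allow logging in to instances in this stack
    AllowSudo: true
  }, function (err) {
    if (err) { console.error(err); }
  });
});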

The next step is to create a layer, which represents a specific class of instance.  In this case we will want the layer to represent our NFS server.  Click "Add a layer" and set the name and short name to "nfs".  From the layer we can control network settings such as EIPs and ELBs, define EBS volumes to create and mount, and add security groups.

[Screenshot: the Layers page for cool-stack in AWS OpsWorks]
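As a sketch, the equivalent layer can be created through the API; the stack ID is a placeholder, and the volume and security group settings shown in the console could also be passed here.

// Sketch: create a custom "nfs" layer in the stack (placeholder stack ID).
var AWS = require('aws-sdk');
var opsworks = new AWS.OpsWorks({region: 'us-east-1'});

opsworks.createLayer({
  StackId: 'your-stack-id',   // placeholder
  Type: 'custom',
  Name: 'nfs',
  Shortname: 'nfs'
  // VolumeConfigurations and CustomSecurityGroupIds could also be set here,
  // mirroring the EBS volume and security group options in the console.
}, function (err, data) {
  if (err) { return console.error(err); }
  console.log('Created layer', data.LayerId);
});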

Once we have the layer we can add our recipes.  When adding recipes we have to choose which lifecycle event to attach them to.  There are five events in the lifecycle, and in our case it makes sense to add the nfs recipe to the Setup event, which runs when the instance has started and finished booting.

[Screenshot: editing the nfs layer recipes in AWS OpsWorks]
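A hypothetical API equivalent of attaching the recipe to the Setup event, assuming the layer created above (the layer ID is a placeholder):

// Sketch: run the nfs recipe during the Setup lifecycle event.
var AWS = require('aws-sdk');
var opsworks = new AWS.OpsWorks({region: 'us-east-1'});

opsworks.updateLayer({
  LayerId: 'your-layer-id',   // placeholder
  CustomRecipes: {
    Setup: ['nfs']            // runs once the instance has started and finished booting
  }
}, function (err) {
  if (err) { console.error(err); }
});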

Now that we have our stack set up and have added a layer, we can add instances to that layer.  Let's add an instance using the default settings and an instance type of t2.medium.  Once the instance is created we can start it up, and once the server is online we can log in and verify that the nfs service is running.
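Scripted, adding and starting the instance might look roughly like this (the stack and layer IDs are placeholders):

// Sketch: add a t2.medium instance to the nfs layer and start it.
var AWS = require('aws-sdk');
var opsworks = new AWS.OpsWorks({region: 'us-east-1'});

opsworks.createInstance({
  StackId: 'your-stack-id',     // placeholder
  LayerIds: ['your-layer-id'],  // placeholder
  InstanceType: 't2.medium'
}, function (err, data) {
  if (err) { return console.error(err); }
  // Instances are created stopped; starting one triggers the Setup event
  // (and therefore our nfs recipe).
  opsworks.startInstance({InstanceId: data.InstanceId}, function (err) {
    if (err) { console.error(err); }
  });
});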

[Screenshot: the nfs2 instance's Setup logs in AWS OpsWorks and an ssh session verifying the nfs service]

Above we can see the logs of the Setup event showing that nfs is part of the run list.  We can log in to the machine because we set up our user earlier and gave it SSH access.  The bare minimum to run an NFS server is now installed; to take this further we can configure which directories to export.  In a future post we will explore expanding this layer.

 

Backing up DynamoDB

by Shawn Bower

As we have been helping folks move their applications to AWS, we have found many of the services provided to be amazing.  We started using DynamoDB, the managed NoSQL database from AWS, to store application data.  The story behind DynamoDB is fascinating, as it is one of the key building blocks used for AWS services.  We have been very impressed with DynamoDB itself, as it provides a completely managed, scalable solution that allows us to focus on applications rather than infrastructure tasks.  Almost.  While the data stored in DynamoDB is highly durable, there is no safeguard against human error; dropping an item is forever.  Originally this problem seemed like it would be trivial to solve; surely AWS offers an easy backup feature.  My first attempt was to try the export function in the AWS console.

 

[Screenshot: the export option in the DynamoDB console]

Then I ended up here…

[Screenshot: the Data Pipeline export wizard]

What?  Why would I want to create a Data Pipeline to back up my DynamoDB table?  Some of our tables are very small and most are not much more than a key-value store.  Looking into this process, the pipeline actually creates an Elastic MapReduce cluster to facilitate the backup to S3.  You can get full details on the setup here.  The output of this process is a compressed zip file of a JSON representation of the table.  It seemed to me that this process was too heavyweight for our use case.  I started thinking that this would be relatively straightforward with Lambda, given that you can now schedule Lambda functions with a cron-like syntax.  The full code is available here.

The first thing I wanted to do was to describe the table and write that metadata to the first line of the output file.

[Code screenshot: dynamo-backup-describe]
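A minimal sketch of that step, assuming the Node.js aws-sdk; the table name "alarms" and the local file path are just examples rather than the repository's exact code.

// Sketch: write the table description as the first line of the backup file.
var AWS = require('aws-sdk');
var fs = require('fs');

var dynamodb = new AWS.DynamoDB({region: 'us-east-1'});
var backup = fs.createWriteStream('/tmp/alarms.backup');   // example output path

dynamodb.describeTable({TableName: 'alarms'}, function (err, data) {
  if (err) { return console.error(err); }
  // The first line holds the table definition (key schema, indexes,
  // provisioned throughput) so the table can be recreated on restore.
  backup.write(JSON.stringify(data.Table) + '\n');
});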

Using the describeTable API call we can get back the structure of the table as well as configuration information such as the read/write capacity.  The result of this call will look something like:

{"AttributeDefinitions":[{"AttributeName":"group","AttributeType":"S"},{"AttributeName":"name","AttributeType":"S"}],"TableName":"alarms","KeySchema":[{"AttributeName":"name","KeyType":"HASH"}],"TableStatus":"ACTIVE","CreationDateTime":"2015-03-20T14:03:31.849Z","ProvisionedThroughput":{"NumberOfDecreasesToday":0,"ReadCapacityUnits":1,"WriteCapacityUnits":1},"TableSizeBytes":16676,"ItemCount":70,"TableArn":"arn:aws:dynamodb:us-east-1:078742956215:table/alarms","GlobalSecondaryIndexes":[{"IndexName":"group-index","KeySchema":[{"AttributeName":"group","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"},"IndexStatus":"ACTIVE","ProvisionedThroughput":{"NumberOfDecreasesToday":0,"ReadCapacityUnits":1,"WriteCapacityUnits":2},"IndexSizeBytes":16676,"ItemCount":70,"IndexArn":"arn:aws:dynamodb:us-east-1:078742956215:table/alarms/index/group-index"}]}

Having the table metadata makes it easy to recreate the table.  It's also worth pointing out that we use the provisioned ReadCapacityUnits to limit our scan queries while pulling data out of the table.  The next thing we need to do is write every item out to our backup file.  This is accomplished by scanning the table and providing a callback, onScan.

 

[Code screenshot: dynamo-backup-onscan]
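A sketch of what that onScan callback might look like; this is an approximation of the approach rather than the repository's exact code, and the Limit value is an assumption taken from the provisioned ReadCapacityUnits in the metadata above.

// Sketch: page through the table with scan, writing each item to the backup.
var AWS = require('aws-sdk');
var fs = require('fs');

var dynamodb = new AWS.DynamoDB({region: 'us-east-1'});
var backup = fs.createWriteStream('/tmp/alarms.backup', {flags: 'a'});
var readCapacity = 1;   // from ProvisionedThroughput.ReadCapacityUnits

function onScan(err, data) {
  if (err) { return console.error(err); }
  // Write each item on its own line of the backup file.
  data.Items.forEach(function (item) {
    backup.write(JSON.stringify(item) + '\n');
  });
  if (typeof data.LastEvaluatedKey !== 'undefined') {
    // More items remain; resume the scan where the last page left off.
    dynamodb.scan({
      TableName: 'alarms',
      Limit: readCapacity,
      ExclusiveStartKey: data.LastEvaluatedKey
    }, onScan);
  }
}

dynamodb.scan({TableName: 'alarms', Limit: readCapacity}, onScan);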

In this function we loop through the data items and write them out to a file.  After that we look at the LastEvaluatedKey: if it is undefined then we have scanned the entire table; otherwise we recursively call the onScan function, providing the LastEvaluatedKey as the starting point for the next scan.  The data is continually compressed and shipped to S3, which is achieved using a stream pipe.

[Code screenshot: dynamo-backup-streaming]
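A sketch of the streaming piece, again an approximation of the approach rather than the exact code; the bucket name is a placeholder.

// Sketch: compress the backup and ship it to S3 as it is written.
var AWS = require('aws-sdk');
var zlib = require('zlib');

var s3 = new AWS.S3();
var gzip = zlib.createGzip();   // anything written here comes out gzip-compressed

s3.upload({
  Bucket: 'your-dynamo-backup-bucket',   // placeholder: a bucket in your account
  Key: 'alarms/' + new Date().toISOString() + '.json.gz',
  Body: gzip                             // the upload reads from the gzip stream
}, function (err, data) {
  if (err) { return console.error(err); }
  console.log('Backup stored at', data.Location);
});

// The table metadata and scanned items are written to the gzip stream instead
// of a local file, and the stream is ended once the scan completes, e.g.:
// gzip.write(JSON.stringify(item) + '\n');
// gzip.end();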

 

You will have to update the bucket name to the S3 bucket in your account where you wish to store the DynamoDB backups.  Once the code was in place, we uploaded it to Lambda and used a cron schedule to run the process nightly.  For details on how to install and use this backup process, please refer to the GitHub repository.  As we move more to AWS and use more of its services, we find that there are some gaps.  We have begun to log them and to tackle them with our cross-campus working group.  As we come up with solutions to these gaps we will post them so that everyone on campus can benefit.  If anyone is interested in contributing to this joint effort, please email cloud-support@cornell.edu.