The Cornell “Standard” AWS VPC 2.0

By Paul Allen

In a previous post, I described the standard VPC configuration we use for Cornell AWS accounts requiring network connectivity back to the campus network. This post shares minor updates to that configuration. Differences from the original are:

  • Using AWS Direct Connect instead of a VPN to establish network connectivity between campus and AWS VPCs. Our current primary Direct Connect link is 1 Gbps, and our secondary connection is 100 Mbps.
  • Continued allocation of a /22 CIDR block (1024 addresses) to the VPC, but no longer allocating all of those addresses to subnets within the VPC. This allows for future customization of the VPC without having to vacate and delete /24 subnets, as was necessary with the original design.
  • Reducing the size of the four subnets to /26 CIDR blocks (64 addresses) instead of /24 CIDR blocks (256 addresses). This preserves the flexibility described above while still allowing /24 subnets to be created as part of VPC customizations (see the sketch below).
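As a rough illustration of the new layout (the CIDR values and AWS CLI calls below are placeholders, not an actual Cornell allocation), four /26 subnets consume only the first quarter of the /22, leaving the remainder free for later customization:

# Illustrative only: a /22 VPC (1024 addresses) with four /26 subnets (64 each).
aws ec2 create-vpc --cidr-block 10.92.0.0/22
aws ec2 create-subnet --vpc-id vpc-0abc123 --cidr-block 10.92.0.0/26   --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-0abc123 --cidr-block 10.92.0.64/26  --availability-zone us-east-1b
aws ec2 create-subnet --vpc-id vpc-0abc123 --cidr-block 10.92.0.128/26 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-0abc123 --cidr-block 10.92.0.192/26 --availability-zone us-east-1b
# 10.92.1.0 through 10.92.3.255 remain unallocated, so /24 subnets can be added
# later without vacating anything.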

Cornell Standard VPC in AWS version 2.0

Amazon AppStream 2.0 for Neuroscience Research

By Bertrand Reyna-Brainerd

It will come as news to few of you that the pace of advancement in computer technology has exceeded all expectations.  In 2013, 83.8% of American households owned at least one personal computer.[1]  In 1984, when the US census first began to gather data on computer ownership, that number was only 8.2%.  At that time, a debate was raging about the scope of personal computing.  Would every household need to own a computer, or would they remain a specialized tool for researchers and IT professionals?  Even the mere existence of that debate seems inconceivable today, and yet the era of personal computing may already be drawing to a close.  This is not due to a decline in the need for or utility of computing resources, but to the explosive growth of cloud computing.  Amazon Web Services’ AppStream 2.0 is a significant step toward “cloudification,” and toward a future in which all forms of computation are increasingly delocalized.

Our laboratory, which labors under the title of “The Laboratory for Rational Decision Making,” applies a variety of approaches in its search for patterns in human behavior and the brain activity that underlies them.  In 2016, David Garavito, a J.D./PhD graduate student at Cornell, Joe DeTello, a Cornell undergraduate, and Valerie Reyna, PhD, began a study on the effects of risk perception on sports-related concussions.  Joe is a former high school athlete who has suffered a number of concussions himself during play, and is therefore able to provide personal, valuable insights into how these injuries occur and why they are so common.

Bertrand Reyna-Brainerd
David Garavito
Valerie Reyna, PhD
Joe DeTello

The team has previously conducted research for this project by physically traveling to participating high schools and carrying a stack of preconfigured laptops.  This is not an uncommon method of conducting research in the field, but it imposes several limitations (such as the cost of travel, the number of students that can be physically instructed in a room at one time, and the number of laptops that can be purchased, carried, and maintained).  What’s more, the type of information required for the research could be easily assessed online—they only needed to collect the typical sort of demographic data that has long been gathered using online polls, as well as measures such as memory accuracy and reaction time.  However, most of this legwork is performed by software that was designed to operate only in a physical, proctor-to-subject environment.

 

At that point, Joe and David came to me and asked if we could find a better solution.

 

Marty Sullivan, Cloud Engineer

I was not initially sure if this would be possible.  In order for our data to be comparable with data gathered on one laptop at a time, our applications would have to be reprogrammed from scratch to run on the web.  However, I agreed to research the problem and consulted with the IT@Cornell Cloudification Team. Marty Sullivan, Cloud Engineer, helped us discover that Amazon Web Services (AWS) and Cornell were testing AWS’s then-unreleased product: AppStream 2.0.  Amazon AppStream 2.0 is a fully featured platform for streaming software, specifically desktop applications, through a web browser. The most attractive feature is that end users are not required to install any software – they can simply access the applications via their preferred personal device.

 

Figure 1 – Screenshot of our assessment software running on an ordinary desktop.

 

Figure 2 – The same software running on AppStream.

 

Figure 3 – AppStream 2.0 Application Selection Screen

 

The AppStream 2.0 environment has a number of advantages over a traditional remote desktop approach.  When a user connects to our system they are presented with an option to choose an application. After selecting an application, only that application and its descendants are displayed to the user.  This avoids several of the usual hassles with virtual computing, such as sandboxing users away from the full Windows desktop environment.  Instead, the system is ready-to-go at launch and the only barriers to entry are access to a web browser and an activated URL.

Once the virtual environment has been prepared, and the necessary software has been installed and configured, the applications are packaged into an instance that can then be delivered live over the web, to any number of different devices. Mac, PC, tablets and any other device that has a web browser can be used to view and interact with your applications!  This package resets to its original configuration periodically in a manner similar to how Deep Freeze operates in several of Cornell’s on-campus computer laboratories.  While periodic resets would be inconvenient in other contexts, this system is ideal for scientific work as it ensures that the choices made by one participant do not influence the environment of the next.
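Once a stack and fleet are in place, an administrator can mint one of those per-participant links with the AWS CLI. This is only a hedged sketch; the stack, fleet, and user names below are placeholders, not our study’s actual configuration:

# Generate a time-limited streaming URL for one participant
# (names are placeholders; --validity is the URL lifetime in seconds).
aws appstream create-streaming-url \
  --stack-name research-stack \
  --fleet-name research-fleet \
  --user-id participant-001 \
  --validity 3600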

It is difficult to overstate the potential of this kind of technology.  Its advantages are significant and far-reaching:

  • Scalability: Because Amazon’s virtual machine handles most of the application’s processing demands, the system can “stretch” to accommodate peak load periods in ways that a local system cannot. To put it another way, if a user can run the AppStream web application, they can run the software associated with it.  This means that software that would otherwise be too large or too demanding to run on a phone, tablet, or Chromebook can now be accessed through AppStream 2.0.
  • Ease of use: End users do not have to install new software to connect. Applications that would normally require installing long dependency chains, compilation, or complicated configuration procedures can be prepared in advance and delivered to end users in their final, working state.  AppStream 2.0 is more user friendly than traditional remote desktop approaches as well: all users need to do is follow a web link.  It is not necessary to create separate accounts for each user, but there is a mechanism for doing so.
  • Security: AppStream 2.0 can selectively determine what data to destroy or preserve between sessions. Because the virtual machine running our software is not directly accessible to users, this improves both security and ease of use (a rare pairing in information technology) over a traditional virtual approach.
  • Cross-platform access: Because applications are streamed from a native Windows platform, compatibility layers (such as WINE), partition utilities (such as Apple’s BootCamp), or virtual desktop and emulation software (such as VirtualBox or VMWare) are not needed to access Windows software on a non-Windows machine. This also avoids the performance cost and compatibility issues involved in emulation.  Another advantage is that development time can be reduced if an application is designed to be delivered through AppStream.
  • Compartmentalized deployment: Packaging applications into virtual instances enforces consistency across environments, which aids in collaborative development. In effect, AppStream instances can perform the same function as Docker, Anaconda, and other portable environment tools, but with a lower technical barrier for entry.

 

Disadvantages:

  • Latency: No matter how optimized an online system of delivery may be, streaming an application over the web will always impose additional latency over running the software locally. Software that depends on precise timing in order to operate, such as video games (or, in our case, response-time assessments), may be negatively impacted.  We have not experienced latency above a barely perceptible 50 ms.  Other users have reported delays of as much as 200 ms, which varies by region.
  • Compatibility: Currently AppStream 2.0 only offers Windows Server 2012 instances, and hardware-accelerated graphics APIs such as OpenGL are not supported. This is expected to change in time.  In terms of the software itself, getting an application running within the AppStream environment often requires additional configuration steps beyond an ordinary install procedure.
  • Cost: To use AppStream 2.0 you must pay a recurring monthly fee based on use. This includes costs associated with bandwidth, processing, and storage space.  You are likely to be presented with a bill on the order of tens to low hundreds of U.S. dollars per month, and your costs will differ depending on use.

 

Ultimately, the impact of AppStream 2.0 will be determined by its adoption.  The largest hurdle that the product will need to clear is its cost, but its potential is enormous.  If Amazon leverages its position to purchase and deploy hardware at low rates, and passes these savings on to their customers, then it will be able to provide a better product at a lower price than the PC market.  And if Amazon does not do this another corporation will, and an end will come to the era of personal computing as we have known it.

[1] U.S. Census Bureau. (2013). Computer and Internet Use in the United States: 2013. Retrieved March 29, 2017, from https://www.census.gov/history/pdf/2013computeruse.pdf.

Class Roster – Launching Scheduler in the Cloud

by Eric Grysko

Introduction

Class Roster – classes.cornell.edu

In Student Services IT, we develop applications that support the student experience. Class Roster, classes.cornell.edu, was launched in late 2014 after several months of development in coordination with the Office of the University Registrar. It was deployed on-premises and faced initial challenges handling load just after launch. Facing limited options to scale, we provisioned for peak load year-round, despite predictable cyclical load following the course-enrollment calendar.

By mid-2015, Cornell had signed a contract with Amazon Web Services, and Cornell’s Cloudification Team was inviting units to collaborate and pilot use of AWS. Working with the team was refreshing, and Student Services IT dove in. We consumed training materials, attended re:Invent, threw away our old way of doing things, and began thinking cloud and DevOps.

By late 2015, we were starting on the next version of Class Roster – “Scheduler”. Displaying class enrollment status (open, waitlist, closed) with near real-time data meant we couldn’t rely on long cache lifetimes, and new scheduling features were expected to grow peak concurrent usage significantly. We made the decision: Class Roster would be our unit’s first high-profile migration to AWS. (more…)

Benchmarking Network Speeds for Traffic between Cornell and “The Cloud”

by Paul Allen

As Cornell units consider moving various software and services to the cloud, one of the most common questions the Cloudification Services Team gets is “What is the network bandwidth between cloud infrastructure and campus?” Bandwidth to cloud platforms like Amazon Web Services and Microsoft Azure seems critical now, as units are transitioning operations. It’s during that transition that units will have hybrid operations–part on-premises and part in-cloud–and moving or syncing large chunks of data is common.

(more…)

Using Shibboleth for AWS API and CLI access

by Shawn Bower


Update 2019-11-06: We now recommend using awscli-login to obtain temporary AWS credentials via SAML. See our wiki page Access Keys for AWS CLI Using Cornell Two-Step Login (Shibboleth).


This post is heavily based on “How to Implement Federated API and CLI Access Using SAML 2.0 and AD FS” by Quint Van Derman. I have used his blueprint to create a solution that works with Shibboleth at Cornell.

TL;DR

You can use Cornell Shibboleth login for both API and CLI access to AWS.  I built Docker images, which will be maintained by the Cloud Services team, that can be used for this; it is as simple as running the following command:

docker run -it --rm -v ~/.aws:/root/.aws dtr.cucloud.net/cs/samlapi

After this command has been run, it will prompt you for your NetID and password.  These will be used to log you into Cornell Shibboleth. You will get a push from DUO.  Once you have confirmed the DUO notification, you will be prompted to select the role you wish to use for login; if you have only one role, it will be chosen automatically.  The credentials will be placed in the default credential file (~/.aws/credentials) and can be used as follows:

aws --profile saml s3 ls

NOTE: In order for the script to work you must have at least two roles; we can add you to an empty second role if need be.  Please contact cloud-support@cornell.edu if you need to be added to a role.

If there are any problems, please open an issue in the samlapi GitHub repository (https://github.com/CU-CloudCollab/samlapi).

Digging Deeper

All Cornell AWS accounts that are set up by the Cloud Services team are configured to use Shibboleth for login to the AWS console. This same integration can be used for API and CLI access, allowing folks to leverage AD groups and AWS roles for users. Another advantage is that this eliminates the need to monitor and rotate IAM access keys, as the credentials provided through SAML expire after one hour. It is worth noting that a non-human user ID will still have to be created for automating tasks where it is not possible to use EC2 instance roles.

When logging into the AWS management console, the federation process looks like this:

saml-based-sso-to-console.diagram

  1. A user goes to the URL for the Cornell Shibboleth IDP
  2. That user is authenticated against Cornell AD
  3. The IDP returns a SAML assertion which includes your roles
  4. The data is posted to AWS, which matches roles in the SAML assertion to IAM roles
  5. AWS Security Token Service (STS) issues temporary security credentials
  6. A redirect is sent to the browser
  7. The user is now in the AWS management console

In order to automate this process we will need to be able to interact with the Shibboleth endpoint as a browser would.  I decided to use Ruby for my implementation, and typically I would use a lightweight framework like Ruby Mechanize to interact with webpages.  Unfortunately the DUO integration is done in an iframe using JavaScript, which makes things gross as it means we need a full browser. I decided to use Selenium WebDriver to do the heavy lifting. I was able to script the login to Shibboleth as well as hitting the button for a DUO push notification:
duo-push

In development I was able to run this on my Mac just fine, but I also realize it can be onerous to install the dependencies needed to run Selenium WebDriver.  In order to make distribution simple I decided to create a Docker image that would have everything installed and could just be run.  This meant I needed a way to run Selenium WebDriver and Firefox inside a container.  To do this I used Xvfb to create a virtual frame buffer, allowing Firefox to run without a graphics card.  As this may be useful to other projects I made this a separate image that you can find here.  Now I could create a Dockerfile with the dependencies necessary to run the login script:

saml-api-dockerfile

The helper script starts Xvfb, sets the correct environment variables, and then launches the main Ruby script.  With these pieces I was able to get the SAML assertion from Shibboleth, and the rest of the script mirrors what Quint Van Derman had done.  It parses the assertion looking for all the role attributes.  Then it presents the list of roles to the user, who can select which role they wish to assume.  Once the selection is done, a call is made to the Security Token Service (STS) to get the temporary credentials, and then the credentials are stored in the default AWS credentials file.
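For reference, that final exchange is roughly equivalent to the following AWS CLI call; this is a hedged sketch in which the role ARN, identity-provider ARN, and assertion file name are placeholders, and the script performs the same exchange from Ruby:

# Trade the base64-encoded SAML assertion for temporary credentials.
aws sts assume-role-with-saml \
  --role-arn arn:aws:iam::123456789012:role/shib-admin \
  --principal-arn arn:aws:iam::123456789012:saml-provider/cornell_idp \
  --saml-assertion file://assertion.b64
# The returned AccessKeyId, SecretAccessKey, and SessionToken are what end up
# in ~/.aws/credentials under the "saml" profile.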

Conclusion

Now you can manage your CLI and API access the same way you manage your console access. The code is available and is open source, so please feel free to contribute: https://github.com/CU-CloudCollab/samlapi. Note I have not tested this on Windows, but it should work if you change the volume mount to the default credential file location on Windows. I can see the possibility of future enhancements, such as adding the ability to filter the role list before displaying it, so stay tuned for updates. As always, if you have any questions with this or any other Cloud topics please email cloud-support@cornell.edu.

How to run Jenkins in ElasticBeanstalk

by Shawn Bower

The Cloud Services team in CIT maintains Docker images for common pieces of software like Apache, Java, Tomcat, etc.  One of these images is a Cornellized Jenkins image, which contains Jenkins with the Oracle client and Cornell OID baked in.  One of the easiest ways to get up and running in AWS with this Jenkins instance is to use Elastic Beanstalk, which will manage the infrastructure components.  Using Elastic Beanstalk you don’t have to worry about patching, as it manages the underlying OS of your EC2 instances.  The Cloud Services team releases a patched version of the Jenkins image on a weekly basis; if you want to stay current, you just need to kick off a new deployment in Elastic Beanstalk.  Let’s walk through the process of getting this image running on Elastic Beanstalk!

A.) Save Docker registry credentials to S3

INFO:

Read about using private Docker repos with Elastic Beanstalk.

We need to make our DTR credentials available to Elastic Beanstalk, so automated deployments can pull the image from the private repository.

  1. Create an S3 bucket to hold Docker assets for your organization— we use cu-DEPT-dockercfg
  2. Log in to Docker: docker login dtr.cucloud.net
  3. Upload the registry credentials to the S3 bucket as cu-DEPT-dockercfg/.dockercfg (see the notes below about the file format)

    Unfortunately, Elastic Beanstalk expects an older version of this file, named .dockercfg, rather than the newer ~/.docker/config.json. The formats are slightly different. You can read about the differences here.

    For now, you’ll need to manually create .dockercfg and upload it to the S3 bucket as cu-DEPT-dockercfg/.dockercfg; a sketch of one way to do that follows.
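    One way to produce the older file (a sketch assuming jq is installed and that your credentials are stored inline in ~/.docker/config.json rather than in a credential helper):

    # The newer config.json nests registry credentials under an "auths" key;
    # the legacy .dockercfg format is essentially that inner object.
    jq '.auths' ~/.docker/config.json > .dockercfg
    # Upload it to the bucket/key referenced later in Dockerrun.aws.json.
    aws s3 cp .dockercfg s3://cu-DEPT-dockercfg/.dockercfg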

B.) Create IAM Policy to Read The S3 Bucket

    1. Select Identity and Access Management from the AWS management console IAM-step-1
    2. Select Policies IAM-step-2
    3. Select “Create Policy” IAM-step-3
    4. Select “Create Your Own Policy” IAM-step-4
    5. Create a policy named “DockerCFGReadOnly”; see the example policy provided below. IAM-step-5
Below is an example policy for reading from an S3 bucket.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1466096728000",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::cu-DEPT-dockercfg"
            ]
        },
        {
            "Sid": "Stmt1466096728001",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::cu-DEPT-dockercfg/.dockercfg"
            ]
        }
    ]
}

 

C.) Setup the Elastic Beanstalk environment

  1. Create a Dockerrun.aws.json file. Here’s an example:

    {
      "AWSEBDockerrunVersion": "1",
      "Image": {
        "Name": "dtr.cucloud.net/cs/jenkins:latest"
      },
      "Ports": [
        {
          "ContainerPort": "8080"
        }
      ],
      "Authentication": {
        "Bucket": "cu-DEPT-dockercfg",
        "Key": ".dockercfg"
      },
      "Volumes": [
        {
          "HostDirectory": "/var/jenkins_home",
          "ContainerDirectory": "/var/jenkins_home"
        },
        {
          "HostDirectory": "/var/run/docker.sock",
          "ContainerDirectory": "/var/run/docker.sock"
        }
      ]
    }

    The Authentication section refers to the Docker registry credentials that were saved to S3.

    The Image section refers to the Docker image that was pushed to the private registry (dtr.cucloud.net).

  2. We will also need to do some setup on the instance using .ebextensions.  Create a folder called “.ebextensions” and inside that folder create a file called “instance.config”.  Add the following to the file:

    container_commands:
      01-jenkins-user:
        command: useradd -u 1000 jenkins || echo 'User already exists!'
      02-jenkins-user-groups:
        command: usermod -aG docker jenkins
      03-jenkins-home:
        command: mkdir /var/jenkins_home || echo 'Directory already exists!'
      04-changeperm:
        command: chown jenkins:jenkins /var/jenkins_home

     

  3. Finally, create a zip file with the Dockerrun.aws.json file and the .ebextensions folder:
    zip -r jenkins-stalk.zip Dockerrun.aws.json .ebextensions/
    

 

 

D.) Setup Web Server Environment

  1. Choose Docker & Load balancing, autoscaling
    create_environment
  2. Select the local zip file that we created earlier (jenkins-stalk.zip) as the “Source” for the application version section
  3. Set the appropriate environment name; for example, you could use jenkins-prod
  4. Complete the configuration details

    NOTE: There are several options beyond the scope of this article.

    We typically configure the following:

    deployment

  5. Complete the Elastic Beanstalk wizard and launch.  If you are working with a standard Cornell VPC configuration, make sure the ELB is in the two public subnets while the EC2 instances are in the private subnets.
  6. NOTE: You will encounter additional AWS features like security groups etc… These topics are beyond the scope of this article.  If presented with a check box for launching inside a VPC you should check this box.

    Create_Application

    The container will not start properly the first time. Don’t panic.  
     
    We need to attach the IAM policy we built earlier to the instance role used by Elastic Beanstalk.

  7. Select Identity & Access Management from the AWS management console IAM-step-1

  8. Select “Roles”, then select “aws-elasticbeanstalk-ec2-role” IAM-step-6

  9. Attach the “DockerCFGReadOnly” policy to the role IAM-step-7 (see the CLI sketch below)
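If you prefer the CLI to the console for this step, the policy can be created and attached with something like the following. This is a hedged sketch: the account ID is a placeholder, and it assumes the JSON from section B is saved locally as dockercfg-policy.json.

# Create the managed policy from the JSON shown in section B.
aws iam create-policy --policy-name DockerCFGReadOnly \
  --policy-document file://dockercfg-policy.json
# Attach it to the instance role that Elastic Beanstalk assigns to its EC2 instances.
aws iam attach-role-policy --role-name aws-elasticbeanstalk-ec2-role \
  --policy-arn arn:aws:iam::123456789012:policy/DockerCFGReadOnly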

 

E.) Re-run the deployment in Elastic Beanstalk.  You can just redeploy the current version.
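Since the image tag is :latest, redeploying simply pulls the freshly patched image. If you would rather script the weekly redeploy than click through the console, here is a hedged sketch; the bucket, application, and environment names are placeholders:

# Upload the bundle and register it as a new application version.
VERSION=jenkins-$(date +%Y%m%d)
aws s3 cp jenkins-stalk.zip s3://cu-DEPT-eb-artifacts/jenkins-stalk.zip
aws elasticbeanstalk create-application-version \
  --application-name jenkins \
  --version-label "$VERSION" \
  --source-bundle S3Bucket=cu-DEPT-eb-artifacts,S3Key=jenkins-stalk.zip
# Point the environment at the new version, which triggers a redeploy.
aws elasticbeanstalk update-environment \
  --environment-name jenkins-prod \
  --version-label "$VERSION"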

 

  1. Now find the URL to your Jenkins environment jenkins-prod-url

  2. And launch Jenkins

jenkins-running

SUCCESS !

 

F.) (optional) Running docker command inside Jenkins

The Jenkins image comes with Docker preinstalled, so you can run Docker builds and deployments from Jenkins.  In order to use it we need to make a small tweak to the Elastic Beanstalk configuration.  This is because we keep the Docker version inside the image patched and on the latest commercially supported release; however, Elastic Beanstalk currently supports Docker 1.9.1. To get things working we need to add an environment variable that tells the Docker CLI to use the older API version.  First go to the configuration page and select the cog icon under Software Configuration.

jenkins-prod_-_Configuration
Now we need to add a new environment variable, DOCKER_API_VERSION, and set its value to 1.21.
jenkins-env-var

That is it! Now you will be able to use the Docker CLI in your Jenkins jobs.
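The same setting can also be applied from the CLI rather than the console; a hedged sketch, assuming the environment is named jenkins-prod:

# Set DOCKER_API_VERSION so the newer Docker CLI in the image can talk to the
# older Docker 1.9.1 engine (API version 1.21) on the Elastic Beanstalk host.
aws elasticbeanstalk update-environment \
  --environment-name jenkins-prod \
  --option-settings Namespace=aws:elasticbeanstalk:application:environment,OptionName=DOCKER_API_VERSION,Value=1.21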

 

Conclusion

Within a few minutes you can have a managed Jenkins environment hosted in AWS.
There are a few changes you may want to consider for this environment.

  • Changing the autoscaling group to min 1 and max 1 makes sense since the Jenkins state data is stored on a local volume.  Having more than one instance in the group would not be useful.
  • Also, considering the state data, including job configuration, is stored on a local volume, you will want to make sure to back up the EBS volume for this instance.  You could also look into a NAS solution such as Elastic File System (EFS) to store state for Jenkins; this would require a modification to the /var/jenkins_home path.
  • It is strongly encouraged that an HTTPS (SSL) listener is used for the Elastic Load Balancer (ELB) and that the plain HTTP listener is turned off, to avoid sending credentials in plain text.

 

The code used in this blog is available at https://github.com/CU-CloudCollab/jenkins-stalk. Please feel free to use and enhance it.

If you have any questions or issues please contact the Cloud Services team at cloud-support@cornell.edu

DevOps: Failure Will Occur

by Shawn Bower

The term DevOps is thrown around so much that it is hard to pin down its meaning.  In my mind DevOps is about a culture shift in the IT industry.  It is about breaking down silos, enhancing collaboration, and challenging fundamental design principles.  One principle that has been turned on its head by the DevOps revolution is the “no single point of failure” design principle. This principle asserts simply that no single part of a system can stop the entire system from working. For example, in a financial system the database server is a single point of failure: if it crashes, we cannot continue to serve clients in any fashion.  In DevOps we accept that failure is the norm, and we build our automation with that in mind.  In AWS we have many tools at our disposal, like auto scaling groups, elastic load balancers, multi-AZ RDS, DynamoDB, S3, etc.  When architecting for the cloud, keeping these tools in mind is paramount to your success.

When architecting a software system there are a lot of factors to balance. We want to make sure our software is working and performant as well as cost effective.  Let’s look at a simple example of building a self-healing website that requires very little infrastructure and can be done at low cost.

The first piece of infrastructure we will need is something to run our site.  If it’s a small site we could easily run it on a t2.nano in AWS, which would cost less than 5 dollars a month.  We will want to launch this instance with an IAM profile that includes the policy AmazonEC2RoleforSSM.  This will allow us to send commands to the EC2 instance.  We will also want to install the SSM agent; for full details please see: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-ssm-agent.html.  Once we have our site up, we will want to monitor its health. At Cornell you can get free access to the Pingdom monitoring tool.  Using Pingdom you can monitor your site’s endpoint from multiple locations around the world and get alerted if your site is unreachable.  If you don’t already have a Pingdom account please send an email to cloud-support@cornell.edu.  So now that we have our site running and a Pingdom account, let’s set up an uptime monitor.
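A hedged sketch of launching such an instance from the CLI (the AMI, key pair, subnet, and instance-profile names are placeholders; the instance profile is assumed to already have AmazonEC2RoleforSSM attached):

# Launch a small web server instance that SSM can send commands to.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.nano \
  --iam-instance-profile Name=ssm-enabled-web \
  --key-name my-keypair \
  --subnet-id subnet-0123456789abcdef0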


We are doing great!  We have a site, we are monitoring it, and we will be alerted to any downtime.   We can now take this one step further and programmatically react to Pingdom alerts using their custom webhook notifier.  We will have to build an endpoint for Pingdom to send the alert to.  We could use Lambda and API Gateway, which is a good choice for many reasons.  If we want, we could start even simpler by creating a small Sinatra app in Ruby.

pingdom-webhook

This is a very simple bit of code that could be expanded on.  It creates an endpoint called “/webhook” which first looks for an api-key query parameter.  This application should be run using SSL/TLS, as it sends the key in clear text.  That key is compared against an environment variable that should be set before the application is launched.  This shared key is a simple security mechanism, only in place to stop a random person from hitting the endpoint; for this example it is good enough, but it could be vastly improved upon.  Next we look at the data that Pingdom has sent; for this example we will only react to DOWN alerts.  If we have an alert in the DOWN state, then we will query a table in DynamoDB that tells us how to react to this alert.  The schema looks like:

pingdom-dynamo

  • check_id – This is the check id generated by Pingdom
  • type – the plugin type to use to respond to the Pingdom alert.  The only implemented plugin is SSM, which uses Amazon’s SSM to send a command to the EC2 host.
  • instance_id – This is the instance id of the ec2 machine running our website
  • command – This is the command we want to send to the machine

We will use the type from our Dynamo table to respond to the down alert.  The sample code I have provided only has one type, which uses Amazon’s SSM service to send commands to the running EC2 instance.  The plugin code looks like:
ssm-rb

This function takes the data passed in and sends the command from our Dynamo table to the instance.  The full sample code can be found at https://github.com/CU-CloudCollab/pingdom-webhook.  Please feel free to use and improve this code.  Now that we have a simple webhook app, we will need to deploy it to an instance in AWS.  That instance will have to use an IAM profile that allows it to read from our Dynamo table as well as send SSM commands.  Again we can use a t2.nano, so our cost at this point is approximately 10 dollars a month.
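Under the hood, what the SSM plugin does is roughly equivalent to the following CLI call, with the instance ID and command coming from the DynamoDB item (both shown here as placeholders):

# Send the remediation command stored in DynamoDB to the affected instance.
aws ssm send-command \
  --instance-ids i-0123456789abcdef0 \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["sudo service httpd restart"]'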

We need to make Pingdom aware of our new webhook endpoint.  To do that, navigate to “Integrations” and click “Add integration.”

pingdom-integration-step-1

The next form will ask for information about your endpoint.  You will have to provide the DNS name for this service.  While you could just use the IP of the machine, it is highly encouraged to use a real hostname with SSL.

pingdom-integration-step-2

Once you have added the integration, it can be used by any of the uptime checks.  Find the check you wish to use and click the edit button.

pingdom-integration-step-3

Then scroll to the bottom of the settings page and you will see a custom hooks section.  Select your hook and you are all done!

pingdom-integration-step-4

This is a simple and cost-effective solution to provide self-healing for web applications.  We should always expect that failure will occur and look for opportunities to mitigate its effects.  DevOps is about taking a holistic approach to your application: looking at the infrastructure side, as we did in this blog post, but also looking at the application itself, for example by moving to application architectures that are stateless.  Most importantly, automate everything!

Benchmarking On-Premise and EC2 Clients Running Against RDS

by Paul Allen

At Cornell, as engineers contemplate moving infrastructure and applications to AWS, it is tempting to ask whether they can start just by moving database instances to the AWS Relational Database Service (RDS) and leaving other application components on premises. The reasons behind this probably stem from the fact that on-premises relational databases represent a very well-defined component with well-defined connectivity to other components. And tools like the AWS Database Migration Service promise to make the move fairly painless.

So, how feasible is it to leave applications on campus while using an RDS database as a back end? When I queried the Cornell cloud community about this, I got several anecdotal responses that this had been tried, without much success, with web applications.

(more…)

How to Setup AWS Route53 to Work with Cornell-Managed DNS

by Paul Allen

If you are at Cornell developing and deploying sites or services using AWS, one of the things you’ll usually want to do is ensure that those sites and services can be accessed using a hostname in the cornell.edu domain. Keeping an existing cornell.edu hostname is even more critical if you are moving a service from Cornell infrastructure to AWS infrastructure. This article describes how to use the Cornell DNS system in conjunction with AWS Route53 to deploy services in AWS that respond to cornell.edu host names.

Scenario

Let’s say we want to deploy a web site running on AWS at URL http://example.cloud.cit.cornell.edu. The Cornell DNS system and Cornell IT policy won’t let you directly reference an AWS IP address in a Cornell DNS “A” record. Further, even if we could do that, we wouldn’t get to take advantage of all the flexibility and features that Route53 offers–for example health checks on backend servers and dynamic failover.

At Cornell, in order to create the example.cloud.cit.cornell.edu hostname, I need to be an administrator of the cloud.cit.cornell.edu sub-domain in the Cornell DNS system. I also need privileges for a public hosted zone in Route53. In this example, I’ll use cs.cucloud.net, a public Route53 hosted zone which I have permissions to administer in AWS.

Here’s the short version of what needs to happen in this scenario:

  1. Setup Route53 to serve content from example.cs.cucloud.net.
  2. Add a CNAME to Cornell DNS so that example.cloud.cit.cornell.edu is an alias for example.cs.cucloud.net.

The rest of this article goes through the specifics of how to accomplish that. This particular example will use an EC2 instance running Apache, an Elastic Load Balancer, a Route 53 alias record, and a Cornell DNS CNAME.

Step 1 – Get a site running in AWS

This doesn’t necessarily need to be your first step, but the explanation is easier if we start here. In this example I have an EC2 instance running a generic Apache server showing a generic test page. AWS provides very detailed instructions for setting up Apache on an EC2 instance. Right now, you don’t have to worry about any hostnames–the goal is to make sure you have content being delivered from AWS. My example instance is below.

EC2 Instance Configuration

There are many other ways to serve content or applications in AWS, but I’m just picking one of the simplest ways for this article. Other options might include using CloudFront and S3 to serve a static web site or using Elastic Beanstalk to run an application. There are a plethora of other ways to accomplish this as well.

Here are the important things about this instance configuration:

  • It is running in a public subnet with a public IP assigned. In a real situation, we would run the instance on a private subnet. But, this configuration is easier to test as you work through the example.
  • The Security Group (named “dns-example”) attached to the instance allows HTTP (port 80) and SSH (port 22) access from anywhere (0.0.0.0/0). Again, not how we’d set up things in real life, but good enough for now to accomplish our current goals.
  • Apache is installed and running in the instance. You can check that by pointing your browser to the public IP address of your instance as shown below.

Apache is running

Step 2 – Configure an Elastic Load Balancer

Strictly speaking I don’t need an ELB to accomplish my goal, but using an ELB is a best practice and allows us to easily configure Route53 to direct users to our content.

Again, AWS provides step-by-step instructions for creating an ELB, and I’ll register the instance I already have running with my new load balancer. Those AWS instructions are great for our situation, except for one thing: the ELB health check configured in the instructions won’t quite work if the only content I’m serving from my instance is that default test page. The reason is that the test page returns an HTTP status code of 403 instead of 200.  That 403 code will cause the ELB health check configured in the instructions to fail, and the ELB will take your instance offline. Instead of having the ELB check whether “http://107.21.54.41:80/” returns a 200 status, we need to loosen that up to a TCP check on port 80 as shown below. The TCP check just makes sure that something on my instance is accepting connections on port 80. I’ll set the “Healthy threshold” value to 2 so that the ELB will bring my instance back online more quickly–useful for our scenario but maybe not in real life.

 

Modified ELB Health Check
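For reference, the same loosened health check can also be applied with the classic ELB CLI rather than the console. This is a hedged sketch: the load balancer name is a placeholder, and only the TCP:80 target and the healthy threshold of 2 come from the settings above (the other values are typical defaults included for completeness):

# TCP check on port 80 with a healthy threshold of 2.
aws elb configure-health-check \
  --load-balancer-name example-elb \
  --health-check Target=TCP:80,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2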

Besides the health check configuration, the key points of my ELB configuration are highlighted in yellow on the screenshot below:

  • The ELB scheme is “internet-facing”.
  • My EC2 instance is registered with the ELB and it is recognized as healthy (because of the looser health check). I.e., “1 of 1 instances in service”.
  • The ELB is configured to use public subnets in my chosen availability zones.  You’ll have to take my word for it that “subnet-8d95c4fa” and “subnet-8e618aa4” are the public subnets in my VPC.
  • The Source Security Group allows HTTP traffic to the ELB on port 80. In this case I’m using the same security group as I did for my EC2 instance.

ELB Configuration

Note that the ELB “Description” tab also provides you with information about the DNS name for the ELB, highlighted in orange above. You should be able to point your browser to that name and see your test page appear. If not, go back and confirm the key configuration details in your EC2 instance and ELB.

The ELB is serving our test page.

Step 3 – Configure Route53

Now we are ready to get Route53 looped into our configuration.

1. Start by pointing your AWS Console to the Route 53 service.

Route53 in the AWS Console

2. You will need to have privileges over at least one public Route 53 hosted zone. Setting up a public hosted zone is beyond the scope of this article, but contact cloud-support@cornell.edu if you need a hosted zone for your Cornell AWS account but don’t have one. I’ll be using cs.cucloud.net as the hosted zone for this article.

Route 53 Hosted Zone

cs.cucloud.net Zone in Route 53

3. In the cs.cucloud.net hosted zone, I’m going to “Create Record Set”.

Route 53 New Record Set

4. In the “Create Record Set” dialog, enter “example” as the name, which will configure the name “example.cs.cucloud.net”. Ensure that the record type is “A – IPv4 address”. Select “Yes” for “Alias.” and then choose your ELB from the dropdown menu. Leave the remaining items with their default values, and select the “Create” button.

Route 53 Alias Record

This should have created a new name in your hosted zone. That’s great, but you might find yourself asking what the heck is an “Alias” record? AWS calls the Route53 Alias functionality an “extension” to the standard DNS system as we know it. Alias records in Route 53 will automatically track changes in certain AWS resources and always ensure that a Route 53 name points to the right IP address. For example, if the IP address of our ELB ever changes, the Alias functionality will automatically track that change and keep our name working. AWS documentation contains more about deciding between alias and non-alias records. If you go back and look at the fine print on the ELB “Description” tab you will see a warning there about avoiding using the IP address of an ELB in DNS records because, over time, the IP address of the ELB may change through no fault of your own.
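For completeness, the same alias record can be created from the CLI instead of the console. This is a hedged sketch: the hosted zone ID, the ELB DNS name, and the ELB’s canonical hosted zone ID are placeholders you would look up yourself (the latter two appear in the output of aws elb describe-load-balancers as DNSName and CanonicalHostedZoneNameID):

# Write the change batch, then submit it to the cs.cucloud.net hosted zone.
cat > alias.json <<'EOF'
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "example.cs.cucloud.net.",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "ZELBCANONICALID",
          "DNSName": "example-elb-1234567890.us-east-1.elb.amazonaws.com.",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLEZONE \
  --change-batch file://alias.json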

Route 53 Name

5. Now you should be able to point your browser at the new Route 53 name and see your test page being delivered from your EC2 instance via the ELB.

Test Route 53 Name

You are now done with the AWS side of things. Time to move on to Cornell DNS.

Step 4 – Configure Cornell DNS

The goal in this step is to setup a new CNAME in the Cornell DNS system pointing to the new Route 53 name we just created. You will need admin privileges to the Cornell sub-domain in which you want to create the name. Here, I have privileges to the cloud.cit.cornell.edu sub-domain.

1. Navigate to the Cornell DNS batch processing interface. You will need to authenticate with your Cornell netID to access the batch interface.

2. Enter “addcname [your-host-and-subdomain].cornell.edu [your-route-53-name]” into the batch processor. The batch command for this example is “addcname example.cloud.cit.cornell.edu example.cs.cucloud.net”. Also be sure to check the box that says “Allow cnames and mx records to point to targets outside your subnets and domains”.

Cornell Batch Processor Input

This should result in output confirming that the CNAME was created.

Cornell Batch Processor Output

3. Now wait. The Cornell DNS changes may not take effect immediately. In the worst case, they are pushed out about 5-10 minutes after the top of each hour. You can use ping to test whether the Cornell name has been published. E.g., “ping example.cloud.cit.cornell.edu”. As soon as ping stops reporting that it cannot resolve the name, you are in business.
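If you prefer a more direct check than ping, dig shows both the CNAME published by Cornell DNS and the addresses it ultimately resolves to:

# Show the CNAME record itself.
dig +short example.cloud.cit.cornell.edu CNAME
# Follow the chain all the way down to the ELB's current IP addresses.
dig +short example.cloud.cit.cornell.edu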

4. Point your browser to your new Cornell DNS name. If you see your AWS Apache test page, you have achieved your goal. If not, go back and troubleshoot, examining the key configuration items pointed out earlier in this article.

Final Result

Improvements

Now that you know how to configure an AWS service to respond to Cornell DNS names, there are several things you might want to do to transform this example into something nearing a production configuration.

  • To achieve better fault tolerance, add more EC2 instances serving your content/service. You might even want to set up an AWS autoscaling group. If you manually add instances, be sure to register them with your ELB.
  • Instead of running EC2 instances on public subnets, move them to your private subnets to better protect them from the bad guys.
  • Review the Security Groups you used for the ELB and your EC2 instance(s). Improve them by reducing access to the minimum needed for your service. That might mean allowing only HTTP/HTTPS traffic on ports 80 and 443 to your ELB. For the EC2 instance(s), you could reduce the scope of traffic on ports 22, 80, 443 to just the IP range in your VPC subnets. Be sure that the ELB sitting on your public subnet(s) can still access the appropriate ports (e.g., 80, 443) on your EC2 instances. Also be sure not to block yourself from getting to port 22 if you need to admin your EC2 instance(s).
  • Once you begin serving real content on your backend server (i.e., not the default test page), change the ELB health check to use an HTTP or HTTPS check.
  • Consider adding health checks and failover to your Route 53 configuration.
  • Setup the ELB to serve HTTPS traffic by uploading a server certificate into AWS Identity and Access Management Service and configuring your ELB to use it.
  • Turn on access logging for the ELB.

The Cornell “Standard” AWS VPC

by Paul Allen

This post describes the standard AWS Virtual Private Cloud (VPC) provisioned for Cornell AWS customers by the Cornell Cloudification Service Team. This “standard” VPC is integrated with Cornell network infrastructure and provides several benefits over the default VPC provisioned to all AWS customers when a new AWS account is created.

So we don’t get confused, let’s call the VPC provisioned by the Cornell Cloudification Service Team the “Cornell VPC” and the VPC automatically provisioned by AWS the “default  VPC”. AWS itself calls this latter VPC by the same name (i.e. default VPC). See AWS documentation about default VPCs (more…)