Amazon AppStream 2.0 for Neuroscience Research

By Bertrand Reyna-Brainerd

It will come as news to few of you that the pace of advancement in computer technology has exceeded all expectations.  In 2013, 83.8% of American households owned at least one personal computer.[1]  In 1984, when the US census first began to gather data on computer ownership, that number was only 8.2%.  At that time, a debate was raging about the scope of personal computing.  Would every household need to own a computer, or would computers remain a specialized tool for researchers and IT professionals?  Even the mere existence of that debate seems inconceivable today, and yet the era of personal computing may already be drawing to a close.  This is not due to a decline in the need for or utility of computing resources, but to the explosive growth of cloud computing.  Amazon Web Services’ AppStream 2.0 is a significant step toward “cloudification,” and toward a future in which all forms of computation are increasingly delocalized.

Our laboratory, which labors under the title of “The Laboratory for Rational Decision Making,” applies a variety of approaches in its search for patterns in human behavior and the brain activity that underlies them.  In 2016, David Garavito, a J.D./PhD graduate student at Cornell, Joe DeTello, a Cornell undergraduate, and Valerie Reyna, PhD, began a study on the effects of risk perception on sports-related concussions.  Joe is a former high school athlete who has suffered a number of concussions himself during play, and is therefore able to provide valuable personal insight into how these injuries occur and why they are so common.

Bertrand Reyna-Brainerd
David Garavito
Valerie Reyna, PhD
Joe DeTello

The team has previously conducted research for this project by physically traveling to participating high schools and carrying a stack of preconfigured laptops.  This is not an uncommon method of conducting research in the field, but it imposes several limitations (such as the cost of travel, the number of students that can be physically instructed in a room at one time, and the number of laptops that can be purchased, carried, and maintained).  What’s more, the type of information required for the research could easily be collected online—they only needed to gather the typical sort of demographic data that has long been collected using online polls, as well as measures such as memory accuracy and reaction time.  The obstacle was that most of this legwork is performed by software designed to operate only in a physical, proctor-to-subject environment.


At that point, Joe and David came to me and asked if we could find a better solution.


Marty Sullivan, Cloud Engineer

I was not initially sure if this would be possible.  In order for our data to be comparable with data gathered on one laptop at a time, our applications would have to be reprogrammed from scratch to run on the web.  However, I agreed to research the problem and consulted with the IT@Cornell Cloudification Team. Marty Sullivan, Cloud Engineer, helped us discover that Amazon Web Services (AWS) and Cornell were testing AWS’s then-unreleased product: AppStream 2.0.  Amazon AppStream 2.0 is a fully featured platform for streaming software, specifically desktop applications, through a web browser. The most attractive feature is that end users are not required to install any software; they can simply access it via their preferred personal device.


Figure 1 – Screenshot of our assessment software running on an ordinary desktop.


Figure 2 – The same software running on AppStream.


Figure 3 – AppStream 2.0 Application Selection Screen


The AppStream 2.0 environment has a number of advantages over a traditional remote desktop approach.  When a user connects to our system, they are presented with an option to choose an application. After selecting an application, only that application and its descendants are displayed to the user.  This avoids several of the usual hassles of virtual computing, such as having to sandbox users away from the full Windows desktop environment.  Instead, the system is ready to go at launch, and the only barriers to entry are access to a web browser and an activated URL.

Once the virtual environment has been prepared, and the necessary software has been installed and configured, the applications are packaged into an instance that can then be delivered live over the web to any number of different devices. Mac, PC, tablets, and any other device with a web browser can be used to view and interact with your applications!  This package periodically resets to its original configuration, in a manner similar to how Deep Freeze operates in several of Cornell’s on-campus computer laboratories.  While periodic resets would be inconvenient in other contexts, this system is ideal for scientific work, as it ensures that the choices made by one participant do not influence the environment of the next.
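
For teams that script their AWS resources, the same packaging-and-delivery workflow can be driven from the AWS CLI. The sketch below is a minimal illustration only: the stack, fleet, and image names are hypothetical placeholders, and it assumes an image containing your applications has already been built.

    # Create a stack and a fleet from an image that already contains your applications
    aws appstream create-stack --name StudyStack
    aws appstream create-fleet --name StudyFleet --image-name StudyImage \
        --instance-type stream.standard.medium --compute-capacity DesiredInstances=2
    aws appstream start-fleet --name StudyFleet
    aws appstream associate-fleet --fleet-name StudyFleet --stack-name StudyStack

    # Generate a time-limited streaming URL (the "activated URL") for one participant
    aws appstream create-streaming-url --stack-name StudyStack --fleet-name StudyFleet \
        --user-id participant01 --validity 3600

Each generated URL expires after its validity window (in seconds), which pairs naturally with the periodic environment resets described above.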

It is difficult to overstate the potential of this kind of technology.  Its advantages are significant and far-reaching:

  • Scalability: Because Amazon’s virtual machine handles most of the application’s processing demands, the system can “stretch” to accommodate peak load periods in ways that a local system cannot. To put it another way, if a user can run the AppStream web application, they can run the software associated with it.  This means that software that would otherwise be too large or too demanding to run on a phone, tablet, or Chromebook can now be accessed through AppStream 2.0.
  • Ease of use: End users do not have to install new software to connect. Applications that would normally require installing long dependency chains, compilation, or complicated configuration procedures can be prepared in advance and delivered to end users in their final, working state.  AppStream 2.0 is also more user friendly than traditional remote desktop approaches: all users need to do is follow a web link.  It is not necessary to create separate accounts for each user, though there is a mechanism for doing so.
  • Security: AppStream 2.0 can selectively determine what data to destroy or preserve between sessions. Because the virtual machine running our software is not directly accessible to users, this improves both security and ease of use (a rare pairing in information technology) over a traditional virtual approach.
  • Cross-platform access: Because applications are streamed from a native Windows platform, compatibility layers (such as WINE), partition utilities (such as Apple’s Boot Camp), or virtual desktop and emulation software (such as VirtualBox or VMware) are not needed to access Windows software on a non-Windows machine. This also avoids the performance cost and compatibility issues involved in emulation.  Another advantage is that development time can be reduced if an application is designed to be delivered through AppStream.
  • Compartmentalized deployment: Packaging applications into virtual instances enforces consistency across environments, which aids in collaborative development. In effect, AppStream instances can perform the same function as Docker, Anaconda, and other portable environment tools, but with a lower technical barrier for entry.


Disadvantages:

  • Latency: No matter how optimized an online delivery system may be, streaming an application over the web will always impose additional latency compared with running the software locally. Software that depends on precise timing in order to operate, such as video games (or, in our case, response-time assessments), may be negatively impacted.  In our case, we have not experienced latency above a barely perceptible 50 ms.  Other users have reported delays of as much as 200 ms, which varies by region.
  • Compatibility: Currently, AppStream 2.0 offers only Windows Server 2012 instances, and hardware-accelerated graphics such as OpenGL are not supported. This is expected to change in time.  In terms of the software itself, getting an application running within the AppStream environment often requires additional configuration steps beyond an ordinary software install procedure.
  • Cost: To use AppStream 2.0 you must pay a recurring monthly fee based on use. This includes costs associated with bandwidth, processing, and storage space.  You are likely to be presented with a bill on the order of tens to low hundreds of U.S. dollars per month, and your costs will differ depending on use.


Ultimately, the impact of AppStream 2.0 will be determined by its adoption.  The largest hurdle the product will need to clear is its cost, but its potential is enormous.  If Amazon leverages its position to purchase and deploy hardware at low rates, and passes these savings on to its customers, then it will be able to provide a better product at a lower price than the PC market.  And if Amazon does not do this, another corporation will, and an end will come to the era of personal computing as we have known it.

[1] U.S. Census Bureau. (2013). Computer and Internet Use in the United States: 2013. Retrieved March 29, 2017, from https://www.census.gov/history/pdf/2013computeruse.pdf.

AWSCLI S3 Backup via AWS Direct Connect

By Ken Lassey, Cornell EMCS / IPP

Problem

AWSCLI tools default to using the internet when connecting to services like S3, because S3 provides only public endpoints for network access.  This is an issue if your device is in Cornell 10-space, as it cannot get on the internet.  It could also be a security concern depending on the data, although AWSCLI S3 commands do use HTTPS for their connections.

Another concern is the potential data egress charges for transferring large amounts of data from AWS back to campus. If you need to restore an on-premise system directly from S3, this transfer would accrue egress charges.

Solution

Assuming you have worked with Cloudification to enable Cornell’s AWS Direct Connect service in your VPC, you can force the connection through Direct Connect with a free-tier-eligible t2.micro Linux instance. By running the AWSCLI commands from this EC2 instance, you can transfer data from your on-premise system, through Direct Connect, and into S3 in one step.

Note that if your on-premise systems have only publicly routable IP addresses, and no 10-space addresses, they will not be able to route over Direct Connect (unless Cloudification has worked with you to enable special routing). In most campus accounts, VPCs are only configured to route 10-space traffic over Direct Connect. If the systems you want to back up have a 10-space address, you are good to go!

Example Cases

  1. You have several servers that back up to a local storage server. You want to copy the backup data into AWS S3 buckets for disaster recovery.
  2. You want to back up several servers directly into AWS S3 buckets.

In either case, you do not want your data going in or out over the internet, or it cannot, because the servers are in Cornell 10-space.

AWSCLI Over Internet

aws s3 sync <backupfolder> <s3bucket>

By running the AWS CLI command from the backup server or individual server, the data goes out over the internet.

Example Solution

By utilizing a free-tier-eligible t2.micro Linux server, you can force the traffic over Cornell’s Direct Connect link to AWS.

AWS CLI Over Direct Connect

By running the AWS CLI commands from the t2.micro instance, you force the data to utilize AWS Direct Connect.

On your local Windows server (backup or individual) you need to ‘share’ the folder you want to copy to S3.  You should use a service account rather than your own NetID. You can alternatively enable NFS on your folder if you are copying files from a Linux server, or even just tunnel through SSH. The following examples copy from a Windows share.
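
For the Linux/NFS alternative mentioned above, the mount step on the t2.micro instance would look something like the following sketch (the server and export names are placeholders, and the export must already be configured on the Linux server):

    sudo yum install nfs-utils
    sudo mount -t nfs <servername or ip>:/<exported folder> /mnt/<mountname>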

In AWS:

  1. Create an S3 bucket to hold your data
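
If you prefer to do this from the command line, the bucket can be created with one command (the bucket name and region below are placeholders):

    aws s3 mb s3://<s3-bucket-name> --region us-east-1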

On your systems:

  1. Share the folders to be backed up (see the example command after this list)
  2. Create a service account for the backup process
  3. Give the service account at least read access to the shared backup folder
  4. If needed allow access through the Managed Firewall from your AWS VPC to the server(s) or subnet where the servers reside
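
On a Windows server, one way to accomplish steps 1 and 3 is from an elevated command prompt; the share name, path, domain, and service account below are hypothetical:

    net share BackupShare=D:\Backups /GRANT:YOURDOMAIN\svc-backup,READ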

On the t2.micro instance you need to:

  1. Install the CIFS Utilities:
    sudo yum install cifs-utils
  2. Create a mount point for the backup folder:
    sudo mkdir /mnt/<any mountname you want>
  3. Mount the shared backup folder:
    sudo mount.cifs //<servername or ip>/<shared folder> /mnt/<mountname> -o \
    username="<service account>",password="<service account password>",domain="<your domain>"
  4. Run the s3 sync from the t2.micro instance:
    aws s3 sync /mnt/<mountname> s3://<s3-bucket-name>
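
To confirm that the data landed in the bucket, you can list its contents from the same instance (the bucket name is a placeholder):

    aws s3 ls s3://<s3-bucket-name> --recursive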


Instead of running this manually, I created a script on the t2.micro instance to perform the backup.  The script can be added to a cron job and run on a schedule.

Sample Script 1

  1. I used nano to create backup.sh:
       #!/bin/sh
       backupfolder=/mnt/<sharedfoldername>
       s3path=s3://<s3bucketname>
       aws s3 sync $backupfolder $s3path
  2. The script needs to be executable, so run this command on the t2.micro instance:
    sudo chmod +x <scriptname>
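
To schedule the script with cron, add an entry to the instance’s crontab with crontab -e. The time and paths below are examples only:

    # run backup.sh every night at 2:00 AM, appending output to a log
    0 2 * * * /home/ec2-user/backup.sh >> /home/ec2-user/backup.log 2>&1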

Sample Script 2

This script mounts the share, backs up multiple folders on my backup server, and then unmounts the shared folder:

#!/bin/bash
# Mount the shared backup folder (service account, password, and domain are placeholders)
sudo mount.cifs //128.253.109.11/ALC /mnt/<mountname> -o user="<serviceaccount>",password="<serviceaccount pwd>",domain="<your domain>"
# Sync each folder on the share to a matching prefix in the S3 bucket
srvr=(FOLDER1 FOLDER2 FOLDER3 FOLDER4 FOLDER5 FOLDER6)
for s in "${srvr[@]}"; do
    bufolder=/mnt/<mountname>/$s
    s3path=s3://<s3bucketname>/$s
    aws s3 sync $bufolder $s3path
done
# Unmount the shared folder when finished
sudo umount -f /mnt/<mountname>