An Intrusion on Cornell AWS Turf

By Paul Allen

So, it’s finally happened. Earlier this week Cornell University had an intrusion into one of the 80+ AWS accounts that Cornell uses. With that kind of scale, it was only a matter of time until a human made a mistake.

What Happened?

The short story is that a Cornell developer accidentally included AWS credentials in source code committed to a public git repository. Within a matter of minutes, the intruder had found those credentials and begun using them to launch large EC2 instances across all AWS regions. Presumably, the goal was to steal compute cycles for cryptocurrency mining. The developer working with the account noticed a problem when the intruder terminated the EC2 instances hosting development, test, and production deployments.

The Good

The good news is that it could have been worse.

Once the intrusion was noticed, it took only minutes to identify the IAM credentials that were being used by the intruder and deactivate them. Then cleanup began—shutting down EC2 instances that the intruder had launched to stop the AWS charges from mounting further.
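
For reference, here is a minimal sketch (using boto3) of the kind of containment steps that were involved: deactivate the compromised access key, then terminate the instances the intruder launched. The user name, access key ID, and instance IDs below are placeholders, not the actual values from this incident.

```python
import boto3

# Placeholder identifiers -- substitute the values found during your own investigation.
COMPROMISED_USER = "example-developer"
COMPROMISED_KEY_ID = "AKIAEXAMPLEEXAMPLE"
ROGUE_INSTANCE_IDS = ["i-0123456789abcdef0"]

iam = boto3.client("iam")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Deactivate the leaked access key so it can no longer be used.
iam.update_access_key(
    UserName=COMPROMISED_USER,
    AccessKeyId=COMPROMISED_KEY_ID,
    Status="Inactive",
)

# Terminate the instances the intruder launched to stop further charges.
ec2.terminate_instances(InstanceIds=ROGUE_INSTANCE_IDS)
```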

More good news is that the leaked credentials had privileges limited to the AWS EC2 service. That limited the blast radius, so forensics were not required for the many other AWS services that are available.

A few other ways we were lucky:

  • The affected AWS account wasn’t being used for holding or processing any Cornell confidential data. If it had been, things would have been immensely more complicated.
  • The intruder terminated existing instances, triggering a look at what was going on in the account.
  • Beyond terminating those instances, the intruder didn’t seem to be bent on destroying existing resources, like EBS volumes or snapshots. This made recovery a bit easier.
  • The soft limits that AWS imposes on the number of instances of each type that can be running at any given time restricted how many instances the intruder was able to start up. No doubt the bill would have been much larger if those limits weren’t in place.
  • Because the intruder had privileges only for EC2, he couldn’t create new IAM policies, users, or credentials. That saved us from playing whack-a-mole with a myriad of other IAM principals and credentials beyond the one originally leaked.
  • The affected AWS account had CloudTrail turned on, so it was easy to see how the leaked credentials were being used. This is part of the standard audit configuration we enforce for Cornell AWS accounts.
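
As an illustration of how CloudTrail helps here, the following sketch (boto3, with a hypothetical access key ID) looks up recent API calls made with a specific access key:

```python
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Find recent API calls made with the leaked access key (placeholder ID below).
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "AccessKeyId", "AttributeValue": "AKIAEXAMPLEEXAMPLE"}
    ],
    MaxResults=50,
)

for event in events["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username"))
```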

The Bad

The total AWS bill resulting from the intrusion was about $6,500. Normal spend in the affected AWS account was about $200/day, and the bill ballooned to about $6,700 on the day of the intrusion. Ouch!

The Ugly

This intrusion resulted from a simple mistake, and mistakes happen whether we are newbies or experts. The challenge we face is determining what practical mechanisms should be in place to reduce the likelihood of mistakes or to limit their impact.

The leaked credentials granted a lot of privileges

The credentials that were compromised had privileges associated with the AmazonEC2FullAccess managed policy. That’s a lot of privileges. The blast radius of the intrusion could have been limited by scoping those privileges more tightly:

  • Privileges can often be scoped to a specific AWS region without impairing their usability, and restricting them this way is easy. Our intruder was able to launch instances in every public AWS region; if the privileges had been restricted to the one region of interest, the cost impact of our intrusion could have been reduced by over 90%.
  • The privileges granted by the compromised credentials could have been scoped to work only from a limited set of source IPs. Most of us at Cornell use AWS from a limited set of IP addresses (Cornell campus IPs and maybe a home IP address), so scoping to those IPs wouldn’t have hindered legitimate use of the credentials. Here’s an example policy that you can use to help limit access to Cornell IP addresses; a sketch combining the region and source IP conditions appears after this list.
  • Was full access to all EC2 operations required for the intended use of these credentials? Maybe, maybe not. While it can be challenging to itemize the specific AWS operations an IAM principal should be allowed to use, doing so might have limited the scope of this intrusion. For example, our intruder was able to create new EC2 key pairs with which to connect to his stolen instances. Did the intended use of these credentials require the ec2:CreateKeyPair permission? Were other limitations possible?
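
To make the scoping ideas above concrete, here is a rough sketch of a policy, created with boto3, that allows a narrower set of EC2 actions, only in a single region, and only from specific source IP ranges. The policy name, action list, region, and CIDR blocks are illustrative placeholders, not the actual Cornell policy; aws:RequestedRegion and aws:SourceIp are standard IAM global condition keys.

```python
import json

import boto3

# Illustrative values only -- adjust the actions, region, and CIDR ranges for your account.
ALLOWED_ACTIONS = [
    "ec2:DescribeInstances",
    "ec2:RunInstances",
    "ec2:StartInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances",
]
ALLOWED_REGION = "us-east-1"
ALLOWED_CIDRS = ["192.0.2.0/24", "198.51.100.0/24"]  # placeholders for campus/home ranges

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ALLOWED_ACTIONS,
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:RequestedRegion": ALLOWED_REGION},
                "IpAddress": {"aws:SourceIp": ALLOWED_CIDRS},
            },
        }
    ],
}

iam = boto3.client("iam")

# Create a customer-managed policy that can then be attached to the relevant IAM user or role.
iam.create_policy(
    PolicyName="scoped-ec2-access-example",
    PolicyDocument=json.dumps(policy_document),
)
```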

Avoid leaking credentials in the first place

It is well known that bad guys watch commits to public git repos for accidentally committed credentials and other secrets. However, there is help available.

  • Tools like git-secrets can be used to avoid accidentally committing secrets like AWS credentials to git repos.
  • It is often possible to use temporary credentials instead of static, fixed credentials. Tools like aws-vault make it possible to improve your AWS credential hygiene in a variety of scenarios. In fact, one team at Cornell has integrated aws-vault into their development processes.
  • You can avoid handling AWS credentials entirely if you are operating from an EC2 instance: use an IAM instance profile to give the instance privileges to perform AWS operations without configuring AWS access key credentials (see the sketch after this list).
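
As a quick illustration of the instance-profile approach, the following sketch assumes it is running on an EC2 instance with an IAM role attached. boto3 picks up temporary credentials from the instance metadata service automatically, so no access keys appear in code, environment variables, or config files.

```python
import boto3

# On an EC2 instance with an IAM instance profile attached, boto3 automatically
# obtains temporary credentials from the instance metadata service.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Any call permitted by the instance's role works without explicit credentials.
response = ec2.describe_instances()
for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])
```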

Better monitoring of activity in AWS accounts

The intrusion could have been recognized earlier.

  • We use CloudCheckr at Cornell for campus AWS accounts, giving our AWS users access to reports on billing, inventory, utilization, security, and best practices. CloudCheckr has a lot of information and could have alerted us about unexpected spending (among other things), but it isn’t designed as a real-time tool, so it wouldn’t have provided timely alerts in this situation.
  • AWS account users can set up billing alarms in CloudWatch to get alerted based on the billing parameters they choose. Unlike CloudCheckr alerts, these are based on real-time data, so they could have provided a warning that something untoward was happening (see the sketch after this list).
  • The free tier of AWS Trusted Advisor provides Service Limit checks that could have told us that EC2 instance limits had been reached in AWS regions around the globe. CloudWatch alarms can be combined with Trusted Advisor metrics to get notifications about that situation.
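
To illustrate the billing-alarm idea from the list above, here is a minimal sketch that creates a CloudWatch alarm on the account’s estimated charges. Billing metrics are published only in us-east-1 and only when billing alerts are enabled in the account’s billing preferences; the threshold and SNS topic ARN below are placeholders.

```python
import boto3

# Billing metrics live in us-east-1 and require "Receive Billing Alerts"
# to be enabled in the account's billing preferences.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="estimated-charges-above-500-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,            # evaluate over 6-hour periods
    EvaluationPeriods=1,
    Threshold=500.0,         # placeholder threshold in USD
    ComparisonOperator="GreaterThanThreshold",
    # Placeholder SNS topic ARN -- replace with a topic that notifies your team.
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:billing-alerts"],
)
```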

Conclusion

It is easy to come up with a list of things that could have been done differently in a situation like this. The items above are just a starting point, and there are a gabillion compilations of best practices for AWS security. It can be overwhelming.

Even though this intrusion happened elsewhere on campus, I am going to take Cornell’s brush with an intruder as an opportunity to critically review the practices that my team and I use. I won’t let myself get overwhelmed with all that could be done. Instead, we’ll focus on incremental improvements to the practices we have in place and assess where more effort might have a disproportionately great payoff. If you or your team at Cornell would like help doing the same in AWS, please reach out to the Cloud Team.