AWS Cloud Development Kit (AWS CDK) is a toolkit for defining AWS cloud infrastructure in code. In short, we can manage the resources that run our application with a programming language we’re already familiar with. I will write a separate post on why we use infrastructure as code (IaC) for machine learning (ML) pipelines.

AWS CDK joins other IaC tools such as Terraform and the Serverless Framework. My main gripe when exploring a new system is that the documentation tells you how to run it, but often neglects to tell you what goes on under the hood. It’s easy to copy sample code and run it, but it’s another matter entirely to troubleshoot bugs when they happen.

So in this tutorial, I will cover how to get started with AWS CDK, and I will try to explain some of the things I have learned about the system through my own research.

However, if you’re just raring to get a sample application out fast, the official documentation should be enough to get you through: https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html

Some Prerequisites And Context Before We Begin

We can write AWS CDK apps in these languages:

  • TypeScript
  • JavaScript
  • Python
  • Java
  • C#

I will be writing my sample application in TypeScript because:

  • If I’m not mistaken, AWS CDK itself is written in TypeScript.
  • I tried Python first, but some of the sample code in the Python documentation was actually TypeScript. I guess the examples were copied and pasted in a hurry without anyone noticing.
  • So I gave up on Python and switched to TypeScript.

I’m quite sure AWS will fix the documentation and this will become a non-problem. For now, I’m using TypeScript, since it’s the de facto language for CDK.

Make sure you have an AWS account and a corresponding access key, because you’ll be building infrastructure on AWS. If your company uses a single sign-on (SSO) system like Okta, your access key may be hidden from you, but you’ll still be able to run AWS CDK. I will write another post on how to create your access key ID and secret access key.

You’ll also need to install the following on your local machine:

  • Node.js (version 10.3 or later)
  • TypeScript
  • AWS CLI
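
If you would rather check the Node.js requirement in code than by eye, here is a minimal sketch. The names MINIMUM_NODE_VERSION, parseVersion and meetsMinimum are my own and purely illustrative; they are not part of Node.js or AWS CDK, and the sketch assumes plain numeric versions such as v10.3.0.

```typescript
// Minimum Node.js version from the prerequisites list (10.3 or later).
const MINIMUM_NODE_VERSION = "10.3.0";

// A version as [major, minor, patch].
type Version = [number, number, number];

// Turn "v10.3.0" or "10.3" into [10, 3, 0]; missing or unparsable parts count as 0.
function parseVersion(version: string): Version {
  const parts = version
    .replace(/^v/, "")
    .split(".")
    .map((part) => parseInt(part, 10));
  return [parts[0] || 0, parts[1] || 0, parts[2] || 0];
}

// Return true when `actual` is the same as or newer than `minimum`.
function meetsMinimum(actual: string, minimum: string): boolean {
  const [aMajor, aMinor, aPatch] = parseVersion(actual);
  const [mMajor, mMinor, mPatch] = parseVersion(minimum);
  if (aMajor !== mMajor) return aMajor > mMajor;
  if (aMinor !== mMinor) return aMinor > mMinor;
  return aPatch >= mPatch;
}
```

In a Node.js script, you could pass the built-in process.version string as the first argument, for example meetsMinimum(process.version, MINIMUM_NODE_VERSION).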

Configure your local machine to work with your AWS credentials. You can use something simple like:

aws configure

After installing Node.js, install the AWS CDK Toolkit:

npm install -g aws-cdk

If you have done the steps above properly, you should be able to print the version number of the AWS CDK Toolkit installed on your local machine:

cdk --version

So How Are We Going To Demystify AWS CDK?

  • Install sample boilerplate code using cdk init app --language typescript in an empty folder
  • Create some sample infrastructure such as a Lambda function and/or an S3 bucket
  • Explain what goes on under the hood when we deploy a sample application
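
As a rough mental model for that last step: AWS CDK turns the resources you declare in code into an AWS CloudFormation template, and the template is what actually gets deployed. The sketch below is a toy illustration of that idea only. ToyStack, ToyResource, add and synth are hypothetical names of my own and are not the AWS CDK API; the resource type strings are real CloudFormation types, but the properties are deliberately incomplete.

```typescript
// Toy model only: ToyResource and ToyStack are hypothetical and NOT part of AWS CDK.
interface ToyResource {
  Type: string; // CloudFormation resource type, e.g. "AWS::S3::Bucket"
  Properties: Record<string, unknown>; // settings for that resource
}

class ToyStack {
  private resources: Record<string, ToyResource> = {};

  // Register a resource under a unique logical ID.
  add(logicalId: string, type: string, properties: Record<string, unknown> = {}): this {
    if (logicalId in this.resources) {
      throw new Error("Duplicate logical ID: " + logicalId);
    }
    this.resources[logicalId] = { Type: type, Properties: properties };
    return this;
  }

  // "Synthesize" the declared resources into a CloudFormation-style template object.
  synth(): { Resources: Record<string, ToyResource> } {
    return { Resources: { ...this.resources } };
  }
}

// Declare an S3 bucket and a Lambda function.
// A real Lambda function would also need Code and Role properties.
const stack = new ToyStack()
  .add("DemoBucket", "AWS::S3::Bucket")
  .add("DemoFunction", "AWS::Lambda::Function", {
    Runtime: "nodejs18.x",
    Handler: "index.handler",
  });

// The template is plain data that can be printed or inspected.
const template = stack.synth();
console.log(JSON.stringify(template, null, 2));
```

The point of the sketch is only that the code describes resources and a separate synthesis step produces a declarative document. The real toolkit does far more than this.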

It’s a little too long to cover everything in a single post, so I will break it down into digestible parts across several posts.

A Few Things To Note:

  • Knowing how to run AWS CDK doesn’t mean you’re going to be an expert in deploying AWS infrastructure immediately
    • That’s because AWS has hundreds of services, and each service has its own learning curve. You’ll have to learn the nuances of each service and how they relate to each other in an ML pipeline.
  • AWS CDK is updated very frequently. At the time of writing, a minor version is released almost every week.
    • I will do my best to keep up to date with the latest changes and reflect them in the posts here.
  • My git repo can be found at https://github.com/jonathan-moo/gefyra-cdk-demo/tree/master.
    • It follows the format of CDK-T1, where T1 would be tutorial 1.

Next up, we will look into how we can build a CDK app on our AWS account here: Preparing Your AWS CDK Framework