AWS Analyze Big Data with Terraform

5 Jun 2019 | CPOL | 2 min read
Following 'Infrastructure as Code' principles, this tip walks through a real sample project built from scratch: deploying an EMR cluster and running a Hive script on it. It reproduces the 'Analyze Big Data with Hadoop' project from the AWS 'Learn to Build' section.

Introduction

It's important to describe your infrastructure as code. Terraform can help us with that.

Authentication

Don't forget to create a variables.tf file in your project root directory and set 3 variables in it:

  • region - the region where all your infrastructure will be deployed
  • access_key and secret_key - the credentials of your user, which can be generated via AWS IAM (the values below are placeholders)
variable "region" {
    default = "us-east-2"
}
variable "access_key" {
    default = "JFSKLGD8...UFDJKGJS"
}
variable "secret_key" {
    default = "sdfs8d9fgEG33VE...343rVFDV3vdfevr"
}
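
Keeping real keys as defaults in variables.tf means they end up wherever the file goes, including version control. A minimal alternative sketch, assuming you would rather supply the keys from outside the file: declare the two key variables without defaults and let Terraform read them from environment variables named TF_VAR_access_key and TF_VAR_secret_key. The descriptions are illustrative and not part of the original project.

```hcl
# variables.tf (alternative sketch): no secrets stored in the file.
# Terraform fills a variable from an environment variable named
# TF_VAR_<name>, e.g. TF_VAR_access_key and TF_VAR_secret_key.
variable "region" {
  description = "Region where all infrastructure is deployed"
  default     = "us-east-2"
}

variable "access_key" {
  description = "IAM access key ID; no default, so it is never committed"
}

variable "secret_key" {
  description = "IAM secret access key; no default for the same reason"
}
```

If a variable has no default and no value is supplied, terraform plan and terraform apply prompt for it interactively.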

Step by Step Scripts

After passing the AWS Solutions Architect Associate exam, I didn't want to forget the material, so I looked up the projects AWS suggests in its Getting Started section and decided to implement them one by one. I chose Analyze Big Data with Hadoop as the first. For fun, I decided to describe this project with Terraform scripts.

I'd like to share this experience because I ran into a couple of non-trivial issues.

  1. First of all, we need to set up the Terraform provider; see provider.tf.
    provider "aws" {
        access_key = "${var.access_key}"
        secret_key = "${var.secret_key}"
        region = "${var.region}"
    }
  2. Next, we create an S3 bucket and an EC2 key pair. Both steps are simple and straightforward; they are described in s3_bucket.tf and key_pair.tf respectively.
    resource "aws_s3_bucket" "s3_bucket" {
        bucket = "tf-big-data"
    }
    resource "aws_key_pair" "emr_key_pair" {
        key_name = "tf-big-data"
        public_key = "ssh-rsa A...w== rsa-key-20180822"
    }
  3. Creating an EMR cluster via the console takes 5-7 clicks: you choose a couple of options and leave the rest at their defaults. It looks as easy as pie, but in fact a lot happens behind the scenes. So we have to take care of the roles and policies for EMR and its EC2 instances. For each of them, we have to create 2 data sources (aws_iam_policy and aws_iam_policy_document) and 2 resources (aws_iam_role_policy_attachment and aws_iam_role). These roles are defined in roles.tf.
  4. Another important part is networking and security (vpc.tf). Here, we create 6 resources:
    • aws_vpc;
    • aws_subnet and aws_internet_gateway in this VPC;
    • aws_route_table in this VPC, with a route through the created internet gateway;
    • aws_main_route_table_association, which connects our VPC and the route table;
    • aws_security_group in our VPC, which depends on the created subnet.
  5. We also need an aws_iam_instance_profile, which is kept at the end of emr_cluster.tf.
  6. Finally, we can create the EMR cluster itself in emr_cluster.tf. Here we describe all the required properties: name, release_label, applications, service_role (from step 3), log_uri (from step 2), ec2_attributes (from steps 2, 4, 5), and one or more instance groups. I also added a 'step' section there, where I put the Hive script to execute.
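
The "2 data sources plus 2 resources" pattern from step 3, together with the instance profile from step 5, can be sketched for the EC2 side as follows. This is an illustration under assumptions, not the contents of the author's roles.tf: the resource names and the name values are hypothetical, and the attached policy is assumed to be the AWS managed policy AmazonElasticMapReduceforEC2Role. It uses Terraform 0.12+ expression syntax, whereas the snippets above use the older "${...}" interpolation form.

```hcl
# Data source 1: trust policy that lets EC2 instances assume the role.
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Resource 1: the role itself (name is hypothetical).
resource "aws_iam_role" "emr_ec2_role" {
  name               = "tf-big-data-ec2-role"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}

# Data source 2: look up an AWS managed policy (assumed choice).
data "aws_iam_policy" "emr_for_ec2" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
}

# Resource 2: attach the managed policy to the role.
resource "aws_iam_role_policy_attachment" "emr_ec2_attach" {
  role       = aws_iam_role.emr_ec2_role.name
  policy_arn = data.aws_iam_policy.emr_for_ec2.arn
}

# Step 5: instance profile that hands the role to the cluster's EC2 instances.
resource "aws_iam_instance_profile" "emr_ec2_profile" {
  name = "tf-big-data-ec2-profile"
  role = aws_iam_role.emr_ec2_role.name
}
```

The EMR service role would follow the same four-block shape, with elasticmapreduce.amazonaws.com as the trusted service and a different managed policy.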

The full code of the project is here.
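
For readers without the project at hand, the cluster resource from step 6 might look roughly like the sketch below. Everything not named in the article is an assumption: the release label, instance types and counts, the step name, and the input variables that stand in for values produced by the other files. The Hive script location is not given in the article, so it is left as a variable. The sketch uses master_instance_group and core_instance_group blocks, which may differ from the instance group blocks in the original project.

```hcl
# Inputs standing in for values created in steps 3-5 (hypothetical names).
variable "service_role_arn" {}     # EMR service role from step 3
variable "instance_profile_arn" {} # instance profile from step 5
variable "subnet_id" {}            # subnet from step 4
variable "security_group_id" {}    # security group from step 4
variable "hive_script_uri" {}      # S3 URI of the Hive script to run

resource "aws_emr_cluster" "cluster" {
  name          = "tf-big-data"
  release_label = "emr-5.23.0" # assumed release
  applications  = ["Hadoop", "Hive"]
  service_role  = var.service_role_arn
  log_uri       = "s3://tf-big-data/logs/" # bucket from step 2, assumed prefix

  ec2_attributes {
    key_name                          = "tf-big-data" # key pair from step 2
    subnet_id                         = var.subnet_id
    emr_managed_master_security_group = var.security_group_id
    emr_managed_slave_security_group  = var.security_group_id
    instance_profile                  = var.instance_profile_arn
  }

  master_instance_group {
    instance_type = "m4.large" # assumed size
  }

  core_instance_group {
    instance_type  = "m4.large" # assumed size
    instance_count = 2          # assumed count
  }

  # Runs the Hive script once the cluster is up.
  step {
    name              = "Run Hive script"
    action_on_failure = "CONTINUE"

    hadoop_jar_step {
      jar  = "command-runner.jar"
      args = ["hive-script", "--run-hive-script", "--args", "-f", var.hive_script_uri]
    }
  }
}
```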

I would really appreciate any comments or suggestions on how this script could be simplified.

Points of Interest

It's not obvious how many resources are actually created behind the scenes when you click the button to create an EMR cluster in the AWS Console. Knowing them helps you understand what is really happening underneath.
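
As one illustration of what sits behind that button, the six networking resources listed in step 4 could be sketched like this. The CIDR ranges, resource names, and the egress rule are assumptions made for the example, not values taken from the author's vpc.tf, and the syntax is Terraform 0.12+.

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16" # assumed range
  enable_dns_hostnames = true
}

resource "aws_subnet" "main" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24" # assumed range inside the VPC
}

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}

# Route table with a default route through the internet gateway.
resource "aws_route_table" "main" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

# Makes the route table above the VPC's main route table.
resource "aws_main_route_table_association" "main" {
  vpc_id         = aws_vpc.main.id
  route_table_id = aws_route_table.main.id
}

# Security group that is created only after the subnet exists.
resource "aws_security_group" "emr" {
  name       = "tf-big-data-emr" # hypothetical name
  vpc_id     = aws_vpc.main.id
  depends_on = [aws_subnet.main]

  # Allow all outbound traffic (assumed rule).
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```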

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Intetics
Ukraine
AWS Solutions Architect Associate
https://www.certmetrics.com/amazon/public/badge.aspx?i=1&t=c&d=2018-02-19&ci=AWS00397579
