Overview

This document describes steps to run DT apps on AWS cluster. Users can easily try out apps from the AppFactory by downloading the app installers from the DataTorrent website. A zip package containing bash scripts will be downloaded on user’s machine and user needs to follow the instructions below to deploy apps.

Setup

  • Install the AWS Command Line Interface
    • Refer to the link for instructions
  • Configure the AWS Command Line Interface
    •   $ aws configure
        AWS Access Key ID [None]: MENTION ACCESS_KEY_ID
        AWS Secret Access Key [None]: MENTION SECRET_ACCESS_KEY
        Default region name [None]: MENTION REGION
        Default output format [None]: _Press ENTER_
      
    • For detailed instructions, refer to the link
  • Verify that AWS Command Line Interface is configured by ensuring the following returns correct values
    • $ aws configure get aws_access_key_id
      $ aws configure get aws_secret_access_key
      $ aws configure get region
      

Package contents

The zip file (see end-to-end steps below) will contain following files:

  • DTapp-EMR-Deploy.sh - Script that user needs to execute to set up AWS EMR Cluster. The cluster will be created and DT-RTS installed along with the app.

  • config.properties - This file contains properties used by the script. File has sample values for the properties as mentioned below.
    (Note: Sample values for the properties should be sufficient unless it needs to be changed.)

Property Sample Value Description
SECURITY_GROUP_NAME DTapp-security-group Security Group with this name will be created and inbound rule with user machine’s external IP address will be added to it.
BUCKET_NAME dtapp-emr S3 bucket name. A bucket with this name will be created and script that installs DT-RTS will be placed into it. This script is used as a bootstrap script during cluster launch.
S3_REGION us-east-1 Region where S3 bucket is created
LOG_URI s3n://dtapp-logs S3 URI where log files would be stored
CLUSTER_NAME DTapp Name of the EMR cluster
CORE_INSTANCE_TYPE m1.large Type of machine for core instance
CORE_INSTANCE_COUNT 1 Number of core instances to be created
MASTER_INSTANCE_TYPE m1.medium Type of machine for master instance
MASTER_INSTANCE_COUNT 1 Number of Master instances to be created
  • configuration.json - This file contains hdfs block size,replication factor and virtual to physical memory ratio.

  • DTapp-EMR-Terminate.sh - Script to terminate the cluster

  • README - Readme file with instructions

End to End workflow

The steps below describe the end to end flow to run any app provided in the AppFactory. Let's consider that we want to run Kinesis to S3 app on the AWS cluster.

  1. Download the zip file for the app. https://www.datatorrent.com/downloads/aws-apps/app-kinesis-to-s3.zip

  2. Extract the zip

    unzip app-kinesis-to-s3.zip

  3. Verify the files. The directory structure should be as follows:

    user@localhost:~/Downloads/app-kinesis-to-s3$ ls
    config.properties  DTapp-EMR-Deploy.sh  DTapp-EMR-Terminate.sh README configuration.json
    
  4. Edit config.properties file. Modify the properties as described in the Package Contents section.

  5. Check the configuration.json file which contains properties like blocksize, replication factor etc. User can change this configuration, if required.

  6. Assign Execute permissions.

     user@localhost:~/Downloads/app-kinesis-to-s3$ chmod u+x /path/to/app-kinesis-to-s3/DTapp-EMR-*
    
  7. Execute the shell script.

     user@localhost:~/Downloads/app-kinesis-to-s3$ ./DTapp-EMR-Deploy.sh
    
  8. Wait till the EMR cluster is up. Once the cluster is up, a link to configure the DT-RTS will be provided. Click on the link. E.g http://ec2-54-91-43-25.compute-1.amazonaws.com:9090

  9. DT-RTS Welcome Screen will appear. Click continue aws-emr-1

  10. Default paths for Hadoop and the DFS root will be automatically populated, e.g. /usr/bin/hadoop and /user/dtadmin/datatorrent. You can leave these values alone unless you have some specific reason to change them. Click continue aws-emr-2

  11. A valid license would be needed to continue further. Click on Request License. aws-emr-3

  12. Download the appropriate license. aws-emr-4

  13. Upload the license. Once the license is successfully uploaded, Click Continue aws-emr-5 aws-emr-6

  14. Once the configuration is complete, Click Continue aws-emr-7

  15. The app is imported and is ready to launch aws-emr-8 To know more about how to configure a particular app, visit the App Templates section of our documentation

  16. To terminate the AWS cluster, execute the termination script:

     user@localhost:~/Downloads/app-kinesis-to-s3$ ./DTapp-EMR-Terminate.sh