This document describes steps to run DT apps on AWS cluster. Users can easily try out apps from the AppFactory by downloading the app installers from the DataTorrent website. A zip package containing
bash scripts will be downloaded on user’s machine and user needs to follow the instructions below to deploy apps.
- Install the AWS Command Line Interface
- Refer to the link for instructions
- Configure the AWS Command Line Interface
$ aws configure AWS Access Key ID [None]: MENTION ACCESS_KEY_ID AWS Secret Access Key [None]: MENTION SECRET_ACCESS_KEY Default region name [None]: MENTION REGION Default output format [None]: _Press ENTER_
- For detailed instructions, refer to the link
- Verify that AWS Command Line Interface is configured by ensuring the following returns correct values
$ aws configure get aws_access_key_id $ aws configure get aws_secret_access_key $ aws configure get region
The zip file (see end-to-end steps below) will contain following files:
DTapp-EMR-Deploy.sh - Script that user needs to execute to set up AWS EMR Cluster. The cluster will be created and DT-RTS installed along with the app.
config.properties - This file contains properties used by the script. File has sample values for the properties as mentioned below.
(Note: Sample values for the properties should be sufficient unless it needs to be changed.)
|SECURITY_GROUP_NAME||DTapp-security-group||Security Group with this name will be created and inbound rule with user machine’s external IP address will be added to it.|
|BUCKET_NAME||dtapp-emr||S3 bucket name. A bucket with this name will be created and script that installs DT-RTS will be placed into it. This script is used as a bootstrap script during cluster launch.|
|S3_REGION||us-east-1||Region where S3 bucket is created|
|LOG_URI||s3n://dtapp-logs||S3 URI where log files would be stored|
|CLUSTER_NAME||DTapp||Name of the EMR cluster|
|CORE_INSTANCE_TYPE||m1.large||Type of machine for core instance|
|CORE_INSTANCE_COUNT||1||Number of core instances to be created|
|MASTER_INSTANCE_TYPE||m1.medium||Type of machine for master instance|
|MASTER_INSTANCE_COUNT||1||Number of Master instances to be created|
configuration.json - This file contains hdfs block size,replication factor and virtual to physical memory ratio.
DTapp-EMR-Terminate.sh - Script to terminate the cluster
README - Readme file with instructions
End to End workflow
The steps below describe the end to end flow to run any app provided in the AppFactory. Let's consider that we want to run Kinesis to S3 app on the AWS cluster.
Download the zip file for the app. https://www.datatorrent.com/downloads/aws-apps/app-kinesis-to-s3.zip
Extract the zip
Verify the files. The directory structure should be as follows:
user@localhost:~/Downloads/app-kinesis-to-s3$ ls config.properties DTapp-EMR-Deploy.sh DTapp-EMR-Terminate.sh README configuration.json
config.propertiesfile. Modify the properties as described in the Package Contents section.
configuration.jsonfile which contains properties like blocksize, replication factor etc. User can change this configuration, if required.
Assign Execute permissions.
user@localhost:~/Downloads/app-kinesis-to-s3$ chmod u+x /path/to/app-kinesis-to-s3/DTapp-EMR-*
Execute the shell script.
Wait till the EMR cluster is up. Once the cluster is up, a link to configure the DT-RTS will be provided. Click on the link. E.g
DT-RTS Welcome Screen will appear. Click
Default paths for Hadoop and the DFS root will be automatically populated, e.g.
/user/dtadmin/datatorrent. You can leave these values alone unless you have some specific reason to change them. Click
A valid license would be needed to continue further. Click on
Download the appropriate license.
Upload the license. Once the license is successfully uploaded, Click Continue
Once the configuration is complete, Click Continue
The app is imported and is ready to launch To know more about how to configure a particular app, visit the
App Templatessection of our documentation
To terminate the AWS cluster, execute the termination script: