HDFS to S3 Sync Application

Summary

This application demonstrates continuous big-data sync from a source HDFS cluster to a destination Amazon S3 bucket. It reads files from the source HDFS and uploads them to Amazon S3 using S3's multipart upload feature.
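
For context, the multipart upload flow on the S3 side works roughly as sketched below, using the AWS SDK for Java v1. This is illustrative only, not the application's actual code; the bucket name, object key, and local file path are placeholders echoing the property examples in the tables below.

```java
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.*;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class MultipartUploadSketch {
  public static void main(String[] args) {
    String bucket = "com.example.app.s3";   // placeholder bucket name
    String key = "hdfs_to_s3/file2.log";    // placeholder output path + file name
    File file = new File("/tmp/file2.log"); // stand-in for data read from HDFS

    AmazonS3 s3 = new AmazonS3Client(
        new BasicAWSCredentials("ACCESS_XXX_KEY_XX_ID", "YOUR_SECRET_KEY"));

    // 1. Initiate the multipart upload and remember the upload id.
    InitiateMultipartUploadResult init =
        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));

    // 2. Upload the file in 5 MB parts (the S3 minimum part size;
    //    only the final part may be smaller).
    long partSize = 5L * 1024 * 1024;
    List<PartETag> etags = new ArrayList<>();
    long offset = 0;
    for (int partNumber = 1; offset < file.length(); partNumber++) {
      long size = Math.min(partSize, file.length() - offset);
      UploadPartRequest req = new UploadPartRequest()
          .withBucketName(bucket).withKey(key)
          .withUploadId(init.getUploadId())
          .withPartNumber(partNumber)
          .withFileOffset(offset)
          .withFile(file)
          .withPartSize(size);
      etags.add(s3.uploadPart(req).getPartETag());
      offset += size;
    }

    // 3. Complete the upload; S3 assembles the parts into a single object.
    s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), etags));
  }
}
```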

Required Properties

The end user must specify values for these properties.

| Property | Type | Example | Notes |
| -------- | ---- | ------- | ----- |
| Aws Credentials Access Key Id | String | ACCESS_XXX_KEY_XX_ID | AWS credentials access key ID for S3. |
| Aws Credentials Secret Access Key | String | 8your+own0+AWS0secret1230+8key8goes0here | AWS secret access key for accessing the S3 output. |
| Input Directory Or File Path On HDFS | String | /user/appuser/input/directory1, /user/appuser/input/file2.log, hdfs://node1.corp1.com/user/user1/input | HDFS path of the input file or directory. |
| Output Directory Path On S3Storage | String | hdfs_to_s3 | Output directory on AWS S3. |
| S3Storage Bucket Name | String | com.example.app.s3 | Bucket name for the AWS S3 output. |
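
These properties are typically supplied in an XML configuration file passed to the application at launch time. The snippet below is a sketch of such a file; the operator and property names (HDFSInputModule, S3OutputModule, and their prop.* keys) are assumptions for illustration and should be verified against the application's bundled properties.xml.

```xml
<configuration>
  <!-- HDFS path of the input file or directory (hypothetical key name) -->
  <property>
    <name>dt.operator.HDFSInputModule.prop.files</name>
    <value>/user/appuser/input/directory1</value>
  </property>
  <!-- AWS credentials for the S3 output (hypothetical key names) -->
  <property>
    <name>dt.operator.S3OutputModule.prop.accessKey</name>
    <value>ACCESS_XXX_KEY_XX_ID</value>
  </property>
  <property>
    <name>dt.operator.S3OutputModule.prop.secretAccessKey</name>
    <value>8your+own0+AWS0secret1230+8key8goes0here</value>
  </property>
  <!-- S3 bucket and output directory (hypothetical key names) -->
  <property>
    <name>dt.operator.S3OutputModule.prop.bucketName</name>
    <value>com.example.app.s3</value>
  </property>
  <property>
    <name>dt.operator.S3OutputModule.prop.outputDirectoryPath</name>
    <value>hdfs_to_s3</value>
  </property>
</configuration>
```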

Advanced Properties (optional)

| Property | Default | Type | Notes |
| -------- | ------- | ---- | ----- |
| Maximum Readers For Dynamic Partitioning | 16 | int | Maximum number of partitions for the Block Reader operator. |
| Minimum Readers For Dynamic Partitioning | 2 | int | Minimum number of partitions for the Block Reader operator. |
| Number Of Blocks Per Window | 1 | int | The file splitter emits this many blocks per window to downstream operators. |
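
The advanced properties can be tuned in the same configuration file. As above, the key names shown here (maxReaders, minReaders, blocksThreshold) are illustrative assumptions, not confirmed names; the values match the defaults in the table.

```xml
<configuration>
  <!-- Partitioning bounds for the Block Reader operator (hypothetical key names) -->
  <property>
    <name>dt.operator.HDFSInputModule.prop.maxReaders</name>
    <value>16</value>
  </property>
  <property>
    <name>dt.operator.HDFSInputModule.prop.minReaders</name>
    <value>2</value>
  </property>
  <!-- Blocks emitted per window by the file splitter (hypothetical key name) -->
  <property>
    <name>dt.operator.HDFSInputModule.prop.blocksThreshold</name>
    <value>1</value>
  </property>
</configuration>
```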