Kafka to HDFS Sync Application
Continuously ingest messages from Kafka into Hadoop HDFS. The source code is available at: https://github.com/DataTorrent/app-templates/tree/master/kafka-to-hdfs-sync.
Please send feedback or feature requests to: firstname.lastname@example.org
This document has a step-by-step guide to configure, customize, and launch this application.
Click on the AppFactory tab from the top navigation bar. A page listing the applications available on AppFactory is displayed.
Search for Kafka to view all the applications related to Kafka.
Click on the import button for the Kafka to HDFS Sync App. A notification is displayed in the top right corner after the application package is successfully imported.
Click on the link in the notification, which navigates to the page for this application package.
Detailed information about the application package, such as version, last modified time, and short description, is available on this page. Click on the launch button for the Kafka-to-HDFS-Sync application. In the confirmation modal, click the Configure button.
For example, suppose we wish to process all messages from topic transactions at the Kafka server running on localhost port 9092 and write them to /user/appuser/output on HDFS. Properties should be set as follows:
|name||value|
|Kafka Broker List||localhost:9092|
|Kafka Topic Name||transactions|
|Output Directory Path||/user/appuser/output|
|Output File Name||output.txt|
Details about configuration options are available in the Configuration options section.
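The same example settings can also be written down as a properties file. The following is a minimal sketch, not the application's own tooling: it renders the example values from the text above (topic transactions, broker localhost:9092, output directory /user/appuser/output) as Hadoop-style configuration XML. The mapping from the UI labels to the dt.operator.* keys is inferred from the Configuration options section, and the helper name render_properties_xml and the configuration/property/name/value layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Example values from the walkthrough; keys are taken from the
# Configuration options section (the label-to-key mapping is an assumption).
EXAMPLE_PROPERTIES = {
    "dt.operator.kafkaInput.prop.clusters": "localhost:9092",
    "dt.operator.kafkaInput.prop.topics": "transactions",
    "dt.operator.fileOutput.prop.filePath": "/user/appuser/output",
    "dt.operator.fileOutput.prop.outputFileName": "output.txt",
}


def render_properties_xml(properties):
    """Render a dict as <configuration><property>... XML (assumed layout)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML so it can be compared with properties.xml by hand.
    print(render_properties_xml(EXAMPLE_PROPERTIES))
```

Keeping the values in one dictionary makes it easy to compare what was typed into the UI with what ends up in a saved configuration.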
When you are finished entering the application configuration properties, click on the save button in the top right corner of the page to save the configuration.
Click on the launch button in the top right corner of the page to launch the application. After the application is launched successfully, a notification is displayed in the top right corner. It includes the Application ID, which can be used to monitor this instance and find its logs.
Click on the Monitor tab from the top navigation bar.
A page listing all running applications is displayed. Search for the current application by name, application ID, or any other relevant field. Click on the application name or ID to navigate to the application instance details page.
The application instance details page shows key metrics for monitoring the application status.
The logical tab shows the application DAG, Stram events, operator status based on logical operators, stream status, and a chart with key metrics.
Click on the physical tab to look at the status of the physical instances of the operators, containers, etc.
The end user must specify values for these properties.
|dt.operator.fileOutput.prop.filePath||Output path for HDFS||String||/user/appuser/output|
|dt.operator.fileOutput.prop.outputFileName||Output file name||String||output.txt|
|dt.operator.kafkaInput.prop.clusters||Comma separated list of kafka-brokers||String||node1.company.com:9098, node2.company.com:9098, node3.company.com:9098|
|dt.operator.kafkaInput.prop.initialOffset||Initial offset to read from Kafka||String||
|dt.operator.kafkaInput.prop.topics||Topics to read from Kafka||String||event_data|
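Before launching, it can help to check that every mandatory property has a value and that the broker list is well formed. The following is a minimal sketch of such a pre-flight check. The helper names parse_brokers and missing_properties are hypothetical and are not part of the DataTorrent API, and the host:port format with optional spaces after commas is assumed from the example row above.

```python
# Mandatory property keys, copied from the table above.
MANDATORY_KEYS = (
    "dt.operator.fileOutput.prop.filePath",
    "dt.operator.fileOutput.prop.outputFileName",
    "dt.operator.kafkaInput.prop.clusters",
    "dt.operator.kafkaInput.prop.initialOffset",
    "dt.operator.kafkaInput.prop.topics",
)


def parse_brokers(clusters):
    """Split 'host:port, host:port' into a list of (host, port) tuples."""
    brokers = []
    for entry in clusters.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


def missing_properties(config):
    """Return the mandatory keys that are absent or blank in config."""
    return [key for key in MANDATORY_KEYS
            if not str(config.get(key, "")).strip()]


if __name__ == "__main__":
    config = {
        "dt.operator.fileOutput.prop.filePath": "/user/appuser/output",
        "dt.operator.fileOutput.prop.outputFileName": "output.txt",
        "dt.operator.kafkaInput.prop.clusters":
            "node1.company.com:9098, node2.company.com:9098",
        "dt.operator.kafkaInput.prop.topics": "event_data",
    }
    # initialOffset is deliberately left unset here, so it is reported.
    print(missing_properties(config))
    print(parse_brokers(config["dt.operator.kafkaInput.prop.clusters"]))
```

This check only inspects the strings on the client side. It does not contact Kafka or HDFS, so a configuration that passes can still fail at launch.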
There are pre-saved configurations based on the application environment. Recommended settings for the DataTorrent sandbox are in sandbox-memory-conf.xml and for a cluster environment in
|dt.operator.fileOutput.prop.maxLength||Maximum length for output file after which file is rotated||long||Long.MAX_VALUE||Long.MAX_VALUE|
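The default of Long.MAX_VALUE means that, in practice, the output file is never rotated. To rotate at a smaller size, set maxLength to a concrete number. The following is a minimal sketch of the arithmetic. It assumes the value is measured in bytes, which the table does not state, and the helper name rotation_threshold is hypothetical.

```python
# Java's Long.MAX_VALUE: the largest 64-bit signed integer.
JAVA_LONG_MAX = 2**63 - 1


def rotation_threshold(mebibytes):
    """Return a maxLength value for the given size, assuming bytes."""
    value = mebibytes * 1024 * 1024
    if not 0 < value <= JAVA_LONG_MAX:
        raise ValueError("threshold must be positive and fit in a Java long")
    return value


if __name__ == "__main__":
    # Candidate value for dt.operator.fileOutput.prop.maxLength (128 MiB).
    print(rotation_threshold(128))
```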
Steps to customize the application
Make sure the utilities used in the steps below, such as git and Maven, are installed on your machine and available on PATH in your environment variables.
Use the following command to clone the app-templates repository:
git clone git@github.com:DataTorrent/app-templates.git
Change directory to 'app-templates/kafka-to-hdfs-sync':
cd app-templates/kafka-to-hdfs-sync
Import this Maven project into your favorite IDE (e.g. Eclipse).
Change the source code as per your requirements. Some tips are given as commented blocks in Application.java for this project.
Make the corresponding changes in the test case and properties.xml based on your environment.
Compile this project using maven:
mvn clean package
This will generate the application package with the .apa extension in the
Go to the DataTorrent UI Management console in a web browser. Click on the Develop tab from the top navigation bar.
Select Application Packages from the list.
Click on the upload package button and upload the generated .apa file.
The Application Packages page is shown with a listing of all packages. Click on the Launch button for the uploaded application package.
Follow the steps for launching an application.