HDFS to HDFS filter transform Application
This application template demonstrates continuous big data preparation while reading data from a source Hadoop cluster. The input is expected to be in delimited format; it is filtered and transformed according to configurable properties. The prepared data is then written in the desired format to the destination Hadoop cluster. Developers can use and extend this template to build fast, fault-tolerant, and scalable big data applications that supply the business with rich data.
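The read, filter, transform, and write flow described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the template's actual operators: the function names `prepare` and `to_delimited`, the sample records, the `|` input delimiter, and the filter and transform rules are all assumptions made for the example.

```python
import csv
import io


def prepare(lines, delimiter="|", keep=lambda row: True, transform=lambda row: row):
    """Parse delimited lines, drop rows that fail `keep`, then apply `transform`."""
    for row in csv.reader(lines, delimiter=delimiter):
        if keep(row):
            yield transform(row)


def to_delimited(rows, delimiter=","):
    """Serialize rows back into delimited text (here: comma-separated)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical input records: id|name|age
source = ["1|alice|42", "2|bob|17", "3|carol|88"]

prepared = list(
    prepare(
        source,
        keep=lambda row: int(row[2]) >= 18,                       # filter condition (assumed)
        transform=lambda row: [row[0], row[1].upper(), row[2]],  # transform rule (assumed)
    )
)
output = to_delimited(prepared)
```

In the real application the filter condition and the transform expressions would come from the configurable properties rather than from hard-coded lambdas.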
Please send feedback or feature requests to: firstname.lastname@example.org
Join our user discussion group at: email@example.com
The end user must specify values for the following properties.
|Property||Type||Example||Description|
|Input Directory Or File Path||String||||HDFS path for the input file or directory|
|Output Directory Path||String||/user/appuser/output||HDFS path for the output directory. Generally, this is a path on the Hadoop cluster on which the application is running.|
Advanced Properties (optional)
|Property||Default||Type||Description|
|Block Size For Hdfs Splitter||1048576 (1MB)||long||Number of bytes the record reader operator considers at a time when splitting records. Larger block sizes may add latency in the record reader. Suggested value: 1-10 MB.|
|Maximum Readers For Dynamic Partitioning||1||int||Maximum number of partitions for the Block Reader operator.|
|Minimum Readers For Dynamic Partitioning||1||int||Minimum number of partitions for the Block Reader operator.|
|Number Of Blocks Per Window||1||int||The file splitter emits this many blocks per window to downstream operators.|
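To get a feel for how the block size and the number of blocks per window interact, the following back-of-the-envelope sketch estimates how many blocks a single file is split into and how many windows are needed to emit them. It assumes a file is divided into fixed-size blocks with one possibly shorter final block, and that exactly the configured number of blocks is emitted per window; the function names are hypothetical and the actual splitter may behave differently (for example, across multiple files).

```python
DEFAULT_BLOCK_SIZE = 1048576  # 1 MB, the default listed in the table above


def blocks_for_file(file_size_bytes, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Number of blocks for one file, using integer ceiling division."""
    return -(-file_size_bytes // block_size_bytes)


def windows_to_emit(num_blocks, blocks_per_window=1):
    """Number of windows needed if `blocks_per_window` blocks are emitted per window."""
    return -(-num_blocks // blocks_per_window)


# A file one byte larger than 10 MB needs an extra, nearly empty block.
file_size = 10 * 1048576 + 1
num_blocks = blocks_for_file(file_size)                       # 11 blocks at 1 MB each
windows_default = windows_to_emit(num_blocks)                 # 11 windows at 1 block per window
windows_batched = windows_to_emit(num_blocks, blocks_per_window=4)  # 3 windows
```

Raising the number of blocks per window reduces the number of windows needed for a file, at the cost of pushing more data to downstream operators in each window.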