I have a bunch of flows to/from Kafka, HDFS, files, and databases.
Currently I’m running these as separate JVM processes, each with its own configuration.
Is there any orchestration tool or user interface, similar to Apache NiFi, for deploying this kind of setup?
If not, would anyone be interested in an open source project for this?
Lightbend recently released Lightbend Pipelines, a tool aimed at application developers who build and deploy streaming data pipelines on Kubernetes. A pipeline is composed of processing steps written with Akka Streams or Spark (more options, such as Flink, are coming), each running in its own JVM. These processing steps are called "streamlets"; streamlets are wired together via a blueprint and the whole application is deployed with a single CLI command. Streamlets communicate with one another via Kafka.

Our target use case is applications rather than data-science tooling. HDFS can be treated as an egress, as can files and databases, but the main goal is to ingest, process, and then serve real-time analysis on streams of data.

You can learn more about Pipelines in the documentation at https://developer.lightbend.com/docs/pipelines/current/, or read a higher-level description at https://www.lightbend.com/lightbend-pipelines-accelerate-real-time-journey.
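To give a concrete feel for the wiring step, here is a rough sketch of a blueprint that connects an ingress, a processor, and an egress. The streamlet class names (sensors.SensorIngress and so on) are hypothetical, and the exact syntax can vary between releases, so treat this purely as an illustration and refer to the documentation linked above for the authoritative format:

    blueprint {
      streamlets {
        # Each entry maps a logical name to a streamlet implementation class.
        ingress   = sensors.SensorIngress      # reads records in (e.g. from Kafka)
        processor = sensors.MetricsProcessor   # Akka Streams or Spark processing logic
        egress    = sensors.MetricsEgress      # writes results out (e.g. HDFS, database)
      }
      connections {
        # Outlets on the left feed the inlets listed on the right;
        # the data between them flows over Kafka.
        ingress.out   = [processor.in]
        processor.out = [egress.in]
      }
    }

Once the blueprint verifies, the application is deployed as a whole with one CLI command (see the deployment section of the docs for the exact invocation), which is what replaces managing each JVM process and its configuration by hand.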