Create Pipelines using your own Apache Flink® Jobs

A pipeline is a set of data processing instructions that are written in SQL or expressed as an Apache Flink® job. Pipelines created from an Apache Flink job are called custom pipelines. When you create a custom pipeline, you define how you want your data to be processed in a JVM-based programming language of your choice, such as Java or Scala. For example, you can create a pipeline that enriches incoming data by invoking an API from within the pipeline and then sends the enriched data to a destination of your choosing.

Create a custom pipeline if SQL is too inflexible for your use case, or if you have an existing Apache Flink workload that you would like to migrate to Decodable. Once created, you can upload the pipeline to Decodable as a JAR file, where it can be managed alongside any SQL-based pipelines that you have.
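
To give a feel for what that enrichment might look like inside a Flink job, here is a minimal sketch of a map function that calls an external API for each record. GeoLookupClient and its lookup() method are hypothetical placeholders for whatever service you invoke; only the Flink classes are real.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    // Sketch of the enrichment idea above: call an external API per record.
    public class EnrichmentFunction extends RichMapFunction<String, String> {

        private transient GeoLookupClient client;  // hypothetical HTTP client, not serializable

        @Override
        public void open(Configuration parameters) {
            // Create non-serializable resources once per task, not in the constructor
            client = new GeoLookupClient();
        }

        @Override
        public String map(String record) throws Exception {
            // Enrich each incoming record with the API response
            return record + "," + client.lookup(record);
        }
    }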

This feature is currently available as a Tech Preview. If you would like early access to this feature, contact us or request access in Decodable Web on the Pipelines page.

In addition, role-based access control is not currently supported for custom pipelines. To access Decodable streams in a custom pipeline, use the Decodable Pipeline SDK. See this example.

Create and upload a custom pipeline

Perform the following steps to create an Apache Flink job and upload it as a custom pipeline in Decodable.

Prerequisites

Before creating an Apache Flink job, confirm that you have the following installed on your machine:

  • Apache Maven

  • Flink 1.16.1 or higher

  • IntelliJ or any other IDE of your choosing

    • We recommend using IntelliJ, since it supports Maven projects out of the box.

    • For applications to run within IntelliJ, make sure that the Include dependencies with "Provided" scope setting is enabled in the run configuration.

include dependencies screenshot

  • Java 8 or 11

Steps

Perform the following steps to create an Apache Flink job that can be uploaded as a custom Decodable pipeline.

  1. Use Apache Maven to initialize a new Java application.

    mvn archetype:generate \
        -DarchetypeGroupId=org.apache.flink \
        -DarchetypeArtifactId=flink-quickstart-java \
        -DarchetypeVersion=1.16.1 \
        -DgroupId=org.example \
        -DartifactId=decodable-custom-pipeline \
        -Dversion=0.1 \
        -Dpackage=quickstart \
        -DinteractiveMode=false
  2. Navigate to the decodable-custom-pipeline directory. There you will find a pom.xml file with dependency definitions and a src/ directory.

  3. Import the decodable-custom-pipeline directory into IntelliJ or an IDE of your choosing.

  4. Start developing your Flink job with the Decodable Pipeline SDK. See the following pages for further assistance, and the sketch below for the general shape of such a job.

    1. See Decodable Pipeline SDK for instructions on retrieving the SDK.

    2. See the Decodable Pipeline SDK Javadoc for documentation on the SDK.
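
    For orientation, here is a minimal sketch of a job that reads from one Decodable stream and writes to another. It assumes the SDK's builder-style DecodableStreamSource and DecodableStreamSink API; the stream names and the map step are placeholders, and you should consult the SDK Javadoc for the exact class names and signatures.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    import co.decodable.sdk.pipeline.DecodableStreamSink;
    import co.decodable.sdk.pipeline.DecodableStreamSource;

    public class DataStreamJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Read from and write to existing Decodable streams; the stream
            // names below are examples
            DecodableStreamSource<String> source = DecodableStreamSource.<String>builder()
                    .withStreamName("purchase-orders")
                    .build();

            DecodableStreamSink<String> sink = DecodableStreamSink.<String>builder()
                    .withStreamName("purchase-orders-processed")
                    .build();

            DataStream<String> stream = env.fromSource(
                    source, WatermarkStrategy.noWatermarks(), "purchase-orders source");

            // Any DataStream transformation can go here
            stream.map(String::toUpperCase).sinkTo(sink);

            env.execute("My custom pipeline");
        }
    }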

  5. When you are finished developing your Flink job, package it into a JAR file so that you can upload it to Decodable.

    $ pwd
    /private/tmp/decodable-custom-pipeline
    $ mvn clean package
  6. Make sure the target/ directory now contains a file called decodable-custom-pipeline-0.1.jar.

    $ ls target
    classes                                    maven-archiver
    decodable-custom-pipeline-0.1.jar          maven-status
    generated-sources
  7. Upload the JAR file to Decodable.

    • Using Decodable Web:

      • Navigate to the Pipelines page.

      • Select the dropdown icon next to New Pipeline, and then select Upload Custom Pipeline. Follow the prompts on the page to upload the pipeline.

    • Using the Decodable CLI:

      • Run the following command.

        decodable pipeline create --name <some name> --job-file target/decodable-custom-pipeline-0.1.jar

        The arguments are defined as follows:

        • --name: The name that you want to assign to the pipeline.

        • --job-file: The path to the JAR file containing the Flink job that you want to upload.

        • --job-arguments: Optional job arguments for the custom pipeline. See the sketch after these steps for one way job arguments reach your code.

        • --entry-class: Optional entry class of the custom pipeline. If not provided, the entry class must be specified in the file META-INF/MANIFEST.MF in the pipeline’s JAR file, using the 'Main-Class' property key.
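
As an illustration of --job-arguments and --entry-class: the entry class is the class whose main method Flink invokes, and values supplied via --job-arguments typically arrive as that method's arguments, which is standard Flink behavior. Flink's ParameterTool is one convenient way to parse them; the api-url key below is made up for illustration.

    import org.apache.flink.api.java.utils.ParameterTool;

    public class DataStreamJob {

        public static void main(String[] args) throws Exception {
            // Arguments passed via --job-arguments arrive here; parse "--key value"
            // pairs with Flink's ParameterTool. "api-url" is a hypothetical key.
            ParameterTool params = ParameterTool.fromArgs(args);
            String apiUrl = params.get("api-url", "https://api.example.com");

            // ... build and execute the Flink job using apiUrl ...
        }
    }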

The uploaded custom pipeline now appears on the Pipelines page, and you can activate it to start processing data!

Monitor a custom pipeline

Once you’ve activated your custom pipeline, you’ll see some basic information about it.

The Flink Web Dashboard button opens the Flink UI in a new window. You can use the Flink UI to view health and performance metrics, such as checkpoint monitoring, backpressure monitoring, and general throughput information.

custom pipeline ui

Update a custom pipeline

Every time you start a custom pipeline, the latest JAR file for that pipeline is used. To update an existing custom pipeline, do the following.

  1. Upload a new JAR file for the pipeline you want to update.

  2. Restart the pipeline to pick up the latest JAR file.

State management

When you stop a pipeline, the state of your pipeline is automatically saved and backed up using a Flink savepoint or checkpoint. When you start the pipeline again, we’ll use the state to determine where to resume data processing.
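
To make this concrete: anything your job keeps in Flink's managed state is included in that savepoint. In the hypothetical sketch below, the per-record counter survives a stop/start cycle because it lives in ValueState. (This function must run on a keyed stream, i.e. after a keyBy.)

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Counts events per key. The count lives in Flink managed state, so it is
    // captured in the savepoint on stop and restored on the next start.
    public class EventCounter extends RichFlatMapFunction<String, String> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("event-count", Long.class));
        }

        @Override
        public void flatMap(String event, Collector<String> out) throws Exception {
            Long current = count.value();
            long updated = (current == null ? 0 : current) + 1;
            count.update(updated);
            out.collect(event + " seen " + updated + " times");
        }
    }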

If you want to start a custom pipeline without restoring its state, use the --force flag in the CLI or the Discard State option in the UI.