Pipeline Design Pattern in Java

Welcome to my development blog! In this post, we will look at a design pattern that is widely used in software development: the Pipeline Design Pattern. Let's explore together how it can help us process and transform data efficiently.

The Pipeline Design Pattern belongs to the family of structural design patterns. Structural design patterns focus on how classes and objects in a system are composed and structured to achieve greater flexibility and modularity.

It is especially useful when we need to perform multiple transformations or manipulations on our input data. It provides an elegant way to chain different processing steps, where each step handles one specific task and passes its result on to the next step without the steps needing to know about each other.
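To make the idea of chaining concrete before we build our own classes, here is a minimal sketch that uses only the JDK's java.util.function.Function, whose andThen method feeds the result of one function into the next. The class name ChainingSketch and the three toy steps are illustrative assumptions, not part of the solution developed below.

```java
import java.util.function.Function;

// Illustrative only: each Function is one "step", and andThen passes
// the result of one step to the next, which is the core idea of a pipeline.
class ChainingSketch {

    // Step 1: remove surrounding whitespace
    static final Function<String, String> TRIM = String::trim;

    // Step 2: convert to upper case
    static final Function<String, String> UPPER = String::toUpperCase;

    // Step 3: measure the length of the result
    static final Function<String, Integer> LENGTH = String::length;

    // The composed pipeline: TRIM, then UPPER, then LENGTH
    static final Function<String, Integer> PIPELINE = TRIM.andThen(UPPER).andThen(LENGTH);

    // A shorter chain built from the same steps
    static String normalize(String input) {
        return TRIM.andThen(UPPER).apply(input);
    }
}
```

Each step stays a small, independent function; only the composition decides the order in which they run.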

Problem description

Imagine you have a text string containing a mixture of numbers and letters, separated by commas. Your goal is to extract the numbers from this string and then calculate the Fibonacci number F(n) for each extracted number n. Finally, you must print the resulting Fibonacci numbers in descending order.

The challenge lies in processing the input string efficiently: keeping only the numeric tokens, computing the Fibonacci number for each of them, and sorting the results in descending order so that they are presented clearly.

In summary, we need a program that receives a comma-separated string of numbers and letters, extracts the numbers, calculates the Fibonacci number for each of them and displays the results in descending order. The solution must be efficient and easy to understand, with each processing step clearly separated from the others.
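For comparison, a straightforward solution without any pattern could put everything into a single stream expression. The class BaselineSolution below is a hypothetical sketch, and it uses an iterative Fibonacci (F(0) = 0, F(1) = 1) instead of the recursive one shown later; Stream.toList() assumes Java 16 or newer, as in the rest of this post.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical baseline: extraction, computation and sorting all live in one method.
class BaselineSolution {

    static List<Integer> solve(String input) {
        return Stream.of(input.split(","))
                .filter(s -> s.matches("\\d+"))      // keep only tokens made of digits
                .map(Integer::parseInt)               // "10" -> 10
                .map(BaselineSolution::fibonacci)     // n -> F(n)
                .sorted(Comparator.reverseOrder())    // largest first
                .toList();
    }

    // Iterative Fibonacci with F(0) = 0 and F(1) = 1
    static int fibonacci(int n) {
        int a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            int next = a + b;
            a = b;
            b = next;
        }
        return a;
    }
}
```

This works for a small task, but the three concerns are tangled together, and adding, removing or testing one of them in isolation means editing this method. That is the kind of coupling the pipeline separates.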

Solution with Pipeline Design Pattern

We will address this challenge using the Pipeline Design Pattern. It provides us with an organized and modular structure for processing data in sequence, allowing us to chain a series of processing steps efficiently.

First, we will define an interface called Step that will represent each of the steps in the pipeline. This interface will have a process method that will take an input of one type and produce an output of another type.

public interface Step<T, R> {
    R process(T input);
}
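Because Step declares exactly one abstract method, it is a functional interface, so a step can also be written as a lambda or a method reference instead of a dedicated class. The sketch below assumes that; the @FunctionalInterface annotation and the SPLIT and COUNT steps are additions for illustration only.

```java
import java.util.List;

// The annotation is optional; it only asks the compiler to check
// that the interface keeps exactly one abstract method.
@FunctionalInterface
interface Step<T, R> {
    R process(T input);
}

class StepLambdaSketch {

    // Hypothetical step written as a lambda: split a string on commas
    static final Step<String, List<String>> SPLIT = input -> List.of(input.split(","));

    // Hypothetical step written as a lambda: count the elements of a list
    static final Step<List<String>, Integer> COUNT = list -> list.size();
}
```

For example, COUNT.process(SPLIT.process("a,b,c")) yields 3. Dedicated classes, as used in the rest of this post, remain the better choice when a step has its own logging, documentation or helper methods.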

Next, we will create three classes that will implement the Step interface and represent the specific steps of our pipeline.

The first step, FibonacciNumbersInputExtractStep, extracts the numbers from the input string. It splits the string on commas and uses the regular expression \d+ to keep only the tokens made up entirely of digits. The code is shown below:

import java.util.List;
import java.util.stream.Stream;

public class FibonacciNumbersInputExtractStep implements Step<String, List<String>> {

    /**
     * This method is used to extract the numbers from a comma-separated string
     * @param input a string of comma-separated tokens
     * @return a list of the tokens that consist only of digits
     */
    @Override
    public List<String> process(String input) {
        System.out.println("FibonacciNumbersInputExtractStep.process");
        System.out.printf("input: %s%n", input);
        // Split on commas and keep only tokens made up entirely of digits
        List<String> result = Stream.of(input.split(","))
                .filter(s -> s.matches("\\d+"))
                .toList(); // Stream.toList() requires Java 16+

        System.out.printf("output: %s%n", result);
        return result;
    }
}
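One detail worth checking is what the \d+ filter actually accepts. String.matches requires the whole token to match, so tokens containing spaces, signs or letters are dropped, while leading zeros are kept. The sketch below reproduces the filter in isolation; ExtractFilterSketch and its extractTrimmed variant are hypothetical and not part of the original step.

```java
import java.util.List;
import java.util.stream.Stream;

class ExtractFilterSketch {

    // Same filter as FibonacciNumbersInputExtractStep, without the logging
    static List<String> extract(String input) {
        return Stream.of(input.split(","))
                .filter(s -> s.matches("\\d+")) // whole token must be digits
                .toList();
    }

    // Hypothetical variant: trim each token first, so "1, 2, 3" is also accepted
    static List<String> extractTrimmed(String input) {
        return Stream.of(input.split(","))
                .map(String::trim)
                .filter(s -> s.matches("\\d+"))
                .toList();
    }
}
```

With the input "1, 2,-3,x,04", extract keeps only "1" and "04" (the token " 2" has a leading space), while extractTrimmed keeps "1", "2" and "04". The token "04" is later parsed as 4, and negative numbers are never extracted.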

The second step, FibonacciTransformStep, receives the list of extracted numbers, calculates the Fibonacci number for each of them and sorts the results in descending order. The code is shown below:

import java.util.Comparator;
import java.util.List;

public class FibonacciTransformStep implements Step<List<String>, List<Integer>> {

    /**
     * This method is used to calculate the fibonacci numbers from a list of strings
     * and sort them in descending order
     * @param input a list of strings
     * @return a list of integers sorted in descending order
     */
    @Override
    public List<Integer> process(List<String> input) {
        System.out.println("FibonacciTransformStep.process");
        System.out.printf("input: %s%n", input);
        List<Integer> result = input.stream()
                .map(Integer::parseInt)                    // "10" -> 10
                .map(FibonacciTransformStep::fibonacci)    // n -> F(n)
                .sorted(Comparator.reverseOrder())         // largest first
                .toList();

        System.out.printf("output: %s%n", result);
        return result;
    }

    // Naive recursive definition: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2)
    private static Integer fibonacci(Integer n) {
        if (n <= 1) return n;
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}

In the code above, the process method parses each extracted string into an integer, calculates its Fibonacci number with the recursive fibonacci method and then sorts the results in descending order.
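The recursive fibonacci method recomputes the same subproblems over and over, so its running time grows exponentially with n, and because it works with Integer it silently overflows for n of 47 or more. If larger inputs are expected, an iterative version is one option. The sketch below is an assumption, not the original step: it returns long values, uses Math.addExact so that overflow (beyond F(92)) throws an ArithmeticException instead of wrapping, and IterativeFibonacciSketch is a hypothetical name.

```java
import java.util.Comparator;
import java.util.List;

class IterativeFibonacciSketch {

    // F(0) = 0, F(1) = 1; runs in O(n) time and fails loudly on long overflow
    static long fibonacci(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0: " + n);
        if (n == 0) return 0;
        long previous = 0; // F(i - 2)
        long current = 1;  // F(i - 1), becomes F(i) after each iteration
        for (int i = 2; i <= n; i++) {
            long next = Math.addExact(previous, current);
            previous = current;
            current = next;
        }
        return current;
    }

    // Same transform logic as FibonacciTransformStep, but producing List<Long>
    static List<Long> transform(List<String> input) {
        return input.stream()
                .map(Integer::parseInt)
                .map(IterativeFibonacciSketch::fibonacci)
                .sorted(Comparator.reverseOrder())
                .toList();
    }
}
```

Switching to long changes the step's output type, so the following step would have to accept List<Long>; the pipeline's type parameters make that mismatch a compile-time error rather than a runtime surprise.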

Now, let's move on to the last step of the pipeline: FibonacciLoadStep. This step will receive the calculated Fibonacci numbers and take care of printing them. The code is shown below:

import java.util.List;

public class FibonacciLoadStep implements Step<List<Integer>, List<Integer>> {

    /**
     * This method is used to print the fibonacci numbers
     * @param input a list of integers
     * @return the same list of integers
     */
    @Override
    public List<Integer> process(List<Integer> input) {
        System.out.println("FibonacciLoadStep.process");
        System.out.printf("input: %s%n", input);
        System.out.printf("output: %s%n", input);
        return input;
    }
}

In the above code, the process method takes the list of Fibonacci numbers, prints it to the console and returns it unchanged, so further steps could still be appended after it.
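Because FibonacciLoadStep writes directly to System.out, its output is awkward to verify in a unit test. One possible variant, sketched below under the assumption that we are free to add a constructor parameter, injects the PrintStream instead. PrintingLoadStep and PrintingLoadStepDemo are hypothetical names, and Step is repeated so the sketch compiles on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.List;

interface Step<T, R> {
    R process(T input);
}

// Hypothetical load step: the destination is passed in, so the same step can
// print to the console (System.out) or to an in-memory buffer in tests.
class PrintingLoadStep<T> implements Step<List<T>, List<T>> {

    private final PrintStream out;

    PrintingLoadStep(PrintStream out) {
        this.out = out;
    }

    @Override
    public List<T> process(List<T> input) {
        input.forEach(out::println); // one value per line
        return input;                // pass-through, so more steps can follow
    }
}

class PrintingLoadStepDemo {

    // Runs the step against a buffer and returns everything it wrote
    static String capture(List<Integer> values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream stream = new PrintStream(buffer, true);
        new PrintingLoadStep<Integer>(stream).process(values);
        return buffer.toString();
    }
}
```

In the real pipeline, new PrintingLoadStep<Integer>(System.out) would keep the console behavior.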

Now, let's use the Pipeline design pattern to chain these steps and solve our problem.

The EtlPipeLine class is responsible for organizing and executing the pipeline steps. Its constructor takes the first step. The addStep method does not modify the pipeline; it returns a new EtlPipeLine whose step runs the current chain and then feeds the result into the new step, so the output type R of the current pipeline must match the input type of the added step. The execute method runs the input through all the steps in order and returns the final result. The code is shown below:

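public class EtlPipeLine<T, R> {

    // The whole chain built so far, seen as a single step from T to R
    private final Step<T, R> currentStep;

    public EtlPipeLine(Step<T, R> currentStep) {
        this.currentStep = currentStep;
    }

    // Returns a NEW pipeline that runs the current chain and then newStep;
    // the output type R of this pipeline must be the input type of newStep
    public <K> EtlPipeLine<T, K> addStep(Step<R, K> newStep) {
        return new EtlPipeLine<>(input -> newStep.process(currentStep.process(input)));
    }

    // Runs the input through every step in order and returns the final result
    public R execute(T input) {
        return currentStep.process(input);
    }

}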
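Since addStep returns a new pipeline instead of changing the current one, a partially built pipeline can serve as the base for several different pipelines. The sketch below repeats Step and EtlPipeLine (with package-private members) so it compiles on its own; the SPLIT, COUNT and FIRST pipelines are illustrative only.

```java
import java.util.List;

interface Step<T, R> {
    R process(T input);
}

// Same shape as the EtlPipeLine class above
class EtlPipeLine<T, R> {
    private final Step<T, R> currentStep;

    EtlPipeLine(Step<T, R> currentStep) {
        this.currentStep = currentStep;
    }

    <K> EtlPipeLine<T, K> addStep(Step<R, K> newStep) {
        // Wrap the existing chain in a new lambda; "this" is left untouched
        return new EtlPipeLine<>(input -> newStep.process(currentStep.process(input)));
    }

    R execute(T input) {
        return currentStep.process(input);
    }
}

class AddStepDemo {
    // Base pipeline: String -> List<String>
    static final EtlPipeLine<String, List<String>> SPLIT =
            new EtlPipeLine<>(s -> List.of(s.split(",")));

    // Two independent extensions of the same base pipeline
    static final EtlPipeLine<String, Integer> COUNT = SPLIT.addStep(list -> list.size());
    static final EtlPipeLine<String, String> FIRST = SPLIT.addStep(list -> list.get(0));
}
```

After both extensions are built, SPLIT still returns the split list, COUNT returns its size and FIRST returns its first element. The immutability also means a finished pipeline can be shared between threads, as long as the steps themselves are thread-safe.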

Finally, in the PipelineExample class, we will use the EtlPipeLine class to create an instance of the pipeline and add the three steps in order. The code is shown below:

public class PipelineExample {
    public static void main(String[] args) {
        var pipeline = new EtlPipeLine<>(new FibonacciNumbersInputExtractStep())
                .addStep(new FibonacciTransformStep())
                .addStep(new FibonacciLoadStep());

        pipeline.execute("hsdhs,sds,1,2,a,3,4,5,6,7,8,9,10");
    }
}

When running the program, we will get the following output:

Task :PipelineExample.main()

FibonacciNumbersInputExtractStep.process
input: hsdhs,sds,1,2,a,3,4,5,6,7,8,9,10
output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

FibonacciTransformStep.process
input: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
output: [55, 34, 21, 13, 8, 5, 3, 2, 1, 1]

FibonacciLoadStep.process
input: [55, 34, 21, 13, 8, 5, 3, 2, 1, 1]
output: [55, 34, 21, 13, 8, 5, 3, 2, 1, 1]

Now that we have seen how the Pipeline design pattern works, let's take a look at its advantages and disadvantages.

Advantages

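The Pipeline design pattern has a number of advantages, including:

Efficiency: Because each step depends only on the output of the previous one, steps can work on different items concurrently, or independent inputs can be processed in parallel, which can improve throughput. The implementation in this post runs sequentially, so this benefit requires additional concurrency code.
Flexibility: Steps can be added, removed or reordered without changing the other steps, as long as the output type of each step matches the input type of the next.
Scalability: Each step can be scaled independently, for example by giving a slow step more threads or machines, and new stages can be added as requirements grow.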
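To illustrate the flexibility point, the following sketch inserts an extra step into an existing chain without touching the other steps. It repeats the shape of Step and EtlPipeLine from this post so it compiles on its own; the DISTINCT step is a hypothetical addition, and the Fibonacci calculation is left out to keep the example short.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

interface Step<T, R> {
    R process(T input);
}

// Same shape as the EtlPipeLine class in this post
class EtlPipeLine<T, R> {
    private final Step<T, R> currentStep;

    EtlPipeLine(Step<T, R> currentStep) { this.currentStep = currentStep; }

    <K> EtlPipeLine<T, K> addStep(Step<R, K> newStep) {
        return new EtlPipeLine<>(input -> newStep.process(currentStep.process(input)));
    }

    R execute(T input) { return currentStep.process(input); }
}

class FlexibilitySketch {

    // Extract: keep only digit-only tokens, like FibonacciNumbersInputExtractStep
    static final Step<String, List<String>> EXTRACT =
            s -> Stream.of(s.split(",")).filter(t -> t.matches("\\d+")).toList();

    // Hypothetical new step: drop repeated numbers; its input and output types are
    // both List<String>, so it fits between EXTRACT and SORT_DESC
    static final Step<List<String>, List<String>> DISTINCT =
            list -> list.stream().distinct().toList();

    // Transform (simplified): parse and sort in descending order
    static final Step<List<String>, List<Integer>> SORT_DESC =
            list -> list.stream().map(Integer::parseInt).sorted(Comparator.reverseOrder()).toList();

    static final EtlPipeLine<String, List<Integer>> WITHOUT_DISTINCT =
            new EtlPipeLine<>(EXTRACT).addStep(SORT_DESC);

    static final EtlPipeLine<String, List<Integer>> WITH_DISTINCT =
            new EtlPipeLine<>(EXTRACT).addStep(DISTINCT).addStep(SORT_DESC);
}
```

For the input "3,x,1,3", the original chain produces [3, 3, 1] and the extended chain produces [3, 1]; neither EXTRACT nor SORT_DESC had to change.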

Disadvantages

The Pipeline design pattern also has a few disadvantages, including:

Complexity: For small tasks, splitting the work into several classes plus a generic pipeline adds indirection compared with a single method.
Difficulty debugging: Because the steps are composed into nested lambdas, stack traces and breakpoints do not point directly at a named stage, which makes it harder to locate a failure in a long pipeline.
Difficulty maintaining: Adjacent steps are coupled through their input and output types, so changing what one step produces can force changes in every step that follows it.
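One way to soften the debugging problem is to move logging out of the individual steps and into a decorator that can wrap any step. The LoggingStep class below is a hypothetical sketch, not part of the original code: it records the step name, input and output in a list supplied by the caller, and records the step name when an exception passes through it. Step is repeated so the sketch compiles on its own.

```java
import java.util.List;

interface Step<T, R> {
    R process(T input);
}

// Hypothetical decorator: wraps any step and logs what goes in and out,
// so the println calls no longer need to be copied into every step class.
class LoggingStep<T, R> implements Step<T, R> {
    private final String name;
    private final Step<T, R> delegate;
    private final List<String> log;

    LoggingStep(String name, Step<T, R> delegate, List<String> log) {
        this.name = name;
        this.delegate = delegate;
        this.log = log;
    }

    @Override
    public R process(T input) {
        log.add(name + " input: " + input);
        try {
            R output = delegate.process(input);
            log.add(name + " output: " + output);
            return output;
        } catch (RuntimeException e) {
            // Record which named step failed, then rethrow the original exception
            log.add(name + " failed: " + e);
            throw e;
        }
    }
}
```

Each step passed to addStep could then be wrapped, for example new LoggingStep<>("transform", new FibonacciTransformStep(), log), keeping the step classes focused on their own logic.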

Conclusions

In conclusion, the Pipeline design pattern provides a structured and modular solution for processing data sequentially. By chaining together a series of steps, the pipeline improves code readability, maintainability and reusability. Although it can present complexity and debugging challenges in more complex pipelines, the Pipeline pattern offers flexibility and extensibility, making it a valuable tool for addressing data processing problems in a sequential flow.

All the code snippets mentioned in the article can be found on GitHub.