Arbiter performance for hyperparameter tuning

Hi, I am currently working on a CNN and using Arbiter to tune its hyperparameters.
I tried the network model below.
As I understand the DL4J performance guide, cuDNN GPU acceleration is only implemented for specific layer classes and not for Arbiter layer spaces. Correct me if I am wrong.

For comparison, hyperparameter tuning in Python evaluates 20 models in 130 seconds,
while Arbiter takes 13 minutes for the same 20 models.
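
To make the gap concrete, here is a small plain-Java sketch of the per-model arithmetic. It only uses the two totals quoted above; the class and method names are made up for illustration and are not part of DL4J or Arbiter.

```java
public class TuningTimeComparison {
    // Average wall-clock seconds spent on each candidate model.
    public static double secondsPerModel(double totalSeconds, int models) {
        return totalSeconds / models;
    }

    public static void main(String[] args) {
        int models = 20;
        double pythonTotal = 130.0;       // 130 seconds, as measured
        double arbiterTotal = 13 * 60.0;  // 13 minutes = 780 seconds

        double pythonPerModel = secondsPerModel(pythonTotal, models);
        double arbiterPerModel = secondsPerModel(arbiterTotal, models);

        System.out.println("Python seconds per model:  " + pythonPerModel);
        System.out.println("Arbiter seconds per model: " + arbiterPerModel);
        System.out.println("Slowdown factor: " + (arbiterPerModel / pythonPerModel));
    }
}
```

That works out to 6.5 seconds per model in Python versus 39 seconds per model in Arbiter, so Arbiter is roughly 6 times slower per candidate in my runs.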

Is this normal for Arbiter?
I have already increased the heap memory and batch size for Arbiter to match the Python setup.
Both runs use the GPU with CUDA 10.0.
GPU: Nvidia RTX 2060
CPU: AMD Ryzen 5 2600
Memory: 16 GB
Hyperparameters being tuned: size of each convolutional layer, learning rate
The models being compared are the same, and both use the same MNIST dataset.

Am I doing it right?

        ParameterSpace<Double> learningRateHyperparam = new ContinuousParameterSpace(0.0001, 0.1);  //Values will be generated uniformly at random between 0.0001 and 0.1 (inclusive)
        ParameterSpace<Integer> layerSizeHyperparam = new IntegerParameterSpace(16, 256);            //Integer values will be generated uniformly at random between 16 and 256 (inclusive)
        ParameterSpace<Integer> inputLayerSizeHyperparam = new IntegerParameterSpace(32,256);
        MultiLayerSpace hyperparameterSpace = new MultiLayerSpace.Builder()
            // Hyperparameter space for the learning rate
            .updater(new SgdSpace(learningRateHyperparam))
            .addLayer(new ConvolutionLayerSpace.Builder()
            .addLayer(new SubsamplingLayerSpace.Builder()
                .kernelSize(3, 3)
            .addLayer(new ConvolutionLayerSpace.Builder()
            .addLayer(new ConvolutionLayerSpace.Builder()
            .addLayer(new ConvolutionLayerSpace.Builder()
            .addLayer(new OutputLayerSpace.Builder()
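
For reference, this is how I picture the two kinds of ranges in the snippet above being sampled. It is only a plain-Java sketch of uniform sampling with java.util.Random, not how Arbiter generates candidates internally; the class and method names are hypothetical.

```java
import java.util.Random;

public class UniformSamplingSketch {
    // Uniform double between min and max. Random.nextDouble() returns a value
    // in [0, 1), so in this sketch the upper bound is effectively exclusive.
    public static double sampleDouble(Random rng, double min, double max) {
        return min + rng.nextDouble() * (max - min);
    }

    // Uniform integer between min and max, both ends inclusive.
    public static int sampleInt(Random rng, int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        Random rng = new Random(12345); // fixed seed so runs are repeatable
        for (int i = 0; i < 3; i++) {
            double learningRate = sampleDouble(rng, 0.0001, 0.1); // same range as learningRateHyperparam
            int layerSize = sampleInt(rng, 16, 256);              // same range as layerSizeHyperparam
            System.out.println("candidate " + i + ": learningRate=" + learningRate
                    + ", layerSize=" + layerSize);
        }
    }
}
```

Each candidate is one independent draw from both ranges, so 20 candidates means 20 separately built and trained networks.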

Thanks in advance for your advice.