Arbiter performance for HP tuning

Hi, I am currently working on a CNN and using Arbiter to tune its hyperparameters.
I tried the network model below.
As I understand the DL4J performance guide, cuDNN GPU acceleration is only implemented for specific layer objects and not for Arbiter layer spaces. Correct me if I am wrong.

For comparison, hyperparameter tuning in Python is able to evaluate 20 models in 130 seconds,
while Arbiter takes around 13 minutes for 20 models.
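
To put the gap in numbers, here is a small sketch that converts both totals into seconds per model. The class and method names are made up for illustration only, and I am treating "13 minutes" as roughly 780 seconds, so the result is approximate.

```java
import java.util.Locale;

public class TuningThroughput {
    // Average wall-clock seconds spent per evaluated candidate model.
    static double secondsPerModel(double totalSeconds, int models) {
        if (models <= 0) {
            throw new IllegalArgumentException("models must be positive");
        }
        return totalSeconds / models;
    }

    // How many times slower the second run is than the first, per model.
    static double slowdown(double baselineSeconds, double otherSeconds, int models) {
        return secondsPerModel(otherSeconds, models) / secondsPerModel(baselineSeconds, models);
    }

    public static void main(String[] args) {
        int models = 20;
        double pythonTotal = 130.0;       // seconds, from the Python run
        double arbiterTotal = 13 * 60.0;  // assumption: "13 minutes" taken as 780 seconds

        System.out.printf(Locale.ROOT, "Python:  %.1f s/model%n", secondsPerModel(pythonTotal, models));
        System.out.printf(Locale.ROOT, "Arbiter: %.1f s/model%n", secondsPerModel(arbiterTotal, models));
        System.out.printf(Locale.ROOT, "Slowdown: %.1fx%n", slowdown(pythonTotal, arbiterTotal, models));
    }
}
```

With the numbers above that works out to 6.5 seconds per model in Python against about 39 seconds per model in Arbiter, so roughly a 6x gap.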

Is this normal for Arbiter?
I have already increased the heap memory and batch size for Arbiter to match the Python setup.
Both runs use the GPU, with CUDA 10.0.
GPU: Nvidia RTX 2060
CPU: AMD Ryzen 5 2600
Memory: 16 GB
Hyperparameters being tuned: size of each convolutional layer, learning rate
The models being compared are the same, and both use the same MNIST dataset.

Am I doing it right?

        ParameterSpace<Double> learningRateHyperparam = new ContinuousParameterSpace(0.0001, 0.1);  //Values will be generated uniformly at random between 0.0001 and 0.1 (inclusive)
        ParameterSpace<Integer> layerSizeHyperparam = new IntegerParameterSpace(16, 256);            //Integer values will be generated uniformly at random between 16 and 256 (inclusive)
        ParameterSpace<Integer> inputLayerSizeHyperparam = new IntegerParameterSpace(32,256);
        MultiLayerSpace hyperparameterSpace = new MultiLayerSpace.Builder()
            .weightInit(WeightInit.XAVIER)
            //hyperparameter space for learning rate
            .updater(new SgdSpace(learningRateHyperparam))
            .l2(0.0001)
            .seed(123)
            .setInputType(InputType.convolutionalFlat(28,28,1))
            .addLayer(new ConvolutionLayerSpace.Builder()
                .kernelSize(3,3)
                .nOut(inputLayerSizeHyperparam)
                .activation(Activation.RELU)
                .build())
            .addLayer(new SubsamplingLayerSpace.Builder()
                .poolingType(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(3, 3)
                .stride(1,1)
                .build())
            .addLayer(new ConvolutionLayerSpace.Builder()
                .kernelSize(3,3)
                .nOut(layerSizeHyperparam)
                .activation(Activation.RELU)
                .build())
            .addLayer(new ConvolutionLayerSpace.Builder()
                .kernelSize(3,3)
                .nOut(layerSizeHyperparam)
                .activation(Activation.RELU)
                .build())
            .addLayer(new ConvolutionLayerSpace.Builder()
                .kernelSize(3,3)
                .nOut(layerSizeHyperparam)
                .activation(Activation.RELU)
                .build())
            .addLayer(new OutputLayerSpace.Builder()
                .nOut(10)
                .lossFunction(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .build())
            .numEpochs(1)
            .build();

Thanks in advance for your advice.