Tiny YOLO PredictedObjects NaN

Hi guys,

I'm trying to run the code from this page: How to build a custom object detector using Yolo. It is an object detector for a Rubik's cube.

Below is the code; I made a few small changes:

    package com.dl4j.yolo.sample;   
 
    import java.io.File;
    import java.io.IOException;
    import java.io.Serializable;
    import java.net.URI;
    import java.util.List;
    import java.util.Random;
    
    import org.bytedeco.opencv.opencv_java;
    import org.datavec.api.io.filters.BalancedPathFilter;
    import org.datavec.api.io.labels.ParentPathLabelGenerator;
    import org.datavec.api.records.metadata.RecordMetaDataImageURI;
    import org.datavec.api.split.FileSplit;
    import org.datavec.api.split.InputSplit;
    import org.datavec.image.loader.NativeImageLoader;
    import org.datavec.image.recordreader.objdetect.ObjectDetectionRecordReader;
    import org.datavec.image.recordreader.objdetect.impl.VocLabelProvider;
    import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.ConvolutionMode;
    import org.deeplearning4j.nn.conf.GradientNormalization;
    import org.deeplearning4j.nn.conf.WorkspaceMode;
    import org.deeplearning4j.nn.conf.inputs.InputType;
    import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
    import org.deeplearning4j.nn.conf.layers.objdetect.Yolo2OutputLayer;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.deeplearning4j.nn.layers.objdetect.DetectedObject;
    import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
    import org.deeplearning4j.nn.transferlearning.TransferLearning;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
    import org.deeplearning4j.util.ModelSerializer;
    import org.deeplearning4j.zoo.model.TinyYOLO;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.learning.config.RmsProp;
    import org.opencv.core.Mat;
    import org.opencv.core.Point;
    import org.opencv.core.Scalar;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    public class YOLOTrainer {
    	 private static final Logger log = LoggerFactory.getLogger(YOLOTrainer.class);
    
    	    private static final int INPUT_WIDTH = 416;
    	    private static final int INPUT_HEIGHT = 416;
    	    private static final int CHANNELS = 3;
    
    	    private static final int GRID_WIDTH = 13;
    	    private static final int GRID_HEIGHT = 13;
    	    private static final int CLASSES_NUMBER = 1;
    	    private static final int BOXES_NUMBER = 5;
    	    private static final double[][] PRIOR_BOXES = {{1.5, 1.5}, {2, 2}, {3, 3}, {3.5, 8}, {4, 9}};
    
    	    private static final int BATCH_SIZE = 4;
    	    private static final int EPOCHS = 50;
    	    private static final double LEARNIGN_RATE = 0.0001;
    	    private static final int SEED = 7854;
    
    	    /* Parent dataset folder "DATA_DIR" contains two subfolders: "images" and "annotations" */
    	    private static final String DATA_DIR = "C:\\Java\\Dataset";
    
    	    /* YOLO loss function parameters; for more info see
    	    https://stats.stackexchange.com/questions/287486/yolo-loss-function-explanation */
    	    private static final double LAMDBA_COORD = 1.0;
    	    private static final double LAMDBA_NO_OBJECT = 0.5;
    
    	    public static void main(String[] args) throws IOException, InterruptedException {
    
    	        Random rng = new Random(SEED);
    
    	        //Initialize the user interface backend; it is similar to TensorBoard.
    	        //it starts at http://localhost:9000
    	        //UIServer uiServer = UIServer.getInstance();
    
    	        //Configure where the network information (gradients, score vs. time etc) is to be stored. Here: store in memory.
    	        //StatsStorage statsStorage = new InMemoryStatsStorage();
    
    	        //Attach the StatsStorage instance to the UI: this allows the contents of the StatsStorage to be visualized
    	        //uiServer.attach(statsStorage);
    
    	        File imageDir = new File(DATA_DIR, "images");
    
    	        log.info("Load data...");
    	        
    	        ParentPathLabelGenerator LABEL_GENERATOR_MAKER = new ParentPathLabelGenerator();
    	        BalancedPathFilter PATH_FILTER = new BalancedPathFilter(rng, NativeImageLoader.ALLOWED_FORMATS, LABEL_GENERATOR_MAKER);
    
    	        InputSplit[] data = new FileSplit(imageDir, NativeImageLoader.ALLOWED_FORMATS, rng).sample(PATH_FILTER, 85, 15);
    	        InputSplit trainData = data[0];
    	        InputSplit testData = data[1];
    
    	        ObjectDetectionRecordReader recordReaderTrain = new ObjectDetectionRecordReader(INPUT_HEIGHT, INPUT_WIDTH, CHANNELS,
    	                GRID_HEIGHT, GRID_WIDTH, new VocLabelProvider(DATA_DIR));
    	        recordReaderTrain.initialize(trainData);
    
    	        ObjectDetectionRecordReader recordReaderTest = new ObjectDetectionRecordReader(INPUT_HEIGHT, INPUT_WIDTH, CHANNELS,
    	                GRID_HEIGHT, GRID_WIDTH, new VocLabelProvider(DATA_DIR));
    	        recordReaderTest.initialize(testData);
    
    	        RecordReaderDataSetIterator train = new RecordReaderDataSetIterator(recordReaderTrain, BATCH_SIZE, 1, 1, true);
    	        train.setPreProcessor(new ImagePreProcessingScaler(0, 1));
    
    	        RecordReaderDataSetIterator test = new RecordReaderDataSetIterator(recordReaderTest, BATCH_SIZE, 1, 1, true);
    	        test.setPreProcessor(new ImagePreProcessingScaler(0, 1));
    
    	        //Load the pretrained TinyYOLO model; its last layers are replaced below for transfer learning
    	        ComputationGraph pretrained = (ComputationGraph) TinyYOLO.builder().build().initPretrained();
    
    	        INDArray priors = Nd4j.create(PRIOR_BOXES);
    	        FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    	                .seed(SEED)
    	                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    	                .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
    	                .gradientNormalizationThreshold(1.0)
    	                .updater(new RmsProp(LEARNIGN_RATE))
    	                .activation(Activation.IDENTITY).miniBatch(true)
    	                .trainingWorkspaceMode(WorkspaceMode.ENABLED)
    	                .build();
    
    	        ComputationGraph model = new TransferLearning.GraphBuilder(pretrained)
    	                .fineTuneConfiguration(fineTuneConf)
    	                .setInputTypes(InputType.convolutional(INPUT_HEIGHT, INPUT_WIDTH, CHANNELS))
    	                .removeVertexKeepConnections("conv2d_9")
    	                .removeVertexKeepConnections("outputs")
    	                .addLayer("convolution2d_9",
    	                        new ConvolutionLayer.Builder(1, 1)
    	                                .nIn(1024)
    	                                .nOut(BOXES_NUMBER * (5 + CLASSES_NUMBER))
    	                                .stride(1, 1)
    	                                .convolutionMode(ConvolutionMode.Same)
    	                                .weightInit(WeightInit.UNIFORM)
    	                                .hasBias(false)
    	                                .activation(Activation.IDENTITY)
    	                                .build(), "leaky_re_lu_8")
    	                .addLayer("outputs",
    	                        new Yolo2OutputLayer.Builder()
    	                                .lambdaNoObj(LAMDBA_NO_OBJECT)
    	                                .lambdaCoord(LAMDBA_COORD)
    	                                .boundingBoxPriors(priors)
    	                                .build(), "convolution2d_9")
    	                .setOutputs("outputs")
    	                .build();
    
    	        log.info("\n Model Summary \n" + model.summary());
    
    	        log.info("Train model...");
    	        model.setListeners(new ScoreIterationListener(1));//print score after each iteration on stdout
    	        //model.setListeners(new StatsListener(statsStorage));// visit http://localhost:9000 to track the training process
    	        for (int i = 0; i < EPOCHS; i++) {
    	            train.reset();
    	            while (train.hasNext()) {
    	                model.fit(train.next());
    	            }
    	            log.info("*** Completed epoch {} ***", i);
    	        }
    
    	        log.info("*** Saving Model ***");
    	        ModelSerializer.writeModel(model, "C:\\Java\\model.data", true);
    	        log.info("*** Training Done ***");
    	           	        
    	        
    	        URI[] loc = testData.locations();
    	        for (int i = 0; i < loc.length; i++) {
    				URI uri = loc[i];
    				Mat image = Imgcodecs.imread(uri.getPath().substring(1));
    				
    				List<DetectedObject> objs = detect(image, model);
    	        	boolean found = addRects(image, objs);
    	        	String name = String.format("NF_%s.jpg", i);
    	        	
    	        	if(found) {
    	        		name = String.format("F_%s.jpg", i);
    	        	}
    	        	
    	        	Imgcodecs.imwrite("C:\\Java\\test\\" + name, image);
    			}	       
    	    }
    	    
    	    public static List<DetectedObject> detect(Mat image, ComputationGraph model) throws IOException {
    	    	org.deeplearning4j.nn.layers.objdetect.Yolo2OutputLayer yout = (org.deeplearning4j.nn.layers.objdetect.Yolo2OutputLayer) model.getOutputLayer(0);
    	    	 
    	    	 NativeImageLoader loader = new NativeImageLoader(INPUT_HEIGHT, INPUT_WIDTH, CHANNELS);
    	         INDArray ds = loader.asMatrix(image);        
    	         ImagePreProcessingScaler scaler = new ImagePreProcessingScaler(0, 1);
    	         scaler.transform(ds);
    	         
    	         INDArray results = model.outputSingle(ds);
    	         List<DetectedObject> objs = yout.getPredictedObjects(results, 0.4);	         
    	         
    	         return objs;
    	    }
    	    
    	    public static boolean addRects(Mat image, List<DetectedObject> objs) {
    	    	boolean result = false;
    	    	Scalar color = new Scalar(0, 0, 255);
    	    	for (int i = 0; i < objs.size(); i++) {
    				DetectedObject obj = objs.get(i);
    				
    				int imgW = image.width();
    				int imgH = image.height();
    				
    				double[] xy1 = obj.getTopLeftXY();
    				double[] xy2 = obj.getBottomRightXY();
    				
    				int x1 = (int) Math.round(imgW * xy1[0] / GRID_WIDTH);
    				int y1 = (int) Math.round(imgH * xy1[1] / GRID_HEIGHT);
    				int x2 = (int) Math.round(imgW * xy2[0] / GRID_WIDTH);
    				int y2 = (int) Math.round(imgH * xy2[1] / GRID_HEIGHT);
    				
    				if(x1 == 0 && y1 == 0 && x2 == 0 && y2 == 0) {
    					continue;
    				}
    				
    				result = true;
    				Imgproc.rectangle(image, new Point(x1, y1), new Point(x2, y2), color);			
    			}
    	    	
    	    	return result;
    	    }
    }

Dataset can be downloaded from here.

The problem is that when I try to test the model, all the detected objects return NaN.

Any hints on this topic would be very helpful.
Thanks.

@lquintero07 I have no idea, but this is a cool project. I tried the basic YOLO example, but I think I'll try this soon. Let me know if you get it to work.

Can you add this:

     Nd4j.getExecutioner().setProfilingConfig(ProfilerConfig.builder()
                .checkForINF(true)
                .checkElapsedTime(true)
                .checkLocality(true)
                .checkWorkspaces(true)
                .build());

NaNs are generally an indicator of a bad dataset or bad tuning. I’d be curious to see when it goes NaN.
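
To rule out the dataset side, here is a minimal sketch in plain Java (not part of DL4J) that sanity-checks one Pascal VOC-style bounding box against its image size. The `BoxCheck` class and `isValidBox` method are hypothetical names made up for illustration, and degenerate or out-of-bounds boxes are only one plausible source of bad labels, not a confirmed cause in this thread.

```java
public class BoxCheck {
    /**
     * Returns true when the box has positive area and lies inside the image.
     * Coordinates are in pixels, as in Pascal VOC annotation files.
     */
    public static boolean isValidBox(int xMin, int yMin, int xMax, int yMax,
                                     int imageWidth, int imageHeight) {
        boolean positiveArea = xMax > xMin && yMax > yMin;
        boolean insideImage = xMin >= 0 && yMin >= 0
                && xMax <= imageWidth && yMax <= imageHeight;
        return positiveArea && insideImage;
    }

    public static void main(String[] args) {
        // A normal box inside a 416x416 image.
        System.out.println(isValidBox(10, 20, 110, 220, 416, 416));
        // A zero-width box, which should be rejected.
        System.out.println(isValidBox(50, 50, 50, 90, 416, 416));
    }
}
```

Running a check like this over every annotation before training would at least show whether any label is obviously broken.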

Hey hi,

Thanks for your response.

I deleted the previous model, so I trained it again with the lines you suggested added at the start of the main method.

This time I don't get NaN values, but the results don't seem good.

(Attached output images: F_3, F_5)

When I was testing, I saw the NaN values come from this line: INDArray results = model.outputSingle(ds);
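
A quick way to confirm what comes out of that call is to count the non-finite values in the raw output before handing it to getPredictedObjects. The sketch below is plain Java over a float[]. The `NonFiniteCheck` name is made up, and pulling the array out of the INDArray (for example via `results.data().asFloat()`) is an assumption about how it would be wired in.

```java
public class NonFiniteCheck {
    /** Counts NaN and +/-Infinity entries in the given values. */
    public static int countNonFinite(float[] values) {
        int count = 0;
        for (float v : values) {
            if (Float.isNaN(v) || Float.isInfinite(v)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hand-made sample standing in for the network output.
        float[] sample = {0.5f, Float.NaN, 1.0f, Float.POSITIVE_INFINITY};
        System.out.println("non-finite values: " + countNonFinite(sample));
    }
}
```

If the count is already non-zero here, the problem is in the forward pass or the trained weights rather than in the box decoding.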

Hi, I’m sorry to bump this old topic, but I got the same error with my dataset after training. It also occurs when I use the TinyYoloHouseNumberDetection example provided here: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/advanced/modelling/objectdetection/TinyYoloHouseNumberDetection.java

I didn’t change anything in the example code and ran it as is.
I eventually added the ND4J profiler check to the example and got this exception:

    Exception in thread "main" org.nd4j.linalg.exception.ND4JOpProfilerException: P.A.N.I.C.! Op.Z() contains 43264 Inf value(s)
        at org.nd4j.linalg.api.ops.executioner.OpExecutionerUtil.checkForInf(OpExecutionerUtil.java:94)
        at org.nd4j.linalg.api.ops.executioner.OpExecutionerUtil.checkForInf(OpExecutionerUtil.java:129)
        at org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner.profilingConfigurableHookOut(DefaultOpExecutioner.java:558)
        at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1993)
        at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6575)
        at org.deeplearning4j.nn.layers.mkldnn.MKLDNNConvHelper.preOutput(MKLDNNConvHelper.java:166)
        at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ConvolutionLayer.java:401)
        at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:489)
        at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
        at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2380)
        at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1793)
        at org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(ComputationGraph.java:1775)
        at org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(ComputationGraph.java:1761)
        at org.deeplearning4j.nn.graph.ComputationGraph.outputSingle(ComputationGraph.java:1639)
        at TinyYoloHouseNumberDetection.main(TinyYoloHouseNumberDetection.java:215)

FYI, I was able to use the TinyYolo example successfully before, even with my own dataset. But now I can’t seem to get it right: the training process runs and everything seems fine, but the model still outputs NaN after training.
It all started after I installed CUDA to speed up training, but I haven’t even used it yet, since I haven’t finished installing cuDNN and haven’t changed the backend in my pom file.

Also, the ND4J debug message from AsyncMultiDataSetIterator saying "Manually destroying AMDSI Workspace" started to appear during training out of nowhere, and my model started outputting NaN.

I don’t know if I’m being stupid, but I’ve been debugging this for quite some time now and still can’t get it working. If anyone knows what causes this, please enlighten me. I’d really appreciate it. Thank you 🙏

Hello everyone, sorry to bump this old thread again.
I just want to say that I’ve got it working now, but since I can no longer edit my reply, I’ll post a new one here:

In my case, the problem is tied to the ND4J debug message from AsyncMultiDataSetIterator saying "Manually destroying AMDSI". I simply replaced model.fit(train, nEpochs) with

    for (int i = 0; i < nEpochs; i++) {
        while (train.hasNext()) {
            DataSet d = train.next();
            model.fit(d);
        }
        train.reset();
    }

and everything is running fine again. Maybe the preloading corrupts the data for me, but I don’t know for sure.
Hopefully this reply helps anyone with the same problem 🙏

Note: I’m using the newest version available (beta7).

@artinmare thanks for elaborating on this. Sorry, I’m just doing QA for the release, so anything where we have to dig too much I would have to do later. Could you elaborate a bit on what you think your issue was so others won’t repeat it? I’m wondering if there’s a bug in the AMDSI.

@agibsonccc
No problem, happy to help.

I think the issue might be related to the small amount of RAM available for the training process, perhaps something to do with buffering the images in memory for faster training. I’m training the model on a laptop with just 8 GB of RAM, so Windows might free the memory or write other data into the same address and thus corrupt the data.

For everyone with a limited amount of RAM, I suggest not using the preload method for now and using the workaround if you run into the same problem.
For reference, these guys here https://gist.github.com/saudet/fb8a4d9544dc3c411b302ccd6bbf87e7 and here Emaraic - How to build a custom object detector using Yolo also use the same loop to train instead of the preload method. The Emaraic author also states that he uses an old computer, so that might be the case for him too.

I don’t know much about how the JVM handles memory, so I can’t give more precise information about the issue here. (I’ll try to read more about it.)
But I don’t think it’s the AMDSI code itself, since I didn’t find anything wrong with it. Maybe it’s the Workspace? (I’ve yet to read that part of the code.) I’ll reread the code and test it with a different setup. If I find anything, I’ll reply as soon as possible with a more detailed answer. Thank you 🙏

UPDATE:
As I guessed before, I’ve found that memory is what causes the problem. My guess is that the JVM frees up some memory and accidentally corrupts the training data that has been preloaded into memory.
I’ve tested limiting the memory usage using this guide https://deeplearning4j.konduit.ai/config/config-memory and everything is now working as intended.
I guess letting the JVM allocate memory automatically is not the best practice in the first place.
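
For anyone following along, here is a small sketch that prints the heap limit the JVM actually picked, so it can be compared with what was passed on the command line. It only uses `Runtime` from the standard library. The `MemoryReport` class name and the flag values in the comment are illustrative assumptions, not recommended settings. As far as I understand it, `-Xmx` only covers the Java heap, while the off-heap memory used through JavaCPP is limited separately.

```java
public class MemoryReport {
    /** Converts a byte count to whole mebibytes (rounded down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with explicit limits, for example (values are placeholders):
        //   java -Xmx4g -Dorg.bytedeco.javacpp.maxbytes=4g MemoryReport
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Printing this at the start of training makes it easy to see whether the limits were really applied.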

I’ve yet to try overfitting the network (I’ve only run 200 epochs), so I don’t know if this is really the answer, but I hope it won’t cause any more NaN values even if I overfit it. The Inf exception no longer occurs either, which is good news.

Training is going really well at 85% average confidence, so I call it a success. I’ll try other configurations to make sure everything is really fine, and I’ll be back with more updates.