Hi everyone, I have been trying to build a UNet model using DL4J. On the line computational_graph.fit() I get the following error.
(I guess it is a memory error, right? Is there any way to solve it, or is it limited by my laptop's configuration?) Let me know if any more details are required.
at org.nd4j.nativeblas.Nd4jCpu.mallocHost(Native Method)
at org.nd4j.linalg.cpu.nativecpu.CpuMemoryManager.allocate(CpuMemoryManager.java:48)
at org.nd4j.linalg.api.memory.abstracts.Nd4jWorkspace.alloc(Nd4jWorkspace.java:421)
at org.nd4j.linalg.api.memory.abstracts.Nd4jWorkspace.alloc(Nd4jWorkspace.java:320)
at org.nd4j.linalg.cpu.nativecpu.buffer.BaseCpuDataBuffer.<init>(BaseCpuDataBuffer.java:492)
at org.nd4j.linalg.cpu.nativecpu.buffer.FloatBuffer.<init>(FloatBuffer.java:68)
at org.nd4j.linalg.cpu.nativecpu.buffer.DefaultDataBufferFactory.create(DefaultDataBufferFactory.java:329)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1467)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:324)
at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:191)
at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.createUninitialized(CpuNDArrayFactory.java:226)
at org.nd4j.linalg.factory.Nd4j.createUninitialized(Nd4j.java:4364)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ConvolutionLayer.java:442)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:505)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110)
at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2135)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1372)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1341)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1165)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1115)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1082)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1018)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1006)
at activeSegmentation.deepLearning.UNet1.run(UNet1.java:111)
at activeSegmentation.deepLearning.UNet1.main(UNet1.java:287)
18:26:15.587 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT Version,ProductType,BuildNumber,CSDVersion,SuiteMask FROM Win32_OperatingSystem
18:26:16.055 [main] DEBUG oshi.software.os.windows.WindowsOSVersionInfoEx - Initialized OSVersionInfoEx
18:26:17.393 [main] DEBUG oshi.hardware.common.AbstractCentralProcessor - Oracle MXBean detected.
18:26:17.422 [main] DEBUG oshi.util.platform.windows.WmiUtil - Connected to ROOT\CIMV2 WMI namespace
18:26:17.422 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT ProcessorID FROM Win32_Processor
18:26:17.461 [main] DEBUG oshi.util.platform.windows.WmiUtil - Connected to ROOT\CIMV2 WMI namespace
18:26:17.461 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT Name,PercentIdleTime,PercentPrivilegedTime,PercentUserTime,PercentInterruptTime,PercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation WHERE NOT Name LIKE "%_Total"
18:26:24.431 [main] DEBUG oshi.util.platform.windows.WmiUtil - Connected to ROOT\CIMV2 WMI namespace
18:26:24.431 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT PercentInterruptTime,PercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation WHERE Name="_Total"
18:26:24.446 [main] DEBUG oshi.hardware.platform.windows.WindowsCentralProcessor - Initialized Processor
18:26:24.687 [main] ERROR org.deeplearning4j.util.CrashReportingUtil - >>> Out of Memory Exception Detected. Memory crash dump written to: G:\ACTIVESEGMENTATION-testBranch\dl4j-memory-crash-dump-1641214573133_1.txt
18:26:24.688 [main] WARN org.deeplearning4j.util.CrashReportingUtil - Memory crash dump reporting can be disabled with CrashUtil.crashDumpsEnabled(false) or using system property -Dorg.deeplearning4j.crash.reporting.enabled=false
18:26:24.688 [main] WARN org.deeplearning4j.util.CrashReportingUtil - Memory crash dump reporting output location can be set with CrashUtil.crashDumpOutputDirectory(File) or using system property -Dorg.deeplearning4j.crash.reporting.directory=<path>
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new LongPointer(4): totalBytes = 560, physicalBytes = 7236M
at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:88)
at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:53)
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.createShapeInfo(NativeOpExecutioner.java:2016)
at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3247)
at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:68)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:180)
at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:174)
at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:78)
at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:409)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4033)
at org.nd4j.linalg.api.shape.Shape.newShapeNoCopy(Shape.java:2123)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ConvolutionLayer.java:477)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:505)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110)
at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2135)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1372)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1341)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1165)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1115)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1082)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1018)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1006)
at activeSegmentation.deepLearning.UNet1.run(UNet1.java:111)
at activeSegmentation.deepLearning.UNet1.main(UNet1.java:287)
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (7236M) > maxPhysicalBytes (6104M)
at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:700)
at org.bytedeco.javacpp.Pointer.init(Pointer.java:126)
at org.bytedeco.javacpp.LongPointer.allocateArray(Native Method)
at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:80)
... 26 more
@Purva-Chaudhari that doesn’t look like anything related to memory. A NoSuchMethodError is a standard problem that typically comes up when Java versions clash. We would still need more information on what actually causes the crash.
As I mentioned before, I think it’s related to your computer running out of memory. Generally, when you run out of memory, the first thing to do is limit your batch size. std::bad_alloc is a C++ error that means the kernel can’t allocate any more memory; given our use of C++ code under the hood, that would be my first guess.
So for now, try anything that will reduce your memory footprint, and ensure you have enough RAM to actually train your model. Monitor the output and your computer’s CPU/RAM usage first.
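To make that concrete, here is a minimal sketch of reducing the minibatch size before calling fit(). The names `graph`, `trainData`, and `fitWithSmallBatches` are hypothetical stand-ins for your own ComputationGraph and training data, not your actual code:

```java
import java.util.List;
import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class SmallBatchTraining {
    // Sketch only: wrap an existing List<DataSet> in an iterator with a
    // small minibatch size, then fit. Start at 1 and only increase the
    // batch size while memory usage stays stable.
    static void fitWithSmallBatches(ComputationGraph graph,
                                    List<DataSet> trainData,
                                    int numEpochs) {
        int batchSize = 1; // the key knob for memory footprint
        DataSetIterator trainIter = new ListDataSetIterator<>(trainData, batchSize);
        graph.fit(trainIter, numEpochs);
    }
}
```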
@Purva-Chaudhari
The error you share here is entirely unrelated to the one you initially shared.
When sharing your error messages, please share them in full. If the error you show here happens in addition to the one you shared partially in your initial post, then please say so, and also add the full error that contains the hint that it is about memory allocation.
So, as @agibsonccc said, we can’t really help you much with this limited information. All we can do is speculate.
Sorry, I have updated the output above.
It looks like my memory is running out. I had a very small dataset (5 training images with ground truths and 5 validation images with ground truths), so my batch size was already small. I do see the “Out of Memory Exception Detected” message.
I would also double-check how big each image is. Convolutions are fairly memory-heavy to execute (in both the forward and backward passes), regardless of how small the images look on disk. Depending on the size of your images, you may need to shrink your batch size to 3 or lower. The sketch below shows why.
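A back-of-the-envelope calculation makes this concrete. The resolution and channel counts below are illustrative assumptions about a typical UNet, not measurements of your model:

```java
public class ConvMemoryEstimate {
    public static void main(String[] args) {
        long height = 512, width = 512; // assumed input resolution
        long channels = 64;             // assumed channels in the first UNet conv block
        long bytesPerFloat = 4;         // float32
        long batchSize = 5;

        // Memory for a single activation map of one conv layer, one image
        long perImage = height * width * channels * bytesPerFloat;
        System.out.printf("One activation map: %d MB%n", perImage >> 20);
        System.out.printf("Batch of %d: %d MB for a single layer%n",
                batchSize, (perImage * batchSize) >> 20);
        // A UNet keeps activations from every encoder level for the
        // backward pass, so multiply by the number of conv layers to see
        // why even a tiny dataset can exhaust ~6GB at full resolution.
    }
}
```

With these numbers, one layer's activations for a batch of 5 already cost roughly 320 MB, before counting the other layers, gradients, and the im2col buffers that convolutions allocate internally.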
The other aspect here is maxPhysicalBytes (6104M).
Despite having 12GB of memory, you are only allowing your JVM to use about 6GB. The link @agibsonccc pointed you to also explains how to increase that.
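For reference, these limits are controlled with standard JVM and JavaCPP flags. The values below are illustrative for a 12GB machine, not a recommendation; leave headroom for the OS, and note that maxphysicalbytes should cover both the heap and the off-heap allocation:

```
java -Xmx2g \
     -Dorg.bytedeco.javacpp.maxbytes=8G \
     -Dorg.bytedeco.javacpp.maxphysicalbytes=10G \
     -cp yourApp.jar activeSegmentation.deepLearning.UNet1
```

Most of ND4J's memory is off-heap, so a large -Xmx alone won't help; it is the JavaCPP properties that raise the limit you are hitting.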