Float vs Double on Spark?

I have a job that I'm running on Spark, but it throws an error during parameter averaging:
20/06/23 04:57:36 INFO scheduler.DAGScheduler: Job 7 failed: treeAggregate at ParameterAveragingTrainingMaster.java:666, took 4.857751 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 8.0 failed 4 times, most recent failure: Lost task 19.3 in stage 8.0 (TID 284, data27, executor 5): java.lang.IllegalArgumentException: Op.X [DOUBLE] type must be the same as Op.Y [FLOAT] for op org.nd4j.linalg.api.ops.impl.reduce3.EuclideanDistance: x.shape=[512, 125], y.shape=[512, 125]
at org.nd4j.common.base.Preconditions.throwEx(Preconditions.java:636)
at org.nd4j.common.base.Preconditions.checkArgument(Preconditions.java:219)
at org.nd4j.linalg.api.ops.BaseReduceFloatOp.validateDataTypes(BaseReduceFloatOp.java:110)
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:258)
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:250)
at org.deeplearning4j.nn.graph.vertex.impl.L2Vertex.doForward(L2Vertex.java:81)

I'm using DL4J 1.0.0-beta7 and Spark 3.0.0, and I just updated from an earlier version of each. Could that be the issue?

DL4J hasn't been updated to support Spark 3 yet. Beyond that, we'd need a sample of the code you're trying to run.

@agibsonccc - thanks for the response. The code is fairly verbose at this point, but your comment makes me think that what I'm seeing is a compatibility issue. Can you tell me what the latest supported Spark version is? 2.4.5?

Yes, 2.4.x should be fine.

What version did you update from?

The error itself says the problem is that you have both float and double tensors here, and that both of them should be the same type. If you updated from a rather old version, where only a single tensor data type was supported at all, this change in behavior might surprise you.
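To illustrate the check in isolation, here's a minimal standalone sketch (the shapes just mirror the ones in your stack trace; this is not your graph):

import org.nd4j.linalg.api.buffer.DataType
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.ops.transforms.Transforms

object DtypeMismatch extends App {
  // Two tensors of the same shape but different data types
  val x = Nd4j.rand(DataType.FLOAT, 512L, 125L)
  val y = Nd4j.rand(DataType.DOUBLE, 512L, 125L)

  // Mixing them in a reduce3 op like EuclideanDistance fails validation:
  // java.lang.IllegalArgumentException: Op.X [FLOAT] type must be the same as Op.Y [DOUBLE] ...
  // Transforms.euclideanDistance(x, y)

  // Casting one operand so the types match makes the op legal
  val d = Transforms.euclideanDistance(x, y.castTo(DataType.FLOAT))
  println(s"distance = $d")
}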

To find out exactly what is going on, though, you'd have to share the graph definition.

I went from DL4J 1.0.0-beta2 to beta7 and was trying to go from Spark 2.3.x to 3.0. I did make some other code changes at the same time, but it was just making the output size 2 instead of 1. I noticed that I am able to skirt the issue if I cast the INDArrays in my MultiDataSet to DataType.FLOAT (instead of double) and then train. I'm not sure why that makes a difference?
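For reference, my workaround looks roughly like this (the toFloat/castAll helper names are my own, not a DL4J API, and I've omitted the mask arrays since my data doesn't use them):

import org.nd4j.linalg.api.buffer.DataType
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.dataset.api.MultiDataSet

// Cast every feature and label array in a MultiDataSet to FLOAT in place.
// (Feature/label mask arrays, if present, would need the same treatment.)
def toFloat(mds: MultiDataSet): Unit = {
  def castAll(arrs: Array[INDArray]): Array[INDArray] =
    if (arrs == null) null else arrs.map(_.castTo(DataType.FLOAT))
  mds.setFeatures(castAll(mds.getFeatures))
  mds.setLabels(castAll(mds.getLabels))
}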

Ah, yes, beta2 is almost two years old now. Support for differently typed tensors is newer than that.

I noticed that I am able to skirt the issue if I cast the INDArrays in my MultiDataSet to DataType.FLOAT

That is the actual solution. I'm guessing you are creating your INDArrays from double[]?
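For example, this is the difference (a minimal standalone sketch):

import org.nd4j.linalg.api.buffer.DataType
import org.nd4j.linalg.factory.Nd4j

// The array's data type follows the Java primitive type it was built from
val fromDoubles = Nd4j.create(Array(1.0, 2.0, 3.0))    // double[] -> DOUBLE
val fromFloats  = Nd4j.create(Array(1.0f, 2.0f, 3.0f)) // float[]  -> FLOAT
println(fromDoubles.dataType()) // DOUBLE
println(fromFloats.dataType())  // FLOAT

// Either cast after creation...
val asFloat = fromDoubles.castTo(DataType.FLOAT)
// ...or make FLOAT the default for arrays created afterwards
Nd4j.setDefaultDataTypes(DataType.FLOAT, DataType.FLOAT)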

OK, good to know!
I am working in Scala, but yes, I was using mostly doubles. I haven't looked closely to see whether I explicitly declare all of them as doubles, or whether some of them get converted to doubles in the Scala-to-Java translation. But at any rate, I'm glad to have a solution (and an explanation). Thanks!