val conf = new NeuralNetConfiguration.Builder()
.seed(12345)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.iterations(1)
.activation(Activation.LEAKYRELU)
.weightInit(WeightInit.XAVIER)
.learningRate(0.01)
.updater(Updater.NESTEROVS)
.momentum(0.9)
.regularization(true)
.l2(1e-4)
.graphBuilder()
.addInputs("userid","weekday","hour","isworkday")
.addLayer("L1", new EmbeddingLayer.Builder().nOut(512).activation(Activation.IDENTITY).build(),"ip")
.addVertex("merge",new MergeVertex(), "L1", "adid","appid","createtype","height","width","weekday","hour","isworkday")
.addLayer("L2", new DenseLayer.Builder().nOut(10).build,"merge")
.addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
.activation(Activation.SOFTMAX)
.nIn(10).nOut(1).build,"L2")
.setOutputs("out")
How do I structure data in the MultiDataSet format to train a ComputationGraph in Deeplearning4j from Spark DataFrame data? "weekday", "hour", and "isworkday" go through a one-hot encoder feature, and "userid" goes through an embedding feature.
It appears you are using a very old version of DL4J. Things like .iterations were removed from the network configuration several versions ago.
The configuration of your graph is all wrong. If you really want to have everything as a single input, you have to actually list everything that is going to be an input in addInputs. In your addVertex("merge", ...) line, you have listed things that were neither listed as part of the inputs nor produced by other layers.
Typically, you wouldn't want to have an input per feature of your data. It makes your network definition needlessly complex. Instead you want to vectorize your data appropriately. To learn how to do that, I suggest you take a look at Quickstart with Deeplearning4J – dubs·tech first, and then dive into the DataVec documentation.
In general, a MultiDataSet takes an array of INDArray instances for the features and another for the labels. The mapping between the entries in those arrays and the names in your ComputationGraph is their order. This means that the first INDArray in your MultiDataSet features will be matched to the first input of your ComputationGraph; the same applies to the outputs, so the first INDArray in your MultiDataSet labels will be matched to the first output defined in your ComputationGraph.
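As a rough sketch of that ordering rule: for a graph declared with two inputs such as addInputs("ip", "other"), a single training example might be assembled as below. All shapes and values here are made up for illustration, and the exact factory overloads may differ between ND4J versions.

```scala
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.dataset.MultiDataSet

// Hypothetical single example. features(0) feeds the FIRST graph input,
// features(1) the SECOND, and so on; the same positional rule applies to labels.
val ipFeature    = Nd4j.create(Array(42.0)).reshape(1, 1)               // embedding index for "ip"
val otherFeature = Nd4j.create(Array(0.0, 1.0, 0.0, 1.0)).reshape(1, 4) // one-hot "other" block
val label        = Nd4j.create(Array(1.0)).reshape(1, 1)                // single binary label

val example = new MultiDataSet(
  Array(ipFeature, otherFeature), // matched to addInputs(...) by position
  Array(label)                    // matched to setOutputs(...) by position
)
```

An RDD of such MultiDataSet instances (e.g. built with sc.parallelize, or by mapping over your DataFrame rows) is then what a method like sparkNet.fitMultiDataSet would consume.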
Thank you for your answer. The version of DL4J was 0.8, which is indeed old; now I use 1.0.0-beta5. The configuration of my graph was wrong in addVertex. I just wanted to express that a multi-input model needs a ComputationGraph to solve it, and the ComputationGraph needs a MultiDataSet to fit (like sparkNet.fitMultiDataSet()), but I do not know how to structure the MultiDataSet to fit the ComputationGraph.
Do I understand you correctly that "userid", "weekday", "hour", and "isworkday" should each be saved to an INDArray and combined into an array of INDArrays? I am not familiar with this; can you help me solve it? It would be better to give me an example.
Everything else should already be answered in my previous answer.
Unfortunately, I don't have the time to provide examples for every possible situation. You will have to try to work it out and ask precise questions about the things you struggle with - ideally with the exact code you are using.
val conf = new NeuralNetConfiguration.Builder()
.activation(Activation.LEAKYRELU)
.weightInit(WeightInit.XAVIER)
.updater(new Sgd(0.01))
.graphBuilder()
.addInputs("ip","other")
.addLayer("L1", new EmbeddingLayer.Builder().nIn(ipfeaturesize).nOut(512).activation(Activation.IDENTITY).build(),"ip")
.addVertex("merge",new MergeVertex(), "L1", "other")
.addLayer("L2", new DenseLayer.Builder().nIn(512+otherfeaturesize).nOut(10).build,"merge")
.addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
.activation(Activation.SIGMOID)
.nIn(10).nOut(1).build,"L2")
.setOutputs("out")
.build
val tm = new ParameterAveragingTrainingMaster.Builder(1)
.averagingFrequency(5)
.workerPrefetchNumBatches(2)
.rddTrainingApproach(RDDTrainingApproach.Direct)
.storageLevel(StorageLevel.DISK_ONLY)
.batchSizePerWorker(batchSizePerWorker)
.build
val sparkNet = new SparkComputationGraph(sc, conf, tm)
sparkNet.setListeners(new ScoreIterationListener(1))
sparkNet.fitMultiDataSet(TrainData)
But an error was encountered while the Spark program was running; the error information is as follows:
org.apache.spark.SparkException: Job aborted due to stage failure: ResultStage 21 (treeAggregate at ParameterAveragingTrainingMaster.java:667) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Too large frame: 2326895492
Is it necessary to configure any parameters? Have you ever had a similar problem?
Looking forward to your solution!