Is it possible to implement a CRF layer using ComputationGraph?

Hi,
Recently I needed an implementation of a BiLSTM-CRF model for one of my school projects. I searched online and found that DL4J implements neither a CRF layer nor BiLSTM-CRF. So I'm wondering: can I implement the CRF layer of BiLSTM-CRF using ComputationGraph and a custom layer, or at least using SameDiff? I'm not sure how to do it, but if it's possible, I'll dig into it.
Thanks and Regards

In theory, yes, it should be possible to implement it by extending SameDiffLayer, like the SelfAttentionLayer we currently have (https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/SelfAttentionLayer.java). But the recurrent part might be a little tricky.

Thanks, I'll give it a try. BTW, will DL4J officially support a CRF layer at some point?

At the moment there are no plans to add a CRF Layer officially.

Hi, it’s me again. After reading a lot of documentation and code, I decided to give it a try. However, while following a PyTorch version of a CRF implementation (and according to the SameDiff documentation, PyTorch should be fairly similar to SameDiff), I ran into some differences.

Line 88 of crf.py uses an operation like this: torch.ones(emissions.shape[:2], dtype=torch.float). It seems the author creates an NDArray with a variable shape, where emissions has shape [seq_len, batch_size, nb_labels]. But I can’t find a corresponding operation in SameDiff. The getShape() function doesn’t return the true shape of the input data, which I confirmed with layerInput: calling layerInput.getShape() in defineLayer(...) returns [-1, 14, 99], which is obviously not the true shape, since the batch size won’t be -1. So… any ideas?
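For reference, the PyTorch line in question only builds a [seq_len, batch_size] tensor of ones (a mask over the first two axes of emissions). A NumPy sketch of what it computes, with made-up dimensions matching the shapes above:

```python
import numpy as np

# Hypothetical dimensions: emissions has shape [seq_len, batch_size, nb_labels]
seq_len, batch_size, nb_labels = 14, 4, 99
emissions = np.zeros((seq_len, batch_size, nb_labels), dtype=np.float32)

# torch.ones(emissions.shape[:2]) creates a mask over the first two axes
mask = np.ones(emissions.shape[:2], dtype=np.float32)
print(mask.shape)  # (14, 4)
```

The -1 returned by getShape() plays the role of a dynamic dimension whose concrete value is only known at execution time, so instead of reading the shape eagerly you generally have to build shape-dependent tensors from shape ops inside the graph, which are resolved at runtime.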

Also, at line 151 there is a for loop that depends on seq_length, which comes from tags.shape, and for now I have no idea how to do this in SameDiff. I’m trying to extend SameDiffOutputLayer, as the documentation in SameDiffLayer suggests.
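For context, a loop like that in CRF implementations typically accumulates the score of the gold tag sequence: one emission score plus one transition score per timestep. A NumPy sketch of that computation, for a single sequence (the function and variable names are mine, not from crf.py):

```python
import numpy as np

def gold_score(emissions, tags, transitions):
    """Score of one gold tag path: sum of emission and transition
    scores along the sequence.
    emissions: [seq_len, nb_labels], tags: [seq_len] (int),
    transitions: [nb_labels, nb_labels]."""
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):  # this is the loop over seq_length
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score

emissions = np.array([[1.0, 0.0],
                      [0.0, 2.0],
                      [3.0, 0.0]])
transitions = np.array([[0.5, -0.5],
                        [0.2,  0.1]])
tags = np.array([0, 1, 0])
print(gold_score(emissions, tags, transitions))  # 1.0 + (-0.5 + 2.0) + (0.2 + 3.0) = 5.7
```

Since the loop bound is the sequence length, in a static graph you would usually either unroll it up to a fixed maximum length (masking out padded timesteps) or express it with gather/sum-style tensor ops, which is what makes the port tricky.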

I’m a bit confused, since I can’t find corresponding operations for many PyTorch functions. It also looks like SameDiff builds a static graph that you cannot change dynamically, while PyTorch can. So… can you help me out? Maybe point me toward a proper / suitable way to do this?
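On the static-graph point: the other loop a CRF needs, the forward algorithm for the log-partition function, applies the same tensor operation at every timestep, so for a fixed (padded) maximum length it can in principle be unrolled into a static graph. A NumPy sketch of that recursion, checked against brute-force path enumeration (names are mine, not from any library):

```python
import numpy as np

def log_partition(emissions, transitions):
    """Forward algorithm in log space for a single sequence.
    emissions: [seq_len, nb_labels], transitions: [nb_labels, nb_labels].
    Returns log of the sum over all tag paths of exp(path score)."""
    alpha = emissions[0]                    # [nb_labels]
    for t in range(1, emissions.shape[0]):  # identical update each step
        # scores[i, j] = alpha[i] + transitions[i, j] + emissions[t, j]
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))  # stable log-sum-exp
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

# Sanity check: enumerate all 2^3 tag paths for a tiny 3-step, 2-label case
emissions = np.random.randn(3, 2)
transitions = np.random.randn(2, 2)
brute = []
for a in range(2):
    for b in range(2):
        for c in range(2):
            brute.append(emissions[0, a] + transitions[a, b] + emissions[1, b]
                         + transitions[b, c] + emissions[2, c])
print(np.isclose(log_partition(emissions, transitions),
                 np.log(np.exp(brute).sum())))  # True
```

Because each timestep is the same fixed op, one workable direction is to unroll both CRF recursions up to a maximum sequence length and mask padded steps; the graph stays static but still handles variable-length inputs.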