(Java) Hello. I am looking for a way to determine whether a given text could have been written by a given author.
I found out about word2vec, but then someone pointed out that word2vec only shows whether one text is about the same topic as another. So I turned to Doc2Vec.
Is Doc2Vec suitable for evaluating texts of a few A4 pages? As far as I understand, every time I want to compare a text, I need to take all of the author's pages, feed them to the model, and then ask it to compare them with the new text. Is that right?
Is there an example of using DL4J's Doc2Vec with an existing word model? Or do I not need a pretrained word model for Doc2Vec (I intend to use Google's News word2vec model)?
Can I combine all of an author's texts into one large document and turn it into a single vector? Then, for any new document, could I compute its vector and take the cosine similarity with the author's existing vector?
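To make the comparison step concrete, here is a minimal sketch in plain Java of the cosine similarity I have in mind. The class name `CosineSketch` and the example numbers are made up for illustration; in practice the two arrays would be whatever document vectors the Doc2Vec model produces, which is exactly the part I am unsure about.

```java
class CosineSketch {

    /**
     * Cosine similarity between two vectors of equal length.
     * Returns 0.0 if either vector has zero length (norm), to avoid dividing by zero.
     */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical vectors; real ones would come from the trained model.
        double[] authorVector = {0.2, 0.7, 0.1};
        double[] newTextVector = {0.25, 0.6, 0.05};
        System.out.println("similarity = " + cosine(authorVector, newTextVector));
    }
}
```

A value close to 1 would mean the two vectors point in nearly the same direction. Whether that actually reflects authorship rather than topic is the open question.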
The intended use is:
- For each existing author in the DB, I create a model.
- Each author gets their own vector (all of their texts combined into one document, labeled with AuthorName, and vectorized).
- After that, when I get a new text, I vectorize it and compare its vector with every author's vector by cosine similarity.
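The last step of the list above could be sketched roughly like this, assuming the author vectors have already been computed somehow and are held in a map keyed by author name. The class `AuthorMatchSketch`, the method `closestAuthor`, the author names, and all numbers are hypothetical; no DL4J calls are involved here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AuthorMatchSketch {

    /** Cosine similarity of two equal-length vectors; 0.0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Returns the name of the author whose vector has the highest cosine
     * similarity to the text vector, or null if there are no authors.
     */
    static String closestAuthor(Map<String, double[]> authorVectors, double[] textVector) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : authorVectors.entrySet()) {
            double score = cosine(entry.getValue(), textVector);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up author vectors standing in for the per-author document vectors.
        Map<String, double[]> authors = new LinkedHashMap<>();
        authors.put("AuthorA", new double[]{0.9, 0.1, 0.0});
        authors.put("AuthorB", new double[]{0.1, 0.8, 0.3});

        double[] newText = {0.2, 0.7, 0.2}; // vector of the text to attribute
        System.out.println("closest author: " + closestAuthor(authors, newText));
    }
}
```

This only picks the nearest author. It does not say whether the best score is high enough to claim the author actually wrote the text, so some threshold would probably be needed as well.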
Are there any existing examples for this kind of use? I found a lot for word2vec, but nothing in Java for Doc2Vec.
What is the best approach?
If I am constantly adding new texts to authors, or adding new authors, keeping one giant model would mean retraining it after every added label (author) or text. But maybe I am wrong and it works differently.
The other option is to build each author's vector separately (but I do not know how accurate the vector for a label would be without all the other authors in the model).
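One way I imagine the separate-vector option could work, purely as an assumption on my part, is to keep a running average of the vectors of each author's documents, so that adding a text only updates that author's average instead of retraining everything. Note that averaging document vectors is a substitute I am guessing at, not the same thing as a label vector trained inside Doc2Vec. The class `AuthorCentroidSketch` and its methods are hypothetical.

```java
class AuthorCentroidSketch {
    private final double[] sum; // element-wise sum of all document vectors added so far
    private int count;          // number of documents added

    AuthorCentroidSketch(int dimensions) {
        this.sum = new double[dimensions];
    }

    /** Adds one document vector to this author's running total. */
    void addDocument(double[] docVector) {
        if (docVector.length != sum.length) {
            throw new IllegalArgumentException("Unexpected vector length");
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] += docVector[i];
        }
        count++;
    }

    /** Returns the mean of all document vectors added so far. */
    double[] mean() {
        if (count == 0) {
            throw new IllegalStateException("No documents added yet");
        }
        double[] result = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            result[i] = sum[i] / count;
        }
        return result;
    }

    public static void main(String[] args) {
        AuthorCentroidSketch author = new AuthorCentroidSketch(3);
        // Made-up document vectors; real ones would come from the model.
        author.addDocument(new double[]{0.2, 0.6, 0.1});
        author.addDocument(new double[]{0.4, 0.8, 0.3});
        System.out.println(java.util.Arrays.toString(author.mean()));
    }
}
```

This would still need some model to produce the document vectors in the first place, and I do not know whether vectors inferred for new texts stay comparable with the older ones.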