Is DL4J suitable for fine-tuning Hugging Face deep learning models?

  • The latest Hugging Face deep learning models are distributed in safetensors format.
  • I’m new to machine learning and deep learning. I understand basic high-level concepts, but I don’t know the math or technical details. For example, I know there are different kinds of layers and attention heads, but not much beyond that.
  • Does DL4J support parallel (distributed), GPU-accelerated fine-tuning of Hugging Face safetensors models? All the guides I can find are in Python.
  • Can I produce GGUF-quantized models with DL4J?
  • What about parallel (distributed), GPU-accelerated inference of LLMs and other deep learning models on the JVM? In Python there are transformers, vLLM, TensorRT-LLM, and maybe more.
  • Does Deep Java Library (DJL) fare any better against the Python ecosystem?

@amano I’ll probably post an update: once I integrate llama.cpp as a backend, it will be. Right now BERT is the best it can run, though. The tooling isn’t quite there yet. I’ll post guides and the like when it is.
I’m not sure where DJL stands, but I don’t think they really support things to the level you’d want.

Feel free to keep an eye on developments though.

Right now we’re mainly at importing Keras, TensorFlow, and ONNX models and customizing them.
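
For anyone curious, here’s a minimal sketch of what the Keras import path looks like today, assuming a model saved to HDF5 (the "model.h5" path is a placeholder, not a real artifact):

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class KerasImportSketch {
    public static void main(String[] args) throws Exception {
        // Import a Keras functional-API model saved via model.save("model.h5").
        // "model.h5" is a placeholder path.
        ComputationGraph model =
                KerasModelImport.importKerasModelAndWeights("model.h5");

        // From here the graph can be inspected and customized like any
        // native DL4J ComputationGraph.
        System.out.println(model.summary());
    }
}
```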

In order to modernize the framework, the main focus has been removing a lot of unmaintained modules, stripping it down, and ensuring it’s stable to run in large-scale, multi-GPU environments with big LLMs.

Everything you’re talking about here will be the main objective, though. It’s not hard to integrate llama.cpp as a backend like everyone else is doing. You’re welcome to help contribute to such efforts. I know 99% of folks browsing here won’t have the time or really want to, though. Good luck in your search. Feel free to check back occasionally.

  • DL4J isn’t really ready to fine-tune the latest deep learning models?
  • llama.cpp doesn’t seem optimized for parallel GPU inference.

@amano Not without a bit of extra work, and not with a Hugging Face-level experience.

I’ve converted GPT-2 to the framework and can probably fine-tune it, but it’s probably not the experience you’d expect at the Python level. Unless you want to help out, probably just move on.
If you’re just looking for something that’s fairly plug-and-play and a simple yes/no answer, then no.

I’d be happy to talk about what I’m thinking for llama.cpp, vLLM, and the other frameworks I want to integrate. I’m a big believer that the work’s been done for a very viable baseline, but it could probably offer a better experience. I won’t bother outlining that, though, since I’m assuming a lack of interest.

Good luck on your search.

Correct. I have neither the time nor the skills to help.

I’m largely a software engineer, not a deep learning researcher.

@amano Yeah, no worries. You seemed semi-curious about the trade-offs between different frameworks, but I figured outlining that and debating it wasn’t really what you wanted.

Feel free to check back occasionally if you want. I want strong GGUF support, etc.; that’s what the framework is mostly focused on anyway. We implemented a fairly strong fine-tuning solution for Keras a while back and plan to replicate that; there’s a sketch of it below. The main differentiator for us here will be largely not caring about what framework we use.
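
To give a sense of that Keras fine-tuning solution, here’s a minimal sketch using DL4J’s transfer learning API. The "model.h5" path and the "dense_1" layer name are placeholders for your own imported model and freeze point:

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.nd4j.linalg.learning.config.Adam;

public class FineTuneSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path: an imported Keras model to fine-tune.
        ComputationGraph pretrained =
                KerasModelImport.importKerasModelAndWeights("model.h5");

        // Small learning rate, as is typical for fine-tuning.
        FineTuneConfiguration ftConf = new FineTuneConfiguration.Builder()
                .updater(new Adam(1e-5))
                .build();

        // Freeze everything up to and including "dense_1" (a hypothetical
        // layer name) and train only the layers after it.
        ComputationGraph fineTuned = new TransferLearning.GraphBuilder(pretrained)
                .fineTuneConfiguration(ftConf)
                .setFeatureExtractor("dense_1")
                .build();

        System.out.println(fineTuned.summary());
        // fineTuned.fit(trainIterator); // supply your own DataSetIterator
    }
}
```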

We have a fairly strong delegation framework for subbing in different kernels for different operations, so using vLLM, llama.cpp, or whatever has the faster kernels is very much the idea.
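
To make the delegation idea concrete, here’s a hypothetical sketch. None of these type names are real DL4J APIs; they only illustrate the shape of per-operation kernel dispatch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: these names are NOT real DL4J APIs.
public class DelegationSketch {

    interface OpKernel {
        float[] execute(float[]... inputs); // tensors simplified to float[]
    }

    static final class KernelRegistry {
        private final Map<String, OpKernel> kernels = new ConcurrentHashMap<>();

        // A backend binding (llama.cpp, vLLM, ...) registers the ops it
        // implements faster than the default engine.
        void register(String opName, OpKernel kernel) {
            kernels.put(opName, kernel);
        }

        // Dispatch picks the registered kernel, else the fallback.
        OpKernel resolve(String opName, OpKernel fallback) {
            return kernels.getOrDefault(opName, fallback);
        }
    }

    public static void main(String[] args) {
        KernelRegistry registry = new KernelRegistry();

        // Default: naive element-wise add.
        OpKernel naiveAdd = inputs -> {
            float[] out = new float[inputs[0].length];
            for (int i = 0; i < out.length; i++) {
                out[i] = inputs[0][i] + inputs[1][i];
            }
            return out;
        };

        // No faster backend registered "add", so dispatch falls back.
        OpKernel add = registry.resolve("add", naiveAdd);
        float[] r = add.execute(new float[]{1, 2}, new float[]{3, 4});
        System.out.println(r[0] + ", " + r[1]); // prints 4.0, 6.0
    }
}
```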


By the way, if you want to use vLLM directly, there’s libpython-clj. libpython-clj is written in Clojure but can be used from Java, and it has been used for machine learning and deep learning.
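
A rough sketch of what calling into Python from Java looks like through its Java API. The method names here (initialize, importModule, getAttr, call) are from the libpython-clj docs as I remember them, so verify against the current documentation:

```java
import libpython_clj2.java_api;

public class PythonFromJava {
    public static void main(String[] args) {
        // Boot the embedded Python interpreter (null = default options).
        java_api.initialize(null);

        // Import a Python module and call one of its functions.
        Object math = java_api.importModule("math");
        Object sqrt = java_api.getAttr(math, "sqrt");
        Object result = java_api.call(sqrt, 16.0);
        System.out.println(result); // 4.0
    }
}
```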

No, integration via direct C++ is the only way. Involving Java will eat all the performance gains.