How do I dynamically load only the required batch into GPU memory?

My dataset is large and host memory (RAM) is sufficient, but GPU memory (VRAM) is limited. How can I use the VRAM efficiently?
The idea: keep the sample data in host memory, dynamically load only the batch currently needed into GPU memory during training, and release that GPU memory once training on the batch is finished.

@cqiaoYc Usually we handle that with an AsyncDataSetIterator. You want the current batch on the GPU and at most the next batch preloaded in the background. Your batch size is determined by what works well in training (different batch sizes can affect network quality) and by what your hardware allows.
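
Assuming you are on DL4J (where `AsyncDataSetIterator` lives), a minimal sketch of that setup might look like the following. The class name, the `train` method, and the `trainData`/`numEpochs` parameters are illustrative, not from the thread; the queue size of 1 matches the "at most the next batch preloaded" advice above:

```java
import org.deeplearning4j.datasets.iterator.AsyncDataSetIterator;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class AsyncTrainingSketch {

    // trainData is any DataSetIterator backed by your in-memory samples
    // (hypothetical here); model is an already-configured network.
    static void train(MultiLayerNetwork model, DataSetIterator trainData, int numEpochs) {
        // Wrap the base iterator so a background thread prefetches the next
        // batch while the current batch trains on the GPU. The second
        // argument is the prefetch queue size: 1 keeps at most one extra
        // batch staged, bounding how much memory batches occupy at once.
        DataSetIterator async = new AsyncDataSetIterator(trainData, 1);

        for (int epoch = 0; epoch < numEpochs; epoch++) {
            model.fit(async);  // consumes the iterator batch by batch
            async.reset();     // rewind for the next epoch
        }
    }
}
```

Only one batch (plus the single prefetched one) lives on the device at a time; the rest of the dataset stays in host memory, which is exactly the RAM-rich/VRAM-limited situation you describe.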

I know what to do now. Thank you very much.