W3D3/D4
Currently setting up distributed training servers on runpod.io to benchmark model training times: requesting 2 GPUs on 1 node, and possibly 2 GPUs spread across 2 separate nodes, to see how each setup affects training time.
Initial result: training with 2 GPUs on 1 node was 45% faster than training with a single GPU, which is roughly what I had expected. Running a few more experiments; more details later.
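"45% faster" can be read two ways: wall-clock time dropped by 45%, or throughput rose by 45%. The two readings give quite different scaling efficiency. A small helper like the one below makes that explicit. It is a sketch that assumes you have measured wall-clock times for the 1-GPU and N-GPU runs of the same job. `scaling_stats` is a hypothetical name, and the times used here are made up for illustration, not my measurements.

```python
def scaling_stats(t_single: float, t_multi: float, n_gpus: int) -> dict:
    """Compare a single-GPU run against an n_gpus run of the same job.

    t_single, t_multi: wall-clock training times in the same unit (e.g. seconds).
    """
    if t_single <= 0 or t_multi <= 0 or n_gpus < 1:
        raise ValueError("times must be positive and n_gpus >= 1")
    speedup = t_single / t_multi          # 2.0 would be perfect scaling on 2 GPUs
    time_saved = 1 - t_multi / t_single   # fraction of wall-clock time removed
    efficiency = speedup / n_gpus         # 1.0 means linear scaling
    return {"speedup": speedup, "time_saved": time_saved, "efficiency": efficiency}


# Reading 1 (hypothetical times): wall-clock time dropped by 45%.
print(scaling_stats(100.0, 55.0, 2))
# Reading 2 (hypothetical times): throughput rose by 45%, so time = 100 / 1.45.
print(scaling_stats(100.0, 100.0 / 1.45, 2))
```

Under the first reading the speedup is about 1.82x (roughly 91% efficiency on 2 GPUs). Under the second it is 1.45x (72.5% efficiency). Worth pinning down which one the benchmark actually measured before comparing against the multi-node runs.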
New idea: can I use Rust to speed up the dataloader step of a separate model I trained earlier? I vaguely remember that model had a mismatch between dataloader throughput and GPU utilization, i.e. the dataloader was CPU-bound and the GPU sat idle. My hypothesis is that Rust can speed up the data preprocessing step by avoiding the overhead of Python's multiprocessing.
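Before reaching for Rust, a back-of-the-envelope check can say whether the CPU side can keep up at all. The sketch below assumes you can measure two numbers: seconds per training step with data already on the GPU, and seconds to load and preprocess one sample on one core. It compares how many samples per second the GPU consumes against how many the loader workers can produce. The function name and all example values are hypothetical, and it ignores IPC/pickling cost, so reality is somewhat worse than this estimate.

```python
import math


def loader_headroom(batch_size: int, gpu_step_s: float,
                    cpu_sample_s: float, num_workers: int) -> dict:
    """Rough check of whether CPU-side preprocessing can keep the GPU fed.

    batch_size:   samples per training step
    gpu_step_s:   seconds per forward/backward/optimizer step, data already on GPU
    cpu_sample_s: seconds to load + preprocess one sample on one CPU core
    num_workers:  parallel loader workers (e.g. DataLoader num_workers)
    Ignores IPC/pickling overhead between workers and the main process.
    """
    demand = batch_size / gpu_step_s        # samples/s the GPU can consume
    supply = num_workers / cpu_sample_s     # samples/s the workers can produce
    workers_needed = math.ceil(demand * cpu_sample_s)
    return {
        "demand_per_s": demand,
        "supply_per_s": supply,
        "cpu_bound": supply < demand,
        "workers_needed": workers_needed,
    }


# Hypothetical numbers: batch of 64, 0.1 s per GPU step, 10 ms per sample, 4 workers.
print(loader_headroom(64, 0.1, 0.01, 4))
```

With those made-up numbers the GPU wants about 640 samples/s while 4 workers supply about 400 samples/s, so the loop would be CPU-bound and would need about 7 workers. Faster per-sample preprocessing (e.g. from Rust) lowers `cpu_sample_s` directly, while adding workers raises supply but also adds the multiprocessing overhead this estimate leaves out.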
First step: add some timing hooks to the model's training script to confirm that the problem really is CPU-bound.
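One minimal way to add those timing hooks is to wrap the training loop and split wall-clock time into "blocked waiting for the next batch" versus "running the step". A data-wait fraction that stays high over many steps would support the CPU-bound hypothesis. This is a sketch, not the actual script: `profile_loop` and `slow_loader` are hypothetical names, and the fake loader uses `time.sleep` to stand in for preprocessing. With a real CUDA model, `step_fn` should end with `torch.cuda.synchronize()`, because CUDA kernels run asynchronously and the step would otherwise look faster than it is.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def profile_loop(loader: Iterable, step_fn: Callable,
                 max_steps: Optional[int] = None) -> dict:
    """Split loop time into 'waiting on the loader' vs 'running the step'."""
    data_wait = 0.0
    compute = 0.0
    steps = 0
    it = iter(loader)
    while max_steps is None or steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time blocked on the dataloader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)                # forward/backward/optimizer for one batch
        t2 = time.perf_counter()
        data_wait += t1 - t0
        compute += t2 - t1
        steps += 1
    total = data_wait + compute
    return {
        "steps": steps,
        "data_wait_s": data_wait,
        "compute_s": compute,
        "data_wait_frac": data_wait / total if total > 0 else 0.0,
    }


def slow_loader(n: int, delay_s: float) -> Iterator[int]:
    """Hypothetical stand-in for a CPU-bound dataloader."""
    for i in range(n):
        time.sleep(delay_s)           # simulates expensive preprocessing
        yield i


stats = profile_loop(slow_loader(5, 0.05), lambda batch: time.sleep(0.001))
print(stats)
```

With a real PyTorch `DataLoader` and `num_workers > 0`, batches are prefetched, so the measured data wait only counts preprocessing time the workers failed to hide. That is exactly the idle-GPU time a faster (e.g. Rust-based) preprocessing step would need to eliminate.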