Google Colab for EGTB generation and PyTorch training?

phhnguyen
Posts: 1517
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Post by phhnguyen »

I am still shopping for a PC. Its main purposes are chess-related: 1) generating some EGTBs (mostly for Xiangqi) and 2) training some NN chess models using PyTorch.

On one hand, I don’t need to train the NN continuously; a few weeks at a time is probably enough. On the other hand, the EGTB generator has to run continuously for years, say, two years in a row. The generator is written in C++, and endgame data files may reach 20 GB in size.

I’m considering buying a PC with a Ryzen CPU with about 24 threads, at least 64 GB of RAM, and an NVIDIA RTX 4090. The problem is that such a PC is very expensive. Thus, I have been looking at other solutions, such as Google Colab or similar services, and studying their feasibility, pros, and cons. At a glance, Colab can provide a lot of threads and RAM as well as GPU power.

My questions are:
  • Is it worth using Colab to generate EGTBs? What would it cost to get computing power similar to a Ryzen with over 20 threads, over 64 GB of RAM, and terabytes of disk storage, running for a year?
  • Is it worth using Colab for NN training, or is it better to buy a GPU card (NVIDIA RTX 4090)? I don’t train continuously, maybe a few weeks each quarter.
  • Is there any way to reduce the price or get a discount for open-source/community work? All my projects are open source, and all data will be freely published.
Ideas and suggestions are highly appreciated.
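
To make the buy-versus-rent question concrete, the comparison can be reduced to a break-even calculation. The snippet below is only a rough sketch: the function names are made up for illustration, and the numbers in the example run are placeholders, not real quotes for a PC, for Colab, or for any other service. Plug in actual prices before drawing conclusions.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours of continuous running


def break_even_hours(hardware_cost, cloud_rate_per_hour, power_cost_per_hour=0.0):
    """Hours of use after which buying is cheaper than renting.

    All three inputs are assumptions supplied by the caller:
    the purchase price, the hourly cloud price, and the hourly
    electricity cost of running the machine at home.
    """
    saving_per_hour = cloud_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        # Renting never costs more per hour than running your own machine.
        return float("inf")
    return hardware_cost / saving_per_hour


def usage_hours_per_year(weeks_per_quarter):
    """Hours per year when training runs a few full weeks each quarter."""
    return weeks_per_quarter * 4 * 7 * 24


if __name__ == "__main__":
    # Placeholder figures only, NOT real prices.
    hours = break_even_hours(hardware_cost=3000.0,
                             cloud_rate_per_hour=1.0,
                             power_cost_per_hour=0.25)
    print("break-even after", hours, "hours")
    print("EGTB generator, hours per year:", HOURS_PER_YEAR)
    print("NN training, hours per year:", usage_hours_per_year(3))
```

With this framing, a generator that runs all year accumulates hours far faster than training for a few weeks per quarter, so the two workloads may well lead to different answers.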
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
jshriver
Posts: 1356
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: Google Colab for EGTB generation and PyTorch training?

Post by jshriver »

I wouldn't recommend Colab for long-running processes, even on the paid plans. If you're going to spend money, you might as well save up for a good card.

In the free tier you get a maximum of 12 hours (more like 4 or 5 nowadays), and file syncing between Colab and Google Drive is horrible and often very laggy. To get around this, I ran a background process that rsynced my /content/project directory to my Google Drive every 5 minutes. Over the past year or so it has also been somewhat unstable, with random shutdowns. That is sad, because for the first couple of years (K80-T100 era) I even paid for 2 subscriptions and would rotate runs like clockwork every 12 hours. Sadly, those days are gone.
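
For anyone who wants to try the same workaround, here is a minimal sketch of such a background sync in Python. It is not the exact script I ran. The helper names are hypothetical, and the Drive path in the usage note assumes Drive is already mounted at the usual /content/drive location, so adjust it to your setup.

```python
import subprocess
import threading


def build_rsync_command(src, dst):
    """Build the rsync call; the trailing slash on src copies its contents."""
    # -a keeps the directory tree, permissions, and timestamps.
    return ["rsync", "-a", src.rstrip("/") + "/", dst]


def start_periodic_sync(src, dst, interval_s=300, run=subprocess.run):
    """Run rsync from src to dst every interval_s seconds in a daemon thread.

    Returns (stop_event, thread); call stop_event.set() to end the loop.
    The `run` parameter exists only so the command runner can be swapped out.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so the first sync happens after one full interval.
        while not stop.wait(interval_s):
            # check=False: a failed sync should not kill the loop.
            run(build_rsync_command(src, dst), check=False)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

In a notebook cell this would be started with something like `stop, t = start_periodic_sync("/content/project", "/content/drive/MyDrive/project")` and ended with `stop.set()`. It only limits how much work is lost; it does nothing about the random shutdowns themselves.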

I believe all these issues are why even the lc0 team is no longer using Colab for NN gen. It is, however, still a really good resource for students who just want to learn AI and work in very small bursts with very small data.

Good luck!