The effect of dual channel RAM

Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

The effect of dual channel RAM

Post by Volker Annuss »

I had wanted to test this for a long time, and now I finally did. The surprising result is that single or dual channel RAM makes no difference for my engine.

For the experiment, I started n=1,2,4,8 independent processes of my engine Arminius, each searching the initial position. n-1 processes searched with no time limit to create background activity; one process was started with a fixed depth to measure nps.

Hardware/software
Core i7 860 processor 4 cores / 8 threads
32KB L1 data cache per core, 256KB L2 cache per core, 8MB L3 cache
8GB RAM in 4 modules for dual channel
2GB RAM in 1 module for single channel
Hyperthreading and Turbo Boost on
Windows 7 64 bit
Arminius (current development version) using 120MB main hash

Test results

processes  nps (single channel)  nps (dual channel)
1          2128K                 2118K
2          2014K                 2020K
4          1784K                 1717K
8          1086K                 1065K
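
To put the table in relative terms, here is a small sketch that computes the percent change from single to dual channel for each row. The numbers are copied from the table; the function name is just for illustration.

```python
# nps figures copied from the table above, in thousands of nodes per second:
# processes -> (single channel, dual channel)
results = {
    1: (2128, 2118),
    2: (2014, 2020),
    4: (1784, 1717),
    8: (1086, 1065),
}


def relative_difference(single, dual):
    """Percent change of dual-channel nps relative to single-channel nps."""
    return 100.0 * (dual - single) / single


for processes, (single, dual) in results.items():
    diff = relative_difference(single, dual)
    print(f"{processes} processes: {diff:+.2f}%")
```

Every row differs by less than 4%, and dual channel is not consistently ahead. There is only one run per configuration here, so this says nothing about how much of that gap is measurement noise.
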
Some characteristics of Arminius that affect memory accesses:

Arminius does not use hashing in qsearch. The main hash table is not aligned to cache line boundaries. Every hash entry has 16 bytes. A board position can be stored in any one of four hash entries. These regions overlap: if one position can be somewhere in entries 1..4, another one can be somewhere in entries 2..5.
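
A rough sketch of what this layout means for cache lines, assuming 64-byte cache lines and reading "one of four hash positions" as four consecutive 16-byte entries starting at an arbitrary index. The `base_offset` parameter is hypothetical and only models a table that does not start on a cache line boundary; whether all four entries are actually read on every probe is also an assumption.

```python
CACHE_LINE = 64  # bytes; assumption: 64-byte cache lines
ENTRY_SIZE = 16  # bytes per hash entry
BUCKET = 4       # number of entries a position may occupy


def lines_touched(first_entry, base_offset=0):
    """Count the cache lines covered by entries first_entry..first_entry+3.

    base_offset is the table start address modulo CACHE_LINE.
    """
    start = base_offset + first_entry * ENTRY_SIZE
    end = start + BUCKET * ENTRY_SIZE - 1  # last byte of the four entries
    return end // CACHE_LINE - start // CACHE_LINE + 1


for base in (0, 8):
    counts = [lines_touched(i, base) for i in range(4)]
    print(f"table offset {base}: cache lines per probe = {counts}")
# table offset 0: cache lines per probe = [1, 2, 2, 2]
# table offset 8: cache lines per probe = [2, 2, 2, 2]
```

So with overlapping regions most probes straddle two cache lines even if the table itself were aligned, and with an unaligned table every probe can.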

Arminius is a magic bitboard engine with a 705KB lookup table (90232 positions with 8 bytes each).

Pawn hash size is 44KB, 512 positions with 88 bytes each. When searching from the initial position, as in this test, the hit rate is lower than normal.

Material hash table size is 128KB, 4096 positions with 32 bytes each.
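
As a sanity check on these sizes, the sketch below recomputes them from the entry counts and compares the sum with the cache sizes from the hardware list. It covers only the three tables with stated sizes, not the 120MB main hash or the smaller tables.

```python
KIB = 1024

# (number of entries, bytes per entry) as given above
tables = {
    "magic bitboard lookup": (90232, 8),
    "pawn hash": (512, 88),
    "material hash": (4096, 32),
}

L2_PER_CORE = 256 * KIB    # from the hardware list
L3_SHARED = 8 * KIB * KIB  # from the hardware list


def size_bytes(entries, entry_size):
    """Total size of a table in bytes."""
    return entries * entry_size


total = 0
for name, (entries, entry_size) in tables.items():
    size = size_bytes(entries, entry_size)
    total += size
    print(f"{name}: {size / KIB:.1f} KiB")

print(f"total: {total / KIB:.1f} KiB")
print(f"share of one L2: {100 * total / L2_PER_CORE:.0f}%")
print(f"share of L3: {100 * total / L3_SHARED:.1f}%")
```

The three tables together are far larger than one core's L2 but take only about a tenth of the L3. If each of 8 processes keeps its own private copy, the copies add up to roughly 6.9 MiB, which is most of the 8MB L3.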

And finally, there are many other tables, such as the history table, bitboards of the squares between 2 given squares, moves along a single file/line/diagonal...
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: The effect of dual channel RAM

Post by matthewlai »

Volker Annuss wrote: I wanted to test this for a long time, now I finally did it. The surprising result is that single or dual channel RAM makes no difference for my engine.

That sounds about right.

In chess we don't really care about RAM bandwidth, only RAM latency, since no engine can saturate 20+GB/s bandwidth.

At 16 bytes per position, you'd need to be searching at more than 1,000,000,000 nps to saturate 20GB/s.
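
The arithmetic can be sketched as below. The 20 GB/s and 16 bytes come from the text above; the 64-byte case is an added assumption that each hash probe misses the cache and transfers a whole 64-byte cache line. The aggregate nps figure assumes all 8 processes in the test ran at the measured 1065K nps, which the test did not actually measure.

```python
BANDWIDTH = 20e9  # bytes per second (the 20 GB/s figure)
ENTRY_SIZE = 16   # bytes per hash entry
CACHE_LINE = 64   # bytes; assumption: one full cache line moved per node


def saturating_nps(bytes_per_node, bandwidth=BANDWIDTH):
    """Nodes per second needed to use up the given memory bandwidth."""
    return bandwidth / bytes_per_node


def bandwidth_used(nps, bytes_per_node):
    """Bytes per second moved if every node costs bytes_per_node of traffic."""
    return nps * bytes_per_node


print(f"{saturating_nps(ENTRY_SIZE):,.0f} nps at 16 bytes per node")
print(f"{saturating_nps(CACHE_LINE):,.0f} nps at 64 bytes per node")

# Rough aggregate for the 8-process dual channel run (assumed, see lead-in).
aggregate_nps = 8 * 1065e3
used = bandwidth_used(aggregate_nps, CACHE_LINE)
print(f"{used / 1e9:.2f} GB/s, {100 * used / BANDWIDTH:.1f}% of 20 GB/s")
```

Even with the pessimistic 64 bytes per node, the engine in this test would use only a few percent of the available bandwidth.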

Latency is not significantly improved by dual channel.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.