Using Komodo 9.1 with Maximum Hash, 64GB machine


Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Using Komodo 9.1 with Maximum Hash, 64GB machine

Post by Zenmastur »

mjlef wrote:
RJN wrote:
lkaufman wrote:
RJN wrote:Komodo 9.1 uses 25% less actual memory than the hash setting (for example, a 32GB setting uses only about 24GB of physical RAM). So a 64GB hash setting should use about 48GB of actual RAM, I would expect. Plenty left over on a 64GB machine. Yet when I try to use 64GB, Komodo reports an error that my system does not have enough RAM, and reverts to 32GB (about 24GB physical).

Is there a workaround for this? Or possibly this is an obsolete error condition.
Mark would know better, but it is my understanding that you should set hash to 48GB in your situation, assuming you are using it for long analysis where this might help. Komodo is supposed to use 25% less than the setting IF YOU SET IT to a power of 2. But if you set it to 3x a power of two, such as 48GB, it should use all of it. I don't have a 64GB machine to test this on though.
Thanks, that worked. Funny thing is, now that I have rebooted, setting hash at 64GB uses the same as 48GB, so it now uses the expected amount of physical RAM. BTW, as a reply to others: originally I had checked that most RAM was free, no other engines were running, I have a discrete MSI video card, all that.

Here is the message that keeps coming up when hash is set to 65536; note that it does not sound like free memory, just an (inaccurate?) total:

Warning: Too Many RAM Set (Your PC only has 65437MB)!
Memory allocation for hash tables requires contiguous memory. As you use your computer and run various programs, sections of memory get allocated and freed (well, they are supposed to get freed). So you are more likely to be able to use a larger hash size right after you reboot your computer than later on, when various programs and processes have split memory up into smaller available chunks. Since programs have to coexist in the operating system, it is probably best not to allocate too much. I try to limit it to half of the total memory, and much less for faster time controls. Naturally, if you are going to run two programs against each other in a match, you would want to halve that again. Otherwise the operating system will be forced to use virtual memory, swapping memory between physical RAM and the hard drive, and that is a huge slowdown for the engines.

We include a document "sethash.txt" with Komodo that advises what size hash to use. A copy is below. This system adjusts for the speed of memory and machine based on the time control. Of course, an infinite time control would call for infinite memory, so eventually you reach the practical limits of a specific machine:

How should I set the hash table size?
=====================================

In general the default hash table size setting in Komodo will work well for you. However, if you want optimal performance you may wish to tweak this value. We are providing some guidelines here.

The optimal hash table size setting for Komodo is one where the program utilizes half, or less, of the hash table. When the percentage exceeds 50% you will see a decline in performance. In modern chess programs the hash tables can fill very quickly, so it may not be possible to provide the optimal hash table size, especially when doing deep searches or long analysis, or where the amount of memory on a given machine is limited. The rule of thumb in these cases is to set the hash table size as large as reasonably possible without sacrificing too much system performance.

Keep in mind that very heavy memory usage can impact the speed of the program and the underlying operating system. If you find your machine is less responsive, try a smaller hash table size.

We have devised a general rule of thumb for determining how large to set the hash table based on the time control you wish to play. Machines vary in speed, but these rules will help you set a proper hash table size. We consider sudden death or Fischer time controls, but these guidelines can be extrapolated to other time controls. There are system considerations too, so this is not a hard rule but really just a general guideline, and it does not take into consideration the memory caching performance of your machine. Nevertheless, most modern PCs will do well with these settings:

1. Take the main time in minutes and add the increment in seconds (for sudden death the increment is zero). Example: a 5 minute plus 1 second increment game would be 5 + 1 = 6.

2. Multiply the value obtained in step 1 by 3. In the above example, 6 x 3 = 18.

3. From the opening position, do a fixed time search of exactly this amount of time in seconds (18 seconds in this example).

4. Note the hash table utilization as reported by the GUI.

5. If the hash table utilization is much above 40%, double the hash table size and repeat this test.

6. If the hash table utilization is too small, for example well below 20%, you are probably setting your table higher than it needs to be.

For technical reasons, setting a hash table that is much larger than it needs to be will have a negative impact on the performance of your chess program (and the rest of the system), as it places more demands on the memory subsystem and cache. However, the impact is generally pretty minor up to about half of the total RAM on your system; going beyond that is risky. In general, all other things being equal, you should err on the side of too large rather than too small, as long as you don't exceed half of total memory. We generally suggest choosing the smallest hash table size that gives no more than about 40% utilization using this test.
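The six steps above boil down to one small calculation plus one measurement. As a minimal illustration (the function name here is made up for this sketch, not anything shipped with Komodo):

    #include <stdio.h>

    /* Steps 1 and 2 above: seconds to search when testing hash
       utilization for a given time control. Illustrative sketch only. */
    static int test_search_seconds(int main_minutes, int increment_seconds)
    {
        return (main_minutes + increment_seconds) * 3;
    }

    int main(void)
    {
        /* The 5+1 example from the text: (5 + 1) * 3 = 18 seconds. */
        printf("Search the opening position for %d seconds.\n",
               test_search_seconds(5, 1));
        return 0;
    }

Run the fixed-time search for that many seconds, read the hash utilization from the GUI, and grow or shrink the table per steps 5 and 6.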
I'm curious...

I did a bunch of analysis on cache requirements for long time control analysis. I came to the conclusion that, in order to properly determine how much RAM is needed, one needs to know the TT entry size in bytes and what % of the cache is depth-preferred. Is it possible for you to divulge this information for Komodo?

One other question. In the past I have run into problems with long analysis using various programs. The problem was that the analysis became pathological. In each case I was doing multi-day analysis of related positions. When it became clear that the analysis was "bad" I restarted the program, GUI, etc. This solved the problem. Later I tried just clearing the cache; the problem went away as well. My tentative conclusion was that there is a TT aging problem in some programs, SF included. I do lots of long analysis, so I'm interested in how Komodo handles aging during such sessions.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Using Komodo 9.1 with Maximum Hash, 64GB machine

Post by mjlef »

Zenmastur wrote:
I'm curious...

I did a bunch of analysis on cache requirements for long time control analysis. I came to the conclusion that, in order to properly determine how much RAM is needed, one needs to know the TT entry size in bytes and what % of the cache is depth-preferred. Is it possible for you to divulge this information for Komodo?

One other question. In the past I have run into problems with long analysis using various programs. The problem was that the analysis became pathological. In each case I was doing multi-day analysis of related positions. When it became clear that the analysis was "bad" I restarted the program, GUI, etc. This solved the problem. Later I tried just clearing the cache; the problem went away as well. My tentative conclusion was that there is a TT aging problem in some programs, SF included. I do lots of long analysis, so I'm interested in how Komodo handles aging during such sessions.

Regards,

Zen
Zen,

Sure. I can only comment on Komodo versions I have worked on (Komodo 6 through Komodo 9.1). Komodo 6 through Komodo 9 used hash entries of 128 bits. In Komodo 9.1 I reduced the hash entry size to 96 bits. This has the advantage of slightly faster access, as well as allowing more hash entries at certain hash sizes, since each entry takes up 25% less space.
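To make the saving concrete, here is one way the usual fields can be packed into 12 bytes. This layout is purely illustrative (Komodo's actual format is not public); it just shows that the common fields fit in 12 bytes instead of 16:

    #include <stdint.h>

    /* Illustrative 96-bit (12-byte) hash entry, NOT Komodo's real layout. */
    typedef struct {
        uint32_t key32;    /* upper 32 bits of the Zobrist key, for verification */
        uint16_t move;     /* hash move, packed from/to/promotion */
        int16_t  score;    /* search score */
        int16_t  eval;     /* static evaluation */
        uint8_t  depth;    /* remaining search depth */
        uint8_t  genbound; /* replacement generation (age) plus bound type */
    } TTEntry;             /* sizeof(TTEntry) == 12 on common ABIs */

At a 64GB nominal setting, 2^32 such entries occupy 48GB, which matches the "25% less" behaviour described earlier in the thread.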

In analysis mode, Komodo tries to preserve some deeper hash entries more than when analysis mode is off. This is controlled with the "UCI_AnalyseMode" UCI parameter. You will have to check your specific chess GUI to see if it sets this. If it does, and you prefer Komodo's normal hash replacement scheme, you can always just use a very long time per move: the GUI will not set that option, since it considers this a "regular" move.
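For reference, the relevant traffic is standard UCI. A GUI that supports the option sends something like:

    setoption name UCI_AnalyseMode value true
    position startpos moves e2e4 e7e5
    go infinite

whereas a fixed-time search such as "go movetime 600000" (ten minutes) is the "regular" move case, so the normal replacement scheme applies.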

I cannot go into the specifics of how Komodo chooses what to replace in its hashing scheme. It has some things I have not seen in other programs, and since they were a gain for Komodo, I will have to leave it a "trade secret" for now. We might be more open to discussing them in the future if some other programs discover the same ideas. And if you have suggestions on methods you think will help, please suggest them to us.

Mark
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Using Komodo 9.1 with Maximum Hash, 64GB machine

Post by mjlef »

bob wrote:
mjlef wrote:
Memory allocation for hash tables requires contiguous memory. As you use your computer and run various programs, sections of memory get allocated and freed (well, they are supposed to get freed). So you are more likely to be able to use a larger hash size right after you reboot your computer than later on, when various programs and processes have split memory up into smaller available chunks. Since programs have to coexist in the operating system, it is probably best not to allocate too much. I try to limit it to half of the total memory, and much less for faster time controls. Naturally, if you are going to run two programs against each other in a match, you would want to halve that again. Otherwise the operating system will be forced to use virtual memory, swapping memory between physical RAM and the hard drive, and that is a huge slowdown for the engines.
Sorry to jump in, but that makes zero sense to me. All machines today use virtual memory, which means any two physical pages of memory can be made contiguous through the memory map, which every process uses. The only exception is if you try to use really big pages (2MB pages are automatic under Linux; I don't know how it is done under Windows, but many complain about fragmentation issues). In Linux, data gets moved around and re-mapped to create 2MB physically contiguous chunks of memory that can be mapped as a single 2MB page.

I've not seen this "contiguous memory problem" on either Windows or Linux. It should not exist on any architecture since the 80286 came along with MMU hardware.


Virtual memory is pretty complex, especially since the operating system is in memory as well. It is true that newer operating systems use virtual memory, but if the operating system allocates too much memory for itself (in Windows this is mapped to some very high "address"), there is not always enough left to allocate a specific hash size. Komodo handles this the same way NOW did: it tries to allocate what the user requested (rounded down to a power of 2 times the hash entry size); if this fails, it cuts the allocation size in half and tries again, repeating until it succeeds. The maximum amount you can allocate is a function of both the specific operating system and the physical memory available. I have seen lots of "free" memory reported but Windows still unable to allocate a hash above a certain size. (Perhaps some memory management unit limitation?)
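A minimal sketch of that allocate-and-halve fallback (names made up; not Komodo's actual code):

    #include <stdlib.h>

    /* Try to allocate the requested hash size; on failure, halve and retry.
       The real code first rounds the entry count down to a power of two. */
    static void *alloc_hash(size_t requested, size_t minimum, size_t *actual)
    {
        for (size_t size = requested; size >= minimum; size /= 2) {
            void *mem = malloc(size);
            if (mem != NULL) {
                *actual = size;
                return mem;
            }
        }
        *actual = 0;
        return NULL; /* even the minimum size failed */
    }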

In Win32 it is limited to 2 GB. In Windows 8 (64-bit) it is 128 GB, and in Windows 8 Professional 512 GB. I do not know why they have these limits in 64-bit Windows, other than to charge more money for some versions. Right now we limit Komodo 9 to 64 GB (actually 48 GB in Komodo 9.1, since 64 GB is not big enough to reach the next power-of-2 doubling).

Memory is still very slow and anything larger would take too long to clear for most people. Hopefully memory speeds will increase in the future making these giant memory spaces more useful in computer chess.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Using Komodo 9.1 with Maximum Hash, 64GB machine

Post by syzygy »

mjlef wrote:It is true that newer operating systems use virtual memory, ...
Ehm, "newer" = "not older than windows 3.0"... ;-)
Memory is still very slow and anything larger would take too long to clear for most people.
As long as the hash table fits in physical memory (which should easily be the case when allocating 48GB on a 64GB system that is not running other heavy tasks), I fail to see how clearing it could take too long when done properly.

Just a wild guess, but maybe you're allocating the hash table in a lazy manner, i.e. the OS does not actually map it into the address space until it is first written to. As a result, the initial clearing operation takes a long long time because each and every page needs to be faulted in separately.

On Linux, if you allocate using mmap(), then use MAP_POPULATE. (I think that fixed the problem for me.)
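A sketch of that fix (Linux-specific; MAP_POPULATE makes the kernel fault every page in at allocation time, so the initial clear no longer pays one page fault per page):

    #include <string.h>
    #include <sys/mman.h>

    /* Allocate a large hash table with all pages pre-faulted. */
    static void *alloc_hash_populated(size_t bytes)
    {
        void *tt = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        if (tt == MAP_FAILED)
            return NULL;
        memset(tt, 0, bytes); /* fast: no per-page faults during the clear */
        return tt;
    }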
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Using Komodo 9.1 with Maximum Hash, 64GB machine

Post by Zenmastur »

mjlef wrote:
Virtual memory is pretty complex, especially since the operating system is in memory as well. It is true that newer operating systems use virtual memory, but if the operating system allocates too much memory for itself (in Windows this is mapped to some very high "address"), there is not always enough left to allocate a specific hash size. Komodo handles this the same way NOW did: it tries to allocate what the user requested (rounded down to a power of 2 times the hash entry size); if this fails, it cuts the allocation size in half and tries again, repeating until it succeeds. The maximum amount you can allocate is a function of both the specific operating system and the physical memory available. I have seen lots of "free" memory reported but Windows still unable to allocate a hash above a certain size. (Perhaps some memory management unit limitation?)

In Win32 it is limited to 2 GB. In Windows 8 (64-bit) it is 128 GB, and in Windows 8 Professional 512 GB. I do not know why they have these limits in 64-bit Windows, other than to charge more money for some versions. Right now we limit Komodo 9 to 64 GB (actually 48 GB in Komodo 9.1, since 64 GB is not big enough to reach the next power-of-2 doubling).

Memory is still very slow and anything larger would take too long to clear for most people. Hopefully memory speeds will increase in the future making these giant memory spaces more useful in computer chess.
Memory is slow, and this isn't likely to change until new technologies like crossbar memory become common. Even so, there is a need for programs to support more than 64GB of memory. Large pages move the point at which a large TT starts to slow a program's nps, which makes very large TTs a "reasonable" option for many people; huge pages would push that point beyond what current hardware even supports. In addition, Skylake processors have increased the size of the TLB by 50% and added a second TLB miss handler. These changes will shift the point at which the slowdown occurs, and its magnitude. Cheap processors that can address 768GB are available, e.g. the E5-2603 at $210, or, a little more robust, the E5-2620 at ~$430, among others. This puts systems with more than 64GB of memory within reach of many people.
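As an illustration of the large-page point: on Linux, an engine can ask for 2MB transparent huge pages on its hash allocation with a single hint (a sketch; tt and bytes are assumed to come from an earlier allocation such as the mmap() shown above):

    #include <stddef.h>
    #include <sys/mman.h>

    /* Hint that the hash table should be backed by 2MB huge pages,
       cutting TLB misses when probing a very large table. */
    static void prefer_huge_pages(void *tt, size_t bytes)
    {
        madvise(tt, bytes, MADV_HUGEPAGE);
    }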

The need for support of larger TTs is easy to demonstrate. A common time control for correspondence play is 10 moves in 50 days, or an average of 10 days per move including your opponent's response time. A single CPU processing nodes at 5.0 Mnps (an admittedly slow rate) will process 10*24*3600*5,000,000 = 4.32 trillion nodes. Completely overwriting the contents of a hash table with N entries requires approximately N*(ln(N)+1) unique writes. If we assume a generous hit rate of 20% (an opening or middle-game position), then 80% of all writes will be unique. A table with 2^32 entries (the maximum that Komodo allows) will be completely overwritten in approximately 97.4 billion unique writes, so in this case the TT will be overwritten about every 6.75 hours (on faster machines this could be reduced to about an hour). This may at first glance seem like a reasonable number, but it statistically guarantees that every entry in the cache is overwritten at least once. To overwrite (1-1/e) = 63.2% of the TT requires only 2^32 unique writes, which takes about 18 minutes (as little as 3 on a fast machine). According to your guidelines, anything over 50% utilization degrades efficiency, so a slow system with 64GB of memory has degraded performance after about 15 minutes of analysis.

A major drop in efficiency occurs when those 97.4 billion unique writes happen within a single iteration: all the entries most likely to benefit the next iteration are overwritten before it begins. The problem gets worse when the number of nodes needed to analyze a single move of an iteration exceeds the number required to completely overwrite the TT. A more important issue is that when the TT is completely overwritten, the chance of search instability increases. Search instability is a real drag and can eat a huge amount of time. This is a problem for unattended analysis: if you're sitting there watching you can intervene; if you're not, it could thrash forever, or until you notice what's going on.
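The arithmetic is easy to check. A small sketch, using the standard coupon-collector expectation N(ln N + gamma) for a full overwrite (the N(ln N + 1) form above is the same estimate with a slightly different constant):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double N      = 4294967296.0; /* 2^32 TT entries */
        double nps    = 5.0e6;        /* nodes per second, slow machine */
        double writes = 0.8 * nps;    /* ~80% of nodes give a unique write */

        /* Expected unique writes until every entry has been overwritten. */
        double full = N * (log(N) + 0.5772156649);
        printf("full overwrite: %.1f billion writes, %.2f hours\n",
               full / 1e9, full / writes / 3600.0);

        /* After N unique writes, 1 - 1/e (about 63.2%) of entries are hit. */
        printf("63%% overwrite: %.1f minutes\n", N / writes / 60.0);
        return 0;
    }

This prints roughly 98 billion writes and 6.8 hours for the full overwrite, and about 18 minutes for the 63% mark, in line with the figures above.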

This analysis also seems to indicate that Komodo will be overwriting way more than 50% of its TT every move during the next TCEC if it's played on a fast 32-core machine. At 40 Mnps, 63% will be overwritten in about 102 seconds. I suspect this is why Bob is looking at large and huge page use. If he's getting 100 Mnps with Crafty 25 on a medium-speed 20-core box, this implies almost 200 Mnps on a fast 32-core box, where 63% of a 4-billion-entry TT will be overwritten in about 22 seconds. If he could get his EBF down to approximately 1.5 with good effect, he would have a real monster of a program.

In any case, the OSes can handle more memory, CPUs that can handle more memory are relatively cheap, and LTC analysis would benefit if large pages are used. When Skylake CPUs are available, TTs can be at least 50% larger than they are now thanks to the larger TLB, and I hear that the second miss handler does wonders for programs with very large memory footprints. Something to keep in mind, along with how fast Komodo will run on the new TCEC hardware.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.