Richard Allbert wrote:
Rein Halbersma wrote:
Sure, challenge accepted: what would be an acceptable test for you (test suite, or search in specific position)?
And this is why I hardly post here anymore.
There was no 'challenge', no proving a point, just interest in your comment.
Which, as with everything on this topic, is taken as some kind of contest.
Small wonder the community has dwindled.
Oh sorry, mine was also just an innocent remark: what's wrong with a little competition? In any case, if you are still interested, here is what I get on the current master branch for ./stockfish bench (I couldn't find a GitHub repo for CPW-Engine):
Code:
Total time (ms) : 4382
Nodes searched : 7634225
Nodes/second : 1742178
I then apply the following rather straightforward changes:
tt.h
Code:
#ifndef TT_H_INCLUDED
#define TT_H_INCLUDED
#include "misc.h"
#include "types.h"
+#include <vector>
/// The TTEntry is the 10 bytes transposition table entry, defined as below:
///
@@ -83,19 +65,18 @@ struct TTCluster {
class TranspositionTable {
public:
- ~TranspositionTable() { free(mem); }
void new_search() { generation += 4; } // Lower 2 bits are used by Bound
const TTEntry* probe(const Key key) const;
- TTEntry* first_entry(const Key key) const;
+ const TTEntry* first_entry(const Key key) const;
+ TTEntry* first_entry(const Key key);
void resize(uint64_t mbSize);
void clear();
void store(const Key key, Value v, Bound type, Depth d, Move m, Value statV);
private:
uint32_t clusterCount;
- TTCluster* table;
- void* mem;
+ std::vector<TTCluster> table;
uint8_t generation; // Size must be not bigger than TTEntry::genBound8
};
@@ -106,7 +87,12 @@ extern TranspositionTable TT;
/// a cluster given a position. The lowest order bits of the key are used to
/// get the index of the cluster inside the table.
-inline TTEntry* TranspositionTable::first_entry(const Key key) const {
+inline const TTEntry* TranspositionTable::first_entry(const Key key) const {
+ return &table[(uint32_t)key & (clusterCount - 1)].entry[0];
+}
+
+inline TTEntry* TranspositionTable::first_entry(const Key key) {
return &table[(uint32_t)key & (clusterCount - 1)].entry[0];
}
So I removed the destructor ~TranspositionTable(), removed the void* mem, added a non-const overload of first_entry(), and changed the return type of its const sibling to const TTEntry*. Finally, I changed table from a raw TTCluster* to a std::vector<TTCluster>.
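For readers who have not seen the const/non-const overload pair before, here is a minimal sketch of the pattern on a vector-backed table. Entry, Cluster and Table are simplified, hypothetical stand-ins, not the real Stockfish types, and the sketch assumes the cluster count is a power of two so that masking the key gives a valid index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for TTEntry and TTCluster.
struct Entry   { std::uint16_t key16 = 0; std::uint8_t genBound8 = 0; };
struct Cluster { Entry entry[3]; };

class Table {
public:
    // clusterCount must be a non-zero power of two for the index mask to work.
    explicit Table(std::size_t clusterCount) : table(clusterCount) {
        assert(clusterCount != 0 && (clusterCount & (clusterCount - 1)) == 0);
    }

    // Const overload: a const Table only hands out read-only entries.
    const Entry* first_entry(std::uint64_t key) const {
        return &table[static_cast<std::uint32_t>(key) & (table.size() - 1)].entry[0];
    }

    // Non-const overload: same lookup, but the caller may write to the entry.
    Entry* first_entry(std::uint64_t key) {
        return &table[static_cast<std::uint32_t>(key) & (table.size() - 1)].entry[0];
    }

private:
    std::vector<Cluster> table;  // owns the memory, so no destructor is needed
};
```

With this pair, code that only holds a const reference to the table cannot modify an entry without an explicit const_cast, which is what makes the cast in probe() stand out.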
tt.cpp
Code:
#include <cstring>
#include <iostream>
+#include <algorithm>
#include "bitboard.h"
#include "tt.h"
@@ -40,18 +41,22 @@ void TranspositionTable::resize(uint64_t mbSize) {
return;
clusterCount = newClusterCount;
-
- free(mem);
- mem = calloc(clusterCount * sizeof(TTCluster) + CACHE_LINE_SIZE - 1, 1);
-
- if (!mem)
- {
- std::cerr << "Failed to allocate " << mbSize
- << "MB for transposition table." << std::endl;
- exit(EXIT_FAILURE);
+ table.reserve(clusterCount * TTClusterSize);
+
+ /*
+ * unfortunately, Stockfish compiles with -fno-exceptions
+ *
+ try {
+ table.reserve(clusterCount * TTClusterSize);
+ } catch(std::bad_alloc const&) {
+ std::cerr << "Failed to allocate " << mbSize
+ << "MB for transposition table." << std::endl;
+ exit(EXIT_FAILURE);
+ } catch(...) {
+ std::cerr << "Unknown exception";
+ exit(EXIT_FAILURE);
}
-
- table = (TTCluster*)((uintptr_t(mem) + CACHE_LINE_SIZE - 1) & ~(CACHE_LINE_SIZE - 1));
+*/
}
@@ -61,7 +66,7 @@ void TranspositionTable::resize(uint64_t mbSize) {
void TranspositionTable::clear() {
- std::memset(table, 0, clusterCount * sizeof(TTCluster));
+ std::fill(table.begin(), table.end(), TTCluster());
}
@@ -71,7 +76,7 @@ void TranspositionTable::clear() {
const TTEntry* TranspositionTable::probe(const Key key) const {
- TTEntry* tte = first_entry(key);
+ TTEntry* tte = const_cast<TTEntry*>(first_entry(key));
Here I replaced the manual memory allocation with calloc() by the std::vector reserve() member function, and the std::memset() by a std::fill(). Note that because Stockfish compiles with -fno-exceptions, I had to comment out the try/catch clause around the reserve(). I also dropped the manual memory alignment trick. It should be possible to restore it with a custom allocator, e.g. one from Boost.Align, but I haven't tried that. Note also that I put in an ugly const_cast<TTEntry*> to alert innocent viewers to the fact that probe() is a const member function and yet it modifies the generation inside TT entries.
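One caveat about the reserve() call is worth spelling out. As far as the standard library is concerned, reserve() only grows the capacity of a vector and leaves size() at zero. That means table.begin() == table.end(), so the std::fill() in clear() touches nothing, and indexing into the vector is formally out of bounds even if it appears to work. Also, since the vector holds TTCluster elements, the element count one would expect is clusterCount rather than clusterCount * TTClusterSize (assuming TTClusterSize is the number of entries per cluster, as the name suggests). The sketch below uses a hypothetical 32-byte Cluster to show the difference between reserve() and resize(); the bench numbers in this post were measured with the reserve() version as posted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 32-byte stand-in for Stockfish's TTCluster.
struct Cluster { unsigned char bytes[32]; };

// reserve(n) only grows capacity; the vector still holds zero elements.
std::size_t size_after_reserve(std::size_t n) {
    std::vector<Cluster> table;
    table.reserve(n);
    assert(table.capacity() >= n);
    return table.size();
}

// resize(n) creates n value-initialized (all-zero) clusters.
std::size_t size_after_resize(std::size_t n) {
    std::vector<Cluster> table;
    table.resize(n);
    return table.size();
}

// Dirty one byte, run the same std::fill as clear(), and count the
// non-zero bytes that remain. Requires n > 0.
std::size_t dirty_bytes_after_clear(std::size_t n) {
    std::vector<Cluster> table(n);
    table[n / 2].bytes[7] = 0xFF;
    std::fill(table.begin(), table.end(), Cluster());
    std::size_t dirty = 0;
    for (const Cluster& c : table)
        for (unsigned char b : c.bytes)
            dirty += (b != 0);
    return dirty;
}
```

If that reading is right, table.resize(clusterCount) would be the closer equivalent of the old calloc() call, because it both allocates and zeroes the clusters.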
OK, payment time: what's the penalty for using std::vector over raw memory?
Code:
Total time (ms) : 4438
Nodes searched : 7634225
Nodes/second : 1720194
That's a 1.3% penalty (4438 ms versus 4382 ms for the same node count) for not worrying about memory allocation anymore. Not quite the zero overhead that I imagined, but close enough for my taste. Perhaps all that cache alignment is good for something. In any case, I'll take that to go, please.
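On the alignment point: a rough sketch of what a cache-line-aligning allocator for std::vector could look like is given below. This is not Boost.Align's implementation and it is untested against Stockfish. AlignedAllocator is a hypothetical name, 64 bytes is an assumed cache line size, and it calls std::abort() on allocation failure because exceptions are assumed to be disabled. It uses the same over-allocate-and-round-up trick as the original resize(), but hides it behind the allocator interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical allocator that returns memory aligned to Align bytes.
template <class T, std::size_t Align>
struct AlignedAllocator {
    static_assert(Align >= sizeof(void*) && (Align & (Align - 1)) == 0,
                  "Align must be a power of two and at least pointer-sized");

    using value_type = T;
    // Explicit rebind is required because Align is a non-type parameter.
    template <class U> struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() = default;
    template <class U> AlignedAllocator(const AlignedAllocator<U, Align>&) {}

    T* allocate(std::size_t n) {
        // Over-allocate: alignment slack plus one slot to remember the raw block.
        void* raw = std::malloc(n * sizeof(T) + Align - 1 + sizeof(void*));
        if (!raw) std::abort();  // assumption: no exceptions available
        std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        std::uintptr_t aligned = (start + Align - 1) & ~(std::uintptr_t(Align) - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;  // stash raw pointer just below
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) {
        std::free(reinterpret_cast<void**>(p)[-1]);  // free the original block
    }
};

template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Hypothetical 32-byte stand-in for TTCluster.
struct Cluster { unsigned char bytes[32]; };

// True if a vector using the allocator starts on a 64-byte boundary.
bool table_is_cache_aligned(std::size_t clusterCount) {
    std::vector<Cluster, AlignedAllocator<Cluster, 64>> table(clusterCount);
    return reinterpret_cast<std::uintptr_t>(table.data()) % 64 == 0;
}
```

Whether this would win back the 1.3% is an open question; the sketch only shows that the alignment can be kept without going back to manual calloc()/free().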