I've finished implementing the multi-threaded memory blaster and it's passed the first set of tests.
There is a speed gain, but it's difficult to quantify because so much depends on core count, cache behavior, hyper-threading, table size, memory channel count, memory layout, etc.
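Since the gain varies so much from machine to machine, the practical approach is to measure it on the target hardware. Below is a rough sketch of a timing harness, assuming a POSIX system with clock_gettime(). The names ClearFn, ClearWithMemset and BestClearSeconds are hypothetical helpers for illustration and are not part of the code in this post.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Function pointer type matching the clear routines (region start, region length) */
typedef void (*ClearFn)(void *baseptr, size_t bytelength);

/* Baseline for comparison: plain single-threaded memset() */
void ClearWithMemset(void *baseptr, size_t bytelength)
{
  memset(baseptr, 0, bytelength);
}

/* Hypothetical helper: best wall-clock time in seconds over reps runs */
double BestClearSeconds(ClearFn clearfn, void *baseptr, size_t bytelength, int reps)
{
  double best = -1.0;

  assert(reps > 0);
  for (int rep = 0; rep < reps; rep++)
  {
    struct timespec t0, t1;

    memset(baseptr, 0xFF, bytelength); /* Dirty the region so each run does real work */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    clearfn(baseptr, bytelength);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    const double elapsed = (double) (t1.tv_sec - t0.tv_sec) + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if ((best < 0.0) || (elapsed < best))
      best = elapsed;
  }
  return best;
}
```

Passing the multi-threaded routine and ClearWithMemset in turn, with the real table size, gives a per-machine comparison. Note that the dirtying pass leaves small regions in cache, so small tables will look faster here than they would in a cold-cache program.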
The only non-trivial part of the code is the section which partitions the memory region across the threads. This is handled in two passes: the first calculates the start address of each thread's memory segment, and the second determines via subtraction the exact byte count for each segment. Some care is needed here because integer division truncates: the last segment absorbs the remainder, so not all segments are guaranteed to be of the same length.
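To make the truncation concrete, here is a minimal sketch of the same two-pass arithmetic using byte offsets instead of pointers. SegmentOffset and SegmentLength are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Pass 1: start offset of segment index when bytelength bytes are split count ways.
   An index equal to count yields the end of the region. */
size_t SegmentOffset(size_t bytelength, unsigned int count, unsigned int index)
{
  assert((count > 0) && (index <= count));
  if (index == count)
    return bytelength;
  return (size_t) index * (bytelength / count); /* Truncating division */
}

/* Pass 2: segment length by subtraction; the last segment picks up the remainder */
size_t SegmentLength(size_t bytelength, unsigned int count, unsigned int index)
{
  return SegmentOffset(bytelength, count, index + 1) - SegmentOffset(bytelength, count, index);
}
```

For a 10 byte region split across 3 threads, the offsets are 0, 3 and 6, and the lengths are 3, 3 and 4.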
I recommend multi-threaded memory blasting to any author whose program has a need to clear large tables.
#include <pthread.h> // pthread_create(), pthread_join()
#include <string.h> // memset(), size_t

typedef unsigned int ui;
typedef unsigned char ui8;

typedef struct
{
  pthread_t pthread; // Thread ID
  void *baseptr; // Memory region start
  size_t bytelength; // Memory region length
} MemZapRec;

void MemClearST(void * const baseptr, const size_t bytelength)
{
  // Memory clear: single threaded
  memset(baseptr, 0, bytelength);
}

void *MemClearTask(void *ptr)
{
  // Called only via pthread_create()
  const MemZapRec * const mzrptr = (MemZapRec *) ptr;
  MemClearST(mzrptr->baseptr, mzrptr->bytelength);
  return 0;
}

void MemClearMT(void * const baseptr, const size_t bytelength)
{
  // Memory clear: multi threaded
  const ui corecount = FetchCoreCount(); // Must be in the range 1..CpuCoreLen
  MemZapRec memzapvec[CpuCoreLen]; // CpuCoreLen is the maximum core count for a CPU
  int startedvec[CpuCoreLen]; // Nonzero if the thread for this segment was created

  // For each thread, set the segment start address
  for (ui index = 0; index < corecount; index++)
    memzapvec[index].baseptr = ((ui8 *) baseptr) + (index * (bytelength / corecount));

  // For each thread, set the segment length; the last segment absorbs the division remainder
  for (ui index = 0; index < corecount; index++)
  {
    if (index < (corecount - 1))
      memzapvec[index].bytelength = (size_t) ((ui8 *) memzapvec[index + 1].baseptr - (ui8 *) memzapvec[index].baseptr);
    else
      memzapvec[index].bytelength = (size_t) ((ui8 *) baseptr + bytelength - (ui8 *) memzapvec[index].baseptr);
  }

  // For each thread, create/run; if creation fails, clear that segment in the calling thread
  for (ui index = 0; index < corecount; index++)
  {
    startedvec[index] = (pthread_create(&memzapvec[index].pthread, 0, MemClearTask, (void *) &memzapvec[index]) == 0);
    if (!startedvec[index])
      MemClearST(memzapvec[index].baseptr, memzapvec[index].bytelength);
  }

  // For each created thread, wait until finish
  for (ui index = 0; index < corecount; index++)
    if (startedvec[index])
      pthread_join(memzapvec[index].pthread, 0);
}

void MemClear(void * const baseptr, const size_t bytelength)
{
  // This is the externally visible routine for clearing a memory region
  if (FetchCoreCount() == 1)
    MemClearST(baseptr, bytelength); // No threading used if only one core
  else
    MemClearMT(baseptr, bytelength); // At least two threads will be used
}
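
FetchCoreCount() and CpuCoreLen are not shown above. One possible sketch of them follows, assuming a Linux or macOS style system where sysconf(_SC_NPROCESSORS_ONLN) is available (a common extension, not something POSIX requires). The limit of 64 and the ClampCoreCount helper are assumptions made for illustration only.

```c
#include <assert.h>
#include <unistd.h>

enum { CpuCoreLen = 64 }; /* Assumed maximum core count; sizes the per-thread arrays */

/* Hypothetical helper: force a reported core count into the range 1..maxcount */
unsigned int ClampCoreCount(long reported, unsigned int maxcount)
{
  assert(maxcount >= 1);
  if (reported < 1)
    return 1; /* sysconf() returns -1 on failure; fall back to one core */
  if ((unsigned long) reported > maxcount)
    return maxcount; /* Never exceed the fixed array size */
  return (unsigned int) reported;
}

/* Number of online cores, clamped so it is always a safe thread count */
unsigned int FetchCoreCount(void)
{
  return ClampCoreCount(sysconf(_SC_NPROCESSORS_ONLN), CpuCoreLen);
}
```

Clamping matters because the thread records live in a fixed-size array, and a count of zero would cause a division by zero when the region is partitioned.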