[.Net only] - fast bit operations

bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

[.Net only] - fast bit operations

Post by bpfliegel »

Does anyone have fast x86/x64 bitscan functions (fsb, lsb, popfirst) in C# (which are not Mono bound)? What do you use?
I am playing around with these ideas, but I am not sure they are worth the possible overhead (such as having to call through a delegate):

http://blogs.msdn.com/b/devinj/archive/ ... 38323.aspx
http://stackoverflow.com/questions/3216 ... in-c-sharp
http://blogs.microsoft.co.il/blogs/sash ... rom-c.aspx

Unsafe code ideas are welcome as well.

Any thoughts?
Thanks, Balint
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: [.Net only] - fast bit operations

Post by bpfliegel »

Any dirty tricks regarding bitboard-related operations are welcome. Checked/unchecked, safe/unsafe.
My current ones are available in Portfish: https://github.com/bpfliegel/Portfish

Cheers, Balint
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: [.Net only] - fast bit operations

Post by bpfliegel »

Okay, so here we go - going for an x64 pop first bit function.

Idea from here:
http://blogs.msdn.com/b/devinj/archive/ ... 38323.aspx

Original C code looks like this:

Code: Select all

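#include "stdafx.h"
#include <intrin.h>

// Note: MSVC's unsigned long is 32 bits wide, so this version only reads and
// writes the low 32 bits of b (see the dword loads in the disassembly below).
// unsigned __int64 would be needed to cover a full 64-bit bitboard.
int __declspec(noinline) __stdcall pop_1st_bit_64(unsigned long& b) {
	unsigned long index;
	_BitScanForward64(&index, b);
	b &= ~(1ULL<<(index));
	return index;
}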
C# code with some tests like this (can't even use FastCall...)

Code: Select all

using System;
using System.Runtime.InteropServices;

namespace Popcount
{
    // Original idea here: http://stackoverflow.com/questions/3216535/x86-x64-cpuid-in-c-sharp
    public static class PopFirst
    {
        #region Interop

        [DllImport("kernel32.dll", SetLastError = true)]
        private static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize, AllocationType flAllocationType,
            MemoryProtection flProtect);

        // dwSize is a SIZE_T in the Win32 API, so it is declared as UIntPtr rather than UInt32.
        [DllImport("kernel32")]
        private static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, UInt32 dwFreeType);

        [Flags()]
        private enum AllocationType : uint
        {
            COMMIT = 0x1000,
            RESERVE = 0x2000,
            RESET = 0x80000,
            LARGE_PAGES = 0x20000000,
            PHYSICAL = 0x400000,
            TOP_DOWN = 0x100000,
            WRITE_WATCH = 0x200000
        }

        [Flags()]
        private enum MemoryProtection : uint
        {
            EXECUTE = 0x10,
            EXECUTE_READ = 0x20,
            EXECUTE_READWRITE = 0x40,
            EXECUTE_WRITECOPY = 0x80,
            NOACCESS = 0x01,
            READONLY = 0x02,
            READWRITE = 0x04,
            WRITECOPY = 0x08,
            GUARD_Modifierflag = 0x100,
            NOCACHE_Modifierflag = 0x200,
            WRITECOMBINE_Modifierflag = 0x400
        }

        #endregion

        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        public delegate int PopFirstBit(ref UInt64 b);

        // Debug-build output of the C function above. It loads and stores b as a dword,
        // so only the low 32 bits of the bitboard are handled.
        static byte[] pop_first_x64 = new byte[]
        {
            0x48 ,0x89 ,0x4C ,0x24 ,0x08       //mov         qword ptr [rsp+8],rcx
            ,0x57                              //push        rdi
            ,0x48 ,0x83 ,0xEC ,0x40            //sub         rsp,40h
            ,0x48 ,0x8B ,0xFC                  //mov         rdi,rsp
            ,0xB9 ,0x10 ,0x00 ,0x00 ,0x00      //mov         ecx,10h
            ,0xB8 ,0xCC ,0xCC ,0xCC ,0xCC      //mov         eax,0CCCCCCCCh
            ,0xF3 ,0xAB                        //rep stos    dword ptr [rdi]
            ,0x48 ,0x8B ,0x4C ,0x24 ,0x50      //mov         rcx,qword ptr [b]
            //unsigned long index;
            //_BitScanForward64(&index, b);
            ,0x48 ,0x8B ,0x44 ,0x24 ,0x50      //mov         rax,qword ptr [b]
            ,0x8B ,0x00                        //mov         eax,dword ptr [rax]
            ,0x48 ,0x0F ,0xBC ,0xC0            //bsf         rax,rax
            ,0x89 ,0x44 ,0x24 ,0x24            //mov         dword ptr [index],eax
            //b &= ~(1ULL<<(index));
            ,0x8B ,0x44 ,0x24 ,0x24            //mov         eax,dword ptr [index]
            ,0xB9 ,0x01 ,0x00 ,0x00 ,0x00      //mov         ecx,1
            ,0x48 ,0x89 ,0x4C ,0x24 ,0x38      //mov         qword ptr [rsp+38h],rcx
            ,0x0F ,0xB6 ,0xC8                  //movzx       ecx,al
            ,0x48 ,0x8B ,0x44 ,0x24 ,0x38      //mov         rax,qword ptr [rsp+38h]
            ,0x48 ,0xD3 ,0xE0                  //shl         rax,cl
            ,0x48 ,0xF7 ,0xD0                  //not         rax
            ,0x48 ,0x8B ,0x4C ,0x24 ,0x50      //mov         rcx,qword ptr [b]
            ,0x8B ,0x09                        //mov         ecx,dword ptr [rcx]
            ,0x48 ,0x23 ,0xC8                  //and         rcx,rax
            ,0x48 ,0x8B ,0xC1                  //mov         rax,rcx
            ,0x48 ,0x8B ,0x4C ,0x24 ,0x50      //mov         rcx,qword ptr [b]
            ,0x89 ,0x01                        //mov         dword ptr [rcx],eax
            //return index;
            ,0x8B ,0x44 ,0x24 ,0x24            //mov         eax,dword ptr [index]
            //}
            ,0x8B ,0xF8                        //mov         edi,eax
            ,0x48 ,0x8B ,0xCC                  //mov         rcx,rsp
            //,0x48 ,0x8D ,0x15 ,0x83 ,0x5F ,0x00 ,0x00 // lea         rdx,[std::numeric_limits<float>::min_exponent10+26Ch (013FACBEB0h)]
            //,0xE8 ,0xFE ,0xDB ,0xFF ,0xFF       // call        _RTC_CheckStackVars (013FAC3B30h)
            ,0x8B ,0xC7                        //mov         eax,edi
            ,0x48 ,0x83 ,0xC4 ,0x40            //add         rsp,40h
            ,0x5F                              //pop         rdi
            ,0xC3                              //ret
        };

        // Taken from Portfish/Stockfish
        internal static readonly int[] PopTable = new int[64];
        internal static int pop_first_x64_sw(ref UInt64 b)
        {
            UInt64 bb = b;
            b &= (b - 1);
            return (PopTable[((bb & (0xffffffffffffffff - bb + 1)) * 0x218A392CD3D5DBFUL) >> 58]);
        }

        public static void PopFirstTest()
        {
            // Init (taken from Portfish/Stockfish)
            for (int i = 0; i < 64; i++)
            {
                PopTable[((1UL << i) * 0x218A392CD3D5DBFUL) >> 58] = i;
            }

            IntPtr codepointer = IntPtr.Zero;
            try
            {
                byte[] codeBytes = pop_first_x64;
                codepointer = VirtualAlloc(
                    IntPtr.Zero,
                    new UIntPtr((uint)codeBytes.Length),
                    AllocationType.COMMIT | AllocationType.RESERVE,
                    MemoryProtection.EXECUTE_READWRITE
                );

                Marshal.Copy(codeBytes, 0, codepointer, codeBytes.Length);

                PopFirstBit pop1st = (PopFirstBit)Marshal.GetDelegateForFunctionPointer(codepointer, typeof(PopFirstBit));

                Console.WriteLine("test prepared, press any key");
                Console.ReadKey();

                // Native code through the delegate
                for (UInt64 i = 0; i < 5000000; i++)
                {
                    UInt64 target = i;
                    while (target != 0) { int first = pop1st(ref target); }
                }

                Console.WriteLine("first passed");
                Console.ReadKey();

                // Managed De Bruijn version
                for (UInt64 i = 0; i < 5000000; i++)
                {
                    UInt64 target = i;
                    while (target != 0) { int first = pop_first_x64_sw(ref target); }
                }

                Console.WriteLine("second passed");
                Console.ReadKey();
            }
            finally
            {
                if (codepointer != IntPtr.Zero)
                {
                    VirtualFree(codepointer, UIntPtr.Zero, 0x8000); // 0x8000 = MEM_RELEASE
                    codepointer = IntPtr.Zero;
                }
            }
        }
    }
}
Slow as hell! (around 1:5)

There is still some hope: at first glance the machine code is full of junk (I removed the two stack-checking lines myself, by the way). If someone could help me replace it with hand-optimized machine code, we might be able to move forward...

Anyone in?

Cheers, Balint
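
The test above prints markers between the two loops but does not time them itself. A rough sketch of how the comparison could be timed with Stopwatch follows; the class and method names (PopTiming, Drain, Time) are made up for illustration, and any delegate with the same signature, including one obtained from Marshal.GetDelegateForFunctionPointer, could be passed in. It only gives a coarse ratio, not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;

public static class PopTiming
{
    // Same shape as the PopFirstBit delegate in the post (hypothetical copy).
    public delegate int PopFirstBit(ref ulong b);

    private const ulong DeBruijn64 = 0x218A392CD3D5DBFUL;
    private static readonly int[] PopTable = BuildTable();

    private static int[] BuildTable()
    {
        var table = new int[64];
        for (int i = 0; i < 64; i++)
            table[(int)(unchecked((1UL << i) * DeBruijn64) >> 58)] = i;
        return table;
    }

    // Managed De Bruijn pop, equivalent to pop_first_x64_sw in the post.
    public static int PopSoftware(ref ulong b)
    {
        unchecked
        {
            ulong isolated = b & (0UL - b);
            b &= b - 1;
            return PopTable[(int)((isolated * DeBruijn64) >> 58)];
        }
    }

    // Pops every bit of every value in [1, limit) and returns the number of pops.
    public static long Drain(PopFirstBit pop, ulong limit)
    {
        long pops = 0;
        for (ulong i = 1; i < limit; i++)
        {
            ulong target = i;
            while (target != 0) { pop(ref target); pops++; }
        }
        return pops;
    }

    // Times one Drain run; pops is returned so the work cannot be skipped unnoticed.
    public static TimeSpan Time(PopFirstBit pop, ulong limit, out long pops)
    {
        var sw = Stopwatch.StartNew();
        pops = Drain(pop, limit);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling PopTiming.Time(PopTiming.PopSoftware, 5000000, out pops) and then the same with the native delegate gives two elapsed times to compare. Note that this routes the managed version through a delegate too, so it measures delegate-to-delegate, not the inlined case.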
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: [.Net only] - fast bit operations

Post by Gerd Isenberg »

bpfliegel wrote:
C# code with some tests like this (can't even use FastCall...)

Hi Balint,

The machine-code PopFirstBit seems to come from a debug build. There is much to throw out; a naked function without locals and a stack frame should suffice. For bitboard serialization, I prefer an (inlined) bitscan-forward routine that takes its argument by value, and to reset the bit separately with an independent "and" of the value with itself decremented by one.

Code: Select all

if ( x ) do {
   int idx = bitScanForward(x); // square index from 0..63
   *list++ = foo(idx, ...);
} while (x &= x-1); // reset LS1B
see also
https://chessprogramming.wikispaces.com/BitScan
https://chessprogramming.wikispaces.com ... %20Version

Why (0xffffffffffffffff - bb + 1) instead of (-bb) or (0-bb)?

Gerd
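
The serialization loop above could be sketched in C# roughly as follows. This is only an illustration: the class and method names are invented, the List<int> stands in for the move list, and the bitscan uses the De Bruijn multiplier quoted earlier in the thread instead of a hardware instruction.

```csharp
using System;
using System.Collections.Generic;

public static class BitboardSerialization
{
    // De Bruijn multiplier quoted earlier in the thread (Portfish/Stockfish).
    private const ulong DeBruijn64 = 0x218A392CD3D5DBFUL;
    private static readonly int[] Index64 = BuildTable();

    private static int[] BuildTable()
    {
        var table = new int[64];
        for (int i = 0; i < 64; i++)
            table[(int)(unchecked((1UL << i) * DeBruijn64) >> 58)] = i;
        return table;
    }

    // Bitscan forward, call by value. x must be non-zero.
    public static int BitScanForward(ulong x)
    {
        unchecked
        {
            return Index64[(int)(((x & (0UL - x)) * DeBruijn64) >> 58)];
        }
    }

    // Collects the square indices (0..63) of all set bits, lowest first.
    public static List<int> Squares(ulong x)
    {
        var list = new List<int>();
        if (x != 0)
        {
            do
            {
                list.Add(BitScanForward(x));
            } while ((x &= x - 1) != 0); // reset LS1B
        }
        return list;
    }
}
```

Because the scan takes x by value and the reset is a separate x &= x - 1, the two operations do not depend on each other's result.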
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: [.Net only] - fast bit operations

Post by bpfliegel »

Thanks Gerd, I agree with everything you wrote!

- the code itself is just a lame initial version; I have to work on the .Net-related parts.
- right, bb & ~(0-bb) might be used as well; bb & ~-bb won't compile, as far as I remember.
- the while trick is lovely!
- I have near-zero x86/x64 asm knowledge, so in the end the code is probably best replaced by something written by a human expert rather than by a compiler. But this link is great: https://chessprogramming.wikispaces.com ... %20Version

Have to run,
Balint
zongli
Posts: 13
Joined: Sat May 12, 2012 9:45 pm

Re: [.Net only] - fast bit operations

Post by zongli »

I think you mean -bb or ~bb + 1. :) It is indeed annoying that you can't just negate a UInt64 value. For the pop function I found that isolating the LSB first was a bit faster in practice (perft) even though it was slower when testing with random values in a loop:

Code: Select all

public static Int32 pop_first_x64_sw(ref UInt64 b) {
     UInt64 bb = b & (0UL - b);
     b &= b - 1;
     return BitIndex[(bb * 0x218A392CD3D5DBFUL) >> 58];
}
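
As a small check of the identities discussed here, the three ways of writing the two's complement negation of a UInt64 can be compared side by side. The class and method names below are invented for the sketch; unchecked is spelled out so the wrap-around also holds when a project is compiled with overflow checking.

```csharp
using System;

public static class LsbIdentities
{
    // Form used in the original Portfish-style code: (2^64 - 1) - bb + 1
    public static ulong IsolateA(ulong bb)
    {
        unchecked { return bb & (0xffffffffffffffff - bb + 1); }
    }

    // 0 - bb, the form that compiles where a unary minus on UInt64 does not
    public static ulong IsolateB(ulong bb)
    {
        unchecked { return bb & (0UL - bb); }
    }

    // ~bb + 1, the textbook two's complement
    public static ulong IsolateC(ulong bb)
    {
        unchecked { return bb & (~bb + 1); }
    }

    // True when all three forms isolate the same least significant one bit.
    public static bool AllAgree(ulong bb)
    {
        ulong a = IsolateA(bb);
        return a == IsolateB(bb) && a == IsolateC(bb);
    }
}
```

For example, all three return 4 for bb = 12 (binary 1100) and 0 for bb = 0.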
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: [.Net only] - fast bit operations

Post by bpfliegel »

zongli wrote:I think you mean -bb or ~bb + 1. :) It is indeed annoying that you can't just negate a UInt64 value. For the pop function I found that isolating the LSB first was a bit faster in practice (perft) even though it was slower when testing with random values in a loop:

Code: Select all

public static Int32 pop_first_x64_sw(ref UInt64 b) {
     UInt64 bb = b & (0UL - b);
     b &= b - 1;
     return BitIndex[(bb * 0x218A392CD3D5DBFUL) >> 58];
}
Correct. Thanks for the runtime info!

I generated new machine code from a release build with (hopefully) the proper optimizations set. Performance is still around 1:2.5 even with the unmanaged code security check suppressed; there is just too much call overhead...

Balint

Code: Select all

using System;
using System.Security;
using System.Runtime.InteropServices;

namespace Popcount
{
    // Idea: http://ybeernet.blogspot.hu/2011/03/techniques-of-calling-unmanaged-code.html

    // Original idea here: http://stackoverflow.com/questions/3216535/x86-x64-cpuid-in-c-sharp
    public static class PopFirst
    {
        #region Interop

        [DllImport("kernel32.dll", SetLastError = true)]
        private static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize, AllocationType flAllocationType,
            MemoryProtection flProtect);

        // dwSize is a SIZE_T in the Win32 API, so it is declared as UIntPtr rather than UInt32.
        [DllImport("kernel32")]
        private static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, UInt32 dwFreeType);

        [Flags()]
        private enum AllocationType : uint
        {
            COMMIT = 0x1000,
            RESERVE = 0x2000,
            RESET = 0x80000,
            LARGE_PAGES = 0x20000000,
            PHYSICAL = 0x400000,
            TOP_DOWN = 0x100000,
            WRITE_WATCH = 0x200000
        }

        [Flags()]
        private enum MemoryProtection : uint
        {
            EXECUTE = 0x10,
            EXECUTE_READ = 0x20,
            EXECUTE_READWRITE = 0x40,
            EXECUTE_WRITECOPY = 0x80,
            NOACCESS = 0x01,
            READONLY = 0x02,
            READWRITE = 0x04,
            WRITECOPY = 0x08,
            GUARD_Modifierflag = 0x100,
            NOCACHE_Modifierflag = 0x200,
            WRITECOMBINE_Modifierflag = 0x400
        }

        #endregion

        [SuppressUnmanagedCodeSecurity]
        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        public delegate int PopFirstBit(ref UInt64 b);

        // Release-build output. Like the debug version, it reads and writes b as a dword
        // (r8d / edx), so only the low 32 bits of the bitboard are handled.
        static byte[] pop_first_x64 = new byte[]
        {
            0x44 ,0x8B ,0x01                    //mov         r8d,dword ptr [rcx]
            ,0x4C ,0x8B ,0xC9                   //mov         r9,rcx
            ,0xBA ,0x01 ,0x00 ,0x00 ,0x00       //mov         edx,1
            ,0x49 ,0x0F ,0xBC ,0xC0             //bsf         rax,r8
            ,0x8B ,0xC8                         //mov         ecx,eax
            ,0x48 ,0xD3 ,0xE2                   //shl         rdx,cl
            ,0xF7 ,0xD2                         //not         edx
            ,0x41 ,0x23 ,0xD0                   //and         edx,r8d
            ,0x41 ,0x89 ,0x11                   //mov         dword ptr [r9],edx
            ,0xC3                               //ret
        };

        // Taken from Portfish/Stockfish
        internal static readonly int[] PopTable = new int[64];
        internal static int pop_first_x64_sw(ref UInt64 b)
        {
            UInt64 bb = b;
            b &= (b - 1);
            return (PopTable[((bb & (0xffffffffffffffff - bb + 1)) * 0x218A392CD3D5DBFUL) >> 58]);
        }

        public static void PopFirstTest()
        {
            // Init (taken from Portfish/Stockfish)
            for (int i = 0; i < 64; i++)
            {
                PopTable[((1UL << i) * 0x218A392CD3D5DBFUL) >> 58] = i;
            }

            IntPtr codepointer = IntPtr.Zero;
            try
            {
                byte[] codeBytes = pop_first_x64;
                codepointer = VirtualAlloc(
                    IntPtr.Zero,
                    new UIntPtr((uint)codeBytes.Length),
                    AllocationType.COMMIT | AllocationType.RESERVE,
                    MemoryProtection.EXECUTE_READWRITE
                );

                Marshal.Copy(codeBytes, 0, codepointer, codeBytes.Length);

                PopFirstBit pop1st = (PopFirstBit)Marshal.GetDelegateForFunctionPointer(codepointer, typeof(PopFirstBit));

                Console.WriteLine("test prepared, press any key");
                Console.ReadKey();

                // Native code through the delegate
                for (UInt64 i = 0; i < 5000000; i++)
                {
                    UInt64 target = i;
                    while (target != 0)
                    {
                        int first = pop1st(ref target);
                    }
                }

                Console.WriteLine("first passed");
                Console.ReadKey();

                // Managed De Bruijn version
                for (UInt64 i = 0; i < 5000000; i++)
                {
                    UInt64 target = i;
                    while (target != 0)
                    {
                        int first = pop_first_x64_sw(ref target);
                    }
                }

                Console.WriteLine("second passed");
                Console.ReadKey();
            }
            finally
            {
                if (codepointer != IntPtr.Zero)
                {
                    VirtualFree(codepointer, UIntPtr.Zero, 0x8000); // 0x8000 = MEM_RELEASE
                    codepointer = IntPtr.Zero;
                }
            }
        }
    }
}
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: [.Net only] - fast bit operations

Post by bpfliegel »

I retested UInt64 bb = b & (0UL - b); in place of the old code, and it performs somewhat better. I have a very minimal movegen, and the difference is visible there; I did not detect it when I tested this change on Portfish earlier. It is quite logical, though: one op less.

Thanks anyway! Balint
zongli
Posts: 13
Joined: Sat May 12, 2012 9:45 pm

Re: [.Net only] - fast bit operations

Post by zongli »

Well, assuming the compiler understands the commutative property of addition, it's the same number of operations. :) I think the gain comes from slightly better parallelism once the method gets inlined.

I'm not sure if you'll be able to get much out of the assembly code; it might look terse but there's a lot of instructions needed to put it in a delegate. Also, it probably can't get inlined.
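
To illustrate the inlining point: a managed pop function can at least be marked as an inlining candidate, which a call through a delegate to native code cannot. The sketch below assumes .NET 4.5 or later, where MethodImplOptions.AggressiveInlining is available; the attribute is only a hint to the JIT, and the class name and the BitIndex table are set up here just for the example.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InlinedPop
{
    private const ulong DeBruijn64 = 0x218A392CD3D5DBFUL;
    private static readonly int[] BitIndex = BuildTable();

    private static int[] BuildTable()
    {
        var table = new int[64];
        for (int i = 0; i < 64; i++)
            table[(int)(unchecked((1UL << i) * DeBruijn64) >> 58)] = i;
        return table;
    }

    // Isolates the LSB first, then clears it; the two steps are independent of each other.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int PopFirst(ref ulong b)
    {
        unchecked
        {
            ulong isolated = b & (0UL - b);
            b &= b - 1;
            return BitIndex[(int)((isolated * DeBruijn64) >> 58)];
        }
    }

    // Example caller: a tight loop where inlining PopFirst would matter.
    public static int CountByPopping(ulong b)
    {
        int n = 0;
        while (b != 0)
        {
            PopFirst(ref b);
            n++;
        }
        return n;
    }
}
```

Whether the JIT actually inlines the method depends on the runtime version, so the effect is worth measuring in the engine itself, not only in an isolated loop.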
bpfliegel
Posts: 71
Joined: Fri Mar 16, 2012 10:16 am

Re: [.Net only] - fast bit operations

Post by bpfliegel »

zongli wrote:Well, assuming the compiler understands the commutative property of addition, it's the same number of operations. :) I think the gain comes from slightly better parallelism once the method gets inlined.

I'm not sure if you'll be able to get much out of the assembly code; it might look terse but there's a lot of instructions needed to put it in a delegate. Also, it probably can't get inlined.
Agreed again. Let's hope someone comes up with a better technical idea; I'm abandoning this attempt.
Thanks, Balint
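
For readers on newer runtimes than the one used in this thread: .NET Core 3.0 and later ship System.Numerics.BitOperations, which exposes bit scans and population count as ordinary managed methods that the JIT can map to hardware instructions where they are available. A minimal sketch, assuming such a runtime (the wrapper class and its method names are invented here):

```csharp
using System;
using System.Numerics;

public static class ModernBitOps
{
    // Returns the index of the least significant set bit and clears it.
    // b must be non-zero; TrailingZeroCount returns 64 for an input of 0.
    public static int PopFirst(ref ulong b)
    {
        int index = BitOperations.TrailingZeroCount(b);
        unchecked { b &= b - 1; }
        return index;
    }

    // Index of the most significant set bit; b must be non-zero.
    public static int Msb(ulong b)
    {
        return 63 - BitOperations.LeadingZeroCount(b);
    }

    // Number of set bits.
    public static int Count(ulong b)
    {
        return BitOperations.PopCount(b);
    }
}
```

This avoids both the lookup table and the native delegate, but it does not apply to the .NET Framework versions discussed above.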