Development of Shen Yu

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Development of Shen Yu

Post by algerbrex »

AAce3 wrote: Sat Aug 27, 2022 6:34 pm Hey everyone,
I'm currently writing a chess engine called Shen Yu, written in rust. It's not open source because I haven't put in the work to document it that well (And I'm somewhat of a novice at the language :P ), but rest assured that when I have time I will end up doing so.

Shen Yu is currently just a legal move generator, and it sort of has built-in selective move generation already (i.e., my "Move" type has a built in capture flag, so I just decided to generate captures and quiets spearately). It hits 140 Mnps on perft 7 from starting position with bulk counting, 45 Mnps without on my old laptop.

The eventual goal is for Shen Yu to become a MCTS/Alpha-Beta hybrid engine, i.e. the expansion phase of MCTS will involve some form of alpha-beta search. I have a lot of ideas I want to try for that, but first I will need to implement the alpha-beta part.I'm not planning to release Shen Yu until after I've implemented most of the basic ideas (LMR, NMP, History, etc. etc.), the reason being that I just don't have that much time.

I'm currently a high school, and am applying for college soon, so updates and posts may be infrequent.

This thread is really just a brain dump where I can toss ideas that I want to try. I woud love any kinds of feedback and suggestions.

Huge thanks to Blunder, Rustic, Leorik, BBC, and many other wonderful open source engines for having such excellent readable code and useful comments. I have never really used C or C++, so having lots of useful comments is so helpful (And why I want to make sure my code is well document it before I make it open source :D ). And, huge thanks to the chess programming wiki. You all have made some really wonderful stuff.
Cool! I'll bookmark this thread, I look forward to the first release of Shen Yu. A nice perft speed. I think the highest I ever got from Blunder was ~25 Mnps, after a good bit of optimizing and tweaking. I suppose I could do better, but Blunder is a chess engine firstly, not a perft counter.

And I'm glad Blunder was helpful to you. Feel free to implement whatever ideas you see. Part of the reason I put so much work into thoroughly documenting the interesting parts of Blunder's code was that I wanted to give back to the chess community and help out newcomers, like so many people helped me out when I first started. So I'm always very happy to see Blunder mentioned when someone lists engines they've found helpful on their chess programming journey :)
User avatar
AAce3
Posts: 80
Joined: Fri Jul 29, 2022 1:30 am
Full name: Aaron Li

Re: Development of Shen Yu

Post by AAce3 »

algerbrex wrote: Tue Aug 30, 2022 8:45 am And I'm glad Blunder was helpful to you. Feel free to implement whatever ideas you see. Part of the reason I put so much work into thoroughly documenting the interesting parts of Blunder's code was that I wanted to give back to the chess community and help out newcomers, like so many people helped me out when I first started. So I'm always very happy to see Blunder mentioned when someone lists engines they've found helpful on their chess programming journey :)
Blunder definitely has some of the most readable and well documented implementations out of any engine I've seen and I really do appreciate that (and the expansive feature set means that if I want to implement something I know where to find a readable and didactic implementation if I don't understand the CPW :D).
hgm wrote: Mon Aug 29, 2022 7:51 am You would need a larger divisor than 2 to get MVV/LVA order. 8 will do. In Fairy-Max I use victim itself, instead of value[victim], because it represents pieces by type in order of ascending value (and values in centiPawn). As for SEE: testing whether you need that could be postponed until after the move is next in line in the MVV/LVA ordering. SEE can be relatively expensive, so it would be a waste to calculate it for moves that you will never get to because of a beta cutoff.
I just finished debugging staged move generation, with MVV/LVA sorting for captures, using your suggestions.
I use mvv-lva for sorting captures, and add an offset of 1 if it's a HxL, so that I know which ones to skip while iterating through winning captures.

Then, when I'm in the "losing captures stage" (stages are ttmove -> winning captures -> killers -> losing captures -> quiets), I find the best mvv-lva score, I swap it into the current index then check if the SEE is less than the mvv-lva value. If it is, then I replace mvv-lva with SEE, then try to swap again.

I ended up filling in the staged move generator into perft() rather than my normal one, and checking to see if it produced correct results. I did successfully debug it, but it ended up being much slower, roughly 18 Mnps. though that is to be expected, considering that sorting and SEE etc. have to be done. I think I'll try to speed it up later but 18 Mnps for staged movegen and sorted moves is satisfactory for now.

The other thing that was bugging me was that I tried to make the movepicker an iterator, and it itself held a pointer to move ordering data (killers and history). The issue was that I had to modify the move ordering data while iterating through the movepicker, which in turn modified the move picker. Rust's borrow checker really hated this and I couldn't get the compiler to shut up. I ended up scrapping the "iterator" idea and just used a normal loop, where the movepicker had a function that had the pointer as an input and returned the next best move. This ended up working.

And, of course, there was the transposition table. I wanted to probe the entry then keep a pointer around so that I could write to it with the score, without having to recompute the index and the pointer. The issue was, of course, that I was also passing a pointer to the transposition table as a whole recursively. Again, rust's borrow checker got angry at me and I ended up using unsafe code and raw pointers.

All this took 6-7 hours total, looking through stackoverflow and rust documentation. Luckily I'm still on summer break, but when classes start for me in a week I probably won't have that kind of time.

I think I will release an alpha-beta version of Shen Yu first, maybe in a month or two, then focus on the MCTS part. And, of course, I need to get around to documenting my code.
User avatar
AAce3
Posts: 80
Joined: Fri Jul 29, 2022 1:30 am
Full name: Aaron Li

Re: Development of Shen Yu

Post by AAce3 »

I'm scrapping staged movegen for now. Too many moving parts to debug :oops: I put another 4 hours into it and it still didn't work, probably because I wasn't careful enough when implementing it at first.

Bugs are annoyingly difficult to find in chess programming. I think that next time I will need to be more thorough when writing the code, because reading it and trying to debug it is significantly more difficult than just being careful when writing it in the first place :oops:.
clayt
Posts: 29
Joined: Thu Jun 09, 2022 5:09 am
Full name: Clayton Ramsey

Re: Development of Shen Yu

Post by clayt »

AAce3 wrote: Wed Aug 31, 2022 8:20 pm I'm scrapping staged movegen for now. Too many moving parts to debug :oops: I put another 4 hours into it and it still didn't work, probably because I wasn't careful enough when implementing it at first.

Bugs are annoyingly difficult to find in chess programming. I think that next time I will need to be more thorough when writing the code, because reading it and trying to debug it is significantly more difficult than just being careful when writing it in the first place :oops:.
I'm not sure if you know, but it's possible to convert a normal move generator in Rust to a staged move generator relatively easily using const generics. Although enums are not yet supported in const generics, we can do it the long way around with integers.

Code: Select all

type GenMode = u8;
const ALL: GenMode = 0;
const CAPTURES: GenMode = 1;
const QUIETS: GenMode = 2;

fn get_moves<const M: GenMode>(b: &Board) -> Vec<Move> {
	assert!(M == ALL || M == CAPTURES || M == QUIETS);
	// you can now use M to check the generation mode at any step in the way.
	// it's common to do something like `if M != QUIETS { add captures }` and so on.
	// This way, I don't need to manage a different helper function for each piece type and generation mode.
	todo!()
}
User avatar
AAce3
Posts: 80
Joined: Fri Jul 29, 2022 1:30 am
Full name: Aaron Li

Re: Development of Shen Yu

Post by AAce3 »

Yeah the main issue was that I kept adding more and more code for checking the killers and the tt move so it ended up becoming a huge mess. So I kept outputting illegal moves and such and that caused a bunch of crashes. Eventually I just gave up entirely on staged move generation.

Const generics are fun. I use them a lot in shen yu.
Last edited by AAce3 on Wed Aug 31, 2022 9:17 pm, edited 1 time in total.
User avatar
AAce3
Posts: 80
Joined: Fri Jul 29, 2022 1:30 am
Full name: Aaron Li

Re: Development of Shen Yu

Post by AAce3 »

clayt wrote: Wed Aug 31, 2022 8:52 pm
I'm not sure if you know, but it's possible to convert a normal move generator in Rust to a staged move generator relatively easily using const generics. Although enums are not yet supported in const generics, we can do it the long way around with integers.
Actually, fun fact- enums are supported if you use a specific library feature on nightly. I use that for my direction shifts.
User avatar
AAce3
Posts: 80
Joined: Fri Jul 29, 2022 1:30 am
Full name: Aaron Li

Re: Development of Shen Yu

Post by AAce3 »

I’ve implemented transposition, PVS, IID, Killer, History and MVV-LVA and SEE, and delta pruning with SEE for losing captures in quiescence search. I did a quick test run of it, with 50 cp aspiration window just to see how it would go. I’m a little disappointed- with no forward pruning it takes 2.5 seconds to reach depth 9 from the starting position, with 6Mnps. In kiwipete, it reaches depth 7 and gets hit hard by quiescence search explosion. (I’ll probably try limiting quiescence ply- maybe doing only recaptures after a certain depth?) Anyway, I was just wondering how other engines do without forward pruning. There might be a bug somewhere, who knows?
JVMerlino
Posts: 1396
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Development of Shen Yu

Post by JVMerlino »

AAce3 wrote: Sat Sep 03, 2022 2:15 am I’ve implemented transposition, PVS, IID, Killer, History and MVV-LVA and SEE, and delta pruning with SEE for losing captures in quiescence search. I did a quick test run of it, with 50 cp aspiration window just to see how it would go. I’m a little disappointed- with no forward pruning it takes 2.5 seconds to reach depth 9 from the starting position, with 6Mnps. In kiwipete, it reaches depth 7 and gets hit hard by quiescence search explosion. (I’ll probably try limiting quiescence ply- maybe doing only recaptures after a certain depth?) Anyway, I was just wondering how other engines do without forward pruning. There might be a bug somewhere, who knows?
I might have missed it, but if you have no reductions (e.g. LMR, Null Move, etc.), then you will certainly have relatively poor time-to-depth results. My engine, Myrddin (CCRL 2600) is pretty slow at around 2M NPS. But it completes depth 9 from the start position in 0.06 seconds, and takes almost exactly the same amount of time (and nodes, oddly enough) to complete depth 7 on Kiwipete.

Indeed, if I disable all reductions, it takes 1.65 seconds to complete depth 9 from the start position (27x longer), and 0.5 seconds to complete depth 7 on Kiwipete (10x longer).
User avatar
AAce3
Posts: 80
Joined: Fri Jul 29, 2022 1:30 am
Full name: Aaron Li

Re: Development of Shen Yu

Post by AAce3 »

JVMerlino wrote: Sat Sep 03, 2022 4:00 am
AAce3 wrote: Sat Sep 03, 2022 2:15 am I’ve implemented transposition, PVS, IID, Killer, History and MVV-LVA and SEE, and delta pruning with SEE for losing captures in quiescence search. I did a quick test run of it, with 50 cp aspiration window just to see how it would go. I’m a little disappointed- with no forward pruning it takes 2.5 seconds to reach depth 9 from the starting position, with 6Mnps. In kiwipete, it reaches depth 7 and gets hit hard by quiescence search explosion. (I’ll probably try limiting quiescence ply- maybe doing only recaptures after a certain depth?) Anyway, I was just wondering how other engines do without forward pruning. There might be a bug somewhere, who knows?
I might have missed it, but if you have no reductions (e.g. LMR, Null Move, etc.), then you will certainly have relatively poor time-to-depth results. My engine, Myrddin (CCRL 2600) is pretty slow at around 2M NPS. But it completes depth 9 from the start position in 0.06 seconds, and takes almost exactly the same amount of time (and nodes, oddly enough) to complete depth 7 on Kiwipete.

Indeed, if I disable all reductions, it takes 1.65 seconds to complete depth 9 from the start position (27x longer), and 0.5 seconds to complete depth 7 on Kiwipete (10x longer).
That's somewhat a relief to hear, though I'm wondering what move ordering techniques you have to get that sort of speed :D.

I think the main thing that's slowing my search is QS nodes. Even from startpos, where there should be relatively few QS nodes over 90% of my nodes are QS nodes, and for Kiwipete it's even more ridiculous, at almost 98% QS nodes. I have delta pruning and move ordering, and I think that it's just a search explosion, with too many captures to analyze. In fact, if I disable QS move ordering on kiwipete it takes nearly 3-4 seconds to finish depth 3. My current search does zero limiting of QS ply, so I think I'm going to work on that next.

I'm also going to check history ordering and killer moves just to make sure they're working properly. Maybe add countermove heuristic and follow up heuristic as well as butterfly boards to see if that reduces nodes.

I read somewhere else on this forum that a lot of engines only examine recaptures after ply 6 of QS. I'm going to try doing that next.
JVMerlino
Posts: 1396
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Development of Shen Yu

Post by JVMerlino »

AAce3 wrote: Sat Sep 03, 2022 4:30 am I think the main thing that's slowing my search is QS nodes. Even from startpos, where there should be relatively few QS nodes over 90% of my nodes are QS nodes, and for Kiwipete it's even more ridiculous, at almost 98% QS nodes. I have delta pruning and move ordering, and I think that it's just a search explosion, with too many captures to analyze. In fact, if I disable QS move ordering on kiwipete it takes nearly 3-4 seconds to finish depth 3. My current search does zero limiting of QS ply, so I think I'm going to work on that next.
That does seem extremely high. If I let Myrddin run to depth 20 on the initial position, it's only 27% QS nodes. For Kiwipete, again to depth 20, it's 42% QS nodes.