Move sorting is not a 'monolithic' problem, and there are many ways to approach it. Captures have to be sorted on other criteria as non-captures (MVV-LVA & SEE vs history / killer). Some sort algorithms are more efficient than others (qsort, shell sort), but these are often difficult to perform in a 'staged' way. Extracting the best move every time is one of the slowest sorting methods (O(N^2)), but if you get a beta cutoff after the first move, you have spent only O(N) time, and never have to do the remaining part of the sort.
It also depends on how you separate the captures from the non-captures. Before you even can start sorting you have to assign a sort key to the moves, and to determine what key to use, you have to determine whether the move is a capture. I usually do that already during move generation. With a mailbox design this information becomes available as a side effect: you have to examin destination squares to see whether they contain an enemy or a friend, and if they contain an enemy so that you generate the capture you can add an MVV/LVA sort key to the move. I then just store the captures at the beginning of the move list, while non-captures go at the end. In bitboard designs you can also use different loops to extract captures and non-captures, and store them in seperate places.
When you already start with the captures separated from the non-captures, the sorting of the captures (which you will search first) is not a big deal, as there are only few of those. So you don't care too much if the sorting algoritm is O(N^2), O(N*log(N)) or O(N), and you just extract the next in line. This is different for the non-captures, but by the time you get to those, the chances that you will still get a beta cutoff are rather slim. Extracting killers is not really a soring problem; you have to identify them in the list. And after you have searched those the prospects for a beta cutoff are even slimmer. So it could help to have an efficient algorithm for the history sort, even when this needs to be completed before you can search your first move.
Now it happens to be not to difficult to do a complete history sort in O(N) steps. The idea is to 'bin' the moves by history score, distributing them over a number of bins that is so large that there will be hardly any bin that gets more than a single move in it. You can then loop through the bins from high to low history score, and search the moves you encounter in them. When a bin occasionally contains more than one move, you could in principle sort that small number. But since they will have very nearly the same history score, this would probably be a waste of time, and you might as well search them in any order.
You could reserve an array of 64 bytes for the bins, which each could contain the number of a move in the move list. And you could use the high byte of the moves in that list (normally used for the sort key) to indicate there are more moves in the same bin (i.e. use it as a link in a linked list). To quickly loop through a large and sparsely filled set of bins, you would assign a bit to each bin in a set of larger integer variables, which you set to 1 when you put a move in the bin; You can then loop over the bins by extracting the 1 bits from this 'bin directory', and then run through the linked list they contain (usually of length 1) to search the moves there. To make efficient use of the bins you could keep track of the largest history score in the table, and take the history scores relative to that. (E.g. you could do the history calculation in 32-bit floting point, but read it back for scoring the moves as 32-bit integer, to get it on an approximately logarithmic scale.)