[SCID Database] Updated FIDE spelling files - February 2018+

Posted: Tue Feb 27, 2018 9:11 pm
by styx
The SCID spelling files are useful if you want full names in your database (e.g. "Carlsen, Magnus" instead of "Carlsen, M") and for general database maintenance. After importing games from a PGN source (like TWIC) you can rename the players automatically (this only works flawlessly for unambiguous names).

I was a bit disappointed that the original SCID spelling file is outdated (December 2015, 240,743 players). So I wrote a parser and updated the files myself:

>> Download new SCID Spelling Files <<

I hope you find it helpful.

Annotations:
- There are two versions available:
  1. FULL: contains all registered players, including those without an official FIDE rating (749,798 players in the February 2018 list, standard rating only!)
  2. RATED ONLY: contains only FIDE-rated players (298,691 players in the February 2018 list, standard rating only!)
- Players who are not currently listed in the FIDE database are NOT included (you still need the original SCID spelling.ssp if you need spell checking for, say, deceased players)
- I will update the files irregularly (the download link will stay the same)
- I highly recommend using these two new spelling files in parallel with the original
- Don't just use the "FULL" version since it does not contain all the names in the "RATED ONLY" file (Why? I guess it's FIDE's secret...) and I was too lazy to merge them

Data-Source: http://ratings.fide.com/download.phtml

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Tue Feb 27, 2018 10:02 pm
by Norm Pollock
I have had an issue with the spelling of names in SCID and I have made several complaints which were ignored. The issue is the initial(s) after the family name. They never include a period "." after the initial(s). This causes a lot of work for me to correct when I add new names to the databases that I maintain.

Here is the requirement from the PGN Standards that SCID ignores:

8.1.1.5: The White tag

The White tag value is the name of the player or players of the white pieces. The names are given as they would appear in a telephone directory. The family or last name appears first. If a first name or first initial is available, it is separated from the family name by a comma and a space. Finally, one or more middle initials may appear. (Wherever a comma appears, the very next character should be a space. Wherever an initial appears, the very next character should be a period.) If the name is unknown, a single question mark should appear as the tag value.
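
To make the two parenthesised rules concrete, here is a minimal Python sketch that checks a name against them. The function name `pgn_name_problems` is hypothetical, and treating every stand-alone single letter as an "initial" is my assumption; the PGN standard does not define it that precisely.

```python
import re

def pgn_name_problems(name):
    """Return a list of violations of the two punctuation rules quoted above."""
    problems = []
    # Rule 1: wherever a comma appears, the very next character should be a space.
    if re.search(r",(?! )", name):
        problems.append("comma not followed by a space")
    # Rule 2: wherever an initial appears, the very next character should be a period.
    # Assumption: an initial is a token consisting of exactly one letter.
    for token in re.split(r"[\s,]+", name):
        if len(token) == 1 and token.isalpha():
            problems.append(f"initial '{token}' not followed by a period")
    return problems

print(pgn_name_problems("Fischer, R"))          # one problem: the bare initial
print(pgn_name_problems("Fischer, Robert J."))  # no problems
```

A lone "?" (unknown player) passes, because it is not a letter.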

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Tue Feb 27, 2018 10:29 pm
by styx
It's not SCID's fault. It's not the spellcheck files' fault.
It's the fault of the people who register a name in the FIDE database. I don't know exactly how it works. Is it the player who registers an account at FIDE?

Take a look at the FIDE database and you will see that names are not stored in any consistent way. You can find all kinds of things:

Doe, John
Doe A. B. C., John
Doe A, J
Doe, J
Doe, J.
Doe, A John

and so on...

I think it's possible to add a period after every single-character string on the right side of the comma. But I am not sure I want to do that. Maybe there are countries with one-character first names, or countries where the first name goes on the left side of the comma (China?). The result would be a mess.
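
A rough Python sketch of that idea, only to show what the rule would do. The function name `add_periods_after_initials` is made up for illustration, and it has exactly the weakness described above: it cannot tell an initial from a genuine one-letter first name.

```python
def add_periods_after_initials(name):
    """Append a period to every one-letter token to the right of the first comma."""
    family, comma, given = name.partition(",")
    if not comma:
        # No comma at all: we cannot tell which side is which, so leave it alone.
        return name
    fixed = [
        token + "." if len(token) == 1 and token.isalpha() else token
        for token in given.split()
    ]
    # Note: this also normalises the spacing after the comma to a single space.
    return (family.strip() + ", " + " ".join(fixed)).rstrip()

for raw in ["Doe, John", "Doe A, J", "Doe, J", "Doe, J.", "Doe, A John"]:
    print(raw, "-->", add_periods_after_initials(raw))
```

Anything on the left side of the comma (like the "A" in "Doe A, J") is deliberately left untouched.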

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Wed Feb 28, 2018 1:51 am
by Norm Pollock
It's not much of a problem now, because I wrote a tool that inserts a period after an initial. It is called "tagFix" and is in "40H-PGN" tools.

But my beef with SCID is that ALL periods are removed after initials when I use "Spell Checking of Players". So if I have "Fischer, R.", the spell check corrects it to "Fischer, R".

On the other hand, periods are sometimes not appropriate:

King George V
King, George V.

Komodo 9
Komodo 9.0

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Wed Feb 28, 2018 2:26 am
by styx
Norm Pollock wrote: But my beef with SCID is that ALL periods are removed after initials when I use "Spell Checking of Players". So if I have "Fischer, R.", the spell check corrects it to "Fischer, R".
That is still not SCID's fault. SCID uses exactly the names as specified in the spelling file. If the spelling file says "Fischer, Robert J.", SCID will display the period. But if the name in the spelling file is "Fischer, Robert J" then SCID will show it like this.

In other words: SCID will never delete the periods by itself, only if the spelling file says so.

And yes: in the spelling files that I provided, most names will not have the period. But it's not my fault. This data is taken straight out of the FIDE database and that's exactly how they saved it.

A lot of work is required to make it absolutely consistent.

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Wed Feb 28, 2018 2:48 am
by Norm Pollock
How can I download the FIDE database of player names?

I do not see any link for it on their site.

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Wed Feb 28, 2018 4:15 am
by styx
http://ratings.fide.com/download.phtml

Click on "TXT Format" or "XML Format" to download the desired version.

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Wed Feb 28, 2018 8:49 am
by Fulvio
styx wrote: So I wrote a parser and updated the files myself:
...
- I will irregularly update the files (download link will stay the same)
That's great, thanks!

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Thu Mar 01, 2018 5:04 am
by Norm Pollock
Thanks.

Re: [SCID Database] Updated FIDE spelling files - February 2

Posted: Fri Mar 02, 2018 11:03 pm
by styx
I cleaned up the database a bit. There is now always a comma separating first and last name. If a name consists of several strings (separated by whitespace) and has no comma, the comma is inserted after the first string that is longer than one character (periods ignored). This might not be 100% correct, but most of the time it is.
24,424 names were changed/separated in the RATED ONLY version and 73,439 in the FULL version. Same link. The files are updated.

Example:

Praggnanandhaa R --> Praggnanandhaa, R

A Doe B C John --> A Doe, B C John
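
For anyone who wants to reproduce the rule, it can be sketched in Python roughly like this. This is a reconstruction from the description and the two examples above, not the actual parser code; the function name `insert_comma` is hypothetical, and never putting the comma after the last string is my assumption.

```python
def insert_comma(name):
    """Insert a comma after the first token longer than one character (periods ignored)."""
    if "," in name:
        return name  # already separated, nothing to do
    tokens = name.split()
    # Skip the last token: a comma after it would separate nothing.
    for i, token in enumerate(tokens[:-1]):
        if len(token.replace(".", "")) > 1:
            return " ".join(tokens[: i + 1]) + ", " + " ".join(tokens[i + 1 :])
    return name  # single-token names and initials-only prefixes stay unchanged

print(insert_comma("Praggnanandhaa R"))  # Praggnanandhaa, R
print(insert_comma("A Doe B C John"))    # A Doe, B C John
```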
Another thing I'd like to mention:
- Don't just use the "FULL" version since it does not contain all the names in the "RATED ONLY" file (Why? I guess it's FIDE's secret...) and I was too lazy to merge them
It turned out that this statement of mine was incorrect. The "FULL" version contains all the entries of the "RATED ONLY" version PLUS all the unrated players.