[SCID Database] Updated FIDE spelling files - February 2018+

styx
Posts: 338
Joined: Tue Mar 13, 2012 8:59 pm
Location: Germany

[SCID Database] Updated FIDE spelling files - February 2018+

Post by styx » Tue Feb 27, 2018 8:11 pm

The SCID spelling files are useful if you want full names in your database (e.g. "Carlsen, Magnus" instead of "Carlsen, M") and for general database maintenance. After importing games from a PGN source (like TWIC) you can rename the players automatically (this only works flawlessly for unambiguous names).

I was a bit disappointed that the original SCID spelling file is rather outdated (December 2015, 240.743 players), so I wrote a parser and updated the files myself:

>> Download new SCID Spelling Files <<

I hope you find it helpful.

Annotations:
- There are two versions available:
    1. FULL: contains all registered players, even those with no official FIDE rating (749.798 players in the February 2018 list, standard rating only!)
    2. RATED ONLY: contains only FIDE-rated players (298.691 players in the February 2018 list, standard rating only!)
- Players who are not currently listed in the FIDE database are NOT included (you still need the original SCID spelling.ssp if you want spell checking for, say, deceased players)
- I will update the files at irregular intervals (the download link will stay the same)
- I highly recommend using these two new spelling files alongside the original
- Don't just use the "FULL" version since it does not contain all the names in the "RATED ONLY" file (Why? I guess it's FIDE's secret...) and I was too lazy to merge them

Data-Source: http://ratings.fide.com/download.phtml

Norm Pollock
Posts: 1017
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by Norm Pollock » Tue Feb 27, 2018 9:02 pm

I have had an issue with the spelling of names in SCID and I have made several complaints which were ignored. The issue is the initial(s) after the family name. They never include a period "." after the initial(s). This causes a lot of work for me to correct when I add new names to the databases that I maintain.

Here is the requirement from the PGN Standards that SCID ignores:

8.1.1.5: The White tag

The White tag value is the name of the player or players of the white pieces. The names are given as they would appear in a telephone directory. The family or last name appears first. If a first name or first initial is available, it is separated from the family name by a comma and a space. Finally, one or more middle initials may appear. (Wherever a comma appears, the very next character should be a space. Wherever an initial appears, the very next character should be a period.) If the name is unknown, a single question mark should appear as the tag value.
It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change. -- Charles Darwin

styx
Posts: 338
Joined: Tue Mar 13, 2012 8:59 pm
Location: Germany

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by styx » Tue Feb 27, 2018 9:29 pm

It's not SCID's fault. It's not the spellcheck files' fault.
It's the fault of the people who register a name in the FIDE database. I don't know exactly how it works. Is it the player who registers an account at FIDE?

Take a look at the FIDE database and you will see that there is no consistent way of storing names. You can find all kinds of variants:

Doe, John
Doe A. B. C., John
Doe A, J
Doe, J
Doe, J.
Doe, A John

and so on...

I think it's possible to add a period after every single-character token on the right side of the comma. But I am not sure I want to do that. Maybe there are countries with one-character first names, or countries where the first name goes on the left side of the comma (China?). The result would be a mess.
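The idea described above could be sketched roughly like this in Python. This is only an illustration, not the parser used for the spelling files: the function name `add_periods_to_initials` is made up, and it assumes that everything right of the first comma is a given name or an initial, which is exactly the assumption that may fail for some names.

```python
def add_periods_to_initials(name: str) -> str:
    """Append '.' to every single-letter token right of the first comma.

    Hypothetical helper: names without a comma are returned unchanged,
    because it is unclear which part is the family name.
    """
    if "," not in name:
        return name
    family, given = name.split(",", 1)
    tokens = given.split()
    if not tokens:
        return name  # nothing after the comma, leave the entry alone
    # A token is treated as an initial only if it is exactly one letter.
    fixed = [tok + "." if len(tok) == 1 and tok.isalpha() else tok
             for tok in tokens]
    return f"{family.strip()}, {' '.join(fixed)}"


for raw in ["Doe, J", "Doe, J.", "Doe, A John", "Doe A, J", "Doe, John"]:
    print(f"{raw!r} -> {add_periods_to_initials(raw)!r}")
```

Note that the left side of the comma is never touched, so "Doe A, J" keeps its "A" without a period, and a genuine one-letter first name would wrongly get a period.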

Norm Pollock
Posts: 1017
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by Norm Pollock » Wed Feb 28, 2018 12:51 am

It's not much of a problem now, because I wrote a tool that inserts a period after an initial. It is called "tagFix" and is in "40H-PGN" tools.

But my beef with SCID is that ALL periods are removed after initials when I use "Spell Checking of Players". So if I have "Fischer, R.", the spell check corrects it to "Fischer, R".

On the other hand, periods are sometimes not appropriate:

King George V
King, George V.

Komodo 9
Komodo 9.0

styx
Posts: 338
Joined: Tue Mar 13, 2012 8:59 pm
Location: Germany

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by styx » Wed Feb 28, 2018 1:26 am

Norm Pollock wrote: But my beef with SCID is that ALL periods are removed after initials when I use "Spell Checking of Players". So if I have "Fischer, R.", the spell check corrects it to "Fischer, R".
That is still not SCID's fault. SCID uses exactly the names as specified in the spelling file. If the spelling file says "Fischer, Robert J.", SCID will display the period. But if the name in the spelling file is "Fischer, Robert J" then SCID will show it like this.

In other words: SCID will never delete the periods by itself, only if the spelling file says so.

And yes: in the spelling files that I provided, most names will not have the period. But it's not my fault. This data is taken straight out of the FIDE database and that's exactly how they saved it.

A lot of work is required to make it absolutely consistent.

Norm Pollock
Posts: 1017
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by Norm Pollock » Wed Feb 28, 2018 1:48 am

How can I download the FIDE database of player names?

I do not see any link for it on their site.

styx
Posts: 338
Joined: Tue Mar 13, 2012 8:59 pm
Location: Germany

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by styx » Wed Feb 28, 2018 3:15 am

http://ratings.fide.com/download.phtml

Click on "TXT Format" or "XML Format" to download the desired version.

Fulvio
Posts: 146
Joined: Fri Aug 12, 2016 6:43 pm

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by Fulvio » Wed Feb 28, 2018 7:49 am

styx wrote: So I wrote a parser and updated the files myself:
...
- I will irregularly update the files (download link will stay the same)
That's great, thanks!

Norm Pollock
Posts: 1017
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by Norm Pollock » Thu Mar 01, 2018 4:04 am

Thanks.

styx
Posts: 338
Joined: Tue Mar 13, 2012 8:59 pm
Location: Germany

Re: [SCID Database] Updated FIDE spelling files - February 2

Post by styx » Fri Mar 02, 2018 10:03 pm

I cleaned up the database a bit. There is now always a comma separating the last name from the first name. If a name consists of more than two strings (separated by whitespace), the comma is inserted after the first string that is longer than 1 character (periods ignored). This might not be 100% correct, but most of the time it is.
24.424 names changed/separated in the RATED ONLY version, 73.439 in the FULL version. Same link. Files are updated.

Example:


Praggnanandhaa R --> Praggnanandhaa, R

A Doe B C John --> A Doe, B C John
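For readers who want to reproduce the rule, here is a minimal Python sketch of how it might look. It is not the actual parser: the function name `insert_comma` is hypothetical, and it assumes that names which already contain a comma are left alone and that the comma is never placed after the very last string.

```python
def insert_comma(name: str) -> str:
    """Insert a comma after the first token longer than one character.

    Hypothetical helper; periods are ignored when measuring token length.
    """
    if "," in name:
        return name  # already separated
    tokens = name.split()
    # Skip the last token so that something always follows the comma.
    for i, tok in enumerate(tokens[:-1]):
        if len(tok.replace(".", "")) > 1:
            head = " ".join(tokens[: i + 1])
            tail = " ".join(tokens[i + 1:])
            return f"{head}, {tail}"
    return name  # no suitable split point found


for raw in ["Praggnanandhaa R", "A Doe B C John", "Doe, John"]:
    print(f"{raw} --> {insert_comma(raw)}")
```

The sketch shares the weakness mentioned above: an entry stored given-name-first, such as "John Doe", would come out as "John, Doe".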
Another thing I'd like to mention. Earlier I wrote:
- Don't just use the "FULL" version since it does not contain all the names in the "RATED ONLY" file (Why? I guess it's FIDE's secret...) and I was too lazy to merge them
That statement of mine turned out to be incorrect. The "FULL" version contains all the entries of the "RATED ONLY" version PLUS all the unrated players.
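Anyone who wants to check this for themselves could compare the two name lists with a simple set difference. The sketch below is only an assumption about how such a check might be done: `missing_from_full` is a made-up name, and it expects the player names to have been extracted from the files already, one string per player, since the layout of the spelling files is not described here.

```python
def missing_from_full(rated_names, full_names):
    """Return the rated names that do not occur in the full list.

    Hypothetical helper: both arguments are iterables of name strings;
    surrounding whitespace is ignored and blank entries are skipped.
    """
    full = {n.strip() for n in full_names}
    rated = {n.strip() for n in rated_names if n.strip()}
    return sorted(rated - full)


# Tiny made-up sample, not real FIDE data.
rated = ["Doe, John", "Roe, Jane "]
full = ["Doe, John", "Roe, Jane", "Poe, A"]
print(missing_from_full(rated, full))
```

An empty result means every rated name is also present in the full list.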
