[OT] LOC counting tool

mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Post by mar »

I've just finished my simple LOC counting tool for C/C++ so I thought it might be interesting for some to try:
http://www.crabaware.com/Utils/CountLoc/CountLoc.zip
This is a binary-only release (Win/Linux/OSX).

I tried to match cloc's counts but failed, because cloc has a nasty bug that in rare cases miscounts comments vs. code (at least as of 1.62; the bug is already reported in their tracker).
My tool operates in two modes:
- conservative (default), where punctuation on separate lines is not counted as LOC
- greedy (trying to mimic cloc)

Conservative mode counts each of the following fragments as 2 LOC (2, 4 and 3 respectively in greedy mode):

Code: Select all

if (a)
    b();

if (a)
{
    b();
}

if (a) {
    b();
}
I also thought about counting logical lines (basically counting ; with lots of exceptions), but producing reliable results that way would be a lot more complicated, so I didn't bother in the end.
While LOC is probably not a good measure for comparing effort or comparing different projects (if two programs do the same thing but one is much smaller, then obviously less is more),
it still gives a rough idea of the size of a project.
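
To make the two modes concrete, here is a minimal Python sketch of the idea. It is not CountLoc's actual code: the set of characters treated as "punctuation" and the names classify and count_loc are assumptions for illustration, and the input is assumed to be free of comments already.

```python
# Hypothetical sketch, not CountLoc's implementation.
# Assumption: these characters count as "punctuation" when alone on a line.
PUNCTUATION = set("{}()[];,")

def classify(line: str) -> str:
    """Classify one comment-free source line as 'blank', 'punct' or 'code'."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if all(ch in PUNCTUATION or ch.isspace() for ch in stripped):
        return "punct"
    return "code"

def count_loc(source: str, greedy: bool = False) -> int:
    """Count LOC; greedy mode also counts punctuation-only lines."""
    kinds = [classify(line) for line in source.splitlines()]
    loc = kinds.count("code")
    if greedy:
        loc += kinds.count("punct")
    return loc

# The three fragments from the post above.
fragments = [
    "if (a)\n    b();",
    "if (a)\n{\n    b();\n}",
    "if (a) {\n    b();\n}",
]
conservative = [count_loc(f) for f in fragments]
greedy = [count_loc(f, greedy=True) for f in fragments]
print(conservative, greedy)  # [2, 2, 2] [2, 4, 3]
```

With this rule the brace style no longer changes the conservative count, which is the whole point of that mode.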

Because of the way it works, it's actually pretty fast (compared to cloc from my experience), but this comparison is a bit unfair as cloc attempts to auto-detect file type while mine simply relies on file extension.
Duplicate files (detected by MD5 hash) and empty files are ignored.
Each file is loaded into memory, then processed with one MD5 pass and one LOC pass.
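
A rough Python sketch of that duplicate/empty filter, using the standard hashlib module. The function name unique_nonempty and the in-memory dict standing in for files on disk are assumptions for illustration only; the real tool is a native binary and may work differently.

```python
import hashlib

def unique_nonempty(files: dict) -> list:
    """Return names of files worth counting: skip empty files and MD5 duplicates."""
    seen = set()
    kept = []
    for name, data in files.items():
        if not data:
            continue  # empty file: nothing to count
        digest = hashlib.md5(data).hexdigest()  # one MD5 pass over the bytes
        if digest in seen:
            continue  # same content was already seen under another name
        seen.add(digest)
        kept.append(name)  # a LOC pass would run on `data` here
    return kept

# Hypothetical file contents, keyed by file name.
files = {
    "a.cpp": b"int main() {}\n",
    "copy_of_a.cpp": b"int main() {}\n",
    "empty.h": b"",
    "b.cpp": b"void f();\n",
}
print(unique_nonempty(files))  # ['a.cpp', 'b.cpp']
```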

Archives aren't supported.
UTF-16 source files aren't supported (the results will be wrong in that case, because comments and CRLF line endings won't be detected properly).

List of supported extensions:
cc, cpp => C++ source
h, hpp => C/C++ include
inl => C/C++ inline
mm => Objective C++ source
m => Objective C source
cs => C# source
java => Java source
js => JavaScript source

For more information about command-line arguments, see readme.txt.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: [OT] LOC counting tool

Post by mcostalba »

Nice!

This is a run on current SF master (without TB code):

Code: Select all

CountLoc v1.0, (c) 2015 mar
-------------------------------------------------------------------------------
Type                         Files          Blank        Comment           Code
-------------------------------------------------------------------------------
C++ source                      18           1897           1367           4060
C/C++ header                    18            598            509           1286
-------------------------------------------------------------------------------
Total                           36           2495           1876           5346
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: [OT] LOC counting tool

Post by lucasart »

mar wrote: Conservative mode counts each of the following fragments as 2 LOC (2, 4 and 3 respectively in greedy mode):

Code: Select all

if (a)
    b();

if (a)
{
    b();
}

if (a) {
    b();
}
This is an interesting feature, and something that CLOC cannot do. Whether people use K&R or ANSI brace style, the line count should be the same. Counting lines that contain a single brace is silly and prevents comparisons between the two styles.

That being said, there are still many ways to abuse C's cryptic syntax to cheat line counts.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

v1.5

Post by mar »

A new, vastly improved version can be downloaded here: http://www.crabaware.com/Utils/CountLoc/CountLoc.zip
It now also includes a FreeBSD binary.

New features:
- proper lexing of string literals (plus raw strings where appropriate), so that a /* inside a string is no longer mistaken for a comment
- individual files and zip archives can be passed as arguments (zip files inside scanned folders are ignored)
- punctuation lines are shown in a separate column in non-greedy mode (punctuation plus code should add up to the LOC in greedy mode)
- pretty-printed results
- support for new languages:
  - D (including raw strings and nested comments)
  - Lua (multi-line comments also work)
  - Pascal (directives are counted as code)
  - Python (separate doc strings act as multi-line comments)
  - PHP (including heredoc and nowdoc)
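
To illustrate why string literals have to be lexed before comments can be recognised, here is a small Python sketch of a comment stripper for C-like source. It is only an illustration of the idea, not the tool's lexer: strip_comments is a made-up name, and raw strings, line continuations and C++14 digit separators are deliberately not handled.

```python
def strip_comments(src: str) -> str:
    """Remove // and /* */ comments from C-like source, leaving string literals intact."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in "\"'":
            # Copy the whole string/char literal, honoring backslash escapes.
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        elif src.startswith("//", i):
            # Line comment: skip to the end of the line, keep the newline itself.
            j = src.find("\n", i)
            i = n if j == -1 else j
        elif src.startswith("/*", i):
            # Block comment: keep embedded newlines so the line count stays stable.
            j = src.find("*/", i + 2)
            end = n if j == -1 else j + 2
            out.append("\n" * src.count("\n", i, end))
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)

sample = 'const char *s = "/* not a comment */"; // real comment\nint x; /* also real */\n'
print(strip_comments(sample))
```

The /* inside the string literal survives, while both real comments are removed. A naive scan for /* without the string branch would swallow everything up to the first */ instead.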
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: v1.5

Post by cdani »

Current Andscacs:

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C++ source                   13       1'408       2'692       1'128       7'526
C/C++ header                 14         301         702         132       1'289
-------------------------------------------------------------------------------
Total                        27       1'709       3'394       1'260       8'815
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: v1.5

Post by mar »

cdani wrote: Current Andscacs:

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C++ source                   13       1'408       2'692       1'128       7'526
C/C++ header                 14         301         702         132       1'289
-------------------------------------------------------------------------------
Total                        27       1'709       3'394       1'260       8'815
That's pretty cool; less is more. Cheng currently has 12'779 LOC and is 100+(?) Elo weaker than Andscacs :)
However, this includes ~3k LOC of generated KPK table.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: v1.5

Post by mcostalba »

Current SF master branch:

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C++ source                   19       1'798       1'328         719       3'917
C/C++ header                 17         565         484         183       1'197
-------------------------------------------------------------------------------
Total                        36       2'363       1'812         902       5'114
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: v1.5

Post by cdani »

mcostalba wrote: Current SF master branch:

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C++ source                   19       1'798       1'328         719       3'917
C/C++ header                 17         565         484         183       1'197
-------------------------------------------------------------------------------
Total                        36       2'363       1'812         902       5'114
Obviously clearer and more optimized :-)
mar wrote:
cdani wrote: Current Andscacs:

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C++ source                   13       1'408       2'692       1'128       7'526
C/C++ header                 14         301         702         132       1'289
-------------------------------------------------------------------------------
Total                        27       1'709       3'394       1'260       8'815
That's pretty cool; less is more. Cheng currently has 12'779 LOC and is 100+(?) Elo weaker than Andscacs :)
However, this includes ~3k LOC of generated KPK table.
Yes, as we know it requires a lot of time and resources to optimize! :-)
Bloodbane
Posts: 154
Joined: Thu Oct 03, 2013 4:17 pm

Re: v1.5

Post by Bloodbane »

Current Hakkapeliitta. I suppose I should comment the .cpp files a bit more.

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C++ source                   33         922         697       1'537       3'505
C/C++ header                 34         902       1'256         806       2'456
-------------------------------------------------------------------------------
Total                        67       1'824       1'953       2'343       5'961
Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics.
https://github.com/mAarnos
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: v1.5

Post by mvk »

mar wrote: A new, vastly improved version can be downloaded here: http://www.crabaware.com/Utils/CountLoc/CountLoc.zip
It now also includes a FreeBSD binary.

New features:
- proper lexing of string literals (plus raw strings where appropriate), so that a /* inside a string is no longer mistaken for a comment
- individual files and zip archives can be passed as arguments (zip files inside scanned folders are ignored)
- punctuation lines are shown in a separate column in non-greedy mode (punctuation plus code should add up to the LOC in greedy mode)
- pretty-printed results
- support for new languages:
  - D (including raw strings and nested comments)
  - Lua (multi-line comments also work)
  - Pascal (directives are counted as code)
  - Python (separate doc strings act as multi-line comments)
  - PHP (including heredoc and nowdoc)
I was already using CountLoc v1.0 to track where Floyd was going.

Before:

Code: Select all

CountLoc v1.0, (c) 2015 mar
-------------------------------------------------------------------------------
Type                         Files          Blank        Comment           Code
-------------------------------------------------------------------------------
C source                        13            791            939           2810
C/C++ header                     8            195            307            488
-------------------------------------------------------------------------------
Total                           21            986           1246           3298
New CountLoc:

Code: Select all

-------------------------------------------------------------------------------
Type                      Files       Blank     Comment Punctuation        Code
-------------------------------------------------------------------------------
C source                     13         791         939         405       2'810
C/C++ header                  8         195         307          24         488
-------------------------------------------------------------------------------
Total                        21         986       1'246         429       3'298
1. No version is actually listed.
2. Is there a way to hide those thousands separators? (Or, maybe much better, to offer machine-readable output as an option, such as JSON, CSV or whatever.)
3. Good: the categories now add up to the raw line count again (v1.0 made lines disappear). It is now more informative than cloc for me:

Reference (cloc):

Code: Select all

http://cloc.sourceforge.net v 1.64  T=0.10 s (209.5 files/s, 59454.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               13            791            939           3215
C/C++ Header                     8            195            307            512
-------------------------------------------------------------------------------
SUM:                            21            986           1246           3727
-------------------------------------------------------------------------------
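
Until a machine-readable output option exists, the pretty-printed rows can be parsed by hand. The Python sketch below (parse_row is a hypothetical helper, not part of CountLoc or cloc) strips the ' separators and checks, on the rows posted above, that Punctuation + Code in the new CountLoc output matches cloc's code column.

```python
def parse_row(line: str, n_numbers: int):
    """Split a result row into (label, [ints]), dropping ' thousands separators."""
    fields = line.split()
    numbers = [int(f.replace("'", "")) for f in fields[-n_numbers:]]
    label = " ".join(fields[:-n_numbers])
    return label, numbers

# Rows copied from the two tables above.
countloc_rows = [
    "C source                     13         791         939         405       2'810",
    "C/C++ header                  8         195         307          24         488",
    "Total                        21         986       1'246         429       3'298",
]
cloc_rows = [
    "C                               13            791            939           3215",
    "C/C++ Header                     8            195            307            512",
    "SUM:                            21            986           1246           3727",
]

for new, ref in zip(countloc_rows, cloc_rows):
    label, (files, blank, comment, punct, code) = parse_row(new, 5)
    _, (_, _, _, cloc_code) = parse_row(ref, 4)
    print(label, punct + code, cloc_code, punct + code == cloc_code)
```

For these rows the sums agree: 405 + 2'810 = 3215, 24 + 488 = 512 and 429 + 3'298 = 3727.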