Rust compiler in C

Discussion of chess software programming and technical issues.

Moderator: Ras

JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Rust compiler in C

Post by JohnWoe »

Found an interesting project. Every machine has a C compiler, and Linux comes with Rust these days, so instead of shipping rustc binaries, let's build a Rust compiler in C (not even using C extensions). Compile the Rust compiler with a C compiler, then use that compiler to compile the Rust code in Linux, of which there is very little at the moment.
Since it avoids LLVM backends, which require C++, you can compile the whole thing with the 100KB TinyCC compiler.
You could compile Rust chess engines with this.

It's a very complex project, and C makes code very sparse and tedious to write. I'm writing my current simple language in Python. I would never start writing a Rust compiler in C, since I don't enjoy writing C. It's like assembly with some syntax sugar. But C is the mother of all languages.

https://notgull.net/announcing-dozer/
Repo: https://codeberg.org/notgull/dozer
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: Rust compiler in C

Post by JohnWoe »

It actually compiles this Rust code:


// exitcode:1

fn do_nothing() {}

fn do_something(x: i64) {
    add_two_numbers(1, 2);
}

fn rust_main() -> i32 {
    let two = 2i32;
    let my_str = unsafe {
        let foo = two;
        foo;

        c"abcd🐬\" \u{1b}[31m é 举 \x1b[0m"

        // b"hello world: \x55\nyay\0"

    };
    unsafe {
        puts(my_str);
    };
    let res = two.sub(1);

    let one_byte = 'a'; // 97
    let two_bytes = 'é'; // 233
    let three_bytes = '举'; // 20030
    let four_bytes_expected = '\u{1f42c}';
    let four_bytes = '🐬'; // 128044

    res
}

impl i32 {
    fn add(this: i32, other: i32) -> i32 {
        this + other
    }

    fn sub(self, other: i32) -> i32 {
        self - other
    }
}

extern "C" {
    fn puts(data: *const u8);
}

fn add_two_numbers(x: i32, y: i32) -> i32 {
    x + y
}
1. Build dozer.
2. Then download qbe and compile it.
3. ???
4. It's alive !!!


ThinkPad-E14-Gen-2:~/Lataukset/dozer$ ./dozer tests/compile/compile_test.rs hello.nhad
ThinkPad-E14-Gen-2:~/Lataukset/dozer$ ./dozer-qbe hello.nhad > hello.qbe
ThinkPad-E14-Gen-2:~/Lataukset/dozer$ ./qbe hello.qbe > hello.S
ThinkPad-E14-Gen-2:~/Lataukset/dozer$ cc hello.S objs/libdozerrt.a -o hello
ThinkPad-E14-Gen-2:~/Lataukset/dozer$ ./hello 
abcd🐬"  é 举 
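The `one_byte` … `four_bytes` names in the test program presumably refer to each character's UTF-8 encoded length, and the numeric comments are Unicode code points. A quick sanity check of both with ordinary rustc (not dozer), using only `char::len_utf8` and a `u32` cast; `describe` is just an illustrative helper:

```rust
// Checks the code points in the comments of the dozer test program and how
// many bytes each character takes in UTF-8 (compiled with stock rustc).

/// Returns (Unicode code point, UTF-8 encoded length in bytes).
fn describe(c: char) -> (u32, usize) {
    (c as u32, c.len_utf8())
}

fn main() {
    for c in ['a', 'é', '举', '🐬'] {
        let (cp, len) = describe(c);
        println!("{:?}: U+{:04X} = {} -> {} byte(s) in UTF-8", c, cp, cp, len);
    }
    // The escaped form used for `four_bytes_expected` is the same character.
    assert_eq!('\u{1f42c}', '🐬');
}
```

The lengths come out as 1, 2, 3 and 4 bytes, matching the variable names, and the code points match the comments (97, 233, 20030, 128044).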
Ras
Posts: 2694
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Rust compiler in C

Post by Ras »

JohnWoe wrote: Sat Dec 14, 2024 12:40 am Since I don't enjoy writing C. It's like assembly with some syntax sugar.
C is the best portable macro assembler ever. :D
Rasmus Althoff
https://www.ct800.net
hgm
Posts: 28331
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Rust compiler in C

Post by hgm »

I thought that most compilers for other languages were implemented nowadays as ...-to-C transpilers, so that you can make use of the optimizing code generation backend of the C compiler and don't have to bother with that part of the task. I once tried to use a BASIC-to-C transpiler, but this was not very successful. In the end I just used an edit script for doing the translation, followed by some hand-editing for the rare cases. If the languages are sufficiently similar, an edit script can get a long way.
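For illustration, the core of a "...-to-C" transpiler as described above can be tiny: walk the syntax tree and print C source, then let the C compiler do optimization and code generation. A toy sketch in Rust; the `Expr` type and the shape of the emitted function are made up for this example, and a real transpiler also has to deal with types, control flow and C's undefined behaviour (the plain signed `+` emitted here can overflow):

```rust
// Toy "...-to-C" transpiler: a tiny expression tree is printed as C source,
// leaving optimisation and machine code generation to the C compiler.

enum Expr {
    Int(i64), // small literals only; i64::MIN would need special spelling in C
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Emits a fully parenthesised C expression, so C precedence never matters.
fn emit_expr(e: &Expr) -> String {
    match e {
        Expr::Int(n) => format!("{}LL", n),
        Expr::Var(name) => name.clone(),
        Expr::Add(a, b) => format!("({} + {})", emit_expr(a), emit_expr(b)),
        Expr::Mul(a, b) => format!("({} * {})", emit_expr(a), emit_expr(b)),
    }
}

/// Wraps the expression in a C function of one `long long` parameter `x`.
fn emit_function(name: &str, body: &Expr) -> String {
    format!(
        "long long {}(long long x) {{\n    return {};\n}}\n",
        name,
        emit_expr(body)
    )
}

fn main() {
    use Expr::*;
    // x + 2 * x
    let body = Add(
        Box::new(Var("x".to_string())),
        Box::new(Mul(Box::new(Int(2)), Box::new(Var("x".to_string())))),
    );
    print!("{}", emit_function("triple", &body));
}
```

This prints a C function `triple` whose body is `return (x + (2LL * x));`, which any C compiler, TinyCC included, can then turn into machine code.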
mar
Posts: 2646
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Rust compiler in C

Post by mar »

hmm, that project looks largely unfinished - compiling toy programs that add two numbers and print hello world is not that useful, even though it's a start
my guess is that the author severely underestimated the scope of the project

one doesn't simply "write a rust compiler" - that'd be a huge undertaking. there are people out there who single-handedly wrote C++ compilers (I know of 2), but we're talking experts here, not college students
I believe that jai, odin and zig are also (or started as) one-man projects - and all of them use LLVM as a backend.
C transpilers are probably used only at the start of a project - jai originally had a C transpiler backend before switching to LLVM for release builds

the most popular codegen backend for most modern serious languages is LLVM. we're talking 2.5 million lines of C++ code; writing optimizers is by far the hardest task. lexer is trivial, parser is easy (assuming a sufficiently simple language), codegen is hard.
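To make the "lexer is trivial, parser is easy" point concrete, here is a sketch of both for a toy integer-arithmetic grammar (not Rust's grammar), written in Rust. It evaluates while parsing instead of building an AST, and it contains no code generation at all, which is exactly the part called hard above. All names are illustrative:

```rust
// Toy lexer + recursive-descent parser for integer arithmetic:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '-' factor | '(' expr ')'
// It evaluates while parsing: no AST and no code generation.

type Res = Result<i64, String>;

#[derive(Debug, Clone)]
enum Tok {
    Num(i64),
    Op(char),
}

/// Lexer: splits the input into numbers and single-character operators.
fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n: i64 = 0;
            while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                n = n.checked_mul(10).and_then(|m| m.checked_add(d as i64)).ok_or("number too large")?;
                chars.next();
            }
            toks.push(Tok::Num(n));
        } else if "+-*/()".contains(c) {
            toks.push(Tok::Op(c));
            chars.next();
        } else if c.is_whitespace() {
            chars.next();
        } else {
            return Err(format!("unexpected character {:?}", c));
        }
    }
    Ok(toks)
}

/// Applies a binary operator; overflow and division by zero become errors.
fn apply(op: char, a: i64, b: i64) -> Res {
    let r = match op {
        '+' => a.checked_add(b),
        '-' => a.checked_sub(b),
        '*' => a.checked_mul(b),
        _ => a.checked_div(b), // only '/' reaches this arm
    };
    r.ok_or_else(|| format!("cannot evaluate {} {} {}", a, op, b))
}

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn next(&mut self) -> Option<Tok> {
        let t = self.toks.get(self.pos).cloned();
        self.pos += t.is_some() as usize;
        t
    }
    /// One precedence level: operand (op operand)* with `op` taken from `ops`.
    fn level(&mut self, ops: &str, operand: fn(&mut Parser) -> Res) -> Res {
        let mut v = operand(self)?;
        while let Some(Tok::Op(op)) = self.toks.get(self.pos).cloned() {
            if !ops.contains(op) {
                break;
            }
            self.pos += 1;
            v = apply(op, v, operand(self)?)?;
        }
        Ok(v)
    }
    fn expr(&mut self) -> Res { self.level("+-", Parser::term) }
    fn term(&mut self) -> Res { self.level("*/", Parser::factor) }
    fn factor(&mut self) -> Res {
        match self.next() {
            Some(Tok::Num(n)) => Ok(n),
            Some(Tok::Op('-')) => apply('-', 0, self.factor()?),
            Some(Tok::Op('(')) => match (self.expr()?, self.next()) {
                (v, Some(Tok::Op(')'))) => Ok(v),
                (_, other) => Err(format!("expected ')', found {:?}", other)),
            },
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

fn eval(src: &str) -> Res {
    let mut p = Parser { toks: lex(src)?, pos: 0 };
    let v = p.expr()?;
    if p.pos == p.toks.len() { Ok(v) } else { Err("trailing input".to_string()) }
}

fn main() {
    for src in ["1 + 2 * 3", "(1 + 2) * 3", "-4 * -2", "10 / (5 - 5)"] {
        println!("{} => {:?}", src, eval(src));
    }
}
```

Each grammar rule maps to one function, and precedence falls out of which function calls which. A real Rust front end needs far more rules plus type checking, but the technique stays the same.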

there are exceptions like gcc and msvc, which have their own backends. intel stopped developing their own compiler and simply uses clang now, ditto for embarcadero - no borland compiler anymore, they use clang too

as for tcc, while it's cool, it will also produce significantly slower binaries than optimizing compilers, which has to be taken into account
interpreted python, well... we're talking about a dynamic language two orders of magnitude slower than C. probably the most overrated glue language of all time, but people love it (probably because it has a nice package system with a ton of useful libraries)

having written an unsafe statically typed scripting language myself, I have to say that
- it was harder than I thought
- I suck at writing "compilers"
- when JITted (and without bounds checks), it probably runs in the same ballpark as modern JITted javascript, i.e. roughly 4 to infinity times slower than natively compiled languages
nevertheless I succeeded and love writing code in it, even use it for fast prototyping
Ras
Posts: 2694
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Rust compiler in C

Post by Ras »

hgm wrote: Sat Dec 14, 2024 9:08 am I thought that most compilers for other languages were implemented nowadays as ...-to-C transpilers.
The problem with that is that today's C compilers are not what they had in mind when they made the standard with tons of undefined behaviour that should have been implementation defined behaviour to begin with. And even with implementation-defined aspects, which implementation(s) will you target? So you have to take all C quirks into account, and at that point you're better off dropping the C intermediate altogether and going directly for LLVM. Especially for languages like Rust that advertise memory safety, you probably don't want a "fast and loose" compiler under the hood. The only downside is that you lose target platforms without LLVM support.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Rust compiler in C

Post by Michel »

Ras wrote: Sat Dec 14, 2024 11:31 pm The problem with that is that today's C compilers are not what they had in mind when they made the standard with tons of undefined behaviour that should have been implementation defined behaviour to begin with.
Not sure why that's relevant... Just target a subset of C without undefined behavior.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: Rust compiler in C

Post by Michel »

Michel wrote: Tue Dec 17, 2024 2:19 pm
Ras wrote: Sat Dec 14, 2024 11:31 pm The problem with that is that today's C compilers are not what they had in mind when they made the standard with tons of undefined behaviour that should have been implementation defined behaviour to begin with.
Not sure why that's relevant... Just target a subset of C without undefined behavior.
I read some more. The main difficulty seems to be the undefined behavior when overflow occurs with signed integers. This is indeed a serious problem.
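To make the overflow problem concrete: Rust gives signed overflow defined results, so a C backend has to reproduce them exactly, whereas the naive C translation `a + b` on `int32_t` operands is undefined behaviour on overflow (assuming a platform with 32-bit `int`, as in the code comment). The standard `i32` methods below show what Rust guarantees; `add_variants` is just an illustrative wrapper:

```rust
// In Rust, i32 overflow has defined results: plain `+` panics when overflow
// checks are on (the default for debug builds) and wraps when they are off
// (the default for release builds). The explicit methods below behave the
// same way regardless of build profile.

/// Every flavour of i32 addition the standard library offers.
fn add_variants(a: i32, b: i32) -> (i32, Option<i32>, (i32, bool), i32) {
    (
        a.wrapping_add(b),    // wraps modulo 2^32
        a.checked_add(b),     // None on overflow
        a.overflowing_add(b), // wrapped result plus an overflow flag
        a.saturating_add(b),  // clamps to i32::MIN / i32::MAX
    )
}

fn main() {
    // In C (with a 32-bit int), INT32_MAX + 1 on int32_t operands is
    // undefined behaviour; in Rust every variant below is fully defined.
    let (wrap, checked, overflowing, saturating) = add_variants(i32::MAX, 1);
    println!("wrapping:    {}", wrap);
    println!("checked:     {:?}", checked);
    println!("overflowing: {:?}", overflowing);
    println!("saturating:  {}", saturating);
}
```

For `i32::MAX + 1` this prints -2147483648, None, (-2147483648, true) and 2147483647.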
Ras
Posts: 2694
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Rust compiler in C

Post by Ras »

Michel wrote: Tue Dec 17, 2024 2:19 pm Just target a subset of C without undefined behavior.
That is C, i.e. valid C. The problem is that a bug in the code generator may lead to undefined behaviour. The main problem with that is that it may not even manifest with current C compilers, so all tests would pass - and then suddenly, after a C compiler update, the generated code breaks (stuff like that did happen before). C99 alone has a whopping 193 cases of undefined behaviour. That would be kind of a ticking bomb under the hood for a language that advertises memory safety, compiler guarantees and the like.
syzygy
Posts: 5673
Joined: Tue Feb 28, 2012 11:56 pm

Re: Rust compiler in C

Post by syzygy »

Michel wrote: Tue Dec 17, 2024 2:19 pm
Ras wrote: Sat Dec 14, 2024 11:31 pm The problem with that is that today's C compilers are not what they had in mind when they made the standard with tons of undefined behaviour that should have been implementation defined behaviour to begin with.
Not sure why that's relevant... Just target a subset of C without undefined behavior.
Which is all of C :-)

Just generate legal C code.
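A minimal sketch of what "just generate legal C code" can look like for the signed-overflow case: the backend never emits a raw signed `+` that might overflow, but routes i32 addition through small C helpers. This is a hypothetical Rust-side emitter; the helper names and the `rs_panic_overflow` runtime hook are made up for illustration and are not dozer's actual output. The C side uses only standard C11 (`<stdint.h>`, `memcpy`, `_Noreturn`), no compiler builtins such as GCC/Clang's `__builtin_add_overflow`:

```rust
// Sketch of a Rust-to-C backend lowering i32 `+` to legal C with no signed
// overflow UB and no compiler extensions (e.g. no __builtin_add_overflow).
// The helper names and the rs_panic_overflow runtime hook are hypothetical.

/// C helpers emitted once at the top of every generated file.
const C_PRELUDE: &str = r#"#include <stdint.h>
#include <string.h>

/* Runtime panic hook (hypothetical); must not return. */
_Noreturn void rs_panic_overflow(void);

/* Wrapping add: unsigned arithmetic is defined to wrap modulo 2^32, and
   memcpy reinterprets the bits (int32_t is two's complement, no padding). */
static inline int32_t rs_add_wrap_i32(int32_t a, int32_t b) {
    uint32_t r = (uint32_t)a + (uint32_t)b;
    int32_t out;
    memcpy(&out, &r, sizeof out);
    return out;
}

/* Checked add: test the operands first, so the signed addition is only
   evaluated when it cannot overflow. */
static inline int32_t rs_add_checked_i32(int32_t a, int32_t b) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        rs_panic_overflow();
    return a + b;
}
"#;

/// Which semantics the generated code should have for plain `+`.
#[derive(Clone, Copy)]
enum Overflow {
    Wrap,    // like rustc with overflow checks off (release default)
    Checked, // like rustc with overflow checks on (debug default)
}

/// Emits C for the Rust expression `lhs + rhs` on i32 operands, where `lhs`
/// and `rhs` are already-generated C expressions.
fn emit_add_i32(lhs: &str, rhs: &str, mode: Overflow) -> String {
    let helper = match mode {
        Overflow::Wrap => "rs_add_wrap_i32",
        Overflow::Checked => "rs_add_checked_i32",
    };
    format!("{}({}, {})", helper, lhs, rhs)
}

fn main() {
    print!("{}", C_PRELUDE);
    for (name, mode) in [("add_wrap", Overflow::Wrap), ("add_checked", Overflow::Checked)] {
        println!(
            "int32_t {}(int32_t x, int32_t y) {{ return {}; }}",
            name,
            emit_add_i32("x", "y", mode)
        );
    }
}
```

The emitted `add_checked` is `int32_t add_checked(int32_t x, int32_t y) { return rs_add_checked_i32(x, y); }`; linking it still requires a runtime that provides `rs_panic_overflow`. Keeping the UB-sensitive arithmetic in a few small helpers makes the "legal C" guarantee easy to audit, which addresses part of the concern about codegen bugs raised above, though it does not protect against the other cases of undefined behaviour.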