The weird of function-local types in Rust

2024-08-10

Context

I was writing documentation for bon, my proc-macro crate that generates builders for functions and structs. I wrote the following simple example to compare bon with an alternative crate, buildstructor.

rust
#[derive(buildstructor::Builder)]
struct User {
    name: String
}

User::builder()
    .name("Foo")
    .build();

This example code was part of the doc comment, which I tested by running cargo test --doc. However, it didn't compile:

log
cannot find type `User` in this scope
 --> doc-test/path/here.rs
  |
2 | struct User {
  |        ^^^^ not found in this scope

Suddenly the code generated by the macro can't find the User struct it was placed on 🤨. This is where the weird things begin, and they need some explanation. To figure out what's happening, let's build an understanding of how name resolution works for "local" items.

Name resolution for local items

It's possible to define an item such as a struct, impl block or fn inside any block expression in Rust. For example, this code defines a "local" unit struct inside of a function block:

rust
fn example() {
    struct User;

    let user = User;
}

Here, the User struct is only accessible inside of the function block's scope. We can't reference it outside of this function:

rust
fn example() {
    struct User;
}

// error: cannot find type `User` in this scope
type Foo = User;                                

mod child_module {
    // error: unresolved import `super::User`; no `User` in the root
    use super::User;                                                 
}

This doesn't work because, logically, there should be something in the path that says {fn example()}::User. However, there is no syntax in Rust to express the {fn example()} scope.

But what does std::any::type_name() return for that User struct then? Let's figure this out:

rust
fn example() -> &'static str {
    struct User;

    std::any::type_name::<User>()
}

fn main() {
    println!("{}", example());
}

This outputs the following:

log
crate_name::example::User

So, the function name becomes part of the path as if it was just a simple module. However, this isn't true, or at least this behaviour isn't exposed in the language. If we try to reference the User from the surrounding scope using that syntax, we are still out of luck:

rust
fn example() {
    struct User;
}

type Foo = example::User; 

This generates a compile error:

log
error[E0433]: failed to resolve: function `example` is not a crate or module
 --> path/to/code.rs
  |
6 | type Foo = example::User;
  |            ^^^^^^^ function `example` is not a crate or module
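
As a side note on the `type_name()` output shown earlier, here is a small sketch that nests one more function around the struct to see whether each enclosing function contributes a path segment. The names `outer` and `inner` are made up for this illustration. The standard library documents the returned string as a best-effort description that may change between compiler versions, so the sketch doesn't claim an exact output.

```rust
// `outer` and `inner` are hypothetical names used only for this illustration.
fn outer() -> &'static str {
    fn inner() -> &'static str {
        // A local type nested two functions deep.
        #[allow(dead_code)]
        struct User;

        // The returned string is a best-effort description for diagnostics.
        // Its exact format isn't guaranteed, so don't parse it or rely on it.
        std::any::type_name::<User>()
    }

    inner()
}

fn main() {
    // Prints whatever path the current compiler assigns to the local type.
    println!("{}", outer());
}
```

Whatever the printed path looks like, it's only a diagnostic string. It isn't a path that name resolution accepts, which matches the compile error above.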

So there is just no way to refer to the User struct outside of the function scope, right?... Wrong 🐱! There is a way to do this, but it's so convoluted that we'd better assume we can't do that in production code.

If you are curious, first, try to solve this yourself:

rust
fn example() {
    struct User;
}

type Foo = /* how can we get the `User` type from the `example` function here? */;

and then take a look at the solution below:

Solution for referring to a local item outside the function body.

The idea is to implement a trait for the local type and then use that trait in the outside scope to get the local type.

rust
trait TypeChannel {
    type Type;
}

struct GetUserType;

fn example() {
    struct User {
        name: String
    }

    // We can implement a trait from the surrounding scope
    // that uses the local item.
    impl TypeChannel for GetUserType {
        type Type = User;
    }
}

type RootUser = <GetUserType as TypeChannel>::Type;

// Another interesting fact. The fields of the `User` struct aren't private
// in the root scope. You can create the `User` struct via the `RootUser` type
// alias and reference its fields in the top-level scope just fine 🐱.
fn main() {
    let user = RootUser {
        name: "Bon".to_owned()
    };

    println!("Here is {}!", user.name);
}

Now this compiles... but well, I'd rather burn this code with fire 🔥.

By the way, rust-analyzer doesn't support this pattern. It can't resolve the RootUser type and its fields, but rustc works fine with this.


Now, let's see what happens if we define a child module inside of the function block.

rust
fn example() {
    struct User;

    mod child_module {
        use super::User; 
    }
}

Does this compile? Surely, it should compile, because the child module becomes a child of the anonymous function scope, so it should have access to symbols defined in the function, right?... Wrong 🐱!

It still doesn't compile with the error:

txt
unresolved import `super::User`; no `User` in the root

This is because super doesn't refer to the parent function scope. Instead, it refers to the top-level module (called root by the compiler in the error message) that defines the example() function. For example, this code compiles:

rust
struct TopLevelStruct;

fn example() {
    struct User;

    mod child_module {
        use super::TopLevelStruct; 
    }
}

As you can see, we took TopLevelStruct from super, which means super refers to the module surrounding the example function. And we already know how hacky it is to access the symbols defined inside of that example function from within the surrounding module.
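
Here is a minimal runnable sketch of the same behaviour. The names `greeting` and `call` are hypothetical and only serve the illustration. It shows that a module declared inside a function can reach the enclosing module's items via `super`, while the function body can still use the module it declared.

```rust
// `greeting` and `call` are illustrative names, not part of any real API.
fn greeting() -> &'static str {
    "hello from the top-level module"
}

fn example() -> &'static str {
    // This local struct is NOT reachable from `child_module` below.
    #[allow(dead_code)]
    struct User;

    mod child_module {
        pub fn call() -> &'static str {
            // `super` here is the module that contains `example()`,
            // not the body of `example()` itself.
            super::greeting()
        }
    }

    // The function body can name the module it declared, as with any local item.
    child_module::call()
}

fn main() {
    println!("{}", example());
}
```

So visibility is asymmetric: the function sees its child module, but the child module's `super` skips the function body entirely.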

So... this brings us to the following dilemma.

How does this affect macros?

Macros generate code, and that code shouldn't always be fully accessible to the scope where the macro was invoked. For example, a macro that generates a builder struct would like to hide the private fields of the generated builder struct from the surrounding module.

I'll use bon's macro syntax to showcase this.

rust
use bon::Builder;

#[derive(Builder)]
struct User {
    name: String,
}

Let's see what the generated code for this example may look like (very simplified).

TIP

The real code generated by #[derive(bon::Builder)] is a bit more complex. It uses the typestate pattern to catch all potential developer errors at compile time 🐱.

rust
struct User {
    name: String,
}

#[derive(Default)]
struct UserBuilder {
    name: Option<String>,
}

/* {snipped} ... impl blocks for `UserBuilder` that define setters ... */

fn example() {
    let builder = UserBuilder::default();

    // oops, we can access the builder's internal fields here
    let _ = builder.name;                                     
}

The problem with this approach is that UserBuilder is defined in the same module scope as the User struct. It means all fields of UserBuilder are accessible by this module. This is how the visibility of private fields works in Rust - the entire module where the struct is defined has access to the private fields of that struct.

The way to avoid this problem is to define the builder in a nested child module, to make private fields of the builder struct accessible only within that child module.

rust
struct User {
    name: String,
}

use user_builder::UserBuilder;

mod user_builder { 
    use super::*;

    #[derive(Default)]
    pub(super) struct UserBuilder {
        name: Option<String>,
    }
}

fn example() {
    let builder = UserBuilder::default();

    // Nope, we can't access the builder's fields now.
    // let _ = builder.name;
}
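
To make the picture a bit more complete, here is a rough sketch of what the snipped setter and `build()` impl blocks could look like when they live inside the child module. This is a hand-written approximation under my own assumptions (a plain `Option`-based builder with a panicking `build()`, and a hypothetical `build_example` helper), not the code that bon or buildstructor really generates.

```rust
struct User {
    name: String,
}

use user_builder::UserBuilder;

mod user_builder {
    // Works here only because `super` is the module that really defines `User`.
    use super::User;

    #[derive(Default)]
    pub(super) struct UserBuilder {
        // Private field: only code inside `user_builder` can touch it.
        name: Option<String>,
    }

    impl UserBuilder {
        // Setter that stores the value and returns the builder for chaining.
        pub(super) fn name(mut self, value: impl Into<String>) -> Self {
            self.name = Some(value.into());
            self
        }

        pub(super) fn build(self) -> User {
            User {
                // `User`'s private field is visible here because `user_builder`
                // is a descendant of the module that defines `User`.
                name: self.name.expect("`name` must be set before `build()`"),
            }
        }
    }
}

// Hypothetical helper that exercises the builder from the parent module.
fn build_example() -> String {
    let user = UserBuilder::default().name("Foo").build();
    user.name
}

fn main() {
    println!("{}", build_example());
}
```

The parent module can only go through `name()` and `build()`. The `Option<String>` inside the builder stays out of reach.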

So... problem solved, right?... Wrong 🐱!

Now imagine our builder macro is invoked for a struct defined inside of a local function scope:

rust
use bon::Builder;

fn example() {
    struct Password(String);

    #[derive(Builder)]
    struct User {
        password: Password,
    }
}

If #[derive(Builder)] creates a child module, then we have a problem. Let's see the generated code:

rust
fn example() {
    struct Password(String);

    struct User {
        password: Password,
    }

    mod user_builder {                  
        use super::*;                   

        pub(super) struct UserBuilder { 
            password: Option<Password>, 
        }                               
    }                                   
}

This doesn't compile with the error:

log
password: Option<Password>,
                 ^^^^^^^^ not found in this scope

Why is that? As discussed above, child modules defined inside function blocks can't access symbols defined in the function's scope. The use super::* imports items from the surrounding top-level module instead of the function scope.

It means that if we want to support local items in our macro, we just can't use a child module when the code inside of that child module needs to reference types (or any items) from the surrounding scope.

The core problem is the conflict:

  • We want to make the builder's fields private, so we need to define the builder struct inside of a child module.
  • We want to reference types from the surrounding scope in the builder's fields, including local items, so we can't define the builder struct inside the child module.

This is the problem that I found in buildstructor. The only way to solve this is to make a compromise, which I did when implementing #[derive(bon::Builder)]. The compromise is not to use a child module, and instead to obfuscate the private fields of the builder struct with a leading __ and #[doc(hidden)] attributes. This makes it hard for the user to access them, although not physically impossible.
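
A simplified sketch of what such a compromise might look like for the local `User` and `Password` types from the example above. It's hand-written under my own assumptions (an `Option`-based builder with a panicking `build()`) and is not bon's real output. It only shows that everything resolves once the builder is a sibling of the struct rather than living in a child module.

```rust
fn example() -> String {
    struct Password(String);

    struct User {
        password: Password,
    }

    // Hypothetical, simplified stand-in for macro-generated code.
    // There is no child module, so `Password` and `User` resolve as local items.
    #[derive(Default)]
    struct UserBuilder {
        // Still technically accessible in this scope. The `__` prefix and
        // `#[doc(hidden)]` only discourage touching it.
        #[doc(hidden)]
        __password: Option<Password>,
    }

    impl UserBuilder {
        fn password(mut self, value: Password) -> Self {
            self.__password = Some(value);
            self
        }

        fn build(self) -> User {
            User {
                password: self.__password.expect("`password` must be set"),
            }
        }
    }

    let user = UserBuilder::default()
        .password(Password("hunter2".to_owned()))
        .build();

    user.password.0
}

fn main() {
    println!("{}", example());
}
```

The price is exactly the one described above: nothing stops code in the same scope from writing `builder.__password`. The name just makes it obvious that doing so is a bad idea.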

But then... Defining types inside of functions is rather a niche use case. How do child modules in macro-generated code break the doc test mentioned at the beginning of this article?

How does this break doc tests?

Doc tests are usually code snippets that run some code defined on the top level. They don't typically contain an explicit main() function.

For example, a doc test like this:

rust
let foo = 1 + 1;
assert_eq!(foo, 2);

is implicitly wrapped by rustdoc in a main() function like this:

rust
fn main() {
    let foo = 1 + 1;
    assert_eq!(foo, 2);
}

So... If we write a code example in a doc comment with a macro that generates a child module, the doc test will probably not compile. This is what happened in the original doc test featuring buildstructor.

Let's bring it up again:

rust
#[derive(buildstructor::Builder)]
struct User {
    name: String
}

User::builder()
    .name("Foo")
    .build();

When preprocessing the doc test, rustdoc wraps this code in main():

rust
fn main() {
    #[derive(buildstructor::Builder)]
    struct User {
        name: String
    }

    User::builder()
        .name("Foo")
        .build();
}

Then buildstructor generates a child module that refers to User (the following code is simplified):

rust
fn main() {
    struct User {
        name: String
    }

    mod user_builder {
        use super::*;

        struct UserBuilder {
            name: Option<String>
        }

        impl UserBuilder {
            // `User` is inaccessible here
            fn build(self) -> User {       
                /* */
            }
        }
    }
}
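
One possible workaround on the doc test side, sketched under assumptions: rustdoc only adds the implicit wrapper when the snippet has no `fn main` of its own, so writing an explicit `main()` keeps the struct at the top level of the doc test. The child module below is a hand-written stand-in for macro output (with hypothetical `new` and `built_name` helpers), not what buildstructor really generates.

```rust
// Top level of the doc test: the snippet already contains `fn main`,
// so rustdoc shouldn't wrap it in another function.
struct User {
    name: String,
}

// Hand-written stand-in for a macro-generated child module (hypothetical).
mod user_builder {
    // `super` is now the module that really defines `User`, so this resolves.
    use super::*;

    pub(super) struct UserBuilder {
        name: Option<String>,
    }

    impl UserBuilder {
        // Hypothetical constructor, used instead of setters to keep this short.
        pub(super) fn new(name: &str) -> Self {
            Self {
                name: Some(name.to_owned()),
            }
        }

        // `User` is accessible here, unlike in the wrapped version above.
        pub(super) fn build(self) -> User {
            User {
                name: self.name.unwrap_or_default(),
            }
        }
    }
}

// Hypothetical helper so that the result is easy to inspect.
fn built_name() -> String {
    user_builder::UserBuilder::new("Foo").build().name
}

fn main() {
    println!("{}", built_name());
}
```

This only helps doc tests that you control. Structs defined inside ordinary function bodies hit the same problem, and there is no wrapper to opt out of.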

Summary

Does this mean generating child modules for privacy in macros is generally a bad idea? It depends... The main thing is not to reference items from the surrounding scope in the child module. For example, if you need to add use super::* in your macro-generated code, then this is already a bad call. You should think of local items and doc tests when you do this.

If you liked this article, check out my previous blog post 'How to do named function arguments in Rust' (it's also available on Reddit). Also, check out the bon crate on GitHub. Consider giving it a star ⭐ if you like it.

TIP

You can leave comments for this post on Reddit.

Veetaha

Lead developer @ elastio

Creator of bon