In favor of NULL #146

Closed
rcarver opened this issue Feb 28, 2013 · 15 comments

Comments

@rcarver

rcarver commented Feb 28, 2013

Here's an argument in favor of NULL values, as previously discussed and rejected in #30.

I believe that in configuring a system, the most important things are:

  1. The set of available keys (and key groups).
  2. An understanding of what those keys do, and as necessary, the type of value expected.

Therefore, I think it's important to be able to define, and comment, keys for which you don't yet have a value. A TOML document should be able to act as a specification for the possible configuration. It may be preferable not to define a value in the TOML config - say, in order to set a reasonable default at runtime. But, it is important to specify that such a value can be set. This is typically done by commenting out the key, and that seems ugly.

Put another way, it's the difference between hash[key].nil? and hash.key?(key) in Ruby, or between hash[key] == null and hash[key] === undefined in JavaScript. I think the distinction is important for the downstream validation and use of the data provided by a TOML document.
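A minimal Ruby sketch of that distinction, using a hash literal as a stand-in for a parsed document. The `settings` data is hypothetical, since TOML itself cannot produce a `nil` value:

```ruby
# A plain Ruby hash standing in for a parsed config (hypothetical data).
settings = { "username" => "rcarver", "name" => nil }

# Reading with [] cannot tell a nil value from a missing key...
settings["name"].nil?    # => true  (key present, value nil)
settings["email"].nil?   # => true  (key absent)

# ...but key? can.
settings.key?("name")    # => true
settings.key?("email")   # => false
```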

Disclosure: my own stake in this is levels, which defines a way to merge multiple inputs into a final configuration. When adding TOML support in rcarver/levels#3, I realized that we have a fundamental disagreement here. In all other ways, TOML is the ideal format for levels configuration.

As for the syntax, I don't have a strong opinion. I'm leaning toward simply leaving out the value, because it doesn't introduce a new keyword and it resembles what you'd do in bash:

this_is_null = 
@BurntSushi
Member

Note that this necessarily adds a new NULL type to the spec. (A type containing precisely one value. Otherwise known as the unit type.) A new type isn't so much a big deal, but it bullies itself into the type of all other types in TOML. Namely, an integer is no longer just an integer. It's an integer or NULL.

I think the added type complicates things. It means that every valid TOML parser has to differentiate between non-existence and NULL. This complicates types in static languages.

@rcarver - Could you maybe elaborate on why it is important for a TOML file to have knowledge of the set of definable keys? (As opposed to this information being in the application, or perhaps defined in a TOML array somewhere.)

@rcarver
Author

rcarver commented Feb 28, 2013

@BurntSushi the "or" case and static languages are good points. I'm still pondering the implications myself, thanks.

In practice, I find that something needs to define the set of possible keys. Again, in practice, they tend to accumulate over time and it's difficult to track. I'm thinking about both traditional applications and also provisioning tools (Chef) that use lots and lots of configuration variables. I like that the config file can act as the one place that defines the possible keys. I like that the app can enforce that the key is defined in the config file. If it's defined as NULL, the app can provide a default value if appropriate.
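A sketch of that workflow, assuming a hypothetical parser that maps `key =` (no value) to Ruby `nil`. TOML has no such value, so the config below is a plain hash, and `required_setting` is an illustrative name, not a library method:

```ruby
# Hypothetical helper: the key must be declared in the config file, but a
# declared-yet-null value (nil) lets the application supply its default.
def required_setting(config, key, default)
  raise KeyError, "config must declare #{key.inspect}" unless config.key?(key)

  value = config[key]
  value.nil? ? default : value
end

# Stand-in for a parsed file containing `timeout =` (no value) and `retries = 3`.
config = { "timeout" => nil, "retries" => 3 }

required_setting(config, "timeout", 1000)  # => 1000 (declared, left null)
required_setting(config, "retries", 5)     # => 3
# required_setting(config, "verbose", false) raises KeyError (never declared)
```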

To put this all into perspective, I think we should look at the use of TOML data. TOML parses to a hash, which I understand to return NULL when an undefined key is read (coming from Ruby). Here are some examples to consider how an application might want to treat various cases.

When NULL is not allowed

[user]
username = "rcarver"
# name = "example name"

Obviously, this works:

config["user"].key?("username") # => true
config["user"]["username"] # => "rcarver"

Generally, reading an undefined key returns NULL (nil in Ruby):

config["user"].key?("name") # => false
config["user"]["name"] # => nil

Alternatively, an application could choose to raise an error:

config["user"].key?("name") # => false
config["user"].fetch("name") # raises KeyError

When NULL is allowed

[user]
username = "rcarver"
name = # "example name"

We can safely read the key, and still decide between the options above for both undefined and null keys.

config["user"].key?("name") # => true
config["user"]["name"] # => nil

So, if we agree that a hash returns NULL for an undefined key, an application already has to deal with the "value or NULL" case. Adding NULL support to TOML lets an application differentiate between "no value" and "undefined" if it chooses to do so.

All that said, I do agree that it complicates TOML. I'll think on this some more. Happy to hear more perspectives here.

@BurntSushi
Member

@rcarver

I'm not sure how NULL gives an application the ability to enforce that a key is defined. Doesn't that ability exist anyway? If the key isn't defined, then a default value can be given.

I think I just have a fundamentally different opinion about where the Truth of which keys are available should be known. I don't believe it belongs in a configuration file (controlled by users). I'll leave this point to be debated by others.

With that said, I still want to make the typing implications of NULL values clear for anyone else that wants to weigh in.

> So, if we agree that a hash returns NULL for an undefined key, an application already has to deal with the "value or NULL" case.

Almost all implementations of a hash table provide a way to distinguish between keys that are defined and keys that map to a NULL value. (The lone exception that I know of is Lua.) Namely, the possibility of non-existence is handled by the type of the hash rather than the values stored in the hash. In this way, non-existence does not creep into the type of any value, as it is handled implicitly in the type of a hash.

With NULL values, every parser has to distinguish between non-existence and NULL for every value.

In dynamic languages, this isn't an unreasonable burden. Indeed, the distinction is hard to even notice, mostly because dynamic languages allow any type to contain NULL values (they've allowed it to be a big bully). In static languages, not all types can have NULL values.

Most static languages have facilities to handle such things, but it becomes a burden when they must be anticipated for all types.

@rcarver
Author

rcarver commented Mar 1, 2013

@BurntSushi I completely agree with the typing implications of NULL. In fact, most of the time I would take your position. Two things still make me question it in this context:

  • Experience tells me there's something inherently messy about config files and configuration, so I'm not sure what adding this level of purity/cleanliness will provide (other than simpler parsers, but maybe that's enough).
  • I'm not sure how opinionated TOML intends to be about the workflows implied by not supporting NULL. I hope this thread at least shows what that means.

I'm glad to have had this discussion. At this point I could go either way, whatever @mojombo thinks aligns with the overall goals of TOML.

@tnm
Contributor

tnm commented Mar 4, 2013

My initial intuition tells me that NULL should be avoided for the general historical reasons most of us are familiar with. I'd be curious if there was evidence of widespread usage of a NULL value in existing application/database/system config files (not in the format specs themselves, but in in-the-wild config file content), but to my knowledge it's pretty rare.

@rossipedia
Contributor

Null has some value as a representation of the unknown, especially in an RDBMS. However, I've rarely found it useful in actual application code, as most uses of it are better served by patterns such as Null Object.
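For comparison, a minimal Ruby sketch of the Null Object pattern applied to a config value. `FixedTimeout`, `NoTimeout`, and `timeout_from` are hypothetical names; the point is that a missing key becomes an object with the same interface rather than `nil`:

```ruby
# Real timeout: expires once the elapsed time reaches the limit.
class FixedTimeout
  def initialize(ms)
    @ms = ms
  end

  def expired?(elapsed_ms)
    elapsed_ms >= @ms
  end
end

# Null Object: same interface, stands in for "no timeout configured".
class NoTimeout
  def expired?(_elapsed_ms)
    false
  end
end

# A missing key yields the Null Object, so callers never check for nil.
def timeout_from(config)
  config.key?("timeout") ? FixedTimeout.new(config["timeout"]) : NoTimeout.new
end

timeout_from({ "timeout" => 1000 }).expired?(1500)  # => true
timeout_from({}).expired?(1500)                      # => false
```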

@tnm
Copy link
Contributor

tnm commented Mar 8, 2013

Yeah I don't really see a strong enough argument to justify the complexity and burden of NULL. I remain in favor of keeping it out.

@ambv
Contributor

ambv commented Apr 24, 2013

Definitely keep it out. As @BurntSushi correctly points out, if NULL is in the file format, you have to special-case it while using any other type. In the real world, this is already the case because a key might not be set at all. So while I think having NULL as a type is bad, the "explicitly unset a key" syntax looks useful:

[integration]
api_key= 
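One way to read that suggestion, sketched in Ruby under loud assumptions: in a layered setup (like the levels library mentioned at the top of the thread), "explicitly unset" can be handled while merging layers, so no NULL ever reaches the merged values. `UNSET` is a hypothetical marker standing in for whatever a parser would produce for `api_key= `; TOML does not define it:

```ruby
# Hypothetical marker for "explicitly unset" (TOML defines nothing like it).
UNSET = Object.new.freeze

# Merge layers from lowest to highest precedence. A later layer mapping a
# key to UNSET deletes it, so the merged hash never holds a NULL-like value.
def merge_layers(*layers)
  layers.each_with_object({}) do |layer, merged|
    layer.each do |key, value|
      if value.equal?(UNSET)
        merged.delete(key)
      else
        merged[key] = value
      end
    end
  end
end

defaults    = { "api_key" => "dev-key", "timeout" => 30 }
integration = { "api_key" => UNSET }  # stands in for `api_key= `
merge_layers(defaults, integration)   # "api_key" removed; only "timeout" left
```

The design choice is that the unset is an instruction to the merge step rather than a stored value, so the final hash never needs an "X or NULL" type.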

@88Alex

88Alex commented Jun 27, 2013

You can just do this:

[toml]
null_integer = 0
null_string = ""

This is much easier to parse than null values.
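A small Ruby sketch of reading that convention back, with a hypothetical `desentinel` helper (not part of any TOML library), assuming Ruby 2.4 or later so that all integers are `Integer`:

```ruby
# Hypothetical reading of the sentinel convention above: 0 and "" mean "no value".
SENTINELS = { Integer => 0, String => "" }.freeze

def desentinel(value)
  SENTINELS[value.class] == value ? nil : value
end

desentinel(0)      # => nil
desentinel("")     # => nil
desentinel(42)     # => 42
desentinel("abc")  # => "abc"
```

The trade-off is that a genuine `0` or empty string can no longer be stored in those fields, so the sentinel has to be a value the field can never legitimately take.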

@mojombo
Member

mojombo commented Sep 24, 2013

I think an application should be in charge of knowing what the valid keys are and making sure sane defaults are set. It's too risky to leave that to a user editable config file. If you want to document all the available keys, but leave them "null" until they're set, then I think commenting those lines out is the best solution. Thanks for all the thoughts on this everyone!
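A minimal Ruby sketch of that approach, where the application owns the valid keys and their defaults and the parsed config only overrides them. `DEFAULTS` and `load_settings` are hypothetical names, and `user_config` stands in for the hash a TOML parser would return:

```ruby
# Hypothetical application-owned schema: every valid key and its default.
DEFAULTS = { "timeout" => 1000, "retries" => 3 }.freeze

# user_config stands in for the hash a TOML parser returns for the file.
def load_settings(user_config)
  unknown = user_config.keys - DEFAULTS.keys
  raise ArgumentError, "unknown keys: #{unknown.join(', ')}" unless unknown.empty?

  # Keys the file leaves out (or comments out) fall back to the defaults.
  DEFAULTS.merge(user_config)
end

load_settings({ "retries" => 5 })  # timeout falls back to 1000, retries is 5
```

Rejecting unknown keys also catches typos in the file, such as `tiemout = 5`.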

@bitc

bitc commented Oct 10, 2020

For those of you who need some NULL-like value, here's a trick that wasn't immediately obvious to me:

Use false

For example:

a.toml:

timeout = 1000

b.toml:

timeout = false

Now, why not just leave out timeout completely if you want to disable it? One reason is that you may prefer to be explicit. Another is that you might have some override system. Example:

# Global settings
timeout = 1000

# Override settings for admin users
[admin]
timeout = false

Of course, this trick won't work for boolean fields, but arguably you should never make these nullable anyway since it is too confusing ("What's the difference between false and null?")
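A Ruby sketch of how an application might resolve that override, with the parsed TOML written out as a hash. `effective_timeout` is a hypothetical helper; it treats `false` as "disabled" and converts it to `nil` only inside the application:

```ruby
# Parsed contents of the override example above, written as a hash.
CONFIG = {
  "timeout" => 1000,
  "admin"   => { "timeout" => false }
}.freeze

# Hypothetical lookup: a role's section wins whenever it defines "timeout";
# false in the file means "no timeout", returned here as nil.
def effective_timeout(config, role = nil)
  section = config[role] if role
  value = section.is_a?(Hash) && section.key?("timeout") ? section["timeout"] : config["timeout"]
  value == false ? nil : value
end

effective_timeout(CONFIG)           # => 1000
effective_timeout(CONFIG, "admin")  # => nil (timeout disabled)
effective_timeout(CONFIG, "guest")  # => 1000 (no override section)
```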

@matthew-dean

I know this is an old thread, but doesn’t the lack of inclusion of null mean that TOML can never be used as a replacement for JSON? Or transmitting many types of DB data? I get that it adds some complexity but it’s not like that complexity hasn’t been solved many times over.

@ChristianSi
Copy link
Contributor

@matthew-dean: I wouldn't say so. A SQL table will usually be serialized as an array of tables in TOML; absence of a key-value pair in a table signals that the value is NULL. TOML is flexible where SQL is strict (there is no need to repeat the same keys in every table of an array), hence no explicit NULL is needed. Other DB data can be treated in a similar manner.
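A Ruby sketch of that mapping in both directions, assuming rows arrive as hashes with `nil` for SQL NULL and that the application knows the column list (`COLUMNS`, `to_toml_rows`, and `from_toml_rows` are hypothetical names). It needs Ruby 2.6 or later for `Array#to_h` with a block:

```ruby
# Hypothetical column list, owned by the application rather than the file.
COLUMNS = %w[id name stuff].freeze

# Export: drop NULL (nil) columns, so each row becomes a TOML-ready table.
def to_toml_rows(rows)
  rows.map(&:compact)
end

# Import: a column missing from a table is read back as NULL (nil).
def from_toml_rows(tables)
  tables.map { |table| COLUMNS.to_h { |col| [col, table[col]] } }
end

rows = [
  { "id" => 0, "name" => "Naive",  "stuff" => "One umlaut ignored" },
  { "id" => 1, "name" => "Grotus", "stuff" => nil }
]
tables = to_toml_rows(rows)     # the second table has no "stuff" key
from_toml_rows(tables) == rows  # => true
```

The import direction only works because the column list lives in the application, not in the TOML file.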

Strict round-trip compatibility from and to JSON is not possible, but that's also due to other factors, such as TOML having datetimes which JSON lacks. As for data being exportable to JSON, but not TOML, I think there will rarely be issues. The most problematic case is probably a simple array (of values) that includes some NULL values, but I think such data will not be all that common in well-structured data sets. If it occurs, it's probably best discussed on a case-by-case basis (some kind of sentinel value might be usable).

@pedromorgan

pedromorgan commented Apr 6, 2024

Can't use TOML because of the null "problem"; in my case it's dumping DB tables (see @ChristianSi's comment above)

@eksortso
Contributor

eksortso commented Apr 7, 2024

> Can't use TOML because of the null "problem"; in my case it's dumping DB tables (see @ChristianSi's comment above)

We can't tell what's going on with your program, because you didn't give us any code examples. So we can't offer you much advice.

But pertinent to @ChristianSi's comment, let's say you have a table that looks like this:

id  name    stuff
0   Naive   One umlaut ignored
1   Grotus  NULL

What do your rows look like in your TOML?

If they look like this, then of course you're going to have problems.

rows = [
    [0, "Naive", "One umlaut ignored"],
    [1, "Grotus", ],  # TOML does not permit NULLs
]

But if you reread the comment, then you'll see that this is the proper way to represent rows.

rows = [
    {id = 0, name = "Naive", stuff = "One umlaut ignored"},
    {id = 1, name = "Grotus"},
]

Or this way, if you prefer to use sections.

[[rows]]
id = 0
name = "Naive"
stuff = "One umlaut ignored"

[[rows]]
id = 1
name = "Grotus"
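For completeness, a minimal Ruby sketch that writes rows in exactly this `[[rows]]` form, omitting NULL columns. It assumes column names are valid TOML bare keys and values are only integers or strings without control characters (so escaping `\` and `"` is enough); a real exporter should use a proper TOML library instead. `toml_value` and `rows_to_toml` are hypothetical names:

```ruby
# Format one value as a TOML literal. Assumption: only Integer and String
# values, and strings without control characters.
def toml_value(value)
  case value
  when Integer
    value.to_s
  when String
    # Escape backslashes first, then double quotes.
    escaped = value.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
    "\"#{escaped}\""
  else
    raise ArgumentError, "unsupported type: #{value.class}"
  end
end

# Emit each row as a [[rows]] table, leaving out NULL (nil) columns.
def rows_to_toml(rows)
  tables = rows.map do |row|
    lines = row.compact.map { |key, value| "#{key} = #{toml_value(value)}" }
    (["[[rows]]"] + lines).join("\n")
  end
  tables.join("\n\n") + "\n"
end

rows = [
  { "id" => 0, "name" => "Naive",  "stuff" => "One umlaut ignored" },
  { "id" => 1, "name" => "Grotus", "stuff" => nil }
]
puts rows_to_toml(rows)
```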
