
The Hidden Side-effect of Enums and Values

7 Feb 2019 · CPOL · 5 min read · 20.6K views

Introduction

Recently, I encountered an issue with enums, which I wanted to share in case someone else encounters it along the way.

Disclaimer: So that there is minimal misinterpretation of this article: this is not a guide on how you should write enums. It is an article showing what the compiler allows to happen, and it is meant as a way to spot the issue if you ever encounter it in code written by someone else or in third-party systems.

So what are enums?

Enums are a list of named numeric constants that help us in a number of situations: for example, when something can have a tag or a property on it to distinguish it from another object, or when they represent different options for methods. Other times, they represent states of an object or relationships; for example, we can have an Employee enum which tells us whether a person is a manager, a vice-president, or the CEO.

But no matter what name we give an enum, in the background, it is just a number used to represent that state like this:

C#
enum Foo {
    Bar1,
    Bar2,
    Bar3,
}

So, in this case, we declared an enum called Foo which can have one of three values: Bar1, which implicitly has a value of 0; Bar2, which has a value of 1; and Bar3, which has a value of 2.
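We can verify the implicit numbering by casting the members to their underlying integer type, like this:

C#
Console.WriteLine((int)Foo.Bar1);
Console.WriteLine((int)Foo.Bar2);
Console.WriteLine((int)Foo.Bar3);

// Will output: 0, 1 and 2, one per line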

The issue lies in the background value of the enum and how we use it. For example, since we know that Bar2 has a value of 1, we can cast the number 1 to an enum of type Foo and we will get Bar2, like this:

C#
Console.WriteLine((Foo)1);

// Will output: Bar2

But since we are talking about numbers, enums can also be mapped to a specific value like this:

C#
enum Foo {
    Bar1 = 2,
    Bar2,
    Bar3 = 5,
}

Basically, in this case, Bar1 will have a value of 2, Bar2 will have a value of 3, and Bar3 will have a value of 5.
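We can check this the same way as before; an implicit value always continues from the member just above it, so Bar2 picks up from Bar1 = 2:

C#
Console.WriteLine((int)Foo.Bar1);
Console.WriteLine((int)Foo.Bar2);
Console.WriteLine((int)Foo.Bar3);

// Will output: 2, 3 and 5, one per line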

And now for the odd part and side-effect.

We can have an enum defined with two or more identifiers for the same value like so:

C#
enum Foo {
    Bar1 = 2,
    Bar2 = 2,
    Bar3,
}

Notice that Bar1 and Bar2 have the same value (it doesn't have to be 2). So if we now run the following command, the runtime does not know which identifier we are referring to, so it will give the middle identifier with that value:

C#
Console.WriteLine((Foo)2);

// Will output: Bar2 because it is the latest
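As a quick side note, the duplicate name does not disappear: reflection still reports every identifier, so we can list the name-to-value mapping ourselves. (The official documentation also warns that when members share a value, which name comes back from a value-to-name conversion is not guaranteed, so treat outputs like the one above as something we observe, not something we can rely on.)

C#
foreach (string name in Enum.GetNames(typeof(Foo)))
{
    object value = Enum.Parse(typeof(Foo), name);
    Console.WriteLine($"{name} = {(int)(Foo)value}");
}

// Will output: Bar1 = 2, Bar2 = 2 and Bar3 = 3, one per line
// (the relative order of the duplicate names is not guaranteed)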

What I mean by the middle is that the output will be the same even if we have an enum defined like this:

C#
enum Foo {
    Bar1 = 2,
    Bar2 = 2,
    Bar3 = 2,
}

So no matter how we run it, the output will still be Bar2. But if we have an enum defined like this:

C#
enum Foo {
    Bar1 = 2,
    Bar2 = 2,
    Bar3 = 2,
    Bar4 = 2,
    Bar5 = 2,
    Bar6 = 2,
    Bar7 = 2
}

Running the same command will give us Bar4 because it is the middle one. If there is an even number of identifiers, it will give us the middle one closer to the end: for two enum identifiers it gave me the second one; for three, also the second one; for four, the third one; likewise the third one for five identifiers and for six; and so on and so forth.

But what happens when we put another enum identifier with a lower value before Bar1, like this?

C#
enum Foo {
    Bar,
    Bar1 = 2,
    Bar2 = 2,
    Bar3,
}

Now if we run the output command, it will not show the middle identifier with that value; instead, it will show the one just before the middle, so in this case it will show Bar1. But let us take it a step further and look at this one:

C#
enum Foo {
    Bar,
    Bar1 = 2,
    Bar2 = 2,
    Bar3 = 2,
    Bar4 = 2,
    Bar5 = 2,
    Bar6 = 2,
    Bar7 = 2
}

Then the output for the value 2 will be Bar3. Even worse, we can add two more values before Bar1 (I shifted the value to 5 so that we don't overlap with the ones we're trying to check):

C#
enum Foo {
    Bar0,
    Bar00,
    Bar000,
    Bar1 = 5,
    Bar2 = 5,
    Bar3 = 5,
    Bar4 = 5,
    Bar5 = 5,
    Bar6 = 5,
    Bar7 = 5
}

Then if we run the command:

C#
Console.WriteLine((Foo)5);

// Will output: Bar2

So for every two identifiers added before that sequence, the result goes back one. I then tried something else and added two more identifiers after the sequence, and guess what: it went back to Bar3. With another two, it went to Bar4, and if you keep adding so many identifiers that the result should go past Bar7, it cycles around and shows Bar1 again.
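One plausible explanation for this whole pattern (an implementation detail I'm inferring, not documented behavior) is that the value-to-name conversion does a binary search over the array of underlying values, sorted ascending. A binary search stops as soon as the element it probes matches, so inside a run of duplicates it lands on the middle entry, and every identifier added before or after the run shifts where that probe lands. We can mimic the lookup for the enum with the Bar prefix like this:

C#
// Hypothetical simulation of the name lookup for:
// enum Foo { Bar, Bar1 = 2, Bar2 = 2, Bar3 = 2, Bar4 = 2, Bar5 = 2, Bar6 = 2, Bar7 = 2 }
int[] values = { 0, 2, 2, 2, 2, 2, 2, 2 };
string[] names = { "Bar", "Bar1", "Bar2", "Bar3", "Bar4", "Bar5", "Bar6", "Bar7" };

int index = Array.BinarySearch(values, 2);
Console.WriteLine(names[index]);

// Will output: Bar3, the same identifier the cast produced above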

This, I admit, baffled me a bit, because it means that converting values to enums becomes unpredictable whenever we have more than one identifier for a given value, especially since this isn't obvious: during the course of development, we may add to that enum without knowing that it affects us, and we might have outputs that depend on those identifiers.

Even though we won't see all that many cases with more than two identifiers per value, it is still something to take note of, because using enums by value is not that uncommon. At least three common usages come to mind: HTML drop-downs, which have number values behind them; Web API calls that use numbers to denote a certain enum value; and database persistence, where, for example, MongoDB will use the numerical value to store an enum. I'm sure there are many more cases that use such mechanisms.

Fortunately, a colleague of mine came up with a way to avoid this issue: save or send enum values as text and then parse them. That way, we know for sure that we are referring to the right identifier.
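A sketch of that approach: parsing by name is deterministic, so "Bar1" always yields Bar1's underlying value, even when several members share it.

C#
// Assuming: enum Foo { Bar1 = 2, Bar2 = 2, Bar3 }
Foo parsed = (Foo)Enum.Parse(typeof(Foo), "Bar1");
Console.WriteLine((int)parsed);

// Will output: 2

// For untrusted input, TryParse avoids the exception on unknown names:
if (Enum.TryParse("Bar3", out Foo safe))
    Console.WriteLine((int)safe);

// Will output: 3

One wrinkle remains: calling ToString() on the parsed value runs into the same ambiguity as before, so the name has to be captured at the point where it is known (the UI, the API contract), not regenerated from the number.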

I hope you found this as interesting and weird as I did, and if you know the reason why this happens, feel free to share and let me know, because I admit, my curiosity has been piqued.

Thank you and see you next time.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Romania
When asked, I always see myself as a .Net Developer because of my affinity for the Microsoft platform, though I do pride myself by constantly learning new languages, paradigms, methodologies, and topics. I try to learn as much as I can from a wide breadth of topics from automation to mobile platforms, from gaming technologies to application security.

If there is one thing I wish to impart, it is this: "Always respect your craft, your tests and your QA"

Comments and Discussions

 
Question: Bad idea to give two (or more) times the same value (Bertrand Gilliard, 11-Feb-19 5:18)
Question: Good point - would not happen though (Sammuel Miranda, 11-Feb-19 2:27)
Answer: Insight as to how this might happen (Vlad Neculai Vizitiu, 28-Jan-19 10:04)
News: [My vote of 2] Mathematically determined. (stixoffire, 22-Jan-19 2:15)
General: Re: [My vote of 2] Mathematically determined. (Heriberto Lugo, 9-Feb-19 20:44)
General: My vote of 1 (Imirzyan, 18-Nov-18 19:43)
General: Re: My vote of 1 (George Gomez, 7-Feb-19 12:25)
General: Re: My vote of 1 (Vlad Neculai Vizitiu, 7-Feb-19 12:34)
Suggestion: mapping an enum to a string is implementation-dependent (Philippe Verdy, 16-Nov-18 7:43)
General: Re: mapping an enum to a string is implementation-dependent (Chad3F, 16-Nov-18 13:52)
General: Re: mapping an enum to a string is implementation-dependent (Philippe Verdy, 16-Nov-18 15:48)
One way to define "safe" enums with two names for the same value, would just be to define them as

enum{a, b, c; const first = a, last = c}

which defines only "a", "b", "c" as "canonical" values (that have predictable names), and defines "first" and "last" only as aliases (whose names will not be returned when querying names from an enum value, which can only return "a", "b", or "c") that share the same value (i.e., the same ordinal); here the semicolon instead of the comma, or the const keyword, is enough to say that we are defining an alias.

This allows defining a safe arithmetic restricted only in {a,b,c} (whose result is restricted to that unique set or will cause a predictable overflow exception, for example a-1 or c+1 would unconditionally overflow, and a+1 would still give b, and b+1 would still give c even if it is also equal to the alias named "last").

But assigning random numeric values to constants declared in an enum causes various problems: we cannot safely define a "first" and "last" element, and not easily define an ordinal if we allow the declared enum constants to create "holes" between elements of the ordered set (i.e. they are assigned in non consecutive ranges), and so we cannot safely define any arithmetic on them as all members of these sets can overflow.

You can only define a *single* numeric constraint on one of the defined constants (for example the first one can be set to 0 or 1 or 1000, it does not matter, all other members are assigned to create a unique sequence of consecutive integers). So this declaration is safe:

enum{a=1000, b, c}

But not this one, even if there's no pair of defined constants that are given the same integer value:

enum{a=1, b=10, c}

because (a+1) is not part of the set but a is not the highest value (i.e. not the last one), and because (b-1) is also not part of the set but is also not the smallest value (i.e. not the first one).

A compiler however may infer default names such as "a", "(a+1)" (or just "2" in that last example) for the undefined value a+1, and so on up to "(a+9)" (or just "9"), just before "c"; it won't cause any overflow exception, and the defined set above would actually contain 11 distinct constants each one with a distinct name as well.

And then we could define valid restricted integer types like:

enum{min=-100, max=100}

containing 201 constants from -100 to 100 inclusively (using a modulo 201 arithmetic not requiring any overflow checks). So we could define:

typedef enum{min=-128, max=127} int8_t;

(such definition defining a strict set with 256 distinct values would precisely perform an arithmetic modulo 256); or:

typedef enum{zero=0, max=9} decimaldigit_t;

(such a definition, defining a strict set with 10 distinct values, would precisely perform an arithmetic modulo 10, whose constants are named "zero", "(zero+1)", ..., "(zero+8)", "max", or just named "zero", "1", ..., "8", "max": these names can safely be returned by a synthetic default static method generated by the compiler, which converts an enum value to a string showing the canonical names of defined constants, necessarily starting with a letter or underscore, or otherwise showing just their numeric values if no name is defined for other constants that are also part of the defined enum set).

Whether the compiler will generate an unchecked "modulo N" arithmetic or a checked "bounded" arithmetic could also be an option for the defined enum type, so that:

typedef enum {
zero=0, max=9
} catch(i) {
throw(new Error("decimal digit overflow %d", i));
} strictdecimaldigit_t;


would throw overflow exceptions if the result of an arithmetic causes out-of-range values, but the default "modulo N" arithmetic could be also changed, for example to add a carry:

typedef enum {
min=0, max=9;
} catch(int i) {
const int N = max - min + 1;
return (i - min) / N + (i - min) % N + min; // this value is checked again by the catcher!
} carryingdecimaldigit_t;


Note also that instead of defining this catcher, you may want to define a constructor (from an integer type) for the enum type. But the semantics are a bit different, and both may be used simultaneously in the definition of the enum type:

- If there's a constructor, the enum value returned by the constructor will be used, otherwise if there's a catcher defined, it will be used (see below), otherwise a default synthetic "modulo N" method will be used.

- When the constructor returns a value, its value is not returned immediately as is: if there's a catcher defined for the type, then the value is checked and if it falls out of range, then the catcher is invoked to fix it.

- When a catcher is invoked, its integer return value will be used to invoke the constructor if there's one (see above); otherwise it will be fixed by the default synthetic "modulo N" catcher. (In current implementations of enum types in C/C++/C#, this default synthetic "modulo N" catcher uses a value of N which is some power of 2, not clearly defined; usually it is 2^8 if the enum type is represented as a byte, so the value range is not completely restricted to the strict range going from the minimum to the maximum values defined in the enum, but to a wider unspecified range.)

The value of N is just sufficient to hold all the declared numeric values distinctly, but not minimal (when you declare, for example, an enum{a,b,c} with 3 distinct values, the compiler may use N=256 instead of N=3); this however allows faster code, because the "modulo N" checker actually does not generate any code at runtime: the compiler just silently truncates some unnecessary bits when storing values, without performing any actual check, so the results are OK for preserving distinction, but not good enough to create a safe arithmetic (this makes it impossible to define a safe "enumerator" to iterate over all constants actually defined in the enum type, and a "switch(enumvalue)" in the code of the enumerator-based loop should always include a "default" after listing cases only on defined enum constants, to handle other undefined/anonymous constants that are part of the declared enum type).

The compiler should check that the code handles these omitted cases properly, signaling missing "default" in "switch" (even if the arbitrarily chosen value N is minimal, for example in enum{a,b,c,d} and the compiler chooses N=4, storing only 2 bits per value, because other compilers may as well choose N=256, storing 8 bits per value).

Enum types should also modify the integer type promotion rules in expressions, for example:

- (-enum) or (+enum) returns a value of the same enum type (first, the enum is promoted to an int, then the expression is evaluated, then the value is passed through the declared enum constructor, and its declared "catcher")
- (enum + int) returns a value of the same enum type (same algorithm)
- (int + enum) silently promotes the enum to an int and evaluates the expression as an int without using any constructor or catcher.
- (enum >> int) returns a value of the same enum type.

Ideally the same promotion rules should be used between other distinct numeric types (char, short, int, long, long long, float, double, long double and signed variants) using inference on the left-most operand, so that:

- (int + long) is an int
- (long + int) is a long
- (int >> long) is an int
- (long >> int) is a long
- (int + float) is an int
- (float + int) is a float
- (char + unsigned char) is a char
- (unsigned char + char) is an unsigned char
- and so on...

This also means that binary arithmetic operators must NOT be commutative, when operands are not the same integer type, all would be strictly driven by the type of the left-most operand; but this would change the existing promotion rules in C/C++ for basic numeric types; this also changes the associativity and then requires a precise evaluation order, so that "a+b+c" must be evaluated only as ((a+b)+c) but not as (a+(b+c)): this associativity is possible only if operands are the same numeric type (i.e. with the same declared range and precision for its values).

And then we could as well define an enum type for non-integers (here based on declaration of numeric "double" or float values:

- typedef enum{min=0.0, max=1.0} drate_t;
- typedef enum{min=0.0f, max=1.0f} frate_t;

The following would be either invalid, or would promote the numeric values to the same numeric type:

- typedef enum{min=0, max=1.0} drate_t; // same as before: 0 is promoted to 0.0
- typedef enum{min=0, max=1.0f} frate_t; // same as before: 0 is promoted to 0.0f

Another interesting declaration:

typedef enum {
min = 0, max = 100.0f; const pi = 3.14f
} catch (int i) {
return (i <= min) ? min : (i >= max) ? max : (float)i;
} catch (float f) {
return (f <= min) ? min : (f >= max) ? max : (float)math.floor(f * 10.0f + 0.5f) / 10.0f;
} estimate_t;


This last declaration defines a strict enum type with exactly 1001 distinct numeric values {0.0f, 0.1f, 0.2f, ..., 99.9f, 100.0f} which are "capped" between min and max (no modulo N) and rounded.
The declaration of the "estimate_t::pi" constant (as an alias, not as an additional value of the set) actually gives it exactly the numeric value 3.1f (assigning numeric values to declared constants passes them through the declared constructor if there's one, or through the declared "catchers", both of which enforce the arithmetic rules).

----

Another interesting case:

typedef enum {'A', 'Z'} capital_t;

This would also be a valid declaration: you are not required to name distinctly the constants that are part of the declared numeric type. All that is needed is that any variable declared with that enum type (which is based on any basic numeric type of the language) must be able to store distinctly all the constants between the lower bound and upper bound of constants declared in the enum. Here it would declare a type large enough and precise enough to hold one of the 26 constants between 'A' and 'Z' inclusively.

So as well the declarations below would be valid:

typedef enum {'A', 'Z', 'A'} capital_t;
typedef enum {
char::min, char::max,
(unsigned char)::min, (unsigned char)::max,
(signed char)::min, (signed char)::max
} anychar_t;


The declared constant values don't need to be unique; the compiler determines itself the lower and upper bounds of the type, and the minimum precision needed to store the relevant differences and allocates enough bits, determining itself the basic numeric type to use for the values; and no constant need to be named explicitly.

Ideally, however, the compiler should automatically declare two constant names for the bounds, such as __min and __max, and possibly the cardinality of the set, such as __prec for the minimum precision (given in one of the basic numeric types, including long long or long double) as the estimate of the base-2 logarithm of the number of distinct values between these bounds, and __size (or just sizeof) for the actual precision stored (these precisions will be given in bits so that __prec <= __size, and (2^__size) is the value of "N" for the default ''modulo N'' catcher synthetically generated).

So for

typedef enum {'A', 'Z', 'A'} capital_t;,

we would have:

capital_t::__min == 'A' (which is a constant part of the declared type),
capital_t::__max == 'Z' (which is a constant part of the declared type),
__prec<capital_t> == math.log2(__max - __min + 1) (which is a constant in a basic floating point numeric type, roughly equal to 4.75488750216 here)
__size(capital_t) == 5, (which is a constant in a basic integer numeric type);
sizeof(capital_t) == 1 (assuming that a single "char" can hold all 5 bits needed to store distinct constants from 'A' to 'Z' and that sizeof(char) == 1 which generally means at least 8 bits;

As well we would have:

anychar_t::__min == (signed char)::min (which is a constant part of the declared type, generally -128),
anychar_t::__max == (unsigned char)::max (which is a constant part of the declared type, generally 255),
__prec<anychar_t> == math.log2(__max - __min + 1) (which is a constant in a basic floating point numeric type, generally roughly equal to 8.5849625007211561 here)
__size<anychar_t> == 9, (which is a constant in a basic integer numeric type, but this could be equal to 16 instead of 9);
sizeof(anychar_t) == 2 (assuming that sizeof(char)=1)

Note that __prec is given as a logarithm instead of giving the real __cardinality directly (the value of ''N'' described above), because the cardinality of the set may not be expressible, for all numeric types such as "long long" and "long double", as a constant of one of the basic numeric types without causing an overflow (notably for "long double", where __prec=80, ''N'' could be N=2^80 and its inverse exceeds the actual epsilon separating non-infinite and non-NaN values).

Other numeric type properties could also be inferred as additional constants (not values in the declared type itself), such as the number of distinct NaN values, the number of distinct infinite values, the number of distinct zero values, the number of distinct denormal values, and a type constant giving the inferred native numeric type:

__type<anychar_t> == short (if __size<anychar_t> == 9 or 16)
__type<enum{'a','z'}> == char (if __size<enum{'a','z'}> == 8)

Also, __prec does not directly determine the step that allows enumerating all distinct values in the defined type (e.g., for integers you can enumerate them by adding 1, but for floating point the additive step depends on the magnitude of each enumerated value, and there are special steps to enumerate negative and positive zeroes, denormal values, signaling and non-signaling NaNs, or positive and negative infinite values).

Also for this reason, the compiler should automatically declare default forward and default backward enumerators for the declared enum type, which you can instantiate from any enum value and then call to get the previous or next distinct value.

With all these, we no longer need any preprocessor defines to know the limits of any type (not even native numeric types). All numeric types, including native ones are declared explicitly as enum types in <stdtype>, so macros defined in <limits.h> are deprecated.

We can also view enums like a typesafe version of unions and also allow declaring an enum like this:

typedef enum{(value1), (value2), (value3), (value4)} generictype_t;

The idea here is not to define constants, but to create a type that can hold any of the sample values listed, values that are comparable (so that we can define a full order between them and know if they are equal), without having to list all possible values. For example:

typedef enum{100, 200, "x", "y"} generictype_t;

It is theoretically possible to create a type storing integers or strings if we also have a full order between them: here this is a type that will include either integers between 100 and 200 (these can only be these two), or strings between "x" and "y" (so including also "x0", "x1", "x11", "xy", "xyz"...). (Note: the set has no cardinality; we know the number of possible integers, but not the number of strings; we can only know the number of possible distinct pointers/references according to the limits of pointers, i.e., the pointer size in bits.)

The compiler will automatically infer a distinctive tag value when necessary and store that tag value if there's more than one tag. In this example, a tag=0 will be used for integers (0 to 1) and tag=1 will be used for strings (between "x" and "y").

It will generate the set of tags automatically using synthetic constructors like this:

generictype_t(int) : tag(0) {};
generictype_t(string) : tag(1) {};

(these constructors only specify the distinctive tag, not the value which is assigned automatically).

You assign a declared variable of that type normally, without having to specify the tag:
generictype_t x = 102;
and you can then query the tag of any value in that typed variable:
int t = tag<x>; (sets t = 0)
For this the declared type automatically builds a synthetic static method for that enum type...

Then you can also specify tags explicitly in the declaration of the enum (if two distinct values declared in the set have the same tag, they will be stored as a union with no way to distinguish them by the tag, only by their distinct values):

typedef enum{ 100: 0, 200: 0, 300: 1, 400: 2} t;

(this enum contains values between 100 to 200, or equal to 300, or to 400, in three subsets with tags 0, 1, or 2)

Each subset, i.e. each distinct tag value, has its own minimum and maximum bounds, its own size in bits, its own cardinality. The compiler has now 3 declared tags, and the set of tags is also an enum type declared implicitly.

This allows replacing unsafe type declarations like:

typedef enum {int_tag, double_tag, string_tag} tag_t;
typedef struct {
  tag_t tag;
  union {
    int int_val;
    double double_val;
    string string_val;
  };
} variant_t;


by:

typedef enum {<int>, 10, <double>, <string>} variant_t;

(the tag values are assigned automatically by the compiler: tag=0 for int values, tag=1 for double values, tag=2 for string values; here, instead of specifying exemplar constant values of each type, we just cite their typenames between angle brackets, but even if we add exemplar values like 10 in this example, as it matches the <int> type also declared, it does not add another tag value and the compiler can discard it; the order of declaration of members of the enum is significant if they are different types).

We could also declare the tag values ourselves:

typedef enum {<int>: 'I', <double>: 'D' , <string>: 'S'} variant_t;

and the distinctive tag values will have a char datatype. The enum declaration does not create a new type for tags; if needed types for tag values can be declared separately:

typedef enum {'I', 'D', 'S'} tag_t;
typedef enum: tag_t {<int>, <double>, <string>} variant_t;

(here the compiler assigns tags with values taken by enumerating the given "tag_t" type, instead of enumerating "int" by default).

or by using declared constant names given in the tag type:

typedef enum {Int: 'I', Double: 'D', String: 'S'} tag_t;
typedef enum {<int>: (tag_t::Int), <double>: (tag_t::Double), <string>: (tag_t::String)} variant_t;

Here the tags are given a char datatype, but it could also be a string, giving its distinctive name or description:

typedef enum {<int>: "this is an integer", <double>: "this is a floating point number", <string>: "this is a name"} variant_t;

When the tag type given is a string, it can be used by the synthetic default toString() method when showing the actual value like this:

variant_t::toString() {
return new string( tag<*this>, ':', value<*this>.toString() );
}


We can also select one of the tag subtypes:

variant_t<int> (because <int> is a member type declared in this enum)

to create explicit type conversion (typecast) of the value with some defined method if needed (the effect of that explicit method will be to generate a new enum value with the new tag value, for example converting an enum value with value type <double> into another enum value of the same enum type but with value type <int>).

The compiler can make a lot of type-safe inferences and generate the optimal storage, reducing the number of bits needed for storing each tag (or not storing it at all if the declared enum has only one tag) if we don't specify a specific type for the tag ourselves. In all cases, it will build the synthetic code for the static property tag<variant_t> itself...

No more need for any unsafe unions, including with complex datatypes within unions; no more need to name each member of the union; type inference determines the correct member and sets the tag value properly and implicitly when we set the actual value of an enum variable!

We can even imagine a language that predefines absolutely NO native datatype, all datatypes being declared by an enum declaration (starting by defining them '''only''' with constants supported by the language parser, like: false, true, nil, 10, 3.14, 1.23e45, 'A', "AAAA"...).

We can also imagine some new kinds of "tagged constants" recognized by the parser, like: 0t12.2'i' == 0t12.2('h'+1) to represent an imaginary number represented by a constant <double> value tagged by a <char> value, this constant having data type "double<'I'>" here, itself a subtype of "double<char>"; or 0t0x0a'i' == 0t10'i', which is a constant of type "int<'i'>", itself a subtype of "int<char>"...

Another alternative but equivalent syntax for tagged constants would be <'i'>12.2 == <'h'+1>12.2. Untagged constants like 12.2 are equivalent to <0>12.2 (the default tag constant is 0, the default tag type is an int, enumerated by default by a forward iterator starting from 0 with increment 1; this default enumerator is used when enum members are declared without a tag and a new tag is needed because they are not the same base type):

- enum{ a, b, c } is then equivalent to enum:int{<0>a, <0>b, <0>c} (these <0> tags don't need to be stored, they are implicit, not significant)
- enum{ <int>, <double>, <string> } is then equivalent to enum:int{ <0><int>, <1><double>, <2><string> } (these 3 distinct <0>, <1>, <2> tags are needed because we use types, and not constants, as members of the declared enum, even if every <int> instance can be compared as equal to an existing <double>, something that cannot be asserted for all of them, for example when <int> requires a 64-bit value, and <double> also requires 64-bit but not for the same precision and value range, so each <int> member will be stored differently from each <double> member, and a distinct tag value is needed; here also the order of declaration of members in the enum type is significant when value types are different; here also you can have constructors for the enum type, as well as catchers...). Here also the tag value can be a constant expression.

Every native numeric type, and every object like strings or arrays, or structs, classes, pointers, references, functions/methods, can also be a type member of an enum, and be used as a tag type. All native types can be declared in the language itself (so there is no more need to reserve keywords like "bool", "char", "short", "int", "long", "float", "double": they can all be declared using a typedef as an enum (or enum + catchers), with their minimum, maximum, precision, rounding modes, and other predefined named constants of these types, with these constant names scoped in their defining type). We now have fully defined semantics for all arithmetic operations, orderings, and comparisons. The preprocessor is no longer required at all (except possibly for #include, which may instead be better replaced by "require(package)").


----

Being able to define a type-safe arithmetic for enum types (at least the arithmetic giving the successor, i.e. constant+1), allows defining useful objects, notably iterators (that we could really name "enumerators") over the value range of enum types, which would in turn permit object-oriented constructs like:

for (i: enumerator<enum_t>) { ... }

which won't forget to handle any possible value of an enum type (so it won't generate bugs at runtime, like those occurring when using switch statements with a missing "default:" selector: the compiler would know whether the "default:" is missing according to the type of "i").

And the following declaration is also unsafe, if "a" is assigned by the compiler the integer value 0 (without taking into account the single constraint given to "b" which could instead be used by the compiler to assert that a=-1 and c=1):

enum{a, b=0, c}

These tricks inherited in C# from C and C++ are really bad, these generate unchecked conditions and unexpected bugs with possible overflows, silently generating values that are not part of the defined set.

For now, the only interest of enum types in C/C++/C# is not to restrict the set of values for strict type safety, but just:

- to define constants with possibly scoped names (qualifiable with the typename) and not depending on a preprocessor (whose scoping rules are only global, and severely depend on #included source reading order).

- to use the appropriate integer type (with the minimum bit-size) to store a single enum value.

But I consider that bitfields (using notations like ":1" in C/C++ declarations of structures) are much safer: at least we know precisely their value range, there are no aliases at all, and the arithmetic is precisely defined.

modified 17-Nov-18 17:33pm.

Praise: Nice Article (MrFunke3.14, 16-Nov-18 4:06)
General: An educational exercise (SirGrowns, 15-Nov-18 21:19)
General: My vote of 5 (Donmorcombe, 15-Nov-18 14:22)
General: Side effect of ToString (Member 8128073, 15-Nov-18 12:20)
Question: Does it matter (TrendyTim, 15-Nov-18 12:11)
General: My vote of 5 (dmjm-h, 15-Nov-18 11:35)
General: I would consider an enum like you're proposing to be a 'code smell' (Will Wayne, 15-Nov-18 11:09)
General: Re: I would consider an enum like you're proposing to be a 'code smell' (Philippe Verdy, 17-Nov-18 6:21)
Question: Answer in the form of a question... (Member 4603457, 15-Nov-18 10:41)
Suggestion: No dupes? (vbjay.net, 15-Nov-18 9:28)
General: Re: No dupes? (Member 13723669, 18-Nov-18 22:02)
General: Re: No dupes? (Heriberto Lugo, 9-Feb-19 20:46)
General: Re: No dupes? (Philippe Verdy, 1-Mar-19 9:57)
Question: That is bizarre (Marc Clifton, 15-Nov-18 0:28)
