
The Hidden Side-effect of Enums and Values

7 Feb 2019 | CPOL | 5 min read

Introduction

Recently, I encountered an issue with enums, which I wanted to share in case someone else encounters it along the way.

Disclaimer: To minimize misinterpretation, this article is not a guide to how you should write enums. It shows what the compiler allows to happen, and it is meant to help you spot the issue if you ever encounter it in code written by someone else or in third-party systems.

So what are enums?

Enums are lists of named numeric constants that help us in a number of situations. They can tag an object to distinguish it from another one, or represent the different options a method accepts. Other times they represent states of an object or relationships; for example, an Employee enum can tell us whether a person is a manager, a vice-president or the CEO.

But no matter what name we give an enum member, in the background it is just a number used to represent that state, like this:

C#
enum Foo {
    Bar1, // implicitly 0
    Bar2, // implicitly 1
    Bar3, // implicitly 2
}

So, in this case, we declared an enum called Foo which can have one of 3 values, Bar1, which implicitly has a value of 0, Bar2, which has a value of 1 and Bar3 which has a value of 2.

The issue lies in the background value of the enum and how we use it. For example, since we know that Bar2 has a value of 1, we can cast the number 1 to the enum type Foo and get Bar2, like this:

C#
Console.WriteLine((Foo)1);

// Will output: Bar2

But since we are talking about numbers, enums can also be mapped to a specific value like this:

C#
enum Foo {
    Bar1 = 2,
    Bar2,     // implicitly 3 (previous value + 1)
    Bar3 = 5,
}

Basically, in this case, Bar1 will have a value of 2, Bar2 will have a value of 3, and Bar3 will have a value of 5.
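
To double-check the numbering, the underlying values can be printed with a cast to int. This is only a small sketch; the `UnderlyingValues` class and its `Describe` helper are names made up for this example and are not part of .NET.

```csharp
using System;

enum Foo
{
    Bar1 = 2,
    Bar2,      // no explicit value: previous value + 1
    Bar3 = 5,
}

static class UnderlyingValues
{
    // Hypothetical helper: formats a member as "Name = number".
    public static string Describe(Foo member)
    {
        return member + " = " + (int)member;
    }

    static void Main()
    {
        Console.WriteLine(Describe(Foo.Bar1)); // Bar1 = 2
        Console.WriteLine(Describe(Foo.Bar2)); // Bar2 = 3
        Console.WriteLine(Describe(Foo.Bar3)); // Bar3 = 5
    }
}
```

Every value is unique here, so each number maps back to exactly one name.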

And now for the odd part and side-effect.

We can have an enum defined with two or more identifiers for the same value like so:

C#
enum Foo {
    Bar1 = 2,
    Bar2 = 2, // same value as Bar1
    Bar3,     // implicitly 3
}

Notice that Bar1 and Bar2 have the same value (it does not have to be 2). If we now run the following command, the runtime cannot know which identifier we are referring to, so in my tests it gave the middle identifier with that value:

C#
Console.WriteLine((Foo)2);

// Output in my tests: Bar2 (which name is picked for a shared value is not guaranteed)
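
Whatever name ends up being printed, the two identifiers are indistinguishable once they are values. The following sketch illustrates that, assuming the same Foo enum as above; `DuplicateValues` and `AreAliases` are hypothetical names introduced for the example.

```csharp
using System;

enum Foo
{
    Bar1 = 2,
    Bar2 = 2,
    Bar3,
}

static class DuplicateValues
{
    // Hypothetical helper: true when two members share one underlying number.
    public static bool AreAliases(Foo a, Foo b)
    {
        return (int)a == (int)b;
    }

    static void Main()
    {
        Console.WriteLine(AreAliases(Foo.Bar1, Foo.Bar2));    // True
        Console.WriteLine(AreAliases(Foo.Bar1, Foo.Bar3));    // False
        Console.WriteLine(Enum.GetNames(typeof(Foo)).Length); // 3

        // Deliberately no expected output here: the name printed for a
        // shared value depends on the runtime's lookup.
        Console.WriteLine((Foo)2);
    }
}
```

All three names still exist in the metadata, but a value of 2 carries no record of which name produced it.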

What I mean by the middle is that the output will be the same even if we have an enum defined like this:

C#
enum Foo {
    Bar1 = 2,
    Bar2 = 2,
    Bar3 = 2,
}

So no matter how we run it, the output will still be Bar2. But if we have an enum defined like this:

C#
enum Foo {
    Bar1 = 2,
    Bar2 = 2,
    Bar3 = 2,
    Bar4 = 2,
    Bar5 = 2,
    Bar6 = 2,
    Bar7 = 2
}

Running the same command will give us Bar4, because it is the middle one. With an even number of identifiers, my tests gave the middle one closer to the end: for two identifiers sharing a value we get the second one, for three the second one, and for four and for five the third one. For six, it gave the third one, and so on.

But what happens when we put another enum identifier with a lower value before Bar1, like this?

C#
enum Foo {
    Bar,      // implicitly 0
    Bar1 = 2,
    Bar2 = 2,
    Bar3,     // implicitly 3
}

Now if we run the output command, it will not show the middle identifier with that value. Instead, it steps back by one, so in this case it shows Bar1, whereas the earlier enum without Bar gave Bar2. Let us take it a step further and try this one:

C#
enum Foo {
    Bar,
    Bar1 = 2,
    Bar2 = 2,
    Bar3 = 2,
    Bar4 = 2,
    Bar5 = 2,
    Bar6 = 2,
    Bar7 = 2
}

Then the output for the value 2 will be Bar3. It gets even worse if we add two more identifiers before Bar1 (I shifted the shared value to 5 so that it does not overlap with the new ones):

C#
enum Foo {
    Bar0,     // implicitly 0
    Bar00,    // implicitly 1
    Bar000,   // implicitly 2
    Bar1 = 5,
    Bar2 = 5,
    Bar3 = 5,
    Bar4 = 5,
    Bar5 = 5,
    Bar6 = 5,
    Bar7 = 5
}

Then if we run the command:

C#
Console.WriteLine((Foo)5);

// Will output: Bar2

So for every two identifiers added before that sequence, the result moves back by one. I then tried something else and added two more identifiers after the sequence, and guess what: it went back to Bar3. With another two it went to Bar4, and if you keep adding identifiers until the result should move past Bar7, it cycles around and shows Bar1.

This, I admit, baffled me a bit, because it means that looking up an enum by value becomes unpredictable when more than one identifier shares that value. It is especially risky when this is not obvious: during development we may add members to the enum without realizing that doing so changes outputs that depend on those identifiers. The .NET documentation for Enum.ToString gives a similar warning: code should not make any assumptions about which name is returned when several members share a value.
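
Since the goal here is to spot the issue in existing code, a small reflection-based check may help. This is a sketch, not a definitive tool: `EnumChecks` and `FindDuplicateGroups` are invented names, and the conversion through `Convert.ToInt64` assumes the enum's values fit in a signed 64-bit integer.

```csharp
using System;
using System.Linq;

enum Foo
{
    Bar1 = 2,
    Bar2 = 2,
    Bar3,
}

static class EnumChecks
{
    // Hypothetical helper: returns one "value: Name1, Name2" entry for every
    // underlying value that is shared by more than one identifier.
    public static string[] FindDuplicateGroups(Type enumType)
    {
        return Enum.GetNames(enumType)
            .GroupBy(name => Convert.ToInt64(Enum.Parse(enumType, name)))
            .Where(group => group.Count() > 1)
            .Select(group => group.Key + ": " +
                string.Join(", ", group.OrderBy(n => n, StringComparer.Ordinal)))
            .ToArray();
    }

    static void Main()
    {
        foreach (string entry in FindDuplicateGroups(typeof(Foo)))
        {
            Console.WriteLine(entry); // 2: Bar1, Bar2
        }
    }
}
```

A check like this could run in a unit test for any enum whose values are stored in a database or sent over the wire.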

Even though we won't see many cases with more than two identifiers per value, it is still something to take note of, because using enums by value is not that uncommon. At least three common usages come to mind: HTML drop-downs, which have numeric values behind them; Web API calls that use numbers to denote a certain enum value; and database persistence, where MongoDB, for example, will use the numerical value to store an enum. I'm sure there are many more cases that use such mechanisms.

Fortunately, a colleague of mine came up with a way to avoid this issue: save or send enum values as text and then parse them. That way, we know for sure which identifier was meant.
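
A minimal sketch of that text round trip follows, assuming the Foo enum with duplicate values from earlier. `TextRoundTrip` and `TryRead` are hypothetical names. One caveat worth keeping in mind: the parsed value is again just the number, so the identifier survives only in the stored text itself.

```csharp
using System;

enum Foo
{
    Bar1 = 2,
    Bar2 = 2,
    Bar3,
}

static class TextRoundTrip
{
    // Hypothetical helper: parses stored text back into Foo and rejects
    // text that does not correspond to a defined member.
    public static bool TryRead(string stored, out Foo value)
    {
        return Enum.TryParse<Foo>(stored, out value)
            && Enum.IsDefined(typeof(Foo), value);
    }

    static void Main()
    {
        string stored = nameof(Foo.Bar1); // "Bar1" is what gets saved or sent

        Foo parsed;
        Console.WriteLine(TryRead(stored, out parsed)); // True
        Console.WriteLine(parsed == Foo.Bar2);          // True: same number
        Console.WriteLine(stored);                      // Bar1
    }
}
```

Code that needs to know which identifier was meant should therefore keep working with the stored string rather than converting the parsed value back to a name.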

I hope you found this as interesting and weird as I did, and if you know the reason why this happens, feel free to share and let me know because, I admit, my curiosity has been piqued.
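
One possible explanation, offered as an assumption rather than something established above: if the runtime keeps the members sorted by value and finds the name with a binary search, the probe lands on whichever matching slot it reaches first, and that slot shifts as members are added before or after the duplicates. The sketch below is not the runtime's code; `EnumLookupSketch`, `BinarySearch` and `FindName` are names invented here, and it assumes that members with equal values stay in declaration order. Under those assumptions it reproduces the results reported above.

```csharp
using System;

static class EnumLookupSketch
{
    // Textbook binary search over values sorted ascending. Returns the index
    // of the first matching slot the probe happens to hit, or -1 if none.
    public static int BinarySearch(int[] sortedValues, int target)
    {
        int lo = 0;
        int hi = sortedValues.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sortedValues[mid] == target) return mid;
            if (sortedValues[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }

    // Maps a value to a name; falls back to the number when nothing matches.
    public static string FindName(string[] names, int[] sortedValues, int target)
    {
        int index = BinarySearch(sortedValues, target);
        return index < 0 ? target.ToString() : names[index];
    }

    static void Main()
    {
        // Bar1 = 2, Bar2 = 2, Bar3 (3)
        Console.WriteLine(FindName(
            new[] { "Bar1", "Bar2", "Bar3" },
            new[] { 2, 2, 3 }, 2)); // Bar2

        // Bar (0), Bar1 = 2, Bar2 = 2, Bar3 (3)
        Console.WriteLine(FindName(
            new[] { "Bar", "Bar1", "Bar2", "Bar3" },
            new[] { 0, 2, 2, 3 }, 2)); // Bar1

        // Bar0, Bar00, Bar000 (0, 1, 2), then Bar1..Bar7 = 5
        Console.WriteLine(FindName(
            new[] { "Bar0", "Bar00", "Bar000", "Bar1", "Bar2", "Bar3", "Bar4", "Bar5", "Bar6", "Bar7" },
            new[] { 0, 1, 2, 5, 5, 5, 5, 5, 5, 5 }, 5)); // Bar2
    }
}
```

The first probe always goes to the middle of the whole table, not the middle of the duplicates, which would explain why unrelated members change the result.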

Thank you and see you next time.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Romania
When asked, I always see myself as a .Net Developer because of my affinity for the Microsoft platform, though I do pride myself on constantly learning new languages, paradigms, methodologies, and topics. I try to learn as much as I can from a wide breadth of topics from automation to mobile platforms, from gaming technologies to application security.

If there is one thing I wish to impart, it is this: "Always respect your craft, your tests and your QA"

Comments and Discussions

 
Question: Bad idea to give two (or more) times the same value (Bertrand Gilliard, 11-Feb-19 5:18)
Question: Good point - would not happen though (Sammuel Miranda, 11-Feb-19 2:27)
Answer: Insight as to how this might happen (Vlad Neculai Vizitiu, 28-Jan-19 10:04)
News: [My vote of 2] Mathematically determined. (stixoffire, 22-Jan-19 2:15)
General: Re: [My vote of 2] Mathematically determined. (Heriberto Lugo, 9-Feb-19 20:44)
General: My vote of 1 (Imirzyan, 18-Nov-18 19:43)
General: Re: My vote of 1 (George Gomez, 7-Feb-19 12:25)
General: Re: My vote of 1 (Vlad Neculai Vizitiu, 7-Feb-19 12:34)
Suggestion: mapping an enum to a string is implementation-dependant (Philippe Verdy, 16-Nov-18 7:43)
Your tests that return specific names from a typecast enum are, in my opinion, completely implementation-dependent:
I think that there's no reliable way to determine which name you'll get from an enum value: this depends on how the map from enum to name is implemented.
* it may be simply array-based, ordered by enum numeric value, but then when there are multiple enum names mapped to the same numeric value, the array will contain multiple names arranged in arbitrary order in the range of entries that have the same numeric value. In that case the name you get is the closest one found by the binary search... if a binary search is used
* it may be array-based, but with a linear search of the numeric value: you'll get the first name in that array that has this value, but here also multiple names may have the same value, so the name is unpredictable.
* it may be based on a hash table lookup: you'll land at some position in the table where there is an entry mapping the numeric value to a name, and the name you'll get is the first one from the collision list that contains all the names mapped to the same numeric value.

In summary: it's not portable at all to define an enum type with multiple names mapped to the same integer value. And I bet that C/C++ compilers should emit a warning if ever you attempt to define an enum type with multiple names mapped to the same integer value.

But then a typical enum declaration like:
enum {a, b, c, first=a, last=c}
will emit warnings for the duplicate mappings of the same integer value 0 for "a" and "first", and the same integer value 2 for "c" and "last".

There's no reliable way to determine what name will be displayed even by "cout << a", UNLESS you augment your enum type with a static method that will map which name (string) to return for any enum value (that method may use the implementation it wants: array-based with a binary search or linear search, or hash-based, or using arbitrary code).

If the C++/C/C# compiler automatically builds a mapping function (from your enum type to a string), you have to wonder which of the three approaches this mapping function uses (possibly several approaches may be used simultaneously, depending on the number of elements defined in your enum, or depending on whether they have duplicate names mapped to the same numeric value).

Your enum type must then define its static method to cast an enum value to a string.

The effect that was observed by the author of the article is what you would expect when the compiler or runtime provides a default mapping method (to map an enum value to a string) using a table-based approach with binary search.

But then compare:
enum{a=1, b=1} and enum{b=1, a=1}

The mapping using a table-based array (used with a binary search or linear search) could define an internal static table of strings like this:
{ {1,"a"}, {1,"b"} } or { {1,"b"}, {1,"a"} }
or even just (removing duplicates and keeping arbitrary names, e.g. the first one defined in the source code, or the last one defined):
{ {1,"a"} } or { {1,"b"} }

This method does not need to keep duplicate names (given that the mapping cannot return multiple names), so this alternative is not useful:
{ {1,{"a","b"}} } or { {1,{"b","a"}} }
At best the mapping function could also return all possible names (space-separated or comma-separated, it does not matter), but here also in arbitrary order:
{ {1, "a b"} } or { {1, "b a"} } or
{ {1, "a,b"} } or { {1, "b,a"} }
(such a mapping from integer values to one or more names will only be useful in the generated debugging info, where debuggers can use it to recognize all the defined names, although they may still display an arbitrary name by default in such a case; it won't be used by your program itself, which should not depend at all on this synthetic method or data built for debuggers or for the introspection/reflection API, which will return all the possible names for each value).

This synthetic mapping method automatically built by the compiler for you should be avoided: provide your own mapping method using the approach you want for your goals.
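
As a rough illustration of such a hand-written mapping, here is a minimal C# sketch. `FooNames` and `ToDisplayName` are hypothetical names, and the choice of "Bar1" as the name for the shared value is arbitrary; the point is only that the choice is made explicitly in your own code.

```csharp
using System;

enum Foo
{
    Bar1 = 2,
    Bar2 = 2,
    Bar3,
}

static class FooNames
{
    // Hypothetical explicit mapping: one chosen name per numeric value.
    public static string ToDisplayName(Foo value)
    {
        switch (value)
        {
            // Foo.Bar2 has the same value, so this case covers it too; the
            // compiler rejects a second case label with an identical value.
            case Foo.Bar1: return "Bar1";
            case Foo.Bar3: return "Bar3";
            // Undefined numbers fall back to the number itself.
            default: return ((int)value).ToString();
        }
    }

    static void Main()
    {
        Console.WriteLine(ToDisplayName(Foo.Bar2)); // Bar1
        Console.WriteLine(ToDisplayName(Foo.Bar3)); // Bar3
        Console.WriteLine(ToDisplayName((Foo)42));  // 42
    }
}
```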

In my opinion, a C/C++/C# compiler should not even try to create any synthetic mapping method converting an enum to a string, if ever the enum has multiple different names defined with the same numeric value, and then at compile time (or at least at link time), it should emit an error that such conversion from enum to string is NOT defined (so the basic typecast from such enum type to string is NOT defined by your program): the only safe synthetic method which is reliable for this case is only to display the numeric value itself, and not any name defined internally by the source code of your enum.

Note: this also applies to other languages which allow defining "enum types" (e.g. in Java, but Java requires that you create your own static toString() method, if you want to get any name from enum values; these names are not necessarily those defined in your source code, for example the defined names may be language-neutral, frequently technical and abbreviated, but the actual strings needed may be translated in a user's locale, or could be full sentences; Java cannot guess what you want and will only be able to generate a string representation of the numeric value; Java will expose all the multiple names in its introspection/reflection API if you query the datatype info: the enum type is a normal "class" with static methods, and it's very usual in Java to explicitly define these static "toString()" methods for almost every class).

----

Note2: conceptually, an enum type is just a finite set of distinct names. So they have no other numeric values, and the names themselves are unordered and have no defined arithmetic, so you cannot safely increment any given enum value and get a predictable other distinct value (it's not even guaranteed that incrementing it will return a different value)

Assigning integer values to the declared enum values (i.e. names) creates a mapping, i.e. a surjective function, i.e. a projection, not necessarily bijective, which allows defining an arithmetic, but this does not mean that the arithmetic is safe (not all numeric values will have a successor, so not all enum values will have a successor): you'll experience "overflow" situations.

But such mapping allows defining an "order" between all declared enum values. But this is a total order (i.e. a relation based on "<="), not a partial order (i.e. a relation based only on "<"), because now you have also duplicates (two distinctly defined enum values can compare as being equal to each other, under the semantic created by this mapping/projection to integers).

Conceptually, without this mapping, it should be possible to define an enum type as being strictly a finite set (with all elements distinguished), optionally orderable (with a partial order based on the "<" operation), but still without any arithmetic (if it is ordered with "<" you can still determine a "first" value in that set, and a "last": an enum value is the "first" if it is not the successor of any enum value in that set, it is the last if it is not the predecessor of any enum value in that set, and the order allows defining a **bijection** to the bounded set of integers from 1 to N, inclusive, where N is the cardinality of the enum type).

But to define such a bijection, you need to declare the enum with a static comparator method: this allows a mapping function from enum values to their names to be efficiently implemented not just with a binary search, but by a direct table indexing method.

The enum types in C/C++/C# are not strictly sets, as you can alter the order as you want and skip numeric values, for example enum{a=1, b=1000}, creating "holes" where you cannot define a single "first" and a single "last" element, so that all elements except the single "last" one have a successor, and all elements except the single "first" one have a predecessor: this complicates things a lot, because "overflows" can occur anywhere for all basic arithmetic operations for ALL enum values.

And for this reason, you cannot assume any implementation of the mapping from enum values to strings (to return their name). The compiler has to make arbitrary choices.

So the enum types in C/C++/C# are extremely poor, and should have no other operations defined, other than comparing if they are equal or different.

Everything else is fuzzy, and your defined enum type MUST be specified precisely by defining all other operations (all conversions to any other type, all arithmetic operations, all binary operations needed for total or partial ordering). Your compiler may automatically generate some default synthetic static methods for all these operations, but you'll be surprised that this can give results you did not expect. If this is not what you want, just treat enum types as true classes, and define these static methods yourself in these classes!

This is what is required in Java where enum types can only be compared with a single equalTo, and where there's absolutely no constructor: all instances are static, created by the enum type declaration itself, and there's no way to convert/typecast them to other types, not even an integer; the only synthetic method generated is a toString method returning only the name of the defined constant, and a static values() method returning an ordered array of names for all defined constants, it will also declare a comparator that allows comparing them, and an ordinal().
To assign different integer values, you'll need to declare yourself the ordinal() and compareTo() methods; you'll probably have to declare also your own toString() method to return them as strings displaying their ordinal value, or something else, and you may optionally add static constructors (which are more exactly "factory" methods which will return one of the statically declared instances).

I think that Java (which has enum types since version 5) is much more correct here than what C/C++/C# are tolerating (in a very fuzzy way) and supporting with their ill-defined default synthetic methods for all operations.

modified 16-Nov-18 20:05pm.

General: Re: mapping an enum to a string is implementation-dependant (Chad3F, 16-Nov-18 13:52)
General: Re: mapping an enum to a string is implementation-dependant (Philippe Verdy, 16-Nov-18 15:48)
Praise: Nice Article (MrFunke3.14, 16-Nov-18 4:06)
General: An educational exercise (SirGrowns, 15-Nov-18 21:19)
General: My vote of 5 (Donmorcombe, 15-Nov-18 14:22)
General: Side effect of ToString (Member 8128073, 15-Nov-18 12:20)
Question: Does it matter (TrendyTim, 15-Nov-18 12:11)
General: My vote of 5 (dmjm-h, 15-Nov-18 11:35)
General: I would consider an enum like you're proposing to be a 'code smell' (Will Wayne, 15-Nov-18 11:09)
General: Re: I would consider an enum like you're proposing to be a 'code smell' (Philippe Verdy, 17-Nov-18 6:21)
Question: Answer in the form of a question... (Member 4603457, 15-Nov-18 10:41)
Suggestion: No dupes? (vbjay.net, 15-Nov-18 9:28)
General: Re: No dupes? (Member 13723669, 18-Nov-18 22:02)
General: Re: No dupes? (Heriberto Lugo, 9-Feb-19 20:46)
General: Re: No dupes? (Philippe Verdy, 1-Mar-19 9:57)
Question: That is bizarre (Marc Clifton, 15-Nov-18 0:28)
