Sunday, 26 January 2014

Guideline 4. Avoid fall-through in nonempty cases of "switch" statements

Guideline

Each case label in a switch statement shall be followed by one of the following:

  • A set of one or more statements, each in its own line, terminated by a break instruction.
  • Another case label.
  • A default label.
We may call "empty case" a case label which is followed by either another case label or  default. Empty here does not mean, of course, that no action is executed if the code flows to that label; it is just that the action executed is shared between several labels.

A "nonempty" case is one which is followed directly by a set of one or more statements. The guideline states that these statements shall always be terminated by a break instruction. These statements shall not lead to a fall through another case or default label. This is a legal possibility, of course, but this is why I am writing this guideline in the first place - if it weren't legal, the compiler would take care of it!

Discussion

A switch statement is a controlled jump. Every time the code execution encounters the keyword switch, it will jump to a different place. The question is where.

The answer is:

  • The execution will jump to the line after the case which is equal to the value of the expression in the switch, if there is one;
  • If there isn't one such case, but there is a default case, the execution will jump to the line after the default.
  • If there is none of the above (that is, no case equal to the value of the expression in the switch, and no default label), the execution will jump to the line after the closing bracked of the switch block.
It is interesting to note that this behaviour does not depend on the order of the case and default labels. default could be the first label to appear inside the switch block, some cases could go after it, and the language rules above would still hold. If there is a default , the execution jumps to it when no other case expression is equal to the switch expression, no matter where the other cases are located.

The lines with the case and default keywords are not instructions by themselves. Instead, they are just labels. They mark spots where the code execution may jump, depending on the value of the switch expression.

Ok. So what could go wrong?

One of the few things that could go wrong is that in C++, there is no implicit goto between the end of each case block and the end of the switch. This is good because it allows several cases to lead to the same action. This is done by putting several empty cases in consecutive lines, followed by a nonempty case. But it has one drawback, too: It also allows nonempty cases to fall through and continue executing below any further case labels they encounter, until a break is found or the switch block ends.

"What's wrong with that?", you say. "After all, this is how the language was defined. Every programmer must know how this kind of code works and write their switch blocks accordingly. If they're not aware of the need to write break to jump to the end of the switch block, it's their fault. I won't do such basic mistakes - if I fall through a case label it will be because I wanted to do so! If other people make mistakes when using a simple switch, what will they do with pointers?"

You are right. You are totally entitled to fall through case labels. I have no argument against that. You write correct C++ code and there is nothing I can obect to that.

Or is it?

Please listen to me for a minute. I will make one or two points and then you'll decide what you do with your switches.

First of all: Yes, I admit that you are confident and experienced in C++, and that you write correct code in this language. You never, ever forget to write a break, so whenever you use fall-through it's because you willingly choose to do so. But the next time I see one of them, how will I know it was you who wrote it? How will I know that it wasn't some beginner programmer who was confused about the need to write break and thought that his code would jump right to the end of the switch after the case was "finished", and forgot to test that particular part of the code? How will I, a humble reader, tell one unintended fall-through from your very well informed ones? I'll tell you how: In no way. I'll have to try and guess it from the context, or just trust that the existing code is correct in the absence of evidence to the contrary.

One more thing: Have you ever forgotten to pick your keys? I have, and I know perfectly how my door works. I know I won't be able to open it without them. Nevertheless, a few times in my life I forgot to pick the keys and I got locked out.

I know perfectly how switch works, but even so I have forgotten to write a break statement in certain occasions. I found out shortly after, because I tested the code and it didn't do what I expected. But nevertheless I'd like to prevent such simple mistakes in the future.

In case I haven't convinced you yet, let me quote this StackOverflow answer, which is itself quoting a designer of the C# language, who arguments against fall-through between cases being built into the language in the first place:
This implicit fall-through behavior is often used to reduce the amount of code needed and often isn't an issue the first time that code is written. However, as code moves from the initial development phase into a maintenance phase, the code above can lead to subtle errors that are very hard to debug. These errors result from the very common mistake of the developer adding a case, yet forgetting to put a break at the end of the block.
In C#, the switch statement requires that explicit flow control occur at the end of a case, either a break, goto, return, or throw. If the developer desires fall-through semantics, it can be achieved by an explicit goto at the end of a case statement.
There is not one single truth, except maybe for the fact that you need to know the syntax of your language and use it correctly. But I am strongly inclined to treat C++ as if it was C# in this particular item: do not use  fall-through, and always use break at the end of a nonempty switch statement. This will make your C# and C++ switch statements look the same, and behave in the same way - a robust, less error prone way.

Example

Don't

If write a switch statement with fallthrough, such as:

switch (option)
{
  case 1:
  case 2:
  case 3:
    // do something
  case 4:
  case 5:
    // do something else
    break;
}

You'll leave future readers of your code intrigued as to whether you forgot to write a break after the "do something". It is good that you solve that doubt, just in case they don't succeed at inventing a time machine so that they can come to ask you. 

Do

You could at least add a comment stating that your fallthrough is intended:

switch (option)
{
  case 1:
  case 2:
  case 3:
    // do something
    // FALLTHROUGH
  case 4:
  case 5:
    // do something else
    break;
}

This violates my guideline, but at least makes it clear that you knew what you were doing and that you didn't just forget the break .

A better option is to not use fallthrough at all:

switch (option)
{
  case 1:
  case 2:
  case 3:
    // do something
    // do something else
    break;
  case 4:
  case 5:
    // do something else
    break;
}

Of course, "do something else" should better be a very simple instruction or a function call, because you're now repeating that instruction. If it was originally a set of eight instructions, it is not a good idea to just copy and paste them in the above case, because that introduces code duplication. If it is just a trivial line or a function call, it's ok.

While we're at it, it's not a bad idea to always put a default statement:

switch (option)
{
   case 1:
   case 2:
   case 3:
      // do something
      // do something else
      break;
   case 4:
   case 5:
      // do something else
      break;
   default:
      break;
}

When a new case 6 comes, the more clear, readable and well structured is your switch statement, the less likely you will be to make a mistake.

Given that this is the best way to write a switch statement, why not build it like this from the beginning? For example, you could start by writing:

switch (option)
{
   case 1:
   case 2:
   case 3:
   case 4:
   case 5:
   default:
}

Then consider which cases should be grouped, and divide them by adding the break statements:

switch (option)
{
   case 1:
   case 2:
   case 3:
      // action for 1, 2, 3
      break;
   case 4:
   case 5:
      // action for 4, 5
      break;
   default:
      break;
}

And in a third step, add the instructions before each break, so that your actually does something. If you get used to write switch blocks this way, you'll minimize your mistakes and, more importantly, you will generate readable and well structured code.

Bibliography

[McCONNELL 2004] This book discusses switch statements in Chapter 15: "Using conditionals", Section 15.2 "case statements" (pages 361-365).

Monday, 13 January 2014

Guideline 3. The condition in an "if" statement should be a simple boolean expression with no side effects

Guideline

  • The expression inside an if statement should be boolean.
  • It should be simple and readable.
  • It should have no side effects.
  • It should not include assignments or variable declarations.

    Discussion

    1. A boolean is a boolean

    In C++, every statement is an expression of a certain type. It may be an ordinary type, such as boolint or a user-defined type such as Customer), or a special type called void which basically means "no type".

    The language rules establish which types are fundamental in the language. You don't need to invent your own bool type, for example, because the language has one. So you won't use invented types such as BOOL or Bool, will you?

    The language rules also determine its syntax, of course. They say what types are expected at each part of an instruction. Thework of the people who make compilers is to follow these rules so that they create C++ programs which execute the code you write in the right way. Your part of the work is to know these rules so that after being compiled, your program does not only what you wrote (it will, if the compiler is correct), but also what you wanted to do when you were writing it.

    Different languages have different rules. The C language, for example, does not have a native boolean type. When that language was created, if was defined as taking an int as an argument, so that you would write code such as if (count). This worked, but was not natural.

    The natural argument of a conditional expression is a boolean expression. If the boolean expression is true, the code block whichs follow the if will be executed. Otherwise, it won't.

    In C++ the expression inside the pair of brackets will be understood as a boolean expression. Some people carry the habit from their C days and pass an as int an argument to the if. Other people even use a pointer, so that the if will check whether it is NULL and will evaluate to true in that case. Don't do it - it makes the code less readable and a step closer to being error prone.

    Guideline #2 advised you to avoid implicit type conversions. This is a very good place to start applying it. The if statement requires a bool, so why pass it anything else?

    2. And nothing else than a boolean

    A boolean expression, that's all. Don't overcomplicate things by putting several actions together. Don't use complex commands which may succeed or not, returning a boolean which indicates success, as conditions inside if statements. Don't write assignments or, worse, variable declarations in the condition, either. In short, keep it simple. Don't do anything in your code that you wouldn't do when writing an instructions list for someone to follow. If you do, you'll (or someone else will) soon be asking yourself (or themselves) where is the software failing, only to discover that you need to debug into each and every complex step which you one day decided to write together inside of an if condition.

    While we're at it, everything which has been said here also applies to the conditions inside loop instructions such as for, while and do ... while .

    This second part (keep it simple, no side effects) also applies to the expression which is evaluated in a switch instruction, altough in that case of course you're not using a boolean expression, but an integer or an enum.

    Example

    Don't

    if (!isConnected() || !saveToFile())
    showMessage("Error saving to file.");

    What have you done? You have omitted the square brackets in the code block which is executed if the condition is true! Please don't do that. I didn't mention that in the guideline, but I took it for granted. Put the square brackets in their place and then we'll continue the discussion.

    if (!isConnected() || !saveToFile())
    {
    showMessage("Error saving to file.");
    }

    Thank you.

    In this example you mixed two possible complex operations (one may be a query, isConnected; the other is a command) in the condition, so you won't easily know which one failed when you see the message. At least you used a boolean expression, which is good; but it is not a simple one, and it has side effects. Let's see how we can improve it.

    Do

    if (!isConnected())
    {
    showMessage("Cannot not save to file: Not connected.");
    }
    else
    {
    bool saveSucceeded = saveToFile();
    if (!saveSucceeded)
    {
    showMessage("Error saving to file.");
    }
    }

    There are a few alternatives, most of them better than the original code. In this case at least I separated the two possible causes of error, giving them different error messages (the error in saveToFile probably needs more detailed information, but that is beyond the scope of this example). It's easy to debug the code step by step (or imagine it being debugged) and figure out what's going on.

    I assumed that isConnected is simple query and left it inside the condition; had it been a complex operation, I'd isolate it in its own line just as I did with saveToFile.

    Which leads us to the most important point: a command such as saveToFile should be called in its own line, not buried inside the conditional expression of an if statement. It's better to separate the list of things that you need to do (check whether the object, whatever it is is connected, and then save to file) from a conditional action which you will perform if the result of one of these actions is not satisfactory.

    Bibliography

    This materials listed in this section discuss the issue described above in more detail. See the page Bibliography for details of the referenced materials.

    [McCONNELL 2004] This book discusses conditionals in Chapter 15. In particular, see Section 15.1 "if statements" (pages 355-360) and the checklist at the end of the chapter (page 365).

    Monday, 6 January 2014

    Guideline 2. Make all type conversions explicit

    Guideline

    Assignments, function calls and expressions in general should not contain implicit type conversions. All type conversions should be explicit.

    Type conversions should be isolated in their own line, which should consist only of the assignment of the result of the type conversion to a variable of the target type.

    A statement should not mix type conversion with other operations. For example, if you need to convert an object to one of another type to pass it as a function parameter, do not do it in the function call itself; instead, do it in a separate instruction in the previous line.

    Discussion

    Type conversion occurs when an entity of type A is, well, converted to an entity of type B. This may occur in one of these circumstances:
    • Using the object as a parameter in a function call, where the type of the parameter in the function is not the same as the type actually passed to it.
    • Using the object as an operand in an expression which requires an object of a different type.
    • Assigning the object to an object of a different type.
    In many cases this operation will be impossible. If a function takes a Customer, for example, you cannot call it with a totally unrelated type such as a Printer. In other cases the conversion may be possible but involve some risks. Finally, there may be cases in which it can be done safely.

    A type conversion may be implicit or explicit. An explicit type conversion is one which you can read in the code. An implicit type conversion is one which does not appear in the source code, but is done automatically when the software is executed.

    Making all type conversions explicit, as this guideline states, helps you to identify the potentially dangerous ones. Type conversions shouldn't be spread all over your code, because good design techniques should avoid them in most cases. In such specific cases where a type conversion is needed, it's better to make it shine out so that more coding, review and testing effort is dedicated to ensure its correctness.

    Example

    Don't

    int i = 0;
    double d = 10.0;

    (…)

    i = d;

    This intentionally simple example is enough to show my point. There are always several things in each piece of code that might potentially go wrong, and one of them in this case (converting a value from one type to another) is totally hidden in the assignment expression, so that an error in it might go unnoticed.

    Do

    int i = 0;
    double d = 10.0;

    (…)

    i = static_cast<int>(d);

    The type conversion is made explicit by means of a cast instruction. That is precisely the meaning of a cast: an explicit type conversion.

    C++ offers several cast instructions, namely static_castdynamic_castreinterpret_cast. and const_cast. They are the equivalent of a warning sign in a road. They tell the reader of the code that something might go wrong because of the type conversion. It won't go wrong, because you designed and tested it well - but the potential risk is there. Something's going on which shouldn't be hidden, for the sake of today's and tomorrow's correctness.

    If you get used to always respect the types of your objects, identify the occasional type conversions and make them explicit, you will prevent some common mistakes and avoid future problems.

    Bibliography

    This materials referenced in this section discuss the issue described above in more detail. The range they cover is much wider, though, and dives into more depth. See the page Bibliography for the detailed references.

    [MEYERS 1998-1] The new forms of cast are discussed in the Introduction, pages 10-11. Implicit type conversions are treated on pages 67, 83, 85 and 86.

    [MEYERS 1998-2] Implicit type conversions are central to the subject of Item 5: "Be wary of user-defined conversion functions", pages 24-31.

    [SUTTER 1999] Implicit type conversions appear on pages 19, 70, 164-165 and 191. See especially Item 39: "Automatic conversions".

    Wednesday, 1 January 2014

    Guideline 1. Do one thing at a time

    Guideline

    • Put each instruction in one separate line of code.
    • Each instruction shall do exactly one thing.

    Discussion

    This will enhance the readability of your code and make it easier to debug and possibly change in the future.

    It should not be possible for an instruction to have several causes of failure. Constructing objects, calling functions and combining several values into a nontrivial expression are operations relevant enough to deserve their own line of code.

    For example, if an instruction is a function call, its arguments should be trivial expressions. If they are not, their values should be obtained in independent instructions prior to the function call. If the purpose of that function call is to get some result, don’t do anything complicated with it once you get it. Instead, place the result in a variable and do subsequent operations with it in the following lines.

    If you write code in this way you will find out that it follows your train of thought quite closely. For this reason, a future reader (whether it is another programmer or you) will have a better chance of understanding it.

    Example

    Don't

    int i = 0, j = 1;

    (…)

    i = ++j;

    In the don't section you see that two variables are declared in the same line. After doing some omitted work which possibly changes their initial values, j is incremented, and the result of this increment is assigned to i.

    Do

    int i = 0; // Count of i
    int j = 1; // Count of j

    (…)

    ++j;
    i = j;


    This code puts the two variable declarations in separate lines, thus complying with the guideline and allowing to easy document their meaning - something which may not always be necessary, but which should always be considered.

    The change of value of j (its increment) is given a specific line of code, and after that, the assignment of the value of j to i is performed. The code becomes instantly clearer, just by doing one thing at a time.

    Bibliography

    See the Bibliography page for full book references.

    [McCONNELL 2004] This book discusses layout in Chapter 31: "Layout and Style". In particular, see Section 31.5 "Laying out individual statements" and its Subsection "Using Only One Statement Per Line" (page 758-761).