Sunday, 9 February 2014

Guideline 5. "for" loops should be simple and well-formed

Guideline

for loops should be well-formed. This means:
  1. They should have a single loop counter, which shall be of integral type or an iterator.
  2. The loop counter should only be modified in the increment expression of the loop.
  3. The loop counter should be incremented or decremented by a constant amount at each loop iteration.
  4. In the loop condition, the loop counter should be accompanied, if by anything, only by boolean variables.
  5. The loop body should be a compound statement, delimited by braces.
  6. The loop body should not contain labels which are the destination of goto statements.
  7. The loop body should contain at most one break statement.
  8. Use continue with care, preferably at the beginning of the loop body, to exclude certain iterations from the action.

Discussion

To loop or not to loop, that is the question

Loops are an important part of functions in any structured programming language. In fact, code is built from just three structures: a sequence of actions, which are performed in the order you write them; an alternative, in which the action or actions to perform depend on a certain condition; and a repetition, which is expressed in the form of a loop. Hence, you will easily know when you need a loop: whenever you need to do something several times rather than just once. Where there is repetition, there is a loop.

A loop - but which one?

Once you have decided that you need a loop, the next question is how: in other words, which of the loop structures supplied by the language you will use.

C++ has three loop statements: for, while and do ... while. To choose wisely among them, you need to know how they differ. You surely know the for syntax:

for (initialization condition; expression) statement

The first thing you need to know is that statement should always be a compound statement, that is, a block of code enclosed in braces, { ... }, containing any number of individual statements. And, yes, each of them on its own line. This block is called the body of the loop, while the initialization, condition and expression together are called the header of the loop.

initialization, in turn, is a single statement, terminated by its own semicolon (;), which I did not write separately because, technically, it is part of the initialization itself.

You may of course use the comma operator inside the initialization statement, to combine several actions, but frankly, I don't recommend it. Let's keep things as simple and readable as possible.

So what's going on here?

First, the initialization is performed. Then the condition is evaluated, and one of two things happens. If the condition is false, control jumps to the first statement after the loop. If it is true, the statement (the loop body) is executed, and after it the incrementing expression is evaluated. Then the condition is evaluated again, and there you have the loop: this sequence of actions is repeated until the condition evaluates to false.

I said incrementing expression, and that brings us to the key point. A for loop makes sense when you repeat an action a known number of times, or for all the elements of a known set. It makes sense when you have a loop counter: one variable that is initialized in the initialization, tested in the condition, and incremented (or decremented) by a constant amount in the expression. This variable should be of an integral type or, put another way, it should not be of a floating-point type. Iterators (objects used to traverse standard containers; one day we'll talk about the Standard Template Library) are also accepted as loop counters.

If you don't know how many times, or for which complete set of elements, you're going to do something, and what matters in your loop is a condition that will at some point stop it, you'd better choose one of the other two loop structures: while or do ... while. Choosing between them is easy: the former checks the condition beforehand and thus may never perform the repeated action (if the condition happens to be false at the start), while the latter executes the repeated action at least once and then checks the condition to decide whether to keep repeating it or finish.

A for loop - but how?

Now let's suppose you have already chosen a for loop. You want to do something n times, with n known, or for each element of a certain set which you need to iterate through.

The for loop is a practical, readable (once you get used to it) and terse construct, but you need to use it well. Because of its unusual syntax, using it too imaginatively is not a good idea.

All parts of the for loop should be short and readable, and variable names should be chosen to make the loop easy to understand.

If you follow all the parts of the guideline above, you'll avoid most of the error-prone deviations in for loops. Let's go over them again:
  • They shall have a single loop counter, which shall be of integral type or an iterator. Using a floating-point variable as a loop counter goes against good C++ practice. Using more than one counter overcomplicates things; prefer a while loop in such a case.
  • The loop counter shall only be modified in the increment expression of the loop. Don't tamper with the loop counter inside the loop body. If you do, your code will be playing "loop pinball": the reader will have to perform complicated calculations in his or her head just to know what's going on, and the number of iterations will no longer be obvious. This is a total no-no.
  • The loop counter shall be incremented or decremented by a constant amount at each loop iteration. Again, don't play strange games with your loop counter. It's a counter, that's all. I'm already letting you increment or decrement it by amounts other than 1 (which you should seldom do anyway), but incrementing it by an amount that is not constant is just too much.
  • In the loop condition, the loop counter shall be accompanied, if by anything, only by boolean variables. Imagine there's a special condition that terminates the loop (for example, finding an element) apart from the loop counter reaching its limit. This is a natural and understandable refinement of a loop, provided the additional variable is boolean. If it is not, the for loop becomes too complicated.
  • The loop body shall be a compound statement, delimited by braces. I mentioned this before: never omit the braces and use a single statement as the body of a control structure, i.e. a conditional or a loop. Doing so is just asking for errors, and your software will have more bugs simply because you decided to overlook this advice. Don't do it.
  • The loop body shall not contain labels which are the destination of goto statements. Well, I'm sure you don't use goto that much, but if one day you get the idea of jumping with a goto to a label inside a code block with an inner scope, you'd better abandon the idea right away. If that inner block happens to be a loop body, you shouldn't even have had the idea in the first place. It's a particularly bad idea; as bad ideas go, it's hard to beat. Although we'll have the opportunity to discuss even worse ideas when we talk about syntactic overloading, this one is certainly close to the bottom.
  • The loop body should contain at most one break statement. break interrupts the loop early when a special condition is met in the middle of the loop body: for example, you were searching for something and you found it. Since this keyword adds complexity to the control flow, use it at most once, and preferably avoid it altogether.
  • Use continue with care, preferably at the beginning of the loop body, to exclude certain iterations from the repeated action. Again, this rule aims to keep the flow of your code simple.

All these rules can be summarized in one sentence: "A loop shouldn't be complicated." But if, for a good reason, it needs to be, then it shouldn't be a for loop. When the exit condition is complex, or the initialization is not that simple, or the increment expression needs to do more than just increment or decrement a variable, change your loop to a while or a do ... while, which are more readable and less error-prone in these more difficult cases.

Bibliography

See the page Bibliography for details of the referenced materials.

[McCONNELL 2004] This book discusses loops in Chapter 16, "Controlling Loops" (pages 367-389).
