Merits of using prefix vs postfix increment operator


Question

Conversation split from https://www.neowin.net/forum/topic/1229331-c-need-help-for-converting-roman-numerals-to-decimalinteger/?p=596576971
 

You may already know that pre-increment/decrement is faster than post, because post creates a copy of the original value before incrementing, whereas pre doesn't, and so you should only do post when you actually need to. With the for loop here it may be a little confusing to some about which is needed; this increment operation actually takes place at the end of each loop not the start, so post incrementation is entirely unnecessary. Your compiler may very well optimise this away for you automatically, but personally I think it's better to always explicitly use pre unless you really specially need post.

Even in debug builds any compiler worth its salt will optimize away an unused value like that, but funny things can happen in C++ with overloaded operators on STL iterators and whatnot. As you mention, it's more correct anyway to write the pre-increment. That said I find that in C# it is basically a matter of style and I don't try to impose this on others (as I do with many other nagging details :p).
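
For anyone wondering where the "copy" comes from with overloaded operators, here is a minimal C++ sketch. The `Cursor` type, the `g_copies` counter and `count_copies` are made up for illustration and are not from anyone's code in this thread. It follows the usual convention where postfix saves the old value, calls prefix, and returns the saved value. Whether an optimiser can drop that copy depends on whether it can see the operator body and prove the copy has no observable effect; here the counter makes it observable on purpose.

```cpp
#include <cstddef>

// Hypothetical global counter: how many times a Cursor has been copy-constructed.
std::size_t g_copies = 0;

// Hypothetical iterator-like type, used only to make the copy visible.
struct Cursor {
    int pos = 0;

    Cursor() = default;
    Cursor(const Cursor& other) : pos(other.pos) { ++g_copies; }
    Cursor& operator=(const Cursor&) = default;

    // Prefix: modify in place and return a reference. No copy is made.
    Cursor& operator++() {
        ++pos;
        return *this;
    }

    // Postfix: must hand back the old value, so it copies first.
    Cursor operator++(int) {
        Cursor old = *this;   // the copy being argued about
        ++*this;
        return old;
    }
};

struct CopyReport {
    std::size_t prefix;    // copies caused by n prefix increments
    std::size_t postfix;   // copies caused by n postfix increments
};

// Runs n increments with each form and reports the copies each one caused.
inline CopyReport count_copies(int n) {
    Cursor a, b;

    g_copies = 0;
    for (int k = 0; k < n; ++k) ++a;
    std::size_t prefix = g_copies;

    g_copies = 0;
    for (int k = 0; k < n; ++k) b++;   // result discarded, copy still happens
    std::size_t postfix = g_copies;

    return {prefix, postfix};
}
```

Calling `count_copies(3)` should report zero copies for the prefix form and at least three for the postfix form (possibly more if the compiler does not elide the copy on return).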

  • 0

Only as part of another expression. In isolation, (++i) and (i++) both effectively equal i += 1 or i = i + 1. The assembly demonstrates that in the for loop. A compiler will generate a single increment cpu instruction that's performed on the stack - addl $1, -4(%rbp). It doesn't evaluate to the original value and no additional instructions are performed unless the programmer explicitly avails himself of that feature by including it as part of a larger expression.

My point is at the source language level, not whatever compiled form it may have. Languages are defined by a specification, not a compiled representation. ++i is an expression that has the same value as i += 1 and i = i + 1. This is always true however the compiler chooses to compile it. Even if in a particular assembly for a particular case there's no machine code corresponding to the value of the expression, it doesn't change the value of that expression. And just because you can choose not to use this value doesn't change that the value exists, because these are expressions, not statements, and expressions have values.
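
To make the "expressions have values" point concrete, a small C++ sketch for plain ints. The function names are made up for illustration. Each one returns the value of the expression itself, and takes `i` by reference so the caller can also see the side effect.

```cpp
// Value of the prefix form: the new value of i.
inline int value_of_prefix(int& i)   { return ++i; }

// Value of the postfix form: the old value of i.
inline int value_of_postfix(int& i)  { return i++; }

// Value of compound assignment: the new value of i, same as the prefix form.
inline int value_of_compound(int& i) { return i += 1; }
```

Starting from 5 in each case, the prefix and compound forms evaluate to 6 and the postfix form evaluates to 5, while the variable ends up as 6 in all three.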

 

I don't think it's unreasonable to ask the compiler to do its job. I'd be surprised if a compiler produced a different result for both a pre and post increment iterator. C++'s STL might be an exception. I'd imagine that in languages like C#, Java, and Python, there's absolutely no difference.

 

You seem to think that there's some rule in all C-like languages that compilers must optimize the postfix to the equivalent of a prefix operator if the return value is unused, which isn't even the case in C much less in C++ or other languages. It's a common optimization for integral types, nothing more. In any language where the operator can be overloaded for custom types, that is not the case. C++ is not generally able to inline functions across translation units; the C# compiler practically doesn't inline anything, and the CLR has some strict heuristics about that which may or may not apply. I'm pretty sure it would never be able to remove an arbitrary object copy which may have side-effects.


  • 0

My point is at the source language level, not whatever compiled form it may have. Languages are defined by a specification, not a compiled representation. ++i is an expression that has the same value as i += 1 and i = i + 1. This is always true however the compiler chooses to compile it.

Okay, let's define it in terms of source language then. For the sake of non-partisanship, let's take Microsoft's definition:

It is important to note that a postfix increment or decrement expression evaluates to the value of the expression prior to application of the respective operator. The increment or decrement operation occurs after the operand is evaluated. This issue arises only when the postfix increment or decrement operation occurs in the context of a larger expression.

http://msdn.microsoft.com/en-us/library/e1e3921c.aspx

Which all but admits what I've been saying. In isolation, the operand isn't evaluated, only incremented. Only when it's part of a broader expression does the operand get evaluated, hence returning its original value. This is reflected in both the specification and the disassembly we've analysed thus far for C and C#.

Even if in a particular assembly for a particular case there's no machine code corresponding to the value of the expression, it doesn't change the value of that expression.

If there's no machine code, it means the compiler isn't evaluating the operand at all. That's why it's part of the design of the language, not the optimisation process. If the operand isn't part of a larger expression, there's nothing to evaluate, it's as simple as that.

 

And just because you can choose not to use this value doesn't change that the value exists, because these are expressions, not statements, and expressions have values.

An isolated postfix incrementation is only the simplest of expressions. Therefore it doesn't need to be evaluated. If it's not evaluated, there is no value to speak of.

 

You seem to think that there's some rule in all C-like languages that compilers must optimize the postfix to the equivalent of a prefix operator if the return value is unused

Absolutely not. It's not an optimisation, it's part of the design of most languages. That's why with all optimisations disabled, a compiler never evaluates an isolated postfix incrementation, that includes inside the iterator of a for loop. That's just an efficient compiler design. Why create space for the value of an operand that's never evaluated?

 

which isn't even the case in C much less in C++ or other languages. It's a common optimization for integral types, nothing more. In any language where the operator can be overloaded for custom types, that is not the case.

I'm willing to bet in the majority of cases it will. Again, it's not an optimisation, it's just a logical design. If an iterator object is never evaluated, why create a copy of it?

 

C++ is not generally able to inline functions across translation units; the C# compiler practically doesn't inline anything, and the CLR has some strict heuristics about that which may or may not apply. I'm pretty sure it would never be able to remove an arbitrary object copy which may have side-effects.

If the operand in the iterator of a for loop isn't being evaluated, then I don't see why it should be any different to integral types. It's the same logic.

  • 0

Okay, let's define it in terms of source language then. For the sake of non-partisanship, let's take Microsoft's definition:

http://msdn.microsoft.com/en-us/library/e1e3921c.aspx

Which all but admits what I've been saying. In isolation, the operand isn't evaluated, only incremented. Only when it's part of a broader expression does the operand get evaluated, hence returning its original value. This is reflected in both the specification and the disassembly we've analysed thus far for C and C#.

That's not the C++ language specification and it doesn't state that the operator is not evaluated if the value is not used. In fact it has to be evaluated in the general case where the operator is an arbitrary function just like any other function because functions may have side-effects and optimizing them away would change the behavior of the program.

 

To illustrate:

#include <stdio.h>

int printHello() { printf("Hello\n"); return 0; }

int main() { printHello(); }

If the compiler could simply remove function calls because their return value was not used, this program would not print "Hello" on release builds.

 

If the operand in the iterator of a for loop isn't being evaluated, then I don't see why it should be any different to integral types. It's the same logic. 

 

Again, because it's an arbitrary function and you can't just willy-nilly remove arbitrary code because it may have side-effects. Also, as I said, C++ is generally not able to inline functions across translation units, so even if the optimisation was possible after inlining, the lack thereof prevents it outright.

 

I'm willing to bet in the majority of cases it will. Again, it's not an optimisation, it's just a logical design. If an iterator object is never evaluated, why create a copy of it?

 

Because that's what your code does according to spec and the job of the compiler is to do what your code does according to spec. You're expecting the compiler to figure out that your program will have the same behavior while removing part of what you said it should do, which is a reasonable optimisation in the case of integral types, but not for arbitrary types with user-defined overloads.

 

Absolutely not. It's not an optimisation, it's part of the design of most languages. 

 

Ok then please find the relevant part of the specs.


  • 0

That's not the C++ language specification and it doesn't state that the operator is not evaluated if the value is not used.

The point being, by itself and without being a part of a larger expression, the operand of a postfix operator behaves identically to a prefix. There is no original value because that only applies when it's part of another expression. The for loop construct iterator (;; i++) isn't part of a larger expression, thus there's nothing to evaluate to. It's merely incremented.

 

In fact it has to be evaluated in the general case where the operator is an arbitrary function just like any other function because functions may have side-effects and optimizing them away would change the behavior of the program.

To illustrate:

#include <stdio.h>

int printHello() { printf("Hello\n"); return 0; }

int main() { printHello(); }

If the compiler could simply remove function calls because their return value was not used, this program would not print "Hello" on release builds.

I see your point. However, this function isn't really the same as a postfix operator, where unused copies are disposable. In that situation, the object could be safely elided (RVO) if it isn't used in a for loop.

Perhaps we're getting away from the original purpose of this thread. The initial debate centred on theblazingangel's assertion that a prefix operator was faster and somehow more efficient in the context of a for loop construct. The example of this dealt with an integral datatype, specifically an integer.

I think we have adequately proved that assertion to be false. Whether that can be applied to more complex objects such as classes and custom overloaded operators is another matter. I'm not entirely sure myself on the latter. From what I've seen, unused complex objects can be safely elided, and most compilers already do this.


  • 0

Perhaps we're getting away from the original purpose of this thread. The initial debate centred on theblazingangel's assertion that a prefix operator was faster and somehow more efficient in the context of a for loop construct. The example of this dealt with an integral datatype, specifically an integer.

We agree on that point. If you only ever use C or Java or other languages where using the postfix operator in an isolated context will never be an issue, then it doesn't matter. In the case of C++, the correct convention is to always use the pre-increment as recommended by the people designing the standard (i.e. Herb Sutter) because it matters for user-defined types. Using post-increment only for integral types in C++ would be error-prone. If I had to pick one for all C-like languages I would go with the C++ convention, because there is no disadvantage in doing so and it's always correct.
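
As a rough illustration of that convention, a C++ sketch of an iterator loop written with pre-increment. The `sum` function is hypothetical; the only point is that `++it` is used where the value of the increment expression is not needed.

```cpp
#include <list>

// Sums a std::list, using pre-increment on the iterator by convention.
inline int sum(const std::list<int>& values) {
    int total = 0;
    for (auto it = values.begin(); it != values.end(); ++it) {
        total += *it;
    }
    return total;
}
```

For the standard containers a compiler will very likely produce the same code for `it++` here; the habit matters more for iterator types whose postfix operator the compiler cannot see through.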


  • 0

I personally prefer post-increment: 9 times out of 10 I am just incrementing a value, and if I need to start at 1 I will initialise it to 1 and post-increment. Otherwise, the increment coming afterwards doesn't really affect anything.

I do my best to optimize code as I go such as..
 

List<object> objects = Lookup.GetObjects().Where(o => o.ID == 0).ToList();

for (int i = 0, x = objects.Count; i < x; i++) {
   //do stuff
}

vs.

 

List<object> objects = Lookup.GetObjects().Where(o => o.ID == 0).ToList();

for (int i = 0; i < objects.Count; i++) {
   //do stuff
}


OR

List<object> objects = Lookup.GetObjects().Where(o => o.ID == 0).ToList();

foreach (object o in objects) {
   //do stuff
}


For incrementing and loops.. if I need to start at 1.. I will set it to 1

int count = 1;
while (count++ < 10) {
}

Though, I rarely would do that.  However now that I am doing a lot of work with 3d and parsing files I rely on the way x++ works ie)

 

int offset = 0;
while (true) {
   for (int i = 0; i < 3; i++) {
     myVar[i] = val[offset][i++]; //i = 0; offset = 0
     myVar[i] = val[offset][i++]; //i = 1; offset = 0
     myVar[i] = val[offset++][i]; //i = 2; offset = 0 (next iteration it would be 1)
  }
}

I like clean code, done in few lines.  I know I could do:   i, i+1, i+2 etc but I can do it with an i++ and get the same values.  Same with offset++ done in the last line, I could have it outside of the val[] brackets do a val[offset];  offset++; or offset+=1.


  • 0

I haven't been bothering to reply here for a while because I'm too busy to dedicate the time to properly participate in the discussion; I have been keeping an eye on it though, and I do plan to respond at some point when I can find enough free time, but I want to quickly respond to parts of two posts now:
 

Using pre-increment where post-increment is desirable can introduce very nasty bugs into code. Consider the following:

Example #1:

for (int i = 0; i < 5; i++)
{
    printf("Programming is fun\n");
}

Example #2:

int i = 0;
while (i++ < 5)
{
    printf("Programming is fun\n");
}

Two loops, both using post increment. The first loop iterates four times, the second loop iterates five times. To make the loops equivalent, pre-increment is the better choice here.

 

 
Actually, both examples here loop five times. In the for loop, the increment, whether post or pre, occurs at the end of each loop. With the while loop, the incrementation is performed as part of the comparison check used to determine whether to enter the loop each time, and thus is occurring at the start of each loop. To be clear, with the while loop, the original value is being used in the determination of whether to enter the loop, and the incremented value is then used/available within that loop.
 
A post-inc used in the for loop where a pre-inc should really have been used is easy for the compiler to optimise away. With the while loop, a little less so, but surely not too tricky that it wouldn't optimise what you've written here.
 
Switching both to using pre-incrementation would actually break things. It would make no difference to the for loop, which would loop five times as before, but the while loop would only loop four times. That is unless you also changed the evaluation from < to <=. I would suggest one of the following as a better way to write that while loop:

// A
// (i within loop has value 1->5)
int i = 0;
while (++i <= 5)
{
    printf("Programming is fun\n");
}

// B
// (i within loop has value 0->4)
int i = 0;
while (i < 5)
{
    printf("Programming is fun\n");
    ++i;
}

// C
// (i within loop has value 1->5)
int i = 1;
while (i <= 5)
{
    printf("Programming is fun\n");
    ++i;
}

All three of these provide an optimal implementation that does not rely on compiler optimisation.
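
The iteration counts claimed above are easy to check. Here is a small C++ sketch (function names made up for illustration) that counts how many times the body runs for each form of the while condition:

```cpp
// Each function returns how many times the loop body runs.

// while (i++ < 5): compares the old value, so 0..4 pass the test.
inline int while_post_less() {
    int i = 0, runs = 0;
    while (i++ < 5) ++runs;
    return runs;
}

// while (++i < 5): compares the new value, so only 1..4 pass the test.
inline int while_pre_less() {
    int i = 0, runs = 0;
    while (++i < 5) ++runs;
    return runs;
}

// while (++i <= 5): compares the new value, 1..5 pass the test.
inline int while_pre_less_equal() {
    int i = 0, runs = 0;
    while (++i <= 5) ++runs;
    return runs;
}
```

The post-increment form runs five times, swapping in pre-increment without touching the comparison drops it to four, and changing `<` to `<=` brings it back to five.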
 

I personally prefer post-increment: 9 times out of 10 I am just incrementing a value, and if I need to start at 1 I will initialise it to 1 and post-increment. Otherwise, the increment coming afterwards doesn't really affect anything.

I do my best to optimize code as I go such as..
 

List<object> objects = Lookup.GetObjects().Where(o => o.ID == 0).ToList();

for (int i = 0, x = objects.Count; i < x; i++) {
   //do stuff
}
<snip>

 

I'm not following your logic and argument for your use of post-inc here. Understand that the entire increment operation is performed at the end of each loop, whether using post or pre. The use of post-inc does not tell the compiler to delay incrementation until after each loop, and pre-inc to do it at the start. It is performed at the end of each loop in both cases!!! Take the following two loops (identical except one uses pre-inc and one post):

// A
for (int i = 0; i < 5; i++) {
   //do stuff
}
// B
for (int i = 0; i < 5; ++i) {
   //do stuff
}

 Both of these have identical outcomes and provide the same value of i within each loop. The only difference is that the pre-increment is optimal without relying on compiler optimisation.
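
If in doubt, record what the body actually sees. A minimal C++ sketch (hypothetical function names) that collects the value of `i` on every pass for both forms:

```cpp
#include <vector>

// Values of i seen by the body when the for loop uses post-increment.
inline std::vector<int> seen_with_post() {
    std::vector<int> seen;
    for (int i = 0; i < 5; i++) seen.push_back(i);
    return seen;
}

// Values of i seen by the body when the for loop uses pre-increment.
inline std::vector<int> seen_with_pre() {
    std::vector<int> seen;
    for (int i = 0; i < 5; ++i) seen.push_back(i);
    return seen;
}
```

Both return 0, 1, 2, 3, 4, because the third clause of a for loop runs after the body either way and its value is discarded.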
 

However now that I am doing a lot of work with 3d and parsing files I rely on the way x++ works ie)
 

int offset = 0;
while (true) {
   for (int i = 0; i < 3; i++) {
     myVar[i] = val[offset][i++]; //i = 0; offset = 0
     myVar[i] = val[offset][i++]; //i = 1; offset = 0
     myVar[i] = val[offset++][i]; //i = 2; offset = 0 (next iteration it would be 1)
  }
}
I like clean code, done in few lines.  I know I could do:   i, i+1, i+2 etc but I can do it with an i++ and get the same values.  Same with offset++ done in the last line, I could have it outside of the val[] brackets do a val[offset];  offset++; or offset+=1.

 

 
I also like clean efficient code. This code is not that (without relying on compiler optimisation). This would be a little nicer I think:
 

int offset = 0;
while (true) {
   for (int i = 0; i < 3; ++offset) {
     myVar[i] = val[offset][i]; //i = 0; offset = 0
     myVar[++i] = val[offset][i]; //i = 1; offset = 0
     myVar[++i] = val[offset][i]; //i = 2; offset = 0 (next iteration it would be 1)
  }
}

(Though I'm confused about why you're repeating the three lines within the loop like that; perhaps just a badly put together example? Thus the following would be best)

int offset = 0;
while (true) {
   for (int i = 0; i < 3; ++i, ++offset) {
     myVar[i] = val[offset][i];
  }
}
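
One caveat on the `myVar[++i] = val[offset][i]` style: the result depends on the language's evaluation-order rules, and in C++ before C++17 modifying and reading `i` in the same assignment like that is undefined behaviour. Assuming the intent is "copy three components per row, then move to the next row", a C++ sketch that keeps every increment in a loop header might look like this. `Vec3` and `flatten` are hypothetical names, and collecting into a flat vector is my own choice to keep the example bounded and checkable.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;   // hypothetical row type: three components

// Copies every row of val into a flat output, three components per row.
inline std::vector<float> flatten(const std::vector<Vec3>& val) {
    std::vector<float> out;
    out.reserve(val.size() * 3);
    for (std::size_t offset = 0; offset < val.size(); ++offset) {   // one row per pass
        for (std::size_t i = 0; i < 3; ++i) {                        // three components
            out.push_back(val[offset][i]);
        }
    }
    return out;
}
```

Here `offset` advances once per row rather than once per component, and the outer loop stops at the end of the data instead of running under `while (true)`.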

  • 0

I personally prefer post-increment: 9 times out of 10 I am just incrementing a value, and if I need to start at 1 I will initialise it to 1 and post-increment. Otherwise, the increment coming afterwards doesn't really affect anything.

I do my best to optimize code as I go such as..

 

List<object> objects = Lookup.GetObjects().Where(o => o.ID == 0).ToList();

for (int i = 0, x = objects.Count; i < x; i++) {
   //do stuff
}

vs.

 

List<object> objects = Lookup.GetObjects().Where(o => o.ID == 0).ToList();

for (int i = 0; i < objects.Count; i++) {
   //do stuff
}

There will be no difference between those cases, the compiler is smart enough to realize it only needs to evaluate the Count property once. And the second sample is easier and cleaner to read.

 

Furthermore, when looping through IEnumerables in .NET it's always cleaner (imho) to use a foreach loop.

 

foreach (var o in objects)
{
    //do stuff
}

 

So, I really don't think your examples are good in any way

 

Edit: in fact, if you're into functional programming there is no need for the intermediate objects variable or the for loop at all. Use lambdas ;)


  • 0

I'm not following your logic and argument for your use of post-inc here. Understand that the entire increment operation is performed at the end of each loop, whether using post or pre. The use of post-inc does not tell the compiler to delay incrementation until after each loop, and pre-inc to do it at the start. It is performed at the end of each loop in both cases!!! Take the following two loops (identical except one uses pre-inc and one post):

 

 

It wasn't meant to showcase pre- or post-increment, more just that I do try to watch what I code and make it as clean as I can.

 

 

 

 

I also like clean efficient code. This code is not that (without relying on compiler optimisation). This would be a little nicer I think:

int offset = 0;
while (true) {
    for (int i = 0; i < 3; ++offset) {
        myVar[i] = val[offset][i]; //i = 0; offset = 0
        myVar[++i] = val[offset][i]; //i = 1; offset = 0
        myVar[++i] = val[offset][i]; //i = 2; offset = 0 (next iteration it would be 1)
    }
}

(Though I'm confused about why you're repeating the three lines within the loop like that; perhaps just a badly put together example? Thus the following would be best)

 

int offset = 0;
while (true) {
    for (int i = 0; i < 3; ++i, ++offset) {
        myVar[i] = val[offset][i];
    }
}

They were just roughly written examples, not actual snippets of code.  More just to show that sometimes ++var works, but other times var++ is required.

 

There will be no difference between those cases, the compiler is smart enough to realize it only needs to evaluate the Count property once. And the second sample is easier and cleaner to read.

 

Furthermore, when looping through IEnumerables in .NET it's always cleaner (imho) to use a foreach loop.

 

foreach (var o in objects)
{
    //do stuff
}

 

So, I really don't think your examples are good in any way

 

Interesting, I had always read that:

a) you shouldn't constantly call properties of objects in the loop condition, but store the value in a variable instead.  I do the same in C++; not sure if maybe it compiles differently.

b) foreach loops create extra overhead, and for (x,y,z) loops are better to use.

 


  • 0

So just did a quick test with loops in C#.  

Looks like for loops get changed into while loops.  So
 

public void loopA()
{
    for (int i = 0, x = l.Count; i < x; i++)
    {
        Console.WriteLine(l[i]);
    }
}

And

public void loopB()
{
    for (int i = 0, x = l.Count; i < x; i++)
    {
        Console.WriteLine(l[i]);
    }
}

Both became

 

// Tester.Form1
public void loopA()
{
   int i = 0;
   int count = this.l.Count;
   while (i < count)
   {
       Console.WriteLine(this.l[i]);
       i++;
   }
}

26 IL lines.

The foreach loop stayed the same (sans variable names) and was 27 IL lines, but gives you an object reference.


  • 0

It wasn't meant to showcase pre- or post-increment, more just that I do try to watch what I code and make it as clean as I can.

 

They were just roughly written examples, not actual snippets of code.  More just to show that sometimes ++var works, but other times var++ is required.

 

 

Interesting, I had always read that:

a) you shouldn't constantly call properties of objects in the loop condition, but store the value in a variable instead.  I do the same in C++; not sure if maybe it compiles differently.

b) foreach loops create extra overhead, and for (x,y,z) loops are better to use.

 

 

a) no, every compiler worthy of the name will optimize your for condition

b) see: http://www.dotnetperls.com/for-foreach. Concerning this, if you are using a very high-level language/framework like C#/.NET, the difference between for and foreach will be too small in the bigger picture anyway. Make your code readable and easy to follow instead of being concerned about 2 or 4 stack variables. Only when there really is a need for micro-management of performance (and there seldom is one) should you start worrying about it. And even then, there are so many things you can do before even getting to for vs. foreach, post vs. pre, etc.


  • 0

I haven't been bothering to reply here for a while because I'm too busy to dedicate the time to properly participate in the discussion; I have been keeping an eye on it though, and I do plan to respond at some point when I can find enough free time, but I want to quickly respond to parts of two posts now:

 

 

Actually, both examples here loop five times. In the for loop, the increment, whether post or pre, occurs at the end of each loop. With the while loop, the incrementation is performed as part of the comparison check used to determine whether to enter the loop each time, and thus is occurring at the start of each loop. To be clear, with the while loop, the original value is being used in the determination of whether to enter the loop, and the incremented value is then used/available within that loop.

 

A post-inc used in the for loop where a pre-inc should really have been used is easy for the compiler to optimise away. With the while loop, a little less so, but surely not too tricky that it wouldn't optimise what you've written here.

 

Switching both to using pre-incrementation would actually break things. It would make no difference to the for loop, which would loop five times as before, but the while loop would only loop four times. That is unless you also changed the evaluation from < to <=. I would suggest one of the following as a better way to write that while loop:

You're absolutely correct, I can't believe I missed that. I might have to hand in my qualifications over that blunder :p


  • 0

 

So just did a quick test with loops in C#.  

Looks like for loops get changed into while loops.

Actually in IL there are just branches (gotos), so everything gets translated into gotos. The while loop you see is just how your decompiler interpreted a particular branch in the IL.
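
To picture what "just branches" means, here is a C++ sketch of a for loop next to a hand-lowered version using goto. This is only an illustration of the general shape (jump to the test, then body, increment, test again); it is not the IL or assembly any particular compiler emits, and the function names are made up.

```cpp
// Structured form.
inline int sum_with_for(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += i;
    return total;
}

// Hand-lowered form: jump to the test first, then body / increment / test.
inline int sum_with_goto(int n) {
    int total = 0;
    int i = 0;
    goto test;
body:
    total += i;
    ++i;
test:
    if (i < n) goto body;
    return total;
}
```

Both compute the same sum for any `n`; a decompiler looking at the second shape is free to show it back as a for loop or a while loop.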


  • 0

It's clear to me from this thread that there are two distinct use cases of the postfix operator.

1) In isolation and without being part of a larger expression. Such as:

i++;

or

for ( i = 0; i < 10; i++ )

In this particular use case, the prefix and postfix operators are interchangeable. That is to say, they perform and behave identically. They only serve to increment/decrement the operand (i). Any side effects such as returning the original value in the case of the postfix don't apply.

Now that we have established that both forms are the same for this use case, which should we prefer? That answer comes down to consistency, with one's own code, that of others, and the standardised literature. Above all else, the code should be easy for others to understand and determine intent. "Why is the programmer using this operator?" should be the thinking. If that question can't be answered simply and precisely, then it's probably for the wrong reasons. Premature optimisation is one such misguided reason. Often the result of the programmer 'trying to be clever', it leads to non-standard code whose intent is difficult to decipher. In some cases it even impedes performance because it's trying to second guess the compiler. Moreover, it can become habitual.

 

The comment which started this thread is an example of this. It was suggested that someone's code could be 'optimised' by substituting a prefix (++i) in place of a postfix (i++) operator for an integral datatype. That is, in my mind, without question premature optimisation.

2) The second use case is based on intent. This is where the programmer specifically avails himself of the feature of the postfix that allows him to save the original value, or in the case of a pointer, dereference it, while still incrementing or decrementing the operand. The desired behaviour only works as part of a larger expression. For example:
 

PRIVATE char *
fs_fgetln ( FILE *stream )
{
    static char     *buf = NULL;
    static size_t   size = FGET_SIZE;
    size_t          tmp;
    int             ch;     /* int, not char, so EOF stays distinct from valid bytes */
    char            *c;

    if ( !buf )
        buf = malloc ( FGET_SIZE );

    c   = buf;
    ch  = fgetc ( stream );

    while ( '\n' != ch && EOF != ch ) {

        /* grow when there is no room for this character plus the final '\0' */
        if ( size < ( size_t ) ( c - buf ) + 2 ) {

            tmp     = c - buf;
            size    += FGET_SIZE;
            buf     = realloc ( buf, size );
            c       = buf + tmp;
        }

        *c++ = ch;  /* increment operand after making use of its original value */
        ch   = fgetc ( stream );
    }

    *c = '\0';

    return '\n' == ch ? buf : NULL;
}

In the situation above, a prefix operator wouldn't give the desired effect.
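
To show why, here is a small C++ sketch of the same write pattern with both operators. The function names and the `'.'` filler are made up for illustration. With `*c++ = ch` the first slot is written and then the pointer moves; with `*++c = ch` the pointer moves first, so the first slot is skipped.

```cpp
#include <string>

// Writes src with *c++ = ch (postfix): slot 0 is filled first.
inline std::string fill_postfix(const std::string& src) {
    std::string buf(src.size() + 1, '.');   // '.' marks untouched slots
    char* c = &buf[0];
    for (char ch : src) *c++ = ch;
    return buf;
}

// Same loop with *++c = ch (prefix): the pointer moves first, slot 0 is skipped.
inline std::string fill_prefix(const std::string& src) {
    std::string buf(src.size() + 1, '.');   // one spare slot keeps the writes in bounds
    char* c = &buf[0];
    for (char ch : src) *++c = ch;
    return buf;
}
```

For the input "abc", the postfix version yields "abc." and the prefix version yields ".abc".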

So in summary, by itself, the postfix operator will simply function as an increment/decrement statement. And as part of a larger expression, intent should dictate its use. That is to say, if the programmer wishes to preserve the original value as the above example illustrates, then he should use the postfix.


  • 0

Actually in IL there are just branches (gotos), so everything gets translated into gotos. The while loop you see is just how your decompiler interpreted a particular branch in the IL.

From what I can tell, that's what all loops get broken down to - jmp .L3, repeat [ cmp, jmp .L2 ]. Whether it's assembly or some kind of byte code for a virtual machine. Which kind of makes the hysteria surrounding the goto statement seem irrational :laugh:


  • 0

From what I can tell, that's what all loops get broken down to - jmp .L3, repeat [ cmp, jmp .L2 ]. Whether it's assembly or some kind of byte code for a virtual machine. Which kind of makes the hysteria surrounding the goto statement seem irrational :laugh:

Well ultimately it's all 1s and 0s, that doesn't mean trying to code in binary is a good idea. ;)
