Tuesday, May 29, 2007

Myth - Defining loop variables inside the loop is bad for performance

I like to define loop variables inside the loop on the line where the assignment is happening. For example if I was iterating through a String collection I would do it like below

private void test() {
for (Iterator iter = list.iterator(); iter.hasNext();) {
String str = (String) iter.next();
System.out.println(str);
}
}


The other day a colleague of mine was doing a peer review and he said this was bad and I should have defined the String variable str outside the loop. What he suggested was something like below.

private void test() {
String str;
for (Iterator iter = list.iterator(); iter.hasNext();) {
str = (String) iter.next();
System.out.println(str);
}
}

When I questioned the logic behind it he said something like:

If you define str inside the loop it will get initialized in each iteration and that’s bad for performance.

Or something like that… and so the argument started. I mean str is a frigging variable not an object that would get initialized or anything. The compiler wouldn’t go allocating memory to str in each iteration and my belief was that it didn’t matter whether you declare str inside the loop or outside the loop. The compiler wouldn’t care and would generate the same byte code in either case.



I did some googling around and came across this blog which discusses the same thing though his preference was to declare the variable outside the loop. The first example was very similar to what I had in mind but it was with primitives. The second example with StringBuffer is a totally different story from mine.

So I had to do my own test, I wrote another class with the str declared outside the loop and compiled them both. Then using javap utility that comes with the JDK printed the byte code for both implementations (ie. javap –c –private <classname>)


Byte Code genarated when the variable is declared inside the loop:

private void test();
Code:
0: aload_0
1: getfield #15;
4: invokevirtual #22;
7: astore_1
8: goto 28
11: aload_1
12: invokeinterface #26,
17: checkcast #32;
20: astore_2
21: getstatic #34;
24: aload_2
25: invokevirtual #40;
28: aload_1
29: invokeinterface #46,
34: ifne 11
37: return
}


Byte Code genarated when the variable is declared outside the loop:

private void test();
Code:
0: aload_0
1: getfield #15;
4: invokevirtual #22;
7: astore_2
8: goto 28
11: aload_2
12: invokeinterface #26,
17: checkcast #32;
20: astore_1
21: getstatic #34;
24: aload_1
25: invokevirtual #40;
28: aload_2
29: invokeinterface #46,
34: ifne 11
37: return
}

Basically it’s the same byte code except the variable numbers are switched (ie. line 11 aload_1 become aload_2). So it really doesn't make a difference and where you declare the variable is a just a matter of taste.


Don' forget to click the +1 button below if this post was helpful.

34 comments:

Jay said...

I think you are only looking at the letter of the law, and ignoring the the spirit...

In reality, you can declare the variable inside or outside or a loop, and your code might perform just the same (compiler optimizations). However, it's a "better practice" to declare variables outside of a loop.

Some junior devs would take what you are saying to mean declaring variables within a loop is fine (regardless of it being a String or a more complex object).

bchoi said...

Why is it "better practice". I would contend that it's not better due to scope bleeding.

Given that there's no performance benifit, I would place more weightage on readibility.

Jason said...

Presuming no difference in performance, I would argue that it's better practice to declare your variables in as narrow a scope as possible. It looks like better practice to me to declare the variable w/i the loop itself. Plus, if I'm reading the code later it saves me from hunting outside the loop to find the declaration.

Master of Nothing said...

Jason and bchoi makes a very valid point about the scope of the variable. If it is meant to be used exclusively within the loop, then it should be declared inside the loop.

As for junior developers, they should understand what is happening under the covers. It's better for them to make mistakes and learn from it rather than blindly following the 'best practices'. At least thats what I think :)

Cheers...

Anonymous said...

Good for you to decide to look at the byte code instead of running a microbenchmark. This is the best measurement you could have taken to clearly demonstrate that there is no performance penalty for this particular coding style.

As you can see in the byte code, the compiler hoists the declaration to avoid the need to keep reallocating the place holder.

Kirk,
www.javaperformancetuning.com

Java Donkey said...

I always thought that by declaring the String variable (which IS an object) inside the loop, you are essentially creating a new object each time through the loop which is a performance hit??? I guess I am wrong.

Abdul Habra said...

What confuses many people is that just declaring a variable does not actually creates an object of that variable. For example

String str;

Does not allocate or create a string object. The object is created when the variable is assigned a new value. For example:

String str;
...
str= "hello"; // object is created now.

Good job Master Of Nothing :)

Anonymous said...

jvm code optimization for the loop... it optimizeds for String and probably the methods for String. What happens when you have a more complex Object doing more complex things? what happens when you have inner loops? What do you think the jvm will do?

Chris said...

Jay However, it's a "better practice" to declare variables outside of a loop.

I'm with the others asking why is this a good practice. To me it is a lot more confusing -- I have to look elsewhere to see where "str"'s type is defined. In this case it isn't a big deal as you can see the case, but with generics it gets a little confusing.

Plus having the variable defined outside the loop makes me think you want to do something with it outside of there. After you exit the loop do you need str? Should you even have the opportunity to get it?

Mike said...

A String is different. Its immutable, so it will get created again weather its inside or out. Like you said, a StringBuffer would be totally different. int, long, and any other primitive would act like the String. (I'm not saying a String is a primitive)

Anonymous said...

Variables should only exist within the scope that they are used. That is the best practice. Your colleague is wrong.

As for performance impact, Abdul Habra is absolutely correct.

Anonymous said...

There is a theory of optimization called linear optimization that states that even in simple problems, trying to optimize too much simply creates a problem with no solutions, while trying to optimize too little creates a problem with an infinite number of solutions.

If a program is created so that you try to optimize every snippet of code, you end up with a rigid system that is impossible to stabilize (solve bugs).

You first need to functionally achieve every little functionality in the spec, and then and only then, measure its performance.

If the performance is good or acceptable (under 1 sec for every operation), leave as it is.

If the performance is not acceptable, optimize, using standard optimization techniques like buffers and caches, etc.

Along the way to optimization nirvana, at every little step make sure you are not breaking any functionality and that the performance improvements are relevant (10x, 2x, etc.). If the performance improvements are 0.02%, better remove that optimization and apply one that makes an impact.

Anonymous said...

Hi everyone, I know this is a little bit late, but hopefully it will help future lookers (this was the top result on a goolge search for "java defining vs declaring variables").

Anyway, in Java there is what is known as an internal "garbage collector" which basically gets rid of memory not being used. This happens when an objects reference loses scope and nothing else holds its reference.

What this means is that declaring String str before the loop results in better performance (Your test was just a "best case" scenario, with more complicated objects, methods, or loops, it will more than likely fail), after your clever little loop is finished executing, you still have a reference to that string object, which means our fancy JVM doesn't do any cleaning up for us.

The optimum solution to your problem, if you're not intending to use str after your loop and the loop isn't the last statement in the current scope, then you should place it in another scope, like so:

{ // notice the beginning block of code creating an inner scope
String str;
for (Iterator iter = list.iterator(); iter.hasNext();)
{
str = (String) iter.next(); System.out.println(str);
}
} // end of inner scope
// The rest of your code


I Hope this has been enlightening. Thanks for reading.

gag_72 said...

Somehow, I still don't find 2nd one better than the first. As far as gc is concerned, in both cases the object will not be collected till base data structure of iterator goes out of scope.In the 2nd case since scope for str is bigger, so
GC will collect it little later(although in this case it does not matter considering today's garbage collection algorithms). Please help me understand if I'm wrong, as I think objects are garbage collected.t
First code snippet
for (Iterator iter = list.iterator(); iter.hasNext();) {
String str = (String) iter.next();
System.out.println(str);
}

2nd Code Snippet
{ // notice the beginning block of code creating an inner scope
String str;
for (Iterator iter = list.iterator(); iter.hasNext();)
{
str = (String) iter.next(); System.out.println(str);
}
} // end of inner scope
// The rest of your code

http://eternalcrib.blogspot.com/

bytor99999 said...

String should not be the example. Why because Strings do a whole different thing when it comes to creating objects and the heap. String have their own "heap" so to speak.

The law is if you have an object that stays the same value. then declare it outside the loop, instead of creating many instances in the Heap, which causes performance hit when the heap needs to be constantly garbage collected.

So

OtherObject other = new OtherObject();
for (MyObject a : someCollection) {
a.setOtherObject(otherObject);
}

Now if I had
OtherObject other = new OtherObject();

Inside the loop and the loop had to run 100,000 times, then I would have 100,000 OtherObjects in the heap. (which obviously would fill up the heap, and in a short time most likely cause a "stop the world" garbage collection. Which is a big performance hit.

Whereas outside the loop I have 1 OtherObject in the heap.

Sudhan said...

There is definitely some difference with the scope of reference variable.

WeakReference<SomeObject> ref = new WeakReference<SomeObject>(new SomeObject());

case 1
for(;;)
{
//some code
Object hardRef = ref.get();
if(hardRef == null)
{
break;
}
}

case 2
Object hardRef;
for(;;)
{
//some code
hardRef = ref.get();
if(hardRef == null)
{
break;
}
}

In case 1, SomeObject can get GCed in any iteration of loop while "some code" is being executed.

In case 2, SomeObject can get GCed only in first iteration while "some code" is being executed. It will never be GCed once hardRef gets initialized.

Matt said...

What gag_72 said is technically incorrect. You would think that the object referenced by a local variable would become available for garbage collection (assuming no other references) as soon as the local variable goes out of scope, but in reality, the JVM does not release local references until the stack frame is popped (i.e., the method returns or throws an exception). Placing a variable declaration inside a scope block doesn't change the stack frame. If you want to explicitly release the reference to the object prior to returning from the method, you should assign null to the variable before the variable goes out of scope.

Anonymous said...

It seems like good practice, until you switch to another language, and you realize you've run out of memory because you allocated a string 1000 times, Or sun decides to change the compiler slightly.

Bottom line is that a majority of languages want the variable declared outside the loop so you don't waste memory. If you do it like this in java it's fine, but when you get a new job working with C/C++ i can assure you it is not fine.

Anonymous said...

The SMALLER the scope, the better. Declare the variable inside the loop.

Anonymous said...

RE: Increasing scope due to the "garbage collector" running.


This is not a useful argument, because the garbage collector does not run in realtime (i.e., after each loop iteration).

Therefore, as was stated concisely above:

Variables should only exist within the scope that they are used (in this case, declared within the loop).

gregturn said...

Don't code for performance. Code for functionality. Then, if there is a performance issue, use proper tools and tactics to find what needs to be improved. Ideas like "if you had run this 100,000 times, you would have run out of memory" may not apply to the business case. Oftentimes, code performance issues arise in the strangest of places. And also, JITs make performance advise age very quickly. If you are making something very fast, but it only involves 1% of total performance costs, you aren't going to get much leverage.

Anonymous said...

I always define my variables outside the loop because I don't want to depend on the compiler to do the right thing. Since I find myself programming in several different languages I just don't want to have to remember which ones will or won't do the right thing.

If I only coded java or if I knew with certainty that all languages and compilers would optimize the declaration I would definitely define within in the loop to minimize scope.

Just my 2 cents...

Anonymous said...

Why not use the advanced for loop as the example. That will declare the variable with in the loop.

sevencardz said...

I advocate eliminating the extra declaration altogether if possible:

private void test() {
for(String str: list)
System.out.println(str);
}

OR if you're pre Java 1.5:

private void test() {
for(Iterator iter = list.iterator(); iter.hasNext();)
System.out.println(iter.next());
}

But say you want to do some series of complex operations before you printed each String and you're pre-Tiger so you can't use the fancy for/in loop and you want to avoid calling each method in the same line for better readability and/or testing purposes (whew - catches breath).

private void test() {

String str;

for(Iterator iter = list.iterator(); iter.hasNext(); ) {

str = (String)iter.hasNext();

str = complexOperation1(str);
str = complexOperation2(str);
str = complexOperation3(str);
str = complexOperation4(str);

System.out.println(str);

}

str = null; // unnecessary unless there is more code to follow ...
}

I would agree with what the veteran coders and most practices advocate: declare the variable OUTSIDE of the loop rather than inside to ensure it is not re-declared each iteration of the loop despite the fact that Java may optimize for this. You can not be sure this is the case absolutely everywhere your code is being implemented and I am sure we are all trying to write open-source code that is helpful to everyone and not just ourselves. :-D I would also argue that since the compiler hoists the declaration anyway, it would seem better practice to go ahead and write it that way.

Though this does cause concern for the scope of at least the last iteration through the list as it leaves the str variable with a reference to the last element in the list which is accessible by any code written after the for() loop. =-O Ironically, in this example it doesn't make a difference because Strings are immutable. But assuming we were using some other type of Object ...
We could make sure that the reference is released by explicitly doing so after the end of the loop but in this case it is really unnecessary as the scope of the entire method ends directly afterward.

Even if the code logic did not end after the loop, it would probably have something to do with accessing the list again, where we could probably reuse the str variable, or it might use the value stored in str for something. Either way, it holds mostly true that if there were code after the for() loop, it would probably want to have the access to the str variable anyway and if there is no code afterward, we don't have to worry about scope violations.

sevencardz said...

For those paying attention, the line in my previous post should read:

str = (String)iter.next();

Anonymous said...

All you "declare outside" so-called vets are *wrong*.

It's *more* efficient to declare inside the loop, plus the scope makes more sense.

http://stackoverflow.com/questions/407255/difference-between-declaring-variables-before-or-in-loop

If you disagree, post actual test figures, not just your gassed opinion.

Vitorio said...

Doing this in Java is OK as you pointed (compiler optimises it to get the same final code). But, get your results only for Java... If you try this code in a more primitive language like C, you will get surprises with your "bad habits" from Java.

If you are a java coder and will eternally be a Java coder, fine. But if sometime you touch another language less safe, you will get lot of headache compiling and optimising your code. Or it will demand a Core i7 to parse some lines of text...

Anonymous said...

Declaring the variable inside the loop makes sense, in Java or C. All you're doing in either case is allocating space for a pointer, almost certainly in the stack frame, so there's no object creation overhead. In Java, it's 'String str;', in C it's 'char* str;'... no heap allocations are being performed. (There are differences between those two statements of course, but in both cases all you're defining is a pointer and not an object.)

Vilmonts said...

I would prefer declaring a variable inside the loop. As 'str' variable is a pointer inside the stack frame, the higher it is in the stack, the more likely it is to be cached.

Please read 'Stack allocation' chapter at http://www.ibm.com/developerworks/java/library/j-jtp09275.html

Javin Paul said...

This is somewhat related to your personal choice and I agree if it doesn't offer any performance benefit than better stuck with readability.

Javin
How HashMap works in Java

Fatih said...

Yes, for all which argument with gc stuff, just learn the differences between objects and references (heap and stack) first. In the code example, there is NO object created.

Anonymous said...

My 2 cents...
(I realize this is long-dead, but maybe future lurkers may benefit)

In an immutable object (such as a String) there may be no difference in whether the variable is declared inside or outside the loop, since either way a new object will be created when you "change" the value of your variable.

HOWEVER, in mutable objects (such as StringBuffer), declaring and instantiating the variable and object outside the loop would create ONE object which would then be changed by the iteration in your loop. Instantiating the object inside the loop will be making a new object for each iteration (which might be what you want), but is not efficient if you just want to update an instance of an object.

In short, its not really about where the variable declaration occurs, but rather about where the instantiation of the object occurs. If you're going to be creating new objects with each iteration, then it doesn't matter if you declare the variable inside or out. If your goal is NOT to create a new object with each iteration, but rather change the value of a object, then that object needs to be created outside the loop, which means the variable declaration needs to be outside as well.

I personally prefer to declare my variables outside of the loop, that way I don't have to be thinking about what is and is not a mutable object. And I just think it looks cleaner (that's personal style).

Hope this helps others in the future.

Cheers!
Jose

Anonymous said...

System.out cannot be a part of this test, taking it out and then measuring performance is fair

Anonymous said...

It's the reverse these days. It's likely that escape analysis in the compiler would optimise the 'declared and assigned and even 'expensively' created' within the loop better than the intuitive option of creating a single complex object outside the loop first and reusing it.

Any thinking on this problem without tests is really premature optimisation - go for clarity first, make it hard to read only if a demonstrated performance hit is happening.