Why branches are bad




















There are conditions that are mostly true and there are conditions that are mostly false. There are also conditions that have equal chances of being true or false. These are the conditions where the optimization potential is hidden.

Second thing, we will use a term computational intensive , expensive or heavy condition. This term can actually mean two things: 1 it takes a lot of instruction to calculate it or 2 the data needed to calculate it is not in the cache and therefore a single instruction takes a lot of time to finish.

If we access the memory in a random fashion 2 , the data will probably not be in the cache and this will cause pipeline stalls and lower performance. Now moving on to programming tips. Here are several techniques to make your program run faster by rewriting the critical parts of your program. Note however that these tips can also make your programs run slower and this will depend on 1 does your CPU support branch prediction and 2 does your CPU have to wait for data from the memory.

Therefore, measure! Similarly, in case of cond1 cond2 , if cond1 is true, cond2 will not get evaluated. So, in case you have two conditions out of which one is cheap to evaluate and the other is expensive to evaluate, put the cheap one first and expensive one second. For example:. In that case it would be most logical to rearrange the above code like this:. Lookup tables LUT are occasionally handy when it comes to removing branches. Unfortunately, in a switch statement, branches are easy to predict most of the time, so this optimization might turn out not to have any effect.

Nevertheless, here it is:. Often the compilers can do this work for you, by replacing a switch with a lookup table. However, there is no guarantee this happens and you would need to take a look at the compiler vectorization report. There is a GNU language extension called computed labels that allows you to implement look-up tables using labels stored in an array. It is very useful to implement parsers. Here is how it looks for our example:. In case you are using switch command and one case seems to be most common, you can move it out of the switch and give it a special treatment.

Continuing on the example from the previous section:. How does the compiler do that? Take the following function as an example:. This sequence would translate to following pseudoassembler:. What you have here are two difficult to predict branches. The same story applies for operator and its twin operator. You can normalize bool variable by applying!! When annotated like this, the compiler will rearrange the instructions in if and else branches in order to most optimally use the underlying hardware.

Please make sure that the condition probabilities are correct, otherwise you can expect performance degradation. Some algorithms which are naturally expressed with branches can be converted to branchless algorithms.

For example, a function abs bellow uses a trick to calculate the absolute value of a number. Can you guess what trick is?

There is a whole bunch of branchless algorithms and the list is carefully maintained on site Bit Twiddling Hacks. God bless them! Many CPUs have support for conditional move instructions that can be used to remove branches.

Here is an example:. The compiler should recognize that the command on line 2 can be written as a conditional load to the variable x and emit conditional move instruction. Unfortunately, the compilers have their own internal logic on when to emit conditional branches which is not always as the developer expects. However, you can use inline assembly to force the conditional load more on this later. Please note that the branchless version does more work.

The variable x is increased regardless if the branch is taken or not. Addition is a cheap operation, but for other expensive operations like division this kind of optimizations might be bad for performance. There is a way to go branchless by cleverly using arithmetic operations. Example of conditional increment:. All of the above examples use arithmetic to avoid branches.

In case you are writing software that needs to be high-performance, you should definitely have a look at data oriented design principles. Here is one of the recommendations that applies to branches. Say you have a class called animation which can be visible or hidden. Processing a visible animation is quite different from processing a hidden one.

The branch predictor can really have a hard time processing the above code unless the animations are sorted according to visibility. There are two approaches to solve this. If a boolean is passed to the function and it is used inside the function as a parameter, you can remove it by passing it as a template parameter. To remove the evaluation, pass the parameter as a template parameter instead of a function parameter.

The branches have completely disappeared, and the code in the unused branches is gone as well. This is in fact a compiler optimization called branch optimization. However, the version with templates guarantees this, which is not the case with the original version. The compilers can often do this optimization for you. This optimization is called loop invariant code motion and you can learn more about it in our post about loop optimizations.

Using templates guarantees that this optimization always happens. If you are checking an unchangeable condition several times in your code, you might achieve better performance by checking it once and then doing some code copying.

So, in the following example, two branches can be replaced with one branch. You could also introduce a two element array, one to keep the results when the condition is true, the other to keep results when the condition is false. An example:. Like what you are reading?

And this is just a small portion of our git history from last week, when we started using pull requests. Looks like the information superhighway! There are some things we really like about pull requests, though. They give us a nice mechanism for code review and agile QA before code is merged to master. But another disadvantage of pull requests can be that this workflow promotes long-lived feature branches , unless you take specific steps to combat them.

When the Core App team discussed this issue, there was a lot of disagreement as to whether long-lived feature branches were a good idea or a bad one. Long-lived feature branches hide your work from the rest of your team. Think of your work as delta off of master. The size of the delta increases as your branch incorporates more and more work. As the size of your team grows, the amount of work hidden from each other increases.

And the chances that your assumptions about the state of the code hold true decreases the more you use long-lived feature branches. You can argue this makes your code more complex. When you merge your code more frequently to master, the pain of integration happens at the beginning instead of the end of your work. You can find issues faster and fix them at the earliest possible moment. This gives you time to react and communicate with colleagues, saving you a lot of pain and time.

You take smaller steps, which generally breaks less and leads to more stable development. And when you do break something, you can find it sooner and fix it faster, instead of waiting days or weeks. In general, being forced to take smaller steps is a good thing. A large part of our craft is about breaking work down into manageable chunks. There are times you have to break code to make it work in a new way.

Generally, this is less common if you challenge whether breaking it is really necessary. I don't see value in either of these states. The idea that you're committing half-baked features also points to a development flaw. Why would you do this? A limited feature is fine, but something that is admittedly "half-baked" shouldn't be in shipping code. Your whole team will love this: you can activate the code on any environment at any time to see how it looks or performs. Testers can prepare to test early on.

Product owners can comment on your work along the way. It is all live and easy to access for everyone! If your work is just started, this provides little value.

But evil lies in the details. Another thing that comes for free: you can turn the feature off in production if a problem arises after deployment.

After a few days or weeks, once the feature runs smoothly, just remove the toggle from the code. It is In pure Trunk Based Development the feedback will come after the merge and has to be fixed right away. If you are using short-lived branches the merge should be blocked by your CI tool. A short-lived branch is something that should last 1 or 2 days max and carry a consistent piece of code that contributes to the feature you are building.

Code reviews do not need feature branches though. If the code review culture is strong in your team then it can very well be done on the commit to the main branch. The reviewer would stop by the author of the commit and discuss what needs to be fixed.

The fix would come in another commit. Even better, have the code review together before pushing the commit in the first place. If you have a given branch per feature then it is easy to track code back to your agile board. You can navigate from a task to the branch that implements it.

It sounds cool, while in reality, this is useless! How many people can you have on an agile team? Up to 5? Up to 10?



0コメント

  • 1000 / 1000