Learning to optimize with Assembly

  • Learning to optimize with Assembly niktehpui

    I am a second-year Computer Games Technology student. I recently finished the first prototype of my own kind of pathfinder. It doesn't use A*; instead it takes a geometrical/pattern-recognition approach, and it only needs to know about the terrain currently in its view to make decisions, because I wanted an AI that could actually explore. If the terrain is already known, it easily walks the shortest way, because the pathfinder keeps a memory of nodes.

    Anyway, my question is more general: how do I start optimizing algorithms, loops, for_each and so on using Assembly? General tips are welcome, but I am specifically looking for good books, because they are really hard to find on this topic. There are some short articles out there, like this one, but they don't give enough knowledge to optimize an algorithm or a game...

    I hope there is a good modern book out there that I just couldn't find...

  • Usually, solid optimisation doesn't depend on using Assembly, or doing micro-optimisations with code in higher level languages. If you read a lot of research papers (as I do -- or try to!), you'll see that oftentimes the improvements made to algorithms are at a broader conceptual, "qualitative" level, rather than at the more "quantitative" level of micro-optimisation. I would stress that order-of-magnitude gains are more likely to be found by looking at algorithms from this point of view, or from vectorising/parallelising existing solutions.

    Having said that, I recently happened upon this, which may be a good route towards learning x86 ASM specifically for game developers.


    Two sources off the top of my head:

    Additionally, reading research papers is an excellent way to follow the thought processes of the wise as they optimise algorithms for better performance. Most often, gains are seen by:

    • Reducing the use of the most costly operations (div, SQRT, trig ops, and conditionals, primarily);
    • Improving cache performance through use of more efficient data structures, memory alignment, and reduced conditionals;
    • Reducing quality of output in acceptable areas for improved performance;
    • Vectorisation (SIMD);
    • Parallelisation (threading, includes shifting tasks off to the GPU);
    • And, of course (increasingly rarely), hand-coded assembly, after first inspecting the compiler's assembly output for your C/C++ code to see where it is making non-optimal choices. In my experience, you will find more of this in older papers from the 1980s and 1990s.
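    As a small illustration of the first point, here is a minimal C++ sketch of a common trick (the Vec2 type and the function names are hypothetical, made up for this example): when you only need to compare distances, for instance to find the nearest node, compare squared distances and skip std::sqrt entirely, because sqrt does not change the ordering of non-negative values.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2D point type, used only for this illustration.
struct Vec2 {
    float x;
    float y;
};

// Naive version: one std::sqrt per candidate point.
// Assumes `points` is non-empty; returns the index of the nearest point.
std::size_t nearestWithSqrt(const Vec2& from, const std::vector<Vec2>& points) {
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < points.size(); ++i) {
        const float dx = points[i].x - from.x;
        const float dy = points[i].y - from.y;
        const float dist = std::sqrt(dx * dx + dy * dy);
        if (dist < bestDist) {
            bestDist = dist;
            best = i;
        }
    }
    return best;
}

// Cheaper version: sqrt preserves the order of non-negative values, so
// comparing squared distances picks the same point (barring floating-point
// ties) without calling sqrt at all.
std::size_t nearestSquared(const Vec2& from, const std::vector<Vec2>& points) {
    std::size_t best = 0;
    float bestDistSq = std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < points.size(); ++i) {
        const float dx = points[i].x - from.x;
        const float dy = points[i].y - from.y;
        const float distSq = dx * dx + dy * dy;
        if (distSq < bestDistSq) {
            bestDistSq = distSq;
            best = i;
        }
    }
    return best;
}
```

    Take the square root only if you need the actual distance value at the end, and then only once.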

    Reading research also keeps you at the cutting edge of your field, instead of waiting for that knowledge to filter down into the industry.

  • The first tip you'll get is this - don't.

    Modern compilers are actually really, really good at optimizing code, and are much more likely to do a good job of it than any self-rolled assembly language you may write.

    The exception would be a specific case where you have determined for certain that the compiler is doing a bad job of optimizing; that's the second tip. There are no general guidelines here: you need to know your own code, know what it's doing, be able to jump into its disassembly, and be able to determine with absolute certainty that the compiler is doing a bad job.

    Even then, you may still not want to. You need to be certain that it won't create an ongoing maintenance overhead for you. You may want to come back to this code in six months' time and modify part of it, or you may find an extremely subtle bug that is harder to fix in the assembly version. Even if you think you've worked all the bugs out, once your program goes public, bugs you never even thought could happen will become a reality for you. That's quite an eye-opener (and a humbling experience).

    And even if you're happy to accept that, you may still find there is no measurable performance improvement at all, because your main bottleneck could be somewhere completely different in your program. Which brings me back to number 1 again: don't.
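    Whatever you change, measure it before and after. A real profiler is the better tool for finding where the bottleneck actually is, but for a quick comparison of one piece of code, a minimal std::chrono sketch like this one can help (averageNanoseconds is a hypothetical helper, not a library function).

```cpp
#include <chrono>

// Hypothetical timing helper: runs `work` `repetitions` times (must be > 0)
// and returns the average wall-clock time per run in nanoseconds.
// steady_clock is used because it never jumps backwards.
template <typename Work>
double averageNanoseconds(Work&& work, int repetitions) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const auto elapsed = Clock::now() - start;
    return std::chrono::duration<double, std::nano>(elapsed).count() / repetitions;
}
```

    Timings of tiny workloads are noisy, and the optimizer may delete work whose result is never used, so make sure the code you time actually produces something the program observes.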

  • I think it might be too early.

    Anyway, it is important to understand that compiled code is not inherently slower than its assembly equivalent: you gain no performance simply by writing the same assembly code the compiler would have generated.

    To start with, at least, concentrate on assembly-free optimizations. Igor Ostrovsky has a few good articles that demonstrate some of the basics: http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/
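    Here is a minimal C++ sketch of the idea (the function names are hypothetical): summing the values above a threshold with an `if`, versus turning the comparison into a bit mask so there is no data-dependent branch. Whether the branchless form is actually faster depends on your data and your compiler, which may already emit a conditional move for the first version, so check the disassembly and measure.

```cpp
#include <cstdint>
#include <vector>

// Branchy version: on random data the `if` is taken in an unpredictable
// pattern, so the CPU's branch predictor guesses wrong often.
std::int64_t sumAboveBranchy(const std::vector<int>& values, int threshold) {
    std::int64_t sum = 0;
    for (int v : values) {
        if (v >= threshold) {
            sum += v;
        }
    }
    return sum;
}

// Branchless version: the comparison becomes a mask that is either all ones
// (-1) or all zeros, and every element is added through that mask.
// Slightly more arithmetic per element, but no data-dependent branch.
std::int64_t sumAboveBranchless(const std::vector<int>& values, int threshold) {
    std::int64_t sum = 0;
    for (int v : values) {
        const std::int64_t mask = -static_cast<std::int64_t>(v >= threshold);
        sum += mask & v;  // v is sign-extended to 64 bits before the AND
    }
    return sum;
}
```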

    Note that branch mispredictions and cache misses are what you should primarily optimize against. Even if you have to pay for it with some extra arithmetic, it is usually worth it to avoid an unpredictable branch or random reads across too much memory.
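    On the cache side, a classic example is traversal order. In this C++ sketch (assuming a hypothetical grid, say a terrain height map, stored row-major in a single std::vector), both functions compute the same sum, but the first reads memory sequentially while the second strides through it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: a rows x cols grid stored row-major in one contiguous
// vector, so element (r, c) lives at index r * cols + c.

// Cache-friendly: the inner loop walks consecutive addresses, so each cache
// line that gets loaded is fully used before moving on.
std::int64_t sumRowMajor(const std::vector<int>& grid, std::size_t rows, std::size_t cols) {
    std::int64_t sum = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            sum += grid[r * cols + c];
        }
    }
    return sum;
}

// Cache-unfriendly: the inner loop jumps `cols` elements per step, so on a
// large grid consecutive reads tend to land on different cache lines.
std::int64_t sumColumnMajor(const std::vector<int>& grid, std::size_t rows, std::size_t cols) {
    std::int64_t sum = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            sum += grid[r * cols + c];
        }
    }
    return sum;
}
```

    On a grid small enough to fit in cache the two run about the same; the difference shows up once the data is much larger than the cache.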

    And of course, most importantly, optimize your algorithm first. A slow implementation of a fast algorithm will almost always be faster than a fast implementation of a slow algorithm.
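    To make that concrete, here is a minimal C++ sketch with hypothetical function names: checking a list for duplicates. Hand-tuning the quadratic version (even in assembly) only shrinks its constant factor; once n gets large, it cannot keep up with the sort-based version.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n^2): compares every pair of elements.
bool hasDuplicateQuadratic(const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            if (values[i] == values[j]) {
                return true;
            }
        }
    }
    return false;
}

// O(n log n): sort a copy, after which any duplicates must be neighbours.
bool hasDuplicateSorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```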

  • I'll be the one going against the grain here and say that it is never too early to learn about optimizations, especially assembly optimizations and, more importantly, debugging in assembly. I believe you will gain the most from it as a student, because then you have very little to lose (time- or money-wise) and everything to gain.

    If you are in the industry and not tasked with tinkering around in assembly, then don't. Otherwise, if you are a student or have time in general, I would find the time to learn to disassemble programs and see if I can come up with a better solution than the compiler. If I can't, who cares! I just learned how to write code as well as the compiler, and that is a HUGE plus when you are faced with a bug in release code (with no debug symbols) and are staring at the disassembly because that's the only thing you can look at.

    The answer

    This is one of the best resources I have found for learning about optimizations.


    The rant

    If you read some articles by major developers (for example, the reasoning behind the making of EASTL; a closer inspection of its code will lead you to comments like "did this because GCC is terrible at inlining this if statement", which tell you that what the majority of people say, "trust the compiler", is not always right, ESPECIALLY in game development) and then set foot in the industry, you will find that optimizations are an everyday thing and that knowing what the assembly output means is a big plus. Also, people (especially on Stack Overflow) don't seem to realize that profiling games is very hard and not always accurate.

    There is a caveat though. You can spend time optimizing something and later on realize that was time wasted. But what did you learn? You learned not to repeat that same mistake in a similar circumstance.

    What SO is now taking is, in my opinion, a religious stance on the statements "don't optimize until you profile" and "don't worry, the compiler knows better than you". It hinders learning. I know experts in the industry who are paid very good money (and I mean VERY good money) to fiddle around in assembly to optimize and debug games, because the compiler is bad at it or simply cannot help you (GPU-related crashes, crashes where the data involved is impossible to read in a debugger, etc.)!

    What if someone who would love doing that, but hasn't realized it yet, asks the question here, is turned away by the many "the compiler knows better than you!" answers, and never becomes one of those highly paid programmers?

    One final thought. If you start doing this early, you will soon find yourself writing code that, at worst, brings no performance improvement because the compiler optimized it the same way anyway, and, at best, brings some improvement because the compiler can now optimize it better. Either way it has become a habit, and you are no slower at writing code this way than you were before. A couple of examples (there are many more):

    1. Pre-incrementing unless you really want post-increment
    2. Writing loops for containers using a constant local size variable rather than calling size() on the container within the loop.
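    A minimal C++ sketch of both habits (the function names are hypothetical). With optimizations enabled, compilers will usually generate the same code for a std::vector either way, which is exactly the "at worst, no improvement" case; the habits pay off more with class-type iterators and with custom containers whose size() does real work. Caching size() is only correct if the loop does not add or remove elements.

```cpp
#include <cstddef>
#include <vector>

// Multiplies every element in place.
void scaleAll(std::vector<float>& values, float factor) {
    // Habit 2: read size() once into a const local instead of re-evaluating
    // it in the loop condition on every iteration.
    const std::size_t count = values.size();
    // Habit 1: pre-increment. For a plain integer index it makes no
    // difference; for class-type iterators, post-increment must return a
    // copy of the old value.
    for (std::size_t i = 0; i < count; ++i) {
        values[i] *= factor;
    }
}

// The same habits with iterators: end() is fetched once, and ++it avoids
// the temporary copy that it++ would return.
float sumAll(const std::vector<float>& values) {
    float total = 0.0f;
    for (auto it = values.begin(), end = values.end(); it != end; ++it) {
        total += *it;
    }
    return total;
}
```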

  • This book is exceptionally good for a textbook, but it's not specifically geared towards optimization: Assembly Language for x86 Processors, 6th edition.

    It's more about teaching the fundamentals of assembly, using MASM. Towards the end of the book it gets into how to inline assembly in C++ and integrate it into bigger programs.

    I put this up here because it makes sense to learn the fundamentals of assembly before you learn how to optimize programs with it.

    I like this book because Irvine teaches you how to use the tools needed to write MASM programs. He specifically covers how to use the IDE (Visual Studio C++) and the debugger. Each chapter has a few videos dedicated to solving problems. Some of this information is available for free on the website listed.
