Assembly 80x86 - Zeroing a register

  • 26-01-2007 3:30pm
    #1
    Registered Users Posts: 695 ✭✭✭


    Kinda trivial, but at the same time has me wondering.

    Which instruction is faster to zero out a register?

    xor AX,AX
    mov AX,0

    I could check which one is faster on this machine, but that wouldn't tell me in general which instruction takes fewer cycles (I imagine both are very low anyway).

    I see both methods being used all the time, so it makes me wonder: are they really the same, and do some people just prefer one over the other?


Comments

  • Registered Users Posts: 441 ✭✭robfitz


    The xor will be faster. It generates less code, taking up less space in memory and caches. The mov needs to encode the zero in the code, so you get something like this:

    xor ax, ax # 31 c0
    mov ax, 0 # b8 00 00 (mov eax, 0 in 32-bit code is b8 00 00 00 00)


  • Closed Accounts Posts: 1,567 ✭✭✭Martyr


    robfitz wrote:
    The xor will be faster. It generates less code, taking up less space in memory and caches. The mov needs to encode the zero in the code, so you get something like this:

    xor ax, ax # 31 c0
    mov ax, 0 # b8 00 00 (mov eax, 0 in 32-bit code is b8 00 00 00 00)

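    it may interest you to know that on 32-bit and 64-bit cpus, bigger code often runs faster than smaller code, because speed depends on how the instructions decode and execute rather than on their size alone.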

    there are good examples here

    for example, in zeroing a register, you could have
    small but slow:
          push 0
          pop eax
    
    small and fast on pre-p4 systems.
          xor eax,eax
    
    small and fast on all systems.
          sub eax,eax
    
    big, but faster than push/pop
          mov eax,0
    
    big but fast.
          and eax,0
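
To put numbers on "small" and "big", here is a rough C sketch that lists the 32-bit-mode encodings of the five idioms above and models the arithmetic ones. The byte values assume the assembler picks the shortest form of each instruction (some emit 33 C0 instead of 31 C0 for xor, for instance). The names `idioms`, `idiom_size` and `zero_by_*` are made up for this illustration. Size says nothing about speed by itself, which still has to be measured per processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One zeroing idiom: its source text and an assumed 32-bit-mode encoding. */
struct idiom {
    const char *text;
    uint8_t bytes[5];
    size_t len;
};

/* Encodings assume the assembler chooses the shortest available form. */
static const struct idiom idioms[] = {
    { "push 0 / pop eax", { 0x6A, 0x00, 0x58 },             3 },
    { "xor eax, eax",     { 0x31, 0xC0 },                   2 },
    { "sub eax, eax",     { 0x29, 0xC0 },                   2 },
    { "mov eax, 0",       { 0xB8, 0x00, 0x00, 0x00, 0x00 }, 5 },
    { "and eax, 0",       { 0x83, 0xE0, 0x00 },             3 },
};

/* Byte length of the idiom at index i, or 0 if i is out of range. */
size_t idiom_size(size_t i) {
    size_t count = sizeof idioms / sizeof idioms[0];
    return i < count ? idioms[i].len : 0;
}

/* C models of the arithmetic idioms: each yields 0 for any starting value. */
uint32_t zero_by_xor(uint32_t eax) { return eax ^ eax; }
uint32_t zero_by_sub(uint32_t eax) { return eax - eax; }
uint32_t zero_by_and(uint32_t eax) { return eax & 0u; }

/* Spot check: every arithmetic idiom clears an arbitrary value. */
void check_zeroing(void) {
    assert(zero_by_xor(0xDEADBEEFu) == 0u);
    assert(zero_by_sub(0xDEADBEEFu) == 0u);
    assert(zero_by_and(0xDEADBEEFu) == 0u);
}
```

Note that xor, sub and and also modify the flags, while mov and push/pop do not, so they are not always interchangeable in the middle of flag-dependent code.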
    

    and i've seen some asm programmers use the loop instruction, which is small but terribly slow on post-pentium processors, as are similar convenient instructions like lodsb/scasb (for example in strlen()) and movsb (in memcpy() or strcpy() functions)

    for loops, i've seen:
    small but slower

    label:
    ; body of loop
    loop label

    most compilers optimising pre-p4 will use something like
    big but faster
    label:
          ; execute body
          dec ecx
          jnz label
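
As a sanity check that the two loop forms do the same work, here is a small C model of each. It is only a sketch of the control flow: the function names are invented for this example, and it models the iteration count, not the timing. The one behavioural difference it records is that dec updates the zero flag, while loop leaves the flags untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Model of:  label: <body> / loop label
   LOOP decrements ECX and jumps while ECX != 0, without touching flags. */
uint32_t run_with_loop(uint32_t ecx) {
    uint32_t iterations = 0;
    do {
        iterations++;          /* body of loop */
        ecx--;                 /* LOOP: ecx = ecx - 1 */
    } while (ecx != 0);        /* LOOP: jump back if ecx != 0 */
    return iterations;
}

/* Model of:  label: <body> / dec ecx / jnz label
   DEC sets the zero flag, JNZ jumps while it is clear. */
uint32_t run_with_dec_jnz(uint32_t ecx) {
    uint32_t iterations = 0;
    int zf;
    do {
        iterations++;          /* body of loop */
        ecx--;                 /* dec ecx */
        zf = (ecx == 0);       /* dec updates ZF */
    } while (!zf);             /* jnz label */
    return iterations;
}

/* Spot check: both forms run the body ECX times for a nonzero count. */
void check_loops(void) {
    assert(run_with_loop(5u) == run_with_dec_jnz(5u));
    assert(run_with_loop(1u) == 1u);
}
```

Both forms run the body once before testing, so entering either with ECX = 0 wraps around and runs 2^32 times.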
    

    this is because the pentium and later processors execute more than one instruction at once. code that "pairs" is much faster than some smaller code, but not on 16-bit processors obviously.

    and (imho) p4s don't run pentium-optimised code as well as amd64 processors do.

    they (intel) recommend replacing DEC with SUB in loops on the p4 for better performance. they also discourage using LEA there, even though it is very useful in optimisation.

    a good source is the Mark Larson tutorial above

    like you could have:
         mov eax, 12345678h                    ;5 bytes
         add eax, ebp                          ;2 bytes
         imul ecx, 4                          ;3 bytes
         add eax, ecx                          ;2 bytes
    

    and optimise it into:
         lea eax, [ebp+ecx*4+12345678h]        ;7 bytes
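
To check that the single lea really computes the same value as the four instructions, here is a minimal C model of both versions, using 32-bit unsigned arithmetic to mimic register wraparound. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the four-instruction sequence. */
uint32_t addr_four_instructions(uint32_t ebp, uint32_t ecx) {
    uint32_t eax = 0x12345678u;   /* mov eax, 12345678h */
    eax += ebp;                   /* add eax, ebp */
    ecx *= 4u;                    /* imul ecx, 4 (overwrites ecx) */
    eax += ecx;                   /* add eax, ecx */
    return eax;
}

/* Model of: lea eax, [ebp+ecx*4+12345678h] */
uint32_t addr_lea(uint32_t ebp, uint32_t ecx) {
    return ebp + ecx * 4u + 0x12345678u;
}

/* Spot check: both versions agree, including when the sum wraps. */
void check_lea(void) {
    assert(addr_four_instructions(0x1000u, 3u) == addr_lea(0x1000u, 3u));
    assert(addr_four_instructions(0xFFFFFFFFu, 0x40000001u) ==
           addr_lea(0xFFFFFFFFu, 0x40000001u));
}
```

The two are not identical in side effects: the four-instruction version leaves ecx multiplied by 4 and changes the flags, while lea leaves both ecx and the flags alone. The swap is only safe when later code does not rely on either.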
    

    the xor as rob says (or sub) would be best on 16-bit

