CodeToLive

ARM Assembly Language

ARM (Advanced RISC Machine) is a family of reduced instruction set computing (RISC) architectures widely used in mobile devices, embedded systems, and increasingly in servers and desktop computers. ARM assembly language differs from x86 assembly in both syntax and instruction set.

ARM Architecture Overview

Key features of ARM architecture:

ARM Registers

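In the 32-bit ARM architecture, 16 32-bit registers (r0-r15) are visible at any time. Most are general-purpose, but r13, r14, and r15 have dedicated roles as the stack pointer, link register, and program counter: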

Register  Alias  Purpose
r0-r3     -      Argument/scratch registers
r4-r8     -      Callee-saved registers
r9        sb     Static base (platform dependent)
r10       sl     Stack limit
r11       fp     Frame pointer
r12       ip     Intra-procedure call scratch
r13       sp     Stack pointer
r14       lr     Link register (return address)
r15       pc     Program counter

Basic ARM Instructions

ARM instructions follow a regular format:


opcode{cond}{S} Rd, Rn, Operand2
      

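Where cond is an optional condition code (such as eq or gt), S is an optional suffix that makes the instruction update the condition flags, Rd is the destination register, Rn is the first operand register, and Operand2 is a flexible second operand: an immediate value or a register, optionally shifted.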

Data Processing Instructions


mov r0, #42        ; r0 = 42
add r1, r2, r3     ; r1 = r2 + r3
sub r4, r5, #10    ; r4 = r5 - 10
and r6, r7, r8     ; r6 = r7 & r8
orr r9, r10, r11   ; r9 = r10 | r11
      

Comparison Instructions


cmp r0, r1         ; Compare r0 and r1 (set flags)
cmn r2, #5         ; Compare r2 with -5
tst r3, #0xFF      ; Test bits (r3 & 0xFF)
teq r4, r5         ; Test equality (r4 ^ r5)
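
To make "set flags" concrete, here is a minimal Python model (not ARM code) of how the four comparison instructions could be described in terms of the N, Z, C, and V flags. The function names mirror the mnemonics, and add_with_flags is a hypothetical helper. For tst and teq, only N and Z are modeled.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b, carry_in=0):
    """32-bit add that returns (result, flags); a hypothetical helper."""
    a &= MASK32
    b &= MASK32
    unsigned_sum = a + b + carry_in
    result = unsigned_sum & MASK32
    flags = {
        "N": result >> 31,                          # result is negative
        "Z": int(result == 0),                      # result is zero
        "C": int(unsigned_sum > MASK32),            # unsigned carry out
        "V": ((a ^ result) & (b ^ result)) >> 31,   # signed overflow
    }
    return result, flags

def cmp(rn, op2):
    # cmp computes rn - op2 as rn + NOT(op2) + 1 and keeps only the flags
    return add_with_flags(rn, ~op2 & MASK32, 1)[1]

def cmn(rn, op2):
    # cmn computes rn + op2, i.e. it compares rn with -op2
    return add_with_flags(rn, op2)[1]

def tst(rn, op2):
    result = (rn & op2) & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}

def teq(rn, op2):
    result = (rn ^ op2) & MASK32
    return {"N": result >> 31, "Z": int(result == 0)}
```

For example, cmp(5, 5) sets Z (equal) and C (no borrow), while cmn(0xFFFFFFFB, 5) sets Z because the register holds -5.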
      

Conditional Execution

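In the 32-bit ARM instruction set, most instructions can be executed conditionally by appending a condition code to the mnemonic. The instruction runs only if the flags set by an earlier instruction satisfy that condition: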


cmp r0, #10        ; Compare r0 with 10
movgt r1, #20      ; If greater than, r1 = 20
addle r1, r1, #5   ; If less or equal, r1 += 5
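
The three-instruction sequence above can be traced with a short Python sketch (a model, not ARM code). It assumes the standard meaning of the signed conditions: gt holds when Z is clear and N equals V, and le is its complement. The names cmp_flags, gt, le, and run are illustrative only.

```python
def cmp_flags(rn, op2):
    """Return (N, Z, C, V) after `cmp rn, op2` on 32-bit values."""
    mask = 0xFFFFFFFF
    rn &= mask
    op2 &= mask
    diff = (rn - op2) & mask
    n = diff >> 31
    z = int(diff == 0)
    c = int(rn >= op2)                       # carry set means "no borrow"
    v = ((rn ^ op2) & (rn ^ diff)) >> 31     # signed overflow on subtraction
    return n, z, c, v

def gt(flags):
    n, z, c, v = flags
    return z == 0 and n == v

def le(flags):
    n, z, c, v = flags
    return z == 1 or n != v

def run(r0, r1):
    flags = cmp_flags(r0, 10)        # cmp r0, #10
    if gt(flags):                    # movgt r1, #20
        r1 = 20
    if le(flags):                    # addle r1, r1, #5
        r1 = (r1 + 5) & 0xFFFFFFFF
    return r1
```

Because neither movgt nor addle carries the S suffix, both test the flags left by the single cmp, so exactly one of them takes effect.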
      

Load and Store Instructions

ARM is a load-store architecture: only load and store instructions access memory, and all other instructions operate on registers:


ldr r0, [r1]       ; Load word from address in r1 to r0
str r2, [r3]       ; Store word from r2 to address in r3
ldrb r4, [r5]      ; Load byte (zero-extended)
ldrh r6, [r7, #4]  ; Load halfword from r7+4
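
As a rough illustration of what these loads and stores do, the Python sketch below models memory as a flat byte array. It assumes little-endian byte order (the usual configuration, but an assumption here), and the memory array and helper names are hypothetical.

```python
memory = bytearray(64)  # hypothetical flat memory, assumed little-endian

def str_word(value, addr):
    """Model of `str rX, [addr]`: store a 32-bit word."""
    memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def ldr_word(addr):
    """Model of `ldr rX, [addr]`: load a 32-bit word."""
    return int.from_bytes(memory[addr:addr + 4], "little")

def ldrb(addr):
    """Model of `ldrb`: load one byte, zero-extended to 32 bits."""
    return memory[addr]

def ldrh(base, offset=0):
    """Model of `ldrh rX, [base, #offset]`: load a zero-extended halfword."""
    addr = base + offset
    return int.from_bytes(memory[addr:addr + 2], "little")
```

After str_word(0x11223344, 8), ldrb(8) yields 0x44 and ldrh(4, 4) yields 0x3344, since the base plus offset addresses the same location.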
      

Branch Instructions

Control flow in ARM:


b label       ; Unconditional branch
beq label     ; Branch if equal
bl func       ; Branch with link (call function)
bx lr         ; Return from function
      

Stack Operations

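In the 32-bit ARM instruction set, push and pop are assembler aliases rather than separate instructions: for a register list they stand for store-multiple and load-multiple on sp (stmdb sp!, {...} and ldmia sp!, {...}). Single registers can also be pushed and popped explicitly with pre- and post-indexed stores and loads: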


; Push r0 and r1
str r0, [sp, #-4]!  ; Pre-indexed store, decrement sp
str r1, [sp, #-4]!

; Pop r0 and r1
ldr r1, [sp], #4    ; Post-indexed load, increment sp
ldr r0, [sp], #4
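
The pre-indexed store and post-indexed load above implement a full-descending stack: sp always points at the last item pushed and moves toward lower addresses. A minimal Python sketch of that behaviour, with a hypothetical Stack class and an assumed little-endian layout:

```python
class Stack:
    """Toy model of a full-descending stack of 32-bit words."""

    def __init__(self, size=32):
        self.mem = bytearray(size)
        self.sp = size               # empty stack: sp sits just past the top

    def push(self, value):
        # str rX, [sp, #-4]!  -> decrement sp first, then store at sp
        self.sp -= 4
        self.mem[self.sp:self.sp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        # ldr rX, [sp], #4    -> load from sp first, then increment sp
        value = int.from_bytes(self.mem[self.sp:self.sp + 4], "little")
        self.sp += 4
        return value
```

Pushing r0 then r1 and popping in the reverse order (r1 first, then r0) restores both registers and returns sp to its starting value.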
      

Function Calls

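Under the standard ARM function call convention, the first four arguments are passed in r0-r3, the result is returned in r0, and bl places the return address in lr: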

Function Example


; Function to add two numbers
add_numbers:
    add r0, r0, r1    ; r0 = r0 + r1
    bx lr             ; Return

; Call the function
mov r0, #10          ; First argument
mov r1, #20          ; Second argument
bl add_numbers       ; Call function
; r0 now contains 30
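
To show how bl and bx lr cooperate, here is a toy Python interpreter (a sketch, not an emulator) for just the instructions in this example. It uses instruction indices in place of byte addresses, and the halt pseudo-instruction, the run function, and the labels table are assumptions made for illustration.

```python
def run(program, labels):
    regs = {"r0": 0, "r1": 0, "lr": None}
    pc = 0
    while pc is not None and pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] = (regs[args[1]] + regs[args[2]]) & 0xFFFFFFFF
        elif op == "bl":
            regs["lr"] = pc + 1          # save the return address in lr
            next_pc = labels[args[0]]    # then jump to the function
        elif op == "bx":
            next_pc = regs[args[0]]      # jump to the address held in lr
        elif op == "halt":               # hypothetical: stop the toy machine
            next_pc = None
        pc = next_pc
    return regs

program = [
    ("mov", "r0", 10),            # first argument
    ("mov", "r1", 20),            # second argument
    ("bl", "add_numbers"),        # call
    ("halt",),
    ("add", "r0", "r0", "r1"),    # add_numbers:
    ("bx", "lr"),                 # return
]
labels = {"add_numbers": 4}
regs = run(program, labels)
```

After the run, r0 holds the sum and lr still holds the index of the instruction following the bl.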
      

ARM vs. Thumb

ARM processors support two instruction sets:

Feature           ARM      Thumb
Instruction Size  32-bit   16-bit (mostly)
Performance       Higher   Lower
Code Density      Lower    Higher
Register Access   All 16   Limited to r0-r7

AArch64 (ARM64)

The 64-bit ARM architecture introduces changes:

AArch64 Example


// Add two numbers in AArch64
add_numbers:
    add w0, w0, w1    // 32-bit addition
    ret

// Call the function
mov w0, #10          // First argument
mov w1, #20          // Second argument
bl add_numbers       // Call function
// w0 now contains 30
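
The w registers used here are the low 32 bits of the 64-bit x registers, and writing a w register clears the upper 32 bits of the corresponding x register. A small Python sketch of that behaviour for the add above (add_w is a hypothetical name):

```python
def add_w(x_a, x_b):
    """Model of `add w0, w0, w1`.

    Takes the full 64-bit register contents, adds only the low 32 bits,
    and returns the new 64-bit value of x0 (upper half zeroed).
    """
    return ((x_a & 0xFFFFFFFF) + (x_b & 0xFFFFFFFF)) & 0xFFFFFFFF
```

The addition wraps at 32 bits, and any stale data in the upper half of the source registers is ignored.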
      

System Calls

Making system calls in ARM Linux:


// ARM 32-bit system call example
mov r7, #4           // sys_write
mov r0, #1           // stdout
ldr r1, =message     // message pointer
mov r2, #len         // message length
swi 0                // software interrupt (older mnemonic for svc)

// ARM 64-bit system call example
mov x8, #64          // sys_write
mov x0, #1           // stdout
ldr x1, =message     // message pointer
mov x2, #len         // message length
svc 0                // supervisor call
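
Both sequences load the same three arguments that the C-level write(fd, buf, count) call takes; only the registers and the system call number differ. As a high-level counterpart, the Python sketch below issues the same system call through os.write. The message contents and the write_syscall wrapper are assumptions for illustration.

```python
import os

message = b"Hello, ARM!\n"    # hypothetical contents of `message`
length = len(message)          # plays the role of `len`

def write_syscall(fd, buf, count):
    """Counterpart of write(fd, buf, count); returns the bytes written."""
    return os.write(fd, buf[:count])
```

Calling write_syscall(1, message, length) writes to stdout. In the assembly versions the same return value (the number of bytes written, or a negative error code) comes back in r0 or x0.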
      

SIMD (NEON) Instructions

ARM's SIMD extension for parallel processing:


// Add four floats in parallel
vld1.32 {d0-d1}, [r0]!  // Load 4 floats
vld1.32 {d2-d3}, [r1]!  // Load 4 floats
vadd.f32 q0, q0, q1     // Add vectors
vst1.32 {d0-d1}, [r2]!  // Store result
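
For reference, the lane-wise behaviour of the vadd.f32 above can be sketched in Python. This is a model only: it treats each 128-bit q register as 16 little-endian bytes holding four single-precision floats, and it ignores NEON details such as denormal handling. The function name vadd_f32 is illustrative.

```python
import struct

def vadd_f32(q_a, q_b):
    """Add two 16-byte vectors lane by lane as four float32 values."""
    a = struct.unpack("<4f", q_a)     # like vld1.32 {d0-d1}, [r0]
    b = struct.unpack("<4f", q_b)     # like vld1.32 {d2-d3}, [r1]
    sums = (x + y for x, y in zip(a, b))
    return struct.pack("<4f", *sums)  # rounds each lane back to float32
```

Each of the four lanes is added independently, which is what lets a single instruction replace four scalar additions.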
      

Common ARM Assemblers

Cross-Platform Considerations

When writing ARM assembly:

Next Steps

To continue learning ARM assembly: