ARM Assembly Language
ARM (Advanced RISC Machine) is a family of reduced instruction set computing (RISC) architectures widely used in mobile devices and embedded systems, and increasingly in servers and desktop computers. ARM assembly differs from x86 assembly in both syntax and instruction set.
ARM Architecture Overview
Key features of ARM architecture:
- RISC design: Simple, fixed-length instructions
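- Load-store architecture: Arithmetic and logic instructions operate only on registers; memory is accessed through separate load and store instructions
- Conditional execution: Most 32-bit ARM instructions can be executed conditionally, based on the condition flags
- Multiple instruction sets: ARM (32-bit instructions), Thumb (16-bit instructions, with 32-bit encodings added in Thumb-2), AArch64 (64-bit architecture with 32-bit instructions)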
- Large register file: 16 general-purpose registers (32-bit ARM)
ARM Registers
In 32-bit ARM state, 16 registers are visible (r0-r15), each 32 bits wide. Several have conventional or dedicated roles:
Register | Alias | Purpose |
---|---|---|
r0-r3 | - | Argument/scratch registers |
r4-r8 | - | Callee-saved registers |
r9 | sb | Static base (platform dependent) |
r10 | sl | Stack limit |
r11 | fp | Frame pointer |
r12 | ip | Intra-procedure call scratch |
r13 | sp | Stack pointer |
r14 | lr | Link register (return address) |
r15 | pc | Program counter |
Basic ARM Instructions
ARM instructions follow a regular format:
opcode{cond}{S} Rd, Rn, Operand2
Where:
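- cond: Optional condition code (for example eq, ne, gt)
- S: Optional suffix; when present, the instruction updates the condition flags
- Rd: Destination register
- Rn: First operand register
- Operand2: Flexible second operand (an immediate, or a register with an optional shift)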
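One concrete consequence of the flexible second operand: in the 32-bit ARM encoding, an immediate Operand2 is stored as an 8-bit value rotated right by an even number of bits, so not every 32-bit constant fits in a single instruction. The sketch below is a small Python model of that rule. The helper name is_arm_immediate is made up for illustration and is not part of any toolchain.

```python
def is_arm_immediate(value):
    """Return True if value fits the 32-bit ARM Operand2 immediate form:
    an 8-bit constant rotated right by an even amount (0, 2, ..., 30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone < 0x100:
            return True
    return False


# Try a few constants; 0x101 and 0x1FE do not fit the encoding.
for constant in (42, 0xFF000000, 0x3FC, 0x101, 0x1FE):
    print(hex(constant), is_arm_immediate(constant))
```

For constants that fail this check, assemblers typically offer the ldr rX, =value pseudo-instruction, which loads the value from memory instead.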
Data Processing Instructions
mov r0, #42 ; r0 = 42
add r1, r2, r3 ; r1 = r2 + r3
sub r4, r5, #10 ; r4 = r5 - 10
and r6, r7, r8 ; r6 = r7 & r8
orr r9, r10, r11 ; r9 = r10 | r11
Comparison Instructions
cmp r0, r1 ; Set flags on r0 - r1 (result discarded)
cmn r2, #5 ; Set flags on r2 + 5, i.e. compare r2 with -5
tst r3, #0xFF ; Set flags on r3 & 0xFF (test bits)
teq r4, r5 ; Set flags on r4 ^ r5 (Z set if equal)
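To make "set flags" concrete, here is a minimal Python model of how cmp and cmn derive the N, Z, C and V flags from a 32-bit subtraction or addition. The function names cmp_flags and cmn_flags are illustrative only, and tst/teq are not modeled (they take N and Z from the AND/XOR result).

```python
MASK = 0xFFFFFFFF


def cmp_flags(rn, op2):
    """Model of `cmp rn, op2`: compute rn - op2 and keep only the flags."""
    rn &= MASK
    op2 &= MASK
    result = (rn - op2) & MASK
    n = result >> 31                  # N: bit 31 of the result
    z = int(result == 0)              # Z: result is zero
    c = int(rn >= op2)                # C: set when no borrow occurred
    # V: signed overflow (operand signs differ and result sign differs from rn)
    v = ((rn ^ op2) & (rn ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}


def cmn_flags(rn, op2):
    """Model of `cmn rn, op2`: compute rn + op2 and keep only the flags."""
    rn &= MASK
    op2 &= MASK
    total = rn + op2
    result = total & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK)             # C: unsigned carry out of bit 31
    # V: signed overflow (operand signs match and result sign differs)
    v = (~(rn ^ op2) & (rn ^ result) & MASK) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}


print(cmp_flags(3, 5))                # unsigned borrow, negative result
print(cmn_flags(0xFFFFFFFB, 5))       # -5 + 5: zero result
```

Note that for subtraction ARM sets C when there is no borrow, the opposite of the x86 carry flag convention.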
Conditional Execution
In the 32-bit ARM instruction set, most instructions can carry a condition suffix and execute only when the condition flags match:
cmp r0, #10 ; Compare r0 with 10
movgt r1, #20 ; If greater than, r1 = 20
addle r1, r1, #5 ; If less or equal, r1 += 5
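The condition suffixes are simply predicates over the flags. The following Python sketch lists the signed-comparison conditions and replays the three-instruction example above. It assumes 0 <= r0 < 2**31 so the comparison cannot overflow and V stays 0; the names CONDITIONS and run_example are hypothetical.

```python
# Condition suffixes expressed as predicates over the N, Z, C, V flags.
CONDITIONS = {
    "eq": lambda n, z, c, v: z == 1,
    "ne": lambda n, z, c, v: z == 0,
    "ge": lambda n, z, c, v: n == v,
    "lt": lambda n, z, c, v: n != v,
    "gt": lambda n, z, c, v: z == 0 and n == v,
    "le": lambda n, z, c, v: z == 1 or n != v,
}


def run_example(r0, r1):
    """Model of: cmp r0, #10 / movgt r1, #20 / addle r1, r1, #5."""
    # Flags after `cmp r0, #10`, assuming 0 <= r0 < 2**31 (so V = 0).
    n, z, c, v = int(r0 < 10), int(r0 == 10), int(r0 >= 10), 0
    # Neither movgt nor addle has the S suffix, so the flags stay unchanged
    # and exactly one of the two instructions takes effect.
    if CONDITIONS["gt"](n, z, c, v):
        r1 = 20
    if CONDITIONS["le"](n, z, c, v):
        r1 = (r1 + 5) & 0xFFFFFFFF
    return r1


print(run_example(15, 0))
print(run_example(3, 7))
```

Because gt and le are exact opposites, the pair behaves like an if/else without any branch instruction.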
Load and Store Instructions
Because ARM is a load-store architecture, memory is accessed only through dedicated load and store instructions:
ldr r0, [r1] ; Load word from address in r1 to r0
str r2, [r3] ; Store word from r2 to address in r3
ldrb r4, [r5] ; Load byte (zero-extended)
ldrh r6, [r7, #4] ; Load halfword from r7+4
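As a rough illustration of what these loads return, the Python sketch below models memory as a bytearray and assumes little-endian byte order (the common configuration). The helper names are invented for this example.

```python
def str_word(mem, addr, value):
    """Model of `str`: store a 32-bit word at addr (little-endian)."""
    mem[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def ldr(mem, addr):
    """Model of `ldr`: load a 32-bit word."""
    return int.from_bytes(mem[addr:addr + 4], "little")


def ldrh(mem, addr):
    """Model of `ldrh`: load 16 bits, zero-extended to 32 bits."""
    return int.from_bytes(mem[addr:addr + 2], "little")


def ldrb(mem, addr):
    """Model of `ldrb`: load 8 bits, zero-extended to 32 bits."""
    return mem[addr]


memory = bytearray(16)
base = 0
str_word(memory, base, 0x11223344)
str_word(memory, base + 4, 0xAABBCCDD)

print(hex(ldr(memory, base)))        # whole word
print(hex(ldrb(memory, base)))       # lowest-addressed byte only
print(hex(ldrh(memory, base + 4)))   # like [r7, #4]: base plus offset
```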
Branch Instructions
Control flow in ARM:
b label ; Unconditional branch
beq label ; Branch if equal
bl func ; Branch with link (call function)
bx lr ; Return from function
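The call and return pair can be sketched as follows: bl records the address of the next instruction in lr, and bx jumps to the address in a register, using bit 0 of that address to select Thumb state. This is a simplified Python model with hypothetical helper names; bl_address is the address of the bl instruction itself.

```python
def bl(bl_address, target):
    """Model of `bl target` in ARM state: returns (new_pc, lr)."""
    lr = bl_address + 4          # ARM instructions are 4 bytes long
    return target, lr


def bx(register_value):
    """Model of `bx Rm`: returns (new_pc, thumb_state)."""
    thumb = bool(register_value & 1)     # bit 0 set means switch to Thumb
    return register_value & ~1, thumb


pc, lr = bl(0x8000, 0x9000)      # call: jump to 0x9000, remember 0x8004
pc, thumb = bx(lr)               # return: back to the instruction after bl
print(hex(pc), thumb)
```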
Stack Operations
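The classic 32-bit ARM instruction set has no dedicated push/pop opcodes. Assemblers accept push {...} and pop {...} as aliases for the store-multiple and load-multiple forms stmdb sp!, {...} and ldmia sp!, {...}. A single register can also be pushed or popped with pre- and post-indexed addressing: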
; Push r0 and r1
str r0, [sp, #-4]! ; Pre-indexed store, decrement sp
str r1, [sp, #-4]!
; Pop r0 and r1
ldr r1, [sp], #4 ; Post-indexed load, increment sp
ldr r0, [sp], #4
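The addressing modes used above can be modeled in a few lines of Python. This is only a sketch: memory is a dictionary keyed by address, the initial sp value 0x1000 is arbitrary, and the function names are made up.

```python
def push(mem, sp, value):
    """Model of `str rX, [sp, #-4]!`: decrement sp first, then store."""
    sp -= 4
    mem[sp] = value & 0xFFFFFFFF
    return sp


def pop(mem, sp):
    """Model of `ldr rX, [sp], #4`: load first, then increment sp."""
    value = mem[sp]
    return value, sp + 4


mem = {}                         # sparse memory: address -> 32-bit word
sp = 0x1000
sp = push(mem, sp, 0xAAAA)       # push r0
sp = push(mem, sp, 0xBBBB)       # push r1
r1, sp = pop(mem, sp)            # pop in reverse order
r0, sp = pop(mem, sp)
print(hex(r0), hex(r1), hex(sp))
```

The stack grows toward lower addresses and sp always points at the last item pushed, which is why the pops must mirror the pushes in reverse.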
Function Calls
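The standard 32-bit ARM calling convention (AAPCS):
- Arguments passed in r0-r3
- Additional arguments on the stack
- Return value in r0 (or r0-r1 for 64-bit values)
- Caller-saved (a call may overwrite them): r0-r3, r12, and lr
- Callee-saved (a function must preserve them): r4-r11 and sp; a function that makes further calls must also save lr, because bl overwrites it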
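The argument-passing rules can be sketched in Python as below. This is a deliberately simplified model that assumes every argument is a 32-bit integer; 64-bit values, floating-point arguments and structures follow extra rules that are not covered here. The name assign_arguments is hypothetical.

```python
def assign_arguments(args):
    """Split 32-bit integer arguments into registers r0-r3 and stack slots."""
    registers = {f"r{i}": value for i, value in enumerate(args[:4])}
    stack = list(args[4:])       # remaining arguments, in order, on the stack
    return registers, stack


registers, stack = assign_arguments([1, 2, 3, 4, 5, 6])
print(registers)
print(stack)
```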
Function Example
; Function to add two numbers
add_numbers:
add r0, r0, r1 ; r0 = r0 + r1
bx lr ; Return
; Call the function
mov r0, #10 ; First argument
mov r1, #20 ; Second argument
bl add_numbers ; Call function
; r0 now contains 30
ARM vs. Thumb
Most 32-bit ARM processors support two instruction sets:
Feature | ARM | Thumb |
---|---|---|
Instruction Size | 32-bit | 16-bit (mostly) |
Performance | Generally higher per instruction | Generally lower for 16-bit Thumb; Thumb-2 narrows the gap |
Code Density | Lower | Higher |
Register Access | All 16 | Mostly r0-r7 in 16-bit encodings |
AArch64 (ARM64)
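The 64-bit ARM architecture introduces several changes:
- 31 general-purpose 64-bit registers (x0-x30), with the lower 32 bits accessible as w0-w30
- Dedicated zero register (xzr/wzr)
- New fixed-length 32-bit instruction encoding (A64)
- More registers for parameter passing (x0-x7)
- No general conditional execution; only branches and a few instructions such as csel and ccmp take a condition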
AArch64 Example
// Add two numbers in AArch64
add_numbers:
add w0, w0, w1 // 32-bit addition
ret
// Call the function
mov w0, #10 // First argument
mov w1, #20 // Second argument
bl add_numbers // Call function
// w0 now contains 30
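The w registers are the low 32 bits of the x registers, and writing a w register clears the upper 32 bits of the corresponding x register. The short Python model below illustrates what add w0, w0, w1 leaves in the full 64-bit x0; add_w is an illustrative name, not an instruction or library call.

```python
def add_w(x0, x1):
    """Model of `add w0, w0, w1`: returns the resulting 64-bit value of x0."""
    w0 = x0 & 0xFFFFFFFF         # only the low 32 bits take part
    w1 = x1 & 0xFFFFFFFF
    # The sum wraps at 32 bits and the upper half of x0 becomes zero.
    return (w0 + w1) & 0xFFFFFFFF


print(add_w(10, 20))
print(hex(add_w(0xDEADBEEF00000001, 2)))   # upper 32 bits are discarded
```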
System Calls
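On ARM Linux, a system call places the call number in r7 (32-bit EABI) or x8 (AArch64) and the arguments in the first argument registers, then traps into the kernel. The examples below assume message and len are defined elsewhere in the program: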
// ARM 32-bit system call example
mov r7, #4 // sys_write
mov r0, #1 // stdout
ldr r1, =message // message pointer
mov r2, #len // message length
svc 0 // supervisor call (older name: swi)
// ARM 64-bit system call example
mov x8, #64 // sys_write
mov x0, #1 // stdout
ldr x1, =message // message pointer
mov x2, #len // message length
svc 0 // supervisor call
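Both examples invoke the same kernel service: write(fd, buf, count), with the three arguments in r0-r2 (or x0-x2) and the number of bytes written returned in r0 (or x0). As a quick way to see that contract without ARM hardware, the Python sketch below makes the equivalent call through os.write. The message text and the sys_write wrapper are assumptions for illustration, and a pipe stands in for stdout so the result can be inspected.

```python
import os


def sys_write(fd, buf, count):
    """Python stand-in for the write system call: returns bytes written."""
    return os.write(fd, buf[:count])


message = b"Hello, ARM!\n"                   # hypothetical message contents
read_end, write_end = os.pipe()              # a pipe instead of fd 1 (stdout)

written = sys_write(write_end, message, len(message))
os.close(write_end)

echoed = os.read(read_end, 64)               # read back what the call wrote
os.close(read_end)
print(written, echoed)
```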
SIMD (NEON) Instructions
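NEON is ARM's SIMD extension for data-parallel processing. In 32-bit ARM, each 128-bit q register overlays a pair of 64-bit d registers (q0 is d0-d1, q1 is d2-d3):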
// Add four floats in parallel
vld1.32 {d0-d1}, [r0]! // Load 4 floats
vld1.32 {d2-d3}, [r1]! // Load 4 floats
vadd.f32 q0, q0, q1 // Add vectors
vst1.32 {d0-d1}, [r2]! // Store result
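What vadd.f32 computes can be sketched lane by lane in Python. This is a simplified model: each lane is rounded to single precision with struct, and NEON-specific behaviour such as flush-to-zero handling of denormals is not modeled. The names to_f32 and vadd_f32 are made up for this example.

```python
import struct


def to_f32(x):
    """Round a Python float to single precision, as a 32-bit lane holds it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def vadd_f32(q_a, q_b):
    """Model of `vadd.f32 q0, q0, q1`: four independent single-precision adds."""
    return [to_f32(a + b) for a, b in zip(q_a, q_b)]


a = [1.0, 2.0, 3.0, 4.0]         # the four floats loaded via r0
b = [10.0, 20.0, 30.0, 40.0]     # the four floats loaded via r1
print(vadd_f32(a, b))
```

In the assembly version each load and store also advances its pointer by 16 bytes because of the ! writeback, so the same four instructions can be repeated in a loop over longer arrays.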
Common ARM Assemblers
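- GNU Assembler (as): Used with GCC; on 32-bit ARM it uses @ for line comments, and on AArch64 it uses //
- ARM Compiler (armasm): ARM's proprietary assembler; uses ; for comments, as in the 32-bit examples above
- LLVM Integrated Assembler: Used with Clang; accepts GNU-style syntax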
Cross-Platform Considerations
When writing ARM assembly:
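- Be aware of endianness (ARM cores can run little- or big-endian; little-endian is the usual configuration)
- Consider alignment requirements (for example, word accesses are normally 4-byte aligned)
- Account for different ABIs (Application Binary Interfaces)
- Be mindful of differences between ARM architecture versions (such as ARMv7 and ARMv8)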
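The first two points are easy to see in a few lines of Python. The sketch below shows how the same 32-bit value is laid out in memory under each byte order, and a simple alignment check; is_aligned is a hypothetical helper.

```python
value = 0x11223344

# Byte layout in memory, lowest address first.
little = value.to_bytes(4, "little")     # least significant byte first
big = value.to_bytes(4, "big")           # most significant byte first
print(little.hex(), big.hex())


def is_aligned(address, size):
    """True if address is a multiple of the access size in bytes."""
    return address % size == 0


print(is_aligned(0x1000, 4), is_aligned(0x1002, 4))
```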
Next Steps
To continue learning ARM assembly:
- Experiment with ARM emulators (QEMU)
- Study compiler output for ARM targets
- Explore ARM documentation and reference manuals
- Practice on real ARM hardware (Raspberry Pi, ARM-based phones)