Pre-main rituals: How Zephyr prepares Cortex-M CPUs

9 mins read

Pre-main source files are something you hardly ever have to worry about. More often than not, you’ll be using pre-existing files that you don’t need to modify or even understand deeply. They’re frequently copied from one project to another without change. It’s rare that you need to dive into their contents, but when you do they can reveal a lot about how your system actually works.

In this post, we’ll cover Zephyr Project’s startup process on ARM Cortex-M CPUs, focusing on its early assembly code and vector table.

Setup

We’ll use the Arduino Nano 33 BLE board. The Zephyr source code referenced here is based on commit 55db18, from Monday, June 23rd. This post assumes some familiarity with embedded systems, and assembly programming.

First stop: the bootloader

Although it’s not at the heart of this post, it’s worth saying a few things about the bootloader. The Nano 33 comes with a BOSSA-compatible bootloader. BOSSA (Basic Open Source SAM-BA Application) is a flash programming utility initially developed for Atmel’s SAM family. It was forked and adapted by Arduino to support nRF52840 + USB CDC. In practice, the bootloader enables flashing a new application through USB CDC (Communications Device Class) by double-tapping the onboard button, and then calling bossac with the binary to upload.

On the Zephyr side, we can clearly see two config variables defined in our board’s default configuration file boards/arduino/nano_33_ble/arduino_nano_33_ble_defconfig:

CONFIG_BOOTLOADER_BOSSA=y
CONFIG_BOOTLOADER_BOSSA_LEGACY=y

They are mainly used in the Python runner scripts called by west (Zephyr’s meta-build tool, see my dedicated article.)

For this particular board, the bootloader is instrumental in uploading a new firmware to the board. But we’re more interested in user code and the process that leads into executing it.

In the beginning was the reset handler

Whenever an ARM Cortex-M CPU gets powered on, it automatically executes the reset handler function, which is located at the very top of the vector table, a table of pointers (function addresses) starting at a known location in memory, typically 0x0, unless overriden by VTOR (Vector Table Offset Register.)

Since we now deal directly with the CPU (in this case, ARM Cortex M4F), we’ll be looking for source code under arch/arm/core/cortex_m.

vector_table.S is a relatively small assembly file that can be broken down into the following:

SECTION_SUBSEC_FUNC(exc_vector_table,_vector_table_section,_vector_table)

Like every big project, Zephyr makes extensive use of macros. In particular, the one above expands into placing the data that follow inside the .vector_table section, so the CPU can find it at boot.

.word z_main_stack + CONFIG_MAIN_STACK_SIZE

This first instruction defines the first word in the vector table, which corresponds, according to the docs, to the stack pointer.

.word z_arm_reset

Next is the reset handler address. A very important piece, since as the name suggests, the CPU jumps here after reset. In this case, the reset handler is set to be the symbol z_arm_reset. We will focus on this in the next sections.

The next two lines have the same purpose. They define Zephyr handler functions for respectively the NMI (Non-Maskable Interrupt) and hard fault exceptions.

What follows is where the vector table diverges depending on the ARM architecture version. The only thing to notice is that ARMv6-M (used in Cortex M0, M0+ and M1) and ARMv8-M Baseline (used in Cortex M23, M33 and M35P) have fewer exceptions, so many entries in the vector table are left undefined (.word 0). For newer architecture versions, we can find implementations for the handlers corresponding to each of the possible exceptions (bus fault, secure fault, etc.)

Close lookup on the reset handler

Moving forward, let’s get a closer look at z_arm_reset. A quick grep and we find out that the symbol is defined inside arch/arm/core/cortex_m/reset.S. At the top you’ll see a helpful comment that makes understanding even easier.

SECTION_SUBSEC_FUNC(TEXT,_reset_section,z_arm_reset)
SECTION_SUBSEC_FUNC(TEXT,_reset_section,__start)

Just like the vector table, these two macros place the z_arm_reset and __start (an alias of the former) labels in a subsection of .text (code segment), named .text._reset_section, possibly for allowing finer-grained control during the linking step.

Miscellaneous initialization (specific to bootloader-less applications)

#if defined(CONFIG_INIT_ARCH_HW_AT_BOOT)
    /* Reset CONTROL register */
    movs.n r0, #0
    msr CONTROL, r0
    isb
#if defined(CONFIG_CPU_CORTEX_M_HAS_SPLIM)
    /* Clear SPLIM registers */
    movs.n r0, #0
    msr MSPLIM, r0
    msr PSPLIM, r0
#endif /* CONFIG_CPU_CORTEX_M_HAS_SPLIM */
#endif /* CONFIG_INIT_ARCH_HW_AT_BOOT */

At this stage, it’s handy to check whether a macro is defined in your build config. There are multiple ways to do so:

  • grep CONFIG_NAME build/zephyr/.config
  • grep CONFIG_NAME build/zephyr/include/generated/zephyr/autoconf.h

In my particular case, neither CONFIG_INIT_ARCH_HW_AT_BOOT nor CONFIG_CPU_CORTEX_M_HAS_SPLIM are defined. These are typically enabled when you boot without a bootloader. Otherwise, that piece of code has already done these initializations, so Zephyr’s reset handler skips it.

#if defined(CONFIG_PM_S2RAM)
ldr r0, =z_interrupt_stacks + CONFIG_ISR_STACK_SIZE + MPU_GUARD_ALIGN_AND_SIZE
msr msp, r0
bl arch_pm_s2ram_resume
#endif /* CONFIG_PM_S2RAM */

PM_S2RAM stands for suspend-to-RAM. When enabled, this configuration temporarily sets the reset handler to the interrupt stack for the duration of the resume logic (performed in arch_pm_s2ram_resume).

But again, if your application is using a bootloader, this section is also skipped.

Main stack setup

ldr r0, =z_main_stack + CONFIG_MAIN_STACK_SIZE
msr msp, r0

These two instructions set MSP (Main Stack Pointer) to z_main_stack + CONFIG_MAIN_SIZE. If you’ve been reading carefully, you might have noticed the same thing is also done in vector_table.S at offset 0:

.word z_main_stack + CONFIG_MAIN_STACK_SIZE

And you’re… right. However, although the two instructions do the same thing, they’re not redundant, and don’t serve the same purpose.

Both lines set MSP, but they have distinct purposes:

  • The vector table’s entry is used by the CPU at power-on-reset.
  • The msr msp instruction is used at runtime to ensure the value in MSP is correct, typically after bootloader handoff.

Post-kernel flag clear

#if defined(CONFIG_DEBUG_THREAD_INFO)
    /* Clear z_sys_post_kernel flag for RTOS aware debuggers */
    movs.n r0, #0
    ldr r1, =z_sys_post_kernel
    strb r0, [r1]
#endif /* CONFIG_DEBUG_THREAD_INFO */

The next part is another setup for debuggers, the comment is self-explanatory so no need to elaborate.

SoC Hook

#if defined(CONFIG_SOC_RESET_HOOK)
bl soc_reset_hook
#endif

It’s not always clear in which cases this configuration variable is used/defined. However, for Nordic MCUs (The Nano 33 embeds the nRF52840), you can find this line:

# soc/nordic/common/CMakeLists.txt
zephyr_linker_symbol(SYMBOL soc_reset_hook EXPR "@SystemInit@")

It’s a symbol aliasing trick which instructs the linker script generator to resolve soc_reset_hook (a weak symbol) to SystemInit when it hasn’t been defined. From there, we can guess it’s everything related to resetting peripherals, setting up clock trees, etc.

Disabling the MPU

#if defined(CONFIG_INIT_ARCH_HW_AT_BOOT)
#if defined(CONFIG_CPU_HAS_ARM_MPU)
    /* Disable MPU */
    movs.n r0, #0
    ldr r1, =_SCS_MPU_CTRL
    str r0, [r1]
    dsb
#endif /* CONFIG_CPU_HAS_ARM_MPU */

    /* Initialize core architecture registers and system blocks */
    bl z_arm_init_arch_hw_at_boot
#endif /* CONFIG_INIT_ARCH_HW_AT_BOOT */

In certain configurations, the reset handler clears MPU CONTROL register, disabling memory protection (temporarily).

Interrupt masking

#if defined(CONFIG_ARMV6_M_ARMV8_M_BASELINE)
    cpsid i
#elif defined(CONFIG_ARMV7_M_ARMV8_M_MAINLINE)
    movs.n r0, #_EXC_IRQ_DEFAULT_PRIO
    msr BASEPRI, r0
#else
#error Unknown ARM architecture
#endif

Depending on the Cortex-M variant this code masks interrupts. They’re re-enabled when the CPU switches to the main thread.

Watchdog setup

#ifdef CONFIG_WDOG_INIT
    /* board-specific watchdog initialization is necessary */
    bl z_arm_watchdog_init
#endif

Early watchdog setup is very rare. That’s why only a few platforms do define CONFIG_WDOG_INIT, and thus provide an implementation of z_arm_watchdog_init. As far as I can tell, it’s NXP’s KE1xF and S32K1 microcontrollers.

Stack painting

#ifdef CONFIG_INIT_STACKS
    ldr r0, =z_interrupt_stacks
    ldr r1, =0xaa
    ldr r2, =CONFIG_ISR_STACK_SIZE + MPU_GUARD_ALIGN_AND_SIZE
    bl z_early_memset
#endif

What follows is known as “stack painting”, where the OS fills the stack RAM segment with a known pattern. Zephyr uses the ad-hoc value 0xAA, FreeRTOS for instance uses 0xA5 instead. One use case of this mechanism is detecting stack overflows, or simply knowing stack usage of a given task (using the dedicated k_thread_stack_space_get method.)

Last step before jumping to C runtime

ldr r0, =z_interrupt_stacks
ldr r1, =CONFIG_ISR_STACK_SIZE + MPU_GUARD_ALIGN_AND_SIZE
adds r0, r0, r1
msr PSP, r0
mrs r0, CONTROL
movs r1, #2
orrs r0, r1 /* CONTROL_SPSEL_Msk */
msr CONTROL, r0

isb

Prior to calling C runtime code, ARM Cortex-M reset handler also takes care of setting and switching to PSP (Process Stack Pointer). In the instructions above, this is done by setting bit 1 of CONTROL register, telling the processor to use PSP instead of MSP.

The last instruction (isb: Instruction Synchronization Barrier) is a context synchronization event that ensures all previous instructions are completed before executing any further ones, and is typically used when changing CONTROL register bits.

After this sequence:

  • Thread mode uses the PSP pointing to the top of the interrupt stack.
  • MSP is still set to z_main_stack + CONFIG_MAIN_STACK_SIZE, and is reserved for interrupts and faults.

Handing off to C

bl z_prep_c

The final instruction in reset handler code is a plain jump to z_prep_c, which we’ll cover in the next article.

It’s worth noting that, interestingly, bl (branch with link) is chosen instead of b – which makes more sense since we don’t expect to return after the call to z_prep_c – because it has a larger jump range than b that can be limited on some smaller instruction sets.