Pixhawk 4 external SPI: DMA sometimes not working


I am trying to get large amounts of data into the Pixhawk 4 via the external SPI bus. As a test I am calling spi::transfer() from a 500 Hz work queue to exchange 256 bytes of data, which is working well.
Since this uses about 14% CPU time (proportional to the amount of data, so it will go up for the final application) and the transfers are not continuous (as expected from a multitasking OS), I decided to enable DMA for SPI5. I got it running, but the behaviour is inconsistent.
Sometimes it works beautifully, with the transfer being continuous and only about 1.5% CPU usage (independent of the amount of data). But other times it behaves the same way as it did with DMA disabled (non-continuous transfer, high CPU usage).

Does anyone have an idea why it sometimes does not work as intended?
Thanks in advance for any help!

To clarify: the behaviour stays the same for every transfer while the module is running. It only changes when the module is restarted.
I think I narrowed the cause down to this check: https://github.com/PX4/NuttX/blob/e3665c1fb4486d641d366ea8c07fdc47f8bbf7d7/arch/arm/src/stm32f7/stm32_spi.c#L1681 (I am not using the newest version of PX4 and NuttX, but the SPI DMA implementation did not change much). I suspect that one of the stm32_dmacapable() calls is at fault.
I would like to use the dmainfo() calls in there to debug this. What do I need to do to get access to the output of those other than enabling CONFIG_DEBUG_DMA_INFO and its dependencies?

Well, I figured it out eventually: the transmit and receive buffers passed to the transfer need to be 32-byte aligned. If that is the case, the SPI DMA works as intended.
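
For anyone running into the same thing, here is a minimal sketch of how the buffers could be declared so that they always land on a 32-byte boundary. The names kDmaAlign, kTransferSize, tx_buf, rx_buf, is_dma_aligned(), round_up_to_align() and check_dma_buffers() are made up for this example and are not part of PX4 or NuttX. Rounding the buffer size up to a multiple of 32 is my own precaution (32 bytes is also the data cache line size of the Cortex-M7), not something I confirmed to be required.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Alignment that made SPI DMA work in this thread (assumed to apply to both buffers).
constexpr std::size_t kDmaAlign = 32;

// Payload size used in the test described above.
constexpr std::size_t kTransferSize = 256;

// Hypothetical helper: true if ptr sits on a kDmaAlign boundary.
inline bool is_dma_aligned(const void *ptr)
{
	return (reinterpret_cast<std::uintptr_t>(ptr) % kDmaAlign) == 0;
}

// Hypothetical helper: round n up to the next multiple of kDmaAlign,
// so a buffer does not share its last 32-byte block with other data.
constexpr std::size_t round_up_to_align(std::size_t n)
{
	return (n + kDmaAlign - 1) / kDmaAlign * kDmaAlign;
}

// alignas forces the start address of each static buffer onto a 32-byte boundary.
alignas(kDmaAlign) static std::uint8_t tx_buf[round_up_to_align(kTransferSize)];
alignas(kDmaAlign) static std::uint8_t rx_buf[round_up_to_align(kTransferSize)];

// Call once at module start to catch a misaligned buffer early
// (assert is compiled out when NDEBUG is defined).
inline void check_dma_buffers()
{
	assert(is_dma_aligned(tx_buf));
	assert(is_dma_aligned(rx_buf));
}
```

This would also explain why the behaviour only changed on restart: without alignas, where a buffer ends up can differ from one start of the module to the next, so it is aligned by luck on some runs and not on others. Note that alignas on a class member only helps if the object itself is allocated with that alignment, so static buffers like the ones above are probably the simplest option.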