BeagleBone Black GPIO Performance: PRU -> two different memory-mapped GPIO banks

In this test the PRU toggles one pin on each of two different GPIO banks (0 and 2) through their memory-mapped registers, in order to estimate the maximum achievable toggle rate and to check how coherent the two signals are.
Some interesting details:

Clock cycles per operation
Most operations, such as ADD, SUB, QBxx, MOV, JMP etc.: 1 cycle

LBBO 1,2,4 Bytes from PRU DRAM: 3 cycles
LBBO 8 Bytes from PRU DRAM: 4 cycles
LBBO 12 Bytes from PRU DRAM: 5 cycles
LBBO 16 Bytes from PRU DRAM: 6 cycles

LBCO 4 Bytes from DDR: 43 cycles
LBCO 8 Bytes from DDR: 44 cycles
LBCO 12 Bytes from DDR: 45 cycles
LBCO 16 Bytes from DDR: 46 cycles
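
The load timings listed above grow by one cycle per additional 32-bit word. A minimal sketch of that pattern in C follows; the function names are hypothetical, and the formulas are only a fit to the sizes listed (up to 16 bytes), not a documented rule for larger bursts.

```c
#include <assert.h>

/* Number of 32-bit words needed to hold `bytes` bytes. */
static unsigned words(unsigned bytes) { return (bytes + 3u) / 4u; }

/* LBBO from PRU data RAM: 3 cycles for the first word, +1 per extra word.
 * Assumption: only fitted to the listed sizes, so the range is checked. */
unsigned lbbo_dram_cycles(unsigned bytes)
{
    assert(bytes >= 1u && bytes <= 16u);
    return 2u + words(bytes);
}

/* LBCO from DDR: 43 cycles for the first word, +1 per extra word.
 * Assumption: same fit, same limited range. */
unsigned lbco_ddr_cycles(unsigned bytes)
{
    assert(bytes >= 1u && bytes <= 16u);
    return 42u + words(bytes);
}
```

With these helpers, `lbbo_dram_cycles(8)` evaluates to 4 and `lbco_ddr_cycles(16)` to 46, matching the list.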

GPIO performance
// PRU GPIO Write Timing Details
// The actual write instruction to a GPIO pin using SBBO takes two
// PRU cycles (10 ns). However, the GPIO logic can only update every
// 40 ns (8 PRU cycles). This means back-to-back writes to GPIO pins
// will eventually stall the PRU; alternatively, you can execute 6 PRU
// instructions for ‘free’ between writes when burst writing to the GPIO.
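
The arithmetic behind that note can be written out as a small C sketch. It uses the 200 MHz PRU clock (5 ns per cycle) and takes the 2-cycle SBBO cost and the 40 ns update interval from the note itself. The function names are hypothetical, and the toggle-rate bound is only what follows from those figures, not a measured value.

```c
#include <assert.h>

enum {
    PRU_CYCLE_NS   = 5,   /* 200 MHz PRU clock */
    SBBO_CYCLES    = 2,   /* cost of issuing one GPIO write (from the note) */
    GPIO_UPDATE_NS = 40   /* minimum spacing of GPIO updates (from the note) */
};

/* PRU cycles that pass between two possible GPIO updates. */
unsigned gpio_update_cycles(void)
{
    return GPIO_UPDATE_NS / PRU_CYCLE_NS;
}

/* Single-cycle instructions that fit between two writes at full burst rate. */
unsigned free_cycles_per_write(void)
{
    return gpio_update_cycles() - SBBO_CYCLES;
}

/* Upper bound on the toggle frequency in kHz: one period needs a set and
 * a clear, i.e. two GPIO updates. */
unsigned max_toggle_khz(void)
{
    return 1000000u / (2u * GPIO_UPDATE_NS);
}
```

Under these assumptions the update interval is 8 cycles, 6 of them are free, and a single pin cannot toggle faster than 12.5 MHz.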

Source Code

.origin 0  
.entrypoint TOP  
 
 #define GPIO0 0x44e07000
 #define GPIO1 0x4804c000
 #define GPIO2 0x481ac000
 #define GPIO3 0x481ae000
 #define GPIO_SETDATAOUT 0x194
 #define GPIO_CLEARDATAOUT 0x190
 
 #define DELAY 25 //delay loop count; the "delay in program" value in the captions below
 #define GPIO73 1<<9  //0x200, GPIO2_9 (2*32 + 9 = 73)
 #define GPIO26 1<<26 //0x4000000, GPIO0_26
   
   
 TOP:  
 
  LBCO r0, c4, 4, 4 //load SYSCFG register into r0 (c4 = PRU-ICSS CFG base, offset 4)
  CLR r0, r0, 4 //clear bit 4 (STANDBY_INIT) to enable the OCP master port
  SBCO r0, c4, 4, 4 //store the modified r0 back to SYSCFG
  
  //memory assignments
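  mov r1, GPIO0 | GPIO_SETDATAOUT //r1 = address of GPIO0 SETDATAOUT
  mov r2, GPIO0 | GPIO_CLEARDATAOUT //r2 = address of GPIO0 CLEARDATAOUT
  mov r3, GPIO26 //r3 = bit mask for GPIO0_26 (bit 26)
  
  mov r4, GPIO2 | GPIO_SETDATAOUT //r4 = address of GPIO2 SETDATAOUT
  mov r5, GPIO2 | GPIO_CLEARDATAOUT //r5 = address of GPIO2 CLEARDATAOUT
  mov r6, GPIO73 //r6 = bit mask for GPIO2_9 (bit 9, pin number 73)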
  
  
  
LEDON:  
  
  sbbo r3, r1, 0, 4 //set GPIO0_26
  sbbo r6, r4, 0, 4 //set GPIO2_9
  
  mov r0, DELAY //store the length of the delay in REG0
  
DELAYON:
    sub r0, r0, 1 //decrement r0 by 1
    qbne DELAYON, r0, 0 //loop back to DELAYON until r0 == 0

LEDOFF:
  
    sbbo r3, r2, 0, 4 //clear GPIO0_26
    sbbo r6, r5, 0, 4 //clear GPIO2_9
    mov r0, DELAY
DELAYOFF:
    sub r0, r0, 1 //decrement REG0 by 1
    qbne DELAYOFF, r0, 0 //loop to delayoff unless reg0=0
    
    jmp LEDON
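
The listing hard-codes the bank base, register offset and bit mask for each pin. As a cross-check, the sketch below derives them in C from the Linux-style pin number (bank * 32 + bit). The helper names are hypothetical. The bank base addresses are those of the AM335x memory map, and the register offsets are the ones used in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* AM335x GPIO bank base addresses, GPIO0..GPIO3. */
static const uint32_t gpio_base[4] = {
    0x44e07000u, 0x4804c000u, 0x481ac000u, 0x481ae000u
};

#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Bank index for a pin number such as 73 (= GPIO2_9). */
unsigned gpio_bank(unsigned pin)
{
    assert(pin < 128u);
    return pin / 32u;
}

/* Bit mask to write into SETDATAOUT / CLEARDATAOUT for that pin. */
uint32_t gpio_mask(unsigned pin)
{
    return UINT32_C(1) << (pin % 32u);
}

/* Absolute register address. The bases are 4 KiB aligned, so adding the
 * offset gives the same result as the `base | offset` in the listing. */
uint32_t gpio_reg(unsigned pin, uint32_t offset)
{
    return gpio_base[gpio_bank(pin)] + offset;
}
```

For pin 73 this yields bank 2, mask 0x200 and a SETDATAOUT address of 0x481ac194; for pin 26 it yields bank 0, mask 0x4000000 and a CLEARDATAOUT address of 0x44e07190.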


GPIO frequency xx, delay in program 1000, 0.05 µs TIME/DIV

GPIO frequency xx, delay in program 100, 0.1 µs and 0.5 µs TIME/DIV

GPIO frequency xx, delay in program 50, 0.1 µs and 0.5 µs TIME/DIV

GPIO frequency xx, delay in program 30, 0.1 µs TIME/DIV

GPIO frequency xx, delay in program 25, 0.1 µs TIME/DIV

GPIO frequency xx, delay in program 20, 0.1 µs TIME/DIV

GPIO frequency xx, delay in program 15, 0.1 µs TIME/DIV

GPIO frequency xx, delay in program 10, 0.1 µs TIME/DIV
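
For comparison with the scope captures, the idealized loop period can be estimated by counting instruction cycles in the listing. The C sketch below does this with hypothetical function names. It assumes every SBBO costs exactly 2 cycles and never stalls on the interconnect or the GPIO logic, so it gives a lower bound on the period and the measured frequency should come out lower.

```c
#include <assert.h>

enum {
    PRU_CYCLE_NS = 5,  /* 200 MHz PRU clock */
    SBBO_CYCLES  = 2   /* assumed cost per GPIO write, no stalls */
};

/* Cycles from the "set" write to the "clear" write:
 * two SBBO, one MOV, then DELAY iterations of SUB + QBNE. */
unsigned high_cycles(unsigned delay)
{
    return 2u * SBBO_CYCLES + 1u + 2u * delay;
}

/* Cycles from the "clear" write to the next "set" write:
 * same as above plus the final JMP. */
unsigned low_cycles(unsigned delay)
{
    return high_cycles(delay) + 1u;
}

/* Idealized period of the square wave in nanoseconds. */
unsigned period_ns(unsigned delay)
{
    return PRU_CYCLE_NS * (high_cycles(delay) + low_cycles(delay));
}
```

With DELAY set to 25 this gives 55 + 56 = 111 cycles, or 555 ns per period (about 1.8 MHz before any stalls). The extra JMP also makes the low phase one cycle longer than the high phase.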