[C/C++ Library] Profiling the NuttX uClibc++ module

Background:
As Daniel explained in the recent community meeting, the NuttX build flags include the ‘-nostdinc++’ option, which stops the compiler from searching the toolchain’s standard C++ header directories. Consequently, all C++ code is built against the NuttX add-on uClibc++ module.

I profiled the library and compared it with the toolchain’s standard C++ library, using g++ for both builds, and observed a significant performance disparity.
I might be overlooking something in my approach.

The following are the findings for your consideration:

1. A simple program that fills a large std::vector of integers.

#include <iostream>
#include <vector>

int main() {

    // Test using uClibc++ and the standard C++ library.
    //
    // The same source is built once against uClibc++ and once against the
    // toolchain's standard C++ library. Two separate std::vector instances
    // are each filled with one million integers in a loop.

    {
        std::vector<int> vec;

        for (int i = 0; i < 1000000; ++i) {
            vec.push_back(i);
        }
    }

    {
        std::vector<int> vec;

        for (int i = 0; i < 1000000; ++i) {
            vec.push_back(i);
        }
    }

    return 0;
}

g++ with the standard C++ library

1. Calling g++ with the standard C++ library
g++ ../perfomance.cpp -o performance_gcc

2. Profiling the binary
perf stat ./performance_gcc

 Performance counter stats for './performance_gcc':

             24,24 msec task-clock                       #    0,947 CPUs utilized             
                10      context-switches                 #  412,507 /sec                      
                 0      cpu-migrations                   #    0,000 /sec                      
             4.101      page-faults                      #  169,169 K/sec                     
   <not supported>      cycles                                                                
   <not supported>      instructions                                                          
   <not supported>      branches                                                              
   <not supported>      branch-misses                                                         

       0,025588812 seconds time elapsed

       0,020693000 seconds user

g++ with the uClibc++ library

1. Calling g++ with uClibc++ 0.2.5
g++ ../perfomance.cpp -o performance -I ../usr/include/ -L ../usr/lib/ -luClibc++ -nostdinc++

 Performance counter stats for './performance':

         24.717,36 msec task-clock                       #    0,998 CPUs utilized             
               665      context-switches                 #   26,904 /sec                      
                 1      cpu-migrations                   #    0,040 /sec                      
        10.340.689      page-faults                      #  418,357 K/sec                     
   <not supported>      cycles                                                                
   <not supported>      instructions                                                          
   <not supported>      branches                                                              
   <not supported>      branch-misses                                                         

      24,778615819 seconds time elapsed

      15,193766000 seconds user
       9,519179000 seconds sys

A more detailed analysis could be provided, but the gap is already clear from these numbers: for the same code and the same compiler, the uClibc++ build takes about 24.7 s of task-clock time versus about 24 ms for the standard-library build (roughly 1000 times slower), with 10,340,689 page faults versus 4,101 and 665 context switches versus 10.

Interesting, what’s your goal with this?

Hello Julian,

I’m initiating this discussion in preparation for the upcoming community Q&A call tomorrow, as a follow-up to our last meeting.
I’m relatively new to the PX4 community and keen to dive deeper into the PX4 ecosystem, particularly PX4 development. During the previous community Q&A call, I raised some observations about the PX4 codebase. Specifically, I noted the absence of C++ algorithms and of well-established C++ idioms such as RAII. With this analysis I want to address the question: what are the drawbacks of not using the standard library, and does it make sense to move the PX4 codebase to C++20 to take advantage of new features such as concepts for fine-tuning containers and std::span for efficient non-owning access to objects?

We have some of that, but it’s usually bespoke, e.g. PX4-Autopilot/src/include/containers/LockGuard.hpp (at commit 7ac50a20b0b3c4ad539afd5f2c539233b7383240 in PX4/PX4-Autopilot on GitHub).

That would certainly be nice, but it will require compiler support. The latest arm-none-eabi toolchain is currently at 10.3, from what I see on the GNU Arm Embedded Toolchain Downloads page on Arm Developer.

According to the C++ compiler support page on cppreference.com, some features might indeed be available there, e.g. span.