Background:
At the recent community meeting, Daniel explained that the NuttX build flags include the '-nostdinc++' option, which stops the compiler from searching the toolchain's standard C++ header directories and so prevents use of the toolchain's standard C++ library. Consequently, all C++ code for NuttX OS is built against the NuttX add-on uClibc++ module.
I profiled a small test program built against uClibc++ and compared it with the same program built against the toolchain's standard C++ library, using g++ for both builds, and observed a significant performance gap.
I may be overlooking something in my approach.
The following are the findings for your consideration:
1. A simple program that handles a large dataset of integers.
#include <iostream>
#include <vector>

int main() {
    // Performance test: this same program is built once against the standard
    // C++ library (referred to below as stdinc++) and once against uClibc++.
    // Two separate std::vector instances are each filled with one million
    // integers in a loop, so the cost is dominated by push_back and by the
    // library's handling of the vector's storage.
    {
        std::vector<int> vec;
        for (int i = 0; i < 1000000; ++i) {
            vec.push_back(i);
        }
    }
    {
        std::vector<int> vec;
        for (int i = 0; i < 1000000; ++i) {
            vec.push_back(i);
        }
    }
    return 0;
}
g++ with stdinc++
1. calling g++ with stdinc++
g++ ../perfomance.cpp -o performance_gcc
2. profiling the binary
perf stat ./performance_gcc
Performance counter stats for './performance_gcc':
24,24 msec task-clock # 0,947 CPUs utilized
10 context-switches # 412,507 /sec
0 cpu-migrations # 0,000 /sec
4.101 page-faults # 169,169 K/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0,025588812 seconds time elapsed
0,020693000 seconds user
g++ with uClibc++ library
1. calling g++ with uClibc++-0.2.5
g++ ../perfomance.cpp -o performance -I ../usr/include/ -L ../usr/lib/ -luClibc++ -nostdinc++
2. profiling the binary
perf stat ./performance
Performance counter stats for './performance':
24.717,36 msec task-clock # 0,998 CPUs utilized
665 context-switches # 26,904 /sec
1 cpu-migrations # 0,040 /sec
10.340.689 page-faults # 418,357 K/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
24,778615819 seconds time elapsed
15,193766000 seconds user
9,519179000 seconds sys
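A more detailed analysis could be provided, but the gap is already clear from these numbers. For the same code and the same compiler, with only the library changed, elapsed time rises from about 0.026 s to about 24.8 s (roughly a thousandfold), page faults rise from about 4 thousand to over 10 million, and context switches rise from 10 to 665. The uClibc++ build also spends about 9.5 s in system time, which points toward memory management as the first place to look.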