The GOOD Outcome
In the last post we integrated some C++ modules into our Python code. Now we want to send some data from Python over to a C++ object to be processed.
I’ve set up some C++ code that accepts a Python list and represents it as an STL vector. This is shown in the file vector_func.cpp below.
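```cpp
// filename: vector_func.cpp
#include <cstddef>
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/stl_bind.h>

namespace py = pybind11;

// Keep std::vector<long long> opaque: Python sees a bound CVector object
// instead of a Python list being copied into a new vector on every call.
PYBIND11_MAKE_OPAQUE(std::vector<long long>);

// Applies x = x*x - 3 to every element in place, then returns the vector.
// Returning by value hands Python a copy of the modified vector.
std::vector<long long> apply_calc(std::vector<long long> &pylist) {
    for (std::size_t i = 0; i < pylist.size(); i++) {
        pylist[i] = (pylist[i] * pylist[i]) - 3;
    }
    return pylist;
}

PYBIND11_MODULE(vector_func, m) {
    m.doc() = "x**2 -3 to each element in list";  // optional module docstring
    py::bind_vector<std::vector<long long>>(m, "CVector");
    m.def("apply_calc", &apply_calc, "applies x**2-3 on each element");
}
```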
We’ve created a C++ function named apply_calc that takes a vector by reference, modifies it in place, and returns it by value (so Python receives a copy of the result). We declare the vector as an opaque type so that, in our Python code, we can convert the list to a CVector object once and then pass that CVector into apply_calc, rather than having pybind11 copy a Python list into a std::vector on every call.
In this function we apply a simple transformation to each integer element: x = x²-3
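To check the C++ output, or just to see what apply_calc is expected to do, here is a minimal pure-Python reference. It assumes the same semantics as the C++ code above: the input is modified in place and a copy of the modified data is returned. The name `apply_calc_reference` is hypothetical and is not part of the compiled module.

```python
def apply_calc_reference(values):
    """Hypothetical pure-Python mirror of apply_calc: x -> x**2 - 3.

    Like the C++ version, it mutates its argument in place and then
    returns a copy (the C++ function returns std::vector by value).
    """
    for i in range(len(values)):
        values[i] = values[i] * values[i] - 3
    return list(values)


if __name__ == "__main__":
    data = [0, 1, 2, 3]
    result = apply_calc_reference(data)
    print(result)          # [-3, -2, 1, 6]
    print(data)            # [-3, -2, 1, 6]: the input was mutated too
    print(result is data)  # False: a separate copy, like the by-value return
```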
We build the above C++ file:
```
:~$ g++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` vector_func.cpp -o vector_func`python3-config --extension-suffix`
```
Now, in the same directory as the .so file produced by the compilation above, we’ll create a Python file that uses the C++ module. We can call it test.py:
```python
# filename: test.py
from vector_func import CVector, apply_calc
import functools
import time
from random import shuffle


def time_this(func):
    """Time func, print the elapsed time, and return (result, seconds)."""
    @functools.wraps(func)  # preserves metadata of original func
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        func_run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {func_run_time:.5f} secs")
        return (value, func_run_time)
    return wrapper_timer


@time_this
def vector_init(p_list):
    # Copy the Python list into a C++ std::vector<long long> (CVector)
    return CVector(p_list)


@time_this
def py_listcomp_init(p_list):
    # The Python list already exists, so there is nothing to build
    return p_list


@time_this
def calc_vector(c_obj):
    # x**2 - 3 on every element, computed in C++
    return apply_calc(c_obj)


@time_this
def calc_listcomp(p_list):
    # x**2 - 3 on every element, computed in Python
    return [(x**2) - 3 for x in p_list]


sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
for test_case in sizes:
    print("TEST CASE: ", test_case)
    int_list = list(range(test_case))
    shuffle(int_list)
    print("===========INIT============")
    cpp_vector, v_init_time = vector_init(int_list)
    listcomp, l_init_time = py_listcomp_init(int_list)
    print("===========================")
    print("===========CALC============")
    cpp_vector, v_calc_time = calc_vector(cpp_vector)
    listcomp, l_calc_time = calc_listcomp(int_list)
    # The two sums should match if both methods agree
    print("CPP METHOD: ", sum(cpp_vector))
    print("LIST COMP : ", sum(listcomp))
    print("===========================")
    print("===========TIME============")
    v_total = v_init_time + v_calc_time
    l_total = l_init_time + l_calc_time
    print("CPP METHOD: ", v_total)
    print("LIST COMP : ", l_total)
    print("CPP FASTER BY: ", l_total / v_total)
    print("===========================\n\n\n")
```
Results
Building the shuffled Python list (int_list) is common to both approaches, so it is not timed; the list-comprehension path needs no further initialization, so its List Initialization row is left empty. Running the above code and tabulating the output (all times in seconds) gives us:
| List size | 1,000 | 10,000 | 100,000 | 1,000,000 | 10,000,000 |
|---|---|---|---|---|---|
| Vector Initialization | 0.00005 | 0.00038 | 0.00696 | 0.12626 | 1.51522 |
| Vector Calculation | 0.00001 | 0.00008 | 0.00065 | 0.00648 | 0.06822 |
| List Initialization | - | - | - | - | - |
| List Comp Calculation | 0.00053 | 0.00385 | 0.04416 | 0.51254 | 5.64929 |
Adding each approach's initialization and calculation times condenses the table above into a more informative comparison (times in seconds):
| List size | 1,000 | 10,000 | 100,000 | 1,000,000 | 10,000,000 |
|---|---|---|---|---|---|
| C++ Vector Time (init + calc) | 0.00006 | 0.00046 | 0.00761 | 0.13274 | 1.58344 |
| List Comprehension Time | 0.00053 | 0.00385 | 0.04416 | 0.51254 | 5.64929 |
| C++ speedup over list comprehension | 8.83x | 8.34x | 5.80x | 3.86x | 3.57x |
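The condensed table can be reproduced from the first one: each C++ total is Vector Initialization plus Vector Calculation, and the speedup is the list comprehension time divided by that total. This sketch uses the rounded values from the table, so the 10,000 column comes out near 8.37x rather than the 8.34x shown, likely because the script divided the unrounded timings.

```python
# Timings in seconds, copied from the first results table.
sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
vector_init = [0.00005, 0.00038, 0.00696, 0.12626, 1.51522]
vector_calc = [0.00001, 0.00008, 0.00065, 0.00648, 0.06822]
listcomp_calc = [0.00053, 0.00385, 0.04416, 0.51254, 5.64929]

# C++ total = init + calc; the list comprehension needs no extra init.
cpp_totals = [i + c for i, c in zip(vector_init, vector_calc)]
speedups = [lc / t for lc, t in zip(listcomp_calc, cpp_totals)]

for n, total, s in zip(sizes, cpp_totals, speedups):
    print(f"{n:>10,}: C++ total {total:.5f} s, speedup {s:.2f}x")
```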
Conclusion of GOOD Outcome:
By using C++ to process our list and apply x = x²-3 to each integer, we see speedups of roughly 3.6x to 8.8x over the Python list comprehension, with the advantage narrowing as the list grows. So far this data covers list sizes up to 10,000,000 elements.
The BAD Outcome
Maybe you start thinking: if C++ STL containers offer such a performance boost (as seen above), why not bring those containers across to Python and use them instead of lists? Since we would be leveraging C++ directly, we should get some performance increase. I certainly expected a C++ container to offer some improvement over Python's general-purpose list data structure. And well… I was wrong.
As Far as Vectors are Concerned
The C++ module is simply an opaque pass-through that exposes the vector class:
```cpp
// filename: vector_opaque_list.cpp
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/stl_bind.h>

namespace py = pybind11;

// Expose std::vector<long long> to Python as an opaque VectorLL object
PYBIND11_MAKE_OPAQUE(std::vector<long long>);

PYBIND11_MODULE(vector_opaque_list, m) {
    py::bind_vector<std::vector<long long>>(m, "VectorLL");
}
```
Compilation:
```
:~$ g++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` vector_opaque_list.cpp -o vector_opaque_list`python3-config --extension-suffix`
```
Python Script:
```python
from vector_opaque_list import VectorLL
import functools
import time
from random import shuffle


def time_this(func):
    """Time func, print the elapsed time, and return func's result."""
    @functools.wraps(func)  # preserves metadata of original func
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        func_run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {func_run_time:.5f} secs")
        return value
    return wrapper_timer


@time_this
def vector_init(size):
    # Grow a C++ std::vector<long long> one append at a time from Python
    vll = VectorLL()
    for _ in range(size):
        vll.append(0)
    return vll


@time_this
def py_listcomp_init(size):
    return [0 for _ in range(size)]


@time_this
def py_list_init(size):
    return [0] * size


@time_this
def mod_vector(int_list, cpp_vector):
    for i, val in enumerate(int_list):
        cpp_vector[i] = val


@time_this
def mod_listcomp(int_list, listcomp):
    for i, val in enumerate(int_list):
        listcomp[i] = val


@time_this
def mod_listmath(int_list, listmath):
    for i, val in enumerate(int_list):
        listmath[i] = val


@time_this
def calc_vector(cpp_vector):
    # x**2 - 3 computed in Python, one indexed access at a time, in place
    i = 0
    while i < len(cpp_vector):
        cpp_vector[i] = cpp_vector[i] * cpp_vector[i] - 3
        i += 1


@time_this
def calc_listcomp(listcomp):
    return [(x**2) - 3 for x in listcomp]


sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
for test_case in sizes:
    int_list = list(range(test_case))
    shuffle(int_list)
    print(test_case)
    print("===========INIT=============")
    cpp_vector = vector_init(test_case)
    listcomp = py_listcomp_init(test_case)
    listmath = py_list_init(test_case)
    print("============================")
    print("============MOD=============")
    mod_vector(int_list, cpp_vector)
    mod_listcomp(int_list, listcomp)
    mod_listmath(int_list, listmath)
    print("===========================")
    print("===========CALC============")
    calc_vector(cpp_vector)  # modifies cpp_vector in place, returns None
    listcomp = calc_listcomp(listcomp)
    # The two sums should match if both methods agree
    print(sum(cpp_vector))
    print(sum(listcomp))
    print("===========================\n\n")
```
Note that in this script the calculation x = x²-3 is performed in Python, element by element, on the VectorLL object (a binding of C++’s std::vector<long long>); only the storage lives in C++.
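Why is the VectorLL loop so slow? Every `cpp_vector[i]` read or write is a separate call through the binding layer, which dispatches to the bound `__getitem__`/`__setitem__` and converts between a Python `int` and a `long long` each time, on top of Python's usual interpreter cost. The pure-Python sketch below is only an analogy, not a pybind11 measurement: a hypothetical `WrappedList` routes every index through an extra method call, which is enough to show how per-element dispatch adds up.

```python
import timeit


class WrappedList:
    """Hypothetical stand-in for a bound container: every index access
    goes through an extra Python-level method call."""

    def __init__(self, size):
        self._data = [0] * size

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        self._data[i] = value


def calc_indexed(seq):
    # Same element-by-element loop as calc_vector in the script above
    i = 0
    while i < len(seq):
        seq[i] = seq[i] * seq[i] - 3
        i += 1
    return seq


if __name__ == "__main__":
    n = 100_000
    t_list = timeit.timeit(lambda: calc_indexed([0] * n), number=5)
    t_wrapped = timeit.timeit(lambda: calc_indexed(WrappedList(n)), number=5)
    print(f"plain list:   {t_list:.4f} s")
    print(f"wrapped list: {t_wrapped:.4f} s")
```

The real binding does more work per access than this wrapper (type conversion and overload dispatch in C++), so treat the gap here as a lower bound on the idea, not a prediction of the table below.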
The Results
| List size | 1,000 | 10,000 | 100,000 | 1,000,000 | 10,000,000 |
|---|---|---|---|---|---|
| Vector Initialization | 0.00040 | 0.00431 | 0.04268 | 0.42754 | 4.19480 |
| Vector Modification | 0.00051 | 0.00598 | 0.07510 | 0.83904 | 7.90630 |
| Vector Calculation | 0.00250 | 0.02805 | 0.30033 | 2.94035 | 28.16864 |
| List Initialization | 0.00001 | 0.00006 | 0.00010 | 0.00564 | 0.05296 |
| List Modification | 0.00010 | 0.00105 | 0.01651 | 0.27302 | 2.47224 |
| List Comp Calculation | 0.00039 | 0.00445 | 0.05608 | 0.59009 | 5.83378 |
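Dividing each Vector row by the matching List row puts numbers on the gap. This sketch simply copies the values from the table above (times in seconds) and computes how many times slower each VectorLL step is.

```python
# Timings in seconds, copied from the BAD-outcome results table.
sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
vector = {
    "init": [0.00040, 0.00431, 0.04268, 0.42754, 4.19480],
    "modify": [0.00051, 0.00598, 0.07510, 0.83904, 7.90630],
    "calc": [0.00250, 0.02805, 0.30033, 2.94035, 28.16864],
}
lists = {
    "init": [0.00001, 0.00006, 0.00010, 0.00564, 0.05296],
    "modify": [0.00010, 0.00105, 0.01651, 0.27302, 2.47224],
    "calc": [0.00039, 0.00445, 0.05608, 0.59009, 5.83378],
}

# How many times slower the VectorLL step is than the list step
ratios = {
    step: [v / l for v, l in zip(vector[step], lists[step])]
    for step in vector
}

for step, values in ratios.items():
    formatted = ", ".join(f"{r:.1f}x" for r in values)
    print(f"{step:>6}: {formatted}")
```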
Observations
Building a list with multiplication, `example_list = [0]*1000`, versus a comprehension over range, `example_list = [0 for _ in range(1000)]`: in my runs the multiplication form was consistently around 7x to 10x faster.
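To reproduce this comparison on your own machine, a quick `timeit` sketch follows. The list size and repeat count are arbitrary choices, and the ratio will vary with Python version and hardware.

```python
import timeit

SIZE = 1000
REPEATS = 10_000

# Both expressions build the same list of SIZE zeros.
assert [0] * SIZE == [0 for _ in range(SIZE)]

t_mult = timeit.timeit(lambda: [0] * SIZE, number=REPEATS)
t_comp = timeit.timeit(lambda: [0 for _ in range(SIZE)], number=REPEATS)

print(f"[0]*{SIZE}: {t_mult:.4f} s")
print(f"[0 for _ in range({SIZE})]: {t_comp:.4f} s")
print(f"comprehension / multiplication: {t_comp / t_mult:.1f}x")
```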
Conclusion for BAD Outcome:
You will not get any performance benefit just by using a C++ class from Python. Driving a bound C++ STL vector element by element from Python comes with significant losses: in these runs the list comprehension was roughly 5x to 6x faster than the same calculation on VectorLL, and list initialization was at least 40x faster than building a vector through Python.
Overall Conclusion
Using C++ from Python can offer significant performance improvements (up to roughly 8.8x faster in our test cases) when done intelligently; otherwise it may come at a grave cost. Python handles data structures well enough on its own. What C++ brings to the table is compute performance, not its STL containers or classes: transforming or manipulating data can be much faster in C++. You do need to account for the time it takes to initialize a C++ object from a Python object, as this copy is a major bottleneck (at 10,000,000 elements, vector initialization took about 1.52 s of the C++ path's 1.58 s total).
If there are many operations to perform on a dataset (for example, image processing or manipulation), then leveraging C++ directly via pybind11 (or indirectly, for example through numpy) is the clear winner, as far as these benchmarks show.
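One way to make the "many operations" argument concrete is a simple amortization model built from the 10,000,000-element timings in the first results table. It is an idealization, not something measured here: it assumes the data is copied into a CVector once, that each further hypothetical operation costs about the same as the measured apply_calc pass, and that the Python path pays the full list comprehension time for every operation.

```python
# 10,000,000-element timings (seconds) from the GOOD-outcome table.
VECTOR_INIT = 1.51522    # one-off copy of the list into a CVector
VECTOR_CALC = 0.06822    # one apply_calc-sized pass in C++
LISTCOMP_CALC = 5.64929  # one list comprehension pass in Python


def speedup_after(k_ops):
    """Modelled speedup of the C++ path after k_ops passes over the same
    data, assuming init is paid once and every pass costs the same."""
    cpp_total = VECTOR_INIT + k_ops * VECTOR_CALC
    py_total = k_ops * LISTCOMP_CALC
    return py_total / cpp_total


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(f"{k:>5} passes: C++ modelled as {speedup_after(k):.1f}x faster")
    # As k grows, the one-off copy matters less and the ratio approaches:
    print(f"limit: {LISTCOMP_CALC / VECTOR_CALC:.1f}x")
```

With a single pass the model reproduces the 3.57x from the table; the more work done per copy, the more the one-off initialization cost is amortized.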