The GOOD Outcome

In the last post we integrated some C++ modules into our Python code. Now we are interested in sending data from Python over to a C++ object to be processed.

I’ve set up a C++ function that accepts a Python list and represents it as an STL vector. It is shown in the file vector_func.cpp below.

//filename: vector_func.cpp
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/stl_bind.h>

namespace py = pybind11;

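// Opt out of pybind11's automatic list <-> vector copy conversion for this type,
// so the vector can be bound as its own Python class below.
PYBIND11_MAKE_OPAQUE(std::vector<long long>);

// Applies x = x*x - 3 to every element in place, then returns a copy of the vector.
std::vector<long long> apply_calc(std::vector<long long> &pylist) {
    for(unsigned long long i=0; i<pylist.size(); i++) {
        pylist[i] = (pylist[i]*pylist[i])-3;
    }
    return pylist;
}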

PYBIND11_MODULE(vector_func, m) {
    m.doc() = "x**2 -3 to each element in list"; // optional module docstring
    py::bind_vector< std::vector<long long> >(m, "CVector");
    m.def("apply_calc", &apply_calc, "applies x**2-3 on each element");
}

We’ve created a C++ function named apply_calc that takes a reference to a vector and returns a vector. Because the argument is a reference, the input vector is modified in place and the return value is a copy of it. We make the vector an opaque type so that, in our Python code, we can construct a CVector object from the list once and then pass that object into the apply_calc function.

In this function we apply a simple transformation on the integer/number input: x = x²-3
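One thing worth checking before handing numbers to C++ is whether x²-3 still fits in a long long, since Python integers grow without bound while a C++ long long does not. The sketch below is a pure-Python reference for the same transformation plus a range check. The helper names `reference_calc` and `fits_in_long_long` are made up for illustration, and the limits assume a 64-bit long long, which is what mainstream platforms provide.

```python
# Assumed limits of a 64-bit signed long long (hypothetical constants for this sketch).
LLONG_MIN = -(2**63)
LLONG_MAX = 2**63 - 1


def reference_calc(values):
    """Pure-Python reference for the transformation x -> x*x - 3."""
    return [x * x - 3 for x in values]


def fits_in_long_long(values):
    """True if every input and every result fits in a 64-bit signed integer."""
    return all(
        LLONG_MIN <= x <= LLONG_MAX and LLONG_MIN <= y <= LLONG_MAX
        for x, y in zip(values, reference_calc(values))
    )


print(reference_calc([0, 1, 2, 3]))      # [-3, -2, 1, 6]
print(fits_in_long_long([9_999_999]))    # True: about 1e14, well inside the range
print(fits_in_long_long([2**32]))        # False: 2**64 - 3 exceeds LLONG_MAX
```

The largest value used in the benchmark below is just under 10,000,000, so its square is around 10^14 and comfortably inside the range.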

We build the above C++ file:

:~$ g++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` vector_func.cpp -o vector_func`python3-config --extension-suffix`
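If you are curious what file name that command produces, the suffix printed by `python3-config --extension-suffix` can also be read from Python through the standard `sysconfig` module. This is only a small sketch for inspecting the name; the exact suffix depends on your platform and Python version.

```python
import sysconfig

# EXT_SUFFIX is the value that `python3-config --extension-suffix` prints.
# It is platform dependent, for example ".cpython-310-x86_64-linux-gnu.so".
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

# The build command above writes the compiled module to a file named like this.
module_filename = "vector_func" + ext_suffix
print(module_filename)
```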

Now, in the same directory (as the .so file produced by compilation above), we’ll create a Python file to use the C++ module created. We can call this test.py:

from vector_func import CVector, apply_calc
import functools
import time
from random import shuffle

def time_this(func):
    @functools.wraps(func)  # preserves metadata of original func
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        func_run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {func_run_time:.5f} secs")
        return (value, func_run_time)
    return wrapper_timer

@time_this
def vector_init(p_list):
    return CVector(p_list)

@time_this
def py_listcomp_init(p_list):
    return p_list

@time_this
def calc_vector(c_obj):
    return apply_calc(c_obj)

@time_this
def calc_listcomp(p_list):
    return [(x**2)-3 for x in p_list]


sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
for test_case in sizes:
    print("TEST CASE: ", test_case)
    int_list = [x for x in range(test_case)]
    shuffle(int_list)

    CVec = CVector()

    print("===========INIT============")
    cpp_vector, v_init_time = vector_init(int_list)
    listcomp, l_init_time = py_listcomp_init(int_list)
    print("===========================")
    print("===========CALC============")
    cpp_vector, v_calc_time = calc_vector(cpp_vector)
    listcomp, l_calc_time = calc_listcomp(int_list)
    print("CPP METHOD: ", sum(cpp_vector))
    print("LIST COMP : ", sum(listcomp))
    print("===========================")
    print("===========TIME============")
    v_total = v_init_time+v_calc_time
    l_total = l_init_time+l_calc_time
    print("CPP METHOD: ", v_total)
    print("LIST COMP : ", l_total)
    print("CPP FASTER BY: ", l_total/v_total)
    print("===========================\n\n\n")
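The script above times each step once, which can be noisy for the smaller list sizes. As a possible alternative, not what was used for the numbers in this post, the standard `timeit` module can repeat a call and keep the fastest run. The helper `best_time` is a made-up name for this sketch.

```python
import timeit
from random import shuffle


def best_time(func, *args, repeat=5):
    """Return the fastest of `repeat` single runs of func(*args), in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=1))


def calc_listcomp(p_list):
    # Same calculation as in test.py, without the time_this decorator.
    return [(x ** 2) - 3 for x in p_list]


int_list = list(range(100_000))
shuffle(int_list)
print(f"calc_listcomp best of 5: {best_time(calc_listcomp, int_list):.5f} secs")
```

Note that apply_calc modifies its vector in place, so repeating it on the same CVector would keep squaring already-squared values. Timing it this way would need a fresh CVector for every run.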

Results

Building the Python list is a step both approaches share, so its time is not recorded (it is shown as "-" below). Running the above code and tabulating the output, with all times in seconds, gives us:

Step (time in secs)         1,000    10,000   100,000   1,000,000   10,000,000
Vector Initialization     0.00005   0.00038   0.00696     0.12626      1.51522
Vector Calculation        0.00001   0.00008   0.00065     0.00648      0.06822
List Initialization             -         -         -           -            -
List Comp Calculation     0.00053   0.00385   0.04416     0.51254      5.64929

Summing the initialization and calculation times for each approach condenses the table above into a more informative comparison:

Total (time in secs)           1,000    10,000   100,000   1,000,000   10,000,000
C++ Vector Time              0.00006   0.00046   0.00761     0.13274      1.58344
List Comprehension Time      0.00053   0.00385   0.04416     0.51254      5.64929
C++ faster than List by:       8.83x     8.34x     5.80x       3.86x        3.57x
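For anyone who wants to redo the arithmetic, the sketch below recomputes the totals and ratios from the rounded timings in the first table. Because it starts from rounded numbers, a ratio can differ slightly from the one printed by the script (the 10,000 column comes out near 8.37 here rather than 8.34). It also prints how much of the C++ total is spent on vector initialization.

```python
sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]

# Rounded timings in seconds, copied from the first results table.
vector_init = [0.00005, 0.00038, 0.00696, 0.12626, 1.51522]
vector_calc = [0.00001, 0.00008, 0.00065, 0.00648, 0.06822]
listcomp_calc = [0.00053, 0.00385, 0.04416, 0.51254, 5.64929]

# C++ total = time to build the vector + time to run apply_calc on it.
cpp_totals = [init + calc for init, calc in zip(vector_init, vector_calc)]

# Speedup = list comprehension time divided by the C++ total.
speedups = [py / cpp for py, cpp in zip(listcomp_calc, cpp_totals)]

# Share of the C++ total that is spent on initialization alone.
init_share = [init / total for init, total in zip(vector_init, cpp_totals)]

for size, total, speedup, share in zip(sizes, cpp_totals, speedups, init_share):
    print(f"{size:>10,}  C++ total {total:.5f}s  speedup {speedup:.2f}x  init share {share:.0%}")
```

The initialization share is above 80% in every column, which is why the overall speedup is so much smaller than the speedup of the calculation step alone.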

Conclusion of GOOD Outcome:

By using C++ to process our list and apply x = x²-3 to each integer in it, we see speedups over Python ranging from about 3.6x to 8.8x! The advantage shrinks as the list grows, because building the vector dominates the C++ total. So far this data covers list sizes up to 10,000,000 elements.


The BAD Outcome

Maybe you start thinking… if C++ offers such a performance boost (as seen above), maybe we can just bring the STL containers across into Python, use them in place of lists, and gain some performance because we’re directly leveraging C++. I certainly thought that C++ containers would offer some improvement over Python’s super-general, all-purpose list data structure. And well… I was wrong.

As Far as Vectors are Concerned

The C++ module is simply an opaque pass-through that exposes the vector class:

//filename: vector_opaque_list.cpp
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/stl_bind.h>

namespace py = pybind11;

PYBIND11_MAKE_OPAQUE(std::vector<long long>);

PYBIND11_MODULE(vector_opaque_list, m) {
    py::bind_vector< std::vector<long long> >(m, "VectorLL");
}

Compilation:

:~$ g++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` vector_opaque_list.cpp -o vector_opaque_list`python3-config --extension-suffix`

Python Script:

from vector_opaque_list import VectorLL
import functools
import time
from random import shuffle


def time_this(func):
    @functools.wraps(func)  # preserves metadata of original func
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        func_run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {func_run_time:.5f} secs")
        return value
    return wrapper_timer


@time_this
def vector_init(size):
    vll = VectorLL()
    for _ in range(size): vll.append(0)
    return vll

@time_this
def py_listcomp_init(size):
    l = [0 for _ in range(size)]
    return l

@time_this
def py_list_init(size):
    l = [0]*size
    return l

@time_this
def mod_vector(int_list, cpp_vector):
    for i,val in enumerate(int_list): cpp_vector[i]=val


@time_this
def mod_listcomp(int_list, listcomp):
    for i,val in enumerate(int_list): listcomp[i]=val

@time_this
def mod_listmath(int_list, listmath):
    for i,val in enumerate(int_list): listmath[i]=val


@time_this
def calc_vector(cpp_vector):
    i=0
    while i < len(cpp_vector):
        cpp_vector[i]=cpp_vector[i]*cpp_vector[i]-3
        i+=1

@time_this
def calc_listcomp(listcomp):
    return [(x**2)-3 for x in listcomp]


sizes = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
for test_case in sizes:
    int_list = [x for x in range(test_case)]
    shuffle(int_list)
    print(test_case)
    print("===========INIT=============")
    cpp_vector = vector_init(test_case)
    listcomp = py_listcomp_init(test_case)
    listmath = py_list_init(test_case)
    print("============================")
    print("============MOD=============")
    mod_vector(int_list, cpp_vector)
    mod_listcomp(int_list, listcomp)
    mod_listmath(int_list, listmath)
    print("===========================")
    print("===========CALC============")
    x = calc_vector(cpp_vector)
    listcomp = calc_listcomp(listcomp)
    print(sum(cpp_vector))
    print(sum(listcomp))
    print("===========================\n\n")

Note that here the calculation x = x²-3 is performed in Python, element by element, on the VectorLL object (bound from C++’s STL vector).
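To get a feel for how chatty that loop is, the sketch below runs the same while loop against a small pure-Python stand-in that counts method calls. `CountingSequence` is a made-up class for illustration and is not part of pybind11. The assumption is that on a real VectorLL each of these calls has to pass through the binding layer, which is where the overhead would come from.

```python
class CountingSequence:
    """Hypothetical stand-in for VectorLL that counts how often it is touched."""

    def __init__(self, values):
        self._values = list(values)
        self.calls = {"__len__": 0, "__getitem__": 0, "__setitem__": 0}

    def __len__(self):
        self.calls["__len__"] += 1
        return len(self._values)

    def __getitem__(self, index):
        self.calls["__getitem__"] += 1
        return self._values[index]

    def __setitem__(self, index, value):
        self.calls["__setitem__"] += 1
        self._values[index] = value


def calc_vector(seq):
    # Same loop as calc_vector in the script above, minus the timing decorator.
    i = 0
    while i < len(seq):
        seq[i] = seq[i] * seq[i] - 3
        i += 1


seq = CountingSequence(range(1_000))
calc_vector(seq)
print(seq.calls)  # {'__len__': 1001, '__getitem__': 2000, '__setitem__': 1000}
```

For n elements that is 4n + 1 calls on the container, compared with the single apply_calc call used in the GOOD outcome.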

The Results

Step (time in secs)         1,000    10,000   100,000   1,000,000   10,000,000
Vector Initialization     0.00040   0.00431   0.04268     0.42754      4.19480
Vector Modification       0.00051   0.00598   0.07510     0.83904      7.90630
Vector Calculation        0.00250   0.02805   0.30033     2.94035     28.16864
List Initialization       0.00001   0.00006   0.00010     0.00564      0.05296
List Modification         0.00010   0.00105   0.01651     0.27302      2.47224
List Comp Calculation     0.00039   0.00445   0.05608     0.59009      5.83378

Observations

Building a list like:

example_list = [0]*1000

vs

example_list = [0 for _ in range(1000)]

The first form, using multiplication, is consistently around 7x to 10x faster than building the same list with a comprehension over range.
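A quick way to try this comparison on your own machine is sketched below using the standard `timeit` module. The size and repeat counts are arbitrary choices for this sketch, and the ratio you see will vary. The second half shows a caveat of the multiplication form: it repeats references to the same object, which is harmless for integers but surprising for mutable elements.

```python
import timeit

n = 100_000  # arbitrary size for this sketch

# Best of 5 measurements, each building the list 10 times.
t_mul = min(timeit.repeat(lambda: [0] * n, repeat=5, number=10))
t_comp = min(timeit.repeat(lambda: [0 for _ in range(n)], repeat=5, number=10))
print(f"[0]*n: {t_mul:.5f}s   comprehension: {t_comp:.5f}s")

# Both forms build equal lists of integers.
assert [0] * 5 == [0 for _ in range(5)]

# Caveat: multiplication copies references, so mutable elements are shared.
rows_shared = [[0]] * 3
rows_shared[0][0] = 1
print(rows_shared)    # [[1], [1], [1]]

rows_separate = [[0] for _ in range(3)]
rows_separate[0][0] = 1
print(rows_separate)  # [[1], [0], [0]]
```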

Conclusion for BAD Outcome:

You will not get any performance benefit by just using a C++ class from Python. On the contrary, it seems to come with significant losses: in the calculation step, lists are roughly 5x to 6x faster than the C++ STL vector class driven from Python, and initializing a list is at least 40x faster than initializing a vector through Python.

Overall Conclusion

Using C++ from Python can offer significant performance improvements (up to about 8.8x in our test cases) when done deliberately; otherwise it can come at a steep cost. Python handles data structures well enough on its own. What C++ brings to the table is compute performance, not its STL containers or classes. Transforming or manipulating data may be much quicker in C++, but you need to account for the time it takes to initialize a C++ object from a Python object, as this conversion is a bottleneck.

If there are a lot of operations to perform on a dataset (for example image processing/manipulation), these benchmarks suggest that leveraging C++ directly via pybind11 (or indirectly through a library such as numpy) is the clear winner.
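To make the "lot of operations" point concrete, here is a rough back-of-the-envelope model using the 10,000,000-element timings from the GOOD outcome. It is a projection, not a measurement: it assumes the vector is built once, that each further operation costs about the same as the measured one on both sides, and it ignores any cost of converting results back to a list. The function name `projected_speedup` is made up for this sketch.

```python
def projected_speedup(k, init=1.51522, cpp_calc=0.06822, py_calc=5.64929):
    """Projected speedup for k operations on one dataset.

    Defaults are the measured times in seconds for 10,000,000 elements:
    vector initialization, C++ calculation, and list comprehension calculation.
    """
    python_time = k * py_calc        # k list comprehensions
    cpp_time = init + k * cpp_calc   # one conversion, then k calls into C++
    return python_time / cpp_time


for k in (1, 2, 5, 10, 100):
    print(f"{k:>3} operations: about {projected_speedup(k):.1f}x")
```

With a single operation the model reproduces the measured 3.57x. As k grows, the one-off initialization cost is spread over more work and the projected ratio climbs toward py_calc / cpp_calc, which is roughly 83x for these numbers.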