Parallelization

There are two methods of parallelization implemented:

  1. shared memory (threading)
  2. distributed memory (multiprocessing)

Either of these methods can be used to split up each frame across the specified number of workers. To specify the number of workers, use the -w N flag, where N is an integer greater than 0. By default, the application uses multithreading. To use multiprocessing instead, pass the -mp flag. For example, to process a *.trak file using 4 workers and multiprocessing, run:

tracker -w 4 -mp path/to/video1.trak

Warning

Multiprocessing does not work on Windows because of issues between Python’s multiprocessing library and OpenCV.

The basic parallelization technique is:

  1. Split the frame into equal “tiles” that are as close to square as possible.
  2. Create a worker (either a thread or a process) for each tile range.
  3. In the main thread, read the next frame and create a shared memory object.
  4. Tell each worker to process its tile.
  5. While waiting for the workers, in the main thread, read the next frame and create a shared memory object.
  6. Wait for all workers to finish.
  7. If running with the GUI, collect data from the workers and display it. This step is expensive because it moves a lot of data. To avoid it, run without the GUI. See Without the Graphical User Interface.
  8. Go back to step 4 until all frames have been processed.
  9. Collect the tracks from all workers.
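
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the application's actual code: the names split_into_tiles, process_tile and run are hypothetical, a standard-library thread pool stands in for the application's own workers, plain nested lists stand in for the shared memory objects, and the GUI step (7) is left out. The per-tile work is a placeholder that sums pixel values.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(width, height, n_workers):
    """Return n_workers tiles as (x0, x1, y0, y1), as close to square as possible."""
    best = None
    for rows in range(1, n_workers + 1):
        if n_workers % rows:
            continue  # only consider grids that use exactly n_workers tiles
        cols = n_workers // rows
        # Distance of the tile aspect ratio from 1 (log makes it symmetric).
        score = abs(math.log((width / cols) / (height / rows)))
        if best is None or score < best[0]:
            best = (score, rows, cols)
    _, rows, cols = best
    xs = [i * width // cols for i in range(cols + 1)]
    ys = [j * height // rows for j in range(rows + 1)]
    return [(xs[i], xs[i + 1], ys[j], ys[j + 1])
            for j in range(rows) for i in range(cols)]


def process_tile(frame, tile):
    """Placeholder for the real per-tile work: sum the pixel values in the tile."""
    x0, x1, y0, y1 = tile
    return sum(sum(row[x0:x1]) for row in frame[y0:y1])


def run(frames, width, height, n_workers):
    tiles = split_into_tiles(width, height, n_workers)        # step 1
    results = [[] for _ in tiles]                             # one list per tile
    frame_iter = iter(frames)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:   # step 2
        frame = next(frame_iter, None)                        # step 3
        while frame is not None:
            futures = [pool.submit(process_tile, frame, t)    # step 4
                       for t in tiles]
            next_frame = next(frame_iter, None)               # step 5: read ahead
            for i, future in enumerate(futures):              # step 6
                results[i].append(future.result())
            frame = next_frame                                # step 8
    return results                                            # step 9
```

Reading the next frame while the workers are still busy (step 5) keeps the main thread from idling. Each tile's results stay in their own list, which mirrors the choice not to exchange tracks between workers.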

Note

Tracks are not exchanged between workers because this data transfer is expensive and drastically slows down the processing with very little gain in track counts.

Testing on a 4-core machine shows that multiprocessing scales better than linearly with the number of workers until the worker count exceeds the number of available cores (5 > 4). Multithreading, however, gains little because Python’s Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time.

_images/scaling.png
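
To make "greater than perfect scaling" concrete, the usual definitions of speedup and parallel efficiency can be written as a small helper. This is a generic sketch, not part of the application; the function names and the idea of timing runs yourself are assumptions.

```python
def speedup(t_one_worker, t_n_workers):
    """How many times faster the N-worker run is than the 1-worker run."""
    return t_one_worker / t_n_workers


def efficiency(t_one_worker, t_n_workers, n_workers):
    """Speedup per worker: 1.0 is perfect scaling, above 1.0 is better than perfect."""
    return speedup(t_one_worker, t_n_workers) / n_workers
```

With wall-clock times measured for the same file at 1 and N workers, an efficiency above 1.0 corresponds to the greater than perfect scaling described above, and a value well below 1.0 corresponds to the poor gain seen with multithreading.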

Note

For maximum performance, use the same number of workers as cores, use multiprocessing (-mp flag), and run without the GUI (-ng flag).
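
As a convenience, the recommended settings can be assembled from a script. The helper below is a hypothetical sketch: build_tracker_command is not part of the application, it assumes the tracker executable is on the PATH and accepts the flags in this order, and os.cpu_count() reports logical cores, which may be more than the number of physical cores. It leaves out -mp on Windows because of the warning above.

```python
import os
import sys


def build_tracker_command(trak_path, workers=None, platform=sys.platform):
    """Build an argument list using the -w, -mp and -ng flags described above."""
    if workers is None:
        workers = os.cpu_count() or 1   # cpu_count() can return None
    cmd = ["tracker", "-w", str(workers)]
    if not platform.startswith("win"):
        cmd.append("-mp")               # multiprocessing is not usable on Windows
    cmd.append("-ng")                   # run without the GUI
    cmd.append(trak_path)
    return cmd
```

The returned list can be passed to subprocess.run, for example subprocess.run(build_tracker_command("path/to/video1.trak"), check=True).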