Overview
Example that uses parallel_do to do parallel preorder traversal of a sparse graph.
Each vertex in the graph is called a "cell".
Each cell has a value.
The value is a matrix.
Some of the cells have operators
that compute the cell's value, using other cell's values as input.
A cell that uses the value of cell x is called a successor of x.
The algorithm works as follows.
- Compute the set of cells that have no inputs. This set is called root_set.
- Each cell has an associated field ref_count that is an atomic integer.
Initialize ref_count to the number of inputs for the Cell.
- Update each cell in root_set, by applying a parallel_do to a root_set
- After updating a cell, for each of its successors
- Atomically decrement the successor's ref_count
- If the count became zero, add the cell to the set of cells to be updated,
by calling parallel_do_feeder_impl::add.
The times printed are for the traversal and update,
and do not include time for computing the root_set.
The example is using custom synchronization via ref_count atomic variable.
Correctness checking tools might not take this into account, and report data races
between different tasks that are actually synchronized.
NOTE: It is important to understand that this example is unlikely to show speedup
if the cell values are changed to type "float". The reason is twofold.
- The smaller value type causes each Cell to be significantly smaller than a cache line,
which leads to false sharing conflicts.
- The time to update the cells becomes very small, and consequently the overhead of
parallel_do swamps the useful work.
Files
- main.cpp
- Main program which parses command line options and runs the algorithm with different numbers of threads.
- parallel_preorder.cpp
- Implementation of the parallel preorder traversal algorithm.
- Graph.h
- Interfaces of the Graph and Cell classes.
- Graph.cpp
- Implementations of the Graph and Cell classes.
- Matrix.h
- The Matrix class definition.
- Makefile
- Makefile for building example.
Directories
- msvs
- Contains Microsoft* Visual Studio* 2005 workspace for building and running the example (Windows* systems only).
- xcode
- Contains Xcode* IDE workspace for building and running the example (OS X* systems only).
To Build
General build directions can be found here.
Usage
- parallel_preorder -h
- Prints the help for command line options
- parallel_preorder [n-of-threads=value] [n-of-nodes=value] [n-of-traversals=value] [silent]
- parallel_preorder [n-of-threads [n-of-nodes [n-of-traversals]]] [silent]
- n-of-threads is the number of threads to use; a range of the form low[:high], where low and optional high are non-negative integers or 'auto' for the TBB default.
n-of-nodes is a number of nodes in the graph. Default value is 1000.
n-of-traversals is the number of times to evaluate the graph. Default value is 500.
silent - no output except elapsed time.
- To run a short version of this example, e.g., for use with Intel® Parallel Inspector:
- Build a debug version of the example
(see the build directions).
Run it with the desired number of threads and smaller number of traversals, e.g., parallel_preorder 4 1000 5.
Up to parent directory
Copyright © 2005-2014 Intel Corporation. All Rights Reserved.
Intel is a registered trademark or trademark of Intel Corporation
or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.