Overview

This directory contains a simple example that sums values in a tree. The example exhibits some speedup, but not much, because it quickly saturates the system bus on a multiprocessor. Good speedup requires more computation cycles per memory reference. The point of the example is to teach how to use the raw task interface, so the computation is deliberately trivial.
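
To make the shape of the computation concrete, here is a minimal sketch of a sequential tree sum in standard C++. The TreeNode type, its integer value field, and the build_tree helper are hypothetical names chosen for illustration; they are not the declarations in common.h or the code in SerialSumTree.cpp.

```cpp
#include <memory>

// Hypothetical node type for illustration only.
struct TreeNode {
    std::unique_ptr<TreeNode> left;
    std::unique_ptr<TreeNode> right;
    long long value = 0;
};

// Build a roughly balanced tree with `count` nodes. Each node takes the
// next value from the running counter `next_value`.
std::unique_ptr<TreeNode> build_tree(long count, long long& next_value) {
    if (count <= 0) return nullptr;
    std::unique_ptr<TreeNode> node(new TreeNode);
    node->value = next_value++;
    long left_count = (count - 1) / 2;
    node->left = build_tree(left_count, next_value);
    node->right = build_tree(count - 1 - left_count, next_value);
    return node;
}

// Sum the values of the subtree rooted at `node`.
// One addition per node visited: very little computation per memory reference.
long long serial_sum(const TreeNode* node) {
    if (!node) return 0;
    return node->value + serial_sum(node->left.get()) + serial_sum(node->right.get());
}
```

Each visited node costs a pointer dereference and a single addition, which is why the memory system rather than the processor tends to limit the speed of a computation like this.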

The performance of this example is better when objects are allocated by the scalable_allocator instead of the default "operator new". The reason is that the scalable_allocator typically packs small objects more tightly than the default "operator new", resulting in a smaller memory footprint, and thus more efficient use of cache and virtual memory. In addition, the scalable_allocator performs better for multi-threaded allocations.

Files

SerialSumTree.cpp
Sums sequentially.
SimpleParallelSumTree.cpp
Sums in parallel without any fancy tricks.
OptimizedParallelSumTree.cpp
Sums in parallel, using "recycling" and "continuation-passing" tricks. In this case, it is only slightly faster than the simple version.
common.h
Shared declarations.
main.cpp
Main program that parses command-line options and runs the algorithm.
Makefile
Makefile for building the example.
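
As a rough illustration of summing subtrees in parallel "without any fancy tricks", the sketch below uses std::async from the C++ standard library as a stand-in. It does not use the raw task interface that the example files demonstrate, and the names TreeNode, build_tree, parallel_sum, and spawn_depth are assumptions made for this sketch only.

```cpp
#include <future>
#include <memory>

// Hypothetical node type for illustration only.
struct TreeNode {
    std::unique_ptr<TreeNode> left;
    std::unique_ptr<TreeNode> right;
    long long value = 0;
};

// Build a roughly balanced tree with `count` nodes. Each node takes the
// next value from the running counter `next_value`.
std::unique_ptr<TreeNode> build_tree(long count, long long& next_value) {
    if (count <= 0) return nullptr;
    std::unique_ptr<TreeNode> node(new TreeNode);
    node->value = next_value++;
    long left_count = (count - 1) / 2;
    node->left = build_tree(left_count, next_value);
    node->right = build_tree(count - 1 - left_count, next_value);
    return node;
}

// Plain recursive sum, used once the spawn budget is exhausted.
long long serial_sum(const TreeNode* node) {
    if (!node) return 0;
    return node->value + serial_sum(node->left.get()) + serial_sum(node->right.get());
}

// Sum the left subtree on another thread while the calling thread sums the
// right subtree. Only the top `spawn_depth` levels start new threads, so at
// most 2^spawn_depth - 1 asynchronous calls are made in total.
long long parallel_sum(const TreeNode* node, int spawn_depth) {
    if (!node) return 0;
    if (spawn_depth <= 0) return serial_sum(node);
    const TreeNode* left = node->left.get();
    std::future<long long> left_sum = std::async(
        std::launch::async,
        [left, spawn_depth] { return parallel_sum(left, spawn_depth - 1); });
    long long right_sum = parallel_sum(node->right.get(), spawn_depth - 1);
    return node->value + left_sum.get() + right_sum;
}
```

The depth cutoff is a design choice of this sketch: it keeps the number of threads small and lets the lower levels of the tree run as ordinary recursion. A task scheduler would normally handle that balance itself.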

Directories

msvs
Contains Microsoft* Visual Studio* 2005 workspace for building and running the example (Windows* systems only).
xcode
Contains Xcode* IDE workspace for building and running the example (OS X* systems only).

To Build

General build directions can be found here.

Usage

tree_sum -h
Prints the help for command-line options.
tree_sum [n-of-threads=value] [number-of-nodes=value] [silent] [stdmalloc]
tree_sum [n-of-threads [number-of-nodes]] [silent] [stdmalloc]
n-of-threads is the number of threads to use, given as a range of the form low[:high], where low and the optional high are non-negative integers or 'auto' for the default.
number-of-nodes is the number of nodes in the tree.
silent - no output except elapsed time.
stdmalloc - causes the default "operator new" to be used for memory allocations instead of the scalable_allocator.
To run a short version of this example, e.g., for use with Intel® Parallel Inspector:
Build a debug version of the example (see the build directions).
Run it with a small problem size and the desired number of threads, e.g., tree_sum 4 100000.

Copyright © 2005-2014 Intel Corporation. All Rights Reserved.

Intel is a registered trademark or trademark of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands may be claimed as the property of others.