std::experimental::parallel::reduce
From cppreference.com
< cpp  experimental
Defined in header <experimental/numeric>


template<class InputIt> typename std::iterator_traits<InputIt>::value_type reduce( 
(1)  (parallelism TS) 
template<class ExecutionPolicy, class InputIterator> typename std::iterator_traits<InputIt>::value_type reduce( 
(2)  (parallelism TS) 
template<class InputIt, class T> T reduce(InputIt first, InputIt last, T init); 
(3)  (parallelism TS) 
template<class ExecutionPolicy, class InputIt, class T> T reduce(ExecutionPolicy&& policy, InputIt first, InputIt last, T init); 
(4)  (parallelism TS) 
template<class InputIt, class T, class BinaryOp> T reduce(InputIt first, InputIt last, T init, BinaryOp binary_op); 
(5)  (parallelism TS) 
template<class ExecutionPolicy, class InputIt, class T, class BinaryOp> T reduce(ExecutionPolicy&& policy, 
(6)  (parallelism TS) 
1) same as reduce(first, last, typename std::iterator_traits<InputIt>::value_type{})
3) same as reduce(first, last, init, std::plus<>())
5) Reduces the range [first; last), possibly permuted and aggregated in unspecified manner, along with the initial value
init
over binary_op
.2,4,6) Same as (1,3,5), but executed according to
policy
The behavior is nondeterministic if binary_op
is not associative or not commutative.
The behavior is undefined if binary_op
modifies any element or invalidates any iterator in [first; last).
Contents 
[edit] Parameters
first, last    the range of elements to apply the algorithm to 
init    the initial value of the generalized sum 
policy    the execution policy 
binary_op    binary FunctionObject that will be applied in unspecified order to the result of dereferencing the input iterators, the results of other binary_op and init .

Type requirements  
InputIt must meet the requirements of LegacyInputIterator.

[edit] Return value
Generalized sum of init
and *first
, *(first+1)
, ... *(last1)
over binary_op
,
where generalized sum GSUM(op, a
1, ..., a
N) is defined as follows:
 if N=1, a
1  if N > 1, op(GSUM(op, b
1, ..., b
K), GSUM(op, b
M, ..., b
N)) where
 b
1, ..., b
N may be any permutation of a1, ..., aN and  1 < K+1 = M ≤ N
 b
in other words, the elements of the range may be grouped and rearranged in arbitrary order
[edit] Complexity
O(last  first) applications of binary_op
.
[edit] Exceptions
 If execution of a function invoked as part of the algorithm throws an exception,
 if
policy
isparallel_vector_execution_policy
, std::terminate is called  if
policy
issequential_execution_policy
orparallel_execution_policy
, the algorithm exits with an exception_list containing all uncaught exceptions. If there was only one uncaught exception, the algorithm may rethrow it without wrapping inexception_list
. It is unspecified how much work the algorithm will perform before returning after the first exception was encountered.  if
policy
is some other type, the behavior is implementationdefined
 if
 If the algorithm fails to allocate memory (either for itself or to construct an
exception_list
when handling a user exception), std::bad_alloc is thrown.
[edit] Notes
If the range is empty, init
is returned, unmodified
 If
policy
is an instance ofsequential_execution_policy
, all operations are performed in the calling thread.  If
policy
is an instance ofparallel_execution_policy
, operations may be performed in unspecified number of threads, indeterminately sequenced with each other  If
policy
is an instance ofparallel_vector_execution_policy
, execution may be both parallelized and vectorized: function body boundaries are not respected and user code may be overlapped and combined in arbitrary manner (in particular, this implies that a userprovided Callable must not acquire a mutex to access a shared resource)
[edit] Example
reduce is the outoforder version of std::accumulate:
Run this code
#include <iostream> #include <chrono> #include <vector> #include <numeric> #include <experimental/execution_policy> #include <experimental/numeric> int main() { std::vector<double> v(10'000'007, 0.5); { auto t1 = std::chrono::high_resolution_clock::now(); double result = std::accumulate(v.begin(), v.end(), 0.0); auto t2 = std::chrono::high_resolution_clock::now(); std::chrono::duration<double, std::milli> ms = t2  t1; std::cout << std::fixed << "std::accumulate result " << result << " took " << ms.count() << " ms\n"; } { auto t1 = std::chrono::high_resolution_clock::now(); double result = std::experimental::parallel::reduce( std::experimental::parallel::par, v.begin(), v.end()); auto t2 = std::chrono::high_resolution_clock::now(); std::chrono::duration<double, std::milli> ms = t2  t1; std::cout << "parallel::reduce result " << result << " took " << ms.count() << " ms\n"; } }
Possible output:
std::accumulate result 5000003.50000 took 12.7365 ms parallel::reduce result 5000003.50000 took 5.06423 ms
[edit] See also
sums up a range of elements (function template)  
applies a function to a range of elements (function template)  
(parallelism TS) 
applies a functor, then reduces out of order (function template) 