Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/algorithm.qbk
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,8 @@ Convert a sequence of hexadecimal characters into a sequence of integers or char
Convert a sequence of integral types into a lower case hexadecimal sequence of characters
[endsect:hex_lower]

[include indirect_sort.qbk]

[include is_palindrome.qbk]

[include is_partitioned_until.qbk]
Expand Down
111 changes: 111 additions & 0 deletions doc/indirect_sort.qbk
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
[/ File indirect_sort.qbk]

[section:indirect_sort indirect_sort ]

[/license
Copyright (c) 2023 Marshall Clow

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

There are times that you want a sorted version of a sequence, but for some reason you don't want to modify it. Maybe the elements in the sequence can't be moved/copied, e.g. the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.

That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns a "permutation" of the elements that, when applied, will put the elements in the sequence in a sorted order.

Assume have a sequence `[first, last)` of 1000 items that are expensive to swap:
```
std::sort(first, last); // ['O(N ln N)] comparisons and ['O(N ln N)] swaps (of the element type).
```

On the other hand, using indirect sorting:
```
auto perm = indirect_sort(first, last); // ['O(N lg N)] comparisons and ['O(N lg N)] swaps (of size_t).
apply_permutation(first, last, perm.begin(), perm.end()); // ['O(N)] swaps (of the element type)
```

If the element type is sufficiently expensive to swap, then 10,000 swaps of size_t + 1000 swaps of the element_type could be cheaper than 10,000 swaps of the element_type.

Or maybe you don't need the elements to actually be sorted - you just want to traverse them in a sorted order:
```
auto permutation = indirect_sort(first, last);
for (size_t idx: permutation)
std::cout << first[idx] << std::endl;
```


Assume that instead of an "array of structures", you have a "struct of arrays".
```
struct AType {
Type0 key;
Type1 value1;
Type1 value2;
};

std::array<AType, 1000> arrayOfStruct;
```

versus:

```
template <size_t N>
struct AType {
std::array<Type0, N> key;
std::array<Type1, N> value1;
std::array<Type2, N> value2;
};

AType<1000> structOfArrays;
```

Sorting the first one is easy, because each set of fields (`key`, `value1`, `value2`) are part of the same struct. But with indirect sorting, the second one is easy to sort as well - just sort the keys, then apply the permutation to the keys and the values:
```
auto perm = indirect_sort(std::begin(structOfArrays.key), std::end(structOfArrays.key));
apply_permutation(structOfArrays.key.begin(), structOfArrays.key.end(), perm.begin(), perm.end());
apply_permutation(structOfArrays.value1.begin(), structOfArrays.value1.end(), perm.begin(), perm.end());
apply_permutation(structOfArrays.value2.begin(), structOfArrays.value2.end(), perm.begin(), perm.end());
```

[heading interface]

The function `indirect_sort` returns a `vector<size_t>` containing the permutation necessary to put the input sequence into a sorted order. One version uses `std::less` to do the comparisons; the other lets the caller pass predicate to do the comparisons.

There is also a variant called `indirect_stable_sort`; it bears the same relation to `indirect_sort` that `std::stable_sort` does to `std::sort`.

```
template <typename RAIterator>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last);

template <typename RAIterator, typename BinaryPredicate>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, BinaryPredicate pred);

template <typename RAIterator>
std::vector<size_t> indirect_stable_sort (RAIterator first, RAIterator last);

template <typename RAIterator, typename BinaryPredicate>
std::vector<size_t> indirect_stable_sort (RAIterator first, RAIterator last, BinaryPredicate pred);
```

[heading Examples]

[heading Iterator Requirements]

`indirect_sort` requires random-access iterators.

[heading Complexity]

Both of the variants of `indirect_sort` run in ['O(N lg N)] time; they are not more (or less) efficient than `std::sort`. There is an extra layer of indirection on each comparison, but all of the swaps are done on values of type `size_t`

[heading Exception Safety]

[heading Notes]

In numpy, this algorithm is known as `argsort`.

[endsect]

[/ File indirect_sort.qbk
Copyright 2023 Marshall Clow
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt).
]
192 changes: 192 additions & 0 deletions include/boost/algorithm/indirect_sort.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,192 @@
/*
Copyright (c) Marshall Clow 2023.

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

*/

/// \file indirect_sort.hpp
/// \brief indirect sorting algorithms
/// \author Marshall Clow
///

#ifndef BOOST_ALGORITHM_INDIRECT_SORT
#define BOOST_ALGORITHM_INDIRECT_SORT

#include <algorithm> // for std::sort (and others)
#include <functional> // for std::less
#include <vector> // for std::vector

#include <boost/algorithm/cxx11/iota.hpp>

namespace boost { namespace algorithm {

namespace detail {

template <class Predicate, class Iter>
struct indirect_predicate {
indirect_predicate (Predicate pred, Iter iter)
: pred_(pred), iter_(iter) {}

bool operator ()(size_t a, size_t b) const {
return pred_(iter_[a], iter_[b]);
}

Predicate pred_;
Iter iter_;
};

}

typedef std::vector<size_t> Permutation;

// ===== sort =====

/// \fn indirect_sort (RAIterator first, RAIterator last, Predicate p)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_sort (RAIterator first, RAIterator last, Pred pred) {
Permutation ret(std::distance(first, last));
boost::algorithm::iota(ret.begin(), ret.end(), size_t(0));
std::sort(ret.begin(), ret.end(),
detail::indirect_predicate<Pred, RAIterator>(pred, first));
return ret;
}

/// \fn indirect_sort (RAIterator first, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted in non-descending order.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_sort (RAIterator first, RAIterator last) {
return indirect_sort(first, last,
std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== stable_sort =====

/// \fn indirect_stable_sort (RAIterator first, RAIterator last, Predicate p)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_stable_sort (RAIterator first, RAIterator last, Pred pred) {
Permutation ret(std::distance(first, last));
boost::algorithm::iota(ret.begin(), ret.end(), size_t(0));
std::stable_sort(ret.begin(), ret.end(),
detail::indirect_predicate<Pred, RAIterator>(pred, first));
return ret;
}

/// \fn indirect_stable_sort (RAIterator first, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted in non-descending order.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_stable_sort (RAIterator first, RAIterator last) {
return indirect_stable_sort(first, last,
std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== partial_sort =====

/// \fn indirect_partial_sort (RAIterator first, RAIterator last, Predicate p)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the resulting range [first, middle) is sorted and the range [middle,last)
/// consists of elements that are "larger" than then ones in [first, middle),
/// according to the predicate pred.
///
/// \param first The start of the input sequence
/// \param middle The end of the range to be sorted
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_partial_sort (RAIterator first, RAIterator middle,
RAIterator last, Pred pred) {
Permutation ret(std::distance(first, last));

boost::algorithm::iota(ret.begin(), ret.end(), size_t(0));
std::partial_sort(ret.begin(), ret.begin() + std::distance(first, middle), ret.end(),
detail::indirect_predicate<Pred, RAIterator>(pred, first));
return ret;
}

/// \fn indirect_partial_sort (RAIterator first, RAIterator last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C&P mistake in the signature inside this (and below) docstrings

/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the resulting range [first, middle) is sorted in non-descending order,
/// and the range [middle,last) consists of elements that are larger than
/// then ones in [first, middle).
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_partial_sort (RAIterator first, RAIterator middle, RAIterator last) {
return indirect_partial_sort(first, middle, last,
std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== nth_element =====

/// \fn indirect_nth_element (RAIterator first, RAIterator last, Predicate p)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar C&P mistake in signature and description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch - thanks!

///
/// \param first The start of the input sequence
/// \param nth The sort partition point in the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
Permutation indirect_nth_element (RAIterator first, RAIterator nth,
RAIterator last, Pred pred) {
Permutation ret(std::distance(first, last));
boost::algorithm::iota(ret.begin(), ret.end(), size_t(0));
std::nth_element(ret.begin(), ret.begin() + std::distance(first, nth), ret.end(),
detail::indirect_predicate<Pred, RAIterator>(pred, first));
return ret;
}

/// \fn indirect_nth_element (RAIterator first, RAIterator last)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted in non-descending order.
///
/// \param first The start of the input sequence
/// \param nth The sort partition point in the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
Permutation indirect_nth_element (RAIterator first, RAIterator nth, RAIterator last) {
return indirect_nth_element(first, nth, last,
std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

}}

#endif // BOOST_ALGORITHM_INDIRECT_SORT
4 changes: 4 additions & 0 deletions test/Jamfile.v2
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,10 @@ alias unit_test_framework

# Apply_permutation tests
[ run apply_permutation_test.cpp unit_test_framework : : : : apply_permutation_test ]

# Indirect_sort tests
[ run indirect_sort_test.cpp unit_test_framework : : : : indirect_sort_test ]

# Find tests
[ run find_not_test.cpp unit_test_framework : : : : find_not_test ]
[ run find_backward_test.cpp unit_test_framework : : : : find_backward_test ]
Expand Down
Loading