Data Structure - BuckSort with Parallel Computing

Introduction

This article introduces the basic bucket sort algorithm and the parallel version of bucket sort

Simple Bucket Sort

Step of Bucket Sort

  1. Create n buckets and each bucket has a range, such as [0,4)
  2. Assign every element of unsorted array to the corresponding bucket based on the range of bucket. Ex: element 3 should be assigned to the bucket with range [0,4)
  3. Sort every bucket using insertion sort or other sorting method
  4. Merge all buckets together based on range to get the overall sorted array

    Example

    Using two buckets with range: [0,5), [5, 10]
    1
    2
    3
    4
    5
    6
    7
    8
    unsorted array:                     [10, 2 , 5, 9, 4, 6, 1]
    / \
    assign element to buckets: [2,4,1] [10,5,9,6]
    | |
    Sort every bucket [1,2,4] [5,6,9,10]
    with insertion sort

    Merge buckets: [1,2,4,5,6,9,10]

Analysis

  1. Time complexity: O(n) for assigning element to buckets. O(n^2) for insertion sort in insertion. When merging buckets to a new buckets: O(n). Depending on the insertion operation, the time complexity can be different.

Average case: O(n) if we think insertion time is O(1), else O(n)+O(n^2) = O(n^2) when using insertion sort
if using quicksort, mergesort, it becomes O(nlog(m))+O(n) = O(nlogm)

  1. Memory complexity: O(n), since we use buckets to store element, where n = number of bucket * size of bucket

Bucket Sort with Parallel Computing method

Step of Bucket Sort of parallel version

  1. There are p computing nodes/processors
  2. For every computing node, it has 1 large bucket and m small bucket and p=m
  3. In processor 0, it divides unsorted array into pieces evenly to every processors
  4. Every small bucket / large bucket has its range
  5. Every processor breaks its own piece of array into its small buckets based on the range of small bucket
  6. Sort every small bucket using some sorting (quicksort, insertion sort, etc)
  7. Gather all small buckets that have the same range into the large bucket that have that range
  8. Sort every large bucket
  9. Merge all large buckets into a sorted array

Example

Using two small buckets: b1, b2 and two large buckets: lb1, lb2, with range: [0,5), [5, 10].
There are two computing nodes/processor

1
2
3
4
5
6
7
8
9
10
11
12
13
14
unsorted array:                     [10, 2 , 5, 9, 4, 6, 1, 7]
/ \
2 nodes: node1: [10,2,5,9] node2: [4,6,1,7]
| |
Divide them into b1: [2] b2:[10,5,9] b1:[4,1] b2:[6,7]
Small buckets
| |
Sort: b1:[2], b2:[5,9,10] b1:[1,4] b2:[6,7]
| |
Send to large buckets: node1: [2,1,4] node2: [5,9,10,6,7]
| |
Sort: node1: [1,2,4] node2: [5,6,7,9,10]

Merge buckets: [1,2,4,5,6,7, 9,10]

Analysis

  1. Time complexity: Average Case O(n) if we consider sorting time as O(1). Or O(nlogm) if we use quicksort/mergesort
  2. Memory Complexity: O(n) since we use p computing nodes , large buckets and m smaller buckets to store data

Reference

[1] https://media.geeksforgeeks.org/wp-content/uploads/BucketSort.png

Comments