Aaron Lichtman edited this page Apr 2, 2018 · 6 revisions

Binary Trees

  • Acyclic and connected.
  • Rooted, directed and ordered.

Binary Tree $T$ is defined as: $T = \emptyset$ or $T = \{root, T_{L}, T_{R}\}$

Tree Height

Empty tree has height of -1, so height = max( height($T_{L}$), height($T_{R}$) ) + 1
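This height definition can be sketched as a short recursive function (the node layout here is illustrative):

```cpp
#include <algorithm>
#include <cassert>

// Minimal binary tree node (names are illustrative).
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k) {}
};

// height(NULL) = -1, so a single node has height 0.
int height(Node* root) {
    if (root == nullptr) return -1;
    return std::max(height(root->left), height(root->right)) + 1;
}
```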

Tree Properties

  1. Full : All nodes have either 0 or 2 children.

  2. Perfect : Every leaf node is at the same level.

  3. Complete : Perfect tree for every level except the leaves, where all leaves are pushed leftward.

Not every full tree is complete. Not every complete tree is full.

Binary Search Trees (BSTs)

We use BSTs because binary search can't be run on a linked list.

  • Height $h$ is bounded such that $lg(n) \leq h \leq n$
  • Full, Complete and/or perfect
  • n nodes in a Binary Tree means $n+1$ NULL pointers
  • BST has avg height of $O(lg(n))$

Tree Analysis

  • Best Case BST: Complete Tree
  • Worst Case BST: Linked List Tree
    • If inserting a sorted list of elements, start from the middle element to avoid this outcome.
  • All height-based BST operations have lower bound of $\Omega$(log(n)) time, and upper bound of $O(n)$ time
  • Min number of nodes in tree of height $h$: $h + 1$
  • Max number of nodes in tree of height $h$: $2^{h+1} - 1$

Traversals

  1. Pre/Post/In Order Traversals

    Always Left then Right. Only change where current node goes.

    • Pre-Order: Self, L, R -- Stack (DFS)
    • In-Order: L, Self, R
      • Ordered list
    • Post-Order: L, R, Self
  2. Level-Order Traversal -- Queue (BFS)

    • Enqueue root
    • While queue isn't empty
      • Dequeue and if element isn't NULL, print it and enqueue its children.
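The traversals above can be sketched in code; in-order uses recursion (implicitly a stack), while level-order uses an explicit queue (node layout is illustrative):

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k) {}
};

// In-order: L, Self, R -- yields the keys of a BST in ascending order.
void inOrder(Node* root, std::vector<int>& out) {
    if (!root) return;
    inOrder(root->left, out);
    out.push_back(root->key);
    inOrder(root->right, out);
}

// Level-order: enqueue root; dequeue, visit, enqueue children.
std::vector<int> levelOrder(Node* root) {
    std::vector<int> out;
    std::queue<Node*> q;
    q.push(root);
    while (!q.empty()) {
        Node* cur = q.front();
        q.pop();
        if (cur == nullptr) continue;  // skip NULL children
        out.push_back(cur->key);
        q.push(cur->left);
        q.push(cur->right);
    }
    return out;
}
```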

Traversals vs. Searches

A traversal visits each node exactly once. A search visits nodes only until it finds the node being searched for.

An invalid in-order traversal is one that doesn't strictly ascend.

BFS : Visits all nodes on same level before going deeper.

DFS : Visits leaf nodes as quickly as possible.

Two Child Removal

  1. Find In Order Predecessor (rightmost element of left subtree)
  2. Swap IOP with node
  3. Delete removal node
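Step 1 above (finding the IOP) can be sketched as follows; the node layout is illustrative, and the caller must ensure the node has a left subtree:

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k) {}
};

// The in-order predecessor (IOP) of a node with two children is the
// rightmost node of its left subtree.
Node* findIOP(Node* node) {
    Node* cur = node->left;          // enter the left subtree
    while (cur->right != nullptr)    // then walk right as far as possible
        cur = cur->right;
    return cur;
}
```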

Run Time Analysis

| Operation  | BST Avg. Case | BST Worst Case | Sorted Array              | Sorted List               |
| ---------- | ------------- | -------------- | ------------------------- | ------------------------- |
| find()     | O(log(n))     | O(n)           | O(log(n)) (binary search) | O(n)                      |
| insert()   | O(log(n))     | O(n)           | O(n)                      | Given: O(1), Random: O(n) |
| delete()   | O(log(n))     | O(n)           | O(n)                      | Given: O(1), Random: O(n) |
| traverse() | O(n)          | O(n)           | O(n)                      | O(n)                      |

AVL Trees -- Self Balancing BST

An AVL Tree is considered height balanced if the absolute value of the balance factor of each node is $\leq$ 1

BalanceFactor = heightOfRightSubtree - heightOfLeftSubtree

Insertion

  1. If the subtree is NULL, insert the new node there and return.
  2. If key is greater than the current key, recurse into the right child.
  3. If key is less than the current key, recurse into the left child.
  4. Rebalance and update height (the height update can be done inside rebalance).

Removal

  1. find(SubNodeKey)
  2. If find does not return a valid node, end the remove method.
  3. Deal with children:
     • If no children, delete the node.
     • Else if two children:
       1. Find the In Order Predecessor (IOP), which is the rightmost value of the left subtree.
       2. swap(IOP, root)
       3. remove(root->left)
     • Else one child: overwrite the removal node with its child.

Rotations

All rotations run in O(1) time, and are done in rebalance(). No rotations possible in find() call.

void rotateRight(AVL*& root) {
	AVL* origRoot = root;
	root = root->left_;
	origRoot->left_ = root->right_;
	root->right_ = origRoot;

	updateHeight(root->right_);
	updateHeight(root);
}

void rotateLeft(AVL*& root) {
	AVL* origRoot = root;
	root = root->right_;
	origRoot->right_ = root->left_;
	root->left_ = origRoot;

	updateHeight(root->left_);
	updateHeight(root);
}

void rotateRightLeft(AVL*& root) {
	rotateRight(root->right_);
	rotateLeft(root);
}

void rotateLeftRight(AVL*& root) {
	rotateLeft(root->left_);
	rotateRight(root);
}

void rebalance(Node *&subtree) {
  int balance = getBalanceFactor(subtree);
  int leftBalance = getBalanceFactor(subtree->left);
  int rightBalance = getBalanceFactor(subtree->right);

  //If balance is negative, left heavy
  if (balance == -2) {
    if (leftBalance == -1) {
      rotateRight(subtree);
    } else {
      rotateLeftRight(subtree);
    }
  }
  //If balance is positive, right heavy
  if (balance == 2) {
    if (rightBalance == 1) {
      rotateLeft(subtree);
    } else {
      rotateRightLeft(subtree);
    }
  }

  if (subtree != NULL) {
  	updateHeight(subtree);
  }
}
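The insertion steps and the rebalance logic above can be exercised end-to-end with a condensed, self-contained sketch. Heights are cached on each node; heightOf, updateHeight, and getBalanceFactor are assumed helpers sketched here, and the double-rotation cases inline the rotateLeftRight/rotateRightLeft calls:

```cpp
#include <algorithm>
#include <cassert>

struct AVL {
    int key;
    int height = 0;
    AVL* left_ = nullptr;
    AVL* right_ = nullptr;
    AVL(int k) : key(k) {}
};

// Height of an empty subtree is -1, matching the convention above.
int heightOf(AVL* n) { return n ? n->height : -1; }

void updateHeight(AVL* n) {
    if (n) n->height = 1 + std::max(heightOf(n->left_), heightOf(n->right_));
}

// BalanceFactor = heightOfRightSubtree - heightOfLeftSubtree
int getBalanceFactor(AVL* n) {
    return n ? heightOf(n->right_) - heightOf(n->left_) : 0;
}

void rotateRight(AVL*& root) {
    AVL* origRoot = root;
    root = root->left_;
    origRoot->left_ = root->right_;
    root->right_ = origRoot;
    updateHeight(root->right_);
    updateHeight(root);
}

void rotateLeft(AVL*& root) {
    AVL* origRoot = root;
    root = root->right_;
    origRoot->right_ = root->left_;
    root->left_ = origRoot;
    updateHeight(root->left_);
    updateHeight(root);
}

void rebalance(AVL*& subtree) {
    if (subtree == nullptr) return;
    int balance = getBalanceFactor(subtree);
    if (balance == -2) {                              // left heavy
        if (getBalanceFactor(subtree->left_) == -1) {
            rotateRight(subtree);                     // left-left case
        } else {                                      // left-right case
            rotateLeft(subtree->left_);
            rotateRight(subtree);
        }
    } else if (balance == 2) {                        // right heavy
        if (getBalanceFactor(subtree->right_) == 1) {
            rotateLeft(subtree);                      // right-right case
        } else {                                      // right-left case
            rotateRight(subtree->right_);
            rotateLeft(subtree);
        }
    }
    updateHeight(subtree);
}

// Steps 1-4 from the Insertion section.
void insert(AVL*& subtree, int key) {
    if (subtree == nullptr) { subtree = new AVL(key); return; }
    if (key > subtree->key)      insert(subtree->right_, key);
    else if (key < subtree->key) insert(subtree->left_, key);
    rebalance(subtree);  // also updates height
}
```

Inserting 1, 2, 3 in order triggers a left rotation at the root, leaving 2 as the new root.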

Run Time Analysis

| AVL Operation | Run Time           |
| ------------- | ------------------ |
| find()        | O(h), h = O(lg(n)) |
| insert()      | O(h), h = O(lg(n)) |
| remove()      | O(h), h = O(lg(n)) |
| traverse()    | O(n)               |

Summary of Advantages/Disadvantages

  1. Advantages
     • Run time of O(log(n)) is an improvement over arrays and lists.
     • Great for applications where the exact key is unknown (nearest neighbor, interval search, etc.).
  2. Disadvantages
     • Not the fastest find() run time; outperformed by hash tables.
     • AVL trees do not work well with disk seeks. Must be stored in memory, so not good for large data sets.

Red-Black Trees

When you see "Red-Black Tree", you should think of an AVL Tree. They're both balanced BSTs.

B-Trees

  • Use for large data sets. Minimizes number of disk seeks. Chunks data for better caching.
  • B-trees can be expensive if frequently modified
  • Self-balancing and sorted

B-Tree of order m

  • All internal nodes have one more child than keys
  • All leaves are on the same level, and hold no more than $m-1$ keys
  • When a node has $m$ keys, split it into two nodes and push the median value up to the node above.
  • Use binary search within a node to find a key or an insertion position
  • All non-root internal nodes have between $\lceil m/2 \rceil$ and $m$ children.
  • Root can be a leaf or have between 2 and m children.
  • All keys are sorted within a node
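A node in a B-tree of order $m$ might be laid out as below (a hypothetical layout, not a prescribed one); searching within one node is a binary search over its sorted keys:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node: up to m-1 sorted keys, one more child than keys.
struct BTreeNode {
    std::vector<int> keys;             // sorted within the node
    std::vector<BTreeNode*> children;  // keys.size() + 1 entries; empty if leaf
};

// O(log m) binary search within one node. Sets found, and returns either
// the index of the key or the child index to descend into.
int searchNode(const BTreeNode& node, int key, bool& found) {
    auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
    found = (it != node.keys.end() && *it == key);
    return static_cast<int>(it - node.keys.begin());
}
```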

B-Tree Upper and Lower Bounds

  1. Max Num Keys in Tree of height $h$ : $m^{h+1} - 1$
  2. Min Num Keys in Tree of height $h$ : $2\lceil m/2 \rceil^{h} - 1$
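A quick sanity check of these bounds (the values of $m$ and $h$ are arbitrary examples): for $m = 3$, $h = 2$, the max is $3^3 - 1 = 26$ keys and the min is $2\lceil 3/2 \rceil^2 - 1 = 7$ keys.

```cpp
#include <cassert>

// Max keys: every node holds m-1 keys and has m children -> m^(h+1) - 1.
long long maxKeys(long long m, long long h) {
    long long p = 1;
    for (long long i = 0; i <= h; ++i) p *= m;
    return p - 1;
}

// Min keys: root holds 1 key; all other nodes are minimally half full
// -> 2 * ceil(m/2)^h - 1.
long long minKeys(long long m, long long h) {
    long long half = (m + 1) / 2;  // ceil(m/2)
    long long p = 1;
    for (long long i = 0; i < h; ++i) p *= half;
    return 2 * p - 1;
}
```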

Time Complexity

| B-Tree Operation/Property  | Run Time                           |
| -------------------------- | ---------------------------------- |
| Search through single node | $O(log(m))$                        |
| height                     | $O(log_m(n))$                      |
| find()                     | height $\times$ single-node search |
| insert()                   | Same run time as find()            |

KD Trees

  • CANNOT be used for a dictionary because elements can't be inserted or removed efficiently.
  • "Think about how a K-D tree is created. We have to know information about all of the nodes that will eventually be in our tree so we can find the median and medians of the subarrays. Now that we have created our tree, we try to add in a new node. This could potentially change which node is our median, meaning we'd have to reconstruct the entire tree which is not efficient. Same with removals"

Constructor

  • Quickselect allowed us to find the k-th smallest element
  • Partition actually does the swapping
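A standard quickselect with Lomuto partitioning (a generic sketch, not the course's exact code) shows how the median can be found in expected O(n) time:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Lomuto partition: places the pivot (last element of [lo, hi]) in its
// final sorted position and returns that index.
int partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    return i;
}

// k-th smallest element (0-indexed) in expected O(n) time.
int quickselect(std::vector<int>& a, int lo, int hi, int k) {
    while (lo < hi) {
        int p = partition(a, lo, hi);
        if (p == k) return a[p];
        if (p < k) lo = p + 1;
        else       hi = p - 1;
    }
    return a[lo];
}
```

Passing k = n/2 yields the median used when splitting points during K-D tree construction.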

Run Time

  • Worst Case Find Nearest Neighbor : $O(n)$
  • Average Case Find Nearest Neighbor : $O(log(n))$
