Aaron Lichtman edited this page Apr 2, 2018 · 6 revisions

Binary Trees

  • Acyclic and connected.
  • Rooted, directed and ordered.

Binary Tree $T$ is defined as: $T = \emptyset$ or $T = \{root, T_{L}, T_{R}\}$

Tree Height

Empty tree has height of -1, so height = max( height($T_{L}$), height($T_{R}$) ) + 1
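This height definition can be sketched as a short recursive function (the node layout here is illustrative):

```cpp
#include <algorithm>
#include <cassert>

// Minimal binary tree node (names are illustrative).
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k) {}
};

// height(NULL) = -1, so a single node has height 0.
int height(Node* root) {
    if (root == nullptr) return -1;
    return std::max(height(root->left), height(root->right)) + 1;
}
```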

Tree Properties

  1. Full : All nodes have either 0 or 2 children.

  2. Perfect : Every leaf node is at the same level.

  3. Complete : Perfect tree for every level except the leaves, where all leaves are pushed leftward.

Not every full tree is complete. Not every complete tree is full.

Binary Search Trees (BSTs)

We use BSTs because binary search can't be run on a linked list.

  • Height $h$ is bounded such that $lg(n) \leq h \leq n$
  • Full, Complete and/or perfect
  • n nodes in a Binary Tree means $n+1$ NULL pointers
  • BST has avg height of $O(lg(n))$

Tree Analysis

  • Best Case BST: Complete Tree
  • Worst Case BST: Linked List Tree
    • If inserting a sorted list of elements, start from the middle element to avoid this outcome.
  • All height-based BST operations have lower bound of $\Omega$(log(n)) time, and upper bound of $O(n)$ time
  • Min number of nodes in tree of height $h$: $h + 1$
  • Max number of nodes in tree of height $h$: $2^{h+1} - 1$

Traversals

  1. Pre/Post/In Order Traversals

    Always Left then Right. Only change where current node goes.

    • Pre-Order: Self, L, R -- Stack (DFS)
    • In-Order: L, Self, R
      • Ordered list
    • Post-Order: L, R, Self
  2. Level-Order Traversal -- Queue (BFS)

    • Enqueue root
    • While queue isn't empty
      • Dequeue and if element isn't NULL, print it and enqueue its children.
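The traversals above can be sketched in code; in-order uses recursion (implicitly a stack), while level-order uses an explicit queue (node layout is illustrative):

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k) {}
};

// In-order: L, Self, R -- yields the keys of a BST in ascending order.
void inOrder(Node* root, std::vector<int>& out) {
    if (!root) return;
    inOrder(root->left, out);
    out.push_back(root->key);
    inOrder(root->right, out);
}

// Level-order: enqueue root; dequeue, visit, enqueue children.
std::vector<int> levelOrder(Node* root) {
    std::vector<int> out;
    std::queue<Node*> q;
    q.push(root);
    while (!q.empty()) {
        Node* cur = q.front();
        q.pop();
        if (cur == nullptr) continue;  // skip NULL children
        out.push_back(cur->key);
        q.push(cur->left);
        q.push(cur->right);
    }
    return out;
}
```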

Traversals vs. Searches

A traversal visits each node exactly once. A search visits nodes only until it finds the node being searched for.

An invalid in-order traversal is one that doesn't strictly ascend.

BFS : Visits all nodes on same level before going deeper.

DFS : Visits leaf nodes as quickly as possible.

Two Child Removal

  1. Find In Order Predecessor (rightmost element of left subtree)
  2. Swap IOP with node
  3. Delete removal node
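Step 1 above (finding the IOP) can be sketched as follows; the node layout is illustrative, and the caller must ensure the node has a left subtree:

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k) : key(k) {}
};

// The in-order predecessor (IOP) of a node with two children is the
// rightmost node of its left subtree.
Node* findIOP(Node* node) {
    Node* cur = node->left;          // enter the left subtree
    while (cur->right != nullptr)    // then walk right as far as possible
        cur = cur->right;
    return cur;
}
```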

Run Time Analysis

| Operation  | BST Avg. Case | BST Worst Case | Sorted Array              | Sorted List               |
| ---------- | ------------- | -------------- | ------------------------- | ------------------------- |
| find()     | O(log(n))     | O(n)           | O(log(n)) (binary search) | O(n)                      |
| insert()   | O(log(n))     | O(n)           | O(n)                      | Given: O(1), Random: O(n) |
| delete()   | O(log(n))     | O(n)           | O(n)                      | Given: O(1), Random: O(n) |
| traverse() | O(n)          | O(n)           | O(n)                      | O(n)                      |

AVL Trees -- Self Balancing BST

An AVL Tree is considered height balanced if the absolute value of the balance factor of each node is $\leq$ 1

BalanceFactor = heightOfRightSubtree - heightOfLeftSubtree

Insertion

  1. If the subtree is NULL, insert the new node there and return.
  2. If key is greater than the current key, recurse into the right child.
  3. If key is less than the current key, recurse into the left child.
  4. Rebalance and update height (the height update can be done inside rebalance).

Removal

  1. find(SubNodeKey)
  2. If find does not return a valid node, end the remove method.
  3. Deal with children:
     • If no children, delete the node.
     • Else if two children:
       1. Find the In Order Predecessor (IOP), which is the rightmost value of the left subtree.
       2. swap(IOP, root)
       3. remove(root->left)
     • Else one child: overwrite the removal node with its child.

Rotations

All rotations run in O(1) time, and are done in rebalance(). No rotations possible in find() call.

void rotateRight(AVL*& root) {
	AVL* origRoot = root;
	root = root->left_;
	origRoot->left_ = root->right_;
	root->right_ = origRoot;

	updateHeight(root->right_);
	updateHeight(root);
}

void rotateLeft(AVL*& root) {
	AVL* origRoot = root;
	root = root->right_;
	origRoot->right_ = root->left_;
	root->left_ = origRoot;

	updateHeight(root->left_);
	updateHeight(root);
}

void rotateRightLeft(AVL*& root) {
	rotateRight(root->right_);
	rotateLeft(root);
}

void rotateLeftRight(AVL*& root) {
	rotateLeft(root->left_);
	rotateRight(root);
}

void rebalance(Node *&subtree) {
  int balance = getBalanceFactor(subtree);
  int leftBalance = getBalanceFactor(subtree->left);
  int rightBalance = getBalanceFactor(subtree->right);

  //If balance is negative, left heavy
  if (balance == -2) {
    if (leftBalance == -1) {
      rotateRight(subtree);
    } else {
      rotateLeftRight(subtree);
    }
  }
  //If balance is positive, right heavy
  if (balance == 2) {
    if (rightBalance == 1) {
      rotateLeft(subtree);
    } else {
      rotateRightLeft(subtree);
    }
  }

  if (subtree != NULL) {
  	updateHeight(subtree);
  }
}
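The insertion steps and the rebalance logic above can be exercised end-to-end with a condensed, self-contained sketch. Heights are cached on each node; heightOf, updateHeight, and getBalanceFactor are assumed helpers sketched here, and the double-rotation cases inline the rotateLeftRight/rotateRightLeft calls:

```cpp
#include <algorithm>
#include <cassert>

struct AVL {
    int key;
    int height = 0;
    AVL* left_ = nullptr;
    AVL* right_ = nullptr;
    AVL(int k) : key(k) {}
};

// Height of an empty subtree is -1, matching the convention above.
int heightOf(AVL* n) { return n ? n->height : -1; }

void updateHeight(AVL* n) {
    if (n) n->height = 1 + std::max(heightOf(n->left_), heightOf(n->right_));
}

// BalanceFactor = heightOfRightSubtree - heightOfLeftSubtree
int getBalanceFactor(AVL* n) {
    return n ? heightOf(n->right_) - heightOf(n->left_) : 0;
}

void rotateRight(AVL*& root) {
    AVL* origRoot = root;
    root = root->left_;
    origRoot->left_ = root->right_;
    root->right_ = origRoot;
    updateHeight(root->right_);
    updateHeight(root);
}

void rotateLeft(AVL*& root) {
    AVL* origRoot = root;
    root = root->right_;
    origRoot->right_ = root->left_;
    root->left_ = origRoot;
    updateHeight(root->left_);
    updateHeight(root);
}

void rebalance(AVL*& subtree) {
    if (subtree == nullptr) return;
    int balance = getBalanceFactor(subtree);
    if (balance == -2) {                              // left heavy
        if (getBalanceFactor(subtree->left_) == -1) {
            rotateRight(subtree);                     // left-left case
        } else {                                      // left-right case
            rotateLeft(subtree->left_);
            rotateRight(subtree);
        }
    } else if (balance == 2) {                        // right heavy
        if (getBalanceFactor(subtree->right_) == 1) {
            rotateLeft(subtree);                      // right-right case
        } else {                                      // right-left case
            rotateRight(subtree->right_);
            rotateLeft(subtree);
        }
    }
    updateHeight(subtree);
}

// Steps 1-4 from the Insertion section.
void insert(AVL*& subtree, int key) {
    if (subtree == nullptr) { subtree = new AVL(key); return; }
    if (key > subtree->key)      insert(subtree->right_, key);
    else if (key < subtree->key) insert(subtree->left_, key);
    rebalance(subtree);  // also updates height
}
```

Inserting 1, 2, 3 in order triggers a left rotation at the root, leaving 2 as the new root.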

Run Time Analysis

| AVL Operation | Run Time           |
| ------------- | ------------------ |
| find()        | O(h), h = O(lg(n)) |
| insert()      | O(h), h = O(lg(n)) |
| remove()      | O(h), h = O(lg(n)) |
| traverse()    | O(n)               |

Summary of Advantages/Disadvantages

  1. Advantages
     • Run time of O(log(n)) is an improvement over arrays and lists.
     • Great for applications where the exact key is unknown (nearest neighbor, interval search, etc.).
  2. Disadvantages
     • Not the fastest find() run time; outperformed by hash tables.
     • AVL trees do not work well with disk seeks. Must be stored in memory, so not good for large data sets.

Red-Black Trees

When you see "Red-Black Tree", you should think of an AVL Tree. They're both balanced BSTs.

B-Trees

  • Use for large data sets. Minimizes number of disk seeks. Chunks data for better caching.
  • B-trees can be expensive if frequently modified
  • Self-balancing and sorted

B-Tree of order m

  • All internal nodes have one more child than keys
  • All leaves are on the same level, and hold no more than $m-1$ keys
  • When a node has $m$ keys, split it into two nodes and push the median value up to the node above.
  • Use binary search within a node to find a key or an insertion position
  • All non-root internal nodes have between $\lceil m/2 \rceil$ and $m$ children.
  • Root can be a leaf or have between 2 and m children.
  • All keys are sorted within a node
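A node in a B-tree of order $m$ might be laid out as below (a hypothetical layout, not a prescribed one); searching within one node is a binary search over its sorted keys:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node: up to m-1 sorted keys, one more child than keys.
struct BTreeNode {
    std::vector<int> keys;             // sorted within the node
    std::vector<BTreeNode*> children;  // keys.size() + 1 entries; empty if leaf
};

// O(log m) binary search within one node. Sets found, and returns either
// the index of the key or the child index to descend into.
int searchNode(const BTreeNode& node, int key, bool& found) {
    auto it = std::lower_bound(node.keys.begin(), node.keys.end(), key);
    found = (it != node.keys.end() && *it == key);
    return static_cast<int>(it - node.keys.begin());
}
```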

B-Tree Upper and Lower Bounds

  1. Max Num Keys in Tree of height $h$ : $m^{h+1} - 1$
  2. Min Num Keys in Tree of height $h$ : $2\lceil m/2 \rceil^{h} - 1$
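A quick sanity check of these bounds (the values of $m$ and $h$ are arbitrary examples): for $m = 3$, $h = 2$, the max is $3^3 - 1 = 26$ keys and the min is $2\lceil 3/2 \rceil^2 - 1 = 7$ keys.

```cpp
#include <cassert>

// Max keys: every node holds m-1 keys and has m children -> m^(h+1) - 1.
long long maxKeys(long long m, long long h) {
    long long p = 1;
    for (long long i = 0; i <= h; ++i) p *= m;
    return p - 1;
}

// Min keys: root holds 1 key; all other nodes are minimally half full
// -> 2 * ceil(m/2)^h - 1.
long long minKeys(long long m, long long h) {
    long long half = (m + 1) / 2;  // ceil(m/2)
    long long p = 1;
    for (long long i = 0; i < h; ++i) p *= half;
    return 2 * p - 1;
}
```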

Time Complexity

| B-Tree Operation/Property  | Run Time                           |
| -------------------------- | ---------------------------------- |
| Search through single node | $O(log(m))$                        |
| height                     | $O(log_m(n))$                      |
| find()                     | height $\times$ single-node search |
| insert()                   | Same run time as find()            |

KD Trees

  • CANNOT be used for a dictionary because elements can't be inserted or removed efficiently.
  • "Think about how a K-D tree is created. We have to know information about all of the nodes that will eventually be in our tree so we can find the median and medians of the subarrays. Now that we have created our tree, we try to add in a new node. This could potentially change which node is our median, meaning we'd have to reconstruct the entire tree which is not efficient. Same with removals"

Constructor

  • Quickselect allowed us to find the k-th smallest element
  • Partition actually does the swapping
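A standard quickselect with Lomuto partitioning (a generic sketch, not the course's exact code) shows how the median can be found in expected O(n) time:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Lomuto partition: places the pivot (last element of [lo, hi]) in its
// final sorted position and returns that index.
int partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);
    return i;
}

// k-th smallest element (0-indexed) in expected O(n) time.
int quickselect(std::vector<int>& a, int lo, int hi, int k) {
    while (lo < hi) {
        int p = partition(a, lo, hi);
        if (p == k) return a[p];
        if (p < k) lo = p + 1;
        else       hi = p - 1;
    }
    return a[lo];
}
```

Passing k = n/2 yields the median used when splitting points during K-D tree construction.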

Run Time

  • Worst Case Find Nearest Neighbor : $O(n)$
  • Average Case Find Nearest Neighbor : $O(log(n))$
