The Tree Abstract Data Type (ADT)
(Heavily from https://www.geeksforgeeks.org/dsa/introduction-to-tree-data-structure/)
A tree is a hierarchical data structure that consists of nodes connected by edges. It is used to represent relationships between elements, where each node may hold data. Trees as a data structure organize and represent data in a parent–child relationship.
A tree consists of nodes such that there is a single topmost node called the root, and every other node can have one or more child nodes.
Each node may point to none or some other nodes. Unlike linked lists, where each node points to the next one and possibly to the previous one too (except for its two end nodes), nodes in a tree data structure store references to multiple child nodes, that is zero to any number.
Data in a tree is not stored sequentially (i.e., not in a linear order). Instead, it is organized across multiple levels, forming a hierarchical structure. Because of this arrangement, a tree is classified as a non-linear data structure.
Trees are useful for storing data that naturally forms a hierarchy. They help in efficient data organization and retrieval for hierarchical relationships.
Some examples:
- File systems on computers are structured as trees, with folders containing subfolders and files.
- The DOM (Document Object Model) of an HTML page is a tree:The <html> tag is the root.<head> and <body> are its children.These tags can have their own child nodes, forming a hierarchical structure.
- Hierarchical organizations, where there is one chief or boss and a single chain of command to each employee.
- A book made up of chapters (and indeces etc.), each chapter comprising sections and so on.
- A patriarchal or matriarchal lineage, which traces descent through male or female ancestors. (A family tree is not a tree structure because each child is descended from two parents, not one.)
Tree Terminology
Basic terms in the tree data structure:
- Parent Node: A node that is an immediate predecessor of another node
- Child Node: A node that is an immediate successor of another node.
- Root Node: The topmost node in a tree, which does not have a parent.
- Leaf Node (External Node): Nodes that do not have any children.
- Ancestor: Any node on the path from the root to a given node (excluding the node itself).
- Descendant: A node x is a descendant of another node y if y is an ancestor of x.
- Sibling: Nodes that share the same parent
- Level of a Node: The number of edges in the path from the root to that node.The root node is at level 0.
- Internal Node: A node with at least one child.
- Neighbor of a Node: The parent or children of a node.
- Subtree: A node and all its descendants form a subtree.
Types of Tree Data Structures
Tree data structures can be classified based on the number of children each node can have.
Binary Trees
Each node can have a maximum of two children.
Some subtypes of binary tree:
- Binary Search Tree (BST) and its Variations: each node has at most two children, and for each node, the left child<s value is smaller than the node<s value, and the right child<s value is greater.
- Full Binary Tree: every node has either 0 or 2 children.
- Complete Binary Tree: all levels are fully filled except possibly the last, which is filled from left to right.
- Balanced Binary Tree: height difference between left and right subtrees of every node is minimal.
Some examples of Balanced Binary Tree are AVL Tree, Red Black Tree and Splay Tree.
Ternary Trees
Each node can have at most three children, often labeled as left, middle, and right.
N-ary Tree (or Generic Tree)
- Each node can have any number of children.
- Each node typically contains some data and a list of references to its children (duplicates are not allowed)
Some common subtypes of n-ary trees are:
- B-tree: A self-balancing search tree where nodes can have multiple children, usually used for indexing large databases.
- B+ Tree: a variation of the B-tree that only stores data in the leaf nodes, making range queries more efficient.
- Trie (or Prefix Tree): A tree where each node represents a character, and paths from the root to leaves represent strings. It can have a variable number of children for each node, making it an N-ary tree.
Properties of Tree Data Structure
- Number of edges: An edge is the connection between two nodes. A tree with N nodes will always have N - 1 edges. There is exactly one path from any node to any other node in the tree.
- Depth of a node: The depth of a node is the length of the path from the root to that node. Each edge in the path adds 1 unit to the length. Equivalently, it is the number of edges from the root to the node.
- Height of the tree: The height of the tree is the length of the longest path from the root to any leaf node.
- Degree of a node: The degree of a node is the number of subtrees attached to it (i.e., the number of children it has). A leaf node has a degree of 0. Then, the degree of the tree is defined as the maximum degree among all nodes in the tree.