Associative Containers

An associative container is a variable-sized container that supports efficient retrieval of elements (values) based on keys. It supports insertion and removal of elements, but differs from a sequence container in that it does not provide a mechanism for inserting an element at a specific position but the container itself keeps its elements in order internally in a suitable data structure.

Immutability of Associative Containers

Since the elements of an associative container are stored according to their keys (in a binary tree, hash table...), it is essential that the key associated with each element is immutable. In simple associative containers this means that the elements themselves are immutable, while in other types of associative containers, such as pair associative containers (that is, maps), the elements themselves are mutable but the part of an element that is its key cannot be modified or assigned a value to. This has an important consequence: associative containers cannot have mutable iterators in the sense that if it is an iterator for an associative container and t is an object of it's value type, then *it = t is not a valid expression.

Consequently, every element of a simple associative container is immutable: objects may be inserted and erased, but not modified.

Types of Associative Containers

Associative containers are either maps or sets. A map associates or maps a key to a value, whereas in a set (set and multiset) their elements are their own keys.

A map may be either a stl::map or a stl::multimap, and a set may be either a stl::set or a stl::multiset. Here the prefix multi means that elements with the same key are allowed. Furthermore, sameness is to be understood not literally but according to a function or a function object that takes to objects and returns a boolean value.

According to their implementation and performance, associative containers are further divided into:

  1. ordered containers: and are std::set, std::multiset, std::map, and std::multimap. They are implemented as binary trees, often red-black trees. They use a comparison function or a comparison function object to compare keys.
  2. unordered containers: and are std::unordered_set, std::unordered_multiset, std::unordered_map, and std::unordered_multimap. According to Bjarne Stroutstrup, they are implemented as hash tables with linked overflow.

Comparison Function Objects for STL's Associative Containers

Introduction

To use the Standard Template Library (STL) associative containers, the type of keys stored must be comparable.

For sorted containers (e.g std::set), the comparison is by equivalence, which roughly means overloading the operator<, and for unsorted containers (e.g std::unordered_set) the comparison is by equality, which in turn (again, roughly) means overloading the operator==.

Custom Comparison for STL's Associative Containers

For some instances of a container with a predefined comparison function for the stored type, you may want to supply an alternative comparison function that makes more sense for the container instance's purpose.

For example, given a type Person that are normally sorted by Person::id, for some std::set<Person> called names you may want to compare the elements by Person::name instead of Person::id. How do you override the default comparison function for Person to achieve the desired behavior for this particular instance of std::set<Person>?

The Complete Declaration for Associative Containers

To exemplify the concepts, I'll consider std::set as a representative of the STL's associative containers, the differences between it and the others aren't significant for our purpose: how to create an associative container with a different comparison function from the one defined by the type of element held inside the container?

The complete declaration for std::set is:

template
<
class T,
class Compare = less<T>,
class Alloc = allocator<T>
>
class set;

Where:

  1. T is the type of elements that will be held
  2. Alloc is the memory allocator, with the default being allocator<T>

And the most important aspect for this discussion:

Compare is the type of the comparison used, with the default being std::less<T<

But, what's std::less<T<?

std::less<T< is a function object that expresses the less-than semantic for objects of type T. Its behavior is to call the operator<() overloaded by the type T to check the relative order of two objects A and B of type T.

Given the declaration for std::set, we can see that the way for providing our desired behavior is to supply a different type of comparison function object for the template parameter: Compare. Alright? Let's do it then!

Our type Person looks like this:

#include <algorithm>
#include <functional>
#include <iostream>
#include <set>
#include <string>

class Person {
public:
	explicit Person(int id, const std::string &name) : m_id{id}, m_name{name} {}
	const int id() const {return m_id;}
	const std::string& name() const {return m_name;}
private:
	const int m_id;
	const std::string m_name;
};

bool operator<(const Person &lhs, const Person &rhs) { // compare by id
	return lhs.id() < rhs.id();
}

std::ostream& operator<<(std::ostream &os, const Person &rhs) { // for formatted output
	os << "Id: " << rhs.id() << ", Name: " << rhs.name();
	return os;
}

And the use of Person inside std::set and comparing by Person::name instead of Person::id can be achieved by:

struct PersonCompareByName {
	bool operator()(const Person &lhs, const Person &rhs) const {
		return lhs.name() < rhs.name();
	}
};

int main() {
	auto people = {Person{2, "C"}, Person{3, "A"}, Person{1, "B"}};

	// By Id
	std::set<Person> peopleById = people;
	std::cout << "By Id: " << "\n\n";
	std::copy(cbegin(peopleById), cend(peopleById), std::ostream_iterator<Person>{std::cout, "\n"});

	std::cout << '\n';

	// By Name (1)
	std::set<Person, PersonCompareByName> peopleByName = people;
	std::cout << "By Name (1): " << "\n\n";
	std::copy(cbegin(peopleByName), cend(peopleByName), std::ostream_iterator<Person>{std::cout, "\n"});

	std::cout << '\n';

	// By Name (2)
	auto personComparator = [](const auto &lhs, const auto &rhs) {return lhs.name() < rhs.name();};
	std::set<Person, decltype(personComparator)> peopleByName2{people, personComparator};
	std::cout << "By Name (2): " << "\n\n";
	std::copy(cbegin(peopleByName2), cend(peopleByName2), std::ostream_iterator<Person>{std::cout, "\n"});

	return 0;
}

First, we'd created the peopleById that uses the internal (default) comparison function for the type Person: By Person::id(). Thus, every element inside the container will be sorted based on its id.

Then, at (1) we created peopleName that relies on the function object PersonCompareByName as the argument for the template parameter Compare, establishing the comparison by Person::name().

At (2) we achieved the same behavior of (1) but employing an alternative syntax that uses the lambda personComparator and supplied the template parameter with decltype.

Function Objects for Ordered Associative Containers

Ordered associative containers use compare function objects to keep themselves sorted. The default upon construction is less<Key>, which is defined in functional. You may define and pass in your own. It should overload bool operator()(const Key& l, const Key& r) and meet the following three requirements for a strict weak ordering:

It has to be antisymmetric
That is, if less<Key>(val1,val2) is true, then less<Key>(val2,val1) must be false.
It has to be transitive
It has to be irreflexive
That is, a key is never less than itself, or more formally less<Key>(val,val) is always false.

On this definition we define an equality test given two elements: they are equal if neither is less than the other.

set and multiset

Set and multiset containers sort their elements automatically according to a certain sorting criterion. The difference between the two is that a multiset allows duplicates, whereas a set does not.

They are good for finding out if an element is in a set, or for retrieving an element that is equal to another, which actually is useful if function object less<T> is specialized to suit our goals.

Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.

Using a std::set for Recording Visited Items

We will be using the following:

  1. insertions, and possibly deletions
  2. const_iterator std::set::find( const Key& key ) const
  3. const_iterator std::set::end() const for comparing the returned value

This would enable us to write a bool has(s, k) const, and a similar bool has_or_add(s, k) function.

In many situations the key type would be just a pointer...

std::map

Maps are associative containers that store elements formed by a combination of a key value (key_type) and a mapped value (mapped_type), following a specific order, unlike .

In a map, the key values are used to sort and uniquely identify the elements, while the mapped values store the content associated to this key. The types of key and mapped value may differ, and are grouped together in member type value_type, which is a std::pair type combining both:

typedef pair<const Key, Mapped> value_type;

Internally, the elements in a map are always sorted by its key following a specific strict weak ordering criterion indicated by its internal comparison object (of type Compare).

std::map containers are generally slower than std::unordered_map containers to access individual elements by their key, but they allow the direct iteration on subsets based on their order.


The mapped values in a map can be accessed directly by their corresponding key using the bracket operator (std::map::operator[]).

std::map::insert(ARGS)

Prototypes:

pair<iterator,bool> insert (const value_type& val);
template <class P>
pair<iterator,bool> insert (P&& val);

// with hint:
iterator insert(const_iterator position, const value_type& val);
template <class P>
iterator insert (const_iterator position, P&& val);

// range:
template <class InputIterator>
void insert (InputIterator first, InputIterator last);

// initializer list:
void insert (initializer_list<value_type> il);

Because element keys in a map are unique, the insertion operation checks whether each inserted element has a key equivalent to the one of an element already in the container, and if so, the element is not inserted, returning an iterator to this existing element (if the function returns a value).

For a similar container allowing for duplicate elements, see std::multimap.

An alternative way to insert elements in a map is by using member function map::operator[]


Member types iterator and const_iterator are defined in map as bidirectional iterator types that point to elements.

Postion Hints

As for hinting for the position where the element can be inserted, the function std::map::insert() optimizes its insertion time if position points to the element that will follow the inserted element (or to the end, if it would be the last).

Notice that this is just a hint and does not force the new element to be inserted at that position within the map container (the elements in a map always follow a specific order depending on their key).

Return value

The single element versions return a pair, with its member pair::first set to an iterator pointing to either the newly inserted element or to the element with an equivalent key in the map. The pair::second element in the pair is set to true if a new element was inserted or false if an equivalent key already existed.

The versions with a hint return an iterator pointing to either the newly inserted element or to the element that already had an equivalent key in the map.

An Example using std::map::insert()

// map::insert (C++98)
#include <iostream>
#include <map>

int main ()
{
  std::map<char,int> mymap;

  // first insert function version (single parameter):
  mymap.insert ( std::pair<char,int>('a',100) );
  mymap.insert ( std::pair<char,int>('z',200) );

  std::pair<std::map<char,int>::iterator,bool> ret;
  ret = mymap.insert ( std::pair<char,int>('z',500) );
  if (ret.second==false) {
    std::cout << "element 'z' already existed";
    std::cout << " with a value of " << ret.first->second << '\n';
  }

  // second insert function version (with hint position):
  std::map<char,int>::iterator it = mymap.begin();
  mymap.insert (it, std::pair<char,int>('b',300));  // max efficiency inserting
  mymap.insert (it, std::pair<char,int>('c',400));  // no max efficiency inserting

  // third insert function version (range insertion):
  std::map<char,int> anothermap;
  anothermap.insert(mymap.begin(),mymap.find('c'));

  // showing contents:
  std::cout << "mymap contains:\n";
  for (it=mymap.begin(); it!=mymap.end(); ++it)
    std::cout << it->first << " => " << it->second << '\n';

  std::cout << "anothermap contains:\n";
  for (it=anothermap.begin(); it!=anothermap.end(); ++it)
    std::cout << it->first << " => " << it->second << '\n';

  return 0;
}

Output:


          

std::map::find()

Searches the container for an element with a key equivalent to k and returns an iterator to it if found, otherwise it returns an iterator to map::end().

Prototypes

iterator find (const key_type& k);
const_iterator find (const key_type& k) const;

Two keys are considered equivalent if the container's comparison object returns false reflexively (i.e., no matter the order in which the elements are passed as arguments).


Another member function, map::count, can be used to just check whether a particular key exists. In later versions you may also use std::map::contains(K).

Here is an example using map::find:

// map::find
#include <iostream>
#include <map>

int main ()
{
  std::map<char,int> mymap;
  std::map<char,int>::iterator it;

  mymap['a']=50;
  mymap['b']=100;
  mymap['c']=150;
  mymap['d']=200;

  it = mymap.find('b');
  if (it != mymap.end())
    mymap.erase (it);

  // print content:
  std::cout << "elements in mymap:" << '\n';
  std::cout << "a => " << mymap.find('a')->second << '\n';
  std::cout << "c => " << mymap.find('c')->second << '\n';
  std::cout << "d => " << mymap.find('d')->second << '\n';

  return 0;
}

std::map::lower_bound and std::map::upper_bound

std::map::lower_bound returns an iterator pointing to the first element in the container whose key is not considered to go before k (i.e., either it is equivalent or goes after), whereas std::map::upper_bound returns a key pointing to the next element after k, or just std::map::end().

Prototypes:

iterator lower_bound (const key_type& k);
const_iterator lower_bound (const key_type& k) const;

and

iterator upper_bound (const key_type& k);
const_iterator upper_bound (const key_type& k) const;

std::map::equal_range

... is very much anologous to std::map::lower_bound/std::map::upper_bound in that it returns a std::pair of iterators, which represent the bounds of a range that includes all the elements in the container which have a key equivalent to k and, because the elements in a map container have unique keys, the range returned will contain a single element at most. Not so with same-named function std::multimap::equal_range.


          

[...]

std::map::erase()

Prototypes:

iterator  erase (const_iterator position);
size_type erase (const key_type& k);
iterator  erase (const_iterator first, const_iterator last);

Removes from the map container either a single element or a range of elements ([first,last)).

Difference Between std::map::operator[](KEY) and std::map::at(KEY)

If you access a key using the indexing operator [] that is not currently a part of a map, then it automatically adds a key for you. This is a huge caveat. For this reason, I prefer using the indexing operator [] for setting, and members find() / at() for lookup.

Another advantage of using at() over [] is the fact that it can operate on a const std::map, whereas [] won't.

Importantly, at(key) throws an exception if the key doesn't exist.

std::multimap

Just like std::map', multimaps are associative containers that store elements formed by a combination of a key value and a mapped value, following a specific order.

Unlike std::map, multiple elements are allowed to have equivalent keys.

Member functions std::map and std::multimap are the same save for the ramifications from the fact that equivalent keys are allowed.

std::unordered_set and std::unordered_multiset

(Heavily from https://cplusplus.com/reference/unordered_set/)

Unordered sets are containers that store unique elements in no particular order, and which allow for fast retrieval of individual elements based on their value.

Internally, the elements are organized into buckets depending on their hash values to allow for fast access to individual elements directly by their values (with a constant average time complexity on average).

std::unordered_set containers are faster than std::set containers to access individual elements by their key, although they are generally less efficient for range iteration through a subset of their elements.

Buckets in std::unordered_set's

A bucket is a slot in the container's internal hash table to which elements are assigned based on their hash value.

Buckets are numbered from 0 to (bucket_count() - 1).

bucket_count()

Return number of buckets (public member function)

max_bucket_count()

Return maximum number of buckets (public member function)

bucket_size()

bucket_size() returns the bucket size

size_type bucket_size ( size_type idx) const;

where idx is the bucket number, which should be lower than bucket_count().

bucket()
size_type bucket ( const key_type& k ) const;

Returns the bucket number (or index) where the element with value k is located.

Hash policy

load_factor()

load_factor() returns the load factor (public member function)

float load_factor() const noexcept;

The load factor is the ratio between the number of elements in the container (its size) and the number of buckets (bucket_count()):

load_factor = size / bucket_count

The load factor influences the probability of collision in the hash table (i.e., the probability of two elements being located in the same bucket). The container automatically increases the number of buckets to keep the load factor below a specific threshold (its max_load_factor), causing a rehash each time an expansion is needed.

To retrieve or change this threshold, use member function max_load_factor().

max_load_factor()

max_load_factor() gets or sets the maximum load factor (public member function)

float max_load_factor() const noexcept;
void max_load_factor( float mlf);

By default, unordered_set containers have a max_load_factor of 1.0.

rehash()
void rehash ( size_type n );

Sets the number of buckets in the container to n or more.

If n is greater than the current number of buckets in the container (bucket_count()), a rehash is forced. The new bucket count can either be equal or greater than n.

If n is lower than the current number of buckets in the container (bucket_count()), the function may have no effect on the bucket count and may not force a rehash.

A rehash is the reconstruction of the hash table: All the elements in the container are rearranged according to their hash value into the new set of buckets. This may alter the order of iteration of elements within the container.

Rehashes are automatically performed by the container whenever its load factor is going to surpass its max_load_factor in an operation.

[...]

reserve()

Request a capacity change

void reserve( size_type n);

Sets the number of buckets in the container (bucket_count()) to the most appropriate to contain at least n elements. If n is greater than the current bucket_count multiplied by the max_load_factor, the container's bucket_count is increased and a rehash is forced. If n is lower than that, the function may have no effect.

Observers

hash_function()

Get hash function

hasher hash_function() const;
key_eq()

Get key equivalence predicate

key_equal key_eq() const;

The key equivalence comparison is a predicate that takes the value of two elements as arguments and returns a bool value indicating whether they are to be considered equivalent. It is adopted by the container on construction. By default, it is equal_to<key_type>, which returns the same as applying the equal-to operator (==) to the arguments.

get_allocator()

Get allocator (public member function)

std::unordered_map and std::unordered_multimap

(Heavily from https://cplusplus.com/reference/unordered_map/)

[...]