This article is intended to cover some basic data structures: arrays, linked lists, stacks and queues; their applications and implementations in Java. Source code of implementations used in this article is available on Github.
Abstract and concrete data structures
Abstract data structure is a mathematical model, an interface declaring desired behaviour. Concrete data structure is an implementation of such interface. For example, collection or container is an abstract data structure. It says only that some items of particular type are stored inside it and declares methods for accessing and updating them. Array and list, on the other side, are concrete data structures. They describe the way items are stored (in continuous memory chunk versus as a chain of pointers) and implement methods declared in collection’s interface.
Collection (Container) is an abstract data structure. It
holds several items of some specified type
T and offers methods for accessing
and updating them. The following interface will be used throughout this article
to describe containers:
Array is a container that holds a number of items in contiguous area of memory. In its simplest form, static array, the capacity of a container is set up at the moment of its creation and may not be changed. Dynamic arrays, however, can grow automatically when current capacity is exceeded.
Arrays are used both on its own and as an underlying data structure for more sophisticated structures, e.g. heaps, associative arrays and adjacency matrices.
Arrays are supported out of the box in overwhelming majority of modern programming languages. Usually static arrays present as a syntactic feature. Here is an example of creating static array of 10 ints in Java:
Dynamic arrays often shipped as utility classes of standard library
std::vector in C++ and
ArrayList in Java).
Dynamic array implementation
Idea: let’s start with a small static array. If there is no free space left
to insert new item, then allocate new static array with extended capacity and copy
all items from the old array into the new one. It is proven that allocating new
array twice as large as the old one gives you
O(1) amortized complexity of
inserting new item at the end of an array.
Here is a simple implementation of dynamic array in Java:
Advanced implementation can shrink array when the number of items drops below
some point, say, half of capacity. However, it should be implemented with
cautious, because too eager behaviour can cause drastic change of complexity.
Imagine an array which shrinks itself in half as soon as the number of items
drops below half of capacity; when there are
n/2 items and you remove one,
then insert one and repeat these actions
m times the overall complexity will
O(mn). It’s better to shrink array by lesser amount, a third or a fourth of
capacity for example.
Linked list is also a container. It stores its items in nodes. Each node consists of at least two fields: a value to store and a pointer to another node. Nodes are chained one to another and, thus, can be located in arbitrary places of memory (contrary to arrays). Linked lists can be divided into several groups according to the number of pointers in its nodes: singly-linked, doubly-linked and multiply-linked.
Linked lists can be used either on its own or as an underlying data structure for such ADTs as adjacency lists, stacks, queues and for collision resolution in hash tables.
Linked lists are present in many programming languages, for example,
in C++ and
LinkedList in Java. Usually they are implemented as doubly-linked
Doubly-linked list implementation
Idea: key idea of this implementation is to use head and tail sentinel
nodes. They are constant for particular linked list object; they point to each
other when the list is empty and to the first/last node otherwise. Head
prev is always
null, just as tail sentinel’s
purpose of such sentinel nodes is to save us time and energy playing around
null checks: this way we can be sure that list always has head and tail nodes,
though they’re merely decorative.
Here is a simple implementation of doubly-linked list in Java:
Here is complexity comparision of dynamic array and doubly linked list for all
Collection interface. As you see, linked list wins on
prepending, while dynamic array wins on random access.
|Additional space used||O(n)||O(n)|
Stacks and queues
Stacks and queues are abstract data structures that support limited access to their elements: they allow only appending and retrieval. They are distinguished by the particular retrieval order they support: stack is LIFO container (“last-in-first-out”) and queue is FIFO container (“first-in-first-out”). Interfaces of stack and queue are presented below:
Both stacks and queues are widely applicable in modern software engineering. Examples of stack usage are:
- Function calls are stored in stack while program is executed
- Many parsing and expression evaluation algorithms use stacks to hold values or currently processed lexemes
- JVM is a stack-based virtual machine. It does not use registers; instead it stores all variables on stack
Examples of queue usage are:
- Concurrent applications; for example, concurrent access to some resource means that there is a queue of requests and they are processed in order
- Buffering and data transfer
Usually stacks and queues are implemented as adapters to another containers,
for example, linked lists or dynamic arrays. In Java
Queue is an interface
implemented by classes
LinkedList and others.
a concrete class, but it is obsolete. It is recommended to use interface
Deque instead and its implementations as it has methods of both stacks and
Explanations skipped as the implementation is quite straightforward.
Implementation of stack using doubly-linked list will be exactly the same.
Complexity of operations is
O(1) amortized for
O(1) for list.
Queue implementation using dynamic array
Idea: this is standard two-pointer implementation of a queue.
Because in queue we need to retrieve element from the beginning of a queue, it
is inefficient to shift the whole array back each time
dequeue is called.
Instead lets create a pointer to location where the real beginning of a queue is
and shift it forward on
enqueue, however, lets insert new item
not just at the end of an array, but, if possible, in the free place between
array’s beginning and queue’s beginning. In order to achieve this lets create
another pointer to store location of the real end of a queue. When this two
pointers meet lets relocate the whole array so that queue’s beginning and end
will be the same as underlying array’s are.
Complexity of this solution is
O(1) amortized for both operations.
Queue implementation using doubly-linked list
This implementation is much easier, because in doubly-linked list we can
remove item from both beginning and end in
Complexity of this solution is
O(1) for both operations.
Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms
Skiena. The Algorithm Design Manual