Arrays Made of Sequences in Python

This note is based on "Fluent Python". Most of it consists of excerpts from the book, with a small part being my own understanding. The notes were converted from Jupyter to markdown. The original Jupyter notebook is in this repository

Python inherited from ABC the trait of handling sequence data uniformly. No matter what data structure you're working with-strings, lists, byte sequences, arrays, XML elements, or database query results-they all share a rich set of operations: iteration, slicing, sorting, and concatenation.

Overview of Built-in Sequences

Classification of Sequences

By container type:

Container sequences: list, tuple, collections.deque - can hold items of different types
Flat sequences: str, bytes, bytearray, memoryview, array.array - can only hold one type

Important distinction: Container sequences store references to the objects they contain, while flat sequences store the value directly, not references.

By mutability:

Mutable sequences: list, bytearray, array.array, collections.deque, memoryview
Immutable sequences: tuple, str, bytes

List Comprehensions and Generator Expressions

List comprehensions are a powerful way to build lists. However, due to somewhat cryptic syntax, people often shy away from using them. Mastering list comprehensions also opens the door to generator expressions, which can generate elements of various types and fill sequences with them.

List comprehensions are shortcuts for building lists, while generator expressions can be used to create any other type of sequence. If you don't use them regularly in your code, you're likely missing opportunities to write more readable and efficient code.

python

symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes
# Output: [36, 162, 163, 165, 8364, 164]

Python ignores line breaks inside [], {}, and (), so if you have multiline lists, list comprehensions, generator expressions, or dictionaries, you can omit the unsightly line continuation character \.

List Comprehensions vs filter and map

What filter and map combined can do, list comprehensions can also do, without relying on hard-to-understand and read lambda expressions:

python

symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii
# Output: [162, 163, 165, 8364, 164]

# Equivalent using filter and map
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
beyond_ascii
# Output: [162, 163, 165, 8364, 164]

Lambda Expressions

In Python, lambda expressions create anonymous functions-functions without a name. The syntax is:

python

lambda arguments: expression

Example:

python

add = lambda x, y: x + y
print(add(5, 3))  # Output: 8

Cartesian Products with List Comprehensions

python

# Computing Cartesian products using list comprehensions
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts
# Output: [('black', 'S'), ('black', 'M'), ('black', 'L'),
#          ('white', 'S'), ('white', 'M'), ('white', 'L')]

# To sort by size first, then color, just adjust the for clause order
tshirts = [(color, size) for size in sizes for color in colors]
tshirts
# Output: [('black', 'S'), ('white', 'S'), ('black', 'M'),
#          ('white', 'M'), ('black', 'L'), ('white', 'L')]

Generator Expressions

The sole purpose of list comprehensions is to generate lists. To generate other types of sequences, generator expressions are the way to go.

While you can use list comprehensions to initialize tuples, arrays, or other sequence types, generator expressions are the better choice. This is because generator expressions follow the iterator protocol and can produce elements one by one, rather than building a complete list first and then passing it to a constructor. The former approach obviously saves memory.

The syntax for generator expressions is similar to list comprehensions, just replace square brackets with parentheses:

python

symbols = '$¢£¥€¤'
# If the generator expression is the only argument, no extra parentheses needed
tuple(ord(symbol) for symbol in symbols)
# Output: (36, 162, 163, 165, 8364, 164)

import array
# array.array constructor needs two arguments, so parentheses are required
array.array('I', (ord(symbol) for symbol in symbols))
# Output: array('I', [36, 162, 163, 165, 8364, 164])

Example: Cartesian product with generator expression

This example uses a generator expression to compute a Cartesian product for printing all combinations of 2 colors and 3 sizes of T-shirts. Unlike earlier examples, using a generator expression means no list of 6 combinations stays in memory, because the generator expression produces one combination each time the for loop runs.

Generator expressions produce elements one at a time, never producing a list with all 6 T-shirt styles at once.

python

colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirts in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirts)
# Output:
# black S
# black M
# black L
# white S
# white M
# white L

Tuples Are Not Just Immutable Lists

Characteristics of tuples:

Elements in tuples cannot be modified
Can be used as records without field names

Tuples are actually records: each element in a tuple holds data for one field of a record, plus the position of that field. It's this position information that gives the data meaning.

If you only think of tuples as immutable lists, the other information-the total number of elements it contains and their positions-seems dispensable. But if you treat tuples as collections of fields, then quantity and position information become very important.

python

lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)  # Unpacking
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'),  
    ('ESP', 'XDA205856')]
for passport in sorted(traveler_ids):  
    print('%s/%s' % passport)
# Output:
# BRA/CE342567
# ESP/XDA205856
# USA/31195855

Tuple Unpacking

Tuple unpacking works with any iterable object. The only hard requirement is that the iterable produces exactly as many items as there are receiving slots in the tuple, unless we use * to capture excess items.

python

# Using * operator to unpack an iterable as function arguments
divmod(20, 8)
# Output: (2, 4)

t = (20, 8)
divmod(*t)
# Output: (2, 4)

quotient, remainder = divmod(*t)
quotient, remainder
# Output: (2, 4)

# Allowing a function to return multiple values in tuple form
import os
_, filename = os.path.split('/home/luciano/.ssh/idrsa.pub')
filename
# Output: 'idrsa.pub'

If you're writing internationalized software, _ may not be an ideal placeholder because it's also the conventional alias for the gettext.gettext function, as mentioned in the gettext module documentation. In other cases, _ makes a good placeholder.

Using * in Tuple Unpacking

python

# Using * to handle excess items
a, b, *rest = range(5)
a, b, rest
# Output: (0, 1, [2, 3, 4])

Nested Tuple Unpacking

python

metro_areas = [
    ('Tokyo','JP',36.933,(35.689722,139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333,-99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611,-74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778,-46.635833)),
]
print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, cc, pop, (latitude, longitude) in metro_areas:
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))
# Output:
#                 |   lat.    |   long.  
# Mexico City     |   19.4333 |  -99.1333
# New York-Newark |   40.8086 |  -74.0204
# Sao Paulo       |  -23.5478 |  -46.6358

Named Tuples

While tuples are very useful, as records they lack one feature: we often need to name the fields. The namedtuple function solves this problem.

python

from collections import namedtuple
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35, 139))
tokyo
# Output: City(name='Tokyo', country='JP', population=36.933, coordinates=(35, 139))

tokyo.population  # Access field by name or position
# Output: 36.933

tokyo.coordinates
# Output: (35, 139)

tokyo[1]
# Output: 'JP'

Named Tuple Attributes and Methods

python

# _fields attribute is a tuple containing all field names
City._fields
# Output: ('name', 'country', 'population', 'coordinates')

# _make() creates an instance from an iterable
City._make(('Tokyo', 'JP', 36.933, (35, 139)))
# Output: City(name='Tokyo', country='JP', population=36.933, coordinates=(35, 139))

# _asdict() returns an OrderedDict
City._asdict(tokyo)
# Output: {'name': 'Tokyo', 'country': 'JP', 'population': 36.933, 'coordinates': (35, 139)}

Aside from methods related to adding or removing elements, tuples support all other list methods, and tuples don't have a __reversed__ method.

Slicing

Why Slices and Ranges Exclude the Last Item

When only the stop position is given, we can quickly see how many elements are in the slice or range: range(3) and my_list[:3] both return 3 elements.
When both start and stop positions are visible, we can quickly calculate the slice or range length by subtracting: stop - start.
This also lets us split a sequence into non-overlapping parts at any index by writing my_list[:x] and my_list[x:].

python

# We can also use s[a:b:c] to take values from s between a and b with interval c
# c can also be negative, meaning reverse order
s = 'bicycle'
s[::3]
# Output: 'bye'

s[::-1]
# Output: 'elcycib'

s[::-2]
# Output: 'eccb'

Using Slice Objects - Naming Slices

python

invoice = """
0.....6................................40........52...55........
1909  Pimoroni PiBrella                    $17.50    3    $52.50
1489  6mm Tactile Switch x20                $4.95    2     $9.90
1510  Panavise Jr.-PV-201                  $28.00    1    $28.00
1601  PiTFT Mini Kit 320x240               $34.95    1    $34.95
"""
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

Ellipsis in Multidimensional Slicing

If you have a multidimensional array, you can use ... to represent omitted dimensions:

python

import numpy as np

# Create a 3D array
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# Use ... to access all elements in first dimension
print(arr[..., 0])
# Output: [[ 1  4]
#          [ 7 10]]

Assigning to Slices

If we put a slice on the left side of an assignment statement, or make it the target of a del operation, we can splice, remove, or modify sequences in place:

python

l = list(range(10))
l
# Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

l[2:5] = [20, 30]
l
# Output: [0, 1, 20, 30, 6, 7, 8, 9]

del l[5:7]
l
# Output: [0, 1, 20, 30, 6, 9]

l[3::2] = [11, 22]
l
# Output: [0, 1, 20, 11, 6, 22]

Using + and * with Sequences

python

l = [1, 2, 3]
l * 5
# Output: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

5 * 'abcd'
# Output: 'abcdabcdabcdabcdabcd'

Note: Both + and * follow this rule: they don't modify the original operands but build entirely new sequences.

Building Lists of Lists

python

# Correct way to build a list of lists
board = [['_'] * 3 for i in range(3)]
board
# Output: [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

board[1][2] = 'X'
board
# Output: [['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]

# WRONG way - creates three references to the same list
weird_board = [['_'] * 3] * 3
weird_board
# Output: [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

weird_board[1][2] = 'O'
weird_board
# Output: [['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]

Augmented Assignment with Sequences

The behavior of augmented assignment operators += and *= depends on their first operand
+= uses the special method __iadd__
If a class doesn't implement this method, Python falls back to __add__
The same concepts apply to *=, which corresponds to __imul__

Example showing how *= behaves on mutable vs immutable sequences:

python

# Mutable sequence
l = [1, 2, 3]
id(l)
# Output: (some memory address)

l *= 2
l
# Output: [1, 2, 3, 1, 2, 3]

id(l)
# Output: (same memory address)

# Immutable sequence
t = (1, 2, 3)
id(t)
# Output: (some memory address)

t *= 2
id(t)
# Output: (different memory address)

For mutable sequences: the list ID doesn't change after augmented multiplication; new elements are appended.
For immutable sequences: the original tuple is destroyed, a new tuple is created, and the reference is reassigned.

Therefore, repeated concatenation on immutable sequences is inefficient because each time creates a new object, and the interpreter must copy elements from the old object to the new one before appending new elements.

list.sort Method and sorted Built-in Function

list.sort sorts the list in place without copying it. This is why this method returns None.
In contrast, the built-in sorted function creates a new list as its return value. This method accepts any iterable object as an argument, including immutable sequences or generators.
Both list.sort and sorted have two optional keyword arguments: reverse and key
key is a one-parameter function that will be applied to each element in the sequence, and the resulting values will be the comparison keys for the sorting algorithm.

The convention of returning None to indicate in-place changes has a drawback: callers can't chain these calls together. Methods that return new objects (like all str methods) work the opposite way and can be chained, forming a fluent interface.

python

fruits = ['grape', 'raspberry', 'apple', 'banana']
sorted(fruits)
# Output: ['apple', 'banana', 'grape', 'raspberry']

fruits  # Original unchanged
# Output: ['grape', 'raspberry', 'apple', 'banana']

sorted(fruits, key=len)
# Output: ['grape', 'apple', 'banana', 'raspberry']

fruits.sort()  # Sorts in place
fruits
# Output: ['apple', 'banana', 'grape', 'raspberry']

Managing Sorted Sequences with bisect

The bisect module contains two main functions, bisect and insort, both using binary search algorithms to find or insert elements in sorted sequences.

python

import bisect
HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]

# bisect finds the insertion position
for needle in NEEDLES:
    position = bisect.bisect(HAYSTACK, needle)
    print(f'{needle} @ position {position}')

bisect is actually an alias for bisect_right. Its sibling function is bisect_left. The difference: bisect_left returns a position before equal elements, while bisect_right returns a position after them.

Inserting with bisect.insort

Sorting is expensive, so after obtaining a sorted sequence, it's best to keep it sorted. bisect.insort exists for this purpose.

python

import bisect
import random
SIZE = 7
random.seed(1729)
my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE*2)
    bisect.insort(my_list, new_item)
    print('%2d->' % new_item, my_list)

When a List Is Not the Answer

While lists are flexible and simple, we may have better options for various needs.

To store 10 million floating-point numbers, arrays are much more efficient because arrays store numeric machine representations (bytes), not float objects.
For frequent first-in-first-out operations, deque (double-ended queue) will be faster.

Arrays

If we need a list containing only numbers, array.array is more efficient than list. Arrays support all mutable sequence operations including .pop, .insert, and .extend. Additionally, arrays provide faster file reading and writing methods like .frombytes and .tofile.

python

# Creating, saving, and loading a float array
from array import array
from random import random
floats = array('d', (random() for i in range(10**7)))
floats[-1]

fp = open('floats.bin', 'wb')
floats.tofile(fp)  # Save array to binary file
fp.close()

floats2 = array('d')
fp = open('floats.bin', 'rb') 
floats2.fromfile(fp, 10**7)  # Read 10 million floats from binary file
fp.close()
floats2[-1]

floats2 == floats  # Arrays are equal
# Output: True

Memory Views

memoryview is a built-in class that lets you manipulate different slices of the same array without copying content.

python

# Changing an array element by modifying one byte
numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
len(memv)
# Output: 5

memv_oct = memv.cast('B')  # Cast to unsigned char
memv_oct.tolist()
# Output: [254, 255, 255, 255, 0, 0, 1, 0, 2, 0]

memv_oct[5] = 4  # Assign byte at position 5
numbers
# Output: array('h', [-2, -1, 1024, 1, 2])

NumPy and SciPy

NumPy implements multidimensional homogeneous arrays and matrices that can handle not just numbers but also user-defined records.
SciPy is another library based on NumPy, providing many algorithms related to scientific computing, designed for linear algebra, numerical integration, and statistics.

Deques and Other Queues

Using .append and .pop, we can use lists as stacks or queues. But deleting the first element (or adding before the first element) is expensive because it involves moving all elements.

collections.deque (double-ended queue) is a thread-safe data type that can quickly add or remove elements from both ends.

python

from collections import deque
dq = deque(range(10), maxlen=10)
dq
# Output: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

dq.rotate(3)  # Rotate right 3 positions
dq
# Output: deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)

dq.appendleft(-1)  # Add to left
dq
# Output: deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

dq.extend([11, 22, 33])  # Add to right
dq
# Output: deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)

Conclusion

Python's sequence types provide a rich, consistent interface for working with ordered data. By understanding the differences between container and flat sequences, mutable and immutable sequences, and when to use specialized types like arrays, deques, or NumPy arrays, you can write more efficient and Pythonic code.

Key takeaways:

Use list comprehensions and generator expressions for concise, readable sequence creation
Understand tuple unpacking and named tuples for better data handling
Master slicing operations for efficient sequence manipulation
Choose the right sequence type for your use case (list, tuple, array, deque, etc.)
Leverage built-in tools like bisect for sorted sequences

Arrays Made of Sequences in Python ​

Overview of Built-in Sequences ​

Classification of Sequences ​

List Comprehensions and Generator Expressions ​

List Comprehensions vs filter and map ​

Lambda Expressions ​

Cartesian Products with List Comprehensions ​

Generator Expressions ​

Tuples Are Not Just Immutable Lists ​

Characteristics of tuples: ​

Tuple Unpacking ​

Using * in Tuple Unpacking ​

Nested Tuple Unpacking ​

Named Tuples ​

Named Tuple Attributes and Methods ​

Slicing ​

Why Slices and Ranges Exclude the Last Item ​

Using Slice Objects - Naming Slices ​

Ellipsis in Multidimensional Slicing ​

Assigning to Slices ​

Using + and * with Sequences ​

Building Lists of Lists ​

Augmented Assignment with Sequences ​

list.sort Method and sorted Built-in Function ​

Managing Sorted Sequences with bisect ​

Inserting with bisect.insort ​

When a List Is Not the Answer ​

Arrays ​

Memory Views ​

NumPy and SciPy ​

Deques and Other Queues ​

Conclusion ​

Further Reading ​

Arrays Made of Sequences in Python

Overview of Built-in Sequences

Classification of Sequences

List Comprehensions and Generator Expressions

List Comprehensions vs filter and map

Lambda Expressions

Cartesian Products with List Comprehensions

Generator Expressions

Tuples Are Not Just Immutable Lists

Characteristics of tuples:

Tuple Unpacking

Using * in Tuple Unpacking

Nested Tuple Unpacking

Named Tuples

Named Tuple Attributes and Methods

Slicing

Why Slices and Ranges Exclude the Last Item

Using Slice Objects - Naming Slices

Ellipsis in Multidimensional Slicing

Assigning to Slices

Using + and * with Sequences

Building Lists of Lists

Augmented Assignment with Sequences

list.sort Method and sorted Built-in Function

Managing Sorted Sequences with bisect

Inserting with bisect.insort

When a List Is Not the Answer

Arrays

Memory Views

NumPy and SciPy

Deques and Other Queues

Conclusion

Further Reading