Arrays Made of Sequences in Python
This note is based on "Fluent Python". Most of it consists of excerpts from the book, with a small part being my own understanding. The notes were converted from Jupyter to markdown. The original Jupyter notebook is in this repository
Python inherited from ABC the trait of handling sequence data uniformly. No matter what data structure you're working with—strings, lists, byte sequences, arrays, XML elements, or database query results—they all share a rich set of operations: iteration, slicing, sorting, and concatenation.
Overview of Built-in Sequences
Classification of Sequences
By container type:
- Container sequences:
list,tuple,collections.deque- can hold items of different types - Flat sequences:
str,bytes,bytearray,memoryview,array.array- can only hold one type
Important distinction: Container sequences store references to the objects they contain, while flat sequences store the value directly, not references.
By mutability:
- Mutable sequences:
list,bytearray,array.array,collections.deque,memoryview - Immutable sequences:
tuple,str,bytes
List Comprehensions and Generator Expressions
List comprehensions are a powerful way to build lists. However, due to somewhat cryptic syntax, people often shy away from using them. Mastering list comprehensions also opens the door to generator expressions, which can generate elements of various types and fill sequences with them.
List comprehensions are shortcuts for building lists, while generator expressions can be used to create any other type of sequence. If you don't use them regularly in your code, you're likely missing opportunities to write more readable and efficient code.
symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes
# Output: [36, 162, 163, 165, 8364, 164]Python ignores line breaks inside
[],{}, and(), so if you have multiline lists, list comprehensions, generator expressions, or dictionaries, you can omit the unsightly line continuation character\.
List Comprehensions vs filter and map
What filter and map combined can do, list comprehensions can also do, without relying on hard-to-understand and read lambda expressions:
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii
# Output: [162, 163, 165, 8364, 164]
# Equivalent using filter and map
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
beyond_ascii
# Output: [162, 163, 165, 8364, 164]Lambda Expressions
In Python, lambda expressions create anonymous functions—functions without a name. The syntax is:
lambda arguments: expressionExample:
add = lambda x, y: x + y
print(add(5, 3)) # Output: 8Cartesian Products with List Comprehensions
# Computing Cartesian products using list comprehensions
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts
# Output: [('black', 'S'), ('black', 'M'), ('black', 'L'),
# ('white', 'S'), ('white', 'M'), ('white', 'L')]
# To sort by size first, then color, just adjust the for clause order
tshirts = [(color, size) for size in sizes for color in colors]
tshirts
# Output: [('black', 'S'), ('white', 'S'), ('black', 'M'),
# ('white', 'M'), ('black', 'L'), ('white', 'L')]Generator Expressions
The sole purpose of list comprehensions is to generate lists. To generate other types of sequences, generator expressions are the way to go.
While you can use list comprehensions to initialize tuples, arrays, or other sequence types, generator expressions are the better choice. This is because generator expressions follow the iterator protocol and can produce elements one by one, rather than building a complete list first and then passing it to a constructor. The former approach obviously saves memory.
The syntax for generator expressions is similar to list comprehensions, just replace square brackets with parentheses:
symbols = '$¢£¥€¤'
# If the generator expression is the only argument, no extra parentheses needed
tuple(ord(symbol) for symbol in symbols)
# Output: (36, 162, 163, 165, 8364, 164)
import array
# array.array constructor needs two arguments, so parentheses are required
array.array('I', (ord(symbol) for symbol in symbols))
# Output: array('I', [36, 162, 163, 165, 8364, 164])Example: Cartesian product with generator expression
This example uses a generator expression to compute a Cartesian product for printing all combinations of 2 colors and 3 sizes of T-shirts. Unlike earlier examples, using a generator expression means no list of 6 combinations stays in memory, because the generator expression produces one combination each time the for loop runs.
Generator expressions produce elements one at a time, never producing a list with all 6 T-shirt styles at once.
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirts in ('%s %s' % (c, s) for c in colors for s in sizes):
print(tshirts)
# Output:
# black S
# black M
# black L
# white S
# white M
# white LTuples Are Not Just Immutable Lists
Characteristics of tuples:
- Elements in tuples cannot be modified
- Can be used as records without field names
Tuples are actually records: each element in a tuple holds data for one field of a record, plus the position of that field. It's this position information that gives the data meaning.
If you only think of tuples as immutable lists, the other information—the total number of elements it contains and their positions—seems dispensable. But if you treat tuples as collections of fields, then quantity and position information become very important.
lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014) # Unpacking
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'),
('ESP', 'XDA205856')]
for passport in sorted(traveler_ids):
print('%s/%s' % passport)
# Output:
# BRA/CE342567
# ESP/XDA205856
# USA/31195855Tuple Unpacking
Tuple unpacking works with any iterable object. The only hard requirement is that the iterable produces exactly as many items as there are receiving slots in the tuple, unless we use * to capture excess items.
# Using * operator to unpack an iterable as function arguments
divmod(20, 8)
# Output: (2, 4)
t = (20, 8)
divmod(*t)
# Output: (2, 4)
quotient, remainder = divmod(*t)
quotient, remainder
# Output: (2, 4)
# Allowing a function to return multiple values in tuple form
import os
_, filename = os.path.split('/home/luciano/.ssh/idrsa.pub')
filename
# Output: 'idrsa.pub'If you're writing internationalized software,
_may not be an ideal placeholder because it's also the conventional alias for thegettext.gettextfunction, as mentioned in the gettext module documentation. In other cases,_makes a good placeholder.
Using * in Tuple Unpacking
# Using * to handle excess items
a, b, *rest = range(5)
a, b, rest
# Output: (0, 1, [2, 3, 4])Nested Tuple Unpacking
metro_areas = [
('Tokyo','JP',36.933,(35.689722,139.691667)),
('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
('Mexico City', 'MX', 20.142, (19.433333,-99.133333)),
('New York-Newark', 'US', 20.104, (40.808611,-74.020386)),
('Sao Paulo', 'BR', 19.649, (-23.547778,-46.635833)),
]
print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, cc, pop, (latitude, longitude) in metro_areas:
if longitude <= 0:
print(fmt.format(name, latitude, longitude))
# Output:
# | lat. | long.
# Mexico City | 19.4333 | -99.1333
# New York-Newark | 40.8086 | -74.0204
# Sao Paulo | -23.5478 | -46.6358Named Tuples
While tuples are very useful, as records they lack one feature: we often need to name the fields. The namedtuple function solves this problem.
from collections import namedtuple
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35, 139))
tokyo
# Output: City(name='Tokyo', country='JP', population=36.933, coordinates=(35, 139))
tokyo.population # Access field by name or position
# Output: 36.933
tokyo.coordinates
# Output: (35, 139)
tokyo[1]
# Output: 'JP'Named Tuple Attributes and Methods
# _fields attribute is a tuple containing all field names
City._fields
# Output: ('name', 'country', 'population', 'coordinates')
# _make() creates an instance from an iterable
City._make(('Tokyo', 'JP', 36.933, (35, 139)))
# Output: City(name='Tokyo', country='JP', population=36.933, coordinates=(35, 139))
# _asdict() returns an OrderedDict
City._asdict(tokyo)
# Output: {'name': 'Tokyo', 'country': 'JP', 'population': 36.933, 'coordinates': (35, 139)}Aside from methods related to adding or removing elements, tuples support all other list methods, and tuples don't have a __reversed__ method.
Slicing
Why Slices and Ranges Exclude the Last Item
- When only the stop position is given, we can quickly see how many elements are in the slice or range:
range(3)andmy_list[:3]both return 3 elements. - When both start and stop positions are visible, we can quickly calculate the slice or range length by subtracting:
stop - start. - This also lets us split a sequence into non-overlapping parts at any index by writing
my_list[:x]andmy_list[x:].
# We can also use s[a:b:c] to take values from s between a and b with interval c
# c can also be negative, meaning reverse order
s = 'bicycle'
s[::3]
# Output: 'bye'
s[::-1]
# Output: 'elcycib'
s[::-2]
# Output: 'eccb'Using Slice Objects - Naming Slices
invoice = """
0.....6................................40........52...55........
1909 Pimoroni PiBrella $17.50 3 $52.50
1489 6mm Tactile Switch x20 $4.95 2 $9.90
1510 Panavise Jr.-PV-201 $28.00 1 $28.00
1601 PiTFT Mini Kit 320x240 $34.95 1 $34.95
"""
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
print(item[UNIT_PRICE], item[DESCRIPTION])Ellipsis in Multidimensional Slicing
If you have a multidimensional array, you can use ... to represent omitted dimensions:
import numpy as np
# Create a 3D array
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
# Use ... to access all elements in first dimension
print(arr[..., 0])
# Output: [[ 1 4]
# [ 7 10]]Assigning to Slices
If we put a slice on the left side of an assignment statement, or make it the target of a del operation, we can splice, remove, or modify sequences in place:
l = list(range(10))
l
# Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[2:5] = [20, 30]
l
# Output: [0, 1, 20, 30, 6, 7, 8, 9]
del l[5:7]
l
# Output: [0, 1, 20, 30, 6, 9]
l[3::2] = [11, 22]
l
# Output: [0, 1, 20, 11, 6, 22]Using + and * with Sequences
l = [1, 2, 3]
l * 5
# Output: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
5 * 'abcd'
# Output: 'abcdabcdabcdabcdabcd'Note: Both + and * follow this rule: they don't modify the original operands but build entirely new sequences.
Building Lists of Lists
# Correct way to build a list of lists
board = [['_'] * 3 for i in range(3)]
board
# Output: [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
board[1][2] = 'X'
board
# Output: [['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]
# WRONG way - creates three references to the same list
weird_board = [['_'] * 3] * 3
weird_board
# Output: [['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
weird_board[1][2] = 'O'
weird_board
# Output: [['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]Augmented Assignment with Sequences
- The behavior of augmented assignment operators
+=and*=depends on their first operand +=uses the special method__iadd__- If a class doesn't implement this method, Python falls back to
__add__ - The same concepts apply to
*=, which corresponds to__imul__
Example showing how *= behaves on mutable vs immutable sequences:
# Mutable sequence
l = [1, 2, 3]
id(l)
# Output: (some memory address)
l *= 2
l
# Output: [1, 2, 3, 1, 2, 3]
id(l)
# Output: (same memory address)
# Immutable sequence
t = (1, 2, 3)
id(t)
# Output: (some memory address)
t *= 2
id(t)
# Output: (different memory address)- For mutable sequences: the list ID doesn't change after augmented multiplication; new elements are appended.
- For immutable sequences: the original tuple is destroyed, a new tuple is created, and the reference is reassigned.
Therefore, repeated concatenation on immutable sequences is inefficient because each time creates a new object, and the interpreter must copy elements from the old object to the new one before appending new elements.
list.sort Method and sorted Built-in Function
list.sortsorts the list in place without copying it. This is why this method returnsNone.- In contrast, the built-in
sortedfunction creates a new list as its return value. This method accepts any iterable object as an argument, including immutable sequences or generators. - Both
list.sortandsortedhave two optional keyword arguments:reverseandkey keyis a one-parameter function that will be applied to each element in the sequence, and the resulting values will be the comparison keys for the sorting algorithm.
The convention of returning
Noneto indicate in-place changes has a drawback: callers can't chain these calls together. Methods that return new objects (like allstrmethods) work the opposite way and can be chained, forming a fluent interface.
fruits = ['grape', 'raspberry', 'apple', 'banana']
sorted(fruits)
# Output: ['apple', 'banana', 'grape', 'raspberry']
fruits # Original unchanged
# Output: ['grape', 'raspberry', 'apple', 'banana']
sorted(fruits, key=len)
# Output: ['grape', 'apple', 'banana', 'raspberry']
fruits.sort() # Sorts in place
fruits
# Output: ['apple', 'banana', 'grape', 'raspberry']Managing Sorted Sequences with bisect
The bisect module contains two main functions, bisect and insort, both using binary search algorithms to find or insert elements in sorted sequences.
import bisect
HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]
# bisect finds the insertion position
for needle in NEEDLES:
position = bisect.bisect(HAYSTACK, needle)
print(f'{needle} @ position {position}')bisect is actually an alias for bisect_right. Its sibling function is bisect_left. The difference: bisect_left returns a position before equal elements, while bisect_right returns a position after them.
Inserting with bisect.insort
Sorting is expensive, so after obtaining a sorted sequence, it's best to keep it sorted. bisect.insort exists for this purpose.
import bisect
import random
SIZE = 7
random.seed(1729)
my_list = []
for i in range(SIZE):
new_item = random.randrange(SIZE*2)
bisect.insort(my_list, new_item)
print('%2d->' % new_item, my_list)When a List Is Not the Answer
While lists are flexible and simple, we may have better options for various needs.
- To store 10 million floating-point numbers, arrays are much more efficient because arrays store numeric machine representations (bytes), not
floatobjects. - For frequent first-in-first-out operations,
deque(double-ended queue) will be faster.
Arrays
If we need a list containing only numbers, array.array is more efficient than list. Arrays support all mutable sequence operations including .pop, .insert, and .extend. Additionally, arrays provide faster file reading and writing methods like .frombytes and .tofile.
# Creating, saving, and loading a float array
from array import array
from random import random
floats = array('d', (random() for i in range(10**7)))
floats[-1]
fp = open('floats.bin', 'wb')
floats.tofile(fp) # Save array to binary file
fp.close()
floats2 = array('d')
fp = open('floats.bin', 'rb')
floats2.fromfile(fp, 10**7) # Read 10 million floats from binary file
fp.close()
floats2[-1]
floats2 == floats # Arrays are equal
# Output: TrueMemory Views
memoryview is a built-in class that lets you manipulate different slices of the same array without copying content.
# Changing an array element by modifying one byte
numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
len(memv)
# Output: 5
memv_oct = memv.cast('B') # Cast to unsigned char
memv_oct.tolist()
# Output: [254, 255, 255, 255, 0, 0, 1, 0, 2, 0]
memv_oct[5] = 4 # Assign byte at position 5
numbers
# Output: array('h', [-2, -1, 1024, 1, 2])NumPy and SciPy
- NumPy implements multidimensional homogeneous arrays and matrices that can handle not just numbers but also user-defined records.
- SciPy is another library based on NumPy, providing many algorithms related to scientific computing, designed for linear algebra, numerical integration, and statistics.
Deques and Other Queues
Using .append and .pop, we can use lists as stacks or queues. But deleting the first element (or adding before the first element) is expensive because it involves moving all elements.
collections.deque (double-ended queue) is a thread-safe data type that can quickly add or remove elements from both ends.
from collections import deque
dq = deque(range(10), maxlen=10)
dq
# Output: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
dq.rotate(3) # Rotate right 3 positions
dq
# Output: deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
dq.appendleft(-1) # Add to left
dq
# Output: deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
dq.extend([11, 22, 33]) # Add to right
dq
# Output: deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)Conclusion
Python's sequence types provide a rich, consistent interface for working with ordered data. By understanding the differences between container and flat sequences, mutable and immutable sequences, and when to use specialized types like arrays, deques, or NumPy arrays, you can write more efficient and Pythonic code.
Key takeaways:
- Use list comprehensions and generator expressions for concise, readable sequence creation
- Understand tuple unpacking and named tuples for better data handling
- Master slicing operations for efficient sequence manipulation
- Choose the right sequence type for your use case (list, tuple, array, deque, etc.)
- Leverage built-in tools like
bisectfor sorted sequences
Further Reading
- Python Data Structures Documentation
- "Fluent Python" by Luciano Ramalho
- collections module documentation
- array module documentation