Python Data Model
These notes are based on "Fluent Python". Most of the content is excerpted from the book, with a small part reflecting my own understanding. The notes were converted from a Jupyter notebook to markdown; the original notebook is in this repository.
Getting Started
Python's Design Philosophy
Python has some remarkable design philosophies:
- Guido van Rossum knows how to compromise on theory to design a language that is a pleasure to use, which is truly rare.
- One of Python's best qualities is its consistency. After working with Python for a while, you can make informed, correct guesses about features that are completely new to you.
Magic Methods in Python
Calling process: When the Python interpreter encounters special syntax, it uses special methods to activate some basic object operations. These special methods start and end with two underscores (e.g., __getitem__).
For example, obj[key] uses the __getitem__ method behind the scenes. To get the value of my_collection[key], the interpreter actually calls my_collection.__getitem__(key).
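As a minimal sketch of that dispatch, the hypothetical Squares class below (not from the book) implements only __getitem__. The bracket syntax and the explicit method call give the same result; strictly speaking the interpreter looks the method up on the type, so the explicit call is only a rough equivalent.

```python
class Squares:
    """Hypothetical container for illustration: item n is n squared."""

    def __getitem__(self, key):
        # Invoked by the interpreter when it evaluates squares[key]
        return key * key


squares = Squares()
via_syntax = squares[4]              # subscription syntax
via_method = squares.__getitem__(4)  # roughly what happens behind the scenes
print(via_syntax, via_method)        # 16 16
```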
Magic methods include the following categories:
- Iteration
- Collection classes
- Attribute access
- Operator overloading
- Function and method calls
- Object creation and destruction
- String representation and formatting
- Context management (i.e., with blocks)
Card Deck Class Example
Let's build a simple card deck class to demonstrate Python's data model:
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

Key Points About This Code
Using namedtuple (used to construct objects with few attributes but no methods), we can easily create a card object. Here it represents a card with two attributes: rank and suit.
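A short sketch of how such a card object behaves on its own. The variable name beer_card is only illustrative; the Card definition is repeated so the snippet runs by itself.

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

beer_card = Card('7', 'diamonds')
print(beer_card)        # Card(rank='7', suit='diamonds')
print(beer_card.rank)   # 7

# A namedtuple is still a tuple, so it unpacks and compares like one
rank, suit = beer_card
print(beer_card == ('7', 'diamonds'))  # True
```

The readable repr and the field access by name come for free, which is why a namedtuple suits small records that have no behavior of their own.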
The ranks attribute is built by concatenating two lists with the + operator:
# List concatenation example
[1, 2, 3] + [2, 3, 4]
# Output: [1, 2, 3, 2, 3, 4]

Using the FrenchDeck Class
deck = FrenchDeck()
len(deck)
# Output: 52
Like any standard Python collection type, a deck responds to the len() function, which reports how many cards it holds, because FrenchDeck implements the __len__ method.
The suits attribute is built with str.split(), which splits on whitespace by default:
'spades diamonds clubs hearts'.split()
# Output: ['spades', 'diamonds', 'clubs', 'hearts']

Random Selection
Python's standard library already provides a function for picking a random element from a sequence, random.choice, so we don't need to write a new method:
from random import choice
deck = FrenchDeck()
choice(deck)
# Example output (varies between calls): Card(rank='10', suit='diamonds')

Automatic Slicing Support
Because the __getitem__ method delegates the [] operation to the self._cards list, our deck class automatically supports slicing operations:
# First three cards
deck[:3]
# Output: [Card(rank='2', suit='spades'),
# Card(rank='3', suit='spades'),
# Card(rank='4', suit='spades')]
# All aces
deck[12::13]
# Output: [Card(rank='A', suit='spades'),
# Card(rank='A', suit='diamonds'),
# Card(rank='A', suit='clubs'),
# Card(rank='A', suit='hearts')]

Iteration and Membership Testing
# Reverse iteration
for card in reversed(deck):
    print(card)

Iteration is often implicit. For example, if a collection doesn't implement the __contains__ method, the in operator falls back to a sequential scan. The in operator therefore works with our FrenchDeck class because it is iterable:
Card('Q', 'hearts') in deck
# Output: True

Custom Sorting
# Custom sorting function
suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)

def spades_high(card):
    rank_value = FrenchDeck.ranks.index(card.rank)
    return rank_value * len(suit_values) + suit_values[card.suit]

for card in sorted(deck, key=spades_high):
    print(card)
# Outputs cards sorted by rank, then by suit

How to Use Special Methods
Best Practices
- Don't write my_object.__len__(). Use len(my_object) instead, and Python will call the __len__ method you implemented.
- Built-in types are optimized: for built-in types such as list, str, or bytearray, CPython takes a shortcut. len() returns the value of the ob_size field of PyVarObject (the C struct that represents any variable-sized built-in object in memory). Reading this value directly is much faster than calling a method.
- Indirect usage is preferred: your code usually doesn't need to call special methods directly. Unless you are doing a lot of metaprogramming, you should implement special methods far more often than you call them explicitly. The main exception is __init__, which is commonly called directly to invoke a superclass initializer.
- Use built-in functions: invoking special methods through built-in functions (len, iter, str, and so on) is usually the best choice. These built-ins call the corresponding special method, often provide additional services, and are faster than method calls for built-in types.
- Don't invent special methods: avoid creating arbitrary names of the form __foo__, because even if Python doesn't use such a name today, it may acquire a special meaning in the future.
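One concrete "additional service", shown here as a sketch: in CPython, len() validates what __len__ returns, whereas a direct method call returns whatever the method produces. Playlist and BrokenLength are hypothetical classes written only for this illustration.

```python
class Playlist:
    """Hypothetical collection used only for this illustration."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)


class BrokenLength:
    """Hypothetical class whose __len__ returns an invalid value."""

    def __len__(self):
        return -1


playlist = Playlist(['intro', 'verse', 'outro'])
print(len(playlist))               # 3

# Calling the special method directly skips the sanity check...
print(BrokenLength().__len__())    # -1

# ...while len() rejects a negative length with ValueError
try:
    len(BrokenLength())
except ValueError as exc:
    print('len() refused:', exc)
```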
Simulating a Vector Class
Let's create a more complex example with a Vector class:
from math import hypot

class Vector:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __abs__(self):
        return hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def __add__(self, other):
        x = self.x + other.x
        y = self.y + other.y
        return Vector(x, y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

Important Notes About Vector Class
String Representation:
- The interactive console and debugger use the repr() function to get string representations.
- The difference between __repr__ and __str__: the latter is called when the str() function is used or when an object is printed with print(), and it returns a more user-friendly string.
- If you only want to implement one of these two special methods, __repr__ is the better choice, because if an object doesn't have a __str__ method and Python needs to call it, the interpreter will use __repr__ as a substitute.
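The fallback described in the last point can be checked with two small classes. OnlyRepr and Both are hypothetical names used only for this sketch.

```python
class OnlyRepr:
    """Defines __repr__ only, so str() falls back to it."""

    def __repr__(self):
        return 'OnlyRepr()'


class Both:
    """Defines both methods, so str() and repr() differ."""

    def __repr__(self):
        return 'Both()'

    def __str__(self):
        return 'a friendlier description'


print(repr(OnlyRepr()))  # OnlyRepr()
print(str(OnlyRepr()))   # OnlyRepr()  (no __str__, so __repr__ is used)
print(repr(Both()))      # Both()
print(str(Both()))       # a friendlier description
```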
Arithmetic Operations:
Through __add__ and __mul__, we've added the + and * arithmetic operators to the vector class.
Important principle: The return values of both methods are newly created vector objects. The two operated vectors (self or other) remain unchanged; the code only reads their values. The basic principle of infix operators is not to change the operands but to produce a new value.
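A quick check of this principle, as a sketch. It repeats a trimmed copy of the Vector class so the snippet runs on its own; the variable names are illustrative.

```python
class Vector:
    """Trimmed copy of the Vector class, kept to what this example needs."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)


v1 = Vector(2, 4)
v2 = Vector(2, 1)
total = v1 + v2
scaled = v1 * 3

print(total)    # Vector(4, 5)
print(scaled)   # Vector(6, 12)
print(v1, v2)   # Vector(2, 4) Vector(2, 1)  (operands are untouched)
```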
Boolean Conversion:
By default, instances of user-defined classes are considered true, unless the class implements the __bool__ or __len__ method.
- Behind the scenes, bool(x) calls x.__bool__() and uses the result.
- If the __bool__ method doesn't exist, bool(x) tries to call x.__len__().
- If __len__ returns 0, bool(x) returns False; otherwise, it returns True.
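The lookup order above can be sketched with three hypothetical classes (Plain, Basket, and Switch are illustrative names, not from the book): one with neither method, one with only __len__, and one with both.

```python
class Plain:
    """No __bool__ and no __len__: instances are always truthy."""


class Basket:
    """Only __len__: truthiness follows the number of items."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Switch:
    """Both methods: __bool__ takes precedence over __len__."""

    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on

    def __len__(self):
        return 10  # never consulted by bool() because __bool__ exists


print(bool(Plain()))         # True
print(bool(Basket([])))      # False
print(bool(Basket(['a'])))   # True
print(bool(Switch(False)))   # False
```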
Python's Magic Methods Reference
Python's built-in magic methods can be categorized by functionality:
Basic Methods
__new__, __init__, __del__, __repr__, __str__
Arithmetic Operators
__add__, __sub__, __mul__, __truediv__, __floordiv__, __mod__, __divmod__, __pow__, __lshift__, __rshift__, __and__, __xor__, __or__
Reverse Arithmetic Operators
__radd__, __rsub__, __rmul__, __rtruediv__, __rfloordiv__, __rmod__, __rdivmod__, __rpow__, __rlshift__, __rrshift__, __rand__, __rxor__, __ror__
Augmented Assignment Operators
__iadd__, __isub__, __imul__, __itruediv__, __ifloordiv__, __imod__, __ipow__, __ilshift__, __irshift__, __iand__, __ixor__, __ior__
Unary Operators
__neg__, __pos__, __abs__, __invert__
Attribute Access
__getattr__, __getattribute__, __setattr__, __delattr__, __dir__
Descriptors
__get__, __set__, __delete__
Container Types
__len__, __getitem__, __setitem__, __delitem__, __iter__, __reversed__, __contains__
Context Management
__enter__, __exit__
Object Comparison
__eq__, __ne__, __lt__, __le__, __gt__, __ge__
Type Conversion
__int__, __float__, __bool__, __complex__, __bytes__, __str__
Other Methods
__call__, __hash__, __format__, __sizeof__
Note on Reverse and Augmented Operators
Reverse operators are fallbacks used when the operands are swapped (b * a instead of a * b) and the left operand does not know how to handle the right one.
Augmented assignment operators are shortcuts that combine an infix operator with an assignment (a = a * b becomes a *= b).
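As a sketch of both ideas, the trimmed Vector below adds a __rmul__ method, which is an addition for this example and not part of the class shown earlier. It deliberately defines no __imul__, so the augmented assignment falls back to __mul__ and rebinds the name to a new object.

```python
class Vector:
    """Trimmed Vector with a reverse multiplication fallback (illustrative)."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):
        # Used for scalar * vector after int.__mul__ returns NotImplemented
        return self * scalar


v = Vector(1, 2)
print(v * 3)    # Vector(3, 6)
print(3 * v)    # Vector(3, 6)

before = v
v *= 2          # no __imul__ defined, so this is evaluated as v = v * 2
print(v)        # Vector(2, 4)
print(before)   # Vector(1, 2)  (the original object was not modified)
```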
Why Len Is Not a Regular Method
If x is an instance of a built-in type, len(x) will be very fast. The reason behind this is that CPython reads the object's length directly from a C structure without calling any methods.
Getting the number of elements in a collection is a very common operation, and for types like str, list, and memoryview, this operation must be efficient.
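In other words, len() is a function rather than a regular method so that Python's built-in data structures can take this shortcut; the same applies to abs(). Thanks to the special method __len__, len() still works with your own classes, which balances the efficiency of built-in objects against the consistency of the language.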
This design confirms another saying from the "Zen of Python":
"Special cases aren't special enough to break the rules."
Conclusion
Python's data model provides a consistent and powerful way to make your custom objects behave like built-in types. By implementing special methods (magic methods), you can:
- Make your objects work with built-in functions like
len(),abs(),str() - Support standard operators like
+,*,[],in - Enable iteration, context management, and more
The key is understanding when and how to implement these special methods to create intuitive, Pythonic APIs for your classes.
Further Reading
- Python Data Model Documentation
- "Fluent Python" by Luciano Ramalho
- Python's Magic Methods Guide