Data Classes in Python 3.7
Brian Stempin | Yiu Ming Huynh
Goals
@dataclass
class MyExampleClass(object):
    x: int
    y: int = 20
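As a rough illustration of what the decorator provides, the sketch below builds the same class and exercises the generated `__init__`, `__repr__`, and `__eq__`. The variable names `a` and `b` are only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MyExampleClass:
    x: int
    y: int = 20  # default value, so y is optional at construction


a = MyExampleClass(1)          # generated __init__ fills in y=20
b = MyExampleClass(x=1, y=20)  # keyword arguments work too

print(a)       # generated __repr__: MyExampleClass(x=1, y=20)
print(a == b)  # generated __eq__ compares field by field: True
```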
Dataclass Features
@dataclass
class CartesianPoints:
    x: float
    y: float
@dataclass
class PolarPoints:
    r: float
    theta: float
c = CartesianPoints(1, 2)
p = PolarPoints(1, 2)
>>> print(c == p)
False
Each dataclass is its own named type,
whereas tuples are always just tuples
c = (1, 2)
p = (1, 2)
>>> print(c == p)
True
Namedtuples kinda solve the problem,
but then you run into this:
>>> s = (1, 2, 3)
>>> s[0] = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
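A small sketch of the namedtuple middle ground, assuming two hypothetical types `CartesianPoint` and `PolarPoint`. Fields get names, but instances still compare like plain tuples and cannot be mutated in place.

```python
from collections import namedtuple

CartesianPoint = namedtuple("CartesianPoint", ["x", "y"])
PolarPoint = namedtuple("PolarPoint", ["r", "theta"])

c = CartesianPoint(1, 2)
p = PolarPoint(1, 2)

print(c.x, p.theta)  # fields are readable by name
print(c == p)        # still True: namedtuples compare like plain tuples

try:
    c.x = 5          # namedtuples are immutable
except AttributeError as exc:
    print("cannot assign:", exc)

moved = c._replace(x=5)  # builds a new tuple instead of mutating
```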
... but dataclasses have options
from dataclasses import dataclass
from typing import List

@dataclass
class MutatingMing:
    super_powers: List[str]
@dataclass(frozen=True)
class ForeverMing:
    super_powers: List[str]
m1 = MutatingMing(super_powers=["shapeshifting master"])
m1.super_powers = ["levitation"]
m2 = ForeverMing(super_powers=["stops time"])
m2.super_powers = ["super human strength"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'super_powers'
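One caveat worth sketching: `frozen=True` blocks rebinding a field, but a mutable value stored in that field can still change. The snippet below reuses `ForeverMing` and shows `dataclasses.replace` as one way to get a modified copy. The name `m3` is only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import List


@dataclass(frozen=True)
class ForeverMing:
    super_powers: List[str]


m2 = ForeverMing(super_powers=["stops time"])

try:
    m2.super_powers = ["super human strength"]
except FrozenInstanceError:
    print("rebinding the field is blocked")

# frozen=True only blocks attribute assignment; the list itself is still mutable.
m2.super_powers.append("levitation")

# dataclasses.replace builds a modified copy instead of mutating the original.
m3 = replace(m2, super_powers=["super human strength"])
```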
Dataclasses can inherit from other classes...
@dataclass
class Product:
    name: str
@dataclass
class DigitalProduct(Product):
    download_link: URL
@dataclass
class Ebook(DigitalProduct):
    isbn: str
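A runnable sketch of the same hierarchy, assuming a plain `str` stands in for the `URL` type, which the slides do not define. The field values are placeholders. It shows that inherited fields come first, in base-class order.

```python
from dataclasses import dataclass, fields


@dataclass
class Product:
    name: str


@dataclass
class DigitalProduct(Product):
    download_link: str  # assumption: str stands in for the undefined URL type


@dataclass
class Ebook(DigitalProduct):
    isbn: str


# Placeholder values, for illustration only.
book = Ebook(name="some title", download_link="https://example.com/book", isbn="n/a")

print([f.name for f in fields(Ebook)])  # ['name', 'download_link', 'isbn']
print(isinstance(book, Product))        # True
```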
But try doing that with a tuple
@dataclass
class CartesianPoint:
    x: float
    y: float
c1 = (1, 2)
Vs
@dataclass
class ARandomBundleOfAttributes:
    opener: str
    random_number: int
    random_bool: bool
    closing_statement: str
● [spoiler text] (not really) Tuples have better performance... Coming up soon
● Tuples are naturally immutable, so they make a good data structure for
multithreading
Pros of Dataclasses vs Dict
Dataclasses have well-structured, explicitly specified attributes
@dataclass
class TemperaturePoint:
    x: float
    y: float
    temperature: float
temperature_points = [
    {"x": 1.2, "y": 4.5, "temperature": 20.0},
    {"x": 5.4, "temperature": 24.0}]
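To make the contrast concrete, here is a minimal sketch that feeds the same two dicts into `TemperaturePoint`. The list of dicts accepts the entry with no "y" silently, while the dataclass constructor rejects it. The names `raw_points`, `good`, and `bad` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TemperaturePoint:
    x: float
    y: float
    temperature: float


raw_points = [
    {"x": 1.2, "y": 4.5, "temperature": 20.0},
    {"x": 5.4, "temperature": 24.0},  # "y" is missing
]

good, bad = [], []
for raw in raw_points:
    try:
        good.append(TemperaturePoint(**raw))
    except TypeError as exc:  # missing required argument 'y'
        bad.append((raw, str(exc)))

print(len(good), "valid,", len(bad), "rejected")
```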
species = {
    "name": "mountain toucan"
}
pet = {
    "species": species,
    "name": "billy"
}
phones_to_addresses = {
    "+13125004000": {"name": "Billy the Toucan"},
    "+13125004001": {"name": "Polly the Parrot"},
    ...
}
Try doing this with a dataclass
@dataclass
class PhoneNumberToAddress:
    # you can't even have a string that starts with a symbol or
    # number as an attribute
    pass
@dataclass
class PhoneEntry:
    number: str
    business_name: str
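One way the two can be combined, sketched under the assumption that the dict keeps handling arbitrary keys while `PhoneEntry` describes each value. The names `raw` and `directory` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PhoneEntry:
    number: str
    business_name: str


raw = {
    "+13125004000": {"name": "Billy the Toucan"},
    "+13125004001": {"name": "Polly the Parrot"},
}

# The dict handles arbitrary keys; the dataclass gives each value a fixed shape.
directory: Dict[str, PhoneEntry] = {
    number: PhoneEntry(number=number, business_name=info["name"])
    for number, info in raw.items()
}

print(directory["+13125004000"].business_name)  # Billy the Toucan
```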
Dataclasses come with the standard library; you have to install attrs as a library.
# requirements.txt
attrs==17.10.0
Cons of Dataclasses vs attrs
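import attr
import phonenumbers
from attr.validators import instance_of
from phonenumbers import PhoneNumber

@attr.s(slots=True)
class YellowPageEntry:
    # converter= replaces the deprecated convert= argument
    phone_number: PhoneNumber = attr.ib(converter=phonenumbers.parse)
    business_name: str = attr.ib(validator=instance_of(str))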
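For comparison, a rough sketch of what the same idea might look like with a plain dataclass, where conversion and validation have to be written by hand in `__post_init__`. The `normalize_phone` helper is a hypothetical stand-in for a real parser such as `phonenumbers.parse`, used here only to keep the example dependency-free.

```python
from dataclasses import dataclass


def normalize_phone(raw: str) -> str:
    """Hypothetical stand-in for a real parser such as phonenumbers.parse."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return "+" + digits


@dataclass
class YellowPageEntry:
    phone_number: str
    business_name: str

    def __post_init__(self):
        # No built-in converters or validators: do both manually.
        self.phone_number = normalize_phone(self.phone_number)
        if not isinstance(self.business_name, str):
            raise TypeError("business_name must be a str")


entry = YellowPageEntry("+1 (312) 500-4000", "Billy the Toucan")
print(entry.phone_number)  # +13125004000
```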
● How much of the dataclasses/attrs slowdown has to do with the type
checking and validation?
● How much of the dataclasses/attrs slowdown has to do with how the data is
being stored?
Benchmarking Process
● ASV (Airspeed Velocity) was a lifesaver; we used it to measure CPU time
and memory usage
● Every benchmark starts with an attribute count ("ac" for the rest of this
presentation)
● Lists of N random names, types, and values that fit those types are generated
and stored. E.g.: `[['a', 'b', 'c'], [int, str, int], [4, '3vdna9s', 9482]]`
● We test creation time by instantiating the data container under test 10,000
times with the previously mentioned random data
● ASV does this several times to generate averages
● Where applicable, we test how much an instantiation plus mutation
costs
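The process above can be approximated with the standard library alone. This is only a sketch, not the ASV setup used for the talk: `timeit` stands in for ASV, `make_dataclass` builds the container under test, and the helpers `random_spec` and `time_creation` are hypothetical. Attribute names are generated as `attr_0`, `attr_1`, ... rather than randomly, to guarantee valid identifiers.

```python
import random
import string
import timeit
from dataclasses import make_dataclass


def random_spec(ac, seed=0):
    """Return [names, types, values] for `ac` attributes."""
    rng = random.Random(seed)
    names = [f"attr_{i}" for i in range(ac)]
    types = [rng.choice([int, str]) for _ in range(ac)]
    values = [
        rng.randint(0, 10_000) if t is int
        else "".join(rng.choices(string.ascii_lowercase + string.digits, k=7))
        for t in types
    ]
    return [names, types, values]


def time_creation(ac, repeats=10_000):
    """Time `repeats` instantiations of a dataclass with `ac` attributes."""
    names, types, values = random_spec(ac)
    container = make_dataclass("Container", list(zip(names, types)))
    kwargs = dict(zip(names, values))
    return timeit.timeit(lambda: container(**kwargs), number=repeats)


elapsed = time_creation(ac=5)
print(f"{elapsed:.4f}s for 10,000 instantiations")
```

Swapping `make_dataclass` for a dict, tuple, or attrs class in `time_creation` would give the other containers the same treatment.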
Performance Tidbits: dataclasses