Serdar Yegulalp
Senior Writer

How to use Python dataclasses

Python dataclasses can make your Python classes less verbose and more powerful at the same time. Here's an introduction to using dataclasses in your Python programs.


Everything in Python is an object, or so the saying goes. If you want to create your own custom objects, with their own properties and methods, you use Python's class object to make that happen. But creating classes in Python sometimes means writing loads of repetitive, boilerplate code to set up the class instance from the parameters passed to it or to create common functions like comparison operators.

Dataclasses, introduced in Python 3.7 (and available for Python 3.6 as a backport), provide a handy, less verbose way to create classes. Many of the common things you do in a class, like instantiating properties from the arguments passed to the class, can be reduced to a few basic instructions.

Python dataclass example

Here is a simple example of a conventional class in Python:


class Book:
    '''Object for tracking physical books in a collection.'''
    def __init__(self, name: str, weight: float, shelf_id:int = 0):
        self.name = name
        self.weight = weight # in grams, for calculating shipping
        self.shelf_id = shelf_id
    def __repr__(self):
        # Adjacent f-strings are joined, so the string can span two lines.
        return (f"Book(name={self.name!r}, "
                f"weight={self.weight!r}, shelf_id={self.shelf_id!r})")

The biggest headache here is the way each of the arguments passed to __init__ has to be copied to the object's properties. This isn't so bad if you're only dealing with Book, but what if you have to deal with Bookshelf, Library, Warehouse, and so on? Plus, the more code you have to type by hand, the greater the chances you'll make a mistake.

Here is the same Python class, implemented as a Python dataclass:


from dataclasses import dataclass

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float 
    shelf_id: int = 0

When you specify properties, called fields, in a dataclass, the @dataclass decorator automatically generates all of the code needed to initialize them. It also preserves the type information for each property, so if you use a static type checker like mypy, it can verify that you're supplying the right kinds of values to the class constructor. The annotations are not enforced at runtime.

Another thing @dataclass does behind the scenes is automatically create code for a number of common dunder methods in the class. In the conventional class above, we had to create our own __repr__. In the dataclass, the @dataclass decorator generates the __repr__ for you.
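
As a quick illustration of those generated methods, here is a minimal sketch that reuses the Book dataclass from above. The sample title and weight are made up for the example:

```python
from dataclasses import dataclass

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float
    shelf_id: int = 0

# The generated __init__ accepts positional or keyword arguments.
first = Book("Dune", 412.0)
second = Book(name="Dune", weight=412.0, shelf_id=0)

print(first)            # Book(name='Dune', weight=412.0, shelf_id=0)
print(first == second)  # True: the generated __eq__ compares field values
```

No __init__, __repr__, or __eq__ was written by hand here; all three come from the decorator.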

Once a dataclass is created, it is functionally identical to a regular class, so there is no performance penalty for using its instances. The only cost is a small, one-time overhead when the decorator processes the class definition.

Advanced Python dataclass initialization

The dataclass decorator can take initialization options of its own. Most of the time you won't need to supply them, but they can come in handy for certain edge cases. Here are some of the most useful ones (they're all True/False):

  • frozen: Generates class instances that are read-only. Once data has been assigned, it can't be modified.
  • slots: Allows instances of dataclasses to use less memory by only allowing fields explicitly defined in the class. Requires Python 3.10 or later.
  • kw_only: When set, all fields for the class are keyword-only. Requires Python 3.10 or later.

Customize Python dataclass fields with the field function

The default way dataclasses work should be okay for the majority of use cases. Sometimes, though, you need to fine-tune how the fields in your dataclass are initialized. As shown below, you can use the field function for fine-tuning:


from dataclasses import dataclass, field
from typing import List

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str     
    condition: str = field(compare=False)    
    weight: float = field(default=0.0, repr=False)
    shelf_id: int = 0
    chapters: List[str] = field(default_factory=list)

When you assign a call to field as a field's default value, the arguments you pass to field change how that field is set up. These are the most commonly used options for field (there are others):

  • default: Sets the default value for the field. You need to use default if you a) use field to change any other parameters for the field, and b) want to set a default value on the field on top of that. In this case, we use default to set weight to 0.0.
  • default_factory: Names a callable that takes no arguments and returns the object to use as the field's default value. It is called each time an instance is created without that field supplied. In this case, we want chapters to be an empty list.
  • repr: Controls whether the field shows up in the automatically generated __repr__ for the dataclass (the default is True). In this case we don't want the book's weight shown in the __repr__, so we use repr=False to omit it.
  • compare: Controls whether the field is included in the comparison methods automatically generated for the dataclass (the default is True). Here, we don't want condition to be used as part of the comparison for two books, so we set compare=False.

Note that we have had to adjust the order of the fields so that the non-default fields come first.

Controlling Python dataclass initialization

At this point you're probably wondering: If the __init__ method of a dataclass is generated automatically, how do I get control over the init process to make more fine-grained changes?

__post_init__

Enter the __post_init__ method. If you include a __post_init__ method in your dataclass definition, the generated __init__ calls it after the fields have been assigned, so you can use it to modify fields or other instance data:


from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str    
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)
    chapters: List[str] = field(default_factory=list)
    condition: str = field(default="Good", compare=False)

    def __post_init__(self):
        if self.condition == "Discarded":
            self.shelf_id = None
        else:
            self.shelf_id = 0

In this example, we have created a __post_init__ method to set shelf_id to None if the book's condition is initialized as "Discarded". Note how we use field to initialize shelf_id, and pass init as False to field. This means shelf_id won't be a parameter of the generated __init__; it is assigned in __post_init__ instead.
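
A brief usage sketch of that class follows; the book title is made up for the example:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)
    chapters: List[str] = field(default_factory=list)
    condition: str = field(default="Good", compare=False)

    def __post_init__(self):
        if self.condition == "Discarded":
            self.shelf_id = None
        else:
            self.shelf_id = 0

kept = Book("Dune")
discarded = Book("Dune", condition="Discarded")
print(kept.shelf_id)       # 0
print(discarded.shelf_id)  # None

try:
    Book("Dune", shelf_id=3)   # shelf_id is not an __init__ parameter
except TypeError as err:
    print(err)
```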

InitVar

Another way to customize Python dataclass setup is to use the InitVar type. This lets you specify a field that will be passed to __init__ and then to __post_init__, but won't be stored in the class instance.

By using InitVar, you can take in parameters when setting up the dataclass that are only used during initialization. Here's an example:


from dataclasses import dataclass, field, InitVar
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: InitVar[str] = "Good"
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)  # may be set to None below
    chapters: List[str] = field(default_factory=list)

    def __post_init__(self, condition):
        if condition == "Unacceptable":
            self.shelf_id = None
        else:
            self.shelf_id = 0

Setting a field's type to InitVar (with its subtype being the actual field type) signals to @dataclass to not make that field into a dataclass field, but to pass the data along to __post_init__ as an argument.

In this version of our Book class, we're not storing condition as a field in the class instance. We're only using condition during the initialization phase. If we find that condition was set to "Unacceptable", we set shelf_id to None, but we don't store condition itself in the class instance.
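
One way to confirm that condition is not kept on the instance is to inspect the dataclass fields and the instance's own data. This is a minimal sketch; the fields helper comes from the dataclasses module, and the book title is invented:

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: InitVar[str] = "Good"
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)
    chapters: List[str] = field(default_factory=list)

    def __post_init__(self, condition):
        if condition == "Unacceptable":
            self.shelf_id = None
        else:
            self.shelf_id = 0

book = Book("Dune", condition="Unacceptable")
field_names = [f.name for f in fields(book)]

print(book.shelf_id)   # None
print(field_names)     # ['name', 'weight', 'shelf_id', 'chapters']
print("condition" in vars(book))  # False: nothing was stored on the instance
```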

When to use Python dataclasses, and when not to use them

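One common scenario for using dataclasses is as a replacement for the namedtuple. Dataclasses offer most of the same behaviors and more, and they can be made immutable (as namedtuples are) by simply using @dataclass(frozen=True) as the decorator. One difference to keep in mind is that a dataclass instance is not a tuple, so it can't be indexed or unpacked directly.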
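
Here is a small side-by-side sketch of the two approaches. The names BookTuple and BookRecord are hypothetical, chosen only for this comparison:

```python
from collections import namedtuple
from dataclasses import astuple, dataclass

BookTuple = namedtuple("BookTuple", ["name", "weight"])

@dataclass(frozen=True)
class BookRecord:
    name: str
    weight: float

as_tuple = BookTuple("Dune", 412.0)
as_dataclass = BookRecord("Dune", 412.0)

# Both reject attribute assignment after creation.
for record in (as_tuple, as_dataclass):
    try:
        record.weight = 500.0
    except AttributeError as err:  # FrozenInstanceError subclasses AttributeError
        print(type(err).__name__)

# A frozen dataclass (with the default eq=True) is hashable, like a tuple.
shelf = {as_dataclass: "A3"}

# A dataclass is not a sequence, so convert explicitly before unpacking.
name, weight = astuple(as_dataclass)
```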

Another possible use case is replacing nested dictionaries, which can be clumsy to work with, with nested instances of dataclasses. If you have a dataclass Library, with a list property of shelves, you could use a dataclass ReadingRoom to populate that list, then add methods to make it easy to access nested items (e.g., a book on a shelf in a particular room).
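
A rough sketch of that idea might look like the following. The exact layout (rooms containing shelves containing books), the Shelf class, and the find_book method are assumptions made for illustration, not a prescribed design:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Book:
    name: str

@dataclass
class Shelf:
    shelf_id: int
    books: List[Book] = field(default_factory=list)

@dataclass
class ReadingRoom:
    name: str
    shelves: List[Shelf] = field(default_factory=list)

@dataclass
class Library:
    rooms: List[ReadingRoom] = field(default_factory=list)

    def find_book(self, room_name: str, shelf_id: int, title: str) -> Optional[Book]:
        '''Return the matching book, or None if it is not found.'''
        for room in self.rooms:
            if room.name != room_name:
                continue
            for shelf in room.shelves:
                if shelf.shelf_id != shelf_id:
                    continue
                for book in shelf.books:
                    if book.name == title:
                        return book
        return None

library = Library(rooms=[
    ReadingRoom("East", shelves=[Shelf(1, books=[Book("Dune")])]),
])
found = library.find_book("East", 1, "Dune")
missing = library.find_book("East", 2, "Dune")
```

Compared with nested dictionaries, every level here has named, typed attributes, and the lookup logic lives in one method instead of being repeated at each call site.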

But not every Python class needs to be a dataclass. If you're creating a class mainly as a way to group together a bunch of static methods, rather than as a container for data, you don't need to make it a dataclass. For instance, a common pattern with parsers is to have a class that takes in an abstract syntax tree, walks the tree, and dispatches calls to different methods in the class based on the node type. Because the parser class has very little data of its own, a dataclass isn't useful here.
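
To make that pattern concrete, here is a minimal sketch of such a class using the standard library's ast module. The NameCollector class and its method names are hypothetical; the point is only that the class is mostly behavior, so a plain class fits better than a dataclass:

```python
import ast

class NameCollector:
    '''Plain class: walks a syntax tree and dispatches on node type.'''

    def __init__(self):
        self.names = []

    def visit(self, node):
        # Look up a handler named after the node's type, e.g. visit_Name.
        handler = getattr(self, f"visit_{type(node).__name__}", self.generic_visit)
        handler(node)

    def generic_visit(self, node):
        # No specific handler: keep walking into the child nodes.
        for child in ast.iter_child_nodes(node):
            self.visit(child)

    def visit_Name(self, node):
        self.names.append(node.id)

tree = ast.parse("total = price * quantity")
collector = NameCollector()
collector.visit(tree)
print(collector.names)  # ['total', 'price', 'quantity']
```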


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
