Serdar Yegulalp
Senior Writer

How to unleash the power of Python sets

feature
Jan 20, 20217 mins

Sets in Python organize collections of unique objects. Learn how to take advantage of this powerful feature in your own code.

two dogs in file cabinet crm erp comparison similar pairs
Credit: Thinkstock

Of the major data types built into Python, the set is one of the least discussed, but also one of the most powerful. A Python set lets you create collections of objects where each object is unique to the collection, and it works with the speed and efficiency of Pythonโ€™s dictionaries.

However, because Pythonโ€™s sets are not as widely discussed as its lists or dictionaries,ย itโ€™s easy to miss out on how sets can make your Python apps smarter and more elegant. Letโ€™s fix that!

Python set basics

Sets are defined with a syntax that is reminiscent of Pythonโ€™s dictionary type:

my_set = {1,2,3,4}

The fact that this looks a little like a dictionary is no accident. You can think of a set as a dictionary that stores only keys, no values. In fact, many of the mechanisms under Pythonโ€™s hood for sets are built with the same code as for dictionaries.

You can also create a set with the set() built-in, which takes any iterable:

my_set = set([1,2,3,4])

Set members can contain any hashable type โ€” basically, any object in Python that can be guaranteed not to change over its lifetime. Numbers and strings are all OK, as are instances of user-defined classes. (Even if their properties change over time, the instances themselves donโ€™t change.) Again, this is exactly the same as how the keys work in Pythonโ€™s dictionaries.

If you try to define a set with redundant members, the redundancies will be removed automatically, with previously defined members taking priority. For instance, if we defined my_set as {1,2,3,2,4,5}, the result would be {1,2,3,4,5}.

Uses for Python sets

One powerful and common use for sets is deduplicating the members of a collection or the output generated by an iterable. For instance, if you have a list, you can quickly deduplicate the list by making a set from its contents:

list_1 = [1,2,3,4,3,4,2,4,5,3]
set_1 = set(list_1)
# yields {1,2,3,4,5}

(Note that the original list is preserved.)

This is far faster than iterating through the list and testing for duplicates manually. You can also do this for any iterable, not just a list, although lists are a common source. If you do this with a string, for instance, youโ€™ll get a set that contains all the unique characters in the string:

s1="Hello there"
set(s1)
# yieldsย {' ', 'r', 'l', 't', 'e', 'h', 'o', 'H'}

Note that this technique will work only if the objects in the list are all hashable. Youโ€™ll get a TypeError if you try to add an unhashable object. Also, there is no parameter you can pass that will ignore unhashable objects, so if youโ€™re in doubt about whatโ€™s hashable or not, youโ€™ll have to iterate through the collection and .add() each element manually, testing as you go.

Another common use for sets is to quickly test for the presence of a small collection of objects within a larger collection, or vice versa, by way of the superset/subset methods described below. Note that this works best when the larger of the two collections is something you can convert to a set once and then test against many times, because the overhead of converting a list to a set (especially a long list) might outstrip the performance gains from using sets in the first place. But on the whole, set membership testing is generally faster than iterating through objects and testing membership manually.

Adding and removing members of Python sets

If you want to add and remove members from sets, use the .add() and .remove() methods. For example, my_set.add(5) would update my_set to include 5, and my_set.remove(5) would remove 5 if it were present.

If you try to .remove() something from a set that isnโ€™t there, youโ€™ll get a KeyError โ€” same as if you try to reference a key in a dictionary that doesnโ€™t exist. To remove something without the risk of raising an error if it isnโ€™t there, use .discard() instead of remove().

To drop all elements from a set, you can use .clear(), or reassign the variable to an empty set:

my_set = set()

Unions and intersections with Python sets

Sets support a number of operations where you take two or more sets and generate new ones from them. A union of two sets combines the two into a single set, removing any duplicates:

set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.union(set_2)
# yields {1,2,3,4,5,6}

You can also use the pipe operator to perform a union:

set_3 = set_1 | set_2

Again, this is a handy way to perform deduplication across multiple collections of items.

An intersection generates a new set from only the elements common to multiple sets:

set_1 = {1,2,3}
set_2 = {2,3,4}
set_3 = set_1.intersection(set_2)
# yields {2,3}

The & operator can also be used to combine two sets (union):

set_3 = set_1 & set_2

Many set operations can be expressed with operators, which weโ€™ll illustrate below.

<h2 class=โ€tocโ€>Differences with Python sets</h2>

if you want to find out which members two sets donโ€™t have in common, you can use the <code>difference() method:

set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.difference(set_2)
# yields {1,2,3}
set_3 = set_1 - set_2
# different way to express same operation

One way to express this in English might be, โ€œCreate a new set that has everything in set 1 that isnโ€™t in set 2.โ€

By contrast, if we used set_3 = set_2.difference(set_1), the results would be {4,5,6}.

Python sets also support symmetric difference operations. The symmetric difference returns elements that are in one set or the other, but not both.

set_1 = {1,2,3,4}
set_2 = {4,5,6,7}
set_3 = set_1.symmetric_difference(set_2)
# yields {1, 2, 3, 5, 6, 7}
set_3 = set_1 ^ set_2
# operator version

Supersets and subsets in Python

Youโ€™re probably familiar by now with Pythonโ€™s in operator, which you can use to search for the presence of a character in a string or an object in a list. Sets support in as well:

set_1 = {1,2,3,4}
1 in set_1 # this is True
5 in set_1 # this is False

What if you wanted to test for the presence of all the elements of one set inside another set? You canโ€™t use in for that โ€” Python will think youโ€™re testing for the presence of the entire set object, not its individual elements. Fortunately, Python does provide ways to check such things with other set methods:

set_1 = {1,2,3,4}
set_2 = {1,2}

# Tests if members of set_2 are in set_1:
set_2.issubset(set_1)
# Operator version:
set_2 <= set_1

# Tests if set_1 contains all members of set_2:
set_1.issuperset(set_2)
# Operator version:
set_1 >= set_2

Set updates in Python

Up until now weโ€™ve only explored how to generate new sets from intersections or differences of existing sets. Python also lets you update a set in-place with intersections or differences:

# In-place update of set_1 with set_2:
set_1 |= set_2

# In-place intersection of set_1 with set_2;
set_1 &= set_2

# In-place difference of set_1 with set_2:
set_1 -= set_2

# In-place symmetric difference of set_1 with set_2:
set_1 ^= set_2

In-place updates are handy when youโ€™re dealing with a very large set, and you donโ€™t want to create an entirely new instance of the set (with all the overhead that goes with such an operation). Instead, you can make the changes directly to the existing set, which is more efficient.

Frozen sets in Python

I mentioned before how sets can only be made of things that are hashable. Since sets are mutable, they canโ€™t themselves be used as set elements or dictionary keys. But there is a variety of set called the frozen set that isnโ€™t mutable, and so can be used as a set element, as a dictionary key, or in any other context where you need a hashable type.

To create a frozen set, just use frozenset() to generate one from an existing set or iterable:

set_1 = {1,2,3,4}
f_set = frozenset(set_1)
set_2 = {f_set,2,3,4}

Note that once you create a frozen set, it canโ€™t be altered. The .add() and .remove() methods wonโ€™t work on a frozen set. You can use a frozen set to generate set intersections or differences, as long as you donโ€™t try to store the results of such operations in-place.

Serdar Yegulalp

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.

More from this author