Sets in Python organize collections of unique objects. Learn how to take advantage of this powerful feature in your own code.
Of the major data types built into Python, the set is one of the least discussed, but also one of the most powerful. A Python set lets you create collections of objects where each object is unique to the collection, and it works with the speed and efficiency of Python's dictionaries.
However, because Python's sets are not as widely discussed as its lists or dictionaries, it's easy to miss out on how sets can make your Python apps smarter and more elegant. Let's fix that!
Python set basics
Sets are defined with a syntax that is reminiscent of Python's dictionary type:
my_set = {1,2,3,4}
The fact that this looks a little like a dictionary is no accident. You can think of a set as a dictionary that stores only keys, no values. In fact, many of the mechanisms under Python's hood for sets are built with the same code as for dictionaries.
You can also create a set with the set() built-in, which takes any iterable:
my_set = set([1,2,3,4])
Set members can be of any hashable type: basically, any object in Python whose hash value is guaranteed not to change over its lifetime. Numbers and strings are all OK, as are instances of user-defined classes. (By default, instances are hashed by identity, so even if their properties change over time, the instances themselves count as the same member.) Again, this is exactly the same as how the keys work in Python's dictionaries.
If you try to define a set with redundant members, the redundancies will be removed automatically, with previously defined members taking priority. For instance, if we defined my_set as {1,2,3,2,4,5}, the result would be {1,2,3,4,5}.
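As a quick illustration of the rules above, here is a minimal sketch (the variable names are made up for this example) that builds a set from hashable objects, shows duplicates collapsing, and shows a list being rejected as a member:

```python
# Numbers, strings, and tuples of hashable objects are all valid members.
mixed = {1, "two", (3, 4)}

# Redundant members collapse into a single entry.
deduped = {1, 2, 3, 2, 4, 5}
print(deduped == {1, 2, 3, 4, 5})  # True

# A list is mutable and unhashable, so it can't be a set member.
rejected = False
try:
    bad = {1, [2, 3]}
except TypeError as err:
    rejected = True
    print("Rejected:", err)
```

The same TypeError is what you would see when using a list as a dictionary key.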
Uses for Python sets
One powerful and common use for sets is deduplicating the members of a collection or the output generated by an iterable. For instance, if you have a list, you can quickly deduplicate the list by making a set from its contents:
list_1 = [1,2,3,4,3,4,2,4,5,3]
set_1 = set(list_1)
# yields {1,2,3,4,5}
(Note that the original list is preserved.)
This is far faster than iterating through the list and testing for duplicates manually. You can also do this for any iterable, not just a list, although lists are a common source. If you do this with a string, for instance, you'll get a set that contains all the unique characters in the string:
s1="Hello there"
set(s1)
# yields {' ', 'r', 'l', 't', 'e', 'h', 'o', 'H'} (the order may vary)
Note that this technique will work only if the objects in the list are all hashable. You'll get a TypeError if you try to add an unhashable object. Also, there is no parameter you can pass that will ignore unhashable objects, so if you're in doubt about what's hashable or not, you'll have to iterate through the collection and .add() each element manually, testing as you go.
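The manual approach just described might look something like the following. This is only a sketch: the helper name hashable_only and the sample data are hypothetical, and you may prefer to log or raise rather than collect the skipped items.

```python
def hashable_only(items):
    """Return a set of the hashable items, plus a list of the ones skipped."""
    result = set()
    skipped = []
    for item in items:
        try:
            result.add(item)
        except TypeError:
            # Unhashable objects (lists, dicts, other sets) end up here.
            skipped.append(item)
    return result, skipped


unique, skipped = hashable_only([1, "a", [2, 3], 1, {"k": "v"}, "a"])
print(unique)   # the set containing 1 and 'a'
print(skipped)  # [[2, 3], {'k': 'v'}]
```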
Another common use for sets is to quickly test for the presence of a small collection of objects within a larger collection, or vice versa, by way of the superset/subset methods described below. Note that this works best when the larger of the two collections is something you can convert to a set once and then test against many times, because the overhead of converting a list to a set (especially a long list) might outstrip the performance gains from using sets in the first place. But on the whole, set membership testing is generally faster than iterating through objects and testing membership manually.
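To make the "convert once, test many times" idea concrete, here is a small sketch. The names allowed_ids and incoming, and the values in them, are invented for illustration:

```python
# Hypothetical data: a long-lived list of known IDs, and a batch to check.
allowed_ids = [101, 205, 309, 412]
incoming = [205, 999, 101, 500]

# Pay the conversion cost once...
allowed = set(allowed_ids)

# ...then reuse the set for every membership test.
results = [(item, item in allowed) for item in incoming]
print(results)
# [(205, True), (999, False), (101, True), (500, False)]
```

If you only need a single lookup, testing against the list directly may be just as quick, since building the set also requires a full pass over the list.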
Adding and removing members of Python sets
If you want to add and remove members from sets, use the .add() and .remove() methods. For example, my_set.add(5) would update my_set to include 5, and my_set.remove(5) would remove 5 if it were present.
If you try to .remove() something from a set that isn't there, you'll get a KeyError, the same as if you try to reference a key in a dictionary that doesn't exist. To remove something without the risk of raising an error if it isn't there, use .discard() instead of .remove().
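Here is a short sketch of how the three methods behave side by side. The flag got_key_error exists only so the example can record what happened:

```python
my_set = {1, 2, 3}

my_set.add(5)      # my_set now includes 5
my_set.remove(5)   # 5 is present, so this succeeds

# Removing a missing member with .remove() raises KeyError.
got_key_error = False
try:
    my_set.remove(42)
except KeyError:
    got_key_error = True

# .discard() silently does nothing if the member is missing.
my_set.discard(42)

print(my_set == {1, 2, 3})  # True
print(got_key_error)        # True
```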
To drop all elements from a set, you can use .clear(), or reassign the variable to an empty set:
my_set = set()
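The two approaches are not quite interchangeable when more than one name refers to the same set. The sketch below (variable names are illustrative only) shows that .clear() empties the shared object, while reassignment only rebinds one name:

```python
# .clear() empties the one set object that both names point to.
shared = {1, 2, 3}
alias = shared
shared.clear()
print(alias)  # set()

# Reassignment creates a new empty set; the old object is untouched.
shared_2 = {1, 2, 3}
alias_2 = shared_2
shared_2 = set()
print(alias_2 == {1, 2, 3})  # True
```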
Unions and intersections with Python sets
Sets support a number of operations where you take two or more sets and generate new ones from them. A union of two sets combines the two into a single set, removing any duplicates:
set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.union(set_2)
# yields {1,2,3,4,5,6}
You can also use the pipe operator to perform a union:
set_3 = set_1 | set_2
Again, this is a handy way to perform deduplication across multiple collections of items.
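For deduplicating across several collections at once, one possible approach is to pass them all to a single .union() call. In this sketch the team names are invented; note that the method accepts any iterable, while the | operator requires both operands to be sets:

```python
team_a = ["ana", "bo", "cy"]   # a list
team_b = ("bo", "dee")         # a tuple
team_c = {"cy", "eli"}         # a set

# .union() accepts any number of iterables of any kind.
everyone = set().union(team_a, team_b, team_c)
print(everyone == {"ana", "bo", "cy", "dee", "eli"})  # True

# The | operator is stricter: a set and a tuple can't be combined.
operator_rejected_tuple = False
try:
    set(team_a) | team_b
except TypeError:
    operator_rejected_tuple = True
print(operator_rejected_tuple)  # True
```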
An intersection generates a new set from only the elements common to multiple sets:
set_1 = {1,2,3}
set_2 = {2,3,4}
set_3 = set_1.intersection(set_2)
# yields {2,3}
The & operator can also be used to perform an intersection:
set_3 = set_1 & set_2
Many set operations can be expressed with operators, which we'll illustrate below.
If you want to find out which members of one set are not in another, you can use the difference() method:
set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.difference(set_2)
# yields {1,2,3}
set_3 = set_1 - set_2
# different way to express same operation
One way to express this in English might be, "Create a new set that has everything in set 1 that isn't in set 2."
By contrast, if we used set_3 = set_2.difference(set_1), the results would be {4,5,6}.
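The direction of the operation is easier to see when the two sets overlap. A minimal sketch, using values chosen only for illustration:

```python
set_1 = {1, 2, 3, 4}
set_2 = {3, 4, 5, 6}

# Everything in set_1 that isn't in set_2.
only_in_1 = set_1 - set_2
print(only_in_1 == {1, 2})  # True

# Swapping the operands gives a different answer.
only_in_2 = set_2 - set_1
print(only_in_2 == {5, 6})  # True

# The method form gives the same result as the operator.
print(set_1.difference(set_2) == only_in_1)  # True
```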
Python sets also support symmetric difference operations. The symmetric difference returns elements that are in one set or the other, but not both.
set_1 = {1,2,3,4}
set_2 = {4,5,6,7}
set_3 = set_1.symmetric_difference(set_2)
# yields {1, 2, 3, 5, 6, 7}
set_3 = set_1 ^ set_2
# operator version
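One way to check your understanding of the symmetric difference is to rebuild it from the other operations. The sketch below does that in two equivalent ways:

```python
set_1 = {1, 2, 3, 4}
set_2 = {4, 5, 6, 7}

sym = set_1 ^ set_2

# Everything in either set, minus what they share.
via_union = (set_1 | set_2) - (set_1 & set_2)

# What's only in set_1, combined with what's only in set_2.
via_differences = (set_1 - set_2) | (set_2 - set_1)

print(sym == via_union == via_differences)  # True
print(sym == {1, 2, 3, 5, 6, 7})            # True
```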
Supersets and subsets in Python
You're probably familiar by now with Python's in operator, which you can use to search for the presence of a character in a string or an object in a list. Sets support in as well:
set_1 = {1,2,3,4}
1 in set_1 # this is True
5 in set_1 # this is False
What if you wanted to test for the presence of all the elements of one set inside another set? You can't use in for that, because Python will think you're testing for the presence of the entire set object, not its individual elements. Fortunately, Python does provide ways to check such things with other set methods:
set_1 = {1,2,3,4}
set_2 = {1,2}
# Tests if members of set_2 are in set_1:
set_2.issubset(set_1)
# Operator version:
set_2 <= set_1
# Tests if set_1 contains all members of set_2:
set_1.issuperset(set_2)
# Operator version:
set_1 >= set_2
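The following sketch shows what those checks return, along with the strict versions of the operators (< and >), which require the two sets to differ:

```python
set_1 = {1, 2, 3, 4}
set_2 = {1, 2}

# `in` looks for the set itself as a member, not for its elements.
print(set_2 in set_1)    # False

print(set_2 <= set_1)    # True: every member of set_2 is in set_1
print(set_1 >= set_2)    # True: set_1 contains all of set_2

# A set is a subset of itself, but not a *proper* subset.
print(set_1 <= set_1)    # True
print(set_1 < set_1)     # False
print(set_2 < set_1)     # True
```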
Set updates in Python
Up until now we've only explored how to generate new sets from unions, intersections, or differences of existing sets. Python also lets you update a set in-place with these same operations:
# In-place union of set_1 with set_2:
set_1 |= set_2
# In-place intersection of set_1 with set_2:
set_1 &= set_2
# In-place difference of set_1 with set_2:
set_1 -= set_2
# In-place symmetric difference of set_1 with set_2:
set_1 ^= set_2
In-place updates are handy when you're dealing with a very large set, and you don't want to create an entirely new instance of the set (with all the overhead that goes with such an operation). Instead, you can make the changes directly to the existing set, which is more efficient.
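To see that no new set object is created, you can compare the object's identity before and after. In this sketch the values are arbitrary; the comments also name the equivalent method for each operator:

```python
set_1 = {1, 2, 3, 4}
set_2 = {3, 4, 5}
original_id = id(set_1)

set_1 |= set_2       # same as set_1.update(set_2)
print(set_1 == {1, 2, 3, 4, 5})  # True

set_1 &= {1, 3, 5}   # same as set_1.intersection_update(...)
print(set_1 == {1, 3, 5})        # True

set_1 -= {5}         # same as set_1.difference_update(...)
print(set_1 == {1, 3})           # True

set_1 ^= {3, 9}      # same as set_1.symmetric_difference_update(...)
print(set_1 == {1, 9})           # True

# Still the very same object throughout.
same_object = id(set_1) == original_id
print(same_object)   # True
```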
Frozen sets in Python
I mentioned before how sets can only be made of things that are hashable. Since sets are mutable, they can't themselves be used as set elements or dictionary keys. But there is a variety of set called the frozen set that isn't mutable, and so can be used as a set element, as a dictionary key, or in any other context where you need a hashable type.
To create a frozen set, just use frozenset() to generate one from an existing set or iterable:
set_1 = {1,2,3,4}
f_set = frozenset(set_1)
set_2 = {f_set,2,3,4}
Note that once you create a frozen set, it can't be altered. The .add() and .remove() methods won't work on a frozen set. You can use a frozen set to generate set intersections or differences, as long as you don't try to modify the frozen set itself in-place.
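Here is a brief sketch of those points in practice. The lookup dictionary and its contents are hypothetical, included only to show a frozen set working as a dictionary key:

```python
f_set = frozenset({1, 2, 3, 4})

# A frozen set can be a member of another set...
nested = {f_set, 2, 3, 4}
print(f_set in nested)  # True

# ...or a dictionary key. Member order doesn't matter for lookup.
lookup = {frozenset({"a", "b"}): "pair"}
print(lookup[frozenset({"b", "a"})])  # pair

# There is no .add() method on a frozen set at all.
add_failed = False
try:
    f_set.add(5)
except AttributeError:
    add_failed = True
print(add_failed)  # True

# Operations that build a new set still work. When the left operand
# is a frozenset, the result is a frozenset too.
common = f_set & {2, 3, 9}
print(common == {2, 3}, type(common) is frozenset)  # True True
```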


