Thinking Categorically
An introductory foray into category theory
This is adapted from an old Quora post, but I have tweaked and updated it.
A little while back, I wrote an introduction to set theory, with some comments about what it is good for and what the drawbacks of using set theory to construct the basic objects in mathematics are.
Specifically, I pointed out that oftentimes we really want to think about such objects in terms of their relationships with each other, and set theory is a poor medium for that. Right now, we are going to revisit this problem, and I shall introduce a solution: category theory. We’ll see that category theory is a very powerful tool; at the end, we’ll also talk about its shortcomings.
The basic idea of category theory is that we split up mathematical objects into “categories”, which we consider separately. So, as an example, I might have a category of sets, or a category of vector spaces, or a category of groups, or a category of topological spaces. Whatever category we happen to be working in, we only allow maps between them that behave nicely with respect to the objects in the category. So, while all functions are admissible between sets, only linear transformations are admissible between vector spaces, only group homomorphisms are admissible between groups, and only continuous functions are admissible between topological spaces. (If these examples don’t mean anything to you, don’t worry—I’ll give a worked example later.)
In any case, let’s give a (mostly) formal definition.
Definition: A category is a collection of objects, together with a collection of morphisms (also called maps or arrows) between them. Each morphism has a source object and a target object. If the morphism is f, with source object A and target object B, we will write f: A→B or alternatively
We allow ourselves to compose morphisms: if I have a morphism f: A→B and a morphism g: B→C, then there is a composition morphism g∘f: A→C.
Composition should obey the following properties:
It is associative: that is, if I have morphisms f: A→B, g: B→C, h: C→D, then h∘(g∘f)=(h∘g)∘f. (Speaking intuitively, if we just follow one path along a diagram of arrows, that defines a unique morphism between the object that we started on and whatever object we ended on.)
For every object, there is a corresponding identity morphism: that is, for every object A, there exists a morphism idA: A→A with the property that for every morphism f: A→B and g: C→A, it is true that f∘idA=f and idA∘g=g.
The only reason why I say this definition is not entirely formal is that I didn’t define what a “collection” is. Based on our previous post, it is very tempting to suggest that it should be a set. Unfortunately, this leads to some foundational problems: one would like to define the category of sets, but that would mean that we would need to have a set of all sets. Such a construction is impossible—I’ll write about why and how one can try to resolve it in a different post. For now, we are going to ignore this issue entirely.
In any case, I imagine that some examples of categories would be welcome at this juncture.
We’ll start with the category of sets—here, the objects are sets, and the morphisms are functions between them. It is easy to check that this satisfies the definition of a category, since:
Taking composition of morphism here to mean the usual composition of sets, it is clear that if I have functions f: A→B, g: B→C, then there is a function g∘f: A→C.
Composition of functions is associative.
For every set A, there is a corresponding identity function idA characterized by the property that idA(a)=a for all elements a in A. It is easy to check that composing any function with idA does not change the function.
This is an enormously important example, as many categories can be thought of as sets with some extra properties.
Fix a particular field—if you are unfamiliar with the term, it’s okay to just use ℝ, the real numbers. We can then consider vector spaces over this field. If you are unfamiliar with the general theory of vector spaces, just replace it with ℝn, n-dimensional Euclidean space. (A vector in ℝn is just an n-tuple of real numbers: (x1, x2,… xn).)
The category of vector spaces over this field has vector spaces over this field as objects, and the morphisms are linear transformations. Once more, if you are unfamiliar with the general theory of vector spaces, you may replace this with matrices. So, a morphism between ℝm and ℝn will be an m×n matrix. Composition of morphisms amounts to composition of linear transformations—or, equivalently, multiplication of matrices.

Is this a category? Well, composition associative? Yes, because the composition of linear transformations is just function composition. (Or, equivalently, matrix multiplication is associative.) Is there an identity morphism for every object? Yes, the identity map. (Equivalently, the identity matrix.) So, yes, it is.
The category of groups is similar: the objects are groups, and the morphisms are group homomorphisms. Since we have covered this at length in the following two posts, I am going to skip going through it again here.
One can also think of any particular group as a category itself. For example, here is a representation of the group ({1},×).
There is only one object in this category—the group itself. The morphisms are the group elements—in this case, there is just one. Here is the slightly more complicated group ({1,−1,i,−i},×).
Again, the idea is that there is one object in the category, and the morphisms are the group elements. The composition of morphisms corresponds exactly to multiplying the group elements together. That this satisfies the conditions of a category follows from the group axioms.
This last one is the group of integers—as there are infinitely many elements in this group, I cannot possibly draw all of the morphisms, but it should be clear how you would keep adding on more arrows.
We can also just create categories out of graphs, without any prior mathematical object in mind. Here is an example.
I just need to tell you the rules for composing morphisms in this category. I don’t need to worry about the identity morphisms, since we know how to compose those. This leaves the following four rules:
That’s it. You can check if you like that this will satisfy the conditions of a category.
It’s worth noting that the above diagram has the nice property that if you follow any two sequences of arrows starting and ending in the same places, then the composition of those arrows will be equal—e.g. from A we can get to C by g or by h∘f, and these are in fact the same morphism. We call any diagram that satisfies this property commutative. Category theory is suffused with commutative diagrams, as they often give a compact way of encoding a large amount of information.
Now that we have a sizable bank of examples to draw from, let’s discuss what sort of things this machinery allows us to do. Why should we be interested in studying categories?
The most basic appeal of category theory is that all manner of properties that we are used to discussing about groups, sets, topological spaces, and many, many other things can be phrased entirely in terms of relations between arrows. What this means is that we can give one definition entirely in category theoretical terms, which thereby unifies many, many different definitions for all the various categories that it could apply to. Let’s see some examples.
Definition: An isomorphism between two objects A and B is a morphism f: A→B such that there is another morphism g: B→A such that f∘g=idB and g∘f=idA.
Simple, yeah? But what notion is an isomorphism capturing?
The idea is that if you have an isomorphism between two objects in a category, then you can essentially swap those two objects around freely. Why is this? Suppose that I have morphisms ψ: C→A, ϕ: A→D, φ: A→A. Then I have corresponding morphisms
so I can translate any morphisms on one object onto the other object (and vice versa). Furthermore, this transference preserves any relations between these morphisms—so, for example, if I had three morphisms ψ, ϕ, φ: A→A such that ψ∘ϕ=φ, then this relation is preserved, since
In any case, this is the general theory. Let’s see how this plays out in particular categories. Let’s start with the following category.
In this category, h: B→C is an isomorphism, since it has an inverse h−1. Indeed, we see that if we switch B and C around, it doesn’t change the structure of the category at all—these two objects are perfectly symmetric. In contrast, f: A→B is not an isomorphism, as it does not have a corresponding inverse. And, we see that if we were to switch A with B or C, our category would look rather different. We conclude that while B and C are isomorphic, they are not isomorphic with A.
Next, let’s think about the category of sets. Remember that in the category of sets, morphisms are just functions, and so an isomorphism is any function f that has an inverse function f−1. In other words, an isomorphism in this category is nothing more and nothing less than a bijection. Saying that two sets are related by a bijection is the same as saying that they are equinumerous.
In the category of vector spaces over a particular field, what is an isomorphism? It is a linear transformation that possesses an inverse, or, equivalently, it is an invertible matrix. Two vector spaces are related by an invertible linear transformation if and only if they have the same dimension.
The situation is very similar in the category of groups—an isomorphism here is a group homomorphism that has an inverse group homomorphism. One can check that this happens precisely when said group homomorphism is also a bijection.
Finally, what is an isomorphism in a category representing a group? Well, remember that in such a category, the morphisms correspond to the group elements, and so an isomorphism is any group element that has a corresponding group element that is its inverse. But, of course, every group element has an inverse, by definition of a group. In fact, we see that we can now define groups in a funny way in terms of categories: a group is a category that has a single object and such that every morphism is an isomorphism. (Technically speaking, I should also ask that the collection of morphisms be a set, but this is getting back to the foundational problems that we promised to ignore.)
Let me give another example of a category-theoretical definition.
Definition: We define an initial object to be an object I in our category such that for every object A, there exists a unique morphism I→A. Similarly, a terminal object T is one such that for every object A, there exists a unique morphism A→T.
Initial and terminal objects are always unique up to isomorphism—that is to say, if there exist two initial (resp. terminal) objects, then there exists an isomorphism between them. We can prove this directly.
Theorem: Let A, B be initial (resp. terminal) objects. There exists an isomorphism between them.
Proof:
By definition, there exist unique morphisms ϕ: A→B and ψ: B→A. Consider their compositions ϕ∘ψ: B→B and ψ∘ϕ: A→A. Because A, B are initial (resp. terminal) objects, they can only have one morphism A→A and B→B—this morphism must be idA, idB. We conclude that ϕ∘ψ=idB, ψ∘ϕ=idA, and therefore ϕ is the desired isomorphism between A and B. □
This is an example of a universal property—some way of characterizing some structure in the category purely in terms of morphisms, and which thereby uniquely characterizes this structure up to the correct notion of isomorphism in this case. In fact, this is the ur-example of a universal property, in that the general formal definition of a universal property is precisely in terms of initial and terminal objects. This is a discussion for another time, however. In the meantime, I want to give some concrete examples of initial and terminal objects in various categories.
In the category
one checks by looking that there is precisely one initial object (namely, A), and both B and C are terminal objects (of course, since they are isomorphic, one of them is a terminal object if and only if the other one is—on the other hand, as we know that A and B are not isomorphic, we know that if one of them is an initial or terminal object, the other one isn’t).
In the category of groups, the group ({1},×) is a terminal object—for any group, there is always the homomorphism sending every element to 1, but on the other hand, there are no other available homomorphisms we could write down. In fact, ({1},×) is also an initial object. To see this, note that any group homomorphism from ({1},×) is completely characterized by where it sends 1. However, since 1 is the identity, it has to be sent to the identity in the target group, so there is no choice about how this is done.
In the category of vector spaces, the vector space {0} is also both an initial and a terminal object, for the same reason.
In the category of sets, the set {x} containing a single element is a terminal object—for any set, the only way that we can build a function to this set is to map every element to x. However, {x} is not an initial object, since we can choose to map x to any element of the target set, and in general, there may be many options. Instead, the empty set {} is the initial object.
In a category which is a group, generally speaking, there are no initial or terminal objects. Indeed, there is only one object, and the only way that it can be initial or terminal is if there is only one morphism—this happens for precisely one group, namely the trivial one.
Next, I want to give an example of how something we might have defined set-theoretically gets defined in category theory. In my preceding article, I showed how to set-theoretically define an ordered pair (x, y). Given two sets X, Y, one denotes the set of all ordered pairs (x, y) with x∈X, y∈Y by X×Y. This is known as the Cartesian product of X and Y. We shall now define the product of two objects in a general category.
Definition: Let A, B be objects in a category. A product of A, B is an object C (which we shall typically denote by A×B) together with morphisms πA: A×B→A, πB: A×B→B such that for any other object D with morphisms fA: D→A and fB: D→B, there exists a unique morphism f: D→A×B making the following a commutative diagram.
I claim that in the category of sets, this crazy definition precisely reduces to the Cartesian product we are already familiar with. To see this, let A, B be sets and form their Cartesian product A×B. There are projection maps
and it isn’t so hard to see that if we have functions fA: D→A and fB: D→B, then if we define
then the diagram
commutes. Some further thought shows that this is the only choice of f that is possible. We conclude that the Cartesian product of sets (together with the natural projection maps) satisfies the universal property defining the product of objects. However, because the product of objects is defined by a universal property, it is uniquely defined up to isomorphism! So, we conclude that, indeed, the Cartesian product is (up to isomorphism) the same as a product of objects in the category of sets.
In the category of vector spaces, the same sort of reasoning works: if we have vector spaces V, W, we can form a product V×W—the underlying set is just the Cartesian product, and sums and scalar products are done coordinate-wise. More simply, given V=ℝm, W=ℝn, V×W=ℝm+n. The maps πA, πB are the same as before—importantly, they are now linear transformations.
The category of groups is also essentially the same, but since I haven’t gotten to product groups in my notes, I’ll skip going through the details.
In a category that is a group, products generally do not exist—in fact, they only exist for the trivial group, in which case we get a slightly silly diagram (note that all of the objects in this diagram are in fact the same object).
Of course, category theory is not just good for giving general definitions that apply in an extremely wide range of contexts. Category theory is also excellent in its ability to take proofs created in one setting and lift them to apply in completely different settings. How does this work? The essential observation is that if we have two categories that might be described differently but have the same structure of morphisms, then any proofs regarding category-theoretical constructions in one category will be just as valid in the other category. And, as I have hinted, a lot of things can be phrased as category-theoretical constructions, so this actually encompasses quite a bit of mathematics. In effect, what we want to come up with now is some manner of mapping one category onto another category. (Indeed, if you define in some way a category of categories, then what we shall do now is nothing more than to come up with a notion of morphism and isomorphism in that category.)
Definition: Given two categories C and D, a functor F is a mapping that associates to each object A in C an object F(A) in D, and to each morphism f:A→B in C a morphism F(f): F(A)→F(B) in D such that
F(idA)=idF(A) for all objects A in C, and
F(f∘g)=F(f)∘F(g) for all morphisms f, g in C.
Much like a group homomorphism preserves group multiplication identities, a functor preserves the identities between morphisms. In fact, this is our first example of a functor: if we think of groups as categories, then a group homomorphism is exactly a functor between two categories!
We can give other examples as well. For instance, there is a functor between the category of vector spaces and the category of sets known as the forgetful functor—what it does is it takes vector spaces V and sends them to the underlying sets V. Linear transformations are functions, so we just interpret them as such. (For matrices, we consider the function that multiplies vectors by that matrix.) Using the forgetful functor, we can give another derivation of the product of objects in the category of vector spaces—recalling that in the category of sets the product is just the Cartesian product, we see that if we have a product in the category of vector spaces, the forgetful functor must take it to the Cartesian product. At that point, the problem just reduces to checking whether the construction in the category of sets is still a product in the category of vector spaces. What is more, this same procedure must work in any category where the objects and morphisms are sets and functions with some extra imposed structure, as we will again have a forgetful functor mapping that category to the category of sets.
Sometimes, we instead need a slightly different kind of functor: a contravariant functor.
Definition: Given two categories C and D, a contravariant functor F is a mapping that associates to each object A in C an object F(A) in D, and to each morphism f:A→B in C a morphism F(f): F(B)→F(A) in D such that
F(idA)=idF(A) for all objects A in C, and
F(f∘g)=F(g)∘F(f) for all morphisms f, g in C.
Contravariant functors are just like ordinary functors (which are frequently called covariant functors to differentiate them), except that they switch the direction of all of the arrows. Here’s the most common example: the opposite functor. This maps from a given category C to the opposite category (also known as the dual category) Cop. The category Cop has all the same objects as C, but all of the arrows are switched in direction. The functor is sort of the obvious one: it keeps all objects the same but reverses the directions of all of the arrows. For example, here is the opposite category of the category we have seen again and again.
Recall that in the original category, A was an initial object and B, C were terminal objects. In the opposite category, A is a terminal object and B, C are initial objects. This is always true: the opposite functor always takes terminal objects to initial objects and initial objects to terminal objects. Indeed, whenever you have a structure defined by a universal property, there is a corresponding dual structure which is whatever the image is under the opposite functor. For example, the dual of the product of objects is the coproduct of objects—in the category of sets, this is the disjoint union, and in the category of groups, this is the free product of groups. Note that for any theorem proved about a particular structure, there is a corresponding dual theorem about the dual structure.
There are far fewer obvious functors that are also enormously useful. In a sense, much of algebraic topology comes down to finding nice functors from the category of topological spaces (where the morphisms are usually homotopies) to either the category of groups or the category of chain complexes. This is helpful as figuring out whether two topological spaces are secretly the same topological space (after some deformations) is quite difficult, but telling whether two groups or two chain complexes are different is often very easy. I plan to write about the basics of this at some point, but it probably won’t be for a while.
So, let’s recap and discuss the strengths and weaknesses of category theory. Hopefully, I have convinced you that category theory has an incredible capacity for unifying seemingly disparate ideas and giving proofs that apply in astonishing generality. Carrying over from our discussion in the previous post on set theory, hopefully, I have also convinced you that category theory is a better model of the computer science notion of inheritance—unlike in set theory, where we are preoccupied with what a set is and how it is constructed, in category theory, we only care about how a given object interacts with other objects in its category.
However, while this is the greatest strength of category theory, it is also its greatest weakness. Proofs and constructions in category theory tend to be so abstract that it can be difficult to get a good sense of what they mean, which is why it is usually a good idea to give plenty of concrete examples of the general definition. This tendency of category theory to produce proofs that are undoubtedly true, but don’t a priori feel like they are about anything, is so well-known that proofs of this type have earned a special name—they are known as abstract nonsense.
Category theory has another weakness that we have seen in our discussions above. While category theory is undoubtedly very good at coming up with constructions that are unique up to isomorphism, it generally doesn’t have any means of showing that there actually exists even one object with the desired property. For instance, we saw that initial objects, terminal objects, and products are all unique up to isomorphism if they exist, but we saw plenty of examples of categories that didn’t contain any such objects.
What does this mean practically? It means that in most of our writing and proofs, we need to appeal to both set theory and category theory. Set theory is excellent for being able to construct explicit examples (good for our own ability to gain intuition, and important for proofs to be able to show that there do exist examples), whereas category theory is exceptional for generalizing, unifying, and giving natural definitions. The synthesis of these two different points of view gives an enormously useful approach to mathematics that neither of them in isolation can deliver.













I’m confused how we are using the forgetful functor to derive the product in the category of vector spaces. How do we know if P is a product of V and W in the category of vector spaces that F(P) is a Cartesian product of V and W in the category of sets where F is the forgetful functor?
This seems to be what the argument relied on, and furthermore we know that products are not unique (may be unique up to isomorphism but they are not literally unique) so if P is a product of vector spaces that is not literally a set theoretic Cartesian product of V and W then when we apply the forgetful functor F we won’t get a Cartesian product and I’m not sure how the argument then works.
Loved the post, just something I was confused by.
Small correction: you wrote ϕ∘f^-1: B—>D but I believe you meant Φ∘f^-1: B—>D instead.
ϕ∘f^-1 is actually a morphism from B to A.