We start with a matrix of dimension *d×n* and end up with a vector *u* of dimension *d×1*, which means that it lives in the space of the features.

What I don't understand is what we are trying to do: compress the data, or reduce the number of features?

Because if we are trying to reduce the features, then we end up with a vector (the principal component) that has the same dimension as the features, which doesn't help us.

The PC *u* is indeed in the dimension of the features. For dimensionality reduction, we project each data point onto *u* (or onto several PCs) to get a number (or numbers); these projections are the compressed data.
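A minimal sketch of that projection step, assuming a *d×n* data matrix with one sample per column (the matrix, dimensions, and variable names here are illustrative, not from the original thread):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100                           # d features, n samples
X = rng.standard_normal((d, n))         # data matrix, one column per sample

Xc = X - X.mean(axis=1, keepdims=True)  # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
u = U[:, 0]                             # first PC: a d-dimensional vector,
                                        # i.e. it lives in feature space

z = u @ Xc                              # project every sample onto u
print(u.shape)                          # (5,)   same dimension as the features
print(z.shape)                          # (100,) one compressed number per sample
```

So although *u* itself has the dimension of the features, the output of the projection is one scalar per sample: the *n* samples are compressed from 5 numbers each down to 1.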

So what we did there was to compress the data, not to reduce the number of features. To reduce the number of features, I would just have to transpose the data matrix and get a *u* that is in the dimension of the number of samples; then I could project a sample into a lower dimension using the PC *u* and thereby reduce the number of features. Right?

I don't understand what you mean by "reducing the number of features". At no point do we select a subset of the features.

You "reduce the number of features" in the sense that you project the data onto the first 2 PCs; these projections can be considered "new features".
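A short sketch of that idea, under the same illustrative setup as before (a *d×n* matrix of made-up data, one sample per column): projecting onto the first 2 PCs turns each *d*-dimensional sample into a 2-dimensional one, and those 2 coordinates are the "new features".

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 100
X = rng.standard_normal((d, n))
Xc = X - X.mean(axis=1, keepdims=True)  # center each feature

U, _, _ = np.linalg.svd(Xc, full_matrices=False)
W = U[:, :2]            # first 2 PCs stacked as columns, shape (d, 2)
Z = W.T @ Xc            # new representation: 2 "new features" per sample
print(Z.shape)          # (2, 100)
```

Note that no subset of the original 5 features is selected: each new feature is a linear combination of all of them.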