Whenever you are handling data, you will always deal with features. For instance, if you are collecting some data about houses in Milan, typical features might be position, size, floor and so on. These are the variables we take into account to describe our data.

However, it often happens that your data are presented to you with many features, sometimes hundreds of them… but do you need all of them? Well, keeping in mind the law of parsimony, we’d rather handle a dataset with few features: it will be far easier and faster to train on. On the other hand, we do not want to lose important information while getting rid of some features. How can we handle this trade-off between simplicity and amount of information? The answer to this question is Principal Component Analysis (PCA).

Principal components are new variables that are constructed as linear combinations of the initial variables. These combinations are done in such a way that the new variables are uncorrelated and most of the information within the initial variables is stored in the first components. So the idea is that k-dimensional data give you k principal components, but PCA tries to put the maximum possible information into the first ones, so that, if you want to reduce your dataset’s dimensionality, you can focus your analysis on the first few components without suffering a great penalty in terms of information loss.

In this analysis, what measures the amount of information is variance: geometrically, principal components can be seen as the directions in high-dimensional data that capture the maximum amount of variance, so that projecting the data onto them yields a smaller-dimensional subspace while keeping most of the information. Hence, the first principal component accounts for the largest possible variance; the second component will, intuitively, account for the second largest variance, under one condition: it has to be uncorrelated with the first principal component; and so forth.

To understand PCA more deeply, we need to introduce some further concepts. Let’s consider a scenario where we have only two features, x and y.
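The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article’s own code: the toy two-feature dataset is made up for the example, and the components are obtained by eigendecomposition of the covariance matrix, with the eigenvalues measuring the variance captured by each component.

```python
import numpy as np

# Toy dataset with two correlated features (hypothetical values,
# just for illustration): each row is an observation.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
data = np.column_stack([x, y])

# Center the data, then compute the covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Eigenvectors of the covariance matrix are the principal components;
# eigenvalues give the variance captured along each one.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of total variance explained by each component.
explained = eigvals / eigvals.sum()
print("variance explained per component:", explained)

# Keep only the first component: the 2-D data are reduced to 1-D
# while retaining most of the information (variance).
reduced = centered @ eigvecs[:, :1]
```

Because y here is almost a multiple of x, the first component alone captures nearly all of the variance, which is exactly the situation where dropping the remaining components costs very little information.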