Journal Entry

Today I Learned: Why PageRank Works (and it’s just linear algebra)

Google’s original ranking algorithm is basically an eigenvector problem.

We model the web as a graph:

```mermaid
graph LR
    A[Page A] --> B[Page B]
    A --> C[Page C]
    B --> C
    C --> A
    D[Page D] --> C
```

Each page distributes its “importance” evenly to the pages it links to.

This becomes a system:

  • Let $ x_i $ = importance of page $ i $
  • Then each page satisfies:
\[x_i = \sum_{j \to i} \frac{x_j}{\text{outdegree}(j)}\]

In matrix form, defining $ M_{ij} = 1/\text{outdegree}(j) $ if $ j \to i $ and $ 0 $ otherwise (so each column of $ M $ sums to 1):

\[M \mathbf{x} = \mathbf{x}\]

So we’re looking for a vector $ \mathbf{x} $ that doesn’t change when multiplied by $ M $.

That’s literally the definition of an eigenvector with eigenvalue 1.
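To see it concretely, here's a quick sketch (assuming numpy is available) that builds $ M $ for the five-link example graph above and extracts the eigenvalue-1 eigenvector:

```python
import numpy as np

# Link matrix for the example graph: A->B, A->C, B->C, C->A, D->C.
# M[i, j] = 1 / outdegree(j) if page j links to page i, else 0,
# so each column sums to 1 (column-stochastic).
M = np.array([
    [0.0, 0.0, 1.0, 0.0],  # A gets all of C's importance
    [0.5, 0.0, 0.0, 0.0],  # B gets half of A's
    [0.5, 1.0, 0.0, 1.0],  # C gets half of A's, all of B's and D's
    [0.0, 0.0, 0.0, 0.0],  # nothing links to D
])

eigenvalues, eigenvectors = np.linalg.eig(M)
k = np.argmin(np.abs(eigenvalues - 1.0))  # the eigenvalue-1 eigenpair
x = np.real(eigenvectors[:, k])
x = x / x.sum()                           # scale so importances sum to 1
print(x)                                  # importance of A, B, C, D
```

A and C tie at 0.4, B gets 0.2, and D, which nothing links to, gets 0 — you can check this by hand against the sum formula above.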


The twist (random surfer model)

To handle dead ends (pages with no outgoing links) and rank sinks (groups of pages that trap importance), we mix in a random jump:

\[\mathbf{x} = \alpha M \mathbf{x} + (1 - \alpha)\mathbf{v}\]
  • $ \alpha \approx 0.85 $ (the damping factor)
  • $ \mathbf{v} $ = teleportation vector (uniform random jump)

With $ \alpha < 1 $ the modified matrix is strictly positive, so by the Perron–Frobenius theorem the solution is unique, and iterating the update converges to it.
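A minimal sketch of the damped iteration on the same example graph (numpy assumed, $ M $ as defined above):

```python
import numpy as np

# Same link matrix as the example graph (A->B, A->C, B->C, C->A, D->C).
M = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])

alpha = 0.85
v = np.full(4, 0.25)  # uniform teleportation vector
x = v.copy()
# The update is a contraction with factor alpha, so it converges
# geometrically; 100 steps is far more than enough here.
for _ in range(100):
    x = alpha * M @ x + (1 - alpha) * v
print(x)  # D now gets a small nonzero score from teleportation alone
```

Note that $ \mathbf{x} $ stays a probability vector throughout: each step contributes $ \alpha $ of the old mass plus $ 1 - \alpha $ from teleportation.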


Why it’s cool

  • Turns the entire web into a linear algebra problem
  • Uses probability + graph theory + eigenvectors
  • Solved with simple power iteration

Tiny pseudo-iteration

```mermaid
flowchart TD
    Start --> Init[Initialize x randomly]
    Init --> Multiply[x = Mx]
    Multiply --> Normalize
    Normalize --> Check{Converged?}
    Check -- No --> Multiply
    Check -- Yes --> Done
```
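The loop might look like this in Python (a sketch with numpy; the function name and tolerances are mine, and this is the undamped version, which happens to converge on the example graph):

```python
import numpy as np

def power_iteration(M, tol=1e-10, max_iter=1000):
    """Power iteration, mirroring the flowchart above."""
    rng = np.random.default_rng(0)
    x = rng.random(M.shape[0])              # Initialize x randomly
    x /= x.sum()
    for _ in range(max_iter):
        x_new = M @ x                       # x = Mx
        x_new /= x_new.sum()                # Normalize
        if np.abs(x_new - x).sum() < tol:   # Converged?
            return x_new                    # Done
        x = x_new
    return x

# Link matrix for the example graph (A->B, A->C, B->C, C->A, D->C).
M = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
pr = power_iteration(M)
print(pr)
```

In practice you'd iterate the damped update instead, but on this small graph plain power iteration already settles on the eigenvalue-1 eigenvector.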

Takeaway

Ranking the internet is just finding a stable vector in a matrix transformation.
