given the facebook app on my blackberry and feeds on digsby, i've become somewhat of a facebook whore (as you've probably noticed). i've managed to stay off twitter so far (microblogging for someone that blogs once every few months?), but that doesn't stop me from using it conceptually for my own perverse enjoyment.
i stumbled across this, which reports that twitter now has an API which can be used to compute importance. given the daily increase in the rate at which crap is produced on the internet (i'm not excusing myself...), "importance" in searches becomes more and more ... well important.
>6 months ago, a CS student had asked me for project ideas for his scientific computing class. turns out one of my ideas seems like its about to pan out...
from noeldate Fri, Apr 24, 2009 at 10:24 AMsubject Re: Hate matlab!!mailed-by gmail.comoh, well here's something that may be relevant to you: who's the most important person in twitter. we can consider who follows who as a directed graph. we know from CS that we can model a graph as an adjacency matrix, where each column (each column is a person) has a 1.0 in the row if the if the person follows the person in the corresponding row. in this, one way to compute "importance" would be to sum of the rows. if your row has the highest number, you're the most important person, because you're followed by the most people. however that can be a flawed model, because your grandmother is following all 87 of your cousins, so she may be unfairly contributing to your family's importance scores. we want everyone to have a total of one "vote".so in a more refined model, we can normalize each column to sum to 1.0 (just divide by the sum of the column), so everyone has one "vote". this means that the more people you follow, the less valuable your followings are to others' importance.the best thing we can do is model our importance as a function of the other people's importance who follows us. so this means that bill gates following you is more important than your grandmother following you. so if you imagine one row of the resulting adjacency matrix, the linear system would look something like (x_i is importance of person i, a_i is the coefficient for matrix A, which you create from the above two paragraphs, lets look at the third row of a 3 person twitter universe, and you can't follow yourself):a_1 * x_1 + a_2*x_2 + 0*x_3 + ... = x_3if you write that out for every other column, you see a pattern, which can be summarized in the matrix equation Ax = x. this is identical to Ax = lambda*x for lambda = 1, the canonical eigenvalue equation. so basically, the importance score would be the eigenvector for the normalized directed adjacency matrix of twitter. the highest values in the resulting eigenvector x would be the most "important people". so someone who is followed by lots of important people would be the most important person.the practical difficulties in this is that the size of the matrix increases in N^2. and for twitter, there are tons of users, probably in N ~= 1e6 or 1e7. the dense matrix can't be represented in memory, but with some tweaks, you probably could represent (some?) of it as a file on disk. so given that you can't just pass in this monster matrix into matlab's svd(A, 'econ'), how can we compute this? if you haven't done it already, you can use a simple power iteration scheme for eigenvalue problems. basically, come up with a random vector and just keep multiplying the matrix by it (then normalize). so then all you need to do is write a matrix-vector multiplication for however you represent your matrices.note that the above is completely inspired by google's page rank algorithm, which effectively is the world's largest eigenvector computation.



0 comments:
Post a Comment