Thursday, December 10, 2009

Calculating people similarity - Cosine approach and using tags

This week has been more about fixing things up than coding anything exciting. Today is the last day of the week, so I feel all the more "responsible" for completing my scrum tasks. So here I am, trying to code the similarity between two people in an enterprise. My manager agreed that we do this using just one set of data, and tagging data was the obvious choice.

The first approach being considered this week is the cosine approach. I was thinking of the following. To find the similarity between person A and person B, I first look at all the tags used by these two people. Let them be denoted by a = { a1, a2, a3, ... } and b = { b1, b2, ... }. Then, as per the cosine formula, the similarity between these two people is (a . b) / (|a| * |b|), which is just cos(theta), where theta is the angle between a and b.

The angle of overlap is what is confusing me a bit. How do I go about calculating "theta"? Should it be as follows? Let the total number of tags used by A and B be denoted by T, and let the number of common tags be C. Then theta needs to be 90 - ((C / T) * 90) degrees, so that no common tags gives 90 degrees and complete overlap gives 0 degrees. The dot product would then reduce to |a| |b| cos(90 - ((C * 90) / T)).
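To compare the two options, here is a minimal sketch in Python. It rests on some assumptions that are not settled above: each person's tags are treated as a set (so a and b are 0/1 vectors over the tag vocabulary), T is read as the number of distinct tags used by either person, and the function names are made up for illustration. Under the 0/1 assumption the dot product a . b is simply the number of common tags C, and |a| is the square root of the number of tags A used, so the standard cosine needs no explicit theta at all.

```python
import math


def cosine_similarity(tags_a, tags_b):
    """Standard cosine similarity, assuming binary (set-based) tag vectors."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0  # no tags on one side: treat as no similarity
    common = len(a & b)  # a . b for 0/1 vectors
    return common / math.sqrt(len(a) * len(b))  # |a| * |b| = sqrt(|A| * |B|)


def angle_heuristic_similarity(tags_a, tags_b):
    """The theta = 90 - (C / T) * 90 idea from the post.

    Assumption: T is the count of distinct tags used by either person.
    """
    a, b = set(tags_a), set(tags_b)
    total = len(a | b)  # T
    if total == 0:
        return 0.0
    common = len(a & b)  # C
    theta_degrees = 90 - (common / total) * 90
    return math.cos(math.radians(theta_degrees))


if __name__ == "__main__":
    person_a = {"java", "search", "wiki"}
    person_b = {"java", "wiki", "scrum", "agile"}
    print(cosine_similarity(person_a, person_b))
    print(angle_heuristic_similarity(person_a, person_b))
```

Both functions give 1 for identical tag sets and (up to floating-point rounding) 0 for disjoint ones, but they differ in between, so they are not interchangeable. If tag counts matter (someone using the same tag many times), the 0/1 assumption would have to be replaced by count vectors.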



