The first thing I think of when I hear “recommendation system” is a music recommendation system. So I tried to make one using Instagram data, which is probably the wrong form of social media / tag, but I was curious to see what might happen.
I set up a remote server and ran this script as a cronjob every 15 minutes for a few days, gathering a dataset of 61974 Instagram media items that contained the hashtag “music.”
Then I loaded all of the 300+ pickle files into an IPython notebook. I built a hash table mapping users to tags, where each tag has a count of how often that user used it in connection with music.
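As a rough illustration of that user-to-tag table, here is a minimal sketch. The item layout (a dict with "user" and "tags" keys) is an assumption made for the example, not the actual shape of the Instagram data or of my pickle files.

```python
from collections import defaultdict

def build_user_tag_counts(media_items):
    """Map each username to a dict of tag -> number of posts using that tag."""
    counts = defaultdict(lambda: defaultdict(int))
    for item in media_items:
        # "user" and "tags" are hypothetical field names for this sketch.
        # Lowercase and de-duplicate so one post counts a tag at most once.
        for tag in {t.lower() for t in item["tags"]}:
            counts[item["user"]][tag] += 1
    return {user: dict(tags) for user, tags in counts.items()}

# Made-up sample posts, only to show the resulting structure.
sample = [
    {"user": "alice", "tags": ["music", "live"]},
    {"user": "alice", "tags": ["music"]},
    {"user": "bob", "tags": ["Music", "vinyl"]},
]
user_tags = build_user_tag_counts(sample)
```

A plain nested dict like this is sparse, which suits tag data: most users only ever touch a handful of the tags in the dataset.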
I found that only 165 users produced all 61974 Instagram posts. Obviously I screwed up the way I calculated the max_tag_id. As it turns out, I only got 198 unique media items. I’ll look into this later. In the meantime, I’ll use the Pearson correlation coefficient to smooth out the weight of each user’s tags.
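For reference, the Pearson correlation between two users can be sketched as below. This version compares the two users over the union of their tags and treats a missing tag as 0; that is one possible choice for the example (comparing only shared tags is another), and the function name is hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two tag -> weight dicts."""
    tags = sorted(set(a) | set(b))
    n = len(tags)
    if n == 0:
        return 0.0
    xs = [a.get(t, 0.0) for t in tags]
    ys = [b.get(t, 0.0) for t in tags]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A user whose weights are all equal has no variation to correlate.
        return 0.0
    return cov / sqrt(var_x * var_y)

# Made-up users: the second tags twice as often but in the same proportions.
light = {"music": 1, "live": 2, "concert": 3}
heavy = {"music": 2, "live": 4, "concert": 6}
opposite = {"music": 3, "live": 2, "concert": 1}
same_taste = pearson(light, heavy)
reverse_taste = pearson(light, opposite)
```

Because the coefficient subtracts each user's mean and divides by their spread, a user who simply tags more often than another, in the same proportions, still comes out as perfectly correlated.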
I decided to run the script again, 200 times, on a more specific tag where I might get multiple Instagram posts from the same user. So I used the tag “morrissey” with a dataset of 200 items. This time I got 4097 unique users and 8741 on 6633 media items.
I used the like count to weight each tag, with each instance of the tag adding 1 and each like adding 0.25.
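A minimal sketch of that weighting, assuming the 0.25 is added once per like on the post that carries the tag. The "user", "tags" and "like_count" keys are hypothetical names for this example.

```python
from collections import defaultdict

LIKE_WEIGHT = 0.25  # assumed bonus per like

def weighted_user_tags(media_items, like_weight=LIKE_WEIGHT):
    """Map user -> tag -> weight: 1 per use plus like_weight per like."""
    weights = defaultdict(lambda: defaultdict(float))
    for item in media_items:
        # Every tag on a post receives the same contribution from that post.
        contribution = 1.0 + like_weight * item["like_count"]
        for tag in {t.lower() for t in item["tags"]}:
            weights[item["user"]][tag] += contribution
    return {user: dict(tags) for user, tags in weights.items()}

# Made-up posts to show the arithmetic.
posts = [
    {"user": "moz_fan", "tags": ["morrissey", "thesmiths"], "like_count": 4},
    {"user": "moz_fan", "tags": ["morrissey"], "like_count": 0},
]
weighted = weighted_user_tags(posts)
```

Here the first post contributes 1 + 4 × 0.25 = 2 to each of its tags and the second contributes 1, so “morrissey” ends up at 3 and “thesmiths” at 2.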
The most popular tags:
‘morrissey’, 6623
‘thesmiths’, 2435
‘moz’, 1635
‘music’, 500
‘mozarmy’, 493
‘concert’, 385
‘live’, 351
‘love’, 298
‘smiths’, 267
‘truetoyou’, 233
User-Based Collaborative Filtering / Cosine Similarity & K-nearest neighbor
For a random user, olivia.lord, here are 10 nearest neighbors:
I also tried recommending tags to users with Manhattan distance. That gave far fewer results. For example, it did not return any results for olivia.lord. For melchiano, it gave two results: (‘meatismurder’, 4), (‘pescara’, 4).
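The user-based approach in this section can be sketched roughly as follows: score every other user by cosine similarity, keep the k closest, and suggest tags those neighbors use that the target user does not. All function names and the sample users are hypothetical, and the Manhattan distance is included only for comparison; this is not necessarily how my notebook did it.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse tag -> weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def manhattan_distance(a, b):
    """Sum of absolute weight differences over the union of tags."""
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in set(a) | set(b))

def nearest_neighbors(user, user_tags, k=10):
    """Return the k most similar users as (username, similarity) pairs."""
    target = user_tags[user]
    scored = [(other, cosine_similarity(target, tags))
              for other, tags in user_tags.items() if other != user]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]

def recommend_tags(user, user_tags, k=10):
    """Score unseen tags by the similarity-weighted usage of the neighbors."""
    target = user_tags[user]
    scores = {}
    for other, sim in nearest_neighbors(user, user_tags, k):
        if sim <= 0:
            continue  # a neighbor with no overlap carries no signal
        for tag, weight in user_tags[other].items():
            if tag not in target:
                scores[tag] = scores.get(tag, 0.0) + sim * weight
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

# Made-up users for illustration.
users = {
    "u1": {"morrissey": 3, "thesmiths": 2},
    "u2": {"morrissey": 3, "thesmiths": 2, "moz": 1},
    "u3": {"vegan": 4, "meat": 1},
}
neighbors = nearest_neighbors("u1", users, k=2)
suggestions = recommend_tags("u1", users, k=2)
```

Cosine similarity only looks at the direction of the tag vector, so a light user and a heavy user with the same mix of tags still count as close. Manhattan distance compares raw weights, so it is much more sensitive to how often each person posts.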
Item/Tag-Based Collaborative Filtering
Here are the 20 top tags for ‘Morrissey’:
for ‘music’:
for ‘mozfest’:
for ‘vegan’:
for ‘meat’:
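Top-tag lists like the ones in this section can be produced by flipping the table around, so that each tag is described by the users who used it, and then ranking the other tags by cosine similarity. The sketch below is one way to do that under those assumptions; the helper names and sample data are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def invert(user_tags):
    """Turn user -> tag -> weight into tag -> user -> weight."""
    tag_users = defaultdict(dict)
    for user, tags in user_tags.items():
        for tag, weight in tags.items():
            tag_users[tag][user] = weight
    return dict(tag_users)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_related_tags(tag, tag_users, n=20):
    """Return up to n tags most similar to `tag`, best first."""
    target = tag_users[tag]
    scored = [(other, cosine(target, users))
              for other, users in tag_users.items() if other != tag]
    # Drop tags that share no users with the target tag.
    scored = [(other, score) for other, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]

# Made-up data for illustration.
sample_user_tags = {
    "u1": {"morrissey": 2, "thesmiths": 1},
    "u2": {"morrissey": 1, "thesmiths": 1, "vegan": 1},
    "u3": {"vegan": 2, "meat": 1},
}
tag_users = invert(sample_user_tags)
related = top_related_tags("morrissey", tag_users)
```

In the sample, “thesmiths” ranks above “vegan” for “morrissey” because it is shared by both users who used the tag, while “meat” is dropped since no user used both.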