Works in Progress

The first thing I think of when I hear “recommendation system” is a music recommendation system. So I tried to build one using Instagram data, which is probably the wrong platform (and the wrong tag) for this, but I was curious to see what would happen.

I set up a remote server and ran this script as a cron job every 15 minutes for a few days, gathering a dataset of 61974 Instagram media items tagged “music.”

Then I loaded all 300+ pickle files into an IPython notebook and built a hash table mapping each user to their tags, where each tag had a count of how often that user used it in connection with music.
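
A minimal sketch of that loading step, assuming each pickle holds a list of media dicts with hypothetical `id`, `user` and `tags` keys (the real scraped records may be shaped differently). It also counts unique media ids, which is a cheap way to catch the same page being fetched more than once.

```python
import glob
import pickle
from collections import defaultdict


def load_media(pattern):
    """Load every pickle file matching `pattern`.

    Assumes (hypothetically) that each file holds a list of media dicts.
    """
    media = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            media.extend(pickle.load(f))
    return media


def build_user_tags(media):
    """Return ({user: {tag: count}}, number of unique media ids)."""
    user_tags = defaultdict(lambda: defaultdict(int))
    seen_ids = set()
    for item in media:
        if item["id"] in seen_ids:
            continue  # skip media already counted from an overlapping fetch
        seen_ids.add(item["id"])
        for tag in item["tags"]:
            user_tags[item["user"]][tag] += 1
    # Convert to plain dicts so the result prints and pickles cleanly.
    return {user: dict(tags) for user, tags in user_tags.items()}, len(seen_ids)
```

Usage, with a hypothetical path: `user_tags, n_unique = build_user_tags(load_media("data/*.pkl"))`. If `n_unique` is far smaller than the number of records loaded, the collector is fetching duplicate pages.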

I found that only 165 users produced all 61974 Instagram posts. Obviously I screwed up the way I calculated max_tag_id: as it turns out, I only got 198 unique media items. I’ll look into this later. In the meantime, I’ll use the Pearson correlation coefficient to compare users, which smooths out differences in how heavily each user weights their tags.
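
A sketch of Pearson correlation between two users, assuming each user is a hypothetical `{tag: weight}` dict and that only tags both users share are compared. Subtracting each user's mean and dividing by their spread is what makes the score insensitive to how heavily a user tags overall.

```python
from math import sqrt


def pearson(tags_a, tags_b):
    """Pearson correlation of two users' tag weights over their shared tags."""
    shared = [t for t in tags_a if t in tags_b]
    n = len(shared)
    if n < 2:
        return 0.0  # not enough overlap to correlate
    xs = [tags_a[t] for t in shared]
    ys = [tags_b[t] for t in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)
```

For example, a user with counts 2, 4, 6 and a user with counts 1, 2, 3 on the same three tags score 1.0, even though one tags twice as heavily.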

I decided to run this again 200 times on a more specific tag, where I might get multiple Instagram posts from the same user. So I used the Morrissey tag, with a dataset of 200 items. This time I got 4097 unique users and 8741 on 6633 media items.

I used like count to weight each tag: each instance of a tag adds 1, and each like adds 0.25.
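
A minimal sketch of that weighting, reading it as: each use of a tag adds 1, and each like on the post carrying that tag adds 0.25. The media dicts and their `user`, `tags` and `likes` keys are hypothetical.

```python
def tag_weights(media, tag_weight=1.0, like_weight=0.25):
    """Build {user: {tag: weight}} where each tag use adds `tag_weight`
    plus `like_weight` for every like on that post."""
    weights = {}
    for item in media:
        user_tags = weights.setdefault(item["user"], {})
        bonus = tag_weight + like_weight * item["likes"]
        for tag in item["tags"]:
            user_tags[tag] = user_tags.get(tag, 0.0) + bonus
    return weights
```

Keeping the two weights as parameters makes it easy to check how sensitive the neighbor lists are to the 0.25 choice.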

The most popular tags:

‘morrissey’, 6623
‘thesmiths’, 2435
‘moz’, 1635
‘music’, 500
‘mozarmy’, 493
‘concert’, 385
‘live’, 351
‘love’, 298
‘smiths’, 267
‘truetoyou’, 233
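
A list like this can be produced with `collections.Counter`. The sketch below assumes each count is the number of media items carrying the tag (so a tag repeated on one post counts once), which is a guess; the media dicts with a `tags` key are hypothetical.

```python
from collections import Counter


def most_popular_tags(media, k=10):
    """Count how many media items carry each tag and return the k most common."""
    counts = Counter()
    for item in media:
        counts.update(set(item["tags"]))  # set(): count each tag once per post
    return counts.most_common(k)
```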

User-Based Collaborative Filtering / Cosine Similarity & K-Nearest Neighbors

For a random user, olivia.lord, here are the 10 nearest neighbors:

[(0.655, 'luke_ellis92'), (0.655, 'lau211'), (0.617, 'pixie_xtears'), (0.567, 'ramon_maspons'), (0.567, 'alanakillsit'), (0.535, 'willpagemlir'), (0.535, 'whorissey'), (0.535, 'v17tty'), (0.535, 'trimmtrabb_'), (0.535, 'thejacobgann')]
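
A sketch of that neighbor search, assuming each user is a sparse `{tag: weight}` dict (hypothetical layout) and plain cosine similarity. Sorting `(score, name)` tuples in reverse breaks ties by reverse-alphabetical name, which matches the order of the tied 0.535 entries above, though that is only an inference.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse {tag: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(user, user_tags, k=10):
    """Return the k users most similar to `user` as (similarity, username) pairs."""
    target = user_tags[user]
    scores = [
        (round(cosine(target, tags), 3), other)
        for other, tags in user_tags.items()
        if other != user
    ]
    scores.sort(reverse=True)  # highest similarity first; ties by name, descending
    return scores[:k]
```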

I also tried recommending tags to users with Manhattan distance. That gave far fewer results: it did not return any results for olivia.lord, and for melchiano it gave only two: (‘meatismurder’, 4), (‘pescara’, 4).
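
One plausible reading of the Manhattan-distance approach, sketched under assumptions: distance is computed only over the tags two users share, and the recommendations are the closest user's tags that the target user hasn't used, returned with that neighbor's weight (matching the (tag, weight) pairs above). The function names and the `{user: {tag: weight}}` layout are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance over the tags two users share; None if they share none."""
    shared = [t for t in a if t in b]
    if not shared:
        return None
    return sum(abs(a[t] - b[t]) for t in shared)


def recommend_tags(user, user_tags):
    """Suggest tags the user's closest neighbor uses but the user does not."""
    target = user_tags[user]
    distances = []
    for other, tags in user_tags.items():
        if other == user:
            continue
        d = manhattan(target, tags)
        if d is not None:
            distances.append((d, other))
    if not distances:
        return []  # no user shares any tag with this one
    _, nearest = min(distances)
    recs = [(t, w) for t, w in user_tags[nearest].items() if t not in target]
    return sorted(recs, key=lambda pair: pair[1], reverse=True)
```

Because only one neighbor is consulted, a closest neighbor whose tags the user already has yields nothing at all, which may be one reason this approach returned so few results.
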
Item/Tag-Based Collaborative Filtering

Here are the 20 tags most similar to ‘morrissey’:

[(0.795, 'thesmiths'), (0.772, 'moz'), (0.643, 'londonisdead'), (0.613, 'losangeles'), (0.612, 'london'), (0.611, 'mozsquad'), (0.61, 'superestrella'), (0.61, 'rockentuidioma'), (0.61, u'ma\u0144a'), (0.61, 'madentertainment'), (0.61, 'kroq'), (0.61, 'elmovimientodelrock'), (0.609, 'tributoacaifanes'), (0.609, 'pergamo'), (0.609, 'mana'), (0.609, 'flashback'), (0.609, 'eastlosangeles'), (0.609, 'citiesnightlife'), (0.608, 'jaguares'), (0.608, 'citiesrestaurant')]
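
Item-based filtering flips the table around: each tag becomes a vector over users, and tags are compared with each other. A minimal sketch, assuming the same hypothetical `{user: {tag: weight}}` layout and cosine similarity (the post doesn't say which measure produced these scores):

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse {key: weight} vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def transpose(user_tags):
    """Turn {user: {tag: w}} into {tag: {user: w}}."""
    tag_users = defaultdict(dict)
    for user, tags in user_tags.items():
        for tag, w in tags.items():
            tag_users[tag][user] = w
    return dict(tag_users)


def similar_tags(tag, user_tags, n=20):
    """Return the n tags whose user vectors are closest to `tag`'s."""
    tag_users = transpose(user_tags)
    target = tag_users[tag]
    scores = [
        (round(cosine(target, vec), 3), other)
        for other, vec in tag_users.items()
        if other != tag
    ]
    scores.sort(reverse=True)
    return scores[:n]
```

Tags used by exactly the same single user end up with proportional vectors and tie at 1.0, which may explain runs of identical scores like the 1.0 entries for ‘mozfest’.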

for ‘music’:

[(0.536, 'instamusic'), (0.487, 'rock'), (0.455, 'follow4follow'), (0.44, 'instarock'), (0.44, 'igersmilano'), (0.435, 'musicarock'), (0.435, 'enjoy'), (0.434, 'igersitaly'), (0.431, 'instagold'), (0.43, 'britrock'), (0.419, 'gig'), (0.408, 'gigs'), (0.403, 'uk'), (0.395, 'tagsforlikes'), (0.394, 'teatrolinear4ciak'), (0.394, 'nightclub'), (0.393, 'nightlife'), (0.372, 'night'), (0.363, 'likeforlike'), (0.36, 'milan')]

for ‘mozfest’:

[(1.0, 'twisterella'), (1.0, 'piccadilly'), (1.0, 'ijo'), (1.0, 'hijau'), (1.0, 'bandung'), (0.577, 'underground'), (0.5, 'indonesia'), (0.2, 'pop'), (0.005, 'morrissey')]

for ‘vegan’:

[(0.715, 'savethemall'), (0.701, 'savetheanimals'), (0.701, '17ottobre2014'), (0.695, 'nikon'), (0.642, 'govegan'), (0.377, 'paladozza'), (0.311, 'worldofmortissey'), (0.311, 'thesmuths'), (0.311, 'postconcert'), (0.311, 'parnaso'), (0.311, 'igersrome'), (0.311, 'ig_rome'), (0.311, 'ig_italia'), (0.311, 'animalliberationfrobt'), (0.294, 'worldofmorrissey'), (0.294, 'liveinrome'), (0.278, 'peta'), (0.275, 'charme'), (0.257, 'concert'), (0.234, 'traplord')]

for ‘meat’:

[(0.612, 'murder'), (0.426, 'animals'), (0.408, 'free'), (0.354, 'witness'), (0.354, 'will'), (0.354, 'whoputthe'), (0.354, 'wearing'), (0.354, 'up'), (0.354, 'unhappy'), (0.354, 'typical'), (0.354, 'twitter'), (0.354, 'town'), (0.354, 'tormentors'), (0.354, 'toosoon'), (0.354, 'there'), (0.354, 'thequeenisdeath'), (0.354, 'theonlyonearoundherewhoisme'), (0.354, 'stop'), (0.354, 'speechless'), (0.354, 'smithsarmy')]

From ITP @ NYU

Meat is Murder tag recommender system for spambots
