TAGS :Viewed: 4 - Published at: a few seconds ago

[ Keeping track of users and photos they've viewed ]

I have a very high traffic site with lots of photos on it and am trying to track which photo each user has viewed.

My first instinct would be a SQL table with two columns: user_id & photo_id. But, that wouldn't scale to my traffic level, and the table would get unmanageable very quickly.
Any recommendations for anoher solution, either SQL or NoSQL (mongodb,couch,redis,...)

My code is mostly PHP if that matters.


Edit There are 10s of millions of views day.

Edit I don't need to know the total times a user viewed a particular photo, just whether its been viewed at all by that user

Answer 1

Your best bet is to create a collection with { _id:Generated automagically, pictureID, viewerID }

with find( pictureID, viewerID ).limit(1) and an index on pictureID AND viewerID will make checking super ultra fast level 99. very important to set the index. I use find().limit(1) because its faster than findOne, at least from the current benchmarks.

why not have one entry per user with an array of viewed images? because searching through the array is slower than searching for a whole document in a collection. 10 million images? no problem. this is where mongodb shines. its designed to scale for big-ass databases like yours. As long as your documents are less than 16 mb, and with 3 properties, it is :) you have little to worry about.

When you delete an image, just db.viewed.remove( { pictureID : pictureID } ) and it will remove all relating the image.

db.viewed.remove( { viewerID : viewerID } ) for when you remove a user! Don't do this when the user deletes the image or the account. Do this at maintenance time or say, once an hour. Create a collection with pendingRemovingImages and pendingRemovingUsers where you store the things you want removed. check $in to perform bulk removal by images and/or by users.

I find your question to be most exciting and I feel strongly that you should go in my direction.

Answer 2

You can give Redis a try. Redis supports PHP very well, with Redis, you can store the viewing history of a particular photo in a hashmap.

$map = 'views|' . $photo_id;
// this line is called whenever a user view a photo
$redis->hset($map, $uid, time());
// this line is called to test whether a user viewed a photo
$redis->hget($map, $uid);

Redis is fast enough. But one thing you should know about Redis is that it stores all data in memory, so if the data eventually exceed the physical memory, you have to shard the data by your own.

Also you can try SSDB(https://github.com/ideawu/ssdb), which has similar APIs to Redis, also supports PHP well(http://www.ideawu.com/ssdb/docs/php/), but store most data in disk, memory is only used for caching. That means SSDB's capacity is 100 times of Redis' - up to TBs.