Sequenz No.3 Rex tremendae & Sequenz No.5 Confutatis
posted by max on November 18, 2007 at 08:40:39 PM
Good evening pals,
It is with much pride I present to you the new YTMND starbar tonight. Not only does it provide some new functionality, but it also represents a great technical achievement.
The new starbar allows multiple score views, voting feedback, site-wide favorite recognition, single-click adding/removing of favorites, and one spectacular feat: collaborative filtering vote prediction. Read more for a long-winded description and explanation.
The original "vote bar" was one of the oldest pieces of YTMND. It was created from stolen Netflix code for a project that pre-dates YTMND by almost a year and was basically an archaic piece of junk. I had created a program to generate colored star bars and the two fit well, so it became an integral part of YTMND.
It had some major technical issues attached to it, though. One was that a user's score had to be passed from the server at page load time, and since votebars show up randomly across pages, it ended up requiring an extra database query for each bar. This has changed completely: votes now load after the page loads, turning up to hundreds of queries into one.
Another major issue was that it provided no feedback if there was a problem. From the user's perspective, you voted, and if something went wrong on the back end, you had no idea. This has changed as well.
So the next couple weeks will be a "forced beta" to see how well everything holds up. Instead of a giant wall of text, I'll try to give an overview of what the new starbar is capable of.
New Features
Multiple score views
-
This nifty feature allows us to combine two starbars into one. The prime example of this is viewing a starbar for a site you've already voted on. In the old setup, you could only view your score and had to visit the site's profile page to see what the overall score of the site was. Now when you vote on a site, the votebar figures it out and displays both the site's score and your rating. We mix the colors of the two bar types where they overlap. Your vote is blue and the site score is red, so where they overlap the bar will be purple.
Examples:
- You vote 3 on a site with a score of 4.5
- You vote 5 on a site with a score of 1.0
- You vote 2 on a site with a score of 2.5
I haven't finalized the exact colors yet, but this should give you an idea of how this feature functions. I think at first it may be hard to absorb but over time it will become an integral piece of YTMND.
Extended Security
-
Previously when a user would vote, it would hit a REST interface that at first did nothing but check if the user was logged in. This meant people could link to the interface in an iframe on their websites and cause people to unknowingly vote. After people began exploiting this, the user id number was required for the vote to register. Once this was enabled, people began writing scripts to automatically vote, and even with a user_id it was possible to make targeted users unknowingly vote on sites. The new starbar works on the same principles as the comment voting interface. We now generate a cipher specific to each user using a rolling salt that changes every few minutes. This enables us to ensure (for the most part) that users will not unknowingly vote on sites, and it makes vote scripts and bots more difficult to write. One of the unfavorable effects of this new system is that after around 20 minutes, pages expire and you will have to refresh them in order to vote. The starbar will notify you if this happens.
Voting feedback
-
With the old votebar, voting was sort of "click-and-pray" in that you had no idea if your vote was registering or not. Due to the massive amount of vote lookups, user votes were loaded from a slave database, and if database replication failed, it would look as if your votes weren't registering even if they had. While we are still going to use a slave database for vote lookups, if a vote fails, you will get a message as to why. Some instances of this are:
- You are trying to vote on or favorite a site you have not seen.
- You are trying to vote on or favorite a site you own.
- Your authentication has expired.
- An internal error (which should almost never happen).
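For the curious, here is a rough Python sketch of how an expiring, per-user vote token and the failure messages could fit together. This is not the actual YTMND code: it swaps in an HMAC where the post says "cipher", and the secret, the five-minute window, the four-window lifetime, and every function name are assumptions made up for illustration.

```python
import hashlib
import hmac

SERVER_SECRET = b"example-secret"  # hypothetical; never sent to the browser
WINDOW_SECONDS = 300               # assumed: the salt rolls every five minutes
VALID_WINDOWS = 4                  # assumed: tokens live for roughly 20 minutes


def rolling_salt(window: int) -> bytes:
    """Derive the salt for one time window from the server secret."""
    return hmac.new(SERVER_SECRET, str(window).encode(), hashlib.sha256).digest()


def make_token(user_id: int, now: float) -> str:
    """Token embedded in the page when it is rendered for this user."""
    window = int(now // WINDOW_SECONDS)
    return hmac.new(rolling_salt(window), str(user_id).encode(), hashlib.sha256).hexdigest()


def token_is_fresh(user_id: int, token: str, now: float) -> bool:
    """Accept the token if it matches the current or a recent window."""
    current = int(now // WINDOW_SECONDS)
    for age in range(VALID_WINDOWS):
        salt = rolling_salt(current - age)
        expected = hmac.new(salt, str(user_id).encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, token):
            return True
    return False


def check_vote(user_id: int, token: str, now: float,
               has_seen_site: bool, owns_site: bool) -> str:
    """Return "ok" or the message the starbar would show to the user."""
    if not token_is_fresh(user_id, token, now):
        return "Your authentication has expired."
    if owns_site:
        return "You are trying to vote on or favorite a site you own."
    if not has_seen_site:
        return "You are trying to vote on or favorite a site you have not seen."
    return "ok"
```

Because the token depends on both the user and the time window, a link pasted into an iframe on another site carries no valid token for whoever loads it, and an old page simply stops working until it is refreshed.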
Site-wide favorite recognition and vote loading
-
The starbar now checks if you've got each YTMND on your favorites list across the entire site. Once it is out of beta, any remaining areas where your votes aren't shown or voting is not allowed (such as in search results) will be updated to show your vote/allow you to vote.
Quick favorite and un-favorite
-
A new starbar addon which is appended to the end of the starbar in some places (currently only on the site itself and the site profile) allows you to add a favorite to your list with a single click. Sites that are currently on your favorite list will allow you to "unfavorite" them when you hover over their starbar.
Additionally, adding favorites has been changed so that when you add a site to your favorites list, it will automatically vote five on that site. When you "unfavorite" a site, the vote will remain. Once the starbar is out of beta, all previous favorites will be updated to five-star votes (if they were fav'd without voting, a new five-star vote will be added).
Holy shit: Collaborative filtering.
-
This is a feature I've devoted some free time to for over a year. For those of you that don't know what collaborative filtering is, it's when you take a massive amount of data on who likes what and use it to figure out what each user might like. Simply put, collaborative filtering allows us to predict what you will rate a site you haven't even seen based on how you've voted in the past.
Sadly, the majority of you have no idea what an amazingly complicated feat this is to accomplish. The small number of you that have dealt with this type of system in college or business dealings will understand how awesome it is that we are launching this on such a limited hardware platform.
<technical jargon>
-
I'll try to give an idea of how I accomplished this for the two or three of you who are interested in the technical side of this. First we gather the over twenty million YTMND votes into the memory of a C++ program I wrote which uses Simon Funk's SVD algorithm to calculate "feature" scores for each user and site. This consists of hundreds of billions of calculations, and with the full YTMND data set it takes roughly 70 minutes using 100% of a 2.4 GHz AMD Opteron and around 500 MB of memory. At this point we export the feature data to a SQL VIEW, which allows us to calculate the prediction for any given user+site combination by multiplying each site feature by the corresponding user feature and summing the results. We do this instead of storing each prediction because that would require (users*sites) rows (currently around 180 billion) in a database, most of which would never be accessed.
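To make the "feature" idea a bit more concrete, here is a toy Python sketch in the spirit of Funk's approach. It is not the C++ program described above: it trains all features at once with plain stochastic gradient descent rather than one feature at a time, and the function names, learning rate, regularization, and feature count are all assumptions picked for a tiny example.

```python
import random


def dot(a, b):
    """Sum of element-wise products of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_features(votes, n_users, n_sites, n_features=2,
                   epochs=1000, lrate=0.01, reg=0.02, seed=0):
    """Learn per-user and per-site feature vectors from (user, site, score) votes."""
    rng = random.Random(seed)
    # Small random starting values so the features can diverge from each other.
    user_f = [[rng.uniform(0.0, 0.1) for _ in range(n_features)] for _ in range(n_users)]
    site_f = [[rng.uniform(0.0, 0.1) for _ in range(n_features)] for _ in range(n_sites)]
    for _ in range(epochs):
        for user, site, score in votes:
            err = score - dot(user_f[user], site_f[site])
            for k in range(n_features):
                u, s = user_f[user][k], site_f[site][k]
                # Nudge both vectors toward a smaller error, with mild shrinkage.
                user_f[user][k] += lrate * (err * s - reg * u)
                site_f[site][k] += lrate * (err * u - reg * s)
    return user_f, site_f


def predict(user_f, site_f, user, site):
    """Prediction is the user's features times the site's features, summed."""
    return dot(user_f[user], site_f[site])


# Users 0 and 1 both like site 0; user 2 prefers site 1. User 1 never saw site 1.
votes = [(0, 0, 5), (0, 1, 1), (1, 0, 5), (2, 0, 1), (2, 1, 5)]
user_f, site_f = train_features(votes, n_users=3, n_sites=2)
guess = predict(user_f, site_f, user=1, site=1)  # a prediction for an unseen site
```

Only the two small feature tables need to be stored; any single prediction is computed on demand from one row of each.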
Due to the nature of the algorithm, a single vote change or addition/removal recurses the entire tree. For instance, if a user votes on a site, that user's features have to be recalculated based on the new information and that site's features have to be recalculated as well. Then any site that user has voted on has to have its features updated based on the new user feature scores, as does anyone who has voted on the site, and any sites they've voted on, and so on. This means that the features cannot be updated incrementally and we have to recalculate feature data in full every time. This can be done every few days or so and then imported into SQL.
Since we have to calculate predictions on the fly, doing straight top-N recommendations isn't very easy, as it requires a massive number of calculations (sites*features, currently around 20 million) and then a bubble sort on over 500,000 numbers for a single user. While it's doable, it doesn't really have any place in a production environment at the moment.
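As a sketch of what top-N would involve, the Python below scores every site for one user and keeps only the best N. Note that it uses `heapq.nlargest` in place of the bubble sort mentioned above, so it is a different selection technique, not a description of anything YTMND runs. The function name and the dictionary layout are assumptions.

```python
import heapq


def top_n_sites(user_features, site_features, n, already_voted=frozenset()):
    """Return the ids of the n sites with the highest predicted score.

    site_features maps a site id to its feature vector. Every site still
    costs one multiply-and-sum per feature, so the total work stays at
    roughly sites * features.
    """
    scored = (
        (sum(u * s for u, s in zip(user_features, feats)), site_id)
        for site_id, feats in site_features.items()
        if site_id not in already_voted
    )
    # nlargest keeps only n candidates in memory instead of sorting everything.
    return [site_id for _, site_id in heapq.nlargest(n, scored)]


sites = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
best = top_n_sites([2.0, 1.0], sites, n=2)  # ["c", "a"]
```

This trims the sorting cost, but the per-site prediction work is unchanged, which is the expensive part for a single user.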
We can create an accuracy score for each user by getting a list of their entire vote history and then comparing each vote to its corresponding prediction. So if I vote 5 on a site and the prediction was 4.5, it was off by 0.5. When we average that difference over all votes, we can create a score from zero to five based on how far off the predictions are on average. So to summarize, it's one gigantic hack.
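The accuracy calculation is simple enough to sketch in a few lines of Python. The averaging step follows the description above; turning the average into a zero-to-five score by subtracting it from five is purely an assumption for illustration, since the exact mapping isn't spelled out here.

```python
def average_error(votes, predictions):
    """Mean absolute difference between each vote and its prediction.

    Both arguments map a site id to a score; only sites present in
    both are compared.
    """
    shared = [site for site in votes if site in predictions]
    if not shared:
        raise ValueError("no votes with a matching prediction")
    return sum(abs(votes[s] - predictions[s]) for s in shared) / len(shared)


def accuracy_score(votes, predictions, scale=5.0):
    """Hypothetical mapping: 5 means spot on, 0 means wildly off."""
    return max(0.0, scale - average_error(votes, predictions))


my_votes = {"site1": 5, "site2": 3}
my_predictions = {"site1": 4.5, "site2": 4.0}
print(average_error(my_votes, my_predictions))   # 0.75
print(accuracy_score(my_votes, my_predictions))  # 4.25
```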
</technical jargon>
How this will affect you
-
The system is very specific to your voting history. This means that while some users will get really good predictions, others will get awful predictions. There are two major factors in how accurate your predictions will be: how many votes you've made in the past, and how many votes have been made on each site you get a prediction for.
Simply put, the closer your votes are to how you actually feel a site should be rated, the better your predictions will be. If you make a lot of five-star votes in the hopes that other people will return the favor, your predictions will be bad. If you down-vote out of spite, your predictions will be bad. If you have made very few votes or the site you are getting a prediction for has very few votes, it's likely the prediction will be off. So if you haven't been voting honestly, this system will be almost totally useless.
Depending on how busy the system is, it may take a while to calculate predictions for you. This means that your previous votes and your predictions may take a while to show up on votebars. Depending on how badly YTMND shits the bed over the next few days, I may make predictions an option you can turn off if you feel they aren't useful for you. I will also add something for you to see how accurate your predictions will be on average once we are out of beta.
How this will be used
-
Since the system is so heavily based on vote history, and predictions are only updated at 24 hour intervals at most, this is really not very useful on the front page since the majority of sites there are relatively new and predictions would frequently be inaccurate. The main use of this feature is for browsing large lists of sites like on user profile pages or search results. Predictions will allow you to quickly figure out what sites you may enjoy more than others. The new color for predictions will be gold, which will mix with the current color of scores which is red so the resulting overlap color will be orange. You will not get predictions for sites that you've voted on/favorited already.
Examples:
- A site with a score of 4.5 with a custom prediction of 3.0
- A site with a score of 1.0 with a custom prediction of 5.0
- A site with a score of 2.5 with a custom prediction of 2.0
As you can see, the prediction mix color is subtle. This was done on purpose, as we don't want to influence people's decisions so much as help them sift through a lot of garbage.
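In case the overlay logic is hard to picture, here is a small Python sketch of how a combined bar could be split into colored segments. The color names come from this post (red score, blue vote, gold prediction, purple and orange overlaps); the function name and the (start, end, color) tuple layout are assumptions, and the real starbar may well be drawn differently.

```python
# Overlay color and the mixed color shown where it overlaps the red site score.
OVERLAY_COLORS = {
    "vote": ("blue", "purple"),
    "prediction": ("gold", "orange"),
}


def bar_segments(site_score, overlay_score, overlay_kind):
    """Split a combined starbar into (start, end, color) segments."""
    overlay_color, mixed_color = OVERLAY_COLORS[overlay_kind]
    overlap = min(site_score, overlay_score)
    segments = []
    if overlap > 0:
        # Both bars cover this stretch, so the colors mix.
        segments.append((0.0, overlap, mixed_color))
    if site_score > overlay_score:
        # The site score extends past the overlay: plain red.
        segments.append((overlap, site_score, "red"))
    elif overlay_score > site_score:
        # The overlay extends past the site score: plain overlay color.
        segments.append((overlap, overlay_score, overlay_color))
    return segments


print(bar_segments(4.5, 3.0, "prediction"))
# [(0.0, 3.0, 'orange'), (3.0, 4.5, 'red')]
```

The same function covers the vote case: pass "vote" and the overlap comes out purple with any remainder in blue or red.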
If you encounter any problems or bugs, post a comment here and over the next day I'll try to hammer out any problems. Moving on...
Much as I expected, the limited number of you technically capable of understanding the API had little to no interest in using it. There have been three entries to the API Contest so at this point in time, everyone who entered will get a "prize". You still have a couple weeks to enter, so come up with an idea and enter the contest so I don't feel like writing the API was a total waste.
I know YTMND is suffering from "broken-window" syndrome, and I've been working on it a lot, making many small improvements on both the back and front end of the site. Obviously the main issue is now moderation, which is going to be the primary focus as I am now actively going over the technical design and coming up with a system that will be far more self-sufficient than the current "moderators clean up after users" setup.
Hurray for gigantic news posts.