Surfacing Interesting Content

Heyzap is The Largest Social Network for Mobile Gamers™, and as such, we get lots of user generated content. This is a collection of algorithms we’ve been using to find the most interesting content. A lot of this is likely to be obvious, but I spent a bunch of time on the visualizations, so read it anyways.

Popular Stream

Do you need to know which songs your users are listening to? Which tags are trending on Twitter? No need to break out a cron job; this algorithm will keep you up to date in real time.


[Interactive visualization: Half-life, Vote Rate, and Vote Distribution controls]

What it Does

At Heyzap, we use this algorithm to display popular games. Each time a user plays a game, we cast a “vote” for that game. Each vote has a “score”, which decays with age. For our popular games, votes decay at a rate of 50% per week. To display the most popular games, simply add up all the scores for all the votes. Using this algorithm, a game played 20 times this week will be ranked “more popular” than a game played 30 times last week, and less popular than a game played 50 times last week.
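To make that concrete, here is a quick back-of-the-envelope check of those numbers with a one-week half-life (throwaway Ruby, not part of the production code):

HALF_LIFE = 7 * 24 * 3600  # one week, in seconds

# value of a single vote cast `age` seconds ago
decayed = ->(age) { 0.5 ** (age.to_f / HALF_LIFE) }

20 * decayed.call(0)          # => 20.0  played 20 times just now
30 * decayed.call(HALF_LIFE)  # => 15.0  played 30 times a week ago
50 * decayed.call(HALF_LIFE)  # => 25.0  played 50 times a week ago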

To track trending hashtags, just replace “games” with “hashtags”, and cast a vote each time a tag appears. You could also use this algorithm to track word frequencies in newspapers, or which countries are visiting your site.

In the visualization above, votes are cast randomly at a set of items. The orange bars indicate the current “popularity score” of each item, and the red bars indicate the probabilistic rate at which each item should accrue new votes.

The longer the half-life, the slower the algorithm will respond to new votes. At the extreme ends, a half-life of zero would answer “Which post was most recently voted on?”, while a half-life of infinity would answer “Which post has the most votes?”.

How it works

A straightforward implementation might be:
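Something along these lines, say (a sketch only; the Post model, popularity column, and job name here are illustrative, not Heyzap's actual code):

# Cron-style decay: each vote adds 1 to a post's popularity,
# and a scheduled job periodically halves every score.
class DecayPopularityJob
  def self.run
    # run once per half-life (e.g. weekly via cron)
    Post.update_all("popularity = popularity * 0.5")
  end
end

def record_play(post)
  post.increment!(:popularity)  # one play = one vote
end

# "most popular" is then just: Post.order("popularity DESC").limit(20)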

You would probably actually want to multiply popularity scores by 0.97 each hour to minimize weird transients and make the decay more continuous.

However, I loathe cron jobs, and have an easier method.
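The trick, in a sentence: instead of decaying every existing score, make each new vote exponentially more valuable the later it is cast, relative to a fixed epoch. Roughly (a sketch of the scoring rule used in the implementation below):

# a vote cast at time t is worth 2^((t - EPOCH) / HALF_LIFE)
def vote_score(t, epoch, half_life)
  2 ** ((t - epoch).to_f / half_life)
end

# Dividing every score by 2^((now - EPOCH) / HALF_LIFE) would recover the
# "decayed" totals, but that factor is the same for every item, so the
# ranking never changes and existing scores never need to be touched.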

As we only care about the rank of the popular items, the only difference between the outputs of the two implementations is that this one is perfectly continuous, as opposed to the stuttering decay of the cron variation.

One drawback to the continuous implementation is float overflow. With a carefully chosen epoch, we can make use of a double-precision float’s 11 exponent bits to allow the algorithm to run for roughly 2048 half-lives. If your half-life is one day, you can run the algorithm for five years before needing to migrate the epoch.

In all my examples, I’m using Redis as an external index. You could add a column and an index to your posts table, but it’s probably huge, which means that will be a pain. Additionally, since we only care about the most popular items, we can save memory by only indexing the top few thousand items.

If you’re not familiar with Redis, I’m using ZSETs. ZSETs are sorted sets: half-array, half-dictionary. The value in the dictionary corresponds to the key’s relative “index” in the array. They have O(log N) inserts, O(log N) slices, and are indexed by double-precision floats, which makes them perfect for this implementation.
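If you haven’t used them, here’s a quick redis-rb example (the key and member names are just for illustration):

require "redis"
redis = Redis.new

# zadd(key, score, member); zincrby adds to an existing member's score
redis.zadd("example_zset", 3.0, "game:42")
redis.zincrby("example_zset", 1.5, "game:42")   # score is now 4.5

# members ordered by score, highest first
redis.zrevrange("example_zset", 0, 9)           # => ["game:42", ...]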

Implementation

class PopularStream
  STREAM_KEY = "popular_stream"
  HALF_LIFE = 1.day.to_i

  # 2.5 * half_life (in days) years from now
  EPOCH = Date.new(2015, 10, 1).to_time.to_i
 
  def self.onVote(post)
    # dict[post.id] += value
    REDIS.zincrby(STREAM_KEY, 2 ** ((Time.now.to_i - EPOCH).to_f / HALF_LIFE), post.id)
    trim(STREAM_KEY, 10000)
  end

  def self.get(limit=20)
    # arr.sort.reverse[0, limit]
    REDIS.zrevrange(STREAM_KEY, 0, limit - 1).map(&:to_i)
  end

  def self.trim(key, n)
    # arr = arr[-n, n] -- keep only the top n, trimming probabilistically to save round trips
    REDIS.zremrangebyrank(key, 0, -(n + 1)) if rand < (2.to_f / n)
  end

  # run this in five years
  # you could make EPOCH and STREAM_KEY dynamic
  # to make this process easier. Otherwise migrate and deploy the new values
  def self.migrate(new_key, new_epoch)
    REDIS.zunionstore(new_key, [STREAM_KEY], :weights => [2 ** ((EPOCH - new_epoch).to_f / HALF_LIFE)])
  end
end

Hot Stream

If the age of the post is more relevant than the age of the votes, we can simplify things considerably by treating all votes as though they were cast at the time the post was created. This is the algorithm used by Reddit’s front page.

What it does

If we start the decay for all votes on a post at the same time, we can simplify the formula for a post’s score to:

post_creation_time / half_life + log2(votes + 1)

In the visualization below, votes are cast randomly on a series of posts. Each column represents the “hot” score of one post. The tallest column would be the #1 post on the “hot” page, the second tallest #2, and so on.


[Interactive visualization: Half Life, Vote Rate, and Vote Distribution controls]

How it works

As I’ve tried to show in the picture above, adding a constant to log(votes) is the same as multiplying the votes by a constant: log(c) + log(n) = log(c*n). So each additional half-life of post age doubles the effective power of a post’s votes, giving us the same decay we had in the previous algorithm.

This means we don’t have to worry about overflows anymore!
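As a quick sanity check of that equivalence (creation time measured in half-lives, votes under a base-2 log):

# hot score: creation time (in half-lives) plus a base-2 log of votes
hot = ->(created_at_in_half_lives, votes) { created_at_in_half_lives + Math.log2(votes + 1) }

hot.call(1, 10)   # a post created one half-life later, with 10 votes => ~4.46
hot.call(0, 21)   # the older post needs about double the votes       => ~4.46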

Implementation

class HotStream
  STREAM_KEY = "hot_stream"

  # How long until a post with 100 votes is less interesting than one with 10 votes?
  # Reddit uses 12 hours
  TENTH_LIFE = 12.hours.to_f
  
  # just to make it clear it's still the same algorithm
  HALF_LIFE = TENTH_LIFE * Math.log(2) / Math.log(10) # ~3.6 hours

  def self.onVote(post)
    # dict[post.id] = value
    REDIS.zadd(STREAM_KEY, post.created_at.to_i / TENTH_LIFE + Math.log10(post.votes + 1), post.id)
    trim(STREAM_KEY, 10000)
  end
 
  def self.get(limit = 20)
    # arr.sort.reverse[0, limit]
    REDIS.zrevrange(STREAM_KEY, 0, limit - 1)
  end

  def self.trim(key, n)
    # keep only the n highest-scored entries; trim probabilistically to save round trips
    REDIS.zremrangebyrank(key, 0, -(n + 1)) if rand < (2.to_f / n)
  end
end

Drip Stream

This algorithm uses the same decay used in the hot stream, plus a threshold to create a Digg-like, rate-limited, append-only stream.

What it does

Whenever a new post crosses the threshold, the threshold is incremented by the “drip period”, and the post is added to the drip stream. Since we’re constantly increasing the base score of each new post, a new post should be added to the stream once per drip period.

In the visualization below, votes are cast randomly on a series of posts. Each column represents the “hot” score of one post. The threshold is marked with a horizontal red line. As posts cross the threshold and are added to the drip stream, they are marked red.


[Interactive visualization: Half Life, Drip Rate, Vote Rate, and Vote Distribution controls]

Implementation

class DripStream
  STREAM_KEY = "drip_stream"
  THRESHOLD_KEY = "drip_stream_threshold"

  # How long until a post with 100 votes is less interesting than one with 10 votes?
  # Reddit uses 12 hours
  TENTH_LIFE = 12.hours.to_f

  # How often should a new story be pushed to the stream?
  DRIP_PERIOD = 1.hour.to_f

  def self.newVote(post)
    return if REDIS.zscore(STREAM_KEY, post.id)

    score = post.created_at.to_i / TENTH_LIFE + Math.log10(post.votes + 1)
    threshold = (REDIS.get(THRESHOLD_KEY) || score).to_f

    if score > threshold
      # bump the threshold by one drip period's worth of score
      REDIS.set(THRESHOLD_KEY, threshold + DRIP_PERIOD / TENTH_LIFE)

      # dict[post.id] = value
      REDIS.zadd(STREAM_KEY, Time.now.to_i, post.id)

      trim(STREAM_KEY, 10000)
    end
  end
 
  def self.get(limit = 20)
    # arr.sort.reverse[0, limit]
    REDIS.zrevrange(STREAM_KEY, 0, limit - 1).map(&:to_i)
  end
  def self.trim(key, n)
    # keep only the n most recent entries; trim probabilistically to save round trips
    REDIS.zremrangebyrank(key, 0, -(n + 1)) if rand < (2.to_f / n)
  end
end

Friends Stream

This creates a twitter-like stream of people/places/things you are following.

Isn’t that trivial?

Sure, usually. That’s why it’s at the end.

SELECT * FROM posts WHERE user_id IN (7,23,42,...) ORDER BY created_at LIMIT 20

Unfortunately, as you scale, IN queries get slow. Mongo, for example, pulls down 20 posts from each user, sorts them all by hand, then crops. When users follow thousands of other users, that gets slow. The SQL databases I tried at the time didn’t cut it either.

I no longer have the benchmarks. Don’t take my word for it. Just remember this is here if you start seeing thousand-entry IN queries in your slowlog.

How it works

The active ingredient is a ZSET of all users and their most recent post. That ZSET can be quickly intersected with the set of followed users, then sliced to create a list of recently active people you follow.

In this implementation, I’m using the actives list to union ZSETs containing each user’s stream. You could just as easily use the list to pare down the arguments to your IN query.
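For what it’s worth, that IN-query variant might look roughly like this (a sketch assuming the same Redis keys and an ActiveRecord Post model; the full Redis-only implementation follows below):

def self.get_via_sql(user, limit = 20)
  REDIS.zinterstore(USER_ACTIVE_FRIENDS_KEY[user.id], [ACTIVE_USERS_KEY, USER_FRIENDS_KEY[user.id]])
  active_friends = REDIS.zrevrange(USER_ACTIVE_FRIENDS_KEY[user.id], 0, limit - 1)

  # the IN list now holds at most `limit` recently active friends,
  # instead of every user being followed
  Post.where(:user_id => active_friends).order("created_at DESC").limit(limit)
end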

Implementation

class FriendsStream
  USER_STREAM_KEY = lambda{|user_id| "user_stream_#{user_id}"}
  USER_FRIENDS_KEY = lambda{|user_id| "user_friends_#{user_id}"}
  USER_ACTIVE_FRIENDS_KEY = lambda{|user_id| "user_active_friends_#{user_id}"}
  FRIENDS_STREAM_KEY = lambda{|user_id| "friend_stream_#{user_id}"}
  ACTIVE_USERS_KEY = "active_users"

  def self.follow(user, to_follow)
    REDIS.sadd(USER_FRIENDS_KEY[user.id], to_follow.id)
  end

  def self.push(post)
    REDIS.zadd(USER_STREAM_KEY[post.user_id], post.created_at.to_i, post.id)
    trim(USER_STREAM_KEY[post.user_id], 40)
    REDIS.zadd(ACTIVE_USERS_KEY, post.created_at.to_i, post.user_id)
    trim(ACTIVE_USERS_KEY, 10000)
  end

  def self.get(user, limit=20)
    REDIS.zinterstore(USER_ACTIVE_FRIENDS_KEY[user.id], [ACTIVE_USERS_KEY, USER_FRIENDS_KEY[user.id]])
    active_friends = REDIS.zrevrange(USER_ACTIVE_FRIENDS_KEY[user.id], 0, limit - 1)
    return [] if active_friends.empty?
    REDIS.zunionstore(FRIENDS_STREAM_KEY[user.id], active_friends.map(&USER_STREAM_KEY))
    REDIS.zrevrange(FRIENDS_STREAM_KEY[user.id], 0, limit - 1).map(&:to_i)
  end
  def self.trim(key, n)
    # keep only the n most recent entries
    REDIS.zremrangebyrank(key, 0, -(n + 1))
  end
end

Hiring Plug

Like everyone else, Heyzap is always hiring great engineers. If you found this interesting, or better yet obvious, drop us an email. Make sure to mention you read this article. I think I get a bonus.

Email: jobs@heyzap.com

About Us: heyzap.com/about

How to Ace a Startup Engineering Interview Part 1

In the last 5 years I have interviewed hundreds of engineering candidates, and I thought it would be valuable to write down my thoughts on how people could interview better. This will:

  1. Help people be prepared for their next interview

  2. Help connect engineers to the correct jobs.

  3. Hopefully lead to better engineers in the world :)

There are two main categories that we judge engineers on: technical skills and non-technical skills.

In Part 1, I will focus on technical skills.

Technical Skills for an Engineer at a Startup

I think about practical skills across 3 main principles:

3 Main Principles of Technical Skills

  1. Theoretical knowledge

  2. Practical coding ability

  3. Relevant tech experience

These are all important.

Preparing for a job interview is the perfect opportunity to take yourself to the next level as a programmer and improve all these abilities.

Improve your Theoretical knowledge

Learn C

C is a fundamental building block of programming. As such, knowing C gives you a strong base for understanding the higher-level concepts you will likely need to know as a programmer in a startup.

Having a basic working knowledge of C does not require too much work and can be picked up in a couple of weeks. C does not have much abstraction above assembly, which means it is relatively simple. Like many others, I learned most of C by reading The C Programming Language.

Once you get your head around pointers and memory management, C is a fun language as it takes you close to the metal of what a processor does and has principles like pointer management that do not exist in most other higher level languages. Understanding these concepts will also help you understand how higher level languages work.

Learn about Data Structures and Algorithms

Although it is relatively rare in web/app development to code up complex algorithms and data structures, studying data structures and algorithms has had more effect on my thought processes around building complex systems than anything else. Even if you are not making your own data structures, you will be making choices about how to use data structures every day of your programming career, so understanding the basics is crucial.

I recommend the book Introduction to Algorithms. It is quite long, so if you don’t have time to read it all, the sections on sorting, hash tables, binary trees, and string matching are highly recommended.

Once you have a good grasp of data structures and algorithms, you’ll find these concepts put just about everything you do as an engineer into context.

Comparative Programming Languages

In a startup, you will often be touching many languages across the technology stack. You may also be required to learn new languages and concepts quickly. To speed up learning, it helps to have a good understanding of various language concepts so you can quickly see their similarities. The best approach is to learn one language in every major style:

If you haven’t touched a language in one of these categories, it can feel as if you are learning to program all over again when you try it for the first time, which can be a fun experience.

Have an Expert Level Knowledge in at least one Language and Framework

Expert knowledge in a specific language or framework demonstrates:

  1. Your ability to become an expert, which suggests there is no reason why you would be unable to become an expert in other areas as well.
  2. You understand some of the nuances involved with languages/frameworks and can make decisions on pros and cons of different tech.
  3. You have the passion to go deep on subjects and get to the heart of a language.

It takes time and energy to get to an expert level. Here are some tips to help.

Improve your Practical Coding Ability

There are three ways we look at an engineer’s practical coding ability:

  1. Code examples from their contributions to open source and other projects (for example, on Github)
  2. On the spot coding questions on a whiteboard or remotely over an Etherpad clone
  3. A 2 hour long coding challenge

The quality and speed of an engineer’s coding ability is important and can only come with practice.

It is also helpful to use the appropriate language or framework for the job. Scripting languages like Python and Ruby can be quicker to program in than static languages like Java/C++ for many situations. Since we understand that an interviewee might not have had much experience in a dynamic language, we try to factor out its importance, but it still has influence.

Doing coding challenges can also help hone your skills. You can find a number of resources online and can often get a benchmark of how fast you are, so that you can further optimize your speed. Google Code Jam is a good source.

Improve your Relevant Tech Experience

Having relevant experience is important in multiple ways:

You can get this experience from your previous employer or from side projects.

Side projects are really good signals for us. They show us:

  1. You have a passion for technology and for building things
  2. You can generally explore more modern technologies and have a more well-rounded perspective
  3. You practice building things fast in a new space, which is exactly what you will be doing at a startup.

These don’t have to be complex, massive projects to make an impact. Some types of side projects that are quick and show off your experience:

It is fun to do a side project that explores some new technology or framework, like node.js, backbone, or bootstrap. If a candidate comes in with a good repertoire of side projects and can talk intelligently about what they learnt while doing them, it is a great positive signal.

Conclusions

I am looking forward to talking to candidates who can ace all of these areas. Of course, the technical side is just one side of the coin. I will explore the non-technical side in Part 2.

Heyzap is hiring great engineers. If you have some Android, iOS or Rails skills, even better. Get in touch with us at jobs@heyzap.com and check us out at heyzap.com/about.

Continuous Cache Warming for Rails

Warning: what you are about to see will offend your sensibilities as an engineer. It should never be used in production or for user-facing or critical client purposes. It is terrible, no-good, very-bad hackiness, and it also kinda works. I just hope there’s not some simple way to solve this that I totally missed. Now that you’ve been warned, read on!

There’s a problem with Rails page caching – the page has to fully load once in somebody’s browser before the cache kicks in. That means somebody’s request is really slow, or worse, times out. Not so good. Wouldn’t it be nice if something in the background could do all of a page’s data operations, render the view, and then update the cache for that page, without users ever noticing? Well, check this out.

You can render a page in irb via this method:

  app.get 'https://www.mywebsite.com/slow/endpoint'; true
  response = app.response; true

I use ; true to prevent irb from dumping huge amounts of request / response data onto the console.

Unfortunately, I couldn’t find any way of doing this outside irb that wasn’t very complicated. Our slow endpoint was on a back-end administrative page only; faking the session data in curl would have been annoying. Also, it was exceeding the timeout limits of our production server. So I had to add an ugly line in: Admin::SlowController.set_timeout 60000, which has to be run in the Rails environment. So, irb it is! Or script/console in this case.

Here’s the rundown. I set up a shell script cache_generator.sh to run irb with a ruby file, like so:

    #!/bin/bash
    script/console production < worker/cache_page.rb

Then set up a Ruby script cache_page.rb to do some crazy runtime fiddling to prepare our environment, render the page, and stuff it in Redis (our cache of choice):

    Admin::SlowController.skip_before_filter :admin_required, :only => [:long]
    Admin::SlowController.set_timeout 60000
    app.get 'https://www.heyzap.com/admin/slow/long?ignore_cache=true'; true
    response = app.response; true
    REDIS.hset('page_caches', 'long', response.body)

This ignores the required filters, runs as long as it needs to, renders the page and stores it. Now all that’s left is to display it. In our controller method, I just added the following at the top:

    def long
      # serve the cached page unless ?ignore_cache is set
      if (page = REDIS.hget('page_caches', 'long')) && !params[:ignore_cache]
        render :inline => page and return
      end
      # ...
      render :action => :long
    end

That render :action => :long at the end is important - that’s necessary for the script/console code to work correctly. You might also want to skip the cache if there’s a fancy query string or something - maybe if params.count > 2, so if there are any crazy options the option-less cache won’t be used.
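For instance, that guard might look something like this (same hypothetical cache keys as above; the count of 2 allows for the :controller and :action keys Rails always includes in params):

    def long
      # only serve the cache for plain requests: no ?ignore_cache and no extra options
      plain_request = !params[:ignore_cache] && params.count <= 2
      if plain_request && (page = REDIS.hget('page_caches', 'long'))
        render :inline => page and return
      end
      # ...
      render :action => :long
    end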

Last step, set up a cron job to run your script every now and then. I set ours for every five minutes, so we won’t be too far off real data.

    */5 * * * * cd /home/deploy/myapp/current && worker/cache_generator.sh

And boom! Your page now re-generates regularly, and is cached whenever users actually hit it. Wacky, hacky, and fun!

How to Write a Self-Documenting API

Since the launch of Heyzap, we have been collecting a ton of interesting social data around mobile games on Android and iPhone. We think there is a lot of potential in using this data to discover interesting things around games and we wanted to expose that to the world through our API. Even though our mobile apps connect to our internal API, making a new API was not as easy as exposing the internal mobile API, because:

So, when our internal hack day was approaching, I thought it was a perfect idea to try to launch an API and invest some time in learning how best to build it.

You can check out the first version of the API documentation and go to the root of the API to start exploring.

Here is a cool usage of the API that Chris did for our hack day: Live Check-in map

The Inspiration

Heyzap has previously had several internally and externally facing APIs so I was aware of the mistakes I had made. Here are some that were foremost in my mind:

Before starting the Heyzap API, I spent several hours researching blog posts and looking at how other APIs were written. Here are the resources that I drew most inspiration from

Considerations for Self-Documenting APIs

There are several aspects to making an API self-documenting. Here are the things we considered: the URL structure, resource roots and cross-linking, short and long form objects, and the use of request and response headers.

These are discussed more in depth below.

URLs

A lot of thought was put into the URL structure, and I invested a reasonable amount of time in the routes file before writing a line of code. Here is what we ended up with:

  map.namespace :api do |api|
    api.namespace "v1" do |version|
      # root
      version.connect "", :controller => "base", :action => :index

      version.resources :games, :only => [:index, :show], :member => {:players => :get}, :controller => "games" do |games|
        games.resources :ios, :only => [:index, :show], :collection => {:search => :get, :trending => :get, :popular => :get}
        games.resources :android, :only => [:index, :show], :collection => {:search => :get, :trending => :get, :popular => :get}

        games.resources :activity, :only => [:index], :collection => {:checkins => :get, :questions => :get, :tips => :get}, :controller => "games/activity"
      end

      version.connect "users/:id", :controller => "users", :action => :show
      version.resources :users, :only => [:index, :show], :collection => {"search" => :get},  :member => {:badges => :get, :followers => :get, :following => :get,
                                                           :games => :get, :boss_of => :get}, :controller => "users" do |users|
        users.resources :activity, :only => [:index], :collection => {:checkins => :get, :questions => :get, :tips => :get}, :controller => "users/activity"
      end
      version.resources :activity, :only => [:index, :show], :member => {:checkins => :get, :questions => :get, :tips => :get, :badges => :get, :bossings => :get}, :controller => "activity"
    end
  end

At least 80% of this was written before a line of code was written.

The Resource Roots and Cross Linking

Normally, when I’m designing an API, I don’t put anything in the root URL or in places where I don’t expect to give data. But to make this API sufficiently self-documenting, it returns URLs to relevant API endpoints wherever possible. This is the root response:

{
    "activity_api_url": "http://www.heyzap.com/api/v1/activity",
    "games_api_url": "http://www.heyzap.com/api/v1/games",
    "users_api_url": "http://www.heyzap.com/api/v1/users"
}

And if you happen to click on the games_api_url you get:

{
    "platforms_allowed": {
        "android_url": "http://www.heyzap.com/api/v1/games/android",
        "ios_url": "http://www.heyzap.com/api/v1/games/ios"
    }
}

In this way, without reading any documentation, you can explore most of the API. You of course need a JSON viewer plugin to do this.

Additionally, whenever a different object is referenced in the API, there is an API URL to it, so you can explore that way, too.

The Short and Long Form Objects

To keep the API readable, all endpoints that return an array of objects return them in a short form, which always includes a URL labeled “url” pointing to the full object resource.

This is a practice that GitHub follows, and I really liked how it gives the necessary information at the array level along with URLs to the full information. Twitter tended to just give an array of raw object IDs, which was less self-documenting.

Use of Request and Response Headers

There are pros and cons to the use of request/response headers to convey a lot of information. Strictly speaking one could do all of the following with appropriate headers

Although REST gives this ability, I think at times it can hurt readability and self-documentation. The two things I did handle through headers were status codes and rate limiting, but in both cases the response body also repeated the information.

Some Compromises

To get this done in one day while still being scalable, I only exposed data that has been hit at scale on our mobile apps. This meant that, for example, activity streams come in iOS or Android flavors, but you can’t currently get a combined activity feed. Obviously the alternatives could have been built, but it wasn’t within the scope of a day to make and test new ways of accessing the data.

I am not fully happy with the way we handle pagination. Our mobile app uses infinite scrolling everywhere so it only really needs next_page_url. I would have preferred to encourage all people to skip through real pages with readable page numbers.

Initially the API is read only. Setting up write capabilities end-to-end was again not within the scope of a day.

Conclusion

The API is just launching today so I am sure there will be things that we learn in real world usage that we didn’t anticipate. We will probably move over our mobile apps to use this API or something very similar as soon as we add write capabilities.

To start using the API read the API documentation and go to the root of the API to start exploring.

Sunspot-Resque Session Proxy

We use Sunspot and Solr for search and indexing. Solr is an Apache Foundation project that provides an easy standalone interface to the Lucene search engine. Sunspot and its partner gem sunspot_rails provide an easy Ruby interface to the Solr server and hook into ActiveRecord and the Rails request lifecycle to update the search index in Solr automatically. For most setups, following the add sunspot to your app in 5 minutes tutorial should work just fine.

But when you’re dealing with millions of records over many different types of data, and rather specific searching needs, the 5-minute setup doesn’t quite cut it. At our rate of throughput, Sunspot was generating way too many commits to the search index, making it impossible to search. Also, if there was any kind of error in Sunspot, it would raise an exception that halted the current request. These two problems compounded each other and made search basically unusable.

To deal with the errors halting the request, we decided to delay tasks like indexing and committing until after the request. That way the commit errors would disappear, and we’d only see web request errors on things that actually interfere with the web request - actual searches. After all, if you change your username, it isn’t critical that it be searchable in real-time; waiting a few seconds or minutes in that case is fine. While the 3rd-party sunspot_index_queue library does this, it didn’t integrate with our existing worker processes, which all run through Resque. So we had to write an integration ourselves.

sunspot_rails uses a “session” object to talk to Solr. The standard built-in session object makes an HTTP request to Solr for each request that it receives. We wanted a session object that only made a request for data needed right now (i.e. actual searches) and shipped everything else to a Resque worker class. sunspot_rails provides an example session object in stub_session_proxy, so we started from there and wired up the methods to do what we wanted. Here’s how it looks right now:

require 'hoptoad_notifier'
require "#{RAILS_ROOT}/solr/lib/sunspot_worker.rb"

module Sunspot
  module SessionProxy
    class ResqueSessionProxy < AbstractSessionProxy
      attr_reader :search_session

      delegate :new_search, :search, :config,
                :new_more_like_this, :more_like_this,
                :delete_dirty, :delete_dirty?, :dirty?,
                :to => :search_session

      def initialize(search_session = Sunspot.session)
        @search_session = search_session
      end

      def rescued_exception(method, exception)
        HoptoadNotifier.notify(exception)
        $stderr.puts("Exception in SunspotSessionProxy\##{method}: #{exception.message}")
      end

      [:index!, :index, :remove!, :remove].each do |method|
        module_eval(<<-RUBY)
          def #{method}(*objects)
            missed_objects = []
            objects.each do |object|
              if(object.is_a? ActiveRecord::Base)
                Resque.enqueue SunspotWorker, :#{method}, {:class => object.class.name, :id => object.id }
              else
                missed_objects << object
              end
            end
            begin
              @search_session.#{method}(missed_objects) unless missed_objects.empty?
            rescue => e
              self.rescued_exception(:#{method}, e)
            end
          end
        RUBY
      end

      [:remove_by_id, :remove_by_id!].each do |method|
        module_eval(<<-RUBY)
          def #{method}(clazz, id)
            Resque.enqueue SunspotWorker, :remove, {:class => clazz, :id => id}
          end
        RUBY
      end

      def remove_all(clazz = nil)
        Resque.enqueue SunspotWorker, :remove_all, clazz.to_s
      end

      def remove_all!(clazz = nil)
        Resque.enqueue SunspotWorker, :remove_all, clazz.to_s
      end

      [:commit_if_dirty, :commit_if_delete_dirty, :commit].each do |method|
        module_eval(<<-RUBY)
          def #{method}
            Resque.enqueue(SunspotWorker, :commit) unless ::Rails.env == 'production'
          end
        RUBY
      end
    end
  end
end

We ship every recognized action to the Resque worker, and only send HTTP requests for unrecognized actions. You’ll notice we also catch exceptions in a begin / rescue block and send them to our exception catching service, Hoptoad, which has since been renamed to Airbrake. Now all we needed was a Resque worker that would handle the input and do the actual processing. Here’s what that looks like:

require 'resque-retry'

class SunspotWorker
  extend Resque::Plugins::ExponentialBackoff
  @queue = :solr_index

  def self.perform(sunspot_method, object = nil)
    sunspot_method = sunspot_method.to_sym
    object = object.with_indifferent_access if object.is_a? Hash

    session = Sunspot.session
    Sunspot.session = Sunspot::Rails.build_session
    case sunspot_method
    when :index
      self.index( object[:class].constantize.find(object[:id]) )
    when :remove
      self.remove_by_id(object[:class], object[:id])
    when :remove_all
      self.remove_all(object)
    when :commit
      self.commit
    else
      raise "Error: undefined protocol for SunspotWorker: #{sunspot_method} (#{object})"
    end
    Sunspot.session = session
  end

  def self.index(object)
    Sunspot.index(object)
  end

  def self.remove_by_id(klass, id)
    Sunspot.remove_by_id(klass, id)
  end

  def self.remove_all(klass = nil)
    klass = klass.constantize unless klass.nil?
    Sunspot.remove_all(klass)
  end

  def self.commit
    # on production, use autocommit in solrconfig.xml 
    # or commitWithin whenever sunspot supports it
    Sunspot.commit unless Rails.env == 'production'
  end
end

Since our default Rails environment sets the sunspot session to our proxy-queuing object, we have to manually change the connection to a standard session object that actually talks to Solr. We do that using the sunspot_rails build_session method, and then put it back when we’re done.

You’ll also notice we don’t allow commits, either in the proxy or in the worker in our production environment. This handily takes care of the “too many commits” issue, which usually reports itself as “MaxWarmingSearchers Exceeded”. We use Solr’s autocommit, so our app never has to worry about committing in general. While it would be nicer to use Solr’s new and shiny commitWithin, Sunspot doesn’t support it, and autocommit was sufficient for our purposes.

That’s our code for the day - we’re debating whether to package this up as a Sunspot plugin.