Efficient algorithm for weighted random sampling with replacement

admin

Administrator
Staff member
I have a constant amount of samples, with each sample having a probability. Now I want to resample from this data source to get the <strong>same amount</strong> of new samples, each having the same probability.

For example:

Code:
                                          random | 0.03 | 0.78 | 0.45 | 0.70
                                          -------+------+------+------+------
sample | 0000 | 0001 | 0002 | 0003   RNG  sample | 0000 | 0003 | 0002 | 0003
-------+------+------+------+------ ====&gt; -------+------+------+------+------
 prob. | 0.10 | 0.20 | 0.30 | 0.40         prob. | 0.25 | 0.25 | 0.25 | 0.25

In my case, the probabilities wouldn't be given directly but as weights. However, the probabilities can be directly derived from the weights as the sum of all weights is known (but not constant).

In a MATLAB implementation, I used the <a href="http://www.mathworks.ch/help/toolbox/stats/randsample.html" rel="nofollow">randsample</a> function of the Statistics Toolbox to achieve this resampling process:

<blockquote>
Code:
y = randsample(n,k,true,w)
or
Code:
y = randsample(population,k,true,w)
returns a weighted sample taken with replacement, using a vector of positive weights
Code:
w
, whose length is
Code:
n
. The probability that the integer
Code:
i
is selected for an entry of
Code:
y
is
Code:
w(i)/sum(w)
. Usually,
Code:
w
is a vector of probabilities.
Code:
randsample
does not support weighted sampling without replacement.
</blockquote>

Code:
function [samples probabilities] = resample(samples, probabilities)
    sampleCount = size(samples, 1);
    indices = randsample(1 : samplecount, samplecount, 
                         true, probabilities);
    samples = samples(indices, :);
    probabilities = repmat(1 / sample count, samplecount, 1);
end

I now want to port this part of the algorithm to an iPad 2 where it is used to update real-time (~25fps) data where <strong>512 samples</strong> are resampled. Therefore, time-efficiency is crucial, as also other calculations will be performed. Memory does not have to be minimized.

I've looked into <a href="http://prxq.wordpress.com/2006/04/17/the-alias-method/" rel="nofollow">the Alias method</a>, however it seems that the structure building process is quite tedious and maybe not the most efficient solution.

Are there any other efficient methods available which would satisfy the realtime requirement or is the Alias method the way to go?