Monday, October 1, 2012

Splitting Trade Units

Dividing a single executed trade proportionally for several customers is an interesting problem.  Naive decisions can lead to unpleasant and surprising results. In this post, I will describe the problem of fairly partitioning a trade with an integer number of units amongst a group of participants who are participating according to some numerical proportions (think pie chart).  The proportions will be specified as double precision floating point numbers, and they will be normalized relative to their overall sum.



Requirement: Make it Whole


Given a new trade for U units which establishes a new position, and N participants where participant i has P[i] >= 0 proportion.  If the total proportion (left-associated sum) is P>0, then the problem becomes finding values U[i] whose sum is U, and which are proportional to the P[i] values.

Using (P[i]/P)*U and rounding, seems like a good idea, but it is wrong.  Consider what happens with N = 3, P[0] = P[1] = P[2] = 10.0 and U = 20 : each participant is given 6.6666666666 units (but rounded  to the nearest integer using an unspecified rounding technique).  A deterministic round to 6 units each or 7 units each hands out the wrong number of total units (18 versus 21).  So this won't work.

Instead of trying to establish unit counts right away per client, I attempted to assign unit counts to two groups of participants (essentially abstracting the problem to two participants).  I wrote the following code:


 void inline position_split(
   int & units1
 , int & units2
 , const double & prop1
 , const double & prop2
 , int total_position)
{
 double total_proportion = prop1 + prop2;
 assert(total_proportion > 0 || total_position == 0);
 if(total_position) {
  units1 = ROUND_UNITS(total_position * prop1 / total_proportion);
  units2 = total_position - units1;
 } else {
  units1 = 0;
  units2 = 0;
 }
}

Going back to my N = 3 example, this is better since I can use it to determine (U[0], U[1]+U[2]) using (P[0],P[1]+P[2]) and then invoke it again in a similar way to find out what (U[0]+U[1],U[2]) is.  Applying the code above I obtain the following unit values: U[0]=7, U[1] = 6, U[2]=7.  Great, now the values add up to the expected total U.


Selling Out


Now that you are in a trade for U units and each participant has his piece the most interesting part happens.  How do you attribute profit as each unit is closed (in the worst case each one is done separately).  In essence, you have to have an ordering of the existing units of the position and an idea of whose unit will go next.  If the number of participants is very high, it would also be nice to not have to store the number of units left to each of them at all times.  This is key, since persisting that information can be a painful limit on performance as each update will cause a side-effect to an underlying database.

My initial approach was to leverage position_split to figure out what the position distribution would be if for (U-1) total units and then just compare that with the result for U to find out which participant lost a unit.  This does not work.  Back to our example plugging in 19 for position size we get U[0] = 6, U[1] = 7, and U[2] = 6.  This is a problem since the middle participant (i=1) actually gained a unit when we reduced the position.

Since position_split is not enough and we still want to avoid storing detailed per-participant data on the remaining position as it gets reduced, we return to considering how to order the units of the beginning position in a way that fairly strips them from the participants.  How can the ordering of the units sold off be deterministic and therefore, require no storage?

Partial Binary Trees 


A binary tree is a tree with two paths emenating from each non-leaf node in the tree.  The leaves can be addressed using the binary coded path from the root to the leaf.  A partial binary tree is a binary tree which is missing leaves beyond a particular coded address.  It should be obvious that a unique partial binary tree exists with U leaves for any positive integer U.  Consider the following tree for U = 5 (as an example):


I have written the coded binary path (0s and 1s for lefts and rights) on the leaves backwards.  Those backwards path numbers can be arranged as a sequence of unsigned binary numbers S = ( 0, 4, 2, 1, 3 ).  In this case, the sequence has no gaps (missing numbers), but that is not the case in general.  Using OS = sort(S),  to denote the ascending totally order sequence for S, we can say that after reducing the position by r units the next (ordered) unit to be picked is OS-1[S[OS-1[r]]].  For the above tree, OS = (0,1,2,3,4), so unit 0 goes first, unit 4 goes second, etc.  When there are gaps in OS, the point of using its inverse is clearer.

Essentially, for each trade which is originally U units in size, we can keep track of the number of units sold so far (r) and figure out which participants lose the next t units by evaluating OS-1•S•OS-1 at r,r+1,...r+t-1.  The position_split function indicates which participant originally had each of the units (e.g. participant 0 gets units 0 to U[0]-1, assuming U[0]>0).  I have empirically observed that the ordering of sold units using this technique is reasonably fair -- at each point the relative amount of units each participant has is roughly in line with their initial proportion amounts P[i].


Summary

Splitting the initial trade properly, requires a technique which is sensitive to rounding and floating point arithmetic. It distributes the units of a trade to several participants using their proportion amounts P[i] and it ensures that the right total number of units are distributed.

The reversed binary codes for a partial binary tree provide a deterministic permutation which can be used to fairly sell out a position so that at any point the participants still hold their proportion of what is left in aggregate.  By only knowing how many units were already sold r and the additional number to be sold t, an algorithm can tell you which participants are losing how many units.  The code to efficiently compute the permutation is a bit involved, so I have not posted it here.  The basics needed to reconstruct the approach are here, however. 



No comments:

Post a Comment


Follow Mark on GitHub