Saturday, November 26, 2011

Eduasync part 16: Example of composition: majority voting

Note: For the rest of this series, I'll be veering away from the original purpose of the project (investigating what the compiler is up to) in favour of discussing the feature itself. As such, I've added a requirement for AsyncCtpLib.dll - but due to potential distribution restrictions, I've felt it safest not to include that in the source repository. If you're running this code yourself, you'll need to copy the DLL from your installation location into the Eduasync\lib directory before it will build - or change each reference to it.

One of the things I love about async is the compositional aspect. This is partly due to the way that the Task Parallel Library encourages composition to start with, but async/await makes it even easier by building the tasks for you. In the next few posts I'll talk about a few examples of interesting building blocks. I wouldn't be surprised to see an open source library with a proper implementation of some of these ideas (Eduasync is not designed for production usage) whether from Microsoft or a third party.

In project 26 of Eduasync, I've implemented "majority voting" via composition. The basic idea is simple, and the motivation should be reasonably obvious in this day and age of redundant services. You have (say) five different tasks which are meant to be computing the same thing. As soon as you have a single answer which the majority of the tasks agree on, the code which needs the result can continue. If the tasks disagree, or fail (or a combination leading to no single successful majority result), the overall result is failure too.

My personal experience with services requiring a majority of operations to return is with Megastore, a storage system we use at Google. I'm not going to pretend to understand half of the details of how Megastore works, and I'm certainly not about to reveal any confidential information about its internals or indeed how we use it, but basically when discussing it with colleagues at around the time that async was announced, I contemplated what a handy feature async would be when implementing a Megastore client. It could also be used in systems where each calculation is performed in triplicate to guard against rogue errors - although I suspect the chances of those systems being implemented in C# are pretty small.

It's worth mentioning that the implementation here wouldn't be appropriate for something like a stock price service, where the result can change rapidly and you may be happy to tolerate a small discrepancy, within some bounds.


Here's the signatures of the methods we'll implement:

public static Task<T> WhenMajority<T>(params Task<T>[] tasks)

public static Task<T> WhenMajority<T>(IEnumerable<Task<T>> tasks)

Obviously the first just delegates to the second, but it's helpful to have both forms, so that we can pass in a few tasks in an ad hoc manner with the first overload, or a LINQ-generated sequence of tasks with the second.

The name is a little odd - it's meant to match WhenAll and WhenAny, but I'm sure there are better options. I'm not terribly interested in that at the moment.

It's easy to use within an async method:

Task<int> firstTask = firstServer.ComputeSomethingAsync(input);
Task<int> secondTask = selectServer.ComputeSomethingAsync(input);
Task<int> thirdTask = thirdServer.ComputeSomethingAsync(input);

int result = await MoreTaskEx.WhenMajority(firstTask, secondTask, thirdTask);

Or using the LINQ-oriented overload:

var tasks = servers.Select(server => server.ComputeSomethingAsync(input));
int result = await MoreTaskEx.WhenMajority(tasks);

Of course we could add an extension method (dropping the When prefix as it doesn't make as much sense there, IMO):

int result = await servers.Select(server => server.ComputeSomethingAsync(input))

The fact that we've stayed within the Task<T> model is what makes it all work so smoothly. We couldn't easily express the same API for other awaitable types in general although we could do it for any other specific awaitable type of course. It's possible that it would work using dynamic, but I'd rather avoid that :) Let's implement it now.


There are two parts to the implementation, in the same way that we implemented LINQ operators in Edulinq - and for the same reason. We want to go bang immediately if there are any clear input violations - such as the sequence of tasks being null or empty. This is in line with the Task-based Asynchronous Pattern white paper:

An asynchronous method should only directly raise an exception to be thrown out of the MethodNameAsync call in response to a usage error*. For all other errors, exceptions occurring during the execution of an asynchronous method should be assigned to the returned Task.

Now it occurs to me that we don't really need to do this in two separate methods (one for precondition checking, one for real work). We could create an async lambda expression of type Func<Task<T>>, and make the method just return the result of invoking it - but I don't think that would be great in terms of readability.

So, the first part of the implementation performing validation is really simple:

public static Task<T> WhenMajority<T>(params Task<T>[] tasks)
    return WhenMajority((IEnumerable<Task<T>>) tasks);

public static Task<T> WhenMajority<T>(IEnumerable<Task<T>> tasks)
    if (tasks == null)
        throw new ArgumentNullException("tasks");
    List<Task<T>> taskList = new List<Task<T>>(tasks);
    if (taskList.Count == 0)
        throw new ArgumentException("Empty sequence of tasks");
    foreach (var task in taskList)
        if (task == null)
            throw new ArgumentException("Null task in sequence");
    return WhenMajorityImpl(taskList);

The interesting part is obviously in WhenMajorityImpl. It's mildly interesting to note that I create a copy of the sequence passed in to start with - I know I'll need it in a fairly concrete form, so it's appropriate to remove any laziness at this point.

So, here's WhenMajorityImpl, which I'll then explain:

private static async Task<T> WhenMajorityImpl<T>(List<Task<T>> tasks)
    // Need a real majority - so for 4 or 5 tasks, must have 3 equal results.
    int majority = (tasks.Count / 2) + 1;
    int failures = 0;
    int bestCount = 0;
    Dictionary<T, int> results = new Dictionary<T, int>();
    List<Exception> exceptions = new List<Exception>();
    while (true)
        await TaskEx.WhenAny(tasks);
        var newTasks = new List<Task<T>>();
        foreach (var task in tasks)
            switch (task.Status)
                case TaskStatus.Canceled:
                case TaskStatus.Faulted:
                case TaskStatus.RanToCompletion:
                    int count;
                    // Doesn't matter whether it was there before or not - we want 0 if not anyway
                    results.TryGetValue(task.Result, out count);
                    if (count > bestCount)
                        bestCount = count;
                        if (count >= majority)
                            return task.Result;
                    results[task.Result] = count;
                    // Keep going next time. may not be appropriate for Created
        // The new list of tasks to wait for
        tasks = newTasks;

        // If we can't possibly work, bail out.
        if (tasks.Count + bestCount < majority)
            throw new AggregateException("No majority result possible", exceptions);

I should warn you that this isn't a particularly efficient implementation - it was just one I wrote until it worked. The basic steps are:

  • Work out how many results make a majority, so we know when to stop
  • Keep track of how many "votes" our most commonly-returned result has, along with the counts of all the votes
  • Repeatedly:
    • Wait (asynchronously) for at least of the remaining tasks to finish (many may finish "at the same time")
    • Start a new list of "tasks we're going to wait for next time"
    • Process each task in the current list, taking an action on each state:
      • If it's been cancelled, we'll treat that as a failure (we could potentially treat "the majority have been cancelled" as a cancellation, but for the moment a failure is good enough)
      • If it's faulted, we'll add the exception to the list of exceptions, so that if the overall result ends up as failure, we can throw an AggregateException with all of the individual exceptions
      • If it's finished successfully, we'll check the result:
        • Add 1 to the count for that result (the dictionary will use the default comparer for the result type, which we assume is good enough)
        • If this is greater than the previous "winner" (which could be for the same result), check for it being actually an overall majority, and return if so.
      • If it's still running (or starting), add it to the new task list
    • Check whether enough tasks have failed - or given different results - so ensure that a majority is now impossible. If so, throw an AggregateException to say so. This may have some exceptions, but it may not (if there are three tasks which gave different results, none of them actually failed)

Each iteration of the "repeatedly" will have a smaller list to check than before, so we'll definitely terminate at some point.

I mentioned that it's inefficient. In particular, we're ignoring the fact that WhenAny returns a Task<Task<T>>, so awaiting that will actually tell us a task which has finished. We don't need to loop over the whole collection at that point - we could just remove that single task from the collection. We could do that efficiently if we kept a Dictionary<Task<T>, LinkedListNode<Task<T>> and a LinkedList<Task<T>> - we'd just look up the task which had completed in the dictionary, remove its node from the list, and remove the entry from the dictionary. We wouldn't need to create a new collection each time, or iterate through all of the old one. However, that's a job for another day... as is allowing a cancellation token to be passed in, and a custom equality comparer.


So we can make this implementation smarter and more flexible, certainly - but it's not insanely tricky to write. I'm reasonably confident that it works, too - as I have unit tests for it. They'll come in the next part. The important point  from this post is that by sticking within the Task<T> world, we can reasonably easily create building blocks to allow for composition of asynchronous operations. While it would be nice to have someone more competent than myself write a bullet-proof, efficient implementation of this operation, I wouldn't feel too unhappy using a homegrown one in production. The same could not have been said pre-async/await. I just wouldn't have had a chance of getting it right.

Next up - the unit tests for this code, in which I introduce the TimeMachine class.



No comments:

Post a Comment