Arrangement batch formation costs in proportion to outstanding updates #460

@frankmcsherry

Description

When an arrangement builder is presented with (far) future updates, they linger in the holding pen and are reconsidered each time a batch is formed. This increases the cost of extracting a batch from "proportional to the batch size" to "proportional to all outstanding updates". It is especially noticeable in Materialize with its mz_now() temporal filters, which introduce (far) future retractions.
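To make the mechanism concrete, here is a minimal sketch of extraction from an unordered holding pen (hypothetical code; `extract_batch` is an invented name, not the actual batcher API): every extraction visits every outstanding update, so far-future updates are rescanned for each batch that is formed.

```rust
use timely::PartialOrder;

/// Minimal sketch, not the real batcher: forming the batch of updates not
/// beyond `upper` from an unordered pen must visit every outstanding update.
fn extract_batch<D, T: PartialOrder, R>(
    pending: &mut Vec<(D, T, R)>,
    upper: &[T],
) -> Vec<(D, T, R)> {
    let mut ready = Vec::new();
    let mut rest = Vec::new();
    // O(|pending|) per extraction, even when `ready` is tiny.
    for (data, time, diff) in pending.drain(..) {
        if upper.iter().any(|u| u.less_equal(&time)) {
            rest.push((data, time, diff)); // still in the future; rescanned next round
        } else {
            ready.push((data, time, diff)); // not beyond `upper`; goes into this batch
        }
    }
    *pending = rest;
    ready
}
```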

To see this in the existing codebase, consider the following modification to examples/hello.rs:

--- a/examples/hello.rs
+++ b/examples/hello.rs
@@ -51,7 +51,7 @@ fn main() {
 
         // Load up graph data. Round-robin among workers.
         for _ in 0 .. (edges / peers) + if index < (edges % peers) { 1 } else { 0 } {
-            input.update((rng1.gen_range(0, nodes), rng1.gen_range(0, nodes)), 1)
+            input.update_at((rng1.gen_range(0, nodes), rng1.gen_range(0, nodes)), 1_000_000, 1)
         }
 
         input.advance_to(1);

This performs the bulk loading of data at the start using a future time (1_000_000) which, for our purposes, lingers indefinitely.

Unmodified,

cargo run --example hello -- 1000000 10000000 1 foo

produces output that looks like

...
round 5664 finished after 215.5µs
round 5665 finished after 245.666µs
round 5666 finished after 221.375µs
round 5667 finished after 213.542µs
round 5668 finished after 243.666µs
round 5669 finished after 208.583µs
round 5670 finished after 189.417µs
round 5671 finished after 216.375µs
round 5672 finished after 219.292µs
round 5673 finished after 231.959µs
round 5674 finished after 233.958µs
^C

whereas the modified version produces

...
round 18 finished after 85.603625ms
round 19 finished after 98.8575ms
round 20 finished after 96.948833ms
round 21 finished after 100.601833ms
round 22 finished after 74.549875ms
round 23 finished after 94.3985ms
round 24 finished after 75.045083ms
^C

So there is clearly an orders-of-magnitude difference between the two: hundreds of microseconds per round versus tens of milliseconds.

The best remedy at the moment seems to be to do the batch organization "by time" rather than "by data". This is unfortunate because batches want their updates organized by data, whereas frontier-based extraction would prefer them organized by time. Moreover, because times are only partially ordered, there is no guarantee that we can do anything quickly; binary search does not apply the way it does for totally ordered times.
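For totally ordered times, the "by time" organization makes extraction cheap; here is a hypothetical sketch (`extract_batch_sorted` is an invented name, and a single-element frontier is assumed for simplicity):

```rust
/// Sketch for *totally ordered* times: keep the pen sorted by time, so the
/// ready updates form a prefix located by binary search.
fn extract_batch_sorted<D, T: Ord, R>(
    pending: &mut Vec<(D, T, R)>, // invariant: sorted by time
    upper: &T,                    // single-element frontier, for simplicity
) -> Vec<(D, T, R)> {
    // `partition_point` finds the boundary in O(log n); the drain then costs
    // time proportional to the batch size rather than to all of `pending`.
    let split = pending.partition_point(|(_, time, _)| time < upper);
    pending.drain(..split).collect()
}
```

The extracted prefix would still need to be re-sorted by data before sealing the batch, which is exactly the tension described above.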

Nonetheless, it seems like a good idea to build the partially robust operator, which orders updates by time and, if we want to support partially ordered times, probably maintains runs of less_equal times, which do admit binary search (though all of our updates may form an arbitrarily large number of runs).
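A sketch of that run-based idea (`ready_prefix` is an invented name): within a run whose consecutive times satisfy less_equal, the updates ready for a given frontier form a prefix, because once a time is greater-or-equal to a frontier element, all later times in the chain are as well; each run therefore admits binary search, but nothing bounds the number of runs.

```rust
use timely::PartialOrder;

/// Hypothetical sketch: `run` is a chain, i.e. consecutive times satisfy
/// `less_equal`. For each frontier element `u`, the times `t` with
/// `u.less_equal(t)` form a suffix of the chain, so the ready prefix can be
/// found by binary search per frontier element.
fn ready_prefix<D, T: PartialOrder, R>(run: &[(D, T, R)], upper: &[T]) -> usize {
    upper
        .iter()
        .map(|u| run.partition_point(|(_, time, _)| !u.less_equal(time)))
        .min()
        .unwrap_or(run.len()) // an empty frontier leaves every update ready
}
```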

Future work might look into Sorting and Selection in Posets, which aims for bounds related to the "width" of the poset: the size of its largest antichain. There will be no such bound with the technique above.

cc: @antiguru, @petrosagg
