-
Notifications
You must be signed in to change notification settings - Fork 71
Add (*Set).FromJSON benchmarks and switch to post-accumulation sort. #310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add (*Set).FromJSON benchmarks and switch to post-accumulation sort. #310
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: benluddy The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| if appendOK { | ||
| *m = append(*m, pe) | ||
| } else { | ||
| children.Members.Insert(pe) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does append + sort do the same thing as Insert (here) and Descend (below)?
it looks like Insert can be a no-op if the item already exists in the set:
structured-merge-diff/fieldpath/element.go
Lines 265 to 267 in 3392408
| if s.members[loc].Equals(pe) { | |
| return | |
| } |
and Descend can return the existing set if the path already exists
structured-merge-diff/fieldpath/set.go
Lines 496 to 498 in 3392408
| if s.members[loc].pathElement.Equals(pe) { | |
| return s.members[loc].set | |
| } |
It looks like switching these to unconditional appends can produce duplicates where that was impossible before?
If so, can you add tests that ensure this isn't changing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I added tests for duplicate members and child sets (the members of the last duplicate child set is kept) and fixed it. This made the bulk grandchildren sort a little more expensive, because it had to preserve the accumulated order. It still pays off in the worst case as inputs get larger than about a couple hundred kB (well within the default apiserver request size limit).
6c869cc to
367149b
Compare
Set deserialization with children in reverse order performs an insertion sort. Sets serialized by
structured-merge-diff are sorted, but in practice the serialization of sets is controlled by clients
and they can (intentionally or not) exercise this worst case. Switching to a bulk insert followed by
a sort appears to perform no worse in the best case (sorted) and is significantly better in the
worst case:
│ master │ bulk │
│ sec/op │ sec/op vs base │
SetFromJSON/children_in_ascending_order 1.286m ± 3% 1.341m ± 1% +4.27% (p=0.000 n=10)
SetFromJSON/children_in_descending_order 20.189m ± 0% 1.385m ± 1% -93.14% (p=0.000 n=10)
SetFromJSON/grandchildren_in_ascending_order 7.821m ± 1% 8.211m ± 1% +4.98% (p=0.000 n=10)
SetFromJSON/grandchildren_in_descending_order 75.16m ± 4% 13.31m ± 13% -82.28% (p=0.000 n=10)
geomean 11.11m 3.774m -66.04%
│ master │ bulk │
│ B/op │ B/op vs base │
SetFromJSON/children_in_ascending_order 1.320Mi ± 0% 1.320Mi ± 0% ~ (p=1.000 n=10)
SetFromJSON/children_in_descending_order 1.320Mi ± 0% 1.320Mi ± 0% +0.00% (p=0.000 n=10)
SetFromJSON/grandchildren_in_ascending_order 5.395Mi ± 0% 5.395Mi ± 0% +0.00% (p=0.000 n=10)
SetFromJSON/grandchildren_in_descending_order 6.187Mi ± 0% 5.395Mi ± 0% -12.80% (p=0.000 n=10)
geomean 2.761Mi 2.668Mi -3.37%
│ master │ bulk │
│ allocs/op │ allocs/op vs base │
SetFromJSON/children_in_ascending_order 25.99k ± 0% 25.99k ± 0% ~ (p=1.000 n=10) ¹
SetFromJSON/children_in_descending_order 25.99k ± 0% 25.99k ± 0% ~ (p=1.000 n=10) ¹
SetFromJSON/grandchildren_in_ascending_order 138.4k ± 0% 138.4k ± 0% ~ (p=1.000 n=10) ¹
SetFromJSON/grandchildren_in_descending_order 155.7k ± 0% 138.4k ± 0% -11.11% (p=0.000 n=10)
geomean 61.77k 59.98k -2.90%
¹ all samples are equal
367149b to
26509bc
Compare
Set deserialization with children in reverse order performs an insertion sort. Sets serialized by structured-merge-diff are sorted, but in practice the serialization of sets is controlled by clients and they can (intentionally or not) exercise this worst case. Switching to a bulk insert followed by a sort appears to perform no worse in the best case (sorted) and is significantly better in the worst case:
This was originally proposed in #292. All I've done here is added benchmarks to demonstrate the worst case performance and separated it from the json-iterator migration.