-
Notifications
You must be signed in to change notification settings - Fork 19
1410 Accelerate add_edge for large graphs #1411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1411 +/- ##
==========================================
- Coverage 97.29% 97.28% -0.01%
==========================================
Files 180 180
Lines 15646 15663 +17
==========================================
+ Hits 15223 15238 +15
- Misses 423 425 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
HenrZu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! Im still not entirely sure about the overall usefulness of the feature.
Do you have some information about the overhead caused by the sorting and checking for dublicates?
if we want to add this feature, we should also extend the documentation of the current add edge function starting in line 169.
| * @return Edge<EdgePropertyT>& End of edge vector | ||
| */ | ||
| template <class... Args> | ||
| Edge<EdgePropertyT>& lazy_add_edge(size_t start_node_idx, size_t end_node_idx, Args&&... args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should catch the case, when m_edges is empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Doesn't emplace_back also work on an empty vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He potentially means uninitialized, not empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
| * | ||
| * @return Edge<EdgePropertyT>& End of edge vector | ||
| */ | ||
| Edge<EdgePropertyT>& sort_edges() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why return the reference and not use void?
| /** | ||
| * @brief Make the edges of a graph unique. | ||
| * | ||
| * Copies all the unique edges to a new vector and replaces the edge vector of the graph with it. Unique means that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also mention that properties might get lost.
Additionally, if i add two edges with the same start/end combination. Which edges gets deleted? the one that was added latest?
| * the start and end node indices are unique. Other edge properties are not checked. | ||
| * @return Edge<EdgePropertyT>& End of edge vector | ||
| */ | ||
| Edge<EdgePropertyT>& make_edges_unique() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would be clearer, if we use the term "remove dublicates" instead of unique edges.
Also renaming the function to remove_duplicate_edges would add clarity here imo.
| * @return Edge<EdgePropertyT>& End of edge vector | ||
| */ | ||
| Edge<EdgePropertyT>& make_edges_unique() | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also an option to merge the properties would be nice. However, this would make some restrictions to the property.
| { | ||
| std::vector<Edge<EdgePropertyT>> unique_edges; | ||
| unique_edges.reserve(m_edges.size()); | ||
| std::ranges::unique_copy(m_edges, std::back_inserter(unique_edges), [](auto&& e1, auto&& e2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do a copy here. Is this really neccessary?
| * @param start_node_idx Id of start node | ||
| * @param end_node_idx Id of end node | ||
| * @param args Additional arguments for edge construction | ||
| * @return Edge<EdgePropertyT>& End of edge vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
describe the return more in detail. This is to generic.
Currently, its a reference to the latest added edge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also doxy-comment @tparam ?
| * @return Edge<EdgePropertyT>& End of edge vector | ||
| */ | ||
| template <class... Args> | ||
| Edge<EdgePropertyT>& lazy_add_edge(size_t start_node_idx, size_t end_node_idx, Args&&... args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a motivation, why we even need this function? Whats the use case? why not add a bool flag to the existing add_edge function to allow addition without checking or sorting?
also, i would prefer the name add_edge_naive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe directly unchecked?
| EXPECT_THAT(g.out_edges(1), testing::ElementsAreArray(v1)); | ||
| } | ||
|
|
||
| TEST(TestGraph, compare_add_edge_functions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases are great so far. However, could you add a few more tests for edge cases (empty graph / calling make_unique... before sorting / testing if the correct properties remain and the expected ones get deleted)
| .. dropdown:: :fa:`gears` Working with large graphs | ||
|
|
||
| When working with very large graphs, i.e. starting from a few thousand edges, it will be faster to not use the standard ``add_edge`` function. This function always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now i see the motivation. Some questions.
Does that really make a noticeable difference? Have you ever quantified it?
Especially in simulations with many edges, we expect a relatively large runtime. Is the overhead caused by checking so much higher that we accept new sources of error (such as forgetting to call the sort function, etc.)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does make a difference of around four orders of magnitude in example simulations. For example, with 1.6 million edges I measure 4.5e+03 seconds with the standard function and 9.6e-01 seconds with the new functions.
Changes and Information
Please briefly list the changes (main added features, changed items, or corrected bugs) made:
lazy_add_edgefunction that does neither check for uniqueness nor sort the added edges.sort_edgesand amake_edges_uniquefunctionreserve_edgesfunction to reserve space for edgesIf need be, add additional information and what the reviewer should look out for in particular:
Merge Request - Guideline Checklist
Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.
Checks by code author
Checks by code reviewer(s)
Closes #1410