Commit ed540e3

Update docstring for RWR
- Content copied as much as possible from GDS Manual
- Add default values
1 parent b381416 commit ed540e3

1 file changed

+29
-26
lines changed

graphdatascience/procedure_surface/api/graph_sampling_endpoints.py

Lines changed: 29 additions & 26 deletions
@@ -10,7 +10,7 @@
1010

1111
class GraphSamplingEndpoints(ABC):
1212
"""
13-
Abstract base class defining the API for graph sampling algorithms algorithm.
13+
Abstract base class defining the API for graph sampling operations.
1414
"""
1515

1616
@abstractmethod
@@ -32,55 +32,58 @@ def rwr(
3232
job_id: Optional[str] = None,
3333
) -> GraphWithSamplingResult:
3434
"""
35-
Computes a set of Random Walks with Restart (RWR) for the given graph and stores the result as a new graph in the catalog.
35+
Random walk with restarts (RWR) samples the graph by taking random walks from a set of start nodes.
3636
37-
This method performs a random walk, beginning from a set of nodes (if provided),
38-
where at each step there is a probability to restart back at the original nodes.
39-
The result is turned into a new graph induced by the random walks and stored in the catalog.
37+
On each step of a random walk, there is a probability that the walk stops, and a new walk from one of the start
38+
nodes starts instead (i.e. the walk restarts). Each node visited on these walks will be part of the sampled
39+
subgraph. The resulting subgraph is stored as a new graph in the Graph Catalog.
4040
4141
Parameters
4242
----------
4343
G : GraphV2
4444
The input graph on which the Random Walk with Restart (RWR) will be
4545
performed.
4646
graph_name : str
47-
The name of the new graph in the catalog.
47+
The name of the new graph that is stored in the graph catalog.
4848
start_nodes : list of int, optional
49-
A list of node IDs to start the random walk from. If not provided, all
50-
nodes are used as potential starting points.
49+
IDs of the initial set of nodes in the original graph from which the sampling random walks will start.
50+
By default, a single node is chosen uniformly at random.
5151
restart_probability : float, optional
52-
The probability of restarting back to the original node at each step.
53-
Should be a value between 0 and 1. If not specified, a default value is used.
52+
The probability that a sampling random walk restarts from one of the start nodes.
53+
Default is 0.1.
5454
sampling_ratio : float, optional
55-
The ratio of nodes to sample during the computation. This value should
56-
be between 0 and 1. If not specified, no sampling is performed.
55+
The fraction of nodes in the original graph to be sampled.
56+
Default is 0.15.
5757
node_label_stratification : bool, optional
58-
If True, the algorithm tries to preserve the label distribution of the original graph in the sampled graph.
58+
If true, preserves the node label distribution of the original graph.
59+
Default is False.
5960
relationship_weight_property : str, optional
60-
The name of the property on relationships to use as weights during
61-
the random walk. If not specified, the relationships are treated as
62-
unweighted.
61+
Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.
6362
relationship_types : list of str, optional
64-
The relationship types used to select relationships for this algorithm run.
63+
Filter the named graph using the given relationship types. Relationships with any of the given types will be
64+
included.
6565
node_labels : list of str, optional
66-
The node labels used to select nodes for this algorithm run.
66+
Filter the named graph using the given node labels. Nodes with any of the given labels will be included.
6767
sudo : bool, optional
68-
Override memory estimation limits. Use with caution as this can lead to
69-
memory issues if the estimation is significantly wrong.
68+
Bypass heap control. Use with caution.
69+
Default is False.
7070
log_progress : bool, optional
71-
If True, logs the progress of the computation.
71+
Turn `on/off` percentage logging while running procedure.
72+
Default is True.
7273
username : str, optional
73-
The username to attribute the procedure run to
74+
Use Administrator access to run an algorithm on a graph owned by another user.
75+
Default is None.
7476
concurrency : int, optional
75-
The number of concurrent threads used for the algorithm execution.
77+
The number of concurrent threads used for running the algorithm.
78+
Default is 4.
7679
job_id : str, optional
77-
An identifier for the job that can be used for monitoring and cancellation
80+
An ID that can be provided to more easily track the algorithm’s progress.
81+
By default, a random job id is generated.
7882
7983
Returns
8084
-------
8185
GraphWithSamplingResult
82-
Tuple of the graph object and the result of the Random Walk with Restart (RWR), including the sampled
83-
nodes and their scores.
86+
Tuple of the graph object and the result of the Random Walk with Restart (RWR), including the dimensions of the sampled graph.
8487
"""
8588
pass
8689

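The sampling procedure described in the updated docstring can be illustrated with a small pure-Python sketch. This is not the GDS implementation: the function names `rwr_sample` and `induced_subgraph`, the adjacency-dict input, and the `seed` and `max_steps` parameters are assumptions made for illustration. Only the default values for `restart_probability` (0.1) and `sampling_ratio` (0.15), and the single random start node, are taken from the docstring.

```python
import random
from typing import Dict, List, Optional, Set


def rwr_sample(
    adjacency: Dict[int, List[int]],
    start_nodes: Optional[List[int]] = None,
    restart_probability: float = 0.1,
    sampling_ratio: float = 0.15,
    seed: Optional[int] = None,   # hypothetical, for reproducibility
    max_steps: int = 1_000_000,   # hypothetical safety bound
) -> Set[int]:
    """Collect the node IDs visited by random walks with restarts."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    if not nodes:
        return set()

    # Documented default: a single start node chosen uniformly at random.
    starts = list(start_nodes) if start_nodes else [rng.choice(nodes)]

    # Number of nodes to collect, as a fraction of the original graph.
    target = max(1, round(sampling_ratio * len(nodes)))

    current = rng.choice(starts)
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= target:
            break
        neighbors = adjacency.get(current, [])
        # Restart with the given probability, or when the walk is stuck.
        if not neighbors or rng.random() < restart_probability:
            current = rng.choice(starts)
        else:
            current = rng.choice(neighbors)
        visited.add(current)
    return visited


def induced_subgraph(
    adjacency: Dict[int, List[int]], sampled: Set[int]
) -> Dict[int, List[int]]:
    """Keep only sampled nodes and the relationships between them."""
    return {n: [m for m in adjacency[n] if m in sampled] for n in sampled}


if __name__ == "__main__":
    # A 20-node ring graph; sample half of it starting from node 0.
    ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
    sampled = rwr_sample(ring, start_nodes=[0], sampling_ratio=0.5, seed=7)
    print(len(sampled), "nodes sampled")
    print(induced_subgraph(ring, sampled))
```

The sketch ignores relationship weights, label stratification, and node or relationship filtering, all of which the documented endpoint exposes as separate parameters.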