|
2 | 2 |
|
3 | 3 | **Authors: GEMS Lab Team @ University of Michigan** |
4 | 4 |
|
5 | | -This SEMB library allows fast onboarding to explore structural embedding of graph data using hetereogenous methods, with a unified API interface and a modular codebase enabling easy intergration of 3rd party methods and datasets. |
| 5 | +The SEMB library allows fast onboarding for computing and evaluating structural node embeddings. With a unified API and a modular codebase, it enables easy integration of 3rd-party methods and datasets. |
6 | 6 |
|
7 | | -The library itself has already included a set of popular methods and datasets ready for use immediately. |
| 7 | +The library already includes a set of popular methods and datasets ready for immediate use. |
| 8 | + |
| 9 | +- Built-in methods: [node2vec](https://github.com/aditya-grover/node2vec), [struc2vec](https://github.com/leoribeiro/struc2vec), [GraphWave](https://github.com/snap-stanford/graphwave), [xNetMF](https://github.com/GemsLab/REGAL), [role2vec](https://github.com/benedekrozemberczki/role2vec), [DRNE](https://github.com/tadpole/DRNE), [MultiLENS](https://github.com/GemsLab/MultiLENS), [RiWalk](https://github.com/maxuewei2/RiWalk), [SEGK](https://github.com/giannisnik/segk) |
| 10 | + |
| 11 | +- Built-in datasets: |
| 12 | + |
| 13 | + | Dataset | # Nodes | # Edges | |
| 14 | + | ------------------------------------------------------------ | ------- | ------- | |
| 15 | + | [BlogCatalog](http://snap.stanford.edu/node2vec/) | 10,312 | 333,983 | |
| 16 | + | [Facebook](http://snap.stanford.edu/data/egonets-Facebook.html) | 4,039 | 88,234 | |
| 17 | + | [ICEWS](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QI2T9A) | 1,255 | 1,414 | |
| 18 | + | [PPI](http://snap.stanford.edu/graphsage/) | 56,944 | 818,786 | |
| 19 | + | [BR air-traffic](https://github.com/leoribeiro/struc2vec/tree/master/graph) | 131 | 1,038 | |
| 20 | + | [EU air-traffic](https://github.com/leoribeiro/struc2vec/tree/master/graph) | 399 | 5,995 | |
| 21 | + | [US air-traffic](https://github.com/leoribeiro/struc2vec/tree/master/graph) | 1,190 | 13,599 | |
| 22 | + | [DD6](https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets) | 4,152 | 20,640 | |
8 | 23 |
|
9 | 24 | The library requires *Python 3.7+*. |
10 | 25 |
|
11 | | -## Getting started |
| 26 | +## Installation and Usage |
12 | 27 |
|
13 | 28 | Make sure you are using *Python 3.7+* for all below! |
14 | 29 |
|
15 | | -### Installation |
16 | | -`python setup.py install` (TODO: Pip support will be added soon) |
17 | | - |
18 | | -### Import and load a dataset |
19 | | -```py |
20 | | -from semb.datasets import load, get_dataset_ids |
21 | | -# explore all datasets (both built in and extended by 3rd party) |
22 | | -ids = get_dataset_ids() |
23 | | -# load a dataset |
24 | | -graph = load(ids[0]) |
25 | | -``` |
26 | | - |
27 | | -### Import and load a method |
28 | | -```py |
29 | | -from semb.methods import load, get_method_ids |
30 | | -# explore all methods (both built in and extended by 3rd party) |
31 | | -ids = get_method_ids() |
32 | | -# load a method, returns a constructor for a method's base class |
33 | | -Method = load(ids[0]) |
34 | | -# create and run a method. |
35 | | -# NOTE: except for the first "graph" arg, everything other argument MUST be in keyword form! |
36 | | -method = Method(graph, a=1, b=2, c=3, ...) |
37 | | -method.train() |
38 | | -embeddings = method.get_embeddings() |
39 | | -``` |
| 30 | +`python setup.py install` |
| 31 | + |
| 32 | +After installation, we highly recommend going through our [Tutorial](https://github.com/GemsLab/StrucEmbeddingLibrary/blob/master/Tutorial.ipynb) to see how the SEMB library works. |
| 33 | + |
| 34 | + |
40 | 35 |
|
41 | 36 | ## Extending SEMB |
42 | 37 |
|
43 | 38 | First make sure the `semb` library is installed. |
44 | 39 |
|
45 | 40 | ### Developing 3rd party Dataset extension |
46 | 41 |
|
47 | | -- Create a Python 3.7+ [package](https://packaging.python.org/tutorials/packaging-projects/) with a name in form of `semb-dataset[$YOUR_CHOSEN_DATASET_ID]` |
| 42 | +Currently, SEMB only supports embedding and evaluation on *undirected* and *unweighted* graphs. |
| 43 | + |
| 44 | +- Create a Python 3.7+ [package](https://packaging.python.org/tutorials/packaging-projects/) with a name in the form of `semb/datasets/[$YOUR_CHOSEN_DATASET_ID]` |
48 | 45 | - Within the package root directory, make sure `__init__.py` is present |
49 | 46 | - Create a `dataset.py` and make a `Dataset` class that inherits from `BaseDataset` (`from semb.datasets import BaseDataset`) and implements the required methods. See `semb/datasets/airports/dataset.py` for more details. |
| 47 | + - To use the built-in `load_dataset()` method, provide the graph as an edge list with one edge per line in the following format: |
| 48 | + - `<Node1_id (int)> <Blank> <Node2_id (int)> <\n>` |
| 49 | + - Otherwise, you can override `load_dataset()` with your own implementation. Please make sure that the returned graph is of type `networkx.classes.graph.Graph`. |
| 50 | + - If the dataset comes with a label file, the built-in `load_label()` function accepts one label per line in the following format: |
| 51 | + - `<Node_id (int)> <delimiter> <Node_label (int)>` |
| 52 | + - Otherwise, you can override `load_label()` with your own implementation. Please make sure that it returns a Python built-in `dict` with `<Node_id (int)>` as the key and `<Node_label (int)>` as the value. |
50 | 53 | - Install the package via `setup.py` or pip. |
51 | 54 | - Now the dataset is loadable by the main client program that uses `semb`! |
52 | 55 |
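As a rough illustration of the two file formats described above, here is a minimal standalone sketch that parses an edge list and a label file using only the standard library. The helper names `parse_edgelist` and `parse_labels` are hypothetical and are not part of `semb`; the built-in `load_dataset()` and `load_label()` may behave differently. The parsed edges could then be added to a `networkx.Graph` to satisfy the return-type requirement.

```python
import io


def parse_edgelist(stream):
    """Parse lines of the form '<Node1_id> <Node2_id>' into (int, int) pairs.

    Hypothetical helper for illustration only; not part of semb.
    """
    edges = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        u, v = line.split()
        edges.append((int(u), int(v)))
    return edges


def parse_labels(stream, delimiter=None):
    """Parse lines of the form '<Node_id><delimiter><Node_label>' into a dict.

    With delimiter=None, any run of whitespace separates the two fields.
    Hypothetical helper for illustration only; not part of semb.
    """
    labels = {}
    for line in stream:
        line = line.strip()
        if not line:
            continue
        node_id, label = line.split(delimiter)
        labels[int(node_id)] = int(label)
    return labels


# In-memory stand-ins for an edge-list file and a comma-delimited label file.
edges = parse_edgelist(io.StringIO("0 1\n1 2\n2 0\n"))
labels = parse_labels(io.StringIO("0,1\n1,0\n2,1\n"), delimiter=",")
print(edges)
print(labels)
```

Keeping node ids and labels as `int` matches the formats listed above; a custom loader that uses other types would need to convert them before returning.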
|
53 | 56 | ### Developing 3rd party Method extension |
54 | 57 |
|
55 | | -- Create a Python 3.7+ [package](https://packaging.python.org/tutorials/packaging-projects/) with a name in form of `semb-method[$YOUR_CHOSEN_METHOD_ID]` |
| 58 | +- Create a Python 3.7+ [package](https://packaging.python.org/tutorials/packaging-projects/) with a name in the form of `semb/methods/[$YOUR_CHOSEN_METHOD_ID]` |
56 | 59 | - Within the package root directory, make sure `__init__.py` is present |
57 | | -- Create a `dataset.py` and make a `Dataset` class that inherits from `from semb.methods import BaseMethod` and implement the required methods. See `semb/methods/node2vec/method.py` for more details. |
| 60 | +- Create a `method.py` and make a `Method` class that inherits from `BaseMethod` (`from semb.methods import BaseMethod`) and implements the required methods. See `semb/methods/node2vec/method.py` for more details. |
| 61 | + - Please make sure that your implemented method accepts a `networkx.classes.graph.Graph` as input. |
| 62 | + - Please make sure that after `train()` is called, `self.embeddings` is a Python built-in `dict` with `<Node_id (int)>` as the key and the embedding, a `<List(float)>`, as the value. |
58 | 63 | - Install the package via `setup.py` or pip. |
59 | 64 | - Now the method is loadable by the main client program that uses `semb`! |
60 | 65 |
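To make the expected shape of `self.embeddings` concrete, the following standalone sketch builds a toy embedding dictionary and checks it against the contract described above. Both `toy_embeddings` and `is_valid_embedding_dict` are hypothetical helpers written for this example, not part of `semb`, and the toy features (degree and mean neighbor degree) are not one of the built-in methods. The check that all vectors share one length is an added assumption, not something the list above states.

```python
def toy_embeddings(edges):
    """Map each node id to [degree, mean neighbor degree] as floats.

    Hypothetical stand-in for a real embedding method; illustration only.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    embeddings = {}
    for node, nbrs in neighbors.items():
        degree = float(len(nbrs))
        mean_nbr_degree = sum(len(neighbors[n]) for n in nbrs) / len(nbrs)
        embeddings[node] = [degree, mean_nbr_degree]
    return embeddings


def is_valid_embedding_dict(embeddings):
    """Check for a non-empty dict mapping int node ids to lists of floats.

    Also requires every vector to have the same length (an assumption).
    """
    if not isinstance(embeddings, dict) or not embeddings:
        return False
    dims = set()
    for node_id, vector in embeddings.items():
        if not isinstance(node_id, int) or not isinstance(vector, list):
            return False
        if not all(isinstance(x, float) for x in vector):
            return False
        dims.add(len(vector))
    return len(dims) == 1


# A path graph 0 - 1 - 2 given as an undirected, unweighted edge list.
embeddings = toy_embeddings([(0, 1), (1, 2)])
print(embeddings)
print(is_valid_embedding_dict(embeddings))
```

A check like this can be run after `train()` in a custom method to catch common mistakes, such as returning NumPy arrays or string node ids instead of the built-in types.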
|
61 | 66 | ### Note |
62 | 67 | For both `dataset` and `method` extensions, make sure `get_id()` is overridden and returns the same id as the one chosen in your package name. |
| 68 | + |
| 69 | + |
| 70 | + |