
Commit 0a4f8de

lyuxiaosu and xiaosuGW authored
Sledgescale code (#387)
* replace HTTP with eRPC
* make it work: SLEdge receives an RPC request and parses the header and payload successfully
* first working version integrating eRPC, but some parts are hardcoded
* forgot to submit erpc_handler.h and debug.sh
* send response packets to the client for any errors
* add a SIGINT handler to print throughput/latency data when Ctrl+C is pressed
* enable macro SANDBOX_STATE_TOTALS
* remove log
* upload test scripts
* upload id_rsa
* make the first worker core ID configurable
* create multiple listener threads with eRPC
* upload parse_single.py
* remove unreachable code
* make listener threads put requests into worker queues directly with round robin (RR)
* update start.sh
* make the runtime log write to memory and dump to a file on SIGINT
* change the memory buffer size for logging
* comment out sandbox_state_totals_decrement(last_state) in sandbox_set_as_initialized.h to show the correct number of allocated sandboxes
* change fib.json to support multiple functions
* uncomment SLEDGE_SANDBOX_PERF_LOG in start.sh
* add LOG_RUNTIME_MEM_LOG definition in runtime/Makefile
* 1. fix incomplete log printing on SIGINT; 2. print the total local request count of each worker thread
* gracefully stop SLEdge on SIGINT
* increase the initial local runqueue length to 2560000
* upload meet_deadline_percentage.py to tests
* update meet_deadline_percentage.py
* update meet_deadline_percentage.py
* 1. add more fine-grained time costs for a request, such as the cleanup cost of the previous request; 2. preallocate two linear and stack memories for each module in each worker thread
* 1. move logging from sandbox_free to scheduler_cooperative_sched; 2. check the sandbox state to make sure it is in the complete or error state before logging to memory
* fix bug: not logging in sandbox_perf_log.h
* previously commented out unmasking the alarm signal by mistake; uncomment it
* remove one whitespace when printing the log to memory
* replace the local rr_index with a global atomic request_index to distribute requests to workers with RR
* update meet_deadline_percentage.py and tests/start.sh
* 1. remove the perf_window lock by letting each worker thread maintain its own perf_windows; 2. distribute requests based on the amount of work of each worker thread
* implement a new scheduling policy with a binary search tree, but it still has some bugs
* fix bug from the previous commit: the total waiting/serving time was calculated incorrectly
* remove debug log
* fix bug: the worker thread could not receive the SIGALRM signal
* implement the main framework for DARC scheduling
* update runtime/tests/start.sh
* implement the DARC algorithm, but with a bug: a finished worker does not correctly notify the listener that it is idle
* fix the previous bug in the DARC algorithm
* update fib.json
* add comment
* 1. rename variables to make them easier to understand; 2. add comments
* Shinjuku implementation, with bugs
* fix bug: the worker and listener threads both called push_to_rqueue() to enqueue a request onto the typed request queue, causing a race condition. Another bug: a sandbox popped from the typed request queue was not reset to NULL, causing a crash
* 1. forgot to commit request_typed_queue.c; 2. update scripts
* add shinjuku_dispatch_different_core(), but it does not work due to different page mappings
* fix bug: load imbalance among worker threads
* fix bug: check each worker's current task to see if it has run longer than 10us even when the total number of free workers is 0
* 1. remove some HTTP setup code that would cause a crash; 2. send a failure response when a sandbox fails to run; 3. update debug.sh to disable timer preemption; 4. update meet_deadline_percentage.py to calculate total interrupts
* update debug.sh
* commit test_dequeue.cpp to the tests folder
* 1. fix bug: sandbox_set_as_running_sys() failed an assert (sandbox state not in RUNNING_USER) because the sandbox state was set to PREEMPTED but it had not yet been context-switched out; 2. when preempting a sandbox, the worker enqueues the preempted sandbox onto its local preempted FIFO queue, and the dispatcher thread moves all sandboxes from the worker's preempted queue to its typed queue, thus reducing locking on the typed queue
* fix bug: for the Shinjuku algorithm, on a context switch to another sandbox, the previous one was deleted from the local runqueue but would still be re-executed by the current context switch after the signal handler returned
* fix bug: for the Shinjuku algorithm, when a SIGALRM signal is received, the signal handler context-switches to another sandbox. It first deletes the sandbox from the local runqueue and gets the next sandbox; however, sometimes there are no more sandboxes in the local runqueue, so the code just returned. This caused a bug: after the signal handler returned, the removed sandbox was resumed immediately, and when it finished, it tried to remove itself from the local runqueue and could not find itself
* 1. revert a wrong change for the assert failure at sandbox->state == SANDBOX_RUNNING_USER in sandbox_set_as_running_sys(); 2. update meet_deadline_percentage.py
* revert the global typed queue to a one-dimensional array, because now only the dispatcher thread puts requests into it; workers do not
* replace the global typed queue with a global typed deque that allows insertion and deletion at both ends
* fix bug: set sandbox->start_ts_running_user after sandbox_return, which sets the sandbox to the running_user state; otherwise it would be preempted immediately because sandbox->start_ts_running_user was 0
* replace the binary-search-tree local runqueue with a circular queue for Shinjuku, no lock
* replace the binary-search-tree local runqueue with a circular queue for DARC, no lock
* optimize the EDF-selected-interrupt algorithm: before context switching to another sandbox, check whether its remaining slack is less than or equal to 0; if it is, do not interrupt it
* forgot to submit local_runqueue_circular_queue.h and local_runqueue_circular_queue.c
* update meet_deadline_percentage.py
* format code
* solve the variant TSC issue across CPU cores for Shinjuku
* update and upload scripts
* add a SLEdge ABI extension for WebAssembly to get CPU cycles
* add more debug logs, but comment them out for later use
* add test_tsc.c
* precalculate the interrupt interval in cycles for Shinjuku
* add simulated exponential service time to SLEdge: when SLEdge gets a parameter from the client, it can determine the execution time and deadline. Currently, the deadline is 10 times the execution time
* 1. change type 2's execution time and deadline in fib.json; 2. modify meet_deadline_percentage.py to get the deadline from the server log instead of hardcoding it; 3. export SLEDGE_DISABLE_EXPONENTIAL_SERVICE_TIME_SIMULATION in start.sh and start_test.sh
* 1. add code to print each worker's maximum local queue length on exit; 2. increase the maximum local queue length to 4096 and the maximum global queue length to 65535 for Shinjuku; 3. for the simulated exponential service-time distribution, if the passed argument value is 1, set the execution time to 10; otherwise, multiply by the loss rate; 4. for the simulated exponential service-time distribution, send each request's pure CPU time as the response content to the client side
* fix bug for Shinjuku: the first worker of each listener thread had a longer queue than the other workers, because each iteration started from the first worker's queue. Fix this by using round robin to choose a different queue to start iterating from
* 1. choose the worker with the minimum total amount of work if more than one can be interrupted; 2. fix bug in simulating the exponential service-time distribution: use sandbox->estimated_cost instead of the perf_window when calculating the total amount of work
* modify debug.sh
* 1. implement partial code for autoscaling: each worker stays idle if its queue is empty, and the listener thread wakes up workers when adding new requests to their queues, implemented with a condition variable; 2. set the main thread's CPU affinity to core #1 (DPDK control threads are also pinned to core #1); listener threads start from core #2
* add local_runqueue_circular_queue_is_empty_index for DARC and Shinjuku to support checking whether a local queue is empty by queue index
* add SLEDGE_DISABLE_AUTOSCALING to enable or disable autoscaling
* 1. fix bug: the condition variable lost signals; 2. add a semaphore to wake up workers
* forgot to submit runtime/include/runtime.h
* add a scale-up implementation and a CPU monitoring script
* add start_single_request.sh, empty.json, and parse_power_consumption.py
* comment out the autoscaling logic in listener_thread.c because it brings no benefit
* update Makefile to check out the specified awsm code
* update start.sh
* update sledge main.c
* upload ori_sledge_tests
* update measure_old_sledge_cost.sh
* update curl.sh
* remove meet_deadline_percentage.py and parse_single.py from ori_sledge_tests
* rename parase_cost.py to parse_cost.py
* update curl.sh and parse_cost.py in ori_sledge_tests
* update start.sh
* 1. use the request type ID to locate the route object and module; 2. increase MODULE_DATABASE_CAPACITY from 128 to 1024; 3. change the log of sandbox_perf_log.h; 4. comment out 'explicit_bzero(wasm_stack->low, wasm_stack->capacity)' when reclaiming sandbox stack memory, because it hurts performance too much by touching every bit of the memory
* set the sandbox stack pointer to sandbox->wasm_stack so that only the correct stack size is freed
* upload increase_req_type.patch
* update increase_req_type.patch
* upload delete_patch.sh and apply_patch.sh
* upload http_router.c, copy_func_so.sh, and generate_json.py
* upload start_func_density_test.sh
* update start_test.sh
* update Makefile
* 1. comment out explicit_bzero of stack memory on sandbox exit; 2. update start_test.sh
* fix DARC bug: long requests and short requests shared CPU cores instead of being separated
* update fib.json
* upload dummy_func_DARC.json and dummy_func_EDF_SHINJUKU.json
* upload tests/config.json
* fix implementation bug: the binary-search-tree property was not used when searching for a node
* rename dummy_func_DARC.json to dummy_tpcc_DARC.json and dummy_func_EDF_SHINJUKU.json to dummy_tpcc_EDF_SHINJUKU.json
* add some debug code
* update debug.sh
* update increase_req_type.patch
* replace the per-module memory pool with a global shared reuse memory pool, so different types of requests can reuse reclaimed sandboxes' memory
* update tests/start_func_density_test.sh
* update start_func_density_test.sh and upload generate_json_with_replica_field.py
* update increase_req_type.patch
* upload binary_search_tree.h_redblacktree and local_runqueue_binary_tree.c_redblacktree
* for the binary search tree's get_next: when the tree length is 1, return the root directly without locking
* update sandbox_perf_log.h
* update ori_sledge_tests/start.sh
* update ori_sledge_tests/start.sh
* update ori_sledge_tests/start.sh, upload measure_throughput.sh
* update ori_sledge_tests/start.sh
* upload sed_json.sh
* update hash.json
* upload tests/hash_high_bimodal.json and update binary_search_tree.h_redblacktree and local_runqueue_binary_tree.c_redblacktree
* use the remaining execution time instead of the estimated execution time to calculate the total waiting time for a new request inserted into a local queue; this only works for the EDF_INTERRUPT scheduling algorithm
* update runtime/src/software_interrupt.c to call wakeup_worker and sem_post only when runtime_worker_busy_loop_enabled is false
* update curl.sh and measure_old_sledge_cost.sh in the ori_sledge_tests folder
* update curl.sh, measure_old_sledge_cost.sh, and parse_cost.py
* upload high_bimodal_realapps.json
* update tests/high_bimodal_realapps.json
* update scripts
* upload monitor_mem.sh, kill_perf.sh, and run_perf.sh
* upload parse_batch_profile.py
* upload vision_apps.json
* change the DARC core reservation to be keyed by group ID, not request ID
* 1. check the validity of group-id and n-resas when using DARC; 2. add group-id to all .json files
* update increase_req_type.patch
* update vision_apps.json
* update vision_apps.json
* update dummy_tpcc.json
* update the log info for each sandbox
* update meet_deadline_percentage.py and parse_batch.py
* fix bug: forgot to assign admission_info.uid; assign it the request type ID
* update increase_req_type.patch
* add commented-out code for Shinjuku; will not change performance
* add a log of the total received request count
* update parse script, upload scp.sh
* upload vision_apps_same_6apps.json and apps_Makefile
* upload change_vision_apps_json.sh, update parse_batch.py
* 1. fix bug: replace calloc with aligned_alloc when allocating struct memory_pool to avoid a crash; 2. fix bug: deque accesses could exceed its bounds; 3. add feature: support multiple dispatchers with one global queue (FIFO). All dispatchers put requests into the global queue, and each worker gets requests from it
* update start_test.sh and debug.sh
* update debug.sh, run_perf.sh, start_func_density_test.sh, and Makefile
* update increase_req_type.patch
* update run_perf.sh
* add thread names for the dispatcher and workers
* support round-robin distribution with EDF scheduling
* add JSQ and LLD distribution with EDF scheduling
* implement the second version of JSQ, RR, and LLD: the dispatcher assigns requests to workers with JSQ, LLD, or RR plus interruption, and each worker schedules its local queue with EDF, with no self-interruption
* update parse_batch.py
* upload compare_dispatchers.sh
* upload vision_apps_dispatcher.json
* add support for LLD + FIFO: the dispatcher assigns requests to workers with LLD and without preemption; each worker schedules its local queue tasks with round robin and a timer interrupt
* 1. replace the semaphore with a condition variable to avoid missed signals and the resulting performance degradation; 2. add validation checks for runtime configuration variables
* update start_test.sh
* revert to using the semaphore, since it performs better than the condition variable based on testing
* upload get_ctx.sh
* update start_func_density_test.sh and compare_dispatchers.sh
* update increase_req_type.patch
* make the batch size for fetching from the FIFO queue a configurable variable
* update test scripts and vision_apps_dispatcher.json
* update compare_dispatchers.sh
* upload applications_Makefile_patch
* update debug.sh
* change the RR, JSQ, and LLD dispatchers to use the original algorithms, with each worker using a fixed-interval interrupt + EDF
* update applications_Makefile_patch
* add a maximum local queue count for the minheap runqueue
* upload parse_avg_throughput.sh and measure_throughput.sh
* 1. add eRPC as a submodule in .gitmodules; 2. update Makefile
* update .gitmodules
* Add new submodule eRPC
* update Makefile
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md

---------

Co-authored-by: xiaosuGW <xiaosuGW@localhost>
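A recurring change in this log is replacing the per-dispatcher rr_index with a single global atomic request_index for round-robin distribution. A minimal C11 sketch of that pattern, assuming hypothetical names (`request_index`, `next_worker`, `worker_count` are illustrative, not the runtime's actual identifiers):

```c
/* Sketch of the lock-free round-robin dispatch described above: one
 * global atomic counter shared by all dispatcher threads replaces the
 * per-dispatcher rr_index, so every dispatcher participates in a
 * single rotation over the workers. Names are hypothetical. */
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint_fast64_t request_index; /* zero-initialized */

/* Returns the index of the worker that should receive the next request.
 * Safe to call concurrently from multiple dispatcher threads. */
static inline uint32_t
next_worker(uint32_t worker_count)
{
	uint64_t n = atomic_fetch_add_explicit(&request_index, 1, memory_order_relaxed);
	return (uint32_t)(n % worker_count);
}
```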
1 parent 5e25580 commit 0a4f8de

156 files changed: +11363 -461 lines

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -12,3 +12,6 @@
 [submodule "jsmn"]
 	path = runtime/thirdparty/jsmn
 	url = https://github.com/gwsystems/jsmn.git
+[submodule "eRPC"]
+	path = eRPC
+	url = https://github.com/lyuxiaosu/eRPC

Makefile

Lines changed: 14 additions & 4 deletions
@@ -1,10 +1,10 @@
 SHELL:=/bin/bash
 
 .PHONY: all
-all: awsm libsledge runtime applications
+all: awsm erpc libsledge runtime applications
 
 .PHONY: clean
-clean: awsm.clean libsledge.clean runtime.clean applications.clean
+clean: awsm.clean erpc.clean libsledge.clean runtime.clean applications.clean
 
 .PHONY: submodules
 submodules:
@@ -16,7 +16,7 @@ install: submodules wasm_apps all
 # aWsm: the WebAssembly to LLVM bitcode compiler
 .PHONY: awsm
 awsm:
-	cd awsm && cargo build --release
+	cd awsm && git checkout f0b35e756395f79b06be8dd2660eecac94506e94 && cargo build --release
 
 .PHONY: awsm.clean
 awsm.clean:
@@ -31,6 +31,16 @@ libsledge:
 libsledge.clean:
 	make -C libsledge clean
 
+.PHONY: erpc
+erpc:
+	@echo "Building eRPC interface..."
+	cd eRPC/c_interface && ./build.sh
+	@echo "eRPC build complete."
+
+.PHONY: erpc.clean
+erpc.clean:
+	cd eRPC/c_interface && make clean
+
 # sledgert: the runtime that executes *.so modules
 .PHONY: runtime
 runtime:
@@ -52,7 +62,7 @@ applications.clean:
 
 # Instead of having two copies of wasm_apps, just link to the awsm repo's copy
 wasm_apps:
-	ln -sr awsm/applications/wasm_apps/ applications/
+	cd awsm/applications/wasm_apps && git checkout master && cd ../../../ && ln -sr awsm/applications/wasm_apps/ applications/
 
 # Tests
 .PHONY: test

README.md

Lines changed: 178 additions & 118 deletions
@@ -1,160 +1,220 @@
-# SLEdge
-
-**SLEdge** is a lightweight serverless solution suitable for edge computing. It builds on WebAssembly sandboxing provided by the [aWsm compiler](https://github.com/gwsystems/aWsm).
-
-## Setting up a development environment
-
-### Native on Debian Host
-
-```sh
-git clone https://github.com/gwsystems/sledge-serverless-framework.git
-cd sledge-serverless-framework
-./install_deb.sh
-source ~/.bashrc
-make install
-make test
-```
-
-### Docker
-
-**Note: These steps require Docker. Make sure you've got it installed!**
-
-[Docker Installation Instructions](https://docs.docker.com/install/)
-
-We provide a Docker build environment configured with the dependencies and toolchain needed to build the SLEdge runtime and serverless functions.
-
-To setup this environment, run:
-
-```bash
-./devenv.sh setup
-```
-
-### Using the Docker container to compile your serverless functions
-
-To enter the docker environment, run:
-
-```bash
-./devenv.sh run
-```
-
-The first time you enter this environment, run the following to copy the sledgert binary to /sledge/runtime/bin.
-
-```bash
-cd /sledge/runtime
-make clean all
-```
-
+# SLEdgeScale
+
+**SLEdgeScale** is an ultra-low-latency, high-density, and task-deadline-aware serverless computing solution suitable for edge environments, extending **SLEdge**. It leverages WebAssembly sandboxing provided by the [aWsm compiler](https://github.com/gwsystems/aWsm) and kernel-bypass RPC offered by [eRPC](https://github.com/erpc-io/eRPC).
+
+## Setting up a development environment (Native on Debian Host)
+SLEdgeScale was developed and tested on [Cloudlab](https://www.cloudlab.us) nodes (d6515) equipped with Mellanox NICs. A public [profile](https://www.cloudlab.us/p/GWCloudLab/sledge-rpc2) is available on CloudLab for easily creating a development environment for eRPC with node d6515. If you plan to set up the environment on machines with Intel NICs or other machines, please refer to the [eRPC](https://github.com/erpc-io/eRPC) repository for details about environment configuration, including driver and DPDK installation. For Mellanox NICs, please follow [this](https://docs.nvidia.com/networking/display/mlnxofedv590560125/installing+mlnx_ofed) guide to install *MLNX_OFED*.
+
+To use the CloudLab profile to create the development environment:
+Choose the [profile](https://www.cloudlab.us/p/GWCloudLab/sledge-rpc2) and use the following configuration:
+Number of Nodes: 2
+Select OS image: SLEDGE
+Optional physical node type: d6515
+
+Now the environment is prepared for eRPC. The following steps build and install SLEdgeScale:
+1. Clone this repo and check out the *compare_dispatchers* branch
+2. Extend the root filesystem:
+```sh
+cd sledge-serverless-framework/runtime/tests
+./add_partition.sh
+```
+3. Move `sledge-serverless-framework` to `/my_mount/`
+4. Disable hyper-threading:
+```sh
+cd sledge-serverless-framework/runtime/tests
+sudo ./no_hyperthreads.sh
+```
+5. Build:
+```sh
+cd sledge-serverless-framework
+./install_deb.sh
+source $HOME/.cargo/env
+source ~/.bashrc
+make install
+```
 There are a set of benchmarking applications in the `/sledge/applications` directory. Run the following to compile all benchmarks runtime tests using the aWsm compiler and then copy all resulting `<application>.wasm.so` files to /sledge/runtime/bin.
 
 ```bash
 cd /sledge/applications/
 make clean all
 ```
 
-You now have everything that you need to execute your first serverless function on SLEdge
+All binary files are generated in `sledge-serverless-framework/runtime/bin`. You now have everything that you need to execute your first serverless function on SLEdgeScale.
 
-To exit the container:
+## Running your first serverless function
 
-```bash
-exit
-```
+An SLEdgeScale serverless function consists of a shared library (\*.so) and a JSON configuration file that determines how the runtime should execute the serverless function. We first need to prepare this configuration file. As an example, here is the configuration file for our sample fibonacci function:
 
-To stop the Docker container:
+```json
+[
+  {
+    "name": "gwu",
+    "port": 31850,
+    "replenishment-period-us": 0,
+    "max-budget-us": 0,
+    "routes": [
+      {
+        "route": "/fib",
+        "request-type": 1,
+        "n-resas": 1,
+        "group-id": 1,
+        "path": "fibonacci.wasm.so",
+        "admissions-percentile": 70,
+        "expected-execution-us": 5,
+        "relative-deadline-us": 50,
+        "http-resp-content-type": "text/plain"
+      }
+    ]
+
+  }
 
-```bash
-./devenv.sh stop
+]
 ```
 
-### Deleting Docker Build Containers
-
-If you are finished working with the SLEdge runtime and wish to remove it, run the following command to delete our Docker build and runtime images.
+`port`: Refers to the UDP port.
+`request-type` and `path`: Used to determine which serverless function will be served; `request-type` must be unique per function.
+`route`: An inherited field from SLEdge. It is not used currently but is kept to avoid parse errors.
+`n-resas`: Specifies the number of CPU cores reserved for this serverless function. It is used by the DARC algorithm.
+`group-id`: Specifies the group identifier used in the DARC algorithm.
+`expected-execution-us`: Currently not used. SLEdgeScale will estimate execution time online.
+`relative-deadline-us`: Specifies the request deadline in microseconds.
+`http-resp-content-type`: Not used currently but is kept to avoid parse errors.
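These route fields suggest a per-route descriptor inside the runtime. The following C struct is a purely hypothetical illustration of how the JSON keys above could map to fields; the names and types are assumptions, not the runtime's actual definitions:

```c
/* Hypothetical view of one entry in the "routes" array above.
 * Field names mirror the JSON keys; types are assumptions. */
#include <stdint.h>

struct route_config {
	const char *route;                 /* "/fib"; parsed but currently unused */
	uint32_t    request_type;          /* must be unique per function */
	uint32_t    n_resas;               /* CPU cores reserved, used by DARC */
	uint32_t    group_id;              /* group identifier, used by DARC */
	const char *path;                  /* "fibonacci.wasm.so" shared library */
	uint32_t    admissions_percentile; /* e.g. 70 */
	uint64_t    expected_execution_us; /* unused; estimated online instead */
	uint64_t    relative_deadline_us;  /* per-request deadline */
};
```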
 
-```bash
-./devenv.sh rma
-```
+### Start the SLEdgeScale Server
+First, set the public IPs and ports for eRPC. Open `sledge-serverless-framework/eRPC/scripts/autorun_process_file`: the first line specifies the server IP and port, and the second line specifies the client IP and port. Make sure to apply the same change on the client machine as well.
 
-And then simply delete this repository.
+Then we need to export some environment variables before starting the server. The commonly used environment variables are:
 
-## Running your first serverless function
+`SLEDGE_DISABLE_PREEMPTION`: Disables the timer that sends a SIGALRM signal every 5 ms for preemption. The timer must be disabled in SLEdgeScale.
 
-An SLEdge serverless function consists of a shared library (\*.so) and a JSON configuration file that determines how the runtime should execute the serverless function. As an example, here is the configuration file for our sample fibonacci function:
+`SLEDGE_DISPATCHER`: Specifies the dispatcher policy. There are seven types of dispatchers:
+- SHINJUKU: Requests are enqueued to each dispatcher's typed queue.
+- EDF_INTERRUPT: The dispatcher policy used by SLEdgeScale.
+- DARC: Requests are enqueued to each dispatcher's typed queue.
+- LLD: The dispatcher selects the worker with the least loaded queue to enqueue a request.
+- TO_GLOBAL_QUEUE: The dispatcher policy used by SLEdge. All dispatchers enqueue requests to a global queue.
+- RR: The dispatcher selects a worker in a round-robin fashion.
+- JSQ: The dispatcher selects the worker with the shortest queue to enqueue a request.
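For intuition about the queue-length-based policies above, a JSQ pick is simply an argmin over per-worker queue lengths. A minimal illustrative sketch in C, with hypothetical names (the runtime's actual data structures differ):

```c
/* Illustrative JSQ selection: pick the worker whose queue currently
 * holds the fewest requests. Lengths are read as relaxed atomics,
 * since an approximate snapshot suffices for load balancing. */
#include <stdatomic.h>
#include <stdint.h>

extern atomic_uint_fast32_t worker_queue_length[]; /* hypothetical */

static uint32_t
jsq_pick_worker(uint32_t worker_count)
{
	uint32_t      best     = 0;
	uint_fast32_t best_len = atomic_load_explicit(&worker_queue_length[0], memory_order_relaxed);
	for (uint32_t i = 1; i < worker_count; i++) {
		uint_fast32_t len = atomic_load_explicit(&worker_queue_length[i], memory_order_relaxed);
		if (len < best_len) {
			best     = i;
			best_len = len;
		}
	}
	return best;
}
```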
+
+`SLEDGE_DISABLE_GET_REQUESTS_FROM_GQ`: Disables workers fetching requests from the global queue. Fetching must be disabled if the dispatcher policy is not set to TO_GLOBAL_QUEUE.
 
-```json
-[
-  {
-    "name": "GWU",
-    "port": 10010,
-    "routes": [
-      {
-        "route": "/fib",
-        "path": "fibonacci.wasm.so",
-        "expected-execution-us": 6000,
-        "relative-deadline-us": 20000,
-        "http-resp-content-type": "text/plain"
-      }
-    ]
-  }
-]
+`SLEDGE_SCHEDULER`: Specifies the scheduler policy. There are two types of schedulers:
+- FIFO: First-In-First-Out. Must use the TO_GLOBAL_QUEUE dispatch policy when using FIFO.
+- EDF: Earliest-deadline-first.
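Under EDF, a worker always runs the pending request with the nearest absolute deadline (arrival time plus the route's relative-deadline-us). A minimal illustrative sketch in C with hypothetical types; the runtime actually keeps the local queue in a priority structure (e.g., a minheap) rather than scanning:

```c
/* Illustrative EDF pick over a worker's local queue: choose the
 * pending request with the smallest absolute deadline. */
#include <stddef.h>
#include <stdint.h>

struct request {
	uint64_t absolute_deadline_us; /* arrival_us + relative_deadline_us */
};

static struct request *
edf_pick(struct request **queue, size_t len)
{
	if (len == 0) return NULL;
	struct request *best = queue[0];
	for (size_t i = 1; i < len; i++)
		if (queue[i]->absolute_deadline_us < best->absolute_deadline_us)
			best = queue[i];
	return best;
}
```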
+
+`SLEDGE_FIFO_QUEUE_BATCH_SIZE`: When using the FIFO scheduler, specifies how many requests are fetched from the global queue to the local queue each time the local queue becomes empty.
 
-```
+`SLEDGE_DISABLE_BUSY_LOOP`: Disables the worker's busy loop for fetching requests from the local or global queue. The busy loop must be enabled if the dispatcher policy is set to `TO_GLOBAL_QUEUE`.
 
-The `port` and `route` fields are used to determine the path where our serverless function will be served served.
+`SLEDGE_DISABLE_AUTOSCALING`: Currently not used; always set to `true`.
 
-In our case, we are running the SLEdge runtime on localhost, so our function is available at `localhost:10010/fib`.
+`SLEDGE_DISABLE_EXPONENTIAL_SERVICE_TIME_SIMULATION`: For the `hash` function, enabling this option allows SLEdgeScale to estimate the function's execution time based on the input number. For other types of functions, this should be disabled.
 
-Our fibonacci function will parse a single argument from the HTTP POST body that we send. The expected Content-Type is "text/plain".
+`SLEDGE_FIRST_WORKER_COREID`: Specifies the ID of the first core for the worker threads. Cores 0–2 are reserved, so numbering should start from 3.
 
-Now that we understand roughly how the SLEdge runtime interacts with serverless function, let's run Fibonacci!
+`SLEDGE_NWORKERS`: The total number of workers in the system.
 
-The fastest way to check it out is just to click on the following URL on your Web browser: [http://localhost:10010/fib?10](http://localhost:10010/fib?10)
+`SLEDGE_NLISTENERS`: The total number of dispatchers in the system.
 
-From the root project directory of the host environment (not the Docker container!), navigate to the binary directory
+`SLEDGE_WORKER_GROUP_SIZE`: The number of workers in each worker group. Its value is equal to SLEDGE_NWORKERS / SLEDGE_NLISTENERS.
 
-```bash
-cd runtime/bin/
-```
+`SLEDGE_SANDBOX_PERF_LOG`: The server log file path.
 
-Now run the sledgert binary, passing the JSON file of the serverless function we want to serve. Because serverless functions are loaded by SLEdge as shared libraries, we want to add the `applications/` directory to LD_LIBRARY_PATH.
+Now run the sledgert binary with the following script using sudo, passing the JSON file (e.g., the above Fibonacci function configuration) of the serverless function we want to serve. Because serverless functions are loaded by SLEdgeScale as shared libraries, we want to add the `applications/` directory to LD_LIBRARY_PATH:
 
-```bash
-LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH" ./sledgert ../../tests/fibonacci/bimodal/spec.json
+```sh
+#!/bin/bash
+
+declare project_path="$(
+	cd "$(dirname "$0")/../.."
+	pwd
+)"
+echo $project_path
+path=`pwd`
+export SLEDGE_DISABLE_PREEMPTION=true
+export SLEDGE_DISABLE_GET_REQUESTS_FROM_GQ=true
+export SLEDGE_FIFO_QUEUE_BATCH_SIZE=5
+export SLEDGE_DISABLE_BUSY_LOOP=true
+export SLEDGE_DISABLE_AUTOSCALING=true
+export SLEDGE_DISABLE_EXPONENTIAL_SERVICE_TIME_SIMULATION=true
+export SLEDGE_FIRST_WORKER_COREID=3
+export SLEDGE_NWORKERS=1
+export SLEDGE_NLISTENERS=1
+export SLEDGE_WORKER_GROUP_SIZE=1
+export SLEDGE_SCHEDULER=EDF
+export SLEDGE_DISPATCHER=EDF_INTERRUPT
+export SLEDGE_SANDBOX_PERF_LOG=$path/server.log
+
+cd $project_path/runtime/bin
+LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH" ./sledgert ../tests/fib.json
 ```
+### Start the client to send requests
+First, clone the client code:
+```sh
+git clone https://github.com/lyuxiaosu/eRPC.git
+```
+There are several client implementations under eRPC/apps, and you can also create your own customized client. In our setup, we use `openloop_client`, which is an open-loop client that sends requests following a Poisson distribution.
+Edit `autorun_app_file` and `autorun_process_file`, and build:
+```sh
+cd eRPC
+echo "openloop_client" > ./scripts/autorun_app_file
+./build.sh
+```
+Our fibonacci function will parse a single argument from the RPC request that we send. Create the configuration file `eRPC/apps/openloop_client/conf` for `openloop_client`:
+```
+--test_ms 10000
+--sm_verbose 0
+--num_server_threads 1
+--window_size 10
+--req_size 5
+--resp_size 32
+--num_processes 2
+--numa_0_ports 0
+--numa_1_ports 1,3
+--req_type 1
+--rps 1000
+--req_parameter 20
+--warmup_rps 200
+```
+`test_ms`: Defines the test duration in milliseconds.
 
-While you don't see any output to the console, the runtime is running in the foreground.
-
-Let's now invoke our serverless function to compute the 10th fibonacci number. We'll use `cURL` and [HTTPie](https://httpie.org/) to send a HTTP GET and POST requests with the parameter we want to pass to my serverless function. Feel free to use whatever other network client you prefer!
-
-Open a **new** terminal session and execute the following
+`num_server_threads`: Specifies how many dispatcher threads to run on the server.
 
-```bash
-# HTTP GET method:
-http localhost:10010/fib?10
-curl localhost:10010/fib?10
+`req_size`: The size of the request packet in bytes.
 
-# HTTP POST method:
-echo "10" | http POST localhost:10010/fib
-curl -i -d 10 localhost:10010/fib
-```
+`resp_size`: The size of the response packet in bytes.
 
-You should receive the following in response. The serverless function says that the 10th fibonacci number is 55, which seems to be correct!
+`req_type`: The request type.
 
-```bash
-HTTP/1.1 200 OK
-Server: SLEdge
-Connection: close
-Content-Type: text/plain
-Content-Length: 3
+`req_parameter`: The parameter carried by the request; here, the Fibonacci number.
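The `rps` knob drives an open-loop Poisson arrival process: inter-send gaps are exponentially distributed with mean 1/rps, independent of when responses arrive. A minimal illustrative C sketch of that pacing, assuming a caller-supplied `send_one` callback (this is not the actual `openloop_client` code, which lives under eRPC/apps):

```c
/* Illustrative open-loop pacing: draw exponential inter-arrival times
 * with mean 1/rps so that send events form a Poisson process, and
 * sleep until each scheduled send time regardless of responses. */
#include <math.h>
#include <stdlib.h>
#include <time.h>

static double
exp_interarrival_s(double rps)
{
	double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* u in (0,1) */
	return -log(u) / rps;
}

static void
open_loop_send(double rps, int duration_s, void (*send_one)(void))
{
	struct timespec next;
	clock_gettime(CLOCK_MONOTONIC, &next);
	time_t end = next.tv_sec + duration_s;
	while (next.tv_sec < end) {
		send_one(); /* fire without waiting for the reply */
		double gap = exp_interarrival_s(rps);
		next.tv_sec  += (time_t)gap;
		next.tv_nsec += (long)((gap - (time_t)gap) * 1e9);
		if (next.tv_nsec >= 1000000000L) {
			next.tv_sec++;
			next.tv_nsec -= 1000000000L;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
	}
}
```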
 
-55
-```
 
-When done, terminal the SLEdge runtime with `Ctrl+c`
+Now that we have everything, let's run Fibonacci!
 
-## Running Test Workloads
+```sh
+cd eRPC
+./scripts/do.sh 1 0
 
-Various synthetic and real-world tests can be found in `runtime/tests`. Generally, each experiment can be run by Make rules in the top level `test.mk`.
+```
+The results are saved in `client.log`:
+```
+thread id, type id, latency, cpu time
+0 1 64.492000 23
+0 1 46.649000 23
+0 1 45.806000 23
+0 1 45.877000 22
+0 1 45.416000 22
+...
+```
+The first column is the thread ID, the second column is the request type, the third column is the end-to-end latency in microseconds, and the fourth column is the execution time in microseconds.
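A quick sanity check on a run is to average the latency column. A small illustrative C snippet for the `client.log` format shown above (it assumes the single header line shown; the real log may contain additional lines):

```c
/* Illustrative post-processing of client.log: skip the header line,
 * then average the third (latency) column. */
#include <stdio.h>

int
main(void)
{
	FILE *f = fopen("client.log", "r");
	if (!f) { perror("client.log"); return 1; }

	char header[256];
	if (!fgets(header, sizeof header, f)) { fclose(f); return 1; }

	int    thread_id, type_id, cpu_time;
	double latency, sum = 0.0;
	long   n = 0;
	while (fscanf(f, "%d %d %lf %d", &thread_id, &type_id, &latency, &cpu_time) == 4) {
		sum += latency;
		n++;
	}
	fclose(f);
	if (n > 0) printf("%ld requests, mean latency %.3f us\n", n, sum / n);
	return 0;
}
```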
 
-`make -f test.mk all`
+### High Density Test
+Since the High Density experiment involves a large number of RPC types, we need to modify the maximum number of RPC types supported by eRPC, as well as some parts of the SLEdgeScale code. These changes are temporary and not part of the permanent code base.
+Please run:
+```
+./apply_patch.sh
+```
+in the `eRPC` directory (on both the client and server sides) and in the `runtime` directory, and then recompile `eRPC` and the `runtime`.
 
 ## Problems or Feedback?
 
