Commit 0a4f8de
Sledgescale code (#387)
* replace HTTP with eRPC
* make it work: Sledge receives an RPC request and parses the header and payload successfully
* first working version integrating eRPC, but some parts are hardcoded
* forgot to submit erpc_handler.h and debug.sh
* send response packets to the client for any errors
* add SIGINT handler to print throughput and latency data when CTRL + C is pressed
* enable macro SANDBOX_STATE_TOTALS
* remove log
* upload test scripts
* upload id_rsa
* make the first worker core id configurable
* create multiple listener threads with eRPC
* upload parse_single.py
* remove unreachable code
* make listener threads put requests into worker queues directly with RR
* update start.sh
* make runtime log write to memory and dump it to a file when a SIGINT signal is received
* change memory buffer size for logging
* comment out sandbox_state_totals_decrement(last_state) in sandbox_set_as_initialized.h to show the correct number of allocated sandboxes
* change fib.json to support multiple functions
* uncomment SLEDGE_SANDBOX_PERF_LOG in start.sh
* add LOG_RUNTIME_MEM_LOG definition in runtime/Makefile
* 1. solve incomplete log printing when SIGINT is received 2. print the total number of local requests of each worker thread
* gracefully stop sledge when SIGINT is received
* increase init length to 2560000 for local runqueue.
* upload meet_deadline_percentage.py to tests
* update meet_deadline_percentage.py
* update meet_deadline_percentage.py
* 1. add more fine-grained time costs for a request, such as the cleanup cost of the previous request 2. preallocate two linear and stack memories for each module in each worker thread
* 1. move logging from sandbox_free to scheduler_cooperative_sched. 2. check sandbox state to make sure it is in complete or error state before logging to memory
* fix bug: not logging in sandbox_perf_log.h
* the unmasking of the alarm signal was wrongly commented out previously; uncomment it
* remove one whitespace in printing log to memory
* replace local rr_index with a global atomic request_index to distribute requests to workers with RR
* update meet_deadline_percentage.py and tests/start.sh
* 1. Remove perf_window lock by letting each worker thread maintain its own perf_windows. 2. Distribute requests based on the amount of work of each worker thread
* implement new scheduling policy with binary search tree, but it has some bugs now
* fix bug from the previous commit: the calculation of the total waiting serving time was wrong
* remove debug log
* fix bug: worker thread cannot receive sigalarm signal
* implemented the main framework for DARC scheduling
* update runtime/tests/start.sh
* implemented DARC algo, but it has a bug: a worker that finishes its work doesn't correctly notify the listener that it is idle
* fix the previous bug for DARC algo
* update fib.json
* add comment
* 1. rename variables to make them easier to understand.
2. add comments
* shinjuku implementation with bugs
* fix bug: worker and listener threads both call push_to_rqueue() to enqueue a request to the request typed queue, which causes a race condition. Another bug: the queue slot of a sandbox popped from the request typed queue was not reset to NULL, causing a crash
* 1. forgot to commit request_typed_queue.c 2. update scripts
* add shinjuku_dispatch_different_core(), but it doesn't work due to different page mappings
* fix bug: load imbalance among worker threads
* fix bug: check each worker's current task to see if it has run longer than 10us,
even when the total number of free workers is 0
* 1. remove some http setting code, which would cause a crash.
2. Send a failure response when running the sandbox fails.
3. Update debug.sh to disable timer preemption.
4. Update meet_deadline_percentage.py to calculate total interrupts
* update debug.sh
* commit test_dequeue.cpp to tests folder
* 1. fix bug: sandbox_set_as_running_sys() failed the assert that the sandbox state is RUNNING_USER,
because the sandbox state had been set to PREEMPTED although it had not yet been context switched out.
2. When a sandbox is preempted, the worker enqueues it to its local preempted FIFO queue;
the dispatcher thread moves all sandboxes from each worker's preempted queue to its typed queue, thus
reducing locking on the typed queue
* fix bug: for the shinjuku algo, when context switching to another sandbox, the previous one was
deleted from the local runqueue but was still re-executed by the current context after the signal handler returned
* fix bug: for the shinjuku algo, when a sigalarm signal is received, the signal handler context switches to another
sandbox. It first deletes the sandbox from the local runqueue and gets the next sandbox; however, sometimes there is no
other sandbox in the local runqueue, so the code just returns. After the signal handler returns, the
removed sandbox is resumed immediately, and when it finishes, it tries to remove itself from the local runqueue and
cannot find it.
* 1. revert the wrong change for the assert failure at sandbox->state == SANDBOX_RUNNING_USER
in sandbox_set_as_running_sys().
2. update meet_deadline_percentage.py
* revert the global typed queue to a one-dimensional array, because now only the dispatcher thread
puts requests into it; workers no longer do
* replace the global typed queue with a global typed deque, which allows insertion and deletion at both ends
* fix bug: set sandbox->start_ts_running_user after sandbox_return, which sets the sandbox
to running_user state; otherwise the sandbox is preempted immediately because sandbox->start_ts_running_user is 0
* replace binary-search-tree local runqueue with circular-queue for shinjuku, no lock
* replace binary-search-tree local runqueue with circular-queue for DARC, no lock
* optimize edf-selected-interrupt algo: before context switching to another sandbox, check whether
its remaining slack is less than or equal to 0; if so, do not interrupt it
* forgot to submit local_runqueue_circular_queue.h and local_runqueue_circular_queue.c
* update meet_deadline_percentage.py
* format code
* solved variant TSC issue across CPU cores for shinjuku
* update and upload scripts
* add a sledge ABI extension for WebAssembly to get CPU cycles
* Add more debug log, but comment them for later use
* add test_tsc.c
* precalculate the interrupt interval in cycles for shinjuku
* add simulated exponential service time on sledge: when sledge gets a parameter from the client,
it can derive the execution time and deadline. Currently, the deadline is 10 times the execution time
* 1. change type 2's execution time and deadline for fib.json.
2. Modify meet_deadline_percentage.py to get the deadline from the server log instead of hardcoding it.
3. Export SLEDGE_DISABLE_EXPONENTIAL_SERVICE_TIME_SIMULATION in start.sh and start_test.sh
* 1. Add code to print the maximum local queue length of each worker on exit.
2. Increase local queue maximum length to 4096 and global queue maximum length to 65535 for shinjuku.
3. For the simulated exponential service time distribution, if the passed argument value is 1, set the execution time to 10; otherwise, multiply by the loss rate
4. For the simulated exponential service time distribution, send each request's pure CPU time as the response content to the client
* fix bug for shinjuku: the first worker of each listener thread has a longer queue than other workers, because each iteration starts from the first worker queue. Fix this by using round robin to choose a different queue to start iterating from
* 1. choose the worker with the minimum total amount of work if more than one can be interrupted.
2. Fix bug in simulating the exponential service time distribution: use sandbox->estimated_cost instead of the perf_window when calculating the total amount of work
* modify debug.sh
* 1. implement partial code for autoscaling: each worker stays idle if its queue is empty.
Listener threads wake up workers when adding new requests to their queues; this is implemented with a condition variable.
2. Set main thread CPU affinity to core #1 (DPDK control threads are also pinned to core #1). Listener threads start from core #2
* add local_runqueue_circular_queue_is_empty_index for DARC and Shinjuku to support checking whether a local queue is empty by queue index
* add SLEDGE_DISABLE_AUTOSCALING to enable or disable autoscaling
* 1. fix bug: condition variable lost signal
2. Add semaphore to wake up worker
* forgot to submit runtime/include/runtime.h
* add scaling up implementation and cpu monitoring script
* add start_single_request.sh empty.json and parse_power_consumption.py
* comment out autoscaling logic code in listener_thread.c due to no benefit
* update Makefile to let it check out the specified awsm code
* update start.sh
* update sledge main.c
* upload ori_sledge_tests
* update measure_old_sledge_cost.sh
* update curl.sh
* remove meet_deadline_percentage.py parse_single.py from ori_sledge_tests
* rename parase_cost.py to parse_cost.py
* update curl.sh and parse_cost.py in ori_sledge_tests
* update start.sh:
* 1. use request type id to locate object route and module.
2. Increase MODULE_DATABASE_CAPACITY from 128 to 1024.
3. Change logging in sandbox_perf_log.h
4. Comment out 'explicit_bzero(wasm_stack->low, wasm_stack->capacity)' when reclaiming sandbox stack memory,
because it hurts performance too much as it touches every byte of the memory
* set sandbox stack pointer to sandbox->wasm_stack so that only the correct stack size is freed
* upload increase_req_type.patch
* update increase_req_type.patch
* upload delete_patch.sh and apply_patch.sh
* upload http_router.c, copy_func_so.sh, and generate_json.py
* upload start_func_density_test.sh
* update start_test.sh
* update Makefile
* 1. comment out explicit_bzero of stack memory when a sandbox exits
2. update start_test.sh
* fix DARC bug: long requests and short requests share CPU cores instead of being separated
* update fib.json
* upload dummy_func_DARC.json dummy_func_EDF_SHINJUKU.json
* upload tests/config.json
* fix implementation bug: the binary search tree property was not used to search for the node
* rename dummy_func_DARC.json to dummy_tpcc_DARC.json, dummy_func_EDF_SHINJUKU.json to dummy_tpcc_EDF_SHINJUKU.json
* add some debug code
* update debug.sh
* update increase_req_type.patch
* replace the memory pool in each module with a global shared reusable memory pool, so different types of requests can reuse memory claimed by sandboxes
* update tests/start_func_density_test.sh
* update start_func_density_test.sh and upload generate_json_with_replica_field.py
* update increase_req_type.patch
* upload binary_search_tree.h_redblacktree local_runqueue_binary_tree.c_redblacktree
* for binary search tree get_next: when tree length is 1, return root directly without locking
* update sandbox_perf_log.h
* update ori_sledge_tests/start.sh
* update ori_sledge_tests/start.sh
* update ori_sledge_tests/start.sh, upload measure_throughput.sh
* update ori_sledge_tests/start.sh
* upload sed_json.sh
* update hash.json
* upload tests/hash_high_bimodal.json and update binary_search_tree.h_redblacktree, local_runqueue_binary_tree.c_redblacktree
* use remaining execution time instead of estimated execution time to calculate the total waiting time for a new request inserted into a local queue; this only works for the EDF_INTERRUPT scheduling algorithm
* update runtime/src/software_interrupt.c to wakeup_worker and sem_post only when runtime_worker_busy_loop_enabled is false
* update curl.sh and measure_old_sledge_cost.sh of folder ori_sledge_tests
* update curl.sh measure_old_sledge_cost.sh and parse_cost.py
* upload high_bimodal_realapps.json
* update tests/high_bimodal_realapps.json
* update scripts
* upload monitor_mem.sh kill_perf.sh run_perf.sh
* upload parse_batch_profile.py
* upload vision_apps.json
* change DARC core reservation to use group id instead of request id
* 1. check validity of group-id and n-resas when using DARC.
2. add group-id to all .json files
* update increase_req_type.patch
* update vision_apps.json
* update vision_apps.json
* update dummy_tpcc.json
* update log info for each sandbox
* update meet_deadline_percentage.py and parse_batch.py
* fix bug: forgot to assign admission_info.uid; assign it the request type id
* update increase_req_type.patch
* add commented-out code for shinjuku; it won't change performance
* add total received requests count log
* update parse script, upload scp.sh
* upload vision_apps_same_6apps.json and apps_Makefile
* upload change_vision_apps_json.sh, update parse_batch.py
* 1. fix bug: replace calloc with aligned_alloc to allocate struct memory_pool to avoid a crash.
2. fix bug: deque access could exceed its bounds; fixed it.
3. add feature: support multiple dispatchers with one global queue (FIFO). All dispatchers put requests into the global queue and each worker gets requests from it
* update start_test.sh and debug.sh
* update debug.sh, run_perf.sh, start_func_density_test.sh and Makefile
* update increase_req_type.patch
* update run_perf.sh
* add thread name for dispatcher and worker
* support round robin distributing and EDF scheduling
* add JSQ and LLD distribution with EDF scheduling
* implement the second version of JSQ, RR, LLD: the dispatcher assigns requests to workers with JSQ, LLD, or RR + interruption, and each worker schedules its local queue with EDF, no self-interruption
* update parse_batch.py
* upload compare_dispatchers.sh
* upload vision_apps_dispatcher.json
* add new support for LLD + FIFO: the dispatcher assigns requests to workers with LLD and without preemption; each worker schedules its local queue tasks with Round Robin and timer interrupts
* 1. replace semaphore with condition variable to avoid missed signals and decreased performance.
2. Add validation of runtime configuration variables
* update start_test.sh
* Revert to using semaphore, since it performs better than condition variable in our tests
* upload get_ctx.sh
* update start_func_density_test.sh and compare_dispatchers.sh
* update increase_req_type.patch
* make the batch size for getting requests from the FIFO queue a configurable variable
* update test scripts and vision_apps_dispatcher.json
* update compare_dispatchers.sh
* upload applications_Makefile_patch
* update debug.sh
* change dispatchers RR, JSQ, and LLD to use the original algorithms, with each worker using fixed-interval interrupts + EDF
* update applications_Makefile_patch
* add max local queue count for minheap runqueue
* upload parse_avg_throughput.sh and measure_throughput.sh
* 1. add eRPC as a sub-module in .gitmodules.
2. update Makefile
* update .gitmodules
* Add new submodule eRPC
* update Makefile
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: xiaosuGW <xiaosuGW@localhost>

1 parent 5e25580, commit 0a4f8de
156 files changed, +11363 / -461 lines changed
Directories touched: applications, libsledge, runtime, include, src, libc, tests, ori_sledge_tests