- poh: Generates block "ticks" and manages leader status.
610
+
- sign: Generates signatures for various tiles which require them (e.g.
611
+
repair, gossip).
612
+
- rpc: Supports a subset of the Solana RPC API.
613
+
- gui: Serves the GUI, which includes the WebSocket API described in
614
+
this document.
615
+
616
+
##### short-lived
617
+
618
+
- snapct: Manages the snapshot loading state machine.
619
+
- snapld: Loads snapshots from the network or from the file system.
620
+
- snapdc: Decompresses snapshot data.
621
+
- snapin: Inserts snapshot data into the accounts database.
622
+
- genesi: Handles cluster bootstrapping if the validator is booting a new
623
+
cluster. If booting into an existing cluster, fetches cluster info (e.g.
624
+
genesis hash).
625
+
578
626
**`Tile`**
579
-
| Field | Type | Description
580
-
|---------|---------|------------
581
-
| kind | `string` | What kind of tile it is. One of `net`, `sock`, `quic`, `verify`, `dedup`, `pack`, `bank`, `poh`, `shred`, `store`, `sign`, `plugin`, or `http`.
582
-
| kind_id | `number` | The index of the tile in its kind. For example, if there are four `verify` tiles they have `kind_id` values of 0, 1, 2, and 3 respectively.
627
+
| Field | Type | Description |
628
+
|---------|---------|-------------|
629
+
| kind |`string`| What kind of tile it is. In Firedancer, one of the above tiles. In Frankendancer, might be one of `net`, `sock`, `quic`, `verify`, `dedup`, `pack`, `bank`, `poh`, `shred`, `store`, `sign`, `plugin`, or `http`|
630
+
| kind_id |`number`| The index of the tile in its kind. For example, if there are four `verify` tiles they have `kind_id` values of 0, 1, 2, and 3 respectively |
631
+
| pid |`number`| The process id of the tile |
583
632
584
633
::: details Example
585
634
@@ -984,6 +1033,114 @@ first connect by the `summary.tiles` message.
984
1033
985
1034
:::
986
1035
1036
+
#### `summary.live_tile_metrics`
1037
+
| frequency | type | example |
1038
+
|------------------|---------------|---------|
1039
+
|*Once* + *10ms*|`TileMetrics`| below |
1040
+
1041
+
Live tile metrics is a live feed of various metrics related to tile
1042
+
health and resource utilization.
1043
+
1044
+
The `timers` field is a matrix of percentages, where the entry in row
1045
+
`i`, column `j` is the percentage of time tile `i` spent in `regimes[j]`
1046
+
over the previous 10 millisecond sampling window. A value of `-1`
1047
+
indicates no sample was taken in the window, typically because the tile
1048
+
was context switched out by the kernel or it is hung.
1049
+
1050
+
The regimes array contains the processing states that a tile can exist
1051
+
in. Tile regimes are the cartesian product of the following two state
1052
+
vectors:
1053
+
1054
+
State vector 1:
1055
+
1056
+
- running: means that at the time the run loop executed, there was no
1057
+
upstream message I/O for the tile to handle.
1058
+
- processing: means that at the time the run loop executed, there was one or
1059
+
more messages for the tile to consume.
1060
+
- stalled: means that at the time the run loop executed, a downstream
1061
+
consumer of the messages produced by this tile is slow or stalled, and
1062
+
the message link for that consumer has filled up. This state causes the
1063
+
tile to stop processing upstream messages.
1064
+
1065
+
State vector 2:
1066
+
1067
+
- maintenance: the portion of the run loop that executes infrequent,
1068
+
potentially CPU heavy tasks
1069
+
- routine: the portion of the run loop that executes regularly,
1070
+
regardless of the presence of incoming messages
1071
+
- handling: the portion of the run loop that executes as a side effect
1072
+
of an incoming message from an upstream producer tile
1073
+
1074
+
```json
1075
+
[
1076
+
"running_maintenance",
1077
+
"processing_maintenance",
1078
+
"stalled_maintenance",
1079
+
"running_routine",
1080
+
"processing_routine",
1081
+
"stalled_routine",
1082
+
"running_handling",
1083
+
"processing_handling",
1084
+
// "stalled_handling" is an impossible state, and is therefore excluded
1085
+
]
1086
+
```
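The ordering of the regimes array can be reproduced by taking the cartesian product with state vector 2 as the outer loop and state vector 1 as the inner loop, then dropping the impossible `stalled_handling` combination. The following is a minimal TypeScript sketch of that idea. The names `STATES`, `PHASES`, and `buildRegimes` are hypothetical and are not part of the API.

```typescript
// State vector 1: what the tile observed when the run loop executed.
const STATES: string[] = ["running", "processing", "stalled"];
// State vector 2: which portion of the run loop was executing.
const PHASES: string[] = ["maintenance", "routine", "handling"];

// Hypothetical helper: builds the regime names in the order listed above.
function buildRegimes(): string[] {
  const regimes: string[] = [];
  for (const phase of PHASES) {
    for (const state of STATES) {
      const name = state + "_" + phase;
      // "stalled_handling" is an impossible state and is excluded.
      if (name === "stalled_handling") continue;
      regimes.push(name);
    }
  }
  return regimes;
}

const REGIMES: string[] = buildRegimes();
console.log(REGIMES.length); // 8
```

A client could use an array like `REGIMES` to label column `j` of the `timers` matrix, assuming the order shown above is stable.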
1087
+
1088
+
The tile indices `i` appear in the same order here that they are
1089
+
reported when you first connect by the `summary.tiles` message.
1090
+
1091
+
::: details Example
1092
+
1093
+
```json
1094
+
{
1095
+
"topic": "summary",
1096
+
"key": "live_tile_metrics",
1097
+
"value": {
1098
+
"timers": [
1099
+
[10.1, 0, 0, 15.3, 17, 58, 0, 0],
1100
+
[10, 0, 0, 15, 17, 58, 0, 0],
1101
+
...
1102
+
],
1103
+
"in_backp": [
1104
+
0,
1105
+
0,
1106
+
...
1107
+
],
1108
+
"backp_msgs": [
1109
+
0,
1110
+
10,
1111
+
...
1112
+
],
1113
+
"alive": [
1114
+
1,
1115
+
1,
1116
+
...
1117
+
],
1118
+
"nvcsw": [
1119
+
0,
1120
+
1234,
1121
+
...
1122
+
],
1123
+
"nivcsw": [
1124
+
0,
1125
+
3,
1126
+
...
1127
+
]
1128
+
}
1129
+
}
1130
+
```
1131
+
1132
+
:::
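As an illustration of how a client might read one row of `timers`, the sketch below finds the regime a tile spent the most time in and the total share of the window spent in `stalled_*` regimes. It assumes that a row containing `-1` should be treated as "no sample" as a whole. The helper names `busiestRegime` and `stalledPercent` are hypothetical and not part of the API.

```typescript
// Regime names in the order documented above (column j of the timers matrix).
const REGIMES: string[] = [
  "running_maintenance", "processing_maintenance", "stalled_maintenance",
  "running_routine", "processing_routine", "stalled_routine",
  "running_handling", "processing_handling",
];

// Hypothetical helper: true if the row has the expected width and no -1 entries.
function hasSample(row: number[]): boolean {
  return row.length === REGIMES.length && !row.some((v) => v < 0);
}

// Hypothetical helper: regime with the largest percentage, or null if no sample.
function busiestRegime(row: number[]): string | null {
  if (!hasSample(row)) return null;
  let best = 0;
  for (let j = 1; j < row.length; j++) {
    if (row[j] > row[best]) best = j;
  }
  return REGIMES[best];
}

// Hypothetical helper: sum of the stalled_* columns, or null if no sample.
function stalledPercent(row: number[]): number | null {
  if (!hasSample(row)) return null;
  let total = 0;
  for (let j = 0; j < row.length; j++) {
    if (REGIMES[j].indexOf("stalled_") === 0) total += row[j];
  }
  return total;
}

// First row of the example message above.
const row0: number[] = [10.1, 0, 0, 15.3, 17, 58, 0, 0];
console.log(busiestRegime(row0), stalledPercent(row0));
```

Returning `null` rather than `0` for an unsampled row keeps "the tile was idle" distinguishable from "the tile was not observed".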
1133
+
1134
+
**`TileMetrics`**
1135
+
| Field | Type | Description |
1136
+
|------------|--------------|-------------|
1137
+
| timers |`number[][]`|`timers[i][j]` is the percentage of the previous 10ms sampling window that tile `i` spent in regime `regimes[j]`, or `-1` if no sample was taken in that window |
1138
+
| in_backp |`boolean[]`|`in_backp[i]` is `true` if tile `i` is currently backpressured and `false` otherwise. See description of regimes above for more context |
1139
+
| backp_msgs |`number[]`|`backp_msgs[i]` is the number of times since startup that tile `i` has had to wait for one or more consumers to catch up before it could resume publishing |
1140
+
| alive |`boolean[]`|`alive[i]` is `true` if tile `i` has updated its heartbeat timer any time in the last 10ms and `false` otherwise |
1141
+
| nvcsw |`number[]`|`nvcsw[i]` is the number of voluntary context switches that occurred for tile `i` since startup |
1142
+
| nivcsw |`number[]`|`nivcsw[i]` is the number of involuntary context switches that occurred for tile `i` since startup |
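
Because `backp_msgs`, `nvcsw`, and `nivcsw` are cumulative since startup, a client that wants per-interval activity needs to subtract consecutive samples. The sketch below shows one way to do that, assuming both samples list the same tiles in the same order. The `TileCounters` type and the `counterDelta` and `countersDelta` helpers are hypothetical and not part of the API.

```typescript
// Subset of the TileMetrics fields that are cumulative since startup.
interface TileCounters {
  backp_msgs: number[];
  nvcsw: number[];
  nivcsw: number[];
}

// Hypothetical helper: element-wise difference between two samples of one counter.
// Assumes prev and cur have the same length and tile ordering.
function counterDelta(prev: number[], cur: number[]): number[] {
  return cur.map((value, i) => value - prev[i]);
}

// Hypothetical helper: per-tile change in each cumulative counter.
function countersDelta(prev: TileCounters, cur: TileCounters): TileCounters {
  return {
    backp_msgs: counterDelta(prev.backp_msgs, cur.backp_msgs),
    nvcsw: counterDelta(prev.nvcsw, cur.nvcsw),
    nivcsw: counterDelta(prev.nivcsw, cur.nivcsw),
  };
}

// Two made-up consecutive samples for a two-tile validator.
const before: TileCounters = { backp_msgs: [0, 10], nvcsw: [0, 1234], nivcsw: [0, 3] };
const after: TileCounters = { backp_msgs: [0, 12], nvcsw: [0, 1240], nivcsw: [1, 3] };
console.log(JSON.stringify(countersDelta(before, after)));
```
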
1143
+
987
1144
### block_engine
988
1145
Block engines are providers of additional transactions to the validator,
989
1146
which are configurable by the operator. The validator may not be
ulong idle_time = cur[ i ].caughtup_postfrag_ticks - prev[ i ].caughtup_postfrag_ticks;
790
-
ulong backpressure_time = cur[ i ].backpressure_prefrag_ticks - prev[ i ].backpressure_prefrag_ticks;
779
+
ulong idle_time = cur[ i ].timers[ FD_METRICS_ENUM_TILE_REGIME_V_CAUGHT_UP_POSTFRAG_IDX ] - prev[ i ].timers[ FD_METRICS_ENUM_TILE_REGIME_V_CAUGHT_UP_POSTFRAG_IDX ];
780
+
ulong backpressure_time = cur[ i ].timers[ FD_METRICS_ENUM_TILE_REGIME_V_BACKPRESSURE_PREFRAG_IDX ] - prev[ i ].timers[ FD_METRICS_ENUM_TILE_REGIME_V_BACKPRESSURE_PREFRAG_IDX ];