This repository was archived by the owner on Dec 14, 2022. It is now read-only.

Commit 655fb6a

Chris Wiechmann authored
Merge pull request #161 from Axway-API-Management-Plus/retention-period
Configurable Retention period
2 parents 78f7e7b + 5737f4c commit 655fb6a

File tree

17 files changed

+569
-26
lines changed


CHANGELOG.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,13 +21,14 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
2121
- `LOGSTASH_ELASTICSEARCH_SSL_VERIFICATIONMODE` to configure Logstash to Elasticsearch certificate validation [#156](https://github.com/Axway-API-Management-Plus/apigateway-openlogging-elk/issues/156)
2222
- `FILEBEAT_ELASTICSEARCH_SSL_VERIFICATIONMODE` to configure Filebeat to Elasticsearch certificate validation [#156](https://github.com/Axway-API-Management-Plus/apigateway-openlogging-elk/issues/156)
2323
- Helm chart supports this by the new parameter: `validateElasticsearchCertificate` per component [#156](https://github.com/Axway-API-Management-Plus/apigateway-openlogging-elk/issues/156)
24+
- Support to configure the data retention period of indexed data instead of using hardcoded ILM-Settings [#160](https://github.com/Axway-API-Management-Plus/apigateway-openlogging-elk/issues/160)
2425

2526
### Fixed
2627
- APIBuilder4Elastic - The Swagger for this service is invalid - Duplicate operationId renamed [#158](https://github.com/Axway-API-Management-Plus/apigateway-openlogging-elk/issues/158)
2728

2829
### Security
2930
- Custom-Flow-Nodes dependencies updated to solve security issue https://github.com/advisories/GHSA-74fj-2j2h-c42q
30-
- API-Builder version update to version Exeter
31+
- API-Builder version update to version Exeter to solve security issues
3132

3233
## [4.0.3] 2021-12-20
3334
### Changed

README.md

Lines changed: 99 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -832,23 +832,109 @@ So your alerts should report a critical alert before 90%. For more information,
832832
## Lifecycle Management
833833

834834
Since new data is continuously stored in Elasticsearch in various indexes, these must of course be removed after a certain period of time.
835-
Since version 2.0.0, the solution uses the Elasticsearch [ILM](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html) feature for this purpose, which defines different lifecycle stages per index. The so-called ILM policies are automatically configured by the solution using [configuration files](apibuilder4elastic/elasticsearch_config) and can be reviewed in Kibana.
835+
Since version 2.0.0, the solution uses the Elasticsearch [ILM](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html) feature for this purpose, which defines different lifecycle stages per index. The so-called ILM policies are automatically configured by the solution with default values using [configuration files](apibuilder4elastic/elasticsearch_config) and can be reviewed in Kibana. Beginning with version 4.1.0, you can also configure the lifecycle of the data yourself according to your requirements.
836836
The indices pass through stages such as Hot, Warm, Cold which can be used to deploy different performance hardware per stage. This means that traffic details from two weeks ago no longer have to be stored on high-performance machines.
837837

838-
The configuration is defined here per data type (e.g. Summary, Details, Audit, ...). The following table gives an overview.
838+
The configuration is defined here per data type (e.g. Summary, Details, Audit, ...). The following table gives an overview of the default values. The value that determines the retention period is the Delete value: it is the number of days for which the data is guaranteed to be available. More information on how the lifecycle works can be found later in this section. You can use the other phases, for example, to allocate cheaper resources.
839839

840-
| Data-Type | Description | Hot (Size/Days) | Warm | Cold | Delete | Total |
841-
| :--- |:--- | :--- | :--- | :--- | :--- | :--- |
842-
| **Traffic-Summary** | Main index for traffic-monitor overview and primary dashboard | 30GB / 7 days | 5 days | 3 days | 0 days | 15 days |
843-
| **Traffic-Details** | Details in Traffic-Monitor for Policy, Headers and Payload reference | 30GB / 7 days | 5 days | 3 days | 0 days | 15 days |
844-
| **Traffic-Trace** | Trace-Messages belonging to an API-Request shown in Traffic-Monitor | 30GB / 7 days | 5 days | 3 days | 0 days | 15 days |
845-
| **General-Trace** | General trace messages, like Start- & Stop-Messages | 30GB / 7 days | 5 days | 3 days | 0 days | 15 days |
846-
| **Gateway-Monitoring** | System status information (CPU, HDD, etc.) from Event-Files | 30GB / 60 days | 30 days | 15 days | 0 days | 105 days|
847-
| **Domain-Audit** | Domain Audit-Information as configured in Admin-Node-Manager | 10GB / 270 days | 270 days| 720 days| 30 days | >3 years|
840+
| Data-Type | Description | Hot (Rollover) | Warm | Cold | __Delete__ |
841+
| :--- |:--- | :--- | :--- | :--- | :--- |
842+
| **Traffic-Summary** | Main index for traffic-monitor overview and primary dashboard | 30GB / 7d | 0d | 12d | __15d__ |
843+
| **Traffic-Details** | Details in Traffic-Monitor for Policy, Headers and Payload reference | 30GB / 7d | 0d | 12d | __15d__ |
844+
| **Traffic-Trace** | Trace-Messages belonging to an API-Request shown in Traffic-Monitor | 30GB / 7d | 0d | 12d | __15d__ |
845+
| **General-Trace** | General trace messages, like Start- & Stop-Messages | 30GB / 7d | 0d | 12d | __15d__ |
846+
| **Gateway-Monitoring** | System status information (CPU, HDD, etc.) from Event-Files | 30GB / 60d | 0d | 90d | __105d__ |
847+
| **Domain-Audit** | Domain Audit-Information as configured in Admin-Node-Manager | 10GB / 270d | 270d | 720d | __750d__ |
848848

849-
Please note:
850-
:point_right: It's optional to use different hardware per stage
851-
:point_right: Do not change the ILM/Modify the ILM-Policies manually, as they are configured automatically. In a later version, the solution will provide options to customize the time range as needed without breaking updates.
849+
### Configure the lifecycle
850+
851+
As of version 4.1.0, you can configure how long the indexed data should be kept in Elasticsearch. Before starting, you should read and understand the following information thoroughly, because once deleted, data cannot be recovered.
852+
Individual API transactions are stored as documents in Elasticsearch indices. However, individual documents are never deleted on their own; it is always an entire index, with millions of transactions/documents, that is deleted. Therefore, you can only control the retention period for an entire index, not per document.
853+
When API transactions are stored in an index, the size of the index increases accordingly. To prevent an index from growing infinitely, it can be rolled over after a certain time. A new active index is created, which is used to write the data. This replaces the old index, which is only used for reading. This process is called [rollover](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-rollover.html).
854+
855+
In order not to have to control this process manually, there are so-called [Index Lifecycle Management](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html) (ILM) policies in Elasticsearch, which perform the rollover based on defined rules and then send the index through further phases for various purposes.
856+
857+
These ILM policies are configured automatically by the solution with default values and are stored and managed for each index in Elasticsearch. The default values result in the data being available for at least 2 weeks.
858+
859+
If you would like to customize the lifecycle, then, as of version 4.1.0, you can provide a corresponding configuration file and reference it with the parameter: `RETENTION_PERIOD_CONFIG`. It is used to adapt the ILM policies accordingly.
860+
861+
Here is an example:
862+
```json
863+
{
864+
"retentionPeriods": {
865+
"apigw-traffic-summary": {
866+
"rollover": {
867+
"max_age": "7d",
868+
"max_size": "15gb"
869+
},
870+
"retentionPeriod": "7d"
871+
},
872+
"apigw-traffic-details": {
873+
"rollover": {
874+
"max_age": "7d",
875+
"max_size": "15gb"
876+
},
877+
"retentionPeriod": "6d"
878+
},
879+
"apigw-traffic-trace": {
880+
"rollover": {
881+
"max_age": "7d",
882+
"max_primary_shard_size": "15gb"
883+
},
884+
"retentionPeriod": "5d"
885+
}
886+
}
887+
}
888+
```
889+
890+
The configuration is defined per index and is divided into two parts: when the rollover should happen, and for how many days after the rollover the data should still be available.
891+
The following figure illustrates the process:
892+
893+
![Lifecycle details](imgs/index-ilm-details.png)
894+
895+
__1. Create your retention period config file__
896+
897+
Create a new file for your retention period configuration. For example: `config/custom-retention-period.json`. As a template, you can use the file: `config/my-retention-period-sample.json`.
898+
899+
__2. Define the rollover__
900+
901+
It is important to understand that the time period until the rollover of an index is not exactly fixed.
902+
For example, if you specify a maximum age and size for an index, then the index will be rolled over as soon as a condition is met.
903+
904+
- If the maximum size is too small for your transaction volume, then an index can meet the size condition in less than 24 hours and will be rolled over.
905+
- If the maximum size is too large, the index will be rolled when it reaches the maximum age (e.g. after 7 days).
906+
907+
So how long the data is available, from the very beginning to the end of an index, is the period from the index's initial creation to the rollover __plus__ the period until the delete. As the rollover date cannot be defined exactly, you need to monitor your system and adjust the lifecycle to get the desired retention time.
908+
909+
You can use the following conditions for the rollover:
910+
- `max_age`: Defines the maximum age of an index until it is rolled over
911+
- `max_size`: The maximum index size. As an index has a Primary and a Replica, the required disk space is doubled (`max_size: 30gb` results in 60gb of disk space used)
912+
- `max_primary_shard_size`: Starting with Elasticsearch version 7.13, you can also define the maximum shard size of an index. All indexes, except apigw-management-kpis and apigw-domainaudit, have 5 shards, so you have to multiply the specified size by 5.
913+
914+
For more information please read: [ILM Rollover options](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-rollover.html#ilm-rollover-options)
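
The size remarks above can be turned into a rough estimate. The following is a minimal sketch, not part of the solution: `estimateDiskUsageGb` is a hypothetical helper, and it assumes 5 primary shards per index and one replica per primary, as described in this section. Applying the replica doubling to `max_primary_shard_size` as well is an assumption derived from the `max_size` remark.

```javascript
// Hypothetical helper (not part of the solution): estimates the disk space
// in GB that one index may occupy when it reaches its rollover size condition.
// Assumptions: 5 primary shards per index and 1 replica per primary shard.
function estimateDiskUsageGb(rollover, primaryShards = 5, replicas = 1) {
  const copies = 1 + replicas; // primary data plus its replicas
  if (rollover.max_size) {
    // max_size limits the whole index, replicas double the disk usage
    return parseInt(rollover.max_size, 10) * copies;
  }
  if (rollover.max_primary_shard_size) {
    // the limit applies per shard, so multiply by the number of shards
    return parseInt(rollover.max_primary_shard_size, 10) * primaryShards * copies;
  }
  return null; // only max_age is set, the policy does not bound the size
}

console.log(estimateDiskUsageGb({ max_age: "7d", max_size: "30gb" })); // 60
console.log(estimateDiskUsageGb({ max_age: "7d", max_primary_shard_size: "15gb" })); // 150
```

Such an estimate covers a single index generation only; several rolled-over indices exist in parallel until they are deleted.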
915+
916+
__3. Define the retention period__
917+
918+
With the parameter: `retentionPeriod` you define the time period for which the data is guaranteed to be available. As already described, the time until the rollover of the index adds to this. You can specify only days here.
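
To make the relationship between rollover and retention period more tangible, here is a minimal sketch that computes the range in which data of one index stays available. `availabilityWindowDays` is a hypothetical helper and not part of the solution; it assumes that both `max_age` and `retentionPeriod` are given in days and ignores the short delay until Elasticsearch actually executes a phase transition.

```javascript
// Hypothetical helper (not part of the solution): derives how long data of an
// index is available, based on one entry of the retentionPeriods config.
// Assumption: max_age and retentionPeriod are both given in days, e.g. "7d".
function availabilityWindowDays(periodConfig) {
  const retentionDays = parseInt(periodConfig.retentionPeriod, 10);
  const maxAge = periodConfig.rollover && periodConfig.rollover.max_age;
  const maxAgeDays = maxAge ? parseInt(maxAge, 10) : null;
  return {
    // a document written just before the rollover lives for the retention period
    guaranteed: retentionDays,
    // a document written right after index creation additionally lives
    // until the rollover, which happens after max_age at the latest
    upperBound: maxAgeDays === null ? null : maxAgeDays + retentionDays
  };
}

const trafficSummary = {
  rollover: { max_age: "7d", max_size: "15gb" },
  retentionPeriod: "7d"
};
console.log(availabilityWindowDays(trafficSummary)); // { guaranteed: 7, upperBound: 14 }
```

If the size condition triggers the rollover earlier than `max_age`, the actual value lies between the two numbers.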
919+
920+
__4. Apply the configuration__
921+
922+
The last step is to reference your configuration file in your `.env` file with the parameter: `RETENTION_PERIOD_CONFIG=./config/custom-retention-period.json` and restart API Builder.
923+
924+
`docker-compose stop apibuilder4elastic`
925+
`docker-compose up`
926+
927+
You can check in Kibana whether the ILM policy has been adjusted accordingly. To do this, go to Stack Management --> Index Lifecycle Policies, open the corresponding policy and check the phases.
928+
929+
__Further notes:__
930+
931+
- Changes to the ILM-Policy have no influence on indices that have already been rolled over, as these have already entered lifecycle management
932+
- Indexes should not be too small, as this increases the load on Elasticsearch too much.
933+
- For each active index there are 5 Primary- and 5 Replica-Shards.
934+
- Each shard corresponds to a Lucene instance, which consumes corresponding resources.
935+
- The smaller an index, the more indexes, the more shards, the more resources are needed.
936+
- Elastic's recommendation is 30GB. The solution does not allow index size below 5GB.
937+
- It's optional to use different hardware per stage
852938

853939
<p align="right"><a href="#table-of-content">Top</a></p>
854940

apibuilder4elastic/conf/default.js

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,9 @@ module.exports = {
3333
managementKPIsInterval: process.env.MANAGEMENT_KPIS_INTERVAL || '3600000',
3434
managementKPIsEnabled: ("false" == process.env.MANAGEMENT_KPIS_ENABLED) ? false : true,
3535

36+
// This path is optional and if given used to adjust the ILM-Configuration.
37+
retentionPeriodConfigFile: process.env.RETENTION_PERIOD_CONFIG || 'NotSet',
38+
3639
// These version are used, that Filebeat and Logstash are configured as required
3740
// by the API-Builder release
3841
versions: {

apibuilder4elastic/custom_flow_nodes/api-builder-plugin-elk-solution-utils/src/actions.js

Lines changed: 75 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,8 @@ async function getIndexConfig(params, options) {
5252
if(indexConfig.ilm == undefined || indexConfig.ilm.config == undefined) {
5353
indexConfig.ilm = { config: "NotSet" } ;
5454
}
55+
// Additionally add the name to the indexConfig
56+
indexConfig.name = indexName;
5557
return indexConfig;
5658
}
5759

@@ -245,6 +247,77 @@ async function getPayloadFilename(params, options) {
245247
return extractedFileName;
246248
}
247249

250+
async function setupILMRententionPeriod(params, options) {
251+
const { indexConfig, ilmConfig, rententionPeriodConfig } = params;
252+
const { logger } = options;
253+
if (!indexConfig) {
254+
throw new Error('Missing required parameter: indexConfig');
255+
}
256+
if (!ilmConfig) {
257+
throw new Error('Missing required parameter: ilmConfig');
258+
}
259+
if (!rententionPeriodConfig) {
260+
logger.debug(`No retentionPeriodConfig is given. Using standard retention periods.`);
261+
return options.setOutput('notChanged', ilmConfig);
262+
}
263+
if (!indexConfig.name) {
264+
throw new Error('The name of the index is missing in the IndexConfig');
265+
}
266+
// Trying to read the retentionPeriodConfig file
267+
if (!rententionPeriodConfig.retentionPeriods) {
268+
throw new Error('rententionPeriodConfig must contain retentionPeriods object.');
269+
}
270+
const indexName = indexConfig.name;
271+
// Check if a retentionPeriod is defined for the given index
272+
if(!rententionPeriodConfig.retentionPeriods[indexName]) {
273+
logger.debug(`No retention period configured for index: ${indexName}. Using default ILM-Configuration.`);
274+
return options.setOutput('notChanged', ilmConfig);
275+
} else {
276+
var periodConfig = rententionPeriodConfig.retentionPeriods[indexName];
277+
// Defines when an index should be rolled over which means it enters the WARM, COLD, DELETE lifecycle
278+
if(periodConfig.rollover) {
279+
logger.info(`Setup ILM rollover configuration for index: ${indexName} with config: ${JSON.stringify(periodConfig.rollover)}`);
280+
var maxAge = parseInt(periodConfig.rollover.max_age);
281+
if(periodConfig.rollover.max_age) {
282+
ilmConfig.policy.phases.hot.actions.rollover.max_age = `${maxAge}d`;
283+
}
284+
if(periodConfig.rollover.max_size) {
285+
const maxSize = parseInt(periodConfig.rollover.max_size);
286+
if(isNaN(maxSize)) {
287+
throw new Error(`The given max_size: ${periodConfig.rollover.max_size} for index: ${indexName} is not a valid number.`);
288+
}
289+
if(maxSize<5) {
290+
throw new Error(`The given max_size: ${maxSize} for index: ${indexName} is too small. Please configure at least 5GB.`);
291+
}
292+
ilmConfig.policy.phases.hot.actions.rollover.max_size = `${maxSize}gb`;
293+
}
294+
if(periodConfig.rollover.max_primary_shard_size) {
295+
const maxPrimaryShardSize = parseInt(periodConfig.rollover.max_primary_shard_size);
296+
if(isNaN(maxPrimaryShardSize)) {
297+
throw new Error(`The given max_primary_shard_size: ${periodConfig.rollover.max_primary_shard_size} for index: ${indexName} is not a valid number.`);
298+
}
299+
if(maxPrimaryShardSize<5) {
300+
throw new Error(`The given max_primary_shard_size: ${maxPrimaryShardSize} for index: ${indexName} is too small. Please configure at least 5GB.`);
301+
}
302+
ilmConfig.policy.phases.hot.actions.rollover.max_primary_shard_size = `${maxPrimaryShardSize}gb`;
303+
}
304+
305+
}
306+
// The single retention value is distributed across the lifecycle stages COLD and DELETE. WARM is not considered for now, as a rolled-over index should
307+
// move to WARM immediately after rollover. This might be enhanced later, if needed, with extra config options instead of days only
308+
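if(periodConfig.retentionPeriod || periodConfig.days) {
309+
var givenDays = parseInt(periodConfig.retentionPeriod || periodConfig.days); // README documents retentionPeriod, days is kept as a fallback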
310+
logger.info(`Setup ILM retention period for index: ${indexName} based on ${givenDays} number of days.`);
311+
// The given number of days is distributed evenly across the stages COLD & DELETE
312+
var coldDays = Math.round(givenDays / 2); // It stays in WARM for a while before it goes to COLD
313+
var deleteDays = givenDays; // It stays in COLD for a while before it is deleted
314+
ilmConfig.policy.phases.cold.min_age = `${coldDays}d`;
315+
ilmConfig.policy.phases.delete.min_age = `${deleteDays}d`;
316+
}
317+
}
318+
return ilmConfig;
319+
}
320+
248321
async function getHostname(params, options) {
249322
const hostname = os.hostname();
250323
options.logger.debug(`API-Builder process is running on host: ${hostname}`);
@@ -257,5 +330,6 @@ module.exports = {
257330
createIndices,
258331
updateRolloverAlias,
259332
getPayloadFilename,
260-
getHostname
333+
getHostname,
334+
setupILMRententionPeriod
261335
};
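
The day distribution performed by `setupILMRententionPeriod` can be reproduced in isolation. The following is a standalone sketch of that arithmetic only, under the assumption that the retention value is given in days; `distributeRetentionDays` is a hypothetical name and the sketch does not call the flow node itself.

```javascript
// Standalone sketch (hypothetical helper): mirrors how the retention days are
// mapped to the min_age of the COLD and DELETE phases in the hunk above.
function distributeRetentionDays(givenRetention) {
  const days = parseInt(givenRetention, 10);
  if (isNaN(days)) {
    throw new Error(`The given retention period: ${givenRetention} is not a valid number.`);
  }
  return {
    cold: `${Math.round(days / 2)}d`, // COLD starts after half of the period
    delete: `${days}d`                // DELETE starts after the full period
  };
}

console.log(distributeRetentionDays("7d")); // { cold: '4d', delete: '7d' }
```

Note that an odd number of days is rounded up for the COLD phase, so 7 days result in 4 days until COLD.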

apibuilder4elastic/custom_flow_nodes/api-builder-plugin-elk-solution-utils/src/flow-nodes.yml

Lines changed: 43 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -207,4 +207,46 @@ flow-nodes:
207207
type: object
208208
properties:
209209
message:
210-
type: string
210+
type: string
211+
212+
setupILMRententionPeriod:
213+
name: Setup ILM Retention-Period
214+
description: "Configures the given ILM-Policy according to the provided number of days."
215+
parameters:
216+
indexConfig:
217+
name: Index config
218+
description: "Index configuration as defined in elasticsearch_config/index_config.json. It also contains the index name."
219+
required: true
220+
schema:
221+
type: object
222+
rententionPeriodConfig:
223+
name: Retention period config
224+
description: "Contains the path of the retention period config file. If not given, the standard ILM is used."
225+
required: true
226+
schema:
227+
type: string
228+
ilmConfig:
229+
name: ILM-Config
230+
description: "The ILM Config object that is supposed to be send to Elasticsearch to create or update the ILM-Policy. It is read from the $.indexConfig.ilm.config file and converted into an object."
231+
required: true
232+
schema:
233+
type: object
234+
outputs:
235+
next:
236+
name: Next
237+
description: Returns the updated ILM-Configuration object.
238+
context: $.ilmPolicyBody
239+
schema:
240+
type: object
241+
notChanged:
242+
name: Not changed
243+
description: The given ILM-Policy has not changed, because either no retention period parameter is given or it defaults to 15 days.
244+
context: $.ilmPolicyBody
245+
schema:
246+
type: object
247+
error:
248+
name: Error
249+
description: An unexpected error happened
250+
context: $.error
251+
schema:
252+
type: object
