
@Miqueasher
Contributor

This PR adds custom OpenTelemetry metrics, exported through the CloudWatch Agent, to the existing sample app. The goal is regression testing: verifying that the public documentation is correct and that the functionality keeps working as future changes land.

The changes made in this PR are:

  1. Two new imports in FrontendServiceController.java (the sample app's frontend service for the 'main' instance) and new dependencies in build.gradle.kts (the dependency installer)
  2. A global meter plus a gauge, histogram, and counter added to FrontendServiceController.java, following the OTel SDK Metrics documentation page
  3. Updated environment variables in main.tf
  4. Updated jobs and env sections in java-ec2-default-test
  5. Updated CloudWatch Agent config to include the OTLP ports
  6. A new validator file that calls the updated predefined template file
  7. Updated PredefinedExpectedTemplate to include the new mustache file
  8. A new mustache file for validating the custom metrics

Rollback: git revert to the last passing commit SHA, 82e7075.

<Can we safely revert this commit if needed? If not, detail what must be done to safely revert and why it is needed.>

Ensure you've run the following tests on your changes and include the link below:

To do so, create a test.yml workflow file with name: Test and a workflow description to test your changes, then remove the file from your PR. Link your test run in the PR description. This process is a short-term solution while we work on a staging environment for testing.

NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL. Use the needs keyword to run such tests sequentially.

  • Run Java EKS on e2e-playground in us-east-1 and eu-central-2
  • Run Python EKS on e2e-playground in us-east-1 and eu-central-2
  • Run metric limiter on EKS cluster e2e-playground in us-east-1 and eu-central-2
  • Run EC2 tests in all regions
  • Run K8s on a separate K8s cluster (check IAD test account for master node endpoints; these will change as we create and destroy clusters for OS patching)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


@thpierce thpierce left a comment


Why is there no change to the following files?

  • .github/workflows/java-ec2-default-test.yml
  • terraform/python/ec2/default/README.md

Comment on lines 158 to 160
name: cloud.platform
value: aws_ec2
value: aws_ec2

Duplicated.

Suggested change
name: cloud.platform
value: aws_ec2
value: aws_ec2
name: cloud.platform
value: aws_ec2
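A duplicate like this is easy to miss in the mustache template. A hypothetical lint, not part of the validator in this PR, could flag any non-blank line that repeats the line before it. This sketch compares trimmed lines, so it ignores indentation differences; DuplicateLineSketch is an illustrative name only:

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateLineSketch {

    // Returns the 1-based numbers of lines that repeat the previous non-blank
    // line exactly after trimming, e.g. a second "value: aws_ec2" in one entry.
    static List<Integer> consecutiveDuplicates(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        String previous = null;
        for (int i = 0; i < lines.size(); i++) {
            String current = lines.get(i).trim();
            if (current.isEmpty()) {
                continue; // blank lines neither match nor reset the comparison
            }
            if (current.equals(previous)) {
                hits.add(i + 1);
            }
            previous = current;
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> template = List.of(
                "name: cloud.platform",
                "value: aws_ec2",
                "value: aws_ec2");
        System.out.println(consecutiveDuplicates(template)); // [3]
    }
}
```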

# Get and run the sample application with configuration
aws s3 cp ${var.sample_app_jar} ./main-service.jar
aws s3 cp ${var.sample_app_jar} ./main-service-delete-me.jar

Undo

OTEL_RESOURCE_ATTRIBUTES="service.name=$${SERVICE_NAME},deployment.environment.name=$${DEPLOYMENT_ENVIRONMENT_NAME},Internal_Org=Financial,Business Unit=Payments,Region=us-east-1,aws.application_signals.metric_resource_keys=Business Unit&Region&Organization" \
OTEL_INSTRUMENTATION_COMMON_EXPERIMENTAL_CONTROLLER_TELEMETRY_ENABLED=true \
nohup java -XX:+UseG1GC -jar main-service.jar &> nohup.out &
nohup java -XX:+UseG1GC -jar main-service-delete-me.jar &> nohup.out &

Undo
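For reference, the OTEL_RESOURCE_ATTRIBUTES value above is a comma-separated list of key=value pairs, and aws.application_signals.metric_resource_keys names attribute keys joined with &. A minimal stdlib-only sketch of how that string breaks down, assuming no commas inside values and no percent-encoding (nothing is decoded here). ResourceAttributesSketch and its methods are hypothetical helpers, not part of the sample app or the OpenTelemetry SDK, and the service/environment values are placeholders:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResourceAttributesSketch {

    // Splits "k1=v1,k2=v2" into an insertion-ordered map of resource attributes.
    // Assumption: values contain no ',' and no percent-encoding.
    static Map<String, String> parse(String raw) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String pair : raw.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip entries with no key
            }
            attrs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return attrs;
    }

    // Returns keys listed in metric_resource_keys ('&'-separated) that are not
    // themselves present as resource attributes.
    static List<String> missingMetricResourceKeys(Map<String, String> attrs) {
        List<String> missing = new ArrayList<>();
        String keys = attrs.get("aws.application_signals.metric_resource_keys");
        if (keys == null) {
            return missing;
        }
        for (String key : keys.split("&")) {
            if (!attrs.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Same shape as the Terraform user data, with placeholder service/env values.
        String raw = "service.name=my-service,deployment.environment.name=my-env,"
                + "Internal_Org=Financial,Business Unit=Payments,Region=us-east-1,"
                + "aws.application_signals.metric_resource_keys=Business Unit&Region&Organization";
        Map<String, String> attrs = parse(raw);
        System.out.println(attrs.get("Business Unit")); // Payments
        System.out.println(missingMetricResourceKeys(attrs)); // [Organization]
    }
}
```

With the values in the user data, the sketch reports Organization as a listed key with no matching attribute (the attribute set is Internal_Org); the PR does not say whether that is intentional.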

# Get and run the sample application with configuration
aws s3 cp ${var.sample_remote_app_jar} ./remote-service.jar
aws s3 cp ${var.sample_remote_app_jar} ./remote-service-delete-me.jar

Undo

OTEL_RESOURCE_ATTRIBUTES=service.name=sample-remote-application-${var.test_id} \
OTEL_INSTRUMENTATION_COMMON_EXPERIMENTAL_CONTROLLER_TELEMETRY_ENABLED=true \
nohup java -XX:+UseG1GC -jar remote-service.jar &> nohup.out &
nohup java -XX:+UseG1GC -jar remote-service-delete-me.jar &> nohup.out &

Undo

@ResponseBody
public String awssdkCall(@RequestParam(name = "testingId", required = false) String testingId) {

logger.info("Incrementing custom counter - OpenTelemetry available: {}", GlobalOpenTelemetry.get() != null);

Not needed

Suggested change
logger.info("Incrementing custom counter - OpenTelemetry available: {}", GlobalOpenTelemetry.get() != null);

Comment on lines 103 to 105
private final LongCounter pipelineCounter;
private final DoubleHistogram pipelineHistogram;
private final LongUpDownCounter pipelineGauge;

Suggested change
private final LongCounter pipelineCounter;
private final DoubleHistogram pipelineHistogram;
private final LongUpDownCounter pipelineGauge;
private final LongCounter customPipelineCounter;
private final DoubleHistogram customPipelineHistogram;
private final LongUpDownCounter customPipelineGauge;

.build();

SdkMeterProvider pipelineMeterProvider = SdkMeterProvider.builder()
.setResource(Resource.getDefault())

Don't you need to create the resource with service.name and deployment.environment.name?
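On this question: in the OpenTelemetry Java SDK, Resource.merge(other) lets other win on key collisions, and Resource.getDefault() reports service.name as unknown_service:java unless something overrides it. A common pattern is therefore to merge the default resource with one carrying service.name and deployment.environment.name. The stdlib-only sketch below models just that precedence rule with plain maps; it is not the SDK API, and ResourceMergeSketch and the attribute values are hypothetical placeholders:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResourceMergeSketch {

    // Models the precedence rule of Resource.merge(other): on a key collision,
    // the value from `other` wins. Plain maps stand in for SDK Resource objects.
    static Map<String, String> merge(Map<String, String> base, Map<String, String> other) {
        Map<String, String> merged = new LinkedHashMap<>(base); // base is not mutated
        merged.putAll(other);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for the default resource, whose service.name is a fallback.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("service.name", "unknown_service:java");
        defaults.put("telemetry.sdk.language", "java");

        // Hypothetical values; the sample app would take these from its own config.
        Map<String, String> service = new LinkedHashMap<>();
        service.put("service.name", "my-service");
        service.put("deployment.environment.name", "my-env");

        System.out.println(merge(defaults, service));
    }
}
```

With the SDK itself, the equivalent would be roughly Resource.getDefault().merge(Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), ..., AttributeKey.stringKey("deployment.environment.name"), ...))), passed to setResource(...) on the SdkMeterProvider builder.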

.build();

MetricReader pipelineMetricReader = PeriodicMetricReader.builder(pipelineMetricExporter)
.setInterval(Duration.ofSeconds(60))

Suggested change
.setInterval(Duration.ofSeconds(60))
.setInterval(Duration.ofSeconds(1))
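The interval controls how often PeriodicMetricReader exports, so the number of exports that complete inside a fixed test window is roughly the window divided by the interval, rounded down. A minimal sketch, assuming the first export fires one full interval after startup and ignoring any final flush on shutdown; ExportCycleSketch and the 5-minute window are hypothetical, not values taken from the validator:

```java
import java.time.Duration;

public class ExportCycleSketch {

    // Number of periodic exports that complete within `window`, assuming the
    // first export fires one full interval after startup.
    static long exportsWithin(Duration window, Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return window.toNanos() / interval.toNanos(); // integer division rounds down
    }

    public static void main(String[] args) {
        Duration window = Duration.ofMinutes(5); // hypothetical test window
        System.out.println(exportsWithin(window, Duration.ofSeconds(60))); // 5
        System.out.println(exportsWithin(window, Duration.ofSeconds(1)));  // 300
    }
}
```

The trade-off of the shorter interval is more export requests to the agent's OTLP endpoint in exchange for data points arriving sooner during a test run.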

private static final LongUpDownCounter gauge = meter.upDownCounterBuilder("agent_based_gauge").build();

// Pipeline-based metrics (initialized in constructor)
private final Meter pipelineMeter;

Not needed

Suggested change
private final Meter pipelineMeter;
