
Commit 3b70c48

Merge pull request #140 from aws-samples/contract_compliance_nova
feat(sample): update contract compliance deps and enable Amazon Nova model support
2 parents 33fb6b1 + f76bbde commit 3b70c48

File tree

11 files changed: +98 -33 lines

samples/contract-compliance-analysis/back-end/README.md

Lines changed: 56 additions & 7 deletions
@@ -1,5 +1,14 @@
 # Contract Compliance Analysis - Back-end
 
+## Table of contents
+
+- [Basic setup](#basic-setup)
+- [Local environment or Cloud9](#local-environment-or-cloud9)
+- [Cloud9 environment (optional)](#cloud9-setup-optional)
+- [Setup steps](#setup-steps)
+- [How to customize contract analysis according to your use case](#how-to-customize-contract-analysis-according-to-your-use-case)
+- [How to use a different Amazon Bedrock FM](#how-to-use-a-different-amazon-bedrock-fm)
+
 ## Basic setup
 
 ### Local environment or Cloud9
@@ -9,8 +18,6 @@ You have the option of running the setup from a local workspace or from a Cloud9
 In case you opt for Cloud9, you have to setup a Cloud9 environment in the same AWS Account where this Backend will
 be installed.
 
-If your local workspace has a non-x86 processor architecture (for instance ARM, like the M processor from Macbooks), it's strongly recommended to perform the setup steps from a Cloud9 environment, to avoid bundling issues of Lambda function dependencies (see [ticket](https://github.com/awslabs/generative-ai-cdk-constructs/issues/541)). Otherwise, set the `DOCKER_DEFAULT_PLATFORM` environmental variable to `linux/amd64` to build `x86_64` packages.
-
 #### Cloud9 setup (optional)
 
 1. Follow the steps on [https://docs.aws.amazon.com/cloud9/latest/user-guide/setting-up.html](https://docs.aws.amazon.com/cloud9/latest/user-guide/setting-up.html)
@@ -88,16 +95,12 @@ cdk bootstrap
 cdk deploy --require-approval=never
 ```
 
-> Use `DOCKER_DEFAULT_PLATFORM=linux/amd64 cdk deploy --require-approval=never` on macOS
-
 2. Any modifications made to the code can be applied to the deployed stack by running the same command again.
 
 ```shell
 cdk deploy --require-approval=never
 ```
 
-> Use `DOCKER_DEFAULT_PLATFORM=linux/amd64 cdk deploy --require-approval=never` on macOS
-
 #### Populate Guidelines table
 
 Once the Stack is setup, you need to populate the DynamoDB Guidelines table with the data from the Guidelines Excel sheet that is included in the `guidelines` folder.
@@ -137,7 +140,7 @@ Click the **Enable specific models** button and enable the checkbox for Anthropi
 
 Click **Next** and **Submit** buttons
 
-## How to customize contract analysis accordding to your use case
+## How to customize contract analysis according to your use case
 
 This solution was designed to support analysis of contracts of different types and of different languages, based on the assumption that the contracts establish an agreement between two parties: a given company and another party. The solution already comes pre-configured to analyze service contracts in English for the company *AnyCompany*, together with an example of guidelines.
 
@@ -164,3 +167,49 @@ The recommended sequence of steps:
 ```shell
 python load_guidelines.py --guidelines_file_path <custom_guidelines_file_path>
 ```
+
+## How to use a different Amazon Bedrock FM
+
+By default, the application uses Anthropic Claude 3 Haiku v1. The steps below explain how to switch to another model, using [Amazon Nova Pro v1](https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/) as the example:
+
+- Open the [app_properties.yaml](./app_properties.yaml) file and update the field ```claude_model_id``` to the model id you want to use; in this case, we set it to ```us.amazon.nova-pro-v1:0```. The list of model ids available through Amazon Bedrock is in the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html). Ensure the model you select is enabled in the console (Amazon Bedrock -> Model access) and available in your region.
+- Depending on the model selected, you might need to update some hardcoded values for the maximum number of new tokens generated. For instance, Amazon Nova Pro v1 supports 5000 output tokens, which requires no modifications. However, a model with a maximum of 3000 output tokens requires some changes in the sample. Update the following lines if required:
+  - In file [fn-preprocess-contract/index.py](./stack/sfn/preprocessing/fn-preprocess-contract/index.py), update line 96 to set the chunk size to a value smaller than your model's max output tokens, and line 107 to match your model's max output tokens.
+  - In file [scripts/utils/llm.py](./scripts/utils/llm.py), update the max output tokens on line 28.
+  - In file [common-layer/llm.py](./stack/sfn/common-layer/llm.py), update the max output tokens on line 30.
+  - In file [fn-classify-clauses/index.py](./stack/sfn/classification/fn-classify-clauses/index.py), update line 182 to set the max output tokens for your model.
+- Re-deploy the solution as described in previous sections
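As an aside, the sizing rule these bullets describe can be captured in a one-line check. The helper below is an illustrative sketch, not code from the sample; the 5000 and 3000 figures come from the bullets above, and 4096 is the `max_new_tokens` value used elsewhere in this README:

```python
def fits_output_budget(chunk_tokens: int, max_output_tokens: int) -> bool:
    """True when a chunk can be fully re-emitted within the model's output cap.

    The preprocessing step asks the model to re-emit each chunk with clause
    separators inserted, so a chunk at or above the cap would be truncated.
    """
    return chunk_tokens < max_output_tokens


# Amazon Nova Pro v1 (5000-token cap) accommodates the sample's defaults:
assert fits_output_budget(2500, 5000)
# A model capped at 3000 output tokens cannot re-emit a 4096-token chunk:
assert not fits_output_budget(4096, 3000)
```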
+
+### Troubleshooting
+
+#### KeyError in step X
+
+If you change the model, you may face an error in the Step Functions run. This can be due to the parsing of the LLM response.
+In that case, identify the failing Lambda function from the Step Functions logs, and update that function's code to enable verbose messaging. For instance, if the failing Lambda function is ```PreprocessingStepPreproc```, open the file [fn-preprocess-contract/index.py](./stack/sfn/preprocessing/fn-preprocess-contract/index.py) and update the invoke_llm code:
+
+```python
+llm_response, model_usage, stop_reason = invoke_llm(
+    prompt=PROMPT_TEMPLATE.format(CONTRACT_EXCERPT=contract_excerpt),
+    model_id=prompt_vars_dict.get("claude_model_id", ''),
+    temperature=0.0,
+    top_p=0.999,
+    max_new_tokens=4096,
+    verbose=True  # <- turn on verbose mode
+)
+```
+
+Then, modify the file [common-layer/llm.py](./stack/sfn/common-layer/llm.py) and print the response from the runnable invocation:
+
+```python
+response = chain.invoke({})
+logger.info(f"Model response: {response}")  # <- log the response
+content = response.content
+```
+
+Re-deploy the solution, and verify the structure of the response in the logs. Depending on the model used, the schema of the response may differ, so the ```usage``` and ```stop reason``` values might need to be parsed differently. In that case, add the correct code in the file [common-layer/llm.py](./stack/sfn/common-layer/llm.py):
+
+```python
+if ('mymodel' in model_id):
+    usage_data = response.XXX  # <- specify how to parse usage data
+    stop_reason = response.XXX  # <- specify how to parse stop reason
+```
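For reference, the model-specific dispatch this commit adds to [common-layer/llm.py](./stack/sfn/common-layer/llm.py) can be exercised in isolation. The sketch below uses plain dicts as stand-ins for the LangChain message object (whose real fields are attributes such as `response_metadata` and `usage_metadata`, not dict keys), with the same per-provider key names the commit uses:

```python
def parse_model_metadata(model_id, response):
    """Extract (usage_data, stop_reason) from a model response.

    Mirrors the branching added in common-layer/llm.py: Anthropic models
    report usage and stop_reason under response_metadata, while Amazon Nova
    exposes usage_metadata directly and camelCases the stop reason.
    """
    usage_data, stop_reason = None, None
    if "anthropic" in model_id:
        usage_data = response["response_metadata"]["usage"]
        stop_reason = response["response_metadata"]["stop_reason"]
    elif "amazon.nova" in model_id:
        usage_data = response["usage_metadata"]
        stop_reason = response["response_metadata"]["stopReason"]
    return usage_data, stop_reason


# Simplified stand-in for a Nova response:
nova_response = {
    "usage_metadata": {"input_tokens": 120, "output_tokens": 48},
    "response_metadata": {"stopReason": "end_turn"},
}
usage, stop = parse_model_metadata("us.amazon.nova-pro-v1:0", nova_response)
print(usage, stop)
```

An unrecognized model id falls through to `(None, None)`, which is exactly the case the troubleshooting section above asks you to handle by adding a new branch.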
Lines changed: 8 additions & 9 deletions
@@ -1,12 +1,11 @@
-aws-cdk-lib==2.144.0
-aws_cdk.aws_lambda_python_alpha==2.144.0a0
+aws-cdk-lib==2.166.0
+aws_cdk.aws_lambda_python_alpha==2.166.0a0
 constructs>=10.0.0,<11.0.0
-cdk_nag==2.28.145
-openpyxl==3.1.3
-boto3==1.34.106
-pandas==2.2.2
-awswrangler==3.8.0
+cdk_nag==2.34.16
+openpyxl==3.1.5
+boto3==1.35.76
+pandas==2.2.3
+awswrangler==3.10.1
 argparse==1.4.0
 retrying==1.3.4
-PyYAML==6.0.1
-cdklabs.generative_ai_cdk_constructs==0.1.198
+PyYAML==6.0.2

samples/contract-compliance-analysis/back-end/stack/sfn/__init__.py

Lines changed: 6 additions & 8 deletions
@@ -32,9 +32,6 @@
 from .evaluation import EvaluationStep
 from .risk import RiskStep
 
-from cdklabs.generative_ai_cdk_constructs import LangchainCommonDepsLayer
-
-
 class StepFunctionsStack(NestedStack):
 
     def __init__(
@@ -50,12 +47,13 @@ def __init__(
     ):
         super().__init__(scope, id, **kwargs)
 
-        self.langchain_deps_layer = LangchainCommonDepsLayer(
+        self.langchain_deps_layer = lambda_python.PythonLayerVersion(
             self,
-            "LangChainDependenciesLayer",
-            runtime=lambda_.Runtime.PYTHON_3_12,
-            architecture=lambda_.Architecture.X86_64
-        ).layer
+            'LangChainDependenciesLayer',
+            entry=os.path.join(os.path.dirname(__file__), "langchain-deps-layer"),
+            compatible_architectures=[lambda_.Architecture.X86_64],
+            compatible_runtimes=[lambda_.Runtime.PYTHON_3_12],
+        )
 
         self.common_layer = lambda_python.PythonLayerVersion(
             self,

samples/contract-compliance-analysis/back-end/stack/sfn/classification/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -15,6 +15,7 @@
 from constructs import Construct
 from aws_cdk import (
     Duration,
+    Aws,
     aws_lambda as lambda_,
     aws_stepfunctions as sfn,
     aws_stepfunctions_tasks as tasks,
@@ -75,7 +76,8 @@ def __init__(
                 "bedrock:InvokeModel",
             ],
             resources=[
-                "arn:aws:bedrock:*::foundation-model/anthropic*"
+                "arn:aws:bedrock:*::foundation-model/*",
+                "arn:aws:bedrock:*:"+Aws.ACCOUNT_ID+":inference-profile/*",
             ]
         ))
 

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-more-itertools==10.3.0
+more-itertools==10.5.0

samples/contract-compliance-analysis/back-end/stack/sfn/common-layer/llm.py

Lines changed: 13 additions & 3 deletions
@@ -60,9 +60,19 @@ def invoke_llm(prompt, model_id, temperature=0.5, top_k=None, top_p=0.8, max_new
     response = chain.invoke({})
     content = response.content
 
+    usage_data = None
+    stop_reason = None
+
+    if ('anthropic' in model_id):
+        usage_data = response.response_metadata['usage']
+        stop_reason = response.response_metadata['stop_reason']
+    elif ('amazon.nova' in model_id):
+        usage_data = response.usage_metadata
+        stop_reason = response.response_metadata['stopReason']
+
     if verbose:
         logger.info(f"Model response: {content}")
-        logger.info(f"Model usage: {response.response_metadata['usage']}")
-        logger.info(f"Model stop_reason: {response.response_metadata['stop_reason']}")
+        logger.info(f"Model usage: {usage_data}")
+        logger.info(f"Model stop_reason: {stop_reason}")
 
-    return content, response.response_metadata['usage'], response.response_metadata["stop_reason"]
+    return content, usage_data, stop_reason

samples/contract-compliance-analysis/back-end/stack/sfn/evaluation/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -15,6 +15,7 @@
 from constructs import Construct
 from aws_cdk import (
     Duration,
+    Aws,
     aws_lambda as lambda_,
     aws_stepfunctions as sfn,
     aws_stepfunctions_tasks as tasks,
@@ -74,7 +75,8 @@ def __init__(
                 "bedrock:InvokeModel",
             ],
             resources=[
-                "arn:aws:bedrock:*::foundation-model/anthropic*"
+                "arn:aws:bedrock:*::foundation-model/*",
+                "arn:aws:bedrock:*:"+Aws.ACCOUNT_ID+":inference-profile/*",
             ]
         ))
 

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+langchain==0.3.9
+langchain-community==0.3.9
+langchain-aws==0.2.9

samples/contract-compliance-analysis/back-end/stack/sfn/preprocessing/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -15,6 +15,7 @@
 from constructs import Construct
 from aws_cdk import (
     Duration,
+    Aws,
     aws_lambda as lambda_,
     aws_stepfunctions as sfn,
     aws_stepfunctions_tasks as tasks,
@@ -85,7 +86,8 @@ def __init__(
                 "bedrock:InvokeModel",
             ],
             resources=[
-                "arn:aws:bedrock:*::foundation-model/anthropic*"
+                "arn:aws:bedrock:*::foundation-model/*",
+                "arn:aws:bedrock:*:"+Aws.ACCOUNT_ID+":inference-profile/*",
             ]
         ))
 

samples/contract-compliance-analysis/back-end/stack/sfn/preprocessing/fn-preprocess-contract/index.py

Lines changed: 1 addition & 1 deletion
@@ -156,7 +156,7 @@ def handler(event, context):
     ))
     # Merge the contracts, accepting only the inclusion of separators
     merged_contract = "".join([
-        diff[2:] if diff.startswith(" ") or diff.startswith("- ") or diff.startswith(f"+ {CLAUSE_SEPARATOR}\n") else ""
+        diff[2:] if diff.startswith(" ") or diff.startswith("- ") or diff.startswith(f"+ {CLAUSE_SEPARATOR}\n") or diff.startswith(f"+ {CLAUSE_SEPARATOR} \n") else ""
        for diff in diffs
     ])
     # Get each individual clause and insert into table
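The merge logic touched by this hunk can be reproduced in a self-contained sketch. `CLAUSE_SEPARATOR` and the sample texts below are hypothetical stand-ins; `difflib.ndiff` marks context lines with `"  "`, deletions with `"- "`, and additions with `"+ "`, which is what the filter keys on:

```python
import difflib

# Stand-in separator token; the sample defines its own.
CLAUSE_SEPARATOR = "<<CLAUSE>>"

original = "Clause one text.\nClause two text.\n"
# Pretend the LLM re-emitted the contract with separators inserted, plus a
# hallucinated extra line that the merge must drop.
segmented = (f"{CLAUSE_SEPARATOR}\nClause one text.\n"
             f"{CLAUSE_SEPARATOR}\nClause two text.\n"
             "Extra hallucinated line.\n")

diffs = list(difflib.ndiff(original.splitlines(keepends=True),
                           segmented.splitlines(keepends=True)))

# Keep context lines, lines the LLM dropped (restoring them), and added
# separator lines; discard all other additions so no invented text survives.
merged = "".join(
    d[2:] if d.startswith("  ") or d.startswith("- ")
    or d.startswith(f"+ {CLAUSE_SEPARATOR}\n") else ""
    for d in diffs
)
print(merged)
```

The fix in this commit extends that separator check to also accept `"+ {CLAUSE_SEPARATOR} \n"`, i.e. a separator the model emitted with a trailing space.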
