
Commit 9585cd5

add evaluation for matching clothes
1 parent 092caf7 commit 9585cd5

File tree

18 files changed: +2271 −0 lines
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
# Evaluation for the Image Styling Recommendation with Prompt Engineering

## 1. Business Problem
- The problem breaks down into the following two parts:
- (1) Finding matching products with generative AI
- ![matching_clothes.png](img/matching_clothes.png)
- (2) Systematically verifying that the LLM finds a matching product and describes its "reason for selection" well
- ![evaluation_problem.png](img/evaluation_problem.png)

## 2. Solution
Run the following two notebooks in the notebook folder to reproduce the solution (a minimal sketch of the evaluation step follows this list):
- 01_matching_codi_product.ipynb
- 02_matching_reason_evaluation.ipynb

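For orientation, the evaluation step in 02_matching_reason_evaluation.ipynb can be approximated with the helper classes added in this commit. This is a minimal sketch, not the notebook's actual contents: the module paths under eval_utils, the AWS region, and the two sample opinions are assumptions.

```python
import boto3

# The module file names under eval_utils are assumed for illustration;
# adjust the imports to the actual files in this commit.
from eval_utils.bedrock_langchain import BedrockLangChain
from eval_utils.fashion_prompt import FashionPrompt

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption
helper = BedrockLangChain(bedrock_runtime=bedrock_runtime)
prompts = FashionPrompt()

# Compare a sample human stylist's reasoning with a sample LLM recommendation reason.
# The helper streams the JSON-formatted evaluation (score 1-5 plus reason) to stdout.
helper.invoke_evaluating_fashion_review_langchain(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"max_tokens": 1024},
    system_prompt=prompts.get_fashion_evaluation_system_prompt(),
    user_prompt=prompts.get_fashion_evaluation_user_prompt(),
    human_message="A light grey chino works well with the navy blazer for a daytime look.",
    AI_message="Beige or light grey chinos are recommended to balance the navy blazer.",
    verbose=False,
)
```
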
## 3. Data
- The images used were downloaded from the ["Musinsa"](https://www.musinsa.com/app/?utm_source=google_shopping&utm_medium=sh&utm_campaign=pmax_ongoing&source=GOSHSAP001&utm_source=google_shopping&utm_medium=sh&utm_campaign=pmax_ongoing&source=GOSHSAP001&gad_source=1&gclid=CjwKCAjw57exBhAsEiwAaIxaZv09yuMwcaiR6VnTCsEtLNv2RGHtxR7uGrDROKAFhzW-rUZst1JCEBoC4I8QAvD_BwE) website.

## 4. Experiment Environment

### 4.1 SageMaker Studio Code Editor
- The notebooks were tested on [SageMaker Studio Code Editor](https://docs.aws.amazon.com/sagemaker/latest/dg/code-editor.html) with the base kernel (Python 3.10.13).
- For the Python packages installed in the execution environment, see [requirements.txt](requirements.txt).

### 4.2 Other Environments
**Requirements**

* Python 3.7 or later
* An AWS account and credentials
* AWS CLI installed and configured

**Installation**

1. Clone this repository.

   `git clone https://github.com/aws-samples/aws-ai-ml-workshop-kr.git`

2. Create and activate a virtual environment.

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install the required Python packages.

   `pip install -r requirements.txt`

4. Change into this folder of the cloned repository.

   `cd genai/aws-gen-ai-kr/20_applications/05_image_styling_recommendation_with_prompt_engineering/evaluation`

## A. References
- [Building with Anthropic’s Claude 3 on Amazon Bedrock and LangChain](https://medium.com/@dminhk/building-with-anthropics-claude-3-on-amazon-bedrock-and-langchain-%EF%B8%8F-2b842f9c0ca8)
- [Implementing an evaluation method for the Amorepacific review-summarization service on Amazon Bedrock](langchain_core.runnables.base.RunnableSequence)
- [Amazon Bedrock model IDs](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html)

This repository contains Python example code showing how to use the Anthropic Claude 3 Sonnet model through the AWS Bedrock runtime; a minimal sketch of such a call follows this list.
- [Anthropic Claude documentation](https://docs.anthropic.com/claude/docs/intro-to-claude)
- [AWS Bedrock runtime documentation](https://docs.aws.amazon.com/ko_kr/bedrock/latest/userguide/service_code_examples_bedrock-runtime.html)
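For reference, a minimal sketch of calling Claude 3 Sonnet through the Bedrock runtime with boto3 is shown below. The region and the sample prompt are assumptions; the helper code added in this commit uses the LangChain ChatBedrock wrapper instead of this raw call.

```python
import json

import boto3

# A minimal sketch, assuming default AWS credentials with Bedrock model access in us-east-1.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",  # version tag required by the Anthropic messages API on Bedrock
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Suggest one item that matches a navy blazer and explain why."}]}
    ],
}

# invoke_model sends the Anthropic messages body and returns a streaming body with the JSON response.
response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```
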

genai/aws-gen-ai-kr/20_applications/05_image_styling_recommendation_with_prompt_engineering/evaluation/eval_utils/__init__.py

Whitespace-only changes.
@@ -0,0 +1,163 @@
1+
import os
2+
import base64
3+
import json
4+
import boto3
5+
import sys
6+
import textwrap
7+
from io import StringIO
8+
from langchain_core.output_parsers import StrOutputParser
9+
from langchain_core.prompts import ChatPromptTemplate
10+
11+
12+
from langchain_aws import ChatBedrock
13+
14+
class BedrockLangChain:
15+
16+
def __init__(self, bedrock_runtime):
17+
self.bedrock_runtime = bedrock_runtime
18+
19+
def invoke_rewrite_langchain(self, model_id, model_kwargs, system_prompt, user_prompt, coordination_review, verbose):
20+
21+
model = ChatBedrock(
22+
client=self.bedrock_runtime,
23+
model_id= model_id,
24+
model_kwargs=model_kwargs,
25+
)
26+
27+
28+
messages = [
29+
("system", system_prompt),
30+
("human", user_prompt)
31+
]
32+
33+
prompt = ChatPromptTemplate.from_messages(messages)
34+
if verbose:
35+
print("messages: \n", messages)
36+
print("prompt: \n")
37+
self.print_ww(prompt)
38+
39+
chain = prompt | model | StrOutputParser()
40+
41+
print("## Created Prompt:\n")
42+
response = chain.invoke(
43+
{
44+
"coordination_review": coordination_review
45+
}
46+
)
47+
48+
return response
49+
50+
51+
52+
def invoke_creating_criteria_langchain(self, model_id, model_kwargs, system_prompt, user_prompt, guide, verbose):
53+
54+
model = ChatBedrock(
55+
client=self.bedrock_runtime,
56+
model_id= model_id,
57+
model_kwargs=model_kwargs,
58+
)
59+
60+
61+
messages = [
62+
("system", system_prompt),
63+
("human", user_prompt)
64+
]
65+
66+
prompt = ChatPromptTemplate.from_messages(messages)
67+
if verbose:
68+
print("messages: \n", messages)
69+
print("prompt: \n")
70+
self.print_ww(prompt)
71+
72+
chain = prompt | model | StrOutputParser()
73+
74+
print("## Created Prompt:\n")
75+
76+
for chunk in chain.stream(
77+
{
78+
"guide": guide
79+
}
80+
):
81+
print(chunk, end="", flush=True)
82+
83+
84+
def invoke_evaluating_fashion_review_langchain(self, model_id, model_kwargs, system_prompt, user_prompt, human_message, AI_message, verbose):
85+
86+
model = ChatBedrock(
87+
client=self.bedrock_runtime,
88+
model_id= model_id,
89+
model_kwargs=model_kwargs,
90+
)
91+
92+
93+
94+
messages = [
95+
("system", system_prompt),
96+
("human", user_prompt)
97+
]
98+
99+
prompt = ChatPromptTemplate.from_messages(messages)
100+
if verbose:
101+
print("messages: \n", messages)
102+
print("prompt: \n")
103+
self.print_ww(prompt)
104+
105+
chain = prompt | model | StrOutputParser()
106+
107+
108+
for chunk in chain.stream(
109+
{
110+
"human_text": human_message,
111+
"AI_text": AI_message,
112+
}
113+
):
114+
print(chunk, end="", flush=True)
115+
116+
117+
def set_text_langchain_body(self, prompt):
118+
text_only_body = {
119+
"messages": [
120+
{
121+
"role": "user",
122+
"content": [
123+
{
124+
"type": "text",
125+
"text": prompt,
126+
},
127+
],
128+
}
129+
],
130+
}
131+
return text_only_body
132+
def print_ww(self, *args, width: int = 100, **kwargs):
133+
"""Like print(), but wraps output to `width` characters (default 100)"""
134+
buffer = StringIO()
135+
try:
136+
_stdout = sys.stdout
137+
sys.stdout = buffer
138+
print(*args, **kwargs)
139+
output = buffer.getvalue()
140+
finally:
141+
sys.stdout = _stdout
142+
for line in output.splitlines():
143+
print("\n".join(textwrap.wrap(line, width=width)))
144+
145+
146+
147+
148+
# from langchain.callbacks import StreamlitCallbackHandler
149+
# model_id="anthropic.claude-3-sonnet-20240229-v1:0", # Claude 3 Sonnet 모델 선택
150+
# # 텍스트 생성 LLM 가져오기, streaming_callback을 인자로 받아옴
151+
# def get_llm(boto3_bedrock, model_id):
152+
# llm = BedrockChat(
153+
# model_id= model_id,
154+
# client=boto3_bedrock,
155+
# model_kwargs={
156+
# "max_tokens": 1024,
157+
# "stop_sequences": ["\n\nHuman"],
158+
# }
159+
# )
160+
# return llm
161+
# llm = get_llm(boto3_bedrock=client, model_id = model_id)
162+
# response_text = llm.invoke(prompt) #프롬프트에 응답 반환
163+
# print(response_text.content)
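One possible way to drive invoke_creating_criteria_langchain above is sketched below. The import path, region, model_kwargs, and the guide and user prompt text are illustrative assumptions, not values taken from the notebooks.

```python
import boto3

from eval_utils.bedrock_langchain import BedrockLangChain  # module file name assumed

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption
helper = BedrockLangChain(bedrock_runtime=bedrock_runtime)

# The user prompt must contain a {guide} placeholder, which chain.stream() fills in.
user_prompt = "Write an evaluation prompt in English that follows this guide: <guide>{guide}</guide>"
guide = (
    "Score from 1 to 5 how well a recommendation reason explains color, "
    "silhouette, and occasion fit for the suggested item."
)

# Streams the generated criteria prompt to stdout as it is produced.
helper.invoke_creating_criteria_langchain(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"max_tokens": 1024, "stop_sequences": ["\n\nHuman"]},
    system_prompt="You are a prompt engineering expert.",
    user_prompt=user_prompt,
    guide=guide,
    verbose=True,
)
```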
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
class FashionPrompt():
    """Provides the system and user prompts used by the evaluation notebooks."""

    def __init__(self):
        # self.system_prompt = system_prompt
        pass

    def get_rewrite_system_prompt(self):
        '''
        Return the system prompt for rewriting a given sentence.
        '''

        system_prompt = '''The task is to rewrite a given sentence in a different way while preserving its original meaning.\
Your role is to take a sentence provided by the user and rephrase it using different words or sentence structures, \
without altering the core meaning or message conveyed in the original sentence.

Instructions:
1. Read the sentence carefully and ensure you understand its intended meaning.
2. Identify the key components of the sentence, such as the subject, verb, object, and any modifiers or additional information.
3. Think of alternative ways to express the same idea using different vocabulary, sentence structures, or phrasing.
4. Ensure that your rewritten sentence maintains the same essential meaning as the original, without introducing any new information or altering the original intent.
5. Pay attention to grammar, punctuation, and overall coherence to ensure your rewritten sentence is well-formed and easy to understand.
6. If the original sentence contains idioms, metaphors, or cultural references, try to find equivalent expressions or explanations in your rewritten version.
7. Avoid oversimplifying or overly complicating the sentence; aim for a natural and clear rephrasing that maintains the original tone and complexity.

Remember, the goal is to provide a fresh perspective on the sentence while preserving its core meaning and ensuring clarity and coherence in your rewritten version.
'''

        return system_prompt

    def get_rewrite_user_prompt(self):
        '''
        Return the user prompt for rewriting a given sentence.
        '''

        user_prompt = '''Given <coordination_review> based on the guide on system prompt
Please write in Korean. Output in JSON format following the <output_example> format, excluding <output_example>

<coordination_review>{coordination_review}</coordination_review>
<output_example>
"original_coordination_review" :
"rewrite_original_coordination_review" :
</output_example>
'''

        return user_prompt

    def get_create_criteria_system_prompt(self):
        '''
        Return the system prompt for generating evaluation criteria.
        '''
        system_prompt = '''You are a prompt engineering expert.'''

        return system_prompt

    def get_create_criteria_user_prompt(self):
        '''
        Return the user prompt for generating evaluation criteria from a <guide>.
        '''
        # The Korean prompt below asks the model to first describe its role and task
        # without XML tags, then to write a prompt in English that follows the <guide>.
        user_prompt = '''먼저 당신의 역할과 작업을 XML Tag 없이 기술하세요, \
이후에 아래의 <guide> 에 맟주어서 프롬프트를 영어로 작성해주세요.
<guide>{guide}</guide>'''

        return user_prompt

    def get_fashion_evaluation_system_prompt(self):
        '''
        Return the system prompt for scoring the relevance between a fashion expert's opinion and an AI opinion.
        '''

        system_prompt = '''
You will be provided with two opinions: one from a fashion expert regarding clothing choices, and \
another from an AI system offering recommendations on clothing choices. \
Your task is to evaluate the relevance and coherence between these two opinions \
by assigning a score from 1 to 5, where 1 indicates low relevance and 5 indicates high relevance.\
You will need to define the criteria for scoring in the <criteria></criteria> section, and \
outline the steps for evaluating the two opinions in the <steps></steps> section.

<criteria>
1 - The two opinions are completely unrelated and contradict each other.
2 - The opinions share some minor similarities, but the overall themes and recommendations are largely different.
3 - The opinions have moderate overlap in their themes and recommendations, but there are still notable differences.
4 - The opinions are mostly aligned, with only minor differences in their specific recommendations or perspectives.
5 - The two opinions are highly coherent, complementary, and provide consistent recommendations or perspectives on clothing choices.
</criteria>

<steps>
1. Read and understand the opinion provided by the fashion expert.
2. Read and understand the opinion provided by the AI system.
3. Identify the main themes, recommendations, and perspectives presented in each opinion.
4. Compare the two opinions and assess the degree of alignment or contradiction between them.
5. Based on the criteria defined above, assign a score from 1 to 5 to reflect the relevance and coherence between the two opinions.
6. Provide a brief explanation justifying the assigned score.
</steps>
'''
        return system_prompt

    def get_fashion_evaluation_user_prompt(self):
        '''
        Return the user prompt for scoring the relevance between a fashion expert's opinion and an AI opinion.
        '''

        user_prompt = '''
Given <human_view> and <AI_view>, based on the guide on system prompt
Write in the form of <evaluation> in korean with JSON format

<human_view>{human_text}</human_view>
<AI_view>{AI_text}</AI_view>

<evaluation>
'human_view':
'AI_view' :
'score': 4,
'reason': 'AI view is similar to human view'
</evaluation>
'''
        return user_prompt
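The rewrite prompts above could be paired with invoke_rewrite_langchain from the previous file roughly as follows. The import paths, region, and sample review are assumptions for illustration.

```python
import boto3

# Module file names under eval_utils are assumed; adjust to the actual files in this commit.
from eval_utils.bedrock_langchain import BedrockLangChain
from eval_utils.fashion_prompt import FashionPrompt

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption
helper = BedrockLangChain(bedrock_runtime=bedrock_runtime)
prompts = FashionPrompt()

# Rewrites a sample coordination review while preserving its meaning; the user prompt
# asks for a JSON answer containing the original and the rewritten review (in Korean).
rewritten = helper.invoke_rewrite_langchain(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"max_tokens": 1024},
    system_prompt=prompts.get_rewrite_system_prompt(),
    user_prompt=prompts.get_rewrite_user_prompt(),
    coordination_review="A plain white tee keeps the navy blazer look relaxed for daytime.",
    verbose=False,
)
print(rewritten)
```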
