@Chibukach
Collaborator

Added support for ArenaHard generation.

Member

@andy-neuma left a comment

Cool. Mind if we sync up in the next day or two for a walkthrough?

@@ -0,0 +1,11 @@
{"uid":"2edbb5f36f5b42be","category":"hard_prompt","subcategory":"coding","prompt":"Write me a zig program that solves the following problem from advent of code and reads the input from a file input.txt and prints the answer to stdout.\n```\n--- Day 25: Let It Snow ---\nMerry Christmas! Santa is booting up his weather machine; looks like you might get a white Christmas after all.\n\nThe weather machine beeps! On the console of the machine is a copy protection message asking you to enter a code from the instruction manual. Apparently, it refuses to run unless you give it that code. No problem; you'll just look up the code in the--\n\n\"Ho ho ho\", Santa ponders aloud. \"I can't seem to find the manual.\"\n\nYou look up the support number for the manufacturer and give them a call. Good thing, too - that 49th star wasn't going to earn itself.\n\n\"Oh, that machine is quite old!\", they tell you. \"That model went out of support six minutes ago, and we just finished shredding all of the manuals. I bet we can find you the code generation algorithm, though.\"\n\nAfter putting you on hold for twenty minutes (your call is very important to them, it reminded you repeatedly), they finally find an engineer that remembers how the code system works.\n\nThe codes are printed on an infinite sheet of paper, starting in the top-left corner. The codes are filled in by diagonals: starting with the first row with an empty first box, the codes are filled in diagonally up and to the right. This process repeats until the infinite paper is covered. So, the first few codes are filled in in this order:\n\n | 1 2 3 4 5 6 \n---+---+---+---+---+---+---+\n 1 | 1 3 6 10 15 21\n 2 | 2 5 9 14 20\n 3 | 4 8 13 19\n 4 | 7 12 18\n 5 | 11 17\n 6 | 16\nFor example, the 12th code would be written to row 4, column 2; the 15th code would be written to row 1, column 5.\n\nThe voice on the other end of the phone continues with how the codes are actually generated. The first code is 20151125. 
After that, each code is generated by taking the previous one, multiplying it by 252533, and then keeping the remainder from dividing that value by 33554393.\n\nSo, to find the second code (which ends up in row 2, column 1), start with the previous value, 20151125. Multiply it by 252533 to get 5088824049625. Then, divide that by 33554393, which leaves a remainder of 31916031. That remainder is the second code.\n\n\"Oh!\", says the voice. \"It looks like we missed a scrap from one of the manuals. Let me read it to you.\" You write down his numbers:\n\n | 1 2 3 4 5 6\n---+---------+---------+---------+---------+---------+---------+\n 1 | 20151125 18749137 17289845 30943339 10071777 33511524\n 2 | 31916031 21629792 16929656 7726640 15514188 4041754\n 3 | 16080970 8057251 1601130 7981243 11661866 16474243\n 4 | 24592653 32451966 21345942 9380097 10600672 31527494\n 5 | 77061 17552253 28094349 6899651 9250759 31663883\n 6 | 33071741 6796745 25397450 24659492 1534922 27995004\n\"Now remember\", the voice continues, \"that's not even all of the first few numbers; for example, you're missing the one at 7,1 that would come before 6,2. But, it should be enough to let your-- oh, it's time for lunch! Bye!\" The call disconnects.\n\nSanta looks nervous. Your puzzle input contains the message on the machine's console. What code do you give the machine?\n```"}
Member

quick question, is there a tool to generate these entries?

data_type="emulated",
max_seconds=30,
data="prompt_tokens=512,generated_tokens=256",
#scenario = "benchmarking_32k",
Member

is this 128k and not 32k?

task = ArenaHardJudgeTask(
    project_name="alexandre_debug",
    task_name="test_guidellm_task",
    #model="meta-llama/Llama-3.2-1B-Instruct",
Member

Are the commented-out lines needed? The same question applies to the commented-out lines in the other files.

model: Qwen/Qwen2.5-1.5B-Instruct
endpoints:
- api_base: http://127.0.0.1:8000/v1
  api_key: '-'
Member

nice API key, we've been using "abc_123".

from typing import Optional, Sequence
import os

#DEFAULT_SERVER_WAIT_TIME = 30 # 600 seconds = 10 minutes
Member

cruft?

# Check for conflicts in configs and constructor arguments
for key in config_kwargs:
    if key in kwargs:
        raise ValueError(f"{key} already defined in config's model_args. It can't be defined again in task instantiation.")
Member

cool

)

# Check for conflicts in configs and constructor arguments
for key in config_kwargs:
Member

To avoid having to update this check in two places in the future, it'd be good to pull it out into a function and just call that function here.
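
One possible shape for that refactor, as a minimal sketch: the helper name `check_config_conflicts` and its signature are hypothetical and do not come from the PR, while `config_kwargs`, `kwargs`, and the error message are taken from the diff above.

```python
from typing import Any, Mapping


def check_config_conflicts(
    config_kwargs: Mapping[str, Any],
    kwargs: Mapping[str, Any],
) -> None:
    """Raise if a key from the config is also passed at task instantiation.

    Hypothetical helper: the name and signature are illustrative only.
    """
    for key in config_kwargs:
        if key in kwargs:
            raise ValueError(
                f"{key} already defined in config's model_args. "
                "It can't be defined again in task instantiation."
            )
```

Each call site would then reduce to `check_config_conflicts(config_kwargs, kwargs)`, so the message and the rule live in one place.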

environment_args = ConfigFactory.parse_string(environment_args)
raw_config = task.get_configuration_object("GuideLLM")
if raw_config is None:
    print("[DEBUG] `GuideLLM` config not found in configuration — checking parameters as fallback")
Member

nit: maybe say "WARNING" instead of "DEBUG"
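
If the message is meant as a warning, one option is to emit it through Python's standard `logging` module rather than `print`, so the severity is carried by the log level instead of a hand-written prefix. This is only a sketch: the logger name and the helper `warn_missing_config` are hypothetical, and the PR may well prefer to keep `print` and just change the prefix.

```python
import logging

logger = logging.getLogger("guidellm_task")  # hypothetical logger name


def warn_missing_config(raw_config: object) -> bool:
    """Log a warning and return True when the config object is missing."""
    if raw_config is None:
        logger.warning(
            "`GuideLLM` config not found in configuration; "
            "checking parameters as fallback"
        )
        return True
    return False
```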

    current_scenario = GenerativeTextScenario.from_file(filepath, dict(guidellm_args))
else:
    raise ValueError(f"Scenario path {filepath} does not exist")
#elif len(get_builtin_scenarios()) > 0:
Member

cruft?

3 participants