fix: add spec for hyperparameters in task design and coder (#995)
* init commit
* remove the 5-fold spec from prompts
* refine the hyperparameter specification
* do not sample data
* a small spelling issue
* refine prompt to avoid submission cheating
* do not sample data
* simplify code
* refine the coder evaluator prompt
* refine wording
* remove runtime from proposal
* refine wording
* refine prompt
* add gpu info in runtime_info.py
* modify the spec
* add router and add refinement exp gen
* fix prompt bug
* use rule-based logic for router
* complete the prompt
* fix circular import bug
* fix bug
* make refine_decision optional
* update pipeline prompts: (1) add scenario: in an iterative coding loop, use sample datasets; (2) add some generation tips in coding; (3) add evaluation guidelines in evaluation; (4) polish the JSON schema and description
* fix a small bug
* fix a small bug
* rdagent/scenarios/data_science/loop.py back to the original version
* refactor: replace _get_exp_gen with default_exp_gen for exp generation
* import
* refactor: make the __init__ back to main
* fix small bugs
* fix bugs for proposal_version
* move refine into runner
* check early stop
* EDA improvement & coder classes number
* fix CI
* slightly refine the prompt
* remove rule_base_eval and remove useless prompt
---------
Co-authored-by: Xu <[email protected]>
Co-authored-by: TPLin22 <[email protected]>
Co-authored-by: amstrongzyf <[email protected]>
Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: Young <[email protected]>
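Several of the commits above ("add router and add refinement exp gen", "use rule-based logic for router", "make refine_decision optional") concern a router that decides whether the next experiment should be a fresh proposal or a refinement of the previous one. The PR's actual rules are not visible on this page, so the following is a purely illustrative sketch of rule-based routing; `route_exp_gen` and the two strategy names are hypothetical.

```python
# Illustrative only: the PR's rule-based router is not shown on this page.
# This sketch just demonstrates routing between a default experiment
# generator and a refinement generator without an LLM call.
def route_exp_gen(last_feedback):
    """Pick an experiment-generation strategy from the previous feedback.

    last_feedback: dict shaped like the exp_feedback JSON, or None on the
    first loop iteration.
    """
    if last_feedback is None:
        return "default_exp_gen"  # nothing to refine yet
    if last_feedback.get("Refine Decision") == "yes":
        return "refine_exp_gen"  # small fix to the previous attempt
    return "default_exp_gen"  # propose a fresh experiment
```

Making `Refine Decision` optional (as one commit notes) is why the sketch uses `.get()` rather than direct key access.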
-    stdout+="The estimated full run time is less than three times the timeout period.\n"
-else:
-    stdout+=f"The estimated full run time is more than three times the timeout period.\n"
+    stdout+=f"Debug mode ran in {debug_time:.2f} seconds, estimated full run time is {full_estimated_time:.2f} seconds. The estimated time is {full_estimated_time/env.conf.running_timeout_period*100:.2f}% the debug time."
 else:
     stdout+="Debug mode did not provide debug_time or estimated_time, it's a buggy implementation.\n"
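The hunk above replaces two vague threshold messages with a single quantitative summary built from the debug run. A minimal sketch of that message-building logic, extracted into a standalone function for clarity (the function name and the flattened `running_timeout_period` parameter are assumptions; in the diff the timeout comes from `env.conf`):

```python
def summarize_debug_run(debug_time, full_estimated_time, running_timeout_period):
    """Build the stdout summary for a debug-mode run.

    debug_time: seconds the sampled debug run took, or None if not reported.
    full_estimated_time: extrapolated seconds for a full run, or None.
    running_timeout_period: configured timeout in seconds.
    """
    if debug_time is None or full_estimated_time is None:
        # Mirrors the diff's fallback branch for implementations that
        # fail to report their timings.
        return "Debug mode did not provide debug_time or estimated_time, it's a buggy implementation.\n"
    ratio = full_estimated_time / running_timeout_period * 100
    return (
        f"Debug mode ran in {debug_time:.2f} seconds, "
        f"estimated full run time is {full_estimated_time:.2f} seconds. "
        f"The estimated time is {ratio:.2f}% of the timeout period."
    )
```

Note the diff's own wording says "% the debug time" although the ratio is taken against the timeout period; the sketch words it as a percentage of the timeout, which is what the expression computes.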
rdagent/scenarios/data_science/dev/prompts.yaml (13 additions & 6 deletions)
@@ -5,9 +5,9 @@ exp_feedback:
     Below is a detailed description of the current Kaggle competition scenario:
     {{ scenario }}

-    Your task is to analyze the current experiment's hypothesis, implementation (code and its changes), and results, explicitly comparing them with previous experiments and the best previous result (SOTA).
+    Your task is to analyze the current experiment's hypothesis, implementation (code and its changes), and results, explicitly comparing them with previous best SOTA result step by step.

-    Step-by-step Analysis Process:
+    # Step-by-step Analysis Process:

     Step 1: Verify Submission Format
     - If the submission format check fails:
@@ -57,9 +57,14 @@ exp_feedback:
     - Please examine the code carefully based on the above criteria and provide a detailed analysis of the code.
     - Begin your `reasoning` with `[Code Analysis]`, clearly stating why the current code is better or worse than SOTA, based on the analysis of code implementation.
     - If the current code is not better than SOTA, set `"Replace Best Result": "no"`. Otherwise, set `"Replace Best Result": "yes"`.
-
-    Provide detailed and constructive feedback structured as follows:
-    Example JSON Structure for Result Analysis:
+
+    Step 5: EDA improvement analysis (if needed)
+    - The user might provide Data Overview in EDA format which is the output of the EDA code. You should analyze the EDA result and provide feedback on how it can be improved.
+    - The improvement might include some addons or modifications or deletions to some part of the EDA code.
+    - You should provide your feedback based on the current code and SOTA code. Especially focus on the feature engineering part.
+    - For example, if the code truncate the line with N words, you can suggest to print the mean, median or quantile of the length of the line for better understanding of the data in the next rounds of experiments.
+
+    Provide detailed and constructive feedback structured as follows without anything else:
     {
         "Submission Format Check": "yes or no",
         "First Valid Submission": "yes or no",
@@ -68,7 +73,9 @@ exp_feedback:
         "Feedback for Hypothesis": "Explicitly confirm or refute the hypothesis based on specific data points or performance trends. Limit to two sentences.",
         "Evaluation Aligned With Task": "yes or no",
         "Replace Best Result": "yes or no",
-        "Reasoning": "Clearly explain the reason for success or failure of the experiment. Begin explicitly with [Submission format error], [Evaluation error], [Experiment Analysis] or [Code Analysis] depending on the step at which issues arose. Reference specific scores and methodological differences with SOTA. Limit to three sentences."
+        "Refine Decision": "yes or no",
+        "Reasoning": "Clearly explain the reason for success or failure of the experiment. Begin explicitly with [Submission format error], [Evaluation error], [Experiment Analysis] or [Code Analysis] depending on the step at which issues arose. Reference specific scores and methodological differences with SOTA. Limit to three sentences.",
+        "EDA Improvement": "improvement suggestion for EDA code, if needed, otherwise set to 'no'. If there is no EDA code, set to 'no'."
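The JSON structure above is the contract between the feedback prompt and the calling code, with `"Refine Decision"` and `"EDA Improvement"` being the fields this PR adds. A minimal validation sketch for that contract; the helper name is hypothetical, it checks only the keys visible in this diff, and the real template may contain additional fields elided between the hunks.

```python
import json

# Keys visible in the updated exp_feedback schema; "Refine Decision" and
# "EDA Improvement" are the fields added in this PR. The real template may
# contain more keys than shown in the diff hunks.
REQUIRED_KEYS = {
    "Submission Format Check",
    "First Valid Submission",
    "Feedback for Hypothesis",
    "Evaluation Aligned With Task",
    "Replace Best Result",
    "Refine Decision",
    "Reasoning",
    "EDA Improvement",
}

# Fields the schema constrains to the literal strings "yes" or "no".
YES_NO_KEYS = {
    "Submission Format Check",
    "First Valid Submission",
    "Evaluation Aligned With Task",
    "Replace Best Result",
    "Refine Decision",
}

def validate_feedback(raw: str) -> dict:
    """Parse the model's JSON feedback and check it against the schema."""
    feedback = json.loads(raw)
    missing = REQUIRED_KEYS - feedback.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for key in YES_NO_KEYS:
        if feedback[key] not in ("yes", "no"):
            raise ValueError(f"{key!r} must be 'yes' or 'no'")
    return feedback
```

Validating the yes/no fields up front is what lets downstream rule-based logic (e.g. acting on `"Refine Decision"`) trust the values without re-checking.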
stdout+=f"\nSubmission check:\n{submission_check_out}\nIf Submission check returns a 'Submission is valid' or similar message, despite some warning messages, you should still consider the submission as valid and give a positive final decision. "