fix: add spec for hyperparameters in task design and coder #995
Branch force-pushed from 2c7fa6e to 3619c95.
…rui/fix_hyperparameter_problems
…p and use sample datasets (2) add some generation tips in coding (3) add evaluation guidelines in evaluation (4) polish the JSON schema and description
{% include "scenarios.data_science.share:guidelines.refine" %}
# Refinement Specification
## Hypothesis: {{ hypothesis.hypothesis }}
I think we can discuss this.
In the end, we think this should be implemented in the runner.
def gen(self, trace: DSTrace) -> DSExperiment:
    # Step 0: Prepare
    pipeline = DS_RD_SETTING.coder_on_whole_pipeline
    component_desc = T("scenarios.data_science.share:component_description_in_pipeline").r()
Use an include here to simplify this.
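The suggestion can be illustrated with plain Jinja2: instead of rendering a shared snippet in Python and passing the resulting string into the prompt, the prompt template pulls the snippet in itself with {% include %}. The templates in this PR already use {% include "..." %}, so the same mechanism is presumably available. This is a minimal sketch assuming jinja2 is installed; the template names and contents below are hypothetical stand-ins, not RD-Agent's actual prompt files or its T(...) loader.

```python
from jinja2 import DictLoader, Environment

# Hypothetical templates standing in for shared prompt snippets.
templates = {
    "component_description": "Shared component description goes here.",
    # Before: the caller renders the snippet separately and passes it in.
    "task_before": "{{ component_desc }}\nTask: {{ task }}",
    # After: the template includes the shared snippet itself.
    "task_after": '{% include "component_description" %}\nTask: {{ task }}',
}

env = Environment(loader=DictLoader(templates))


def render_before(task: str) -> str:
    # Two render calls: one for the snippet, one for the prompt.
    component_desc = env.get_template("component_description").render()
    return env.get_template("task_before").render(component_desc=component_desc, task=task)


def render_after(task: str) -> str:
    # One render call; the include resolves the snippet through the same loader.
    return env.get_template("task_after").render(task=task)


if __name__ == "__main__":
    print(render_after("tune hyperparameters"))
```

Both functions produce the same prompt, but the include version keeps the Python side free of the extra render-and-pass step.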
You might receive exploratory data analysis (EDA) details about the source data. Do not use this EDA information to create assertions or raise errors. We might generate sample data for quick coding (so your code may run on sample data which is part of the full-size data), but remember that the EDA details are based on the full-size data.
draft: |-
  TODO
refine: |-
Low-hanging fruit.
We are not doing the refine part for now.
return DSRefineExpGen(scen=self.scen).gen(trace=trace)

# Propose
if DS_RD_SETTING.proposal_version == "v1":
I hope we can remove proposal_version in this version.
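One way to drop a version flag is to let a small rule-based router pick the experiment generator on each loop, in the spirit of the later commits "add router and add refinement exp gen" and "use rule-based logic for router". The sketch below is entirely hypothetical: the Experiment and Trace classes, the propose/refine placeholders, and the routing rule itself (refine only on top of a successful experiment, otherwise propose) are illustrative assumptions, not RD-Agent's code or its actual routing logic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Experiment:
    # Hypothetical stand-in for an experiment record in the trace.
    name: str
    succeeded: bool


@dataclass
class Trace:
    # Hypothetical stand-in for DSTrace: an ordered history of experiments.
    history: List[Experiment] = field(default_factory=list)

    def last(self) -> Optional[Experiment]:
        return self.history[-1] if self.history else None


def propose(trace: Trace) -> Experiment:
    # Placeholder for a "propose a new hypothesis" generator.
    return Experiment(name=f"propose-{len(trace.history)}", succeeded=False)


def refine(trace: Trace) -> Experiment:
    # Placeholder for a "refine the last experiment" generator.
    last = trace.last()
    assert last is not None, "refine needs a previous experiment"
    return Experiment(name=f"refine-{last.name}", succeeded=False)


def route(trace: Trace) -> Callable[[Trace], Experiment]:
    """Rule-based routing: the trace state decides, not a version flag."""
    last = trace.last()
    # Assumed rule: refine only when the previous experiment succeeded.
    if last is not None and last.succeeded:
        return refine
    return propose


def gen(trace: Trace) -> Experiment:
    return route(trace)(trace)
```

With an empty trace, gen() proposes; after a successful experiment it refines; after a failed one it falls back to proposing. The design choice is that routing depends only on observable trace state, so no global setting has to select a code path.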
@@ -0,0 +1,105 @@
import json
We don't need this now.
…#995)

* init commit
* remove the 5-fold spec from prompts
* refine the hyperparameter specification
* do not sample data
* a small spelling issue
* refine prompt to avoid submission cheating
* do not sample data
* simplify code
* refine the coder evaluator prompt
* refine wording
* remove runtime from proposal
* refine wording
* refine prompt
* add gpu info in runtime_info.py
* modify the spec
* add router and add refinement exp gen
* fix prompt bug
* use rule-based logic for router
* complete the prompt
* fix circular import bug
* fix bug
* make refine_decision optional
* update pipeline prompts: (1) add scenario: in an iterative coding loop and use sample datasets (2) add some generation tips in coding (3) add evaluation guidelines in evaluation (4) polish the JSON schema and description
* fix a small bug
* fix a small bug
* rdagent/scenarios/data_science/loop.py back to the original version
* refactor: replace _get_exp_gen with default_exp_gen for exp generation
* import
* refactor: make the __init__ back to main
* fix small bugs
* fix bugs for proposal_version
* move refine into runner
* check early stop
* EDA improvement & coder classes number
* fix CI
* slightly refine the prompt
* remove rule_base_eval and remove useless prompt

---------

Co-authored-by: Xu <[email protected]>
Co-authored-by: TPLin22 <[email protected]>
Co-authored-by: amstrongzyf <[email protected]>
Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: Young <[email protected]>
📚 Documentation preview 📚: https://RDAgent--995.org.readthedocs.build/en/995/