Merged
43 commits
da36f43
init commit
Jun 27, 2025
979640a
remove the 5-fold spec from prompts
Jun 27, 2025
2c87022
refine the hyperparameter specification
Jun 27, 2025
ccdb471
do not sample data
Jun 27, 2025
84bf563
a small spelling issue
TPLin22 Jun 27, 2025
13be390
refine prompt to avoid submission cheating
TPLin22 Jun 27, 2025
4ca0411
do not sample data
Jun 27, 2025
c122816
simplify code
Jun 27, 2025
ffec796
refine the coder evaluator prompt
Jun 27, 2025
ffe70ca
refine wording
RolandMinrui Jun 27, 2025
b1f03f2
remove runtime from proposal
Jun 27, 2025
771e7e8
refine wording
Jun 27, 2025
55d8d03
refine prompt
Jun 27, 2025
3619c95
add gpu info in runtime_info.py
Jun 27, 2025
3f487fe
Merge branch 'main' of https://github.com/microsoft/RD-Agent into min…
Jun 30, 2025
6ec2080
modify the spec
Jun 30, 2025
7d27e09
add router and add refinement exp gen
Jul 1, 2025
b669365
fix prompt bug
Jul 2, 2025
bbb8bcf
Merge branch 'main' of https://github.com/microsoft/RD-Agent into min…
Jul 2, 2025
49d9686
use rule-based logic for router
Jul 2, 2025
43255d6
complete the prompt
Jul 2, 2025
1995f6a
Merge branch 'main' of https://github.com/microsoft/RD-Agent into min…
Jul 3, 2025
8944273
fix circular import bug
Jul 3, 2025
81d284a
fix bug
Jul 3, 2025
a18e454
make refine_decision optional
Jul 3, 2025
408e7ab
update pipeline prompts: (1) add scenary: in an iterative cooding loo…
Hoder-zyf Jul 3, 2025
beb3bf8
fix a small bug
peteryang1 Jul 3, 2025
93a3acd
fix a small bug
peteryang1 Jul 4, 2025
3a15f5c
Merge branch 'main' into minrui/fix_hyperparameter_problems
peteryangms Jul 4, 2025
6d9607a
rdagent/scenarios/data_science/loop.py back to the original version
Hoder-zyf Jul 4, 2025
8312380
refactor: replace _get_exp_gen with default_exp_gen for exp generation
you-n-g Jul 4, 2025
ed984eb
import
you-n-g Jul 4, 2025
ceb6335
refactor: make the __init__ back to main
Hoder-zyf Jul 4, 2025
833be8f
fix small bugs
Hoder-zyf Jul 4, 2025
2e6d190
fix bugs for proposal_version
Hoder-zyf Jul 4, 2025
71e68c6
move refine into runner
peteryangms Jul 4, 2025
2b8a2ed
Merge branch 'xuyang1/help_minrui_hyppp' into minrui/fix_hyperparamet…
peteryangms Jul 4, 2025
e56ebfd
check early stop
peteryangms Jul 4, 2025
7caad02
Merge branch 'main' into minrui/fix_hyperparameter_problems
peteryangms Jul 5, 2025
eb9ec5d
EDA improvement & coder classes number
peteryangms Jul 7, 2025
2ebcc35
fix CI
peteryangms Jul 8, 2025
65deb7d
slightly refine the prompt
Jul 8, 2025
1edf3a9
remove rule_base_eval and remove useless prompt
peteryangms Jul 8, 2025
1 change: 1 addition & 0 deletions rdagent/components/coder/data_science/pipeline/__init__.py
@@ -95,6 +95,7 @@ def implement_one_task(
queried_former_failed_knowledge=queried_former_failed_knowledge[0],
out_spec=PythonAgentOut.get_spec(),
runtime_environment=runtime_environment,
hyperparameter_spec=T("scenarios.data_science.share:spec.hyperparameter").r(),
spec=T("scenarios.data_science.share:component_spec.Pipeline").r(),
enable_model_dump=DS_RD_SETTING.enable_model_dump,
)
11 changes: 10 additions & 1 deletion rdagent/components/coder/data_science/pipeline/prompts.yaml
@@ -9,6 +9,11 @@ pipeline_coder:
## The runtime environment your code will run on
{{ runtime_environment }}

## Hyperparameters Specification
Follow the hyperparameter choices if they are specified in the task description, unless prior attempts have demonstrated that they are ineffective or incorrect.
In that case, refer to the guidelines below for appropriate adjustments:
{{ hyperparameter_spec }}

## Specification your code should follow
{{ spec }}

@@ -118,7 +123,10 @@ pipeline_eval:
Step 1: Executes successfully without any errors. Please distinguish between the errors and warnings.

Step 2: Correctly generates a final submission in the correct format, ensuring: they align with the submission structure, the index names and column names should match the sample, and the items should not be empty or apparently incorrect.

- Carefully check that the submission file and any reported scores are genuinely produced by a real model training and inference process.
- Any attempt to bypass model training or inference—such as generating random predictions, hard-coding outputs, or otherwise fabricating results—should be considered cheating and must result in evaluation failure.
- Any attempt to sample a subset of training data for efficiency is not allowed. All training data must be loaded and used.
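The Step 2 format checks above could be sketched as a small validator. This is an illustrative sketch only; the function name, file paths, and the exact checks are assumptions, not code from this PR:

```python
import pandas as pd


def check_submission_format(submission_path: str, sample_path: str) -> list[str]:
    """Compare a submission file against the sample submission and
    return a list of format problems; an empty list means it looks valid."""
    problems = []
    sub = pd.read_csv(submission_path)
    sample = pd.read_csv(sample_path)
    # Column names and order must match the sample submission exactly.
    if list(sub.columns) != list(sample.columns):
        problems.append(
            f"column mismatch: {list(sub.columns)} vs {list(sample.columns)}"
        )
    # Items should not be empty.
    if len(sub) == 0:
        problems.append("submission has no rows")
    elif sub.isna().any().any():
        problems.append("submission contains empty values")
    return problems
```

Note that, per the evaluator prompt, only the format is checked here; detecting fabricated predictions requires inspecting the training code itself, not the CSV.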

Step 3: Aligns with the competition requirements. This includes:
- CAREFULLY ANALYZE WHETHER THE EXPERIMENTAL SETUP AND CODE MAY CAUSE MISALIGNMENT BETWEEN VALIDATION AND TEST PERFORMANCE.
- Confirm strict adherence to the competition's evaluation rules listed in `scenario`:
@@ -137,6 +145,7 @@ pipeline_eval:
[Note]
1. Model performance is NOT a concern in this evaluation—only correct execution and formatting matter.
2. You only check the format of the submission since we only feed you part of the data, so the submission might have a different index from the sample submission data.
3. Submissions and scores must be the result of actual model training and inference. Any form of cheating or fabrication (e.g., random or hard-coded outputs) is strictly prohibited and should lead to rejection.

Please respond with your feedback in the following JSON format and order
```json
4 changes: 1 addition & 3 deletions rdagent/scenarios/data_science/proposal/exp_gen/idea_pool.py
@@ -7,9 +7,7 @@
from rdagent.components.knowledge_management.graph import (
UndirectedNode, # TODO: add appendix attribute to node
)
from rdagent.components.knowledge_management.graph import (
UndirectedGraph,
)
from rdagent.components.knowledge_management.graph import UndirectedGraph
from rdagent.log import rdagent_logger as logger
from rdagent.oai.llm_utils import APIBackend
from rdagent.utils.agent.tpl import T
15 changes: 10 additions & 5 deletions rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
@@ -206,7 +206,7 @@ hypothesis_gen:
- *Good Example (Efficiency)*: "To resolve the 'timeout during training' challenge, reduce `NUM_EPOCHS` from 5 to 2 and `N_SPLITS` for cross-validation from 5 to 3 in the main training loop, aiming to complete execution within the 1-hour limit while minimizing impact on the F1-score."
- *Poor Example*: "Tune the model for better results."
- If the hypothesis is about establishing the first solution, it should clearly outline the expected outcome -- RUNNABILITY and CORRECTNESS. Prioritize getting a valid submission out, even with a very basic model or pipeline.
- *Good Example*: "Implement a simple RandomForest classifier with default parameters, using 5-fold cross-validation for model evaluation. This will lead to a decent baseline model that can run to completion and generate a valid submission file."
- *Good Example*: "Implement a simple RandomForest classifier with default parameters, using 3-fold cross-validation for model evaluation. This will lead to a decent baseline model that can run to completion and generate a valid submission file."
3. **Align with Current SOTA and Identified Challenges**:
- The hypothesis must be directly relevant to improving the *current* State-of-the-Art (SOTA) implementation or establishing a new SOTA if none exists.
- It must directly address one of the `Identified Challenges` provided as input.
@@ -280,7 +280,7 @@ task_gen:

Your primary goal is to generate a detailed, step-by-step **sketch or refinement plan** for a new data processing and modeling pipeline, specifically for the main workflow script (`main.py`), that effectively implements the `Proposed Hypothesis`. This sketch will guide a developer to write the code correctly.

### BACKGROUND CONTEXT: Pipeline Implementation Standards & Constraints ###
# BACKGROUND CONTEXT: Pipeline Implementation Standards & Constraints

The `main.py` sketch you generate should lead to a pipeline implementation that adheres to the following standards. These are guiding principles for the final *outcome* of your sketch:

@@ -309,15 +309,13 @@ task_gen:
- Prevent data leakage from test/validation sets into any training stage.
7. **Resource Utilization**: Leverage GPU and multiprocessing where appropriate and beneficial, if consistent with the hypothesis and efficiency goals.
8. **Metric Calculation and Storage (`scores.csv`)**:
- Calculate the official competition metric on a proper validation set (e.g., K-fold CV, typically 3-5 folds unless efficiency dictates fewer). Save results to `scores.csv`.
- Calculate the official competition metric on a proper validation set. Save results to `scores.csv`.
- The sketch must ensure this step is included. A successful run should always produce scores.
- `scores.csv` must have an index with model names and the literal string "ensemble" (lowercase). Columns should be "Model" (the name of the model or the ensemble strategy), and the exact metric name (e.g., "AUC").
- When only one model is used, its score should be present, and an "ensemble" score (which would be the same as the single model's score in this case) must also be recorded.
- Ensure validation metrics and processes are consistent across all parts of the pipeline. Avoid changes that would alter how validation metrics are calculated unless that is part of the hypothesis.
9. **Submission File (`submission.csv`)**: Generate `submission.csv` in the **exact format** required (column names, order, data types), as detailed by `sample_submission.csv` in the `Competition Scenario Description`. This is a critical step.
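The `scores.csv` contract in item 8 could be sketched as a small helper. This is a hypothetical illustration: the function name, the metric name "AUC", and the mean-of-scores ensemble fallback are assumptions, not code from this PR:

```python
import pandas as pd


def save_scores(model_scores: dict, metric: str, path: str = "scores.csv") -> pd.DataFrame:
    """Write per-model validation scores plus the required 'ensemble' row.

    The index holds model names and the literal lowercase string 'ensemble';
    the single data column is named after the exact competition metric.
    """
    scores = dict(model_scores)
    if "ensemble" not in scores:
        if len(scores) == 1:
            # Single model: the ensemble score equals that model's score.
            scores["ensemble"] = next(iter(scores.values()))
        else:
            # Placeholder ensemble (simple mean); a real pipeline would
            # score its actual ensemble strategy here.
            scores["ensemble"] = sum(scores.values()) / len(scores)
    df = pd.DataFrame({"Model": list(scores), metric: list(scores.values())})
    df = df.set_index("Model")
    df.to_csv(path)
    return df
```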

### END OF BACKGROUND CONTEXT ###

# Guidelines for Sketching the `main.py` Workflow

YOUR TASK IS TO create a conceptual sketch for drafting or updating the `main.py` workflow. This is a plan, not code.
@@ -354,6 +352,13 @@ task_gen:
- Confirm no `tqdm` or other progress bars are in the final script.
- Double-check that validation scores are saved correctly to `scores.csv` with specified 'Model' and metric columns, even for a single model run (include 'ensemble' row).

# Hyperparameters Specification
The workflow will be implemented in the following runtime environment:
{{ runtime_environment }}

Choose hyperparameters to ensure strong performance while meeting resource and time constraints. Specify values only when clearly justified by evidence or strong rationale.
{{ hyperparameter_spec }}

{% if task_output_format is not none %}
## [Partial Response Format 1] Task Output Format:
{{ task_output_format }}
3 changes: 3 additions & 0 deletions rdagent/scenarios/data_science/proposal/exp_gen/proposal.py
@@ -724,11 +724,14 @@ def task_gen(
component_info = get_component("Pipeline")
else:
component_info = get_component(hypotheses[0].component)
runtime_environment = self.scen.get_runtime_environment()
data_folder_info = self.scen.processed_data_folder_description
sys_prompt = T(".prompts_v2:task_gen.system").r(
task_output_format=component_info["task_output_format"] if not self.support_function_calling else None,
# task_output_format=component_info["task_output_format"],
component_desc=component_desc,
runtime_environment=runtime_environment,
hyperparameter_spec=T("scenarios.data_science.share:spec.hyperparameter").r(),
workflow_check=not pipeline and hypotheses[0].component != "Workflow",
)
user_prompt = T(".prompts_v2:task_gen.user").r(
18 changes: 17 additions & 1 deletion rdagent/scenarios/data_science/share.yaml
@@ -291,13 +291,14 @@ component_spec:
- Handle missing values and outliers appropriately (e.g., impute, remove, or replace).
- Ensure consistency between feature data types and transformations.
- Prevent data leakage: Do not use information derived from the test set when transforming training data.
- NEVER sample a subset of data, even when memory is insufficient or the time limit is exceeded.

6. Notes:
- GPU and multiprocessing are available and are encouraged to use for accelerating transformations.

7. Metric Calculation and Storage:
- Calculate the metric (mentioned in the evaluation section of the competition information) for each model and ensemble strategy on the validation set, and save the results in `scores.csv`
- The evaluation should be based on 5-fold cross-validation but only if that's an appropriate evaluation for the task at hand. Store the mean validation score of 5-fold cross-validation in `scores.csv` on each model.
- The evaluation should be based on k-fold cross-validation but only if that's an appropriate evaluation for the task at hand. Store the mean validation score of k-fold cross-validation in `scores.csv` on each model. Refer to the hyperparameter specification for rules to set the CV folds.
- Even if only one model is present, compute the ensemble score and store it under `"ensemble"`.
- The index of `scores.csv` should include the model name and the "ensemble" strategy. "ensemble" should be exactly in the index with all lower case letters. Ensemble is the result from several models. If only one model is present, the ensemble score should be the same as the model score.
- The column names in `scores.csv` should be:
@@ -312,3 +313,18 @@
guidelines:
coding: |-
You might receive exploratory data analysis (EDA) details about the source data. Do not use this EDA information to create assertions or raise errors. We might generate sample data for quick coding (so your code may run on sample data which is part of the full-size data), but remember that the EDA details are based on the full-size data.

spec:
hyperparameter: |-
1. Hyperparameters Requiring Tuning (e.g., learning rate, weight decay, optimizer, etc.)
- Adjust conservatively to avoid instability.
- Apply a systematic hyperparameter tuning strategy to identify optimal values.
2. Hyperparameters Dependent on Empirical Estimation or Past Failures (e.g., epochs, CV folds, batch size, etc.)
- Estimate these parameters based on the runtime environment constraints and experiences from previous experiment failures.
3. Balancing Epochs and CV Folds
- When runtime permits, prioritize increasing the number of training epochs, but always implement early stopping to prevent overfitting and ensure the process completes within the allowed runtime.
- When runtime is constrained, first reduce the number of CV folds (provided that validation reliability remains acceptable) before lowering the number of epochs.
4. Early Stopping Strategy
- A sufficient number of epochs has been completed.
- A sufficiently low loss has been reached.
- The validation loss has become sufficiently stable.
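The early-stopping criteria above could be sketched as a minimal helper class. The class name, thresholds, and patience mechanism are illustrative assumptions, not values from this PR:

```python
class EarlyStopper:
    """Stop once enough epochs have run AND the validation loss
    has stopped improving for `patience` consecutive epochs."""

    def __init__(self, min_epochs: int = 5, patience: int = 3, min_delta: float = 1e-4):
        self.min_epochs = min_epochs    # never stop before this many epochs
        self.patience = patience        # non-improving epochs tolerated
        self.min_delta = min_delta      # improvement smaller than this counts as "stable"
        self.best = float("inf")
        self.bad_epochs = 0
        self.epoch = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        self.epoch += 1
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.epoch >= self.min_epochs and self.bad_epochs >= self.patience
```

In a training loop this would be called once per epoch, e.g. `if stopper.step(val_loss): break`.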