Feat/242 aura sizing calculator #245
base: main
Conversation
- Make avg_properties_per_node and avg_properties_per_relationship required parameters
- Update tool, aura_manager, and service layer signatures
- Update all tests to include required property parameters
- Improve docstrings to emphasize required parameters for accurate sizing
- Fix field name references in test scenarios

This ensures users must provide property counts, preventing wildly inaccurate sizing calculations when property data is missing.
- Document calculate_database_sizing and forecast_database_size tools in README
- Emphasize that property counts are required for accurate sizing
- Update CHANGELOG with new features and breaking changes
…eters
- Add avg_properties_per_node and avg_properties_per_relationship to all test cases
- Update test_calculate_sizing_none_defaults to test optional params only
- Fixes 7 failing tests that were missing required parameters
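The commits above make property counts mandatory because properties typically dominate store size. A minimal sketch of why, using illustrative per-record byte sizes (the constants and function name here are hypothetical, not the PR's actual implementation):

```python
# Hypothetical sketch: why omitting property counts badly underestimates size.
# Record sizes below are illustrative approximations only.
NODE_BYTES = 15   # fixed-size node record (approximate)
REL_BYTES = 34    # fixed-size relationship record (approximate)
PROP_BYTES = 41   # per-property record (approximate)

def estimate_store_size_gb(
    node_count: int,
    relationship_count: int,
    avg_properties_per_node: float,
    avg_properties_per_relationship: float,
) -> float:
    """Rough store size; treating missing property averages as 0 drops
    the largest term of the total."""
    prop_count = (node_count * avg_properties_per_node
                  + relationship_count * avg_properties_per_relationship)
    total_bytes = (node_count * NODE_BYTES
                   + relationship_count * REL_BYTES
                   + prop_count * PROP_BYTES)
    return total_bytes / 1024**3

with_props = estimate_store_size_gb(10_000_000, 50_000_000, 5, 1)
without_props = estimate_store_size_gb(10_000_000, 50_000_000, 0, 0)
print(round(with_props, 2), round(without_props, 2))
```

With these illustrative numbers the property-aware estimate is more than three times the property-free one, which is the "wildly inaccurate" failure mode the commit message describes.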
```python
class ExponentialWithVectorGrowthModel(GrowthModel):
    """Exponential growth with additional vector index growth

    Good for agentic/RAG workloads where vector indexes grow separately.
```
Assumption is that total Chunk size + vector will be much larger than total Entity size.
Reference this page for calculating vector size requirements:
https://neo4j.com/docs/operations-manual/current/performance/vector-index-memory-configuration/#_example_calculations
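The linked page gives Neo4j's full vector index memory formula. A simplified sketch of just the raw-vector term (float32 components; the real formula adds HNSW graph overhead on top, and this helper name is hypothetical):

```python
def estimate_vector_index_gb(num_vectors: int, dimensions: int,
                             bytes_per_component: int = 4) -> float:
    """Raw vector storage only: count x dimensions x 4 bytes (float32).
    The Neo4j docs formula adds index-structure overhead beyond this."""
    return num_vectors * dimensions * bytes_per_component / 1024**3

# e.g. one million chunks embedded at 1536 dimensions
print(round(estimate_vector_index_gb(1_000_000, 1536), 2))
```

Even this lower bound shows why the chunk-plus-vector total can dwarf entity storage for RAG workloads.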
```python
    return size


def get_growth_model_for_workloads(workloads: List[WorkloadType]) -> GrowthModel:
```
Do we have additional input on these growth models? They seem like good estimates, but has Dave F or anyone else looked at them?
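For reference while reviewing the estimates, the shape of the model under discussion can be sketched as a split exponential: the vector portion of the database grows at its own rate. This is a sketch under stated assumptions, not the PR's code — the 0.7 split is the PR's new default vector_proportion, but the growth rates here are illustrative:

```python
def project_size_gb(base_size_gb: float, years: int,
                    annual_growth_rate: float = 0.15,
                    vector_proportion: float = 0.7,
                    vector_growth_rate: float = 0.30) -> float:
    """Grow the vector and non-vector portions of the store independently,
    each compounding annually. Rates are illustrative placeholders."""
    vector = base_size_gb * vector_proportion
    other = base_size_gb * (1 - vector_proportion)
    return (vector * (1 + vector_growth_rate) ** years
            + other * (1 + annual_growth_rate) ** years)

# a 10 GB database projected three years out
print(round(project_size_gb(10, 3), 2))
```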
```python
if TYPE_CHECKING:
    from .models import Neo4jSizingCalculationResult
```
What is this for?
bug -> I switched to a direct import.
```python
    "Neo4jSizingCalculator",
    "SizingCalculator",  # Backward compatibility alias
```
This is the first release of the feature, so I don't think we need this backward compatibility alias.
removed
```python
class SizingCalculations(BaseModel):
    """Detailed sizing calculations."""

    size_of_nodes_gb: float = Field(..., description="Size of nodes in GB")
    size_of_relationships_gb: float = Field(..., description="Size of relationships in GB")
    size_of_properties_gb: float = Field(..., description="Size of properties in GB")
    size_of_vector_indexes_gb: float = Field(..., description="Size of vector indexes in GB")
    total_size_without_indexes_gb: float = Field(..., description="Total size without indexes in GB")
    size_of_non_vector_indexes_gb: float = Field(..., description="Size of non-vector indexes in GB")
    total_size_with_indexes_gb: float = Field(..., description="Total size with all indexes in GB")
    recommended_memory_gb: int = Field(..., description="Recommended memory in GB (defaults to 1:1 ratio with storage)")
    recommended_vcpus: int = Field(..., description="Recommended vCPUs (defaults to 1, or 2 vCPU per concurrent_end_users if provided)")
```
do we need this if we have Neo4jSizingCalculationResult?
removed
```
**IMPORTANT**: Property counts are required for accurate sizing. If the user provides incomplete
information, ask about properties before calling this tool, as missing property data leads to
wildly inaccurate results.
```
redundant ^^
```
**Graph Domains (7 Graphs of the Enterprise):**
- customer: Customer 360, interactions → Default workloads: transactional, analytical
- product: Product catalogs, recommendations → Default workloads: analytical
- employee: Org charts, skills → Default workloads: analytical
- supplier: Supply chain, dependencies → Default workloads: analytical
- transaction: Fraud detection, payments → Default workloads: transactional
- process: Workflows, dependencies → Default workloads: analytical
- security: Access control, threats → Default workloads: transactional, analytical

**Workload Types** (affect growth speed):
- transactional: Fast growth (high write volume, real-time)
- agentic: Fastest growth (RAG, vector search, AI/ML)
- analytical: Moderate growth (reporting, BI)
- graph_data_science: Slowest growth (algorithms, batch processing)
```
this is redundant - stated in Field objects
```python
    return ForecastResult(**result_dict)


@mcp.prompt(title="Calculate Database Sizing")
def calculate_database_sizing_prompt(
```
I think an alternative approach here is to provide fields in the prompt for each of the parameters we need from the user, then inject those into the prompt sent to the agent along with instructions on how to efficiently call tools. Currently we have just a single field for the user to describe the deployment they want, without much clarity about what information is needed.
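The suggested refactor could look roughly like the following. In the real module this would be registered with the `@mcp.prompt(title="Calculate Database Sizing")` decorator shown in the diff; the standalone builder function, its name, and the exact field set are assumptions for illustration:

```python
from typing import Optional

def build_sizing_prompt(
    node_count: Optional[int] = None,
    relationship_count: Optional[int] = None,
    avg_properties_per_node: Optional[float] = None,
    avg_properties_per_relationship: Optional[float] = None,
) -> str:
    """Build the agent prompt from structured fields, instructing the
    agent to collect only the parameters that are still missing."""
    params = {
        "node_count": node_count,
        "relationship_count": relationship_count,
        "avg_properties_per_node": avg_properties_per_node,
        "avg_properties_per_relationship": avg_properties_per_relationship,
    }
    provided = [f"{k}={v}" for k, v in params.items() if v is not None]
    missing = [k for k, v in params.items() if v is None]
    prompt = "Call calculate_database_sizing with: " + ", ".join(provided)
    if missing:
        prompt += ". First ask the user for: " + ", ".join(missing)
    return prompt

print(build_sizing_prompt(node_count=1_000_000))
```

Compared with one free-text description field, this gives the agent an explicit checklist, so it can ask targeted follow-up questions instead of guessing at missing property counts.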
done
```python
    return prompt


@mcp.prompt(title="Forecast Database Size")
def forecast_database_size_prompt(
```
see above ^^
```python
from typing import Dict, Any


def get_calculator_parameter_info() -> Dict[str, Any]:
```
I don't think we need this function if we refactor the prompts as described above.
…and standardize types
- Update ExponentialWithVectorGrowthModel to reflect Neo4j vector index formula and assumption that chunk size + vector >> entity size for RAG workloads
  - Increase default vector_proportion from 0.3 to 0.7 (70%)
  - Add documentation referencing Neo4j vector index size calculation formula
  - Simplify implementation to single code path
- Remove duplicate SizingCalculations model, use Neo4jSizingCalculationResult directly
  - Eliminates unnecessary conversion step in service layer
  - Simplifies codebase by removing redundant model
- Remove unnecessary backward compatibility alias
  - Remove 'SizingCalculator' alias from __init__.py as this is first release
- Improve type hints and validation
  - Change memory_to_storage_ratio from float to int (only accepts 1, 2, 4, 8)
  - Update default values from 1.0 to 1
  - Add clear documentation that ValueError is raised for invalid values
  - Update all type hints across service, calculator, projector, and API layers
- Remove unnecessary validation check
  - Remove annual_growth_rate > 1000 check (unhelpful error message)
- Simplify imports
  - Use direct import for Neo4jSizingCalculationResult instead of TYPE_CHECKING (no circular import exists)
- Refactor calculate_database_sizing_prompt to use individual Optional fields instead of single graph_description field for better structured data collection
- Refactor forecast_database_size_prompt to use individual Optional fields for each parameter (base_size_gb, base_memory_gb, base_cores, etc.)
- Remove redundant parameter descriptions from tool docstrings (info already in Field objects)
- Remove redundant graph domain and workload type descriptions from docstrings

Sizing Calculator Improvements:
- Add OS floor enforcement (2GB minimum) for recommended_memory_gb in calculator
- Update memory_to_storage_ratio to use integers only (1, 2, 4, 8) instead of floats
- Update type hints and default values for memory_to_storage_ratio across codebase

Test Improvements:
- Add comprehensive smoke tests for calculate_database_sizing_prompt
- Add comprehensive smoke tests for forecast_database_size_prompt
- Reorganize prompt tests into dedicated TestPrompts class
- Remove duplicate test_pause_instance and test_update_instance_name tests
- Update test_sizing_calculator to use integer values for memory_to_storage_ratio
- Fix test assertions to account for OS memory floor (2GB minimum)

Code Quality:
- Fix import and type hint issues in interfaces.py
- Add eval scripts and design proposal to .gitignore
- Implement component-based growth models where storage, memory, and vcpu can have independent growth patterns
- Add smart default growth rates based on domain/workload (transactional: 20%, agentic: 15%, analytical: 5%)
- Implement dynamic core scaling based on workload type and storage growth
- Make domain parameter required in forecast_sizing (primary driver for growth model selection)
- Add domain-based workload inference when explicit workloads not provided
- Update projector to use component-based models and apply memory_to_storage_ratio as constraint
- Refactor prompts to use individual Optional fields for better agent interaction
- Remove deprecated growth model selection functions
- Add comprehensive unit tests for get_default_growth_rate and smart defaults
- Update all existing tests to account for new component-based behavior
- Add smoke test files to .gitignore

All 191 unit tests passing.
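The smart-default idea from this commit can be sketched as a workload-to-rate lookup. The 20%/15%/5% rates come from the commit message itself; the graph_data_science rate and the max-rate selection policy are assumptions (the source only describes it as "slowest growth"):

```python
# Illustrative sketch of get_default_growth_rate; only the first three
# rates are stated in the commit message.
DEFAULT_GROWTH_RATES = {
    "transactional": 0.20,
    "agentic": 0.15,
    "analytical": 0.05,
    "graph_data_science": 0.03,  # assumed; source only says "slowest"
}

def get_default_growth_rate(workloads: list[str]) -> float:
    """Size for the most aggressive workload in the mix; fall back to the
    analytical rate when no workloads are given."""
    if not workloads:
        return DEFAULT_GROWTH_RATES["analytical"]
    return max(DEFAULT_GROWTH_RATES.get(w, 0.05) for w in workloads)

print(get_default_growth_rate(["analytical", "transactional"]))
```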
Update integration tests to include the required domain parameter in forecast_database_size calls.
Description
Tool Descriptions
1. `calculate_database_sizing`
Purpose: Calculates current Neo4j database storage requirements based on graph characteristics.
What it does:
Key Features:
Use Case: Size a database for a given graph structure before provisioning.
2. `forecast_database_size`
Purpose: Projects database growth over multiple years using workload-based growth models.
What it does:
Key Features:
Use Case: Plan capacity over 3-5 years for budgeting and scaling decisions.
Relationship Between Tools
These tools work together in a typical workflow:
1. `calculate_database_sizing` to determine current requirements
2. `forecast_database_size`

Type of Change
Complexity:
How Has This Been Tested?
This feature has been smoke tested. I have a number of scenarios running and plan to review them with Dave Fauth.
Checklist
The following requirements should have been met (depending on the changes in the branch):