
Conversation

@rfrneo4j (Collaborator)

Description

Tool Descriptions

1. calculate_database_sizing

Purpose: Calculates current Neo4j database storage requirements based on graph characteristics.

What it does:

  • Computes storage breakdown for nodes, relationships, properties, and indexes
  • Supports vector indexes with quantization (4x storage reduction)
  • Accounts for large properties (128+ bytes)
  • Recommends memory (configurable ratio) and vCPUs (based on concurrent users)
  • Applies a 2GB OS floor to the recommended memory

Key Features:

  • Required inputs: Node count, relationship count, average properties per node/relationship
  • Optional: Vector dimensions, large properties, memory-to-storage ratio, concurrent users
  • Returns: Detailed storage breakdown (nodes, relationships, properties, indexes, vector indexes), total size, recommended memory, and vCPUs

Use Case: Size a database for a given graph structure before provisioning.
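For illustration only, a minimal sketch of a call using the parameter names that appear elsewhere in this PR (avg_properties_per_node, avg_properties_per_relationship, memory_to_storage_ratio, concurrent_end_users); the count and vector parameter names are assumptions:

result = calculate_database_sizing(
    node_count=10_000_000,              # assumed name for the node count input
    relationship_count=50_000_000,      # assumed name for the relationship count input
    avg_properties_per_node=5,          # required
    avg_properties_per_relationship=2,  # required
    vector_dimensions=1536,             # assumed name; optional, enables vector index sizing
    memory_to_storage_ratio=1,          # optional; only 1, 2, 4, or 8 accepted
    concurrent_end_users=25,            # optional; drives the vCPU recommendation
)
# The result carries the per-component storage breakdown, total size with and
# without indexes, recommended memory (2GB OS floor), and recommended vCPUs.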


2. forecast_database_size

Purpose: Projects database growth over multiple years using workload-based growth models.

What it does:

  • Takes current size, memory, and cores as baseline
  • Applies growth models based on workload type (transactional, agentic, analytical, graph data science)
  • Projects size, memory, and cores for each year
  • Supports domain-based defaults (e.g., "customer" graph defaults to transactional + analytical)
  • Scales memory with projected size (configurable ratio)
  • Keeps cores static (as per requirements)

Key Features:

  • Growth models: Compound, Linear, Log-Linear, Logistic, ExponentialWithVector
  • Workload-aware: Automatically selects growth model based on workload types
  • Domain support: "7 Graphs of the Enterprise" with inferred workloads
  • Configurable: Annual growth rate, projection years, memory-to-storage ratio

Use Case: Plan capacity over 3-5 years for budgeting and scaling decisions.
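To make the difference between the listed growth models concrete, here is a hedged sketch of compound versus linear projection; the actual implementations in this PR may differ:

def compound_growth(base_gb: float, rate: float, year: int) -> float:
    """Compound: size multiplies by (1 + rate) every year."""
    return base_gb * (1 + rate) ** year

def linear_growth(base_gb: float, rate: float, year: int) -> float:
    """Linear: size gains a fixed fraction of the base every year."""
    return base_gb * (1 + rate * year)

# 100 GB at 20% annual growth, after 3 years:
# compound -> 172.8 GB, linear -> 160.0 GB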


Relationship Between Tools

These tools work together in a typical workflow:

  1. Use calculate_database_sizing to determine current requirements
  2. Use the output (size, memory, cores) as input to forecast_database_size
  3. Get multi-year projections for planning
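A sketch of that workflow in code. base_size_gb, base_memory_gb, base_cores, the result field names, and the required domain parameter all appear in this PR; projection_years and the count parameter names are assumptions:

sizing = calculate_database_sizing(
    node_count=10_000_000,         # assumed parameter name
    relationship_count=50_000_000,  # assumed parameter name
    avg_properties_per_node=5,
    avg_properties_per_relationship=2,
)

forecast = forecast_database_size(
    base_size_gb=sizing.total_size_with_indexes_gb,
    base_memory_gb=sizing.recommended_memory_gb,
    base_cores=sizing.recommended_vcpus,
    domain="customer",   # required; infers transactional + analytical workloads
    projection_years=5,  # assumed parameter name
)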

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

  • LOW
  • MEDIUM
  • HIGH


How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual tests

This feature has been smoke tested. I have a number of scenarios I am running, and I plan to review them with Dave Fauth.

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • Integration tests have been updated
  • Server has been tested in an MCP application
  • CHANGELOG.md updated if appropriate

- Make avg_properties_per_node and avg_properties_per_relationship required parameters
- Update tool, aura_manager, and service layer signatures
- Update all tests to include required property parameters
- Improve docstrings to emphasize required parameters for accurate sizing
- Fix field name references in test scenarios

This ensures users must provide property counts, preventing wildly inaccurate
sizing calculations when property data is missing.
- Document calculate_database_sizing and forecast_database_size tools in README
- Emphasize that property counts are required for accurate sizing
- Update CHANGELOG with new features and breaking changes
rfrneo4j requested a review from a-s-g93 on December 13, 2025 at 03:26
…eters

- Add avg_properties_per_node and avg_properties_per_relationship to all test cases
- Update test_calculate_sizing_none_defaults to test optional params only
- Fixes 7 failing tests that were missing required parameters
rfrneo4j marked this pull request as draft on December 13, 2025 at 03:35
class ExponentialWithVectorGrowthModel(GrowthModel):
    """Exponential growth with additional vector index growth.
    Good for agentic/RAG workloads where vector indexes grow separately.
Collaborator:

Assumption is that total Chunk size + vector will be much larger than total Entity size.
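A hedged sketch of growth under that assumption, using the common float32 approximation for vector index size (vectors × dimensions × 4 bytes); the multipliers here are illustrative, not the PR's actual constants:

def vector_index_gb(num_vectors: int, dimensions: int) -> float:
    """Approximate float32 vector index size: 4 bytes per dimension per vector."""
    return num_vectors * dimensions * 4 / 1e9

def exponential_with_vector(base_gb: float, rate: float, year: int,
                            vector_proportion: float = 0.7) -> float:
    """Grow the vector-dominated share (chunks + embeddings) faster than entities."""
    vector_share = base_gb * vector_proportion * (1 + rate * 1.5) ** year  # 1.5x is an assumption
    entity_share = base_gb * (1 - vector_proportion) * (1 + rate) ** year
    return vector_share + entity_share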


    return size


def get_growth_model_for_workloads(workloads: List[WorkloadType]) -> GrowthModel:
Collaborator:

Do we have additional input on these growth models? They seem like good estimates, but has Dave F or anyone else looked at them?

Comment on lines 9 to 10
if TYPE_CHECKING:
    from .models import Neo4jSizingCalculationResult
Collaborator:

What is this for?

Collaborator Author:

Bug: I switched to a direct import.
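For context, a TYPE_CHECKING import exists only for static type checkers and is absent at runtime; since there is no circular import here, the direct form is the simpler fix:

# Before: name is only available during static type checking
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .models import Neo4jSizingCalculationResult

# After: plain runtime import (no circular dependency to work around)
from .models import Neo4jSizingCalculationResult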

Comment on lines 21 to 22
"Neo4jSizingCalculator",
"SizingCalculator", # Backward compatibility alias
Collaborator:

This is the first release of the feature, so I don't think we need this backward compatibility alias.

Collaborator Author:

removed

Comment on lines 35 to 45
class SizingCalculations(BaseModel):
    """Detailed sizing calculations."""
    size_of_nodes_gb: float = Field(..., description="Size of nodes in GB")
    size_of_relationships_gb: float = Field(..., description="Size of relationships in GB")
    size_of_properties_gb: float = Field(..., description="Size of properties in GB")
    size_of_vector_indexes_gb: float = Field(..., description="Size of vector indexes in GB")
    total_size_without_indexes_gb: float = Field(..., description="Total size without indexes in GB")
    size_of_non_vector_indexes_gb: float = Field(..., description="Size of non-vector indexes in GB")
    total_size_with_indexes_gb: float = Field(..., description="Total size with all indexes in GB")
    recommended_memory_gb: int = Field(..., description="Recommended memory in GB (defaults to 1:1 ratio with storage)")
    recommended_vcpus: int = Field(..., description="Recommended vCPUs (defaults to 1, or 2 vCPU per concurrent_end_users if provided)")
Collaborator:

do we need this if we have Neo4jSizingCalculationResult?

Collaborator Author:

removed

Comment on lines 273 to 275
**IMPORTANT**: Property counts are required for accurate sizing. If the user provides incomplete
information, ask about properties before calling this tool, as missing property data leads to
wildly inaccurate results.
Collaborator:

redundant ^^

Comment on lines 351 to 364
**Graph Domains (7 Graphs of the Enterprise):**
- customer: Customer 360, interactions → Default workloads: transactional, analytical
- product: Product catalogs, recommendations → Default workloads: analytical
- employee: Org charts, skills → Default workloads: analytical
- supplier: Supply chain, dependencies → Default workloads: analytical
- transaction: Fraud detection, payments → Default workloads: transactional
- process: Workflows, dependencies → Default workloads: analytical
- security: Access control, threats → Default workloads: transactional, analytical
**Workload Types** (affect growth speed):
- transactional: Fast growth (high write volume, real-time)
- agentic: Fastest growth (RAG, vector search, AI/ML)
- analytical: Moderate growth (reporting, BI)
- graph_data_science: Slowest growth (algorithms, batch processing)
Collaborator:

this is redundant - stated in Field objects

    return ForecastResult(**result_dict)

@mcp.prompt(title="Calculate Database Sizing")
def calculate_database_sizing_prompt(
Collaborator:

I think an alternative approach here is to provide fields in the prompt for each of the parameters we need from the user, then inject those into the prompt sent to the agent with instructions on how to call the tools efficiently. Currently it looks like we have just a single field for the user to describe the deployment they want, without much clarity.

Collaborator Author:

done

    return prompt

@mcp.prompt(title="Forecast Database Size")
def forecast_database_size_prompt(
Collaborator:

see above ^^

from typing import Dict, Any


def get_calculator_parameter_info() -> Dict[str, Any]:
Collaborator:

I don't think we need this function if we refactor the prompts as described above.

…and standardize types

- Update ExponentialWithVectorGrowthModel to reflect Neo4j vector index formula
  and assumption that chunk size + vector >> entity size for RAG workloads
  - Increase default vector_proportion from 0.3 to 0.7 (70%)
  - Add documentation referencing Neo4j vector index size calculation formula
  - Simplify implementation to single code path

- Remove duplicate SizingCalculations model, use Neo4jSizingCalculationResult directly
  - Eliminates unnecessary conversion step in service layer
  - Simplifies codebase by removing redundant model

- Remove unnecessary backward compatibility alias
  - Remove 'SizingCalculator' alias from __init__.py as this is first release

- Improve type hints and validation
  - Change memory_to_storage_ratio from float to int (only accepts 1, 2, 4, 8)
  - Update default values from 1.0 to 1
  - Add clear documentation that ValueError is raised for invalid values
  - Update all type hints across service, calculator, projector, and API layers

- Remove unnecessary validation check
  - Remove annual_growth_rate > 1000 check (unhelpful error message)

- Simplify imports
  - Use direct import for Neo4jSizingCalculationResult instead of TYPE_CHECKING
  (no circular import exists)
- Refactor calculate_database_sizing_prompt to use individual Optional fields
  instead of single graph_description field for better structured data collection
- Refactor forecast_database_size_prompt to use individual Optional fields
  for each parameter (base_size_gb, base_memory_gb, base_cores, etc.)
- Remove redundant parameter descriptions from tool docstrings (info already in Field objects)
- Remove redundant graph domain and workload type descriptions from docstrings
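A minimal sketch of the memory_to_storage_ratio validation described above; the function name and error wording are assumptions:

VALID_MEMORY_TO_STORAGE_RATIOS = (1, 2, 4, 8)

def validate_memory_to_storage_ratio(ratio: int) -> int:
    """Reject any ratio other than 1, 2, 4, or 8 with a ValueError."""
    if ratio not in VALID_MEMORY_TO_STORAGE_RATIOS:
        raise ValueError(
            f"memory_to_storage_ratio must be one of "
            f"{VALID_MEMORY_TO_STORAGE_RATIOS}, got {ratio}"
        )
    return ratio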

Sizing Calculator Improvements:
- Add OS floor enforcement (2GB minimum) for recommended_memory_gb in calculator
- Update memory_to_storage_ratio to use integers only (1, 2, 4, 8) instead of floats
- Update type hints and default values for memory_to_storage_ratio across codebase

Test Improvements:
- Add comprehensive smoke tests for calculate_database_sizing_prompt
- Add comprehensive smoke tests for forecast_database_size_prompt
- Reorganize prompt tests into dedicated TestPrompts class
- Remove duplicate test_pause_instance and test_update_instance_name tests
- Update test_sizing_calculator to use integer values for memory_to_storage_ratio
- Fix test assertions to account for OS memory floor (2GB minimum)

Code Quality:
- Fix import and type hint issues in interfaces.py
- Add eval scripts and design proposal to .gitignore
- Implement component-based growth models where storage, memory, and vcpu can have independent growth patterns
- Add smart default growth rates based on domain/workload (transactional: 20%, agentic: 15%, analytical: 5%)
- Implement dynamic core scaling based on workload type and storage growth
- Make domain parameter required in forecast_sizing (primary driver for growth model selection)
- Add domain-based workload inference when explicit workloads not provided
- Update projector to use component-based models and apply memory_to_storage_ratio as constraint
- Refactor prompts to use individual Optional fields for better agent interaction
- Remove deprecated growth model selection functions
- Add comprehensive unit tests for get_default_growth_rate and smart defaults
- Update all existing tests to account for new component-based behavior
- Add smoke test files to .gitignore

All 191 unit tests passing.
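A sketch of how the smart defaults above might be wired up. get_default_growth_rate is named in this PR; the selection logic and the graph_data_science rate are assumptions (it is described only as the slowest-growing workload):

DEFAULT_GROWTH_RATES = {
    "transactional": 0.20,       # 20%, stated above
    "agentic": 0.15,             # 15%, stated above
    "analytical": 0.05,          # 5%, stated above
    "graph_data_science": 0.03,  # assumed; only described as slowest
}

def get_default_growth_rate(workloads: list[str]) -> float:
    """Default to the highest rate among the requested workloads (assumed logic)."""
    return max(DEFAULT_GROWTH_RATES.get(w, 0.05) for w in workloads)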
Update integration tests to include the required domain parameter in forecast_database_size calls.
rfrneo4j marked this pull request as ready for review on January 12, 2026 at 15:09