
Conversation

@rfrneo4j (Collaborator)

Description

Tool Descriptions

1. calculate_database_sizing

Purpose: Calculates current Neo4j database storage requirements based on graph characteristics.

What it does:

  • Computes storage breakdown for nodes, relationships, properties, and indexes
  • Supports vector indexes with quantization (4x storage reduction)
  • Accounts for large properties (128+ bytes)
  • Recommends memory (configurable ratio) and vCPUs (based on concurrent users)
  • Applies a 2GB OS floor to the recommended memory

Key Features:

  • Required inputs: Node count, relationship count, average properties per node/relationship
  • Optional: Vector dimensions, large properties, memory-to-storage ratio, concurrent users
  • Returns: Detailed storage breakdown (nodes, relationships, properties, indexes, vector indexes), total size, recommended memory, and vCPUs

Use Case: Size a database for a given graph structure before provisioning.
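For illustration only, a minimal sketch of a call using the parameter names that appear elsewhere in this PR (avg_properties_per_node, avg_properties_per_relationship, memory_to_storage_ratio, concurrent_end_users); the count and vector parameter names are assumptions:

result = calculate_database_sizing(
    node_count=10_000_000,              # assumed name for the node count input
    relationship_count=50_000_000,      # assumed name for the relationship count input
    avg_properties_per_node=5,          # required
    avg_properties_per_relationship=2,  # required
    vector_dimensions=1536,             # assumed name; optional, enables vector index sizing
    memory_to_storage_ratio=1,          # optional; only 1, 2, 4, or 8 accepted
    concurrent_end_users=25,            # optional; drives the vCPU recommendation
)
# The result carries the per-component storage breakdown, total size with and
# without indexes, recommended memory (2GB OS floor), and recommended vCPUs.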


2. forecast_database_size

Purpose: Projects database growth over multiple years using workload-based growth models.

What it does:

  • Takes current size, memory, and cores as baseline
  • Applies growth models based on workload type (transactional, agentic, analytical, graph data science)
  • Projects size, memory, and cores for each year
  • Supports domain-based defaults (e.g., "customer" graph defaults to transactional + analytical)
  • Scales memory with projected size (configurable ratio)
  • Keeps cores static (as per requirements)

Key Features:

  • Growth models: Compound, Linear, Log-Linear, Logistic, ExponentialWithVector
  • Workload-aware: Automatically selects growth model based on workload types
  • Domain support: "7 Graphs of the Enterprise" with inferred workloads
  • Configurable: Annual growth rate, projection years, memory-to-storage ratio

Use Case: Plan capacity over 3-5 years for budgeting and scaling decisions.
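To make the difference between the listed growth models concrete, here is a hedged sketch of compound versus linear projection; the actual implementations in this PR may differ:

def compound_growth(base_gb: float, rate: float, year: int) -> float:
    """Compound: size multiplies by (1 + rate) every year."""
    return base_gb * (1 + rate) ** year

def linear_growth(base_gb: float, rate: float, year: int) -> float:
    """Linear: size gains a fixed fraction of the base every year."""
    return base_gb * (1 + rate * year)

# 100 GB at 20% annual growth, after 3 years:
# compound -> 172.8 GB, linear -> 160.0 GB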


Relationship Between Tools

These tools work together in a typical workflow:

  1. Use calculate_database_sizing to determine current requirements
  2. Use the output (size, memory, cores) as input to forecast_database_size
  3. Get multi-year projections for planning
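A sketch of that workflow in code. base_size_gb, base_memory_gb, base_cores, the result field names, and the required domain parameter all appear in this PR; projection_years and the count parameter names are assumptions:

sizing = calculate_database_sizing(
    node_count=10_000_000,         # assumed parameter name
    relationship_count=50_000_000,  # assumed parameter name
    avg_properties_per_node=5,
    avg_properties_per_relationship=2,
)

forecast = forecast_database_size(
    base_size_gb=sizing.total_size_with_indexes_gb,
    base_memory_gb=sizing.recommended_memory_gb,
    base_cores=sizing.recommended_vcpus,
    domain="customer",   # required; infers transactional + analytical workloads
    projection_years=5,  # assumed parameter name
)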

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

  • LOW
  • MEDIUM
  • HIGH


How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual tests

This feature has been smoke tested. I have a number of scenarios I am running, and I plan to review them with Dave Fauth.

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • Integration tests have been updated
  • Server has been tested in an MCP application
  • CHANGELOG.md updated if appropriate

- Make avg_properties_per_node and avg_properties_per_relationship required parameters
- Update tool, aura_manager, and service layer signatures
- Update all tests to include required property parameters
- Improve docstrings to emphasize required parameters for accurate sizing
- Fix field name references in test scenarios

This ensures users must provide property counts, preventing wildly inaccurate
sizing calculations when property data is missing.
- Document calculate_database_sizing and forecast_database_size tools in README
- Emphasize that property counts are required for accurate sizing
- Update CHANGELOG with new features and breaking changes
rfrneo4j requested a review from a-s-g93 on December 13, 2025 at 03:26
…eters

- Add avg_properties_per_node and avg_properties_per_relationship to all test cases
- Update test_calculate_sizing_none_defaults to test optional params only
- Fixes 7 failing tests that were missing required parameters
rfrneo4j marked this pull request as draft on December 13, 2025 at 03:35
class ExponentialWithVectorGrowthModel(GrowthModel):
    """Exponential growth with additional vector index growth.
    Good for agentic/RAG workloads where vector indexes grow separately.
Collaborator:

Assumption is that total Chunk size + vector will be much larger than total Entity size.
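A hedged sketch of growth under that assumption, using the common float32 approximation for vector index size (vectors × dimensions × 4 bytes); the multipliers here are illustrative, not the PR's actual constants:

def vector_index_gb(num_vectors: int, dimensions: int) -> float:
    """Approximate float32 vector index size: 4 bytes per dimension per vector."""
    return num_vectors * dimensions * 4 / 1e9

def exponential_with_vector(base_gb: float, rate: float, year: int,
                            vector_proportion: float = 0.7) -> float:
    """Grow the vector-dominated share (chunks + embeddings) faster than entities."""
    vector_share = base_gb * vector_proportion * (1 + rate * 1.5) ** year  # 1.5x is an assumption
    entity_share = base_gb * (1 - vector_proportion) * (1 + rate) ** year
    return vector_share + entity_share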


    return size


def get_growth_model_for_workloads(workloads: List[WorkloadType]) -> GrowthModel:
Collaborator:

Do we have additional input on these growth models? They seem like good estimates, but has Dave F or anyone else looked at them?

Comment on lines 9 to 10
if TYPE_CHECKING:
    from .models import Neo4jSizingCalculationResult
Collaborator:

What is this for?

Collaborator Author:

Bug: I switched to a direct import.
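For context, a TYPE_CHECKING import exists only for static type checkers and is absent at runtime; since there is no circular import here, the direct form is the simpler fix:

# Before: name is only available during static type checking
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .models import Neo4jSizingCalculationResult

# After: plain runtime import (no circular dependency to work around)
from .models import Neo4jSizingCalculationResult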

Comment on lines 21 to 22
"Neo4jSizingCalculator",
"SizingCalculator", # Backward compatibility alias
Collaborator:

This is the first release of the feature, so I don't think we need this backward compatibility alias.

Collaborator Author:

removed

Comment on lines 35 to 45
class SizingCalculations(BaseModel):
    """Detailed sizing calculations."""
    size_of_nodes_gb: float = Field(..., description="Size of nodes in GB")
    size_of_relationships_gb: float = Field(..., description="Size of relationships in GB")
    size_of_properties_gb: float = Field(..., description="Size of properties in GB")
    size_of_vector_indexes_gb: float = Field(..., description="Size of vector indexes in GB")
    total_size_without_indexes_gb: float = Field(..., description="Total size without indexes in GB")
    size_of_non_vector_indexes_gb: float = Field(..., description="Size of non-vector indexes in GB")
    total_size_with_indexes_gb: float = Field(..., description="Total size with all indexes in GB")
    recommended_memory_gb: int = Field(..., description="Recommended memory in GB (defaults to 1:1 ratio with storage)")
    recommended_vcpus: int = Field(..., description="Recommended vCPUs (defaults to 1, or 2 vCPU per concurrent_end_users if provided)")
Collaborator:

do we need this if we have Neo4jSizingCalculationResult?

Collaborator Author:

removed

Comment on lines 273 to 275
**IMPORTANT**: Property counts are required for accurate sizing. If the user provides incomplete
information, ask about properties before calling this tool, as missing property data leads to
wildly inaccurate results.
Collaborator:

redundant ^^

Comment on lines 351 to 364
**Graph Domains (7 Graphs of the Enterprise):**
- customer: Customer 360, interactions → Default workloads: transactional, analytical
- product: Product catalogs, recommendations → Default workloads: analytical
- employee: Org charts, skills → Default workloads: analytical
- supplier: Supply chain, dependencies → Default workloads: analytical
- transaction: Fraud detection, payments → Default workloads: transactional
- process: Workflows, dependencies → Default workloads: analytical
- security: Access control, threats → Default workloads: transactional, analytical
**Workload Types** (affect growth speed):
- transactional: Fast growth (high write volume, real-time)
- agentic: Fastest growth (RAG, vector search, AI/ML)
- analytical: Moderate growth (reporting, BI)
- graph_data_science: Slowest growth (algorithms, batch processing)
Collaborator:

this is redundant - stated in Field objects

    return ForecastResult(**result_dict)

@mcp.prompt(title="Calculate Database Sizing")
def calculate_database_sizing_prompt(
Collaborator:

I think an alternative approach here is to provide fields in the prompt for each of the parameters we need from the user, then inject those into the prompt sent to the agent with instructions on how to call the tools efficiently. Currently it looks like we have just a single field for the user to describe the deployment they want, without much clarity.

Collaborator Author:

done

    return prompt

@mcp.prompt(title="Forecast Database Size")
def forecast_database_size_prompt(
Collaborator:

see above ^^

from typing import Dict, Any


def get_calculator_parameter_info() -> Dict[str, Any]:
Collaborator:

I don't think we need this function if we refactor the prompts as described above.

…and standardize types

- Update ExponentialWithVectorGrowthModel to reflect Neo4j vector index formula
  and assumption that chunk size + vector >> entity size for RAG workloads
  - Increase default vector_proportion from 0.3 to 0.7 (70%)
  - Add documentation referencing Neo4j vector index size calculation formula
  - Simplify implementation to single code path

- Remove duplicate SizingCalculations model, use Neo4jSizingCalculationResult directly
  - Eliminates unnecessary conversion step in service layer
  - Simplifies codebase by removing redundant model

- Remove unnecessary backward compatibility alias
  - Remove 'SizingCalculator' alias from __init__.py as this is first release

- Improve type hints and validation
  - Change memory_to_storage_ratio from float to int (only accepts 1, 2, 4, 8)
  - Update default values from 1.0 to 1
  - Add clear documentation that ValueError is raised for invalid values
  - Update all type hints across service, calculator, projector, and API layers

- Remove unnecessary validation check
  - Remove annual_growth_rate > 1000 check (unhelpful error message)

- Simplify imports
  - Use direct import for Neo4jSizingCalculationResult instead of TYPE_CHECKING
  (no circular import exists)
- Refactor calculate_database_sizing_prompt to use individual Optional fields
  instead of single graph_description field for better structured data collection
- Refactor forecast_database_size_prompt to use individual Optional fields
  for each parameter (base_size_gb, base_memory_gb, base_cores, etc.)
- Remove redundant parameter descriptions from tool docstrings (info already in Field objects)
- Remove redundant graph domain and workload type descriptions from docstrings
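A minimal sketch of the memory_to_storage_ratio validation described above; the function name and error wording are assumptions:

VALID_MEMORY_TO_STORAGE_RATIOS = (1, 2, 4, 8)

def validate_memory_to_storage_ratio(ratio: int) -> int:
    """Reject any ratio other than 1, 2, 4, or 8 with a ValueError."""
    if ratio not in VALID_MEMORY_TO_STORAGE_RATIOS:
        raise ValueError(
            f"memory_to_storage_ratio must be one of "
            f"{VALID_MEMORY_TO_STORAGE_RATIOS}, got {ratio}"
        )
    return ratio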

Sizing Calculator Improvements:
- Add OS floor enforcement (2GB minimum) for recommended_memory_gb in calculator
- Update memory_to_storage_ratio to use integers only (1, 2, 4, 8) instead of floats
- Update type hints and default values for memory_to_storage_ratio across codebase

Test Improvements:
- Add comprehensive smoke tests for calculate_database_sizing_prompt
- Add comprehensive smoke tests for forecast_database_size_prompt
- Reorganize prompt tests into dedicated TestPrompts class
- Remove duplicate test_pause_instance and test_update_instance_name tests
- Update test_sizing_calculator to use integer values for memory_to_storage_ratio
- Fix test assertions to account for OS memory floor (2GB minimum)

Code Quality:
- Fix import and type hint issues in interfaces.py
- Add eval scripts and design proposal to .gitignore
- Implement component-based growth models where storage, memory, and vcpu can have independent growth patterns
- Add smart default growth rates based on domain/workload (transactional: 20%, agentic: 15%, analytical: 5%)
- Implement dynamic core scaling based on workload type and storage growth
- Make domain parameter required in forecast_sizing (primary driver for growth model selection)
- Add domain-based workload inference when explicit workloads not provided
- Update projector to use component-based models and apply memory_to_storage_ratio as constraint
- Refactor prompts to use individual Optional fields for better agent interaction
- Remove deprecated growth model selection functions
- Add comprehensive unit tests for get_default_growth_rate and smart defaults
- Update all existing tests to account for new component-based behavior
- Add smoke test files to .gitignore

All 191 unit tests passing.
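A sketch of how the smart defaults above might be wired up. get_default_growth_rate is named in this PR; the selection logic and the graph_data_science rate are assumptions (it is described only as the slowest-growing workload):

DEFAULT_GROWTH_RATES = {
    "transactional": 0.20,       # 20%, stated above
    "agentic": 0.15,             # 15%, stated above
    "analytical": 0.05,          # 5%, stated above
    "graph_data_science": 0.03,  # assumed; only described as slowest
}

def get_default_growth_rate(workloads: list[str]) -> float:
    """Default to the highest rate among the requested workloads (assumed logic)."""
    return max(DEFAULT_GROWTH_RATES.get(w, 0.05) for w in workloads)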
Update integration tests to include the required domain parameter in forecast_database_size calls.
rfrneo4j marked this pull request as ready for review on January 12, 2026 at 15:09