Adding ClickHouse as an alternative to Elasticsearch #14
Merged
43 commits
d128775 (czajkub) resolves #7
c36d342 (czajkub) changed xml.py name to avoid name conflicts
a215131 (czajkub) added clickhouse as an alternative to elasticsearch
ea780d4 (czajkub) Merge branch 'the-momentum:main' into main
186bba0 (czajkub) deleted the old xml.py file (name change)
9dc9aeb (czajkub) changed project files and gitignore to reflect clickhouse addition
d66ad02 (czajkub) fixed search_health_records tool
91955a7 (czajkub) Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…
311e9fd (czajkub) pretty printing time results
323c117 (czajkub) brushed up on every tool, fixed naming issues, completed sql queries
01f3660 (czajkub) changed return types for linter
8c4567c (czajkub) changed update_database comment for LLM
e9bf8e1 (czajkub) Update README to include ClickHouse
738a4fa (czajkub) update makefile for clickhouse
45e4597 (czajkub) Update README.md for windows usage of clickhouse
d1f55e8 (czajkub) clickhouse support for windows and changing variables to more legible…
45c662f (czajkub) Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…
2013b17 (czajkub) added makefile comments and improved windows functionality
9b8e8ab (czajkub) add more settings and change variable names for readability
2495b10 (czajkub) added chunk_size to settings
d1068cf (czajkub) minor fixes
b2af44d (czajkub) comment for clarification and changed inequality sign to point in the…
32dcf01 (czajkub) updating records more elegant now
1430f34 (czajkub) moved column_names tuple to broader scope
777322f (czajkub) moving ch.py into scripts and splitting it into two files
a929b1e (czajkub) fixed relative import and dockerfile for windows
92e19f9 (czajkub) Update README.md to include new env variables
d781104 (czajkub) change docker launch
d882907 (czajkub) improved readme comment about docker deployment
2bb0d54 (czajkub) FINALLY working windows clickhouse support with docker
fd76d48 (czajkub) Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…
549c645 (czajkub) removal of docker volume after make chwin
4361ff3 (czajkub) fixed makefile for ch linux
b4af660 (czajkub) added db dirname to config
253a15c (czajkub) separating ch client and indexer
215ca16 (czajkub) calling inits explicitly
a35dace (czajkub) super ultra fix
b692083 (czajkub) removed update db mcp tool
fc2e165 (czajkub) Update README.md
9cd48f8 (czajkub) changed ch dockerfile name
27a9959 (czajkub) removed applehealth db
a717404 (czajkub) added chdb back to gitignore
4b53fbe (czajkub) Merge branch 'main' of https://github.com/czajkub/apple-health-mcp-se…
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,3 +25,4 @@ Makefile | |
| .* | ||
| README.md | ||
| *.xml | ||
| *.chdb | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -148,4 +148,7 @@ docker/volumes/ | |
| volumes | ||
|
|
||
| # Data Source | ||
| *.xml | ||
| *.xml | ||
|
|
||
| # ClickHouse database | ||
| *.chdb | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| FROM ghcr.io/astral-sh/uv:python3.13-bookworm | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| ENV UV_COMPILE_BYTECODE=1 | ||
| ENV UV_LINK_MODE=copy | ||
| ENV UV_TOOL_BIN_DIR=/usr/local/bin | ||
|
|
||
| RUN --mount=type=cache,target=/root/.cache/uv \ | ||
| --mount=type=bind,source=uv.lock,target=uv.lock \ | ||
| --mount=type=bind,source=pyproject.toml,target=pyproject.toml \ | ||
| uv sync --locked --no-install-project --no-dev | ||
|
|
||
| COPY . /app | ||
| RUN mv /app/xmltemp123 /app/scripts/raw.xml | ||
| RUN --mount=type=cache,target=/root/.cache/uv \ | ||
| uv sync --locked --no-dev | ||
|
|
||
| ENV PATH="/app/.venv/bin:$PATH" | ||
|
|
||
| RUN echo '#!/bin/bash\n\ | ||
| set -e\n\ | ||
| echo "Running clickhouse importer..."\n\ | ||
| uv run --directory /app/scripts/ clickhouse_importer.py && \ | ||
| echo "Copying applehealth.chdb to volume..." && \ | ||
| cp -r /app/scripts/applehealth.chdb /volume/applehealth.chdb && \ | ||
| echo "Complete!"' > /app/entrypoint.sh | ||
|
|
||
| RUN chmod +x /app/entrypoint.sh | ||
|
|
||
| ENTRYPOINT ["/app/entrypoint.sh"] |
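The generated entrypoint above does two things in order: it runs `clickhouse_importer.py`, then copies the resulting `applehealth.chdb` (presumably a directory, given `cp -r`) into `/volume`. For anyone who wants to follow that flow outside of shell, here is a rough Python sketch of the same two steps. The function name `run_and_export` and the stand-in importer are hypothetical and not part of this PR. Note also that `shutil.copytree` with `dirs_exist_ok=True` merges into an existing target, which is not exactly what `cp -r` does when the target already exists.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_and_export(importer: Callable[[], None], db_dir: Path, volume_dir: Path) -> Path:
    """Run the importer, then copy the database directory it produced into the volume."""
    importer()  # an exception here aborts the export, much like `set -e` in the script
    if not db_dir.is_dir():
        raise FileNotFoundError(f"importer did not produce {db_dir}")
    target = volume_dir / db_dir.name
    # dirs_exist_ok=True lets a re-run refresh an existing copy in place.
    shutil.copytree(db_dir, target, dirs_exist_ok=True)
    return target


# Demo with temporary directories standing in for /app/scripts and /volume.
with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / "scripts"
    volume = Path(tmp) / "volume"
    scripts.mkdir()
    volume.mkdir()
    db_dir = scripts / "applehealth.chdb"

    def fake_importer() -> None:
        # Hypothetical stand-in for clickhouse_importer.py: just creates a marker file.
        db_dir.mkdir()
        (db_dir / "data.bin").write_bytes(b"demo")

    exported = run_and_export(fake_importer, db_dir, volume)
    exported_files = sorted(p.name for p in exported.iterdir())
    print(exported.name, exported_files)
```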
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,8 +1,9 @@ | ||
| from fastmcp import FastMCP | ||
|
|
||
| from app.mcp.v1.tools import es_reader, xml_reader | ||
| from app.mcp.v1.tools import es_reader, xml_reader, ch_reader | ||
|
|
||
| mcp_router = FastMCP(name="Main MCP") | ||
|
|
||
| mcp_router.mount(es_reader.es_reader_router) | ||
| mcp_router.mount(xml_reader.xml_reader_router) | ||
| mcp_router.mount(ch_reader.ch_reader_router) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,125 @@ | ||
| from typing import Any | ||
| from fastmcp import FastMCP | ||
|
|
||
| from app.schemas.record import RecordType, IntervalType, HealthRecordSearchParams | ||
| from app.services.health.clickhouse import ( | ||
| get_health_summary_from_ch, | ||
| search_health_records_from_ch, | ||
| get_statistics_by_type_from_ch, | ||
| get_trend_data_from_ch, | ||
| ) | ||
|
|
||
| ch_reader_router = FastMCP(name="CH Reader MCP") | ||
|
|
||
| @ch_reader_router.tool | ||
| def get_health_summary_ch() -> dict[str, Any]: | ||
| """ | ||
| Get a summary of Apple Health data from ClickHouse. | ||
| The function returns total record count, record type breakdown, and (optionally) a date range aggregation. | ||
|
|
||
| Notes for LLMs: | ||
| - IMPORTANT - Do not guess, autofill, or assume any missing data. | ||
| - When asked for medical advice, try to use my data from ClickHouse first. | ||
| """ | ||
| try: | ||
| return get_health_summary_from_ch() | ||
| except Exception as e: | ||
| return {'error': str(e)} | ||
|
|
||
| @ch_reader_router.tool | ||
| def search_health_records_ch(params: HealthRecordSearchParams) -> dict[str, Any]: | ||
| """ | ||
| Search health records in ClickHouse with flexible query building. | ||
|
|
||
| Parameters: | ||
| - params: HealthRecordSearchParams object containing all search/filter parameters. | ||
|
|
||
| Notes for LLMs: | ||
| - This function should return a list of health record documents (dicts) matching the search criteria. | ||
| - Each document in the list should represent a single health record as stored in ClickHouse. | ||
| - If an error occurs, the function returns a dict with a single 'error' key holding the error message. | ||
| - Use this to retrieve structured health data for further analysis, filtering, or display. | ||
| - Example source_name: "Rob’s iPhone", "Polar Flow", "Sync Solver". | ||
| - Example date_from/date_to: "2020-01-01T00:00:00+00:00" | ||
| - Example value_min/value_max: "10", "100.5" | ||
| - IMPORTANT - Do not guess, autofill, or assume any missing data. | ||
| - When asked for medical advice, try to use my data from ClickHouse first. | ||
| """ | ||
| try: | ||
| return search_health_records_from_ch(params) | ||
| except Exception as e: | ||
| return {'error': str(e)} | ||
|
|
||
| @ch_reader_router.tool | ||
| def get_statistics_by_type_ch(record_type: RecordType | str) -> dict[str, Any]: | ||
| """ | ||
| Get comprehensive statistics for a specific health record type from ClickHouse. | ||
|
|
||
| Parameters: | ||
| - record_type: The type of health record to analyze. Use RecordType for most frequent types. Use str if that type is beyond RecordType scope. | ||
|
|
||
| Returns: | ||
| - record_type: The analyzed record type | ||
| - total_count: Total number of records of this type in the table | ||
| - value_statistics: Statistical summary of the 'value' field including: | ||
| * count: Number of records with values | ||
| * min: Minimum value recorded | ||
| * max: Maximum value recorded | ||
| * avg: Average value across all records | ||
| * sum: Sum of all values | ||
| - sources: Breakdown of records by source device/app (e.g., "Rob's iPhone", "Polar Flow") | ||
|
|
||
| Notes for LLMs: | ||
| - This function provides comprehensive statistical analysis for any health record type. | ||
| - The value_statistics object contains all basic statistics (count, min, max, avg, sum) for the 'value' field. | ||
| - The sources breakdown shows which devices/apps contributed data for this record type. | ||
| - Example types: "HKQuantityTypeIdentifierStepCount", "HKQuantityTypeIdentifierBodyMassIndex", "HKQuantityTypeIdentifierHeartRate", etc. | ||
| - Use this function to understand the distribution, range, and trends of specific health metrics. | ||
| - The function is useful for health analysis, identifying outliers, and understanding data quality. | ||
| - The date_range key in the query is commented out because it contained a hardcoded start date; you can still use it if you replace startDate with your own data. | ||
| - IMPORTANT - Do not guess, autofill, or assume any missing data. | ||
| - When asked for medical advice, try to use my data from ClickHouse first. | ||
| """ | ||
| try: | ||
| return get_statistics_by_type_from_ch(record_type) | ||
| except Exception as e: | ||
| return {"error": f"Failed to get statistics: {str(e)}"} | ||
|
|
||
|
|
||
| @ch_reader_router.tool | ||
| def get_trend_data_ch( | ||
| record_type: RecordType | str, | ||
| interval: IntervalType = "month", | ||
| date_from: str | None = None, | ||
| date_to: str | None = None, | ||
| ) -> dict[str, Any]: | ||
| """ | ||
| Get trend data for a specific health record type over time using ClickHouse date histogram aggregation. | ||
|
|
||
| Parameters: | ||
| - record_type: The type of health record to analyze (e.g., "HKQuantityTypeIdentifierStepCount") | ||
| - interval: Time interval for aggregation. | ||
| - date_from, date_to: Optional ISO8601 date strings for filtering date range | ||
|
|
||
| Returns: | ||
| - record_type: The analyzed record type | ||
| - interval: The time interval used | ||
| - trend_data: List of time buckets with statistics for each period: | ||
| * date: The time period (ISO string) | ||
| * avg_value: Average value for the period | ||
| * min_value: Minimum value for the period | ||
| * max_value: Maximum value for the period | ||
| * count: Number of records in the period | ||
|
|
||
| Notes for LLMs: | ||
| - Use this to analyze trends, patterns, and seasonal variations in health data | ||
| - The function automatically handles date filtering if date_from/date_to are provided | ||
| - IMPORTANT - interval must be one of: "day", "week", "month", or "year". Do not use other values. | ||
| - Do not guess, autofill, or assume any missing data. | ||
| - When asked for medical advice, try to use my data from ClickHouse first. | ||
| """ | ||
| try: | ||
| return get_trend_data_from_ch(record_type, interval, date_from, date_to) | ||
| except Exception as e: | ||
| return {"error": f"Failed to get trend data: {str(e)}"} | ||
|
|
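All four tools in this file repeat the same shape: call the service function, and on any exception return a dict with an `error` key (two of them add a message prefix). One way that pattern could be factored out is a small decorator, sketched below. The name `errors_as_dict` is hypothetical and not part of this PR. Whether it composes cleanly with `@ch_reader_router.tool` has not been verified here, so treat it as an illustration of the pattern rather than a drop-in change.

```python
from functools import wraps
from typing import Any, Callable

ToolFunc = Callable[..., dict[str, Any]]


def errors_as_dict(prefix: str = "") -> Callable[[ToolFunc], ToolFunc]:
    """Turn any exception raised by the wrapped function into {'error': message}."""

    def decorator(func: ToolFunc) -> ToolFunc:
        @wraps(func)  # keeps the name, docstring and __wrapped__ of the original
        def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
            try:
                return func(*args, **kwargs)
            except Exception as e:
                return {"error": f"{prefix}{e}"}

        return wrapper

    return decorator


@errors_as_dict("Failed to get statistics: ")
def failing_lookup(record_type: str) -> dict[str, Any]:
    # Stand-in for a service call that fails.
    raise ValueError(f"unknown type {record_type}")


@errors_as_dict()
def working_lookup(record_type: str) -> dict[str, Any]:
    # Stand-in for a service call that succeeds.
    return {"record_type": record_type, "total_count": 0}


failed = failing_lookup("Foo")
succeeded = working_lookup("HKQuantityTypeIdentifierStepCount")
print(failed)
print(succeeded)
```

The docstring is preserved by `wraps`, which matters here because the docstrings carry the instructions for the LLM.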
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| import json | ||
| from dataclasses import dataclass | ||
| from json import JSONDecodeError | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
| import chdb | ||
|
|
||
| from app.config import settings | ||
|
|
||
|
|
||
| @dataclass | ||
| class CHClient: | ||
| def __init__(self): | ||
| self.session = chdb.session.Session(settings.CH_DIRNAME) | ||
| self.db_name: str = settings.CH_DB_NAME | ||
| self.table_name: str = settings.CH_TABLE_NAME | ||
| self.path: Path = Path(settings.RAW_XML_PATH) | ||
|
||
|
|
||
| def __post_init__(self): | ||
| if not self.path.exists(): | ||
| raise FileNotFoundError(f"XML file not found: {self.path}") | ||
| self.session.query(f"CREATE DATABASE IF NOT EXISTS {self.db_name}") | ||
|
|
||
| def inquire(self, query: str) -> dict[str, Any]: | ||
| """ | ||
| Makes an SQL query to the database | ||
| :return: result of the query | ||
| """ | ||
| # str() of the query result is the JSON text itself, so a single json.loads() yields the dict | ||
| response: str = str(self.session.query(query, fmt='JSON')) | ||
| try: | ||
| return json.loads(response) | ||
| except JSONDecodeError as e: | ||
| return {'error': str(e)} | ||
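One detail of `CHClient` is worth spelling out. When a class decorated with `@dataclass` defines its own `__init__`, the decorator leaves it in place and does not generate one. `__post_init__` is only invoked by the generated `__init__`, so in this arrangement it does not run on construction unless called by hand (which may be what the "calling inits explicitly" commit refers to; that is a guess). The stand-alone sketch below shows the difference with two hypothetical classes and no chdb dependency.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ManualInit:
    """Mirrors the pattern in the diff: a hand-written __init__ on a dataclass."""

    def __init__(self) -> None:
        self.post_init_ran = False

    def __post_init__(self) -> None:
        # Not reached automatically: only a generated __init__ calls this hook.
        self.post_init_ran = True


@dataclass
class GeneratedInit:
    """Alternative: declare fields and let @dataclass generate __init__."""

    path: Path
    post_init_ran: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        # Called by the generated __init__ once all fields are assigned.
        self.post_init_ran = True


manual = ManualInit()
generated = GeneratedInit(path=Path("raw.xml"))
print(manual.post_init_ran, generated.post_init_ran)
```

With declared fields, checks such as the XML path test could run automatically at construction time. A plain class without `@dataclass` that performs those checks at the end of `__init__` would be an equally reasonable choice.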