fix: live progress + concurrency for embedding builds #80

Merged

sharang merged 1 commit from fix/embedding-build-progress into main

2026-05-13 10:01:05 +00:00


Author: Sharang Parnerkar
SHA1: b96dda11fb

fix: live progress + concurrency for embedding builds

- CI / Check (pull_request): successful in 10m34s
- CI / Detect Changes (pull_request): skipped
- CI / Deploy Agent (pull_request): skipped
- CI / Deploy Dashboard (pull_request): skipped
- CI / Deploy Docs (pull_request): skipped
- CI / Deploy MCP (pull_request): skipped

Embedding build progress was only written to MongoDB once all batches
had completed (the final flush and status update happened only at the
very end), so the dashboard showed "0/N chunks (0%)" for the entire
run and then jumped straight to "complete". For a repo with 2k+
chunks, this made the build look stuck.

Three fixes:
- pipeline: call update_build(Running, embedded_count, ...) after each
  batch so /api/v1/chat/:repo_id/status reflects real progress, and
  flush embeddings to Mongo every 200 records so a partial failure does
  not discard the work already completed.
- pipeline: drive batches with FuturesUnordered at concurrency=4 so
  litellm requests overlap instead of running strictly serially (for a
  2221-chunk repo, 112 sequential requests set the floor on wall time).
- llm client: give the reqwest client a 300s request timeout and a 10s
  connect timeout. Previously LlmClient used reqwest::Client::new(),
  which sets no timeout, so a hung embedding call could block the build
  indefinitely.
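A minimal sketch of the first fix's control flow: report progress after every batch and persist embeddings in slices of 200. It assumes a batch size of 20 (consistent with 112 requests for 2221 chunks, but not stated in the commit). `FakeDb`, `embed_batch`, `build_embeddings`, and `BuildStatus` are hypothetical stand-ins for the real Mongo-backed store and `update_build`; the loop is serial here only to isolate the progress and flush logic.

```rust
use std::mem;

/// Hypothetical build states; the real enum lives in the pipeline.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BuildStatus {
    Running,
    Complete,
}

const BATCH_SIZE: usize = 20; // assumption: 2221 chunks / 20 per batch = 112 requests
const FLUSH_EVERY: usize = 200; // flush threshold from the commit message

/// Hypothetical in-memory stand-in for the Mongo collections.
#[derive(Default)]
struct FakeDb {
    stored: Vec<Vec<f32>>,                 // embeddings persisted so far
    flush_sizes: Vec<usize>,               // size of each insert
    status_log: Vec<(BuildStatus, usize)>, // every status write, in order
}

impl FakeDb {
    fn update_build(&mut self, status: BuildStatus, embedded_count: usize) {
        self.status_log.push((status, embedded_count));
    }

    fn insert_many(&mut self, docs: Vec<Vec<f32>>) {
        self.flush_sizes.push(docs.len());
        self.stored.extend(docs);
    }
}

/// Hypothetical embedding call: one dummy vector per chunk.
fn embed_batch(chunks: &[String]) -> Vec<Vec<f32>> {
    chunks.iter().map(|c| vec![c.len() as f32]).collect()
}

fn build_embeddings(chunks: &[String], db: &mut FakeDb) {
    let mut pending: Vec<Vec<f32>> = Vec::new();
    let mut embedded_count = 0;

    for batch in chunks.chunks(BATCH_SIZE) {
        let vectors = embed_batch(batch);
        embedded_count += vectors.len();
        pending.extend(vectors);

        // Persist full slices of FLUSH_EVERY so a later failure keeps earlier work.
        while pending.len() >= FLUSH_EVERY {
            let rest = pending.split_off(FLUSH_EVERY);
            db.insert_many(mem::replace(&mut pending, rest));
        }

        // Report progress after every batch instead of only at the end.
        db.update_build(BuildStatus::Running, embedded_count);
    }

    // Final flush of the remainder, then mark the build complete.
    if !pending.is_empty() {
        db.insert_many(pending);
    }
    db.update_build(BuildStatus::Complete, embedded_count);
}

fn main() {
    let chunks: Vec<String> = (0..2221).map(|i| format!("chunk {i}")).collect();
    let mut db = FakeDb::default();
    build_embeddings(&chunks, &mut db);
    println!(
        "{} status writes, {} flushes, {} stored",
        db.status_log.len(),
        db.flush_sizes.len(),
        db.stored.len()
    );
}
```

For 2221 chunks this writes 112 `Running` updates plus a final `Complete`, and persists the vectors in eleven flushes of 200 followed by one flush of 21.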
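The second fix bounds how many embedding requests are in flight at once. The sketch below swaps FuturesUnordered for a pool of four `std::thread` workers fed from a shared queue, so it runs without tokio or the futures crate; the idea is the same: at most four requests overlap, and results are handled in completion order, which is where a per-batch `update_build` call would go. `embed_batch`, `Stats`, `run_batches`, and the batch size of 20 are assumptions for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

const CONCURRENCY: usize = 4; // from the commit message
const BATCH_SIZE: usize = 20; // assumption

/// Counters used to observe how many requests overlap.
#[derive(Default)]
struct Stats {
    in_flight: AtomicUsize,
    max_in_flight: AtomicUsize,
}

/// Hypothetical stand-in for one litellm embedding request.
fn embed_batch(batch: &[u32], stats: &Stats) -> usize {
    let now = stats.in_flight.fetch_add(1, Ordering::SeqCst) + 1;
    stats.max_in_flight.fetch_max(now, Ordering::SeqCst);
    thread::sleep(Duration::from_millis(2)); // simulated network latency
    stats.in_flight.fetch_sub(1, Ordering::SeqCst);
    batch.len()
}

/// Runs all batches with at most CONCURRENCY requests in flight and returns
/// the running embedded count recorded as each batch finishes.
fn run_batches(chunks: Vec<u32>, stats: Arc<Stats>) -> Vec<usize> {
    let batches: Vec<Vec<u32>> = chunks.chunks(BATCH_SIZE).map(|b| b.to_vec()).collect();
    let queue = Arc::new(Mutex::new(batches.into_iter()));
    let (tx, rx) = mpsc::channel();

    let workers: Vec<_> = (0..CONCURRENCY)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let stats = Arc::clone(&stats);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The guard is dropped at the end of this statement,
                // so the lock is not held during the request.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(batch) => tx.send(embed_batch(&batch, &stats)).unwrap(),
                    None => break,
                }
            })
        })
        .collect();
    drop(tx); // rx ends once every worker has dropped its sender

    let mut embedded_count = 0;
    let mut progress = Vec::new();
    for n in rx {
        // Completion order, not submission order: this is where
        // update_build(Running, embedded_count, ...) would be called.
        embedded_count += n;
        progress.push(embedded_count);
    }
    for worker in workers {
        worker.join().unwrap();
    }
    progress
}

fn main() {
    let stats = Arc::new(Stats::default());
    let progress = run_batches((0..2221).collect(), Arc::clone(&stats));
    println!(
        "{} updates, final count {}, peak in flight {}",
        progress.len(),
        progress.last().unwrap(),
        stats.max_in_flight.load(Ordering::SeqCst)
    );
}
```

Because batches finish out of order, the running total is the natural progress value to report; it still rises monotonically to the chunk count.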
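For the third fix, reqwest's `ClientBuilder` provides `timeout` (a total deadline from connecting until the response body has finished) and `connect_timeout`, and by default sets neither. The std-only sketch below does not use reqwest; it only illustrates why a deadline matters: a call that never answers becomes an error instead of blocking the caller forever. `call_with_timeout` and the simulated latencies are hypothetical.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an embedding call that answers after `latency`.
/// The caller waits at most `timeout` instead of blocking indefinitely.
fn call_with_timeout(
    latency: Duration,
    timeout: Duration,
) -> Result<&'static str, RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(latency);
        // Ignore the send error: after a timeout the receiver is already gone.
        let _ = tx.send("embedding");
    });
    rx.recv_timeout(timeout)
}

fn main() {
    // In the real client the deadline comes from reqwest instead, e.g.
    // reqwest::Client::builder()
    //     .timeout(Duration::from_secs(300))
    //     .connect_timeout(Duration::from_secs(10))
    //     .build()
    let fast = call_with_timeout(Duration::from_millis(10), Duration::from_millis(500));
    let hung = call_with_timeout(Duration::from_secs(5), Duration::from_millis(50));
    println!("fast: {fast:?}, hung: {hung:?}");
}
```

The slow call returns `Err(Timeout)` after about 50ms; its worker thread keeps sleeping in the background, but the caller is no longer blocked by it.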

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-13 11:39:37 +02:00
