feat(ai-sdk): vocab->tag proposer (P2 slice 5, type 3)

Extends Method C: for each unknown narrative token that pattern text names, suggest the keyword_dictionary tag = the RequiredComponentTags shared by the naming patterns (ranked by frequency, kept only when shared by >=40% of them, top 3). Surfaces real dictionary gaps like "zwischenkreis" -> stored_energy and "updates" -> has_software, which close coverage without hand-editing the dict.

Two precision fixes to Method C while here:
- patternsMentioning now matches WHOLE WORDS, not substrings. Substring matching flagged fragments like "stehen" inside "entstehen" and produced nonsensical tag suggestions.
- a token is only proposed with a tag if one is shared by >=40% of its naming patterns, so diffuse common verbs (spread across categories) drop out.

Wired into iace-audit propose -> audit-reports/vocab.{md,json}. Residual common-verb noise is left to the human/LLM filter rather than a hand-grown stopword list.

Type 4 (coverage blind spots) + P3 (pin accepted proposals into a GT case) remain for slice 6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-25 09:51:12 +02:00
parent 662aec209a
commit c13aa9183a
4 changed files with 143 additions and 7 deletions
@@ -36,6 +36,10 @@ type DictionarySuggestion struct {
 	Token      string   `json:"token"`
 	Field      string   `json:"field"`
 	PatternIDs []string `json:"pattern_ids"`
+	// SuggestedTags are the RequiredComponentTags shared by the naming patterns,
+	// ranked by frequency — the candidate tags a keyword_dictionary entry for this
+	// token would emit so narratives mentioning it can trigger those patterns.
+	SuggestedTags []string `json:"suggested_tags,omitempty"`
 }

 type VocabularyReport struct {