fix(pitch-deck): TTS language detection, technical FAQ, proper German umlauts + abbreviations

TTS Language Bug:
- ChatFAB: detect response language from text content instead of UI language
- German text with umlauts/ß triggers German TTS even when UI is in English

Presenter Script (German TTS pronunciation):
- Add proper umlauts (ä/ö/ü) throughout German text
- Expand abbreviations for clear pronunciation:
  DSGVO → Datenschutz-Grundverordnung
  SAST → Static Application Security Testing
  DAST → Dynamic Application Security Testing
  SBOM → Software Bill of Materials
  VVT → Verarbeitungsverzeichnis
  TOMs → technisch-organisatorische Maßnahmen
  BSI → Bundesamt für Sicherheit in der Informationstechnik
  KMU → kleine und mittlere Unternehmen, etc.

Technical FAQ (12 new entries):
- BGE-M3, RAG, Qdrant, Cross-Encoder, Hybrid Search
- SAST/DAST, SBOM, BSI, Cloud Providers (SysEleven/Hetzner)
- Controls/Prüfaspekte, Policy Engine, VVT/TOMs/DSFA

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 01:18:03 +02:00
parent 9005a05bd7
commit c157e9cbca
3 changed files with 197 additions and 72 deletions
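The ChatFAB change described in the commit message is not part of the hunk shown here. As a rough illustration of content-based language detection, the sketch below treats umlauts and ß as a strong signal for German and only falls back to the UI language for empty input. The names `detectTtsLanguage`, `TtsLanguage`, `GERMAN_CHARS` and `GERMAN_STOPWORDS`, as well as the stopword fallback and its 0.2 threshold, are assumptions made for this example and not taken from the repository.

```typescript
// Hypothetical helper; the actual ChatFAB implementation is not shown in this diff.
type TtsLanguage = 'de' | 'en'

// Umlauts and ß are treated as a strong signal for German text.
const GERMAN_CHARS = /[äöüÄÖÜß]/

// Assumed fallback for German text written without umlauts:
// a handful of very common German function words.
const GERMAN_STOPWORDS = new Set(['der', 'das', 'und', 'ist', 'nicht', 'wir', 'mit', 'von', 'eine'])

function detectTtsLanguage(text: string, uiLanguage: TtsLanguage): TtsLanguage {
  // Nothing to inspect: keep the UI language.
  if (text.trim() === '') return uiLanguage

  // Any umlaut or ß selects German, regardless of the UI language.
  if (GERMAN_CHARS.test(text)) return 'de'

  // Otherwise estimate the share of German function words.
  const words = text.toLowerCase().split(/[^a-z]+/).filter(Boolean)
  if (words.length === 0) return uiLanguage
  const hits = words.filter((w) => GERMAN_STOPWORDS.has(w)).length
  return hits / words.length >= 0.2 ? 'de' : 'en'
}
```

The result would typically be mapped to a voice locale (for example `de-DE` or `en-US`) before the text is handed to the TTS engine; that mapping is left out because the engine used by the project is not visible in this commit.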
@@ -615,4 +615,126 @@ export const PRESENTER_FAQ: FAQEntry[] = [
    goto_slide: 'product',
    priority: 9,
  },
+
+  // === TECHNOLOGY GLOSSARY (for investor understanding) ===
+  {
+    id: 'tech-bge-m3',
+    keywords: ['bge-m3', 'bge', 'embedding', 'embeddings', 'vektorisierung', 'vectorization', 'sentence transformer'],
+    question_de: 'Was ist BGE-M3?',
+    question_en: 'What is BGE-M3?',
+    answer_de: 'BGE-M3 ist ein State-of-the-Art Embedding-Modell, entwickelt von der Beijing Academy of Artificial Intelligence. M3 steht fuer Multi-Lingual, Multi-Functionality und Multi-Granularity. Wir nutzen es, um Gesetzestexte, Normen und Compliance-Dokumente in hochdimensionale Vektoren umzuwandeln. Der Vorteil: BGE-M3 versteht ueber 100 Sprachen gleichzeitig, perfekt fuer EU-Regularien, die in verschiedenen Sprachen vorliegen. Es unterstuetzt Dense Retrieval, Sparse Retrieval und Multi-Vector Retrieval in einem Modell, was unsere Hybrid Search ermoeglicht. Das Modell laeuft lokal auf unserer EU-Infrastruktur: Keine Daten verlassen den europaeischen Raum.',
+    answer_en: 'BGE-M3 is a state-of-the-art embedding model developed by the Beijing Academy of Artificial Intelligence. M3 stands for Multi-Lingual, Multi-Functionality and Multi-Granularity. We use it to convert legal texts, standards and compliance documents into high-dimensional vectors. The advantage: BGE-M3 handles over 100 languages in one model, which suits EU regulations that are published in several languages. It supports dense retrieval, sparse retrieval and multi-vector retrieval in a single model, enabling our hybrid search. The model runs locally on our EU infrastructure, so no data leaves Europe.',
+    goto_slide: 'annex-aipipeline',
+    priority: 7,
+  },
+  {
+    id: 'tech-rag',
+    keywords: ['rag', 'retrieval', 'augmented', 'generation', 'wissensbasis', 'knowledge base', 'wie funktioniert ki', 'how does ai work'],
+    question_de: 'Was ist RAG und wie nutzt ihr es?',
+    question_en: 'What is RAG and how do you use it?',
+    answer_de: 'RAG steht fuer Retrieval Augmented Generation — ein Verfahren, bei dem das KI-Modell nicht aus seinem Training antwortet, sondern zuerst in unserer Wissensbasis nach relevanten Dokumenten sucht und diese als Kontext nutzt. Das ist entscheidend fuer Compliance: Wir wollen keine halluzinierten Antworten, sondern praezise Aussagen mit Quellenangabe. Unsere RAG-Pipeline indexiert ueber 380 Regularien und Normen in sechs Qdrant-Collections. Bei einer Anfrage werden die relevantesten Textpassagen per Hybrid Search gefunden, durch einen Cross-Encoder re-rankt und dann dem LLM als Kontext uebergeben. Das Ergebnis: jede Antwort ist quellenbasiert und nachpruefbar.',
+    answer_en: 'RAG stands for Retrieval Augmented Generation — a method where the AI model does not answer from its training but first searches our knowledge base for relevant documents and uses them as context. This is critical for compliance: we want no hallucinated answers but precise statements with source references. Our RAG pipeline indexes over 380 regulations and standards in six Qdrant collections. For a query, the most relevant text passages are found via hybrid search, re-ranked by a cross-encoder and then provided to the LLM as context. The result: every answer is source-based and verifiable.',
+    goto_slide: 'annex-aipipeline',
+    priority: 8,
+  },
+  {
+    id: 'tech-qdrant',
+    keywords: ['qdrant', 'vektordatenbank', 'vector database', 'vector db', 'collections', 'similarity search'],
+    question_de: 'Was ist Qdrant?',
+    question_en: 'What is Qdrant?',
+    answer_de: 'Qdrant ist eine hochperformante Open-Source-Vektordatenbank, die wir fuer unsere semantische Suche nutzen. Sie speichert die Embedding-Vektoren unserer ueber 380 indexierten Regularien und ermoeglicht Similarity Search in Millisekunden. Wir betreiben sechs separate Qdrant-Collections, getrennt nach Rechtsgebiet und Dokumenttyp, fuer praezise und schnelle Ergebnisse. Qdrant laeuft auf unserer eigenen Hetzner-Infrastruktur in Deutschland, steht unter der Apache-2.0-Lizenz und benoetigt keine Cloud-Anbindung an US-Provider.',
+    answer_en: 'Qdrant is a high-performance open-source vector database that we use for our semantic search. It stores the embedding vectors of our more than 380 indexed regulations and enables similarity search in milliseconds. We operate six separate Qdrant collections, split by legal domain and document type, for precise and fast results. Qdrant runs on our own Hetzner infrastructure in Germany, is licensed under Apache 2.0 and requires no cloud connection to US providers.',
+    goto_slide: 'annex-aipipeline',
+    priority: 7,
+  },
+  {
+    id: 'tech-cross-encoder',
+    keywords: ['cross-encoder', 'cross encoder', 're-ranking', 'reranking', 'rerank', 'relevanz'],
+    question_de: 'Was ist ein Cross-Encoder?',
+    question_en: 'What is a cross-encoder?',
+    answer_de: 'Ein Cross-Encoder ist ein KI-Modell, das die Relevanz zwischen einer Suchanfrage und einem Dokument praezise bewertet. In unserer Pipeline nutzen wir ihn als zweite Stufe: Zuerst findet die schnelle Hybrid Search die Top-Kandidaten, dann bewertet der Cross-Encoder jedes Ergebnis einzeln und sortiert sie nach tatsaechlicher Relevanz. Das verbessert die Qualitaet unserer Compliance-Antworten erheblich — besonders bei juristisch komplexen Fragestellungen, wo Wortaehnlichkeit allein nicht ausreicht.',
+    answer_en: 'A cross-encoder is an AI model that precisely evaluates the relevance between a search query and a document. In our pipeline we use it as a second stage: first, the fast hybrid search finds the top candidates, then the cross-encoder evaluates each result individually and sorts them by actual relevance. This significantly improves the quality of our compliance answers — especially for legally complex questions where word similarity alone is not sufficient.',
+    goto_slide: 'annex-aipipeline',
+    priority: 6,
+  },
+  {
+    id: 'tech-sast-dast',
+    keywords: ['sast', 'dast', 'static analysis', 'dynamic analysis', 'code scanning', 'code analyse', 'statische analyse', 'dynamische analyse', 'penetrationstest', 'pentest'],
+    question_de: 'Was sind SAST und DAST?',
+    question_en: 'What are SAST and DAST?',
+    answer_de: 'SAST steht fuer Static Application Security Testing — dabei wird der Quellcode analysiert, ohne ihn auszufuehren. Man findet Schwachstellen wie SQL-Injection, Cross-Site-Scripting oder unsichere Kryptografie direkt im Code. DAST steht fuer Dynamic Application Security Testing — dabei wird die laufende Anwendung von aussen getestet, aehnlich wie ein echter Angreifer. Wir fuehren beides kontinuierlich bei jeder Code-Aenderung durch, nicht nur einmal im Jahr wie bei klassischen Pentests. Das spart dem Kunden etwa 13.000 Euro jaehrlich an externen Pentest-Kosten allein im KMU-Bereich.',
+    answer_en: 'SAST stands for Static Application Security Testing — it analyzes source code without executing it. You find vulnerabilities like SQL injection, cross-site scripting or insecure cryptography directly in the code. DAST stands for Dynamic Application Security Testing — it tests the running application from the outside, similar to a real attacker. We run both continuously on every code change, not just once a year like traditional pentests. This saves customers about EUR 13,000 annually in external pentest costs for SMEs alone.',
+    goto_slide: 'solution',
+    priority: 8,
+  },
+  {
+    id: 'tech-sbom',
+    keywords: ['sbom', 'software bill of materials', 'stueckliste', 'abhaengigkeiten', 'dependencies', 'supply chain', 'lieferkette'],
+    question_de: 'Was ist eine SBOM?',
+    question_en: 'What is an SBOM?',
+    answer_de: 'SBOM steht fuer Software Bill of Materials — eine vollstaendige Stueckliste aller Software-Komponenten, Bibliotheken und Abhaengigkeiten in einem Produkt. Mit dem Cyber Resilience Act wird die SBOM fuer alle Produkte mit digitalen Elementen in der EU zur Pflicht. Unsere Plattform generiert SBOMs automatisch bei jeder Code-Aenderung, ueberwacht bekannte Schwachstellen in Abhaengigkeiten und alarmiert sofort, wenn eine neue CVE veroeffentlicht wird. Das ist fuer produzierende Unternehmen besonders relevant, weil sie ihre Software-Lieferkette lueckenlos dokumentieren muessen.',
+    answer_en: 'SBOM stands for Software Bill of Materials — a complete inventory of all software components, libraries and dependencies in a product. With the Cyber Resilience Act, SBOMs become mandatory for all products with digital elements in the EU. Our platform generates SBOMs automatically on every code change, monitors known vulnerabilities in dependencies and alerts immediately when a new CVE is published. This is particularly relevant for manufacturing companies because they must document their software supply chain without gaps.',
+    goto_slide: 'annex-engineering',
+    priority: 8,
+  },
+  {
+    id: 'tech-bsi',
+    keywords: ['bsi', 'bundesamt', 'sicherheit', 'informationstechnik', 'zertifizierung', 'certification', 'c5', 'cloud security'],
+    question_de: 'Was bedeutet BSI-zertifiziert?',
+    question_en: 'What does BSI-certified mean?',
+    answer_de: 'BSI steht fuer das Bundesamt fuer Sicherheit in der Informationstechnik, die deutsche Bundesbehoerde fuer Cybersicherheit. Eine BSI-Zertifizierung, insbesondere nach dem C5-Standard (Cloud Computing Compliance Criteria Catalogue), bestaetigt, dass ein Cloud-Anbieter hoechste Sicherheitsstandards einhaelt. Unsere Infrastruktur laeuft auf SysEleven, einem BSI-C5-zertifizierten deutschen Cloud-Provider. Das bedeutet: Ihre Daten werden nach den strengsten europaeischen Sicherheitsstandards geschuetzt, ohne Zugriff durch US-Behoerden.',
+    answer_en: 'BSI stands for the Federal Office for Information Security — the German federal authority for cybersecurity. A BSI certification, particularly the C5 standard (Cloud Computing Compliance Criteria Catalogue), confirms that a cloud provider maintains the highest security standards. Our infrastructure runs on SysEleven, a BSI C5-certified German cloud provider. This means: your data is protected according to the strictest European security standards — without access by US authorities.',
+    goto_slide: 'annex-architecture',
+    priority: 8,
+  },
+  {
+    id: 'tech-cloud-providers',
+    keywords: ['syseleven', 'hetzner', 'cloud', 'hosting', 'infrastruktur', 'infrastructure', 'server', 'rechenzentrum', 'data center', 'wo laufen', 'where hosted'],
+    question_de: 'Auf welcher Infrastruktur laeuft die Plattform?',
+    question_en: 'What infrastructure does the platform run on?',
+    answer_de: 'Unsere Plattform laeuft zu 100 Prozent auf europaeischer Cloud-Infrastruktur — ohne einen einzigen US-Anbieter. Fuer LLM-Inferenz und KI-Workloads nutzen wir SysEleven, einen BSI-C5-zertifizierten deutschen Cloud-Provider mit GPU-Kapazitaet. Fuer Datenbanken, Vektorspeicher und Anwendungslogik setzen wir auf Hetzner — ebenfalls deutsch, ISO 27001-zertifiziert und deutlich kostenguenstiger als AWS oder Azure. Das CI/CD laeuft ueber Gitea Actions mit automatischem Deploy via Coolify auf Hetzner. Diese Kombination gibt uns einen strukturellen Kostenvorteil bei voller EU-Datensouveraenitaet.',
+    answer_en: 'Our platform runs 100 percent on European cloud infrastructure — without a single US provider. For LLM inference and AI workloads we use SysEleven, a BSI C5-certified German cloud provider with GPU capacity. For databases, vector storage and application logic we rely on Hetzner — also German, ISO 27001-certified and significantly more cost-effective than AWS or Azure. CI/CD runs via Gitea Actions with automatic deploy via Coolify on Hetzner. This combination gives us a structural cost advantage with full EU data sovereignty.',
+    goto_slide: 'annex-architecture',
+    priority: 8,
+  },
+  {
+    id: 'tech-controls',
+    keywords: ['controls', 'pruefaspekte', 'audit aspects', 'pruefpunkte', 'checkpoints', '25000', 'control extraction'],
+    question_de: 'Was sind Controls bzw. Pruefaspekte?',
+    question_en: 'What are controls or audit aspects?',
+    answer_de: 'Controls sind konkrete, pruefbare Anforderungen, die aus Gesetzen und Normen abgeleitet werden. Zum Beispiel wird aus DSGVO Artikel 32 (Sicherheit der Verarbeitung) eine Reihe konkreter Controls wie Verschluesselungspflicht, Zugriffskontrolle und regelmaessige Sicherheitstests. Wir haben ueber 25.000 solcher Controls aus ueber 380 Regularien und Normen extrahiert. Jeder Control hat eine eindeutige ID, ist einer Regulierung zugeordnet und kann automatisch gegen den Ist-Zustand eines Unternehmens geprueft werden. Das ist das Herzstueck unserer Compliance-Automatisierung.',
+    answer_en: 'Controls are concrete, verifiable requirements derived from laws and standards. For example, GDPR Article 32 (Security of Processing) yields a series of concrete controls such as encryption requirements, access control and regular security testing. We have extracted over 25,000 such controls from over 380 regulations and standards. Each control has a unique ID, is mapped to a regulation and can be automatically checked against the current state of a company. This is the heart of our compliance automation.',
+    goto_slide: 'annex-regulatory',
+    priority: 8,
+  },
+  {
+    id: 'tech-hybrid-search',
+    keywords: ['hybrid search', 'hybrid suche', 'dense', 'sparse', 'bm25', 'semantic search', 'semantische suche', 'volltextsuche'],
+    question_de: 'Was ist Hybrid Search?',
+    question_en: 'What is hybrid search?',
+    answer_de: 'Hybrid Search kombiniert zwei Suchverfahren: Dense Retrieval (semantische Aehnlichkeit ueber Vektoren) und Sparse Retrieval (klassische Schlagwortsuche aehnlich Google). Warum beide? Juristische Texte enthalten oft spezifische Begriffe wie Artikelnummern oder Normenbezeichnungen, die semantische Suche allein nicht praezise findet. Umgekehrt versteht die semantische Suche den Kontext besser als reine Schlagwortsuche. Durch die Kombination beider Verfahren mit anschliessendem Cross-Encoder Re-Ranking erreichen wir die hoechste Praezision bei Compliance-Anfragen.',
+    answer_en: 'Hybrid search combines two search methods: dense retrieval (semantic similarity via vectors) and sparse retrieval (classic keyword search similar to Google). Why both? Legal texts often contain specific terms like article numbers or standard designations that semantic search alone cannot find precisely. Conversely, semantic search understands context better than pure keyword search. By combining both methods with subsequent cross-encoder re-ranking, we achieve the highest precision for compliance queries.',
+    goto_slide: 'annex-aipipeline',
+    priority: 7,
+  },
+  {
+    id: 'tech-policy-engine',
+    keywords: ['policy engine', 'deterministic', 'deterministisch', 'regeln', 'rules', 'eskalation', 'escalation', 'e0', 'e1', 'e2', 'e3'],
+    question_de: 'Was ist die deterministische Policy Engine?',
+    question_en: 'What is the deterministic policy engine?',
+    answer_de: 'Unsere Policy Engine ist das Gegenstueck zum LLM — sie arbeitet rein regelbasiert und deterministisch. 45 vordefinierte Regeln pruefen Compliance-Ergebnisse auf Vollstaendigkeit, Konsistenz und Dringlichkeit. Das LLM liefert die Analyse, aber die Policy Engine entscheidet, was passiert: Eskalationsstufe E0 bedeutet informativ, E1 erfordert Massnahmen, E2 hat Fristen und E3 geht an die Geschaeftsfuehrung. So stellen wir sicher, dass keine KI-Halluzination zu einer falschen Compliance-Entscheidung fuehrt.',
+    answer_en: 'Our policy engine is the counterpart to the LLM — it works purely rule-based and deterministically. 45 predefined rules check compliance results for completeness, consistency and urgency. The LLM delivers the analysis, but the policy engine decides what happens: escalation level E0 is informational, E1 requires action, E2 has deadlines and E3 goes to management. This ensures no AI hallucination leads to a wrong compliance decision.',
+    goto_slide: 'annex-aipipeline',
+    priority: 7,
+  },
+  {
+    id: 'tech-vvt-toms',
+    keywords: ['vvt', 'toms', 'dsfa', 'verarbeitungsverzeichnis', 'technisch organisatorische massnahmen', 'datenschutz-folgenabschaetzung', 'ropa', 'dpia', 'loeschfristen', 'retention'],
+    question_de: 'Was sind VVT, TOMs und DSFA?',
+    question_en: 'What are RoPA, TOMs and DPIA?',
+    answer_de: 'Das sind drei zentrale DSGVO-Dokumente, die jedes Unternehmen fuehren muss. VVT ist das Verarbeitungsverzeichnis (Record of Processing Activities) — eine Liste aller Datenverarbeitungstaetigkeiten mit Zweck, Rechtsgrundlage und Empfaengern. TOMs sind Technisch-Organisatorische Massnahmen — konkrete Sicherheitsmassnahmen wie Verschluesselung, Zugriffskontrolle oder Pseudonymisierung. DSFA ist die Datenschutz-Folgenabschaetzung (Data Protection Impact Assessment) — eine vertiefte Risikoanalyse fuer besonders sensible Verarbeitungen. Unsere Plattform generiert alle drei Dokumente automatisch und haelt sie bei Aenderungen aktuell.',
+    answer_en: 'These are three central GDPR documents that every company must maintain. RoPA is the Record of Processing Activities — a list of all data processing activities with purpose, legal basis and recipients. TOMs are Technical and Organizational Measures — concrete security measures like encryption, access control or pseudonymization. DPIA is the Data Protection Impact Assessment — an in-depth risk analysis for particularly sensitive processing. Our platform generates all three documents automatically and keeps them current when changes occur.',
+    goto_slide: 'solution',
+    priority: 8,
+  },
 ]
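The hunk only adds data; the code that selects an entry from `PRESENTER_FAQ` is not shown. A minimal sketch of how the `keywords` and `priority` fields could drive a lookup follows. The `FAQEntry` shape is inferred from the fields visible in the diff, while `matchFaq` and its scoring (number of keyword hits, ties broken by higher `priority`) are assumptions for illustration only.

```typescript
// Shape inferred from the entries in this diff; the real type may have more fields.
interface FAQEntry {
  id: string
  keywords: string[]
  question_de: string
  question_en: string
  answer_de: string
  answer_en: string
  goto_slide?: string
  priority: number
}

// Hypothetical matcher: counts how many keywords occur in the query
// and breaks ties in favour of the entry with the higher priority.
function matchFaq(query: string, entries: FAQEntry[]): FAQEntry | undefined {
  const normalized = query.toLowerCase()
  let best: { entry: FAQEntry; hits: number } | undefined

  for (const entry of entries) {
    const hits = entry.keywords.filter((k) => normalized.includes(k.toLowerCase())).length
    if (hits === 0) continue
    if (
      best === undefined ||
      hits > best.hits ||
      (hits === best.hits && entry.priority > best.entry.priority)
    ) {
      best = { entry, hits }
    }
  }
  return best?.entry
}
```

Calling `matchFaq('What is Qdrant?', PRESENTER_FAQ)` would return the `tech-qdrant` entry under this scoring. Plain substring matching is a deliberate simplification: short keywords such as `rag` or `e1` also match inside longer words (for example `storage`), so a production matcher would more likely tokenize the query or match on word boundaries.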