fix: Playwright user permission + etracker DSE matching + CMP skip

1. Dockerfile: install Playwright browsers AS appuser (not root) so the
   chromium binary lands in /home/appuser/.cache/ and is accessible at
   runtime. Installing as root was causing a 500 error.
2. DSE service matching: text-search fallback when LLM extraction fails.
   If "etracker" appears in the DSE (privacy policy) text, mark it as
   documented even if the LLM did not parse the service list.
3. CMP skip: consent managers in category "cmp" are now skipped, not only
   those in category "other" with id "cmp".
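
A minimal sketch of how items 2 and 3 could look. The Python diffs are not shown here, so the function names (`is_documented`, `should_skip`), the dict keys, and the exact matching rule are assumptions for illustration, not the code from this commit.

```python
from typing import Iterable, Optional


def is_documented(service_name: str, dse_text: str,
                  llm_services: Optional[Iterable[str]]) -> bool:
    """Return True if the service counts as documented in the DSE.

    Hypothetical helper: first consult the LLM-extracted service list,
    then fall back to a case-insensitive substring search in the raw text.
    """
    name = service_name.casefold()
    # Preferred path: the LLM produced a service list that names the service.
    if llm_services and any(name == s.casefold() for s in llm_services):
        return True
    # Fallback: the list is missing, empty, or lacks the service,
    # so search the raw DSE text directly.
    return name in dse_text.casefold()


def should_skip(service: dict) -> bool:
    """Return True for consent managers (CMPs), which are not checked."""
    category = service.get("category")
    if category == "cmp":
        return True
    # Older rule, kept for compatibility: category "other" with id "cmp".
    return category == "other" and service.get("id") == "cmp"
```

A plain substring search can over-match (for example a service name that is part of a longer word), so a real implementation might prefer word-boundary matching.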

NOT DEPLOYED — RAG pipeline is running.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-29 19:36:46 +02:00
parent cedc5de15d
commit 58957a4aaa
3 changed files with 25 additions and 4 deletions
+9 -1
@@ -7,15 +7,23 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 \
libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2 \
curl \
&& rm -rf /var/lib/apt/lists/*
# Create user BEFORE installing Playwright (so browsers are in user's cache)
RUN useradd --create-home appuser
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install Playwright browsers AS appuser (so they land in /home/appuser/.cache/)
USER appuser
RUN playwright install chromium
USER root
COPY . .
RUN chown -R appuser:appuser /app
# (old position of `RUN useradd --create-home appuser`, removed by this commit; the user is now created above)
USER appuser
EXPOSE 8094
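
The effect of installing as appuser can be checked at runtime with a small script. The sketch below assumes Playwright's default Linux cache location (`~/.cache/ms-playwright`, overridable via `PLAYWRIGHT_BROWSERS_PATH`) and that browser directories start with `chromium`. The helper names are hypothetical and not part of this repository.

```python
import os
from pathlib import Path
from typing import List, Mapping


def playwright_cache_root(home: Path, env: Mapping[str, str]) -> Path:
    """Resolve the browser cache directory for a given home and environment."""
    override = env.get("PLAYWRIGHT_BROWSERS_PATH")
    # An explicit override wins; otherwise use the default Linux location.
    return Path(override) if override else home / ".cache" / "ms-playwright"


def readable_chromium_dirs(cache_root: Path) -> List[Path]:
    """List Chromium install directories the current user can enter and read."""
    if not cache_root.is_dir():
        return []
    return sorted(
        p for p in cache_root.glob("chromium*")
        if p.is_dir() and os.access(p, os.R_OK | os.X_OK)
    )


if __name__ == "__main__":
    root = playwright_cache_root(Path.home(), os.environ)
    found = readable_chromium_dirs(root)
    print(f"cache root: {root}")
    print("chromium installs:", [p.name for p in found] or "none")
```

Run inside the container as appuser, an empty result would suggest the browsers were installed under another user's home directory.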