Technical Documentation

Complete guide to understanding, deploying, and extending the PHT Strategy 5 AI content pipeline.

System Architecture

PHT Strategy 5 is built as a modular pipeline with three main components (briefs, drafts, and feedback) plus a shared utilities package:

    src/
    ├── briefs/                 # Topic discovery and brief generation
    │   ├── fetch_feeds.py      # RSS/API data collection
    │   ├── cluster_topics.py   # ML-based topic clustering
    │   └── write_brief.py      # Structured brief creation
    ├── drafts/                 # Article generation and optimization
    │   ├── expand_article.py   # LLM-based article writing
    │   ├── fact_check.py       # RAG-based verification
    │   ├── seo_enrich.py       # Metadata and schema generation
    │   └── image_prompts.py    # Visual content suggestions
    ├── feedback/               # Analytics and improvement
    │   ├── ingest_cf_logs.py   # Cloudflare log processing
    │   ├── score_topics.py     # Performance analysis
    │   └── open_tasks.py       # Improvement task generation
    └── utils/                  # Shared utilities
        ├── gh.py               # GitHub API integration
        ├── rag_store.py        # Knowledge base management
        └── prompts/            # LLM prompt templates

Data Flow

  1. Collection: RSS feeds and APIs → Raw topic data
  2. Processing: Topic clustering → Content gap analysis
  3. Generation: Structured briefs → GitHub Issues
  4. Approval: Human review → Approved topics
  5. Expansion: LLM generation → Full articles
  6. Verification: Fact-checking → Quality assurance
  7. Publishing: SEO optimization → Pull requests
  8. Feedback: Analytics → Improvement suggestions

Topic Discovery & Clustering

The system continuously monitors multiple sources to identify trending privacy topics:

Data Sources

    # config/sources.yml
    feeds:
      - name: eff
        url: https://www.eff.org/rss/updates.xml
      - name: mozilla
        url: https://blog.mozilla.org/en/feed/
      - name: proton
        url: https://proton.me/blog/feed

    # Additional sources (configurable):
    # - Reddit communities (r/privacy, r/security)
    # - Google Trends privacy categories
    # - Hacker News privacy discussions
    # - Government policy RSS feeds

Clustering Algorithm

Topics are clustered using machine learning to identify related content:

    # Example clustering output
    {
      "cluster_id": "android-privacy-2025",
      "topics": [
        "Android 15 privacy features",
        "Google Play privacy changes",
        "Android app permissions update"
      ],
      "search_gap": "Missing comprehensive guide for new Android 15 privacy settings",
      "confidence": 0.87
    }
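The clustering algorithm itself is not named here. As a minimal sketch of the general idea, the block below groups headlines by word overlap (Jaccard similarity) in a single greedy pass. This is a simplified stand-in for whatever `cluster_topics.py` actually does; the helper names and the default threshold are hypothetical.

```python
from typing import List, Set


def tokens(title: str) -> Set[str]:
    """Lowercase word set, used here as a crude stand-in for an embedding."""
    return set(title.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two titles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_titles(titles: List[str], threshold: float = 0.1) -> List[List[str]]:
    """Greedy clustering: a title joins the first cluster whose seed title is similar enough."""
    clusters: List[List[str]] = []
    for title in titles:
        for cluster in clusters:
            # Compare against the cluster's first (seed) title only.
            if jaccard(tokens(title), tokens(cluster[0])) >= threshold:
                cluster.append(title)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([title])
    return clusters
```

Word overlap is far weaker than embedding-based similarity, but it keeps the example dependency-free and shows how a similarity threshold decides cluster membership.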

Brief Generation Process

Each topic cluster is transformed into a structured brief containing all information needed for article creation:

    # Example brief structure
    {
      "slug": "android-15-privacy-guide-2025",
      "pillar": "mobile-privacy",
      "title": "Complete Android 15 Privacy Guide: New Features & Settings",
      "primary_keyword": "android 15 privacy",
      "secondary_keywords": ["android privacy settings", "google privacy controls"],
      "search_intent": "howto",
      "search_gap": "No comprehensive guide covering all Android 15 privacy features",
      "outline": [
        {
          "section": "Introduction",
          "summary": "Overview of Android 15 privacy improvements"
        },
        {
          "section": "New Privacy Features",
          "summary": "Detailed walkthrough of new privacy controls"
        },
        {
          "section": "Step-by-Step Setup",
          "summary": "How to configure optimal privacy settings"
        }
      ],
      "sources_hint": ["developer.android.com", "support.google.com"],
      "estimated_length": 2500,
      "difficulty": "beginner"
    }
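A brief in this shape can be sanity-checked before it is posted for review. The sketch below is illustrative only: the choice of required fields and the function name `validate_brief` are assumptions, not part of the documented pipeline.

```python
from typing import Any, Dict, List

# Assumed minimum set of fields; the real pipeline may require more or fewer.
REQUIRED_FIELDS = ("slug", "pillar", "title", "primary_keyword", "search_intent", "outline")


def validate_brief(brief: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the brief looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not brief.get(name)]
    for index, section in enumerate(brief.get("outline") or []):
        # Every outline entry should carry both a heading and a summary.
        if not section.get("section") or not section.get("summary"):
            problems.append(f"outline entry {index} needs both 'section' and 'summary'")
    return problems
```

Returning a list of problems instead of raising on the first one lets an editor see everything that is wrong with a brief in one pass.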

Brief Quality Criteria

Human Approval Workflow

Every brief requires human approval before article generation begins:

GitHub Issues Integration

  1. Brief generated and posted as GitHub Issue with "brief" label
  2. Human editor reviews topic relevance and quality
  3. Editor adds "approved" label to selected briefs
  4. GitHub Action triggered automatically on label addition
  5. Approved brief enters article generation pipeline

    # GitHub Issue Template
    Title: [BRIEF] Complete Android 15 Privacy Guide: New Features & Settings
    Labels: brief, mobile-privacy, android
    Body:
    **Primary Keyword:** android 15 privacy
    **Search Gap:** No comprehensive guide covering all Android 15 privacy features
    **Estimated Length:** 2500 words
    **Difficulty:** Beginner

    **Outline:**
    1. Introduction - Overview of Android 15 privacy improvements
    2. New Privacy Features - Detailed walkthrough of new privacy controls
    3. Step-by-Step Setup - How to configure optimal privacy settings

    **Sources:** developer.android.com, support.google.com

    ---
    To approve this brief for article generation, add the "approved" label.
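The approval gate in the workflow above comes down to a label check. The following sketch shows that check in isolation; the function name is hypothetical, and the real GitHub Action presumably reads the labels from the issue event payload.

```python
from typing import Iterable


def is_approved_brief(labels: Iterable[str]) -> bool:
    """True when an issue carries both the "brief" and "approved" labels."""
    normalized = {label.strip().lower() for label in labels}
    return {"brief", "approved"} <= normalized
```

Requiring both labels means that adding "approved" to an unrelated issue does not start article generation.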

Approval Metrics

The system tracks approval rates to improve brief quality:
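As a rough illustration, an approval rate can be derived from issue labels alone. The sketch below assumes each issue is represented as a list of label names; that input shape and the function name are assumptions made for this example.

```python
from typing import Iterable, List


def approval_rate(issue_labels: Iterable[List[str]]) -> float:
    """Fraction of "brief" issues that also carry the "approved" label."""
    briefs = [set(labels) for labels in issue_labels if "brief" in labels]
    if not briefs:
        return 0.0  # No briefs yet, so there is nothing to measure.
    return sum("approved" in labels for labels in briefs) / len(briefs)
```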

Article Expansion & Generation

Approved briefs are expanded into full articles using large language models:

Generation Process

  1. Section-by-Section: Articles written incrementally for better quality control
  2. Template Adherence: Consistent structure with intro, key takeaways, instructions, FAQ
  3. Style Guidelines: Short paragraphs, bullet lists, clear headings
  4. Citation Integration: Automatic source attribution and link insertion

    # Article generation prompt template
    You are writing a comprehensive privacy guide for technically-minded users.

    Article Details:
    - Title: {title}
    - Primary Keyword: {primary_keyword}
    - Target Length: {target_length} words
    - Audience: {difficulty_level}

    Content Requirements:
    - Short paragraphs (2-3 sentences max)
    - Numbered lists for step-by-step instructions
    - Bullet points for feature lists and benefits
    - Include "Key Takeaways" section near the top
    - Add FAQ section at the end
    - Cite sources using [source: domain.com] format

    Section to write: {section_title}
    Section summary: {section_summary}

    Write this section now:
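Section-by-section generation implies rendering the template once per outline entry. The sketch below uses an abbreviated copy of the template and assumes that the brief's `estimated_length` and `difficulty` fields fill the `{target_length}` and `{difficulty_level}` placeholders; that mapping and the function name are assumptions.

```python
from typing import Any, Dict, List

# Abbreviated version of the prompt template, keeping only the placeholders.
SECTION_PROMPT = (
    "Article Details:\n"
    "- Title: {title}\n"
    "- Primary Keyword: {primary_keyword}\n"
    "- Target Length: {target_length} words\n"
    "- Audience: {difficulty_level}\n\n"
    "Section to write: {section_title}\n"
    "Section summary: {section_summary}\n"
)


def build_section_prompts(brief: Dict[str, Any]) -> List[str]:
    """Render one prompt per outline entry so sections can be generated one at a time."""
    return [
        SECTION_PROMPT.format(
            title=brief["title"],
            primary_keyword=brief["primary_keyword"],
            target_length=brief["estimated_length"],
            difficulty_level=brief["difficulty"],
            section_title=entry["section"],
            section_summary=entry["summary"],
        )
        for entry in brief["outline"]
    ]
```

Each rendered prompt can then be sent to the language model separately, which keeps individual responses short and easier to review.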

Quality Controls

Fact-Checking System

Every article undergoes automated fact-checking using Retrieval-Augmented Generation (RAG):

Knowledge Base

The RAG system maintains a curated database of authoritative privacy sources:

    # Fact-checking process
    1. Extract claims from generated article
    2. Query RAG knowledge base for relevant sources
    3. Compare claims against retrieved passages
    4. Flag unverifiable or contradictory statements
    5. Insert HTML comments for human review

    # Example flagged content
    <!-- VERIFY: This claim about Android 15 permissions could not be verified
    against developer.android.com documentation. Please confirm accuracy. -->
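The flagging steps (4 and 5) can be sketched independently of the retrieval backend. In the example below, `is_supported` is a hypothetical callable standing in for the RAG lookup, and paragraphs are treated as the unit of checking; both are assumptions made for illustration.

```python
from typing import Callable, List

VERIFY_COMMENT = "<!-- VERIFY: claim could not be verified against the knowledge base. -->"


def flag_unverified(paragraphs: List[str], is_supported: Callable[[str], bool]) -> List[str]:
    """Insert a VERIFY comment before each paragraph the checker cannot support."""
    output: List[str] = []
    for paragraph in paragraphs:
        if not is_supported(paragraph):
            output.append(VERIFY_COMMENT)
        output.append(paragraph)
    return output
```

Because the flag is an HTML comment, it stays invisible in rendered Markdown while remaining easy for a reviewer to find with a text search.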

Verification Levels

SEO & Metadata Optimization

Every article is optimized for both traditional search engines and AI systems:

SEO Elements

    # Automated SEO optimization
    - Meta title (50-60 characters)
    - Meta description (150-160 characters)
    - H1, H2, H3 hierarchy with target keywords
    - Alt text for all images
    - Internal linking to related content
    - JSON-LD schema markup (Article, HowTo, FAQPage)
    - OpenGraph and Twitter Card metadata
    - Canonical URLs and redirects
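The two length ranges in this list are easy to check mechanically. The sketch below is one possible check; the function name and the warning wording are hypothetical.

```python
from typing import List


def check_meta_lengths(title: str, description: str) -> List[str]:
    """Return warnings for a meta title or description outside the listed ranges."""
    warnings: List[str] = []
    if not 50 <= len(title) <= 60:
        warnings.append(f"meta title is {len(title)} characters (expected 50-60)")
    if not 150 <= len(description) <= 160:
        warnings.append(f"meta description is {len(description)} characters (expected 150-160)")
    return warnings
```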

Schema Markup Examples

    {
      "@context": "https://schema.org",
      "@type": "HowTo",
      "name": "Complete Android 15 Privacy Guide",
      "description": "Step-by-step guide to configure privacy settings in Android 15",
      "image": "https://example.com/android-privacy-guide.jpg",
      "supply": [
        {
          "@type": "HowToSupply",
          "name": "Android 15 device"
        }
      ],
      "step": [
        {
          "@type": "HowToStep",
          "name": "Open Privacy Settings",
          "text": "Navigate to Settings > Privacy",
          "image": "https://example.com/step1.jpg"
        }
      ]
    }
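Markup like this can be generated from an article's steps instead of being written by hand. The sketch below builds a reduced HowTo object (without images or supplies) using only the standard library; the function name and the (name, text) tuple input are assumptions made for this example.

```python
import json
from typing import List, Tuple


def build_howto_schema(name: str, description: str, steps: List[Tuple[str, str]]) -> str:
    """Serialize a schema.org HowTo object from (step name, step text) pairs."""
    schema = {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "description": description,
        "step": [
            {"@type": "HowToStep", "name": step_name, "text": step_text}
            for step_name, step_text in steps
        ],
    }
    return json.dumps(schema, indent=2)
```

The returned string can be embedded in a `<script type="application/ld+json">` element in the page head.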

AI Optimization

Content structured for AI scraping and summarization:

Analytics & Feedback Loop

Continuous improvement using Cloudflare analytics and performance data:

Metrics Collection

    # Key performance indicators
    - Page views and unique visitors
    - Average time on page
    - Bounce rate and exit rate
    - Search engine impressions and clicks
    - Social media shares and engagement
    - Internal link click-through rates
    - Conversion metrics (if applicable)

Performance Thresholds

    # config/thresholds.yml
    seo:
      min_lighthouse_score: 90
      min_word_count: 1200
      max_external_links: 20
    feedback:
      min_pageviews: 100       # Monthly threshold
      min_time_on_page: 45.0   # Seconds
      max_bounce_rate: 70      # Percentage
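To show how the feedback thresholds might be applied, the sketch below compares one page's metrics against them. The metric key names (`pageviews`, `time_on_page`, `bounce_rate`), the issue labels, and the function name are assumptions; only the threshold values come from the configuration above.

```python
from typing import Dict, List

# Values copied from the feedback section of config/thresholds.yml.
FEEDBACK_THRESHOLDS = {
    "min_pageviews": 100,      # monthly
    "min_time_on_page": 45.0,  # seconds
    "max_bounce_rate": 70,     # percentage
}


def find_issues(metrics: Dict[str, float], thresholds: Dict[str, float] = FEEDBACK_THRESHOLDS) -> List[str]:
    """List which feedback thresholds a page misses."""
    issues: List[str] = []
    if metrics["pageviews"] < thresholds["min_pageviews"]:
        issues.append("low_pageviews")
    if metrics["time_on_page"] < thresholds["min_time_on_page"]:
        issues.append("low_time_on_page")
    if metrics["bounce_rate"] > thresholds["max_bounce_rate"]:
        issues.append("high_bounce_rate")
    return issues
```

A non-empty result could feed the improvement task generation step, for example by opening one task per issue label.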

Automated Improvements

The system automatically identifies optimization opportunities:

Configuration Management

The system uses YAML configuration files for easy customization:

Sources Configuration

    # config/sources.yml
    feeds:
      - name: eff
        url: https://www.eff.org/rss/updates.xml
        weight: 1.0
        category: advocacy
      - name: mozilla
        url: https://blog.mozilla.org/en/feed/
        weight: 0.8
        category: browser
    reddit:
      subreddits:
        - privacy
        - security
        - privacytoolsio
      min_score: 50
    google_trends:
      categories:
        - "Internet & Telecom/Internet/Web Services"
        - "Computers & Electronics/Software/Internet Software"

Quality Thresholds

    # config/thresholds.yml
    content:
      min_word_count: 1200
      max_word_count: 5000
      flesch_reading_ease:
        min: 55
        max: 75
    seo:
      min_lighthouse_score: 90
      max_external_links: 20
      min_internal_links: 2
    ml:
      clustering:
        min_cluster_size: 3
        eps: 0.3
        similarity_threshold: 0.75

Deployment & Setup

Complete guide to deploying PHT Strategy 5 on GitHub Actions and Cloudflare Pages:

Prerequisites

Environment Variables

    # Required GitHub Secrets
    OPENAI_API_KEY=sk-...
    GITHUB_TOKEN=ghp_...
    CLOUDFLARE_API_TOKEN=...

    # Optional
    ANTHROPIC_API_KEY=...
    REDDIT_CLIENT_ID=...
    REDDIT_CLIENT_SECRET=...
    GOOGLE_TRENDS_API_KEY=...
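A start-up check can report missing required variables before any API call is made. The sketch below is one way to do this; the function name is hypothetical, and the list of required names is taken from the block above.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("OPENAI_API_KEY", "GITHUB_TOKEN", "CLOUDFLARE_API_TOKEN")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Accepting the environment as a parameter makes the check easy to test with a plain dictionary.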

Local Development Setup

    # Clone and setup
    git clone https://github.com/michaeljensen/pht-strategy5.git
    cd pht-strategy5
    python -m venv venv
    source venv/bin/activate       # Linux/Mac
    # or: venv\Scripts\activate    # Windows

    # Install dependencies
    pip install -r requirements.txt

    # Configure environment
    cp .env.example .env
    # Edit .env with your API keys

    # Test the pipeline
    python src/briefs/fetch_feeds.py
    python src/briefs/cluster_topics.py

Development Guide

Guidelines for extending and customizing the system:

Code Structure

    # Example module structure
    from dataclasses import dataclass
    from datetime import datetime
    from pathlib import Path
    from typing import List

    import yaml


    @dataclass
    class FeedItem:
        """Normalized representation of a topic candidate."""

        source: str
        title: str
        summary: str
        url: str
        published_at: datetime


    def load_sources(config_path: Path) -> List[str]:
        """Load RSS source URLs from the YAML configuration."""
        with config_path.open("r") as f:
            data = yaml.safe_load(f) or {}
        # Each feed entry is expected to carry a "url" key (see config/sources.yml).
        return [entry["url"] for entry in data.get("feeds", [])]

Testing Strategy

API Reference

Key functions and classes for system extension:

Core Classes

    class FeedItem:
        """Represents a discovered topic candidate."""
        source: str              # RSS feed or API source
        title: str               # Original headline
        summary: str             # Brief description
        url: str                 # Source URL
        published_at: datetime

    class Brief:
        """Structured brief for article generation."""
        slug: str                # URL-friendly identifier
        pillar: str              # Content category
        title: str               # Proposed article title
        primary_keyword: str
        secondary_keywords: List[str]
        search_intent: str       # "howto", "explainer", "review"
        outline: List[Section]
        sources_hint: List[str]

    class Article:
        """Generated article with metadata."""
        brief: Brief
        content: str             # Markdown content
        metadata: dict           # SEO and schema data
        fact_check_results: List[FactCheck]

Key Functions

    # Topic Discovery
    def fetch_sources(sources: List[str]) -> List[FeedItem]
    def cluster_topics(items: List[FeedItem]) -> List[TopicCluster]
    def analyze_search_gap(cluster: TopicCluster) -> SearchGapAnalysis

    # Brief Generation
    def generate_brief(cluster: TopicCluster) -> Brief
    def create_github_issue(brief: Brief) -> Issue

    # Article Generation
    def expand_article(brief: Brief) -> Article
    def fact_check_article(article: Article) -> List[FactCheck]
    def optimize_seo(article: Article) -> Article

    # Analytics
    def ingest_cf_logs(log_path: str) -> List[AnalyticsEvent]
    def score_content(article: Article, events: List[AnalyticsEvent]) -> ContentScore
    def suggest_improvements(score: ContentScore) -> List[Improvement]

Need Help?

Get support, report issues, or contribute to the project.
