Documentation - PHT Strategy 5

System Architecture

PHT Strategy 5 is built as a modular pipeline with three main components (briefs, drafts, and feedback) plus shared utilities:

                
src/
├── briefs/           # Topic discovery and brief generation
│   ├── fetch_feeds.py      # RSS/API data collection
│   ├── cluster_topics.py   # ML-based topic clustering
│   └── write_brief.py      # Structured brief creation
├── drafts/           # Article generation and optimization  
│   ├── expand_article.py   # LLM-based article writing
│   ├── fact_check.py       # RAG-based verification
│   ├── seo_enrich.py       # Metadata and schema generation
│   └── image_prompts.py    # Visual content suggestions
├── feedback/         # Analytics and improvement
│   ├── ingest_cf_logs.py   # Cloudflare log processing
│   ├── score_topics.py     # Performance analysis
│   └── open_tasks.py       # Improvement task generation
└── utils/            # Shared utilities
    ├── gh.py              # GitHub API integration
    ├── rag_store.py       # Knowledge base management
    └── prompts/           # LLM prompt templates
                
            

Data Flow

Collection: RSS feeds and APIs → Raw topic data
Processing: Topic clustering → Content gap analysis
Generation: Structured briefs → GitHub Issues
Approval: Human review → Approved topics
Expansion: LLM generation → Full articles
Verification: Fact-checking → Quality assurance
Publishing: SEO optimization → Pull requests
Feedback: Analytics → Improvement suggestions

Topic Discovery & Clustering

The system continuously monitors multiple sources to identify trending privacy topics:

Data Sources

                
# config/sources.yml
feeds:
  - name: eff
    url: https://www.eff.org/rss/updates.xml
  - name: mozilla  
    url: https://blog.mozilla.org/en/feed/
  - name: proton
    url: https://proton.me/blog/feed

# Additional sources (configurable):
# - Reddit communities (r/privacy, r/security)
# - Google Trends privacy categories
# - Hacker News privacy discussions
# - Government policy RSS feeds
                
            
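As a rough illustration of what `fetch_feeds.py` has to do with each feed in `sources.yml`, the sketch below parses an RSS 2.0 document with the standard library and normalizes each `<item>` into a record with the same fields as the `FeedItem` class described later. It assumes plain RSS 2.0 (`channel/item` with `title`, `link`, `description`, `pubDate`); Atom feeds would need separate handling, and the real module may use a dedicated feed library instead. The `parse_rss` name is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class FeedItem:
    """Normalized topic candidate (same fields as FeedItem in this guide)."""
    source: str
    title: str
    summary: str
    url: str
    published_at: Optional[datetime]


def parse_rss(source: str, xml_text: str) -> List[FeedItem]:
    """Hypothetical helper: turn an RSS 2.0 document into FeedItem records."""
    root = ET.fromstring(xml_text)
    items: List[FeedItem] = []
    for item in root.iter("item"):  # RSS 2.0 places entries at channel/item
        pub = item.findtext("pubDate")
        items.append(
            FeedItem(
                source=source,
                title=(item.findtext("title") or "").strip(),
                summary=(item.findtext("description") or "").strip(),
                url=(item.findtext("link") or "").strip(),
                # pubDate uses RFC 822 dates, which email.utils can parse
                published_at=parsedate_to_datetime(pub) if pub else None,
            )
        )
    return items


SAMPLE = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Android 15 privacy features</title>
<link>https://example.com/a15</link>
<description>New permission controls.</description>
<pubDate>Tue, 15 Oct 2024 09:00:00 GMT</pubDate></item>
</channel></rss>"""

for entry in parse_rss("demo", SAMPLE):
    print(entry.title, entry.published_at.isoformat())
```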

Clustering Algorithm

Topics are clustered using machine learning to identify related content:

Embeddings: Text content converted to vector representations
Similarity: Cosine similarity between topic vectors
Clustering: DBSCAN or HDBSCAN for automatic cluster detection
Gap Analysis: Compare clusters to existing SERP results

                
# Example clustering output
{
  "cluster_id": "android-privacy-2025",
  "topics": [
    "Android 15 privacy features",
    "Google Play privacy changes",
    "Android app permissions update"
  ],
  "search_gap": "Missing comprehensive guide for new Android 15 privacy settings",
  "confidence": 0.87
}
                
            
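The similarity and clustering steps can be illustrated with a small, dependency-free DBSCAN over cosine distance (1 - cosine similarity). This is a sketch, not the project's implementation: it assumes embeddings are already computed, uses toy 2-D vectors in place of real embeddings, and takes `eps=0.3` and `min_samples=3` from the `ml.clustering` values in `config/thresholds.yml` shown later in this guide (reading `min_cluster_size` as DBSCAN's `min_samples` is an assumption). A library implementation such as scikit-learn's DBSCAN with a cosine metric, or HDBSCAN, would be the usual choice in practice.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dbscan_cosine(vectors: List[Sequence[float]], eps: float = 0.3, min_samples: int = 3) -> List[int]:
    """Label each vector with a cluster id, or NOISE (-1).

    Distance is 1 - cosine similarity; a point is a core point when at least
    `min_samples` points (itself included) lie within `eps` of it.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if 1.0 - cosine_similarity(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None = not visited yet
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


toy = [
    [1.0, 0.05], [0.98, 0.1], [0.95, 0.12],  # e.g. Android privacy headlines
    [0.05, 1.0], [0.1, 0.97], [0.12, 0.95],  # e.g. browser tracking headlines
    [-1.0, 0.0],                             # unrelated outlier
]
print(dbscan_cosine(toy))  # [0, 0, 0, 1, 1, 1, -1]
```

Noise points (label -1) are headlines that do not belong to any dense group and would simply not produce a brief.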

Brief Generation Process

Each topic cluster is transformed into a structured brief containing all information needed for article creation:

                
# Example brief structure
{
  "slug": "android-15-privacy-guide-2025",
  "pillar": "mobile-privacy",
  "title": "Complete Android 15 Privacy Guide: New Features & Settings",
  "primary_keyword": "android 15 privacy",
  "secondary_keywords": ["android privacy settings", "google privacy controls"],
  "search_intent": "howto",
  "search_gap": "No comprehensive guide covering all Android 15 privacy features",
  "outline": [
    {
      "section": "Introduction",
      "summary": "Overview of Android 15 privacy improvements"
    },
    {
      "section": "New Privacy Features", 
      "summary": "Detailed walkthrough of new privacy controls"
    },
    {
      "section": "Step-by-Step Setup",
      "summary": "How to configure optimal privacy settings"
    }
  ],
  "sources_hint": ["developer.android.com", "support.google.com"],
  "estimated_length": 2500,
  "difficulty": "beginner"
}
                
            

Brief Quality Criteria

Clear search intent identification (HowTo, Explainer, Review)
Quantified search gap with competitive analysis
Realistic scope and target word count
Authoritative source suggestions for fact-checking
SEO keyword analysis and difficulty assessment
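Several of these criteria can be checked mechanically before a brief is posted. The sketch below validates a brief dictionary shaped like the example above; the allowed intents come from the criteria list, the word-count bounds from the `content` section of `config/thresholds.yml`, and the function name `validate_brief` and its exact rules are hypothetical.

```python
from typing import Any, Dict, List

ALLOWED_INTENTS = {"howto", "explainer", "review"}
MIN_WORDS, MAX_WORDS = 1200, 5000  # content.min/max_word_count in config/thresholds.yml
REQUIRED_FIELDS = (
    "slug", "title", "primary_keyword", "search_intent",
    "search_gap", "outline", "sources_hint",
)


def validate_brief(brief: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the brief passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not brief.get(name)]
    intent = brief.get("search_intent")
    if intent and intent not in ALLOWED_INTENTS:
        problems.append(f"unknown search_intent: {intent!r}")
    length = brief.get("estimated_length", 0)
    if not MIN_WORDS <= length <= MAX_WORDS:
        problems.append(f"estimated_length {length} outside {MIN_WORDS}-{MAX_WORDS}")
    for i, section in enumerate(brief.get("outline") or []):
        if not section.get("section") or not section.get("summary"):
            problems.append(f"outline entry {i} needs both 'section' and 'summary'")
    return problems


example = {
    "slug": "android-15-privacy-guide-2025",
    "title": "Complete Android 15 Privacy Guide: New Features & Settings",
    "primary_keyword": "android 15 privacy",
    "search_intent": "howto",
    "search_gap": "No comprehensive guide covering all Android 15 privacy features",
    "outline": [{"section": "Introduction", "summary": "Overview of Android 15 privacy improvements"}],
    "sources_hint": ["developer.android.com", "support.google.com"],
    "estimated_length": 2500,
}
print(validate_brief(example))  # []
```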

Human Approval Workflow

Every brief requires human approval before article generation begins:

GitHub Issues Integration

Brief generated and posted as GitHub Issue with "brief" label
Human editor reviews topic relevance and quality
Editor adds "approved" label to selected briefs
GitHub Action triggered automatically on label addition
Approved brief enters article generation pipeline
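The label trigger can be sketched as a check on the `issues` webhook payload GitHub sends when a label is added: `action` is `"labeled"` and `label.name` is the label just added. Inside a GitHub Actions run, the same payload is available as a JSON file whose path is in the `GITHUB_EVENT_PATH` environment variable. The helper names and the rule that the issue must also carry the `brief` label are assumptions for illustration.

```python
import json
import os
from typing import Any, Dict


def is_approval_event(event: Dict[str, Any]) -> bool:
    """True when an issue carrying the "brief" label has just been labeled "approved"."""
    if event.get("action") != "labeled":
        return False
    if event.get("label", {}).get("name") != "approved":
        return False
    labels = {label["name"] for label in event.get("issue", {}).get("labels", [])}
    return "brief" in labels


def load_event() -> Dict[str, Any]:
    """Read the webhook payload that GitHub Actions writes to GITHUB_EVENT_PATH."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        return json.load(fh)


sample = {
    "action": "labeled",
    "label": {"name": "approved"},
    "issue": {"labels": [{"name": "brief"}, {"name": "approved"}]},
}
print(is_approval_event(sample))  # True
```

Filtering in code like this complements, rather than replaces, an `if:` condition on the workflow job.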

                
# GitHub Issue Template
Title: [BRIEF] Complete Android 15 Privacy Guide: New Features & Settings

Labels: brief, mobile-privacy, android

Body:
**Primary Keyword:** android 15 privacy
**Search Gap:** No comprehensive guide covering all Android 15 privacy features
**Estimated Length:** 2500 words
**Difficulty:** Beginner

**Outline:**
1. Introduction - Overview of Android 15 privacy improvements
2. New Privacy Features - Detailed walkthrough of new privacy controls  
3. Step-by-Step Setup - How to configure optimal privacy settings

**Sources:** developer.android.com, support.google.com

---
To approve this brief for article generation, add the "approved" label.
                
            
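Rendering this template from a brief is plain string formatting. A minimal sketch, assuming the brief is a dictionary with the fields of the example brief; `render_issue` is a hypothetical name, and creating the issue itself (for example through `utils/gh.py`) is not shown.

```python
from typing import Any, Dict, List, Tuple


def render_issue(brief: Dict[str, Any]) -> Tuple[str, str, List[str]]:
    """Build (title, body, labels) for a brief issue, following the issue template."""
    title = f"[BRIEF] {brief['title']}"
    outline = "\n".join(
        f"{i}. {entry['section']} - {entry['summary']}"
        for i, entry in enumerate(brief["outline"], start=1)
    )
    body = "\n".join([
        f"**Primary Keyword:** {brief['primary_keyword']}",
        f"**Search Gap:** {brief['search_gap']}",
        f"**Estimated Length:** {brief['estimated_length']} words",
        f"**Difficulty:** {brief['difficulty'].capitalize()}",
        "",
        "**Outline:**",
        outline,
        "",
        f"**Sources:** {', '.join(brief['sources_hint'])}",
        "",
        "---",
        'To approve this brief for article generation, add the "approved" label.',
    ])
    # Topic tags such as "android" are not fields of the brief, so only
    # the "brief" label and the pillar are derived here.
    return title, body, ["brief", brief["pillar"]]


example_brief = {
    "title": "Complete Android 15 Privacy Guide: New Features & Settings",
    "pillar": "mobile-privacy",
    "primary_keyword": "android 15 privacy",
    "search_gap": "No comprehensive guide covering all Android 15 privacy features",
    "estimated_length": 2500,
    "difficulty": "beginner",
    "outline": [
        {"section": "Introduction", "summary": "Overview of Android 15 privacy improvements"},
        {"section": "Step-by-Step Setup", "summary": "How to configure optimal privacy settings"},
    ],
    "sources_hint": ["developer.android.com", "support.google.com"],
}
title, body, labels = render_issue(example_brief)
print(title)
print(body)
```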

Approval Metrics

The system tracks approval rates to improve brief quality:

Brief approval percentage by topic category
Time from brief creation to approval decision
Editor feedback patterns and preferences
Correlation between brief quality scores and approval rates
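The first two metrics reduce to simple aggregation over brief decisions. A sketch assuming each decision record holds the brief's pillar, its creation and decision timestamps, and whether it was approved; the `BriefDecision` shape and `approval_stats` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Dict, List


@dataclass
class BriefDecision:
    pillar: str
    created_at: datetime
    decided_at: datetime
    approved: bool


def approval_stats(decisions: List[BriefDecision]) -> Dict[str, Dict[str, float]]:
    """Approval percentage and median hours-to-decision per pillar."""
    by_pillar: Dict[str, List[BriefDecision]] = defaultdict(list)
    for d in decisions:
        by_pillar[d.pillar].append(d)
    stats: Dict[str, Dict[str, float]] = {}
    for pillar, items in by_pillar.items():
        approved = sum(1 for d in items if d.approved)
        hours = [(d.decided_at - d.created_at).total_seconds() / 3600 for d in items]
        stats[pillar] = {
            "approval_pct": 100.0 * approved / len(items),
            "median_hours_to_decision": median(hours),
        }
    return stats


t0 = datetime(2025, 1, 6, 9, 0)
sample = [
    BriefDecision("mobile-privacy", t0, datetime(2025, 1, 6, 13, 0), True),
    BriefDecision("mobile-privacy", t0, datetime(2025, 1, 7, 9, 0), False),
    BriefDecision("browser", t0, datetime(2025, 1, 6, 10, 0), True),
]
print(approval_stats(sample))
```

The median is used rather than the mean so that a few briefs left open for weeks do not dominate the time-to-decision figure.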

Article Expansion & Generation

Approved briefs are expanded into full articles using large language models:

Generation Process

Section-by-Section: Articles written incrementally for better quality control
Template Adherence: Consistent structure with intro, key takeaways, instructions, FAQ
Style Guidelines: Short paragraphs, bullet lists, clear headings
Citation Integration: Automatic source attribution and link insertion

                
# Article generation prompt template
You are writing a comprehensive privacy guide for technically minded users.

Article Details:
- Title: {title}
- Primary Keyword: {primary_keyword}  
- Target Length: {target_length} words
- Audience: {difficulty_level}

Content Requirements:
- Short paragraphs (2-3 sentences max)
- Numbered lists for step-by-step instructions
- Bullet points for feature lists and benefits
- Include "Key Takeaways" section near the top
- Add FAQ section at the end
- Cite sources using [source: domain.com] format

Section to write: {section_title}
Section summary: {section_summary}

Write this section now:
                
            
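Section-by-section generation amounts to filling this template once per outline entry. A minimal sketch using `str.format`, assuming the brief's `estimated_length` and `difficulty` fields feed the `{target_length}` and `{difficulty_level}` placeholders (that mapping is an assumption). The template is abbreviated here (the content requirements are omitted), and the LLM call itself is not shown.

```python
from typing import Any, Dict, List

# Abbreviated copy of the prompt template; placeholder names match the original.
TEMPLATE = """You are writing a comprehensive privacy guide for technically minded users.

Article Details:
- Title: {title}
- Primary Keyword: {primary_keyword}
- Target Length: {target_length} words
- Audience: {difficulty_level}

Section to write: {section_title}
Section summary: {section_summary}

Write this section now:"""


def build_section_prompts(brief: Dict[str, Any]) -> List[str]:
    """One prompt per outline entry, so each section can be generated and checked separately."""
    return [
        TEMPLATE.format(
            title=brief["title"],
            primary_keyword=brief["primary_keyword"],
            target_length=brief["estimated_length"],  # assumed mapping
            difficulty_level=brief["difficulty"],     # assumed mapping
            section_title=entry["section"],
            section_summary=entry["summary"],
        )
        for entry in brief["outline"]
    ]


example_brief = {
    "title": "Complete Android 15 Privacy Guide: New Features & Settings",
    "primary_keyword": "android 15 privacy",
    "estimated_length": 2500,
    "difficulty": "beginner",
    "outline": [
        {"section": "Introduction", "summary": "Overview of Android 15 privacy improvements"},
        {"section": "New Privacy Features", "summary": "Detailed walkthrough of new privacy controls"},
    ],
}
for prompt in build_section_prompts(example_brief):
    print(prompt, end="\n\n")
```

Because `str.format` treats braces as placeholders, any literal `{` or `}` added to the template would need to be doubled.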

Quality Controls

Length Validation: Word count within the configured range (min_word_count to max_word_count)
Readability: Flesch Reading Ease score 55-75
Structure: Proper heading hierarchy and formatting
Links: At least min_internal_links internal links and no more than max_external_links external citations
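The readability band refers to the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sketch below applies it together with the word-count bounds from `config/thresholds.yml`. The syllable counter is a rough vowel-group heuristic (an assumption; dedicated readability libraries use better rules), so scores will differ slightly from other tools, and `check_quality` is a hypothetical name.

```python
import re
from typing import Dict

WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Heuristic: count vowel groups, drop a trailing silent 'e', minimum one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def check_quality(
    text: str,
    min_words: int = 1200,
    max_words: int = 5000,
    fre_min: float = 55.0,
    fre_max: float = 75.0,
) -> Dict[str, bool]:
    """Apply the length and readability thresholds from config/thresholds.yml."""
    n_words = len(WORD_RE.findall(text))
    score = flesch_reading_ease(text)
    return {
        "length_ok": min_words <= n_words <= max_words,
        "readability_ok": fre_min <= score <= fre_max,
    }


print(round(flesch_reading_ease("The cat sat."), 2))  # 119.19
```

Very short, monosyllabic text scores far above 75, which is why the band has an upper bound as well as a lower one.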

Fact-Checking System

Every article undergoes automated fact-checking using Retrieval-Augmented Generation (RAG):

Knowledge Base

The RAG system maintains a curated database of authoritative privacy sources:

EFF: Blog posts and privacy guides
Mozilla: Developer documentation and privacy policies
Standards: IETF RFCs and W3C specifications
Vendors: Official Apple, Google, Microsoft documentation
Legal: GDPR, CCPA, and other privacy regulations

                
# Fact-checking process
1. Extract claims from generated article
2. Query RAG knowledge base for relevant sources  
3. Compare claims against retrieved passages
4. Flag unverifiable or contradictory statements
5. Insert HTML comments for human review

# Example flagged content
<!-- VERIFY: This claim about Android 15 permissions 
     could not be verified against developer.android.com 
     documentation. Please confirm accuracy. -->
                
            

Verification Levels

Verified: Direct match with authoritative source
Supported: Consistent with similar authoritative content
Unverified: No supporting evidence found - flagged for review
Contradicted: Conflicts with known authoritative source
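The compare-and-flag steps can be sketched without an embedding model by measuring how many of a claim's tokens appear in the best-matching knowledge-base passage, then mapping that share onto the verification levels above. This is a deliberately simple stand-in: a real RAG check would retrieve with embeddings and need an entailment model or LLM judgment to detect the Contradicted case, which token overlap cannot do. The 0.6 and 0.3 thresholds, the passage format, and the function names are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")
VERIFIED_AT = 0.6   # assumption: share of claim tokens found in one passage
SUPPORTED_AT = 0.3  # assumption


def tokens(text: str) -> Set[str]:
    return set(TOKEN_RE.findall(text.lower()))


def best_match(claim: str, passages: Dict[str, str]) -> Tuple[str, float]:
    """Return (source, coverage) for the passage containing the largest share of the claim's tokens."""
    claim_tokens = tokens(claim)
    best_source, best_score = "", 0.0
    if not claim_tokens:
        return best_source, best_score
    for source, passage in passages.items():
        score = len(claim_tokens & tokens(passage)) / len(claim_tokens)
        if score > best_score:
            best_source, best_score = source, score
    return best_source, best_score


def check_claim(claim: str, passages: Dict[str, str]) -> Tuple[str, str]:
    """Return (level, annotation); only unverified claims get a VERIFY comment."""
    _, score = best_match(claim, passages)
    if score >= VERIFIED_AT:
        return "Verified", ""
    if score >= SUPPORTED_AT:
        return "Supported", ""
    # Token overlap cannot detect contradictions, so "Contradicted" is never returned here.
    annotation = (
        "<!-- VERIFY: This claim could not be verified against the knowledge base. "
        f"Please confirm accuracy. Claim: {claim} -->"
    )
    return "Unverified", annotation


# Toy passage standing in for retrieved documentation.
kb = {"developer.android.com": "Android 15 adds private space, a separate profile for sensitive apps."}
for claim in ("Android 15 adds private space for sensitive apps.",
              "Android 15 removes all ad tracking by default."):
    print(check_claim(claim, kb))
```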

SEO & Metadata Optimization

Every article is optimized for both traditional search engines and AI systems:

SEO Elements

                
# Automated SEO optimization
- Meta title (50-60 characters)
- Meta description (150-160 characters)  
- H1, H2, H3 hierarchy with target keywords
- Alt text for all images
- Internal linking to related content
- JSON-LD schema markup (Article, HowTo, FAQPage)
- OpenGraph and Twitter Card metadata
- Canonical URLs and redirects
                
            
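The character targets for the meta title and description are easy to enforce before publishing. A minimal sketch with the ranges taken from the list above; treating them as hard limits, counting characters with `len()`, and the name `check_meta` are assumptions.

```python
from typing import List

TITLE_RANGE = (50, 60)          # meta title, characters
DESCRIPTION_RANGE = (150, 160)  # meta description, characters


def check_meta(title: str, description: str) -> List[str]:
    """Return warnings for meta fields outside their target length ranges."""
    warnings = []
    for label, value, (low, high) in (
        ("meta title", title, TITLE_RANGE),
        ("meta description", description, DESCRIPTION_RANGE),
    ):
        length = len(value.strip())
        if not low <= length <= high:
            warnings.append(f"{label} is {length} chars; target {low}-{high}")
    return warnings


title = "Complete Android 15 Privacy Guide: New Features & Settings"
print(len(title), check_meta(title, "Too short."))
```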

Schema Markup Examples

                
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "Complete Android 15 Privacy Guide",
  "description": "Step-by-step guide to configure privacy settings in Android 15",
  "image": "https://example.com/android-privacy-guide.jpg",
  "supply": [
    {
      "@type": "HowToSupply", 
      "name": "Android 15 device"
    }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "name": "Open Privacy Settings",
      "text": "Navigate to Settings > Privacy",
      "image": "https://example.com/step1.jpg"
    }
  ]
}
                
            

AI Optimization

Content structured for AI scraping and summarization:

Key Takeaways: Prominent summary section for AI extraction
FAQ Format: Question-answer pairs for voice search
Structured Data: Tables and lists for data extraction
Citation Format: Clear source attribution for AI verification
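FAQ sections can also be exposed to machines as `FAQPage` structured data, one of the schema types listed earlier. A sketch that builds the JSON-LD from question/answer pairs with the standard `json` module; `faq_jsonld` is a hypothetical helper and the sample pair is placeholder text.

```python
import json
from typing import List, Tuple


def faq_jsonld(pairs: List[Tuple[str, str]]) -> str:
    """Serialize question/answer pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


print(faq_jsonld([("What does this guide cover?", "Privacy settings introduced in Android 15.")]))
```

The output is meant to be embedded in the page inside a `<script type="application/ld+json">` element, alongside the visible FAQ section it describes.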

Analytics & Feedback Loop

Continuous improvement using Cloudflare analytics and performance data:

Metrics Collection

                
# Key performance indicators
- Page views and unique visitors
- Average time on page  
- Bounce rate and exit rate
- Search engine impressions and clicks
- Social media shares and engagement
- Internal link click-through rates
- Conversion metrics (if applicable)
                
            

Performance Thresholds

                
# config/thresholds.yml
seo:
  min_lighthouse_score: 90
  min_word_count: 1200
  max_external_links: 20

feedback:
  min_pageviews: 100        # Monthly threshold
  min_time_on_page: 45.0    # Seconds
  max_bounce_rate: 70       # Percentage
                
            
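A minimal sketch of the pageview check, assuming Cloudflare HTTP request logs exported as newline-delimited JSON that include the `ClientRequestPath` and `EdgeResponseStatus` fields (confirm the field names against your own export). The static-asset filter and function names are hypothetical, and time on page and bounce rate are not covered here.

```python
import json
from collections import Counter
from typing import Iterable, List, Mapping

MIN_PAGEVIEWS = 100  # feedback.min_pageviews (monthly) in config/thresholds.yml
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".ico", ".xml")


def count_pageviews(lines: Iterable[str]) -> Counter:
    """Count successful page requests per path from NDJSON log lines."""
    views: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("EdgeResponseStatus") != 200:
            continue
        path = record.get("ClientRequestPath", "")
        if path.endswith(ASSET_SUFFIXES):  # crude stand-in for real page detection
            continue
        views[path] += 1
    return views


def underperforming(views: Mapping[str, int], known_pages: List[str], minimum: int = MIN_PAGEVIEWS) -> List[str]:
    """Pages below the monthly threshold, including pages with no views at all."""
    return sorted(p for p in known_pages if views.get(p, 0) < minimum)


sample_log = [
    '{"ClientRequestPath": "/android-15-privacy-guide-2025/", "EdgeResponseStatus": 200}',
    '{"ClientRequestPath": "/styles.css", "EdgeResponseStatus": 200}',
    '{"ClientRequestPath": "/old-page/", "EdgeResponseStatus": 404}',
]
views = count_pageviews(sample_log)
print(views, underperforming(views, ["/android-15-privacy-guide-2025/", "/vpn-basics/"], minimum=1))
```

Passing the list of known pages matters: a page with zero requests never appears in the logs, so it would be missed if only logged paths were checked.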

Automated Improvements

The system automatically identifies optimization opportunities:

Expand: High-performing content gets additional sections
Update: Outdated content flagged for refresh
Merge: Similar low-performing content consolidated
Optimize: SEO improvements for underperforming pages
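How one of these four actions is chosen is not specified here; the sketch below shows one plausible rule set. Only the pageview and bounce-rate thresholds come from `config/thresholds.yml`; the staleness cutoff, the "high-performing" multiplier, the similarity flag, and the order in which rules are checked are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PAGEVIEWS = 100      # feedback.min_pageviews
MAX_BOUNCE_RATE = 70.0   # feedback.max_bounce_rate (percent)
STALE_AFTER_DAYS = 365   # assumption: refresh content older than a year
HIGH_TRAFFIC_FACTOR = 5  # assumption: "high-performing" = 5x the minimum


@dataclass
class PageStats:
    monthly_pageviews: int
    bounce_rate: float       # percent
    age_days: int
    has_similar_page: bool   # e.g. flagged by the topic-clustering step


def suggest_action(stats: PageStats) -> Optional[str]:
    """Return "update", "merge", "optimize", "expand", or None (no action)."""
    if stats.age_days > STALE_AFTER_DAYS:
        return "update"
    if stats.monthly_pageviews < MIN_PAGEVIEWS and stats.has_similar_page:
        return "merge"
    if stats.monthly_pageviews < MIN_PAGEVIEWS or stats.bounce_rate > MAX_BOUNCE_RATE:
        return "optimize"
    if stats.monthly_pageviews >= HIGH_TRAFFIC_FACTOR * MIN_PAGEVIEWS:
        return "expand"
    return None


print(suggest_action(PageStats(800, 40.0, 30, False)))  # expand
```

Checking staleness first reflects a judgment that outdated privacy advice is worth fixing before anything else; other orderings are equally defensible.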

Configuration Management

The system uses YAML configuration files for easy customization:

Sources Configuration

                
# config/sources.yml
feeds:
  - name: eff
    url: https://www.eff.org/rss/updates.xml
    weight: 1.0
    category: advocacy
  - name: mozilla
    url: https://blog.mozilla.org/en/feed/ 
    weight: 0.8
    category: browser

reddit:
  subreddits:
    - privacy
    - security
    - privacytoolsio
  min_score: 50

google_trends:
  categories:
    - "Internet & Telecom/Internet/Web Services"
    - "Computers & Electronics/Software/Internet Software"
                
            
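The `weight` and `min_score` settings suggest a ranking and filtering step before clustering. How the project actually applies them is not documented here, so the following is only one plausible interpretation: Reddit posts below `min_score` are dropped, and each remaining candidate is ranked by its source weight multiplied by a recency decay. The 7-day half-life, the default weight of 1.0, and the `Candidate` shape are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

HALF_LIFE_DAYS = 7.0  # assumption: relevance halves every week


@dataclass
class Candidate:
    source: str
    title: str
    published_at: datetime
    reddit_score: Optional[int] = None  # only set for Reddit posts


def rank_candidates(
    items: List[Candidate],
    weights: Dict[str, float],
    reddit_min_score: int = 50,
    now: Optional[datetime] = None,
) -> List[Candidate]:
    """Drop low-score Reddit posts, then sort by weight x recency decay (highest first)."""
    now = now or datetime.now(timezone.utc)
    kept = [c for c in items if c.reddit_score is None or c.reddit_score >= reddit_min_score]

    def priority(c: Candidate) -> float:
        age_days = (now - c.published_at).total_seconds() / 86400
        return weights.get(c.source, 1.0) * 0.5 ** (age_days / HALF_LIFE_DAYS)

    return sorted(kept, key=priority, reverse=True)


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
items = [
    Candidate("mozilla", "Firefox update", now - timedelta(days=1)),
    Candidate("eff", "EFF post", now - timedelta(days=7)),
    Candidate("reddit", "Low-score post", now, reddit_score=10),
    Candidate("reddit", "Popular post", now - timedelta(days=14), reddit_score=120),
]
ranked = rank_candidates(items, {"eff": 1.0, "mozilla": 0.8}, now=now)
print([c.title for c in ranked])
```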

Quality Thresholds

                
# config/thresholds.yml
content:
  min_word_count: 1200
  max_word_count: 5000
  flesch_reading_ease:
    min: 55
    max: 75
  
seo:
  min_lighthouse_score: 90
  max_external_links: 20
  min_internal_links: 2

ml:
  clustering:
    min_cluster_size: 3
    eps: 0.3
  similarity_threshold: 0.75
                
            

Deployment & Setup

Complete guide to deploying PHT Strategy 5 on GitHub Actions and Cloudflare Pages:

Prerequisites

GitHub repository with Actions enabled
Cloudflare account with Pages access
OpenAI or Anthropic API key (or local Ollama setup)
Python 3.11+ for local development

Environment Variables

                
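# Required GitHub Secrets
OPENAI_API_KEY=sk-...        # or ANTHROPIC_API_KEY / a local Ollama setup
CLOUDFLARE_API_TOKEN=...

# GitHub Actions provides GITHUB_TOKEN automatically (secret names cannot
# start with GITHUB_); set it yourself only for local runs
GITHUB_TOKEN=ghp_...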

# Optional
ANTHROPIC_API_KEY=...
REDDIT_CLIENT_ID=...
REDDIT_CLIENT_SECRET=...
GOOGLE_TRENDS_API_KEY=...
                
            
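Since the prerequisites allow OpenAI, Anthropic, or a local Ollama setup, a startup check can pick whichever provider is configured and fail early when required settings are missing. A sketch under assumptions: the preference order, using `OLLAMA_HOST` as the signal for a local model, and the function name are illustrative, not part of the documented configuration. Passing a mapping instead of reading `os.environ` directly keeps it testable.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("GITHUB_TOKEN", "CLOUDFLARE_API_TOKEN")


class ConfigError(RuntimeError):
    """Raised when the environment cannot run the pipeline."""


def select_llm_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Check required settings, then return "openai", "anthropic", or "ollama"."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # In GitHub Actions, GITHUB_TOKEN has to be passed to the step via `env:`.
        raise ConfigError(f"missing required settings: {', '.join(missing)}")
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OLLAMA_HOST"):  # assumption: signals a local Ollama server
        return "ollama"
    raise ConfigError("set OPENAI_API_KEY, ANTHROPIC_API_KEY, or OLLAMA_HOST")
```

Calling `select_llm_provider()` at the top of each script turns a missing secret into one clear error instead of a failure halfway through a run.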

Local Development Setup

                
# Clone and setup
git clone https://github.com/michaeljensen/pht-strategy5.git
cd pht-strategy5
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Test the pipeline
python src/briefs/fetch_feeds.py
python src/briefs/cluster_topics.py
                
            

Development Guide

Guidelines for extending and customizing the system:

Code Structure

Modular Design: Each component is independently testable
Configuration-Driven: Behavior controlled via YAML files
Type Hints: Full type annotation for better IDE support
Error Handling: Graceful degradation and retry logic

                
# Example module structure
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import List

import yaml  # PyYAML

@dataclass
class FeedItem:
    """Normalized representation of a topic candidate."""
    source: str
    title: str
    summary: str
    url: str
    published_at: datetime

def load_sources(config_path: Path) -> List[str]:
    """Load RSS sources from configuration."""
    with config_path.open("r") as f:
        data = yaml.safe_load(f) or {}
    return [entry["url"] for entry in data.get("feeds", [])]
                
            
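For the retry part of the error-handling guideline, a common pattern is exponential backoff around network calls such as feed fetches or LLM requests. A sketch; the attempt count, the delays, and retrying only on `OSError` (which covers `urllib` network errors and timeouts) are assumptions. The `sleep` parameter exists so tests can skip real waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on `retry_on` with delays of base_delay, 2x, 4x, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: the caller decides how to degrade
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky_fetch() -> str:
    """Stand-in for a network call that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")  # ConnectionError is an OSError
    return "ok"


delays: list = []
print(with_retries(flaky_fetch, sleep=delays.append), delays)  # ok [1.0, 2.0]
```

Re-raising after the last attempt keeps "graceful degradation" a decision for the caller, for example skipping one failed feed while processing the rest.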

Testing Strategy

Unit Tests: Individual function testing with mocked APIs
Integration Tests: End-to-end pipeline testing
Performance Tests: Content quality and generation speed
A/B Tests: Prompt optimization and output comparison
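As an example of the unit-test style above (individual functions with mocked APIs), the test below patches `urllib.request.urlopen` so a feed-fetching helper runs without network access. `fetch_item_count` is a hypothetical stand-in for code in `fetch_feeds.py`.

```python
import unittest
import urllib.request
import xml.etree.ElementTree as ET
from unittest import mock


def fetch_item_count(url: str) -> int:
    """Hypothetical helper: download an RSS feed and count its <item> entries."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return len(ET.fromstring(response.read()).findall("./channel/item"))


class FetchItemCountTest(unittest.TestCase):
    @mock.patch("urllib.request.urlopen")
    def test_counts_items_without_network(self, fake_urlopen: mock.MagicMock) -> None:
        # urlopen() is used as a context manager, so configure what __enter__ returns.
        fake_response = fake_urlopen.return_value.__enter__.return_value
        fake_response.read.return_value = (
            b"<rss><channel><item><title>a</title></item>"
            b"<item><title>b</title></item></channel></rss>"
        )
        url = "https://www.eff.org/rss/updates.xml"
        self.assertEqual(fetch_item_count(url), 2)
        fake_urlopen.assert_called_once_with(url, timeout=10)
```

Run it with `python -m unittest` (or pytest, which also collects `unittest.TestCase` classes).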

API Reference

Key functions and classes for system extension:

Core Classes

                
from __future__ import annotations  # lets annotations name Section and FactCheck

from dataclasses import dataclass
from datetime import datetime
from typing import List

# Section (an outline entry) and FactCheck are defined elsewhere.

@dataclass
class FeedItem:
    """Represents a discovered topic candidate."""
    source: str          # RSS feed or API source
    title: str           # Original headline
    summary: str         # Brief description
    url: str             # Source URL
    published_at: datetime

@dataclass
class Brief:
    """Structured brief for article generation."""
    slug: str           # URL-friendly identifier
    pillar: str         # Content category
    title: str          # Proposed article title
    primary_keyword: str
    secondary_keywords: List[str]
    search_intent: str  # "howto", "explainer", "review"
    outline: List[Section]
    sources_hint: List[str]

@dataclass
class Article:
    """Generated article with metadata."""
    brief: Brief
    content: str        # Markdown content
    metadata: dict      # SEO and schema data
    fact_check_results: List[FactCheck]
                
            

Key Functions

                
# Signatures only (stub style)

# Topic Discovery
def fetch_sources(sources: List[str]) -> List[FeedItem]: ...
def cluster_topics(items: List[FeedItem]) -> List[TopicCluster]: ...
def analyze_search_gap(cluster: TopicCluster) -> SearchGapAnalysis: ...

# Brief Generation
def generate_brief(cluster: TopicCluster) -> Brief: ...
def create_github_issue(brief: Brief) -> Issue: ...

# Article Generation
def expand_article(brief: Brief) -> Article: ...
def fact_check_article(article: Article) -> List[FactCheck]: ...
def optimize_seo(article: Article) -> Article: ...

# Analytics
def ingest_cf_logs(log_path: str) -> List[AnalyticsEvent]: ...
def score_content(article: Article, events: List[AnalyticsEvent]) -> ContentScore: ...
def suggest_improvements(score: ContentScore) -> List[Improvement]: ...
                
            

Need Help?

Get support, report issues, or contribute to the project.
