DiceTales DM Evaluator Documentation

Overview

The DM Evaluator (advanced/js/dmEvaluator.js) is a response quality assessment system that scores AI-generated content and suggests improvements so that it reads more like narration from a human Dungeon Master. It analyzes each AI response across six weighted criteria and provides feedback for improving storytelling quality over time.

Architecture

Core Components

Evaluation Framework

The system evaluates responses across six key criteria, whose weights sum to 1.0:

evaluationCriteria: {
    immersion: {
        weight: 0.25,
        name: "Immersion & Atmosphere",
        description: "Rich sensory details, vivid descriptions, world-building"
    },
    personality: {
        weight: 0.20,
        name: "DM Personality", 
        description: "Human-like warmth, enthusiasm, unique voice"
    },
    engagement: {
        weight: 0.20,
        name: "Player Engagement",
        description: "Specific story developments, concrete events, avoiding open-ended questions"
    },
    flow: {
        weight: 0.15,
        name: "Narrative Flow",
        description: "Natural transitions, pacing, story coherence"
    },
    authenticity: {
        weight: 0.10,
        name: "D&D Authenticity",
        description: "Rules knowledge, genre conventions, terminology"
    },
    creativity: {
        weight: 0.10,
        name: "Creative Flair",
        description: "Unexpected twists, memorable NPCs, unique situations"
    }
}

Key Features

📊 Multi-Dimensional Scoring

Comprehensive Assessment:

Weighted Scoring System:

// Example score calculation
const totalScore = (
    immersionScore * 0.25 +
    personalityScore * 0.20 +
    engagementScore * 0.20 +
    flowScore * 0.15 +
    authenticityScore * 0.10 +
    creativityScore * 0.10
);
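
The same calculation can be written against a weights object instead of hard-coded multipliers, which keeps it in step with any change to the weights. The following is a minimal sketch, not the evaluator's actual code: the helper name computeWeightedScore and the sample scores are illustrative assumptions.

```javascript
// Weights copied from evaluationCriteria; everything else here is hypothetical.
const criteriaWeights = {
    immersion: 0.25,
    personality: 0.20,
    engagement: 0.20,
    flow: 0.15,
    authenticity: 0.10,
    creativity: 0.10
};

// Multiply each criterion score by its weight and sum the results.
function computeWeightedScore(scores, weights) {
    let total = 0;
    for (const [criterion, weight] of Object.entries(weights)) {
        if (typeof scores[criterion] !== 'number') {
            throw new Error(`Missing score for criterion: ${criterion}`);
        }
        total += scores[criterion] * weight;
    }
    return total;
}

// Illustrative per-criterion scores on a 0-10 scale.
const sampleScores = {
    immersion: 8.0,
    personality: 7.5,
    engagement: 6.8,
    flow: 7.0,
    authenticity: 7.2,
    creativity: 6.5
};

const weightedTotal = computeWeightedScore(sampleScores, criteriaWeights);
console.log(weightedTotal.toFixed(2)); // 7.28
```

Because the weights sum to 1.0, the result stays on the same 0-10 scale as the individual criterion scores.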

🎯 Detailed Quality Analysis

Immersion & Atmosphere Assessment:

DM Personality Evaluation:

Player Engagement Metrics:

🔧 Automatic Improvement System

Quality Threshold Monitoring:

Improvement Suggestions:

API Reference

Constructor

const dmEvaluator = new DMEvaluator();

Core Methods

evaluateResponse(response, context)

Evaluates an AI response and returns detailed scoring.

const evaluation = dmEvaluator.evaluateResponse(
    "The ancient door creaks open, revealing...", 
    {
        playerAction: "Open the door",
        setting: "medieval-fantasy",
        character: characterData
    }
);

Returns:

{
    overallScore: 7.3,
    scores: {
        immersion: 8.0,
        personality: 7.5,
        engagement: 6.8,
        flow: 7.0,
        authenticity: 7.2,
        creativity: 6.5
    },
    feedback: {
        strengths: ["Rich atmospheric description", "Good character voice"],
        improvements: ["Add more specific story elements", "Include clearer choices"]
    },
    passesThreshold: true,
    suggestedImprovements: [...]
}

analyzeImmersion(response)

Analyzes response for immersive qualities.

const immersionScore = dmEvaluator.analyzeImmersion(response);

analyzePersonality(response)

Evaluates DM personality and voice.

const personalityScore = dmEvaluator.analyzePersonality(response);

analyzeEngagement(response)

Assesses player engagement factors.

const engagementScore = dmEvaluator.analyzeEngagement(response);

analyzeFlow(response, context)

Evaluates narrative flow and coherence.

const flowScore = dmEvaluator.analyzeFlow(response, context);

analyzeAuthenticity(response, setting)

Checks D&D/RPG authenticity and genre appropriateness.

const authenticityScore = dmEvaluator.analyzeAuthenticity(response, "medieval-fantasy");

analyzeCreativity(response)

Measures creative and unique elements.

const creativityScore = dmEvaluator.analyzeCreativity(response);

Analysis and Reporting

getPerformanceReport()

Returns comprehensive performance analysis.

const report = dmEvaluator.getPerformanceReport();

getImprovementSuggestions(scores)

Provides specific improvement recommendations.

const suggestions = dmEvaluator.getImprovementSuggestions(evaluationScores);

generateQualityReport(responseHistory)

Creates detailed quality analysis report.

const qualityReport = dmEvaluator.generateQualityReport(responseHistory);

Scoring Criteria Details

🌟 Immersion & Atmosphere (25%)

High Score Indicators:

Evaluation Patterns:

const immersionIndicators = [
    /\b(?:smell|scent|aroma|stench)\b/gi,
    /\b(?:sound|noise|echo|whisper|rumble)\b/gi,
    /\b(?:texture|rough|smooth|cold|warm)\b/gi,
    /\b(?:shadows|light|darkness|glow|shimmer)\b/gi
];
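
One way such patterns could feed a score is to count how many sensory categories a response touches at least once. The sketch below assumes that approach; the helper countMatchedCategories is hypothetical, and the evaluator's real scoring rule is not documented here.

```javascript
// Patterns copied from the list above.
const immersionIndicators = [
    /\b(?:smell|scent|aroma|stench)\b/gi,
    /\b(?:sound|noise|echo|whisper|rumble)\b/gi,
    /\b(?:texture|rough|smooth|cold|warm)\b/gi,
    /\b(?:shadows|light|darkness|glow|shimmer)\b/gi
];

// Hypothetical helper: number of patterns that match the text at least once.
// String.prototype.match returns null when a global pattern finds nothing.
function countMatchedCategories(text, patterns) {
    return patterns.filter((pattern) => text.match(pattern) !== null).length;
}

const vivid = "The musty smell of ancient parchment fills your nostrils. " +
    "Somewhere in the darkness, you hear faint scratching.";
const flat = "You enter a dark room. What do you do?";

console.log(countMatchedCategories(vivid, immersionIndicators)); // 2
console.log(countMatchedCategories(flat, immersionIndicators));  // 0
```

Note that the word boundaries make the patterns strict: "dark" does not match "darkness", and "moonlight" would not match "light".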

Example High-Quality Response:

“The musty smell of ancient parchment fills your nostrils as you step into the forgotten library. Dust motes dance in the pale moonlight streaming through cracked windows, and somewhere in the darkness, you hear the faint scratching of… something moving among the shelves.”

🎭 DM Personality (20%)

High Score Indicators:

Personality Markers:

const personalityIndicators = [
    /\b(?:Oh|Ah|Well|Now|Hmm)\b/g,
    /[!]{1,2}$/gm,
    /\b(?:interesting|fascinating|exciting|wonderful)\b/gi,
    /you (?:notice|feel|hear|see|sense)/gi
];
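
These patterns carry the g flag, which matters when they are reused. A regular expression object with the g flag remembers its position between calls to test(), so repeated calls on the same text can alternate between true and false. The sketch below shows the pitfall and a counting approach that avoids it; countMatches is a hypothetical helper, not part of the documented API.

```javascript
const enthusiasmPattern = /\b(?:interesting|fascinating|exciting|wonderful)\b/gi;
const line = "Oh, this is interesting!";

// Pitfall: test() on a global regex resumes from lastIndex.
const firstTest = enthusiasmPattern.test(line);  // true, lastIndex moves past the match
const secondTest = enthusiasmPattern.test(line); // false, search started after the match

// Safer: String.prototype.match with a global regex starts from index 0
// and returns every match (or null when there is none).
function countMatches(text, pattern) {
    const matches = text.match(pattern);
    return matches ? matches.length : 0;
}

const twoLines = "Oh, this is interesting!\nHow exciting!";
console.log(firstTest, secondTest);                      // true false
console.log(countMatches(twoLines, enthusiasmPattern));  // 2
console.log(countMatches(twoLines, /[!]{1,2}$/gm));      // 2 (one per line ending)
```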

Example High-Quality Response:

“Oh, this is interesting! As you approach the mysterious figure, you can’t help but notice their eyes seem to shimmer with an otherworldly intelligence. There’s definitely more to this encounter than meets the eye…”

🎮 Player Engagement (20%)

High Score Indicators:

Engagement Patterns:

const engagementIndicators = [
    /suddenly|unexpectedly|meanwhile|however/gi,
    /you (?:must|can|could|might) (?:decide|choose|determine)/gi,
    /what (?:do you|will you) do/gi,
    /three (?:paths|options|choices)/gi
];

Example High-Quality Response:

“Your successful intimidation causes the bandit to drop his weapon immediately! He backs away, fear evident in his eyes, and points toward a hidden path through the woods. ‘The treasure… it’s in the old mill by the river,’ he stammers. You now have a clear lead, but you must decide whether to trust this information or interrogate him further.”

📖 Narrative Flow (15%)

High Score Indicators:

Flow Analysis:

⚔️ D&D Authenticity (10%)

High Score Indicators:

Authenticity Markers:

const authenticityTerms = {
    'medieval-fantasy': ['spell', 'magic', 'sword', 'armor', 'tavern', 'quest'],
    'modern-urban': ['technology', 'city', 'investigation', 'conspiracy'],
    'sci-fi-space': ['starship', 'alien', 'technology', 'galaxy', 'quantum'],
    'eldritch-horror': ['ancient', 'forbidden', 'cosmic', 'sanity', 'investigation']
};
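
A term table like this could be applied by looking up the list for the campaign setting and checking which terms appear as whole words. This is a sketch under that assumption; findGenreTerms is a hypothetical helper and the real matching logic may differ.

```javascript
// Term lists copied from the table above.
const authenticityTerms = {
    'medieval-fantasy': ['spell', 'magic', 'sword', 'armor', 'tavern', 'quest'],
    'modern-urban': ['technology', 'city', 'investigation', 'conspiracy'],
    'sci-fi-space': ['starship', 'alien', 'technology', 'galaxy', 'quantum'],
    'eldritch-horror': ['ancient', 'forbidden', 'cosmic', 'sanity', 'investigation']
};

// Hypothetical helper: returns the setting's terms found in the text.
// Unknown settings yield an empty list rather than an error.
function findGenreTerms(text, setting) {
    if (!Object.prototype.hasOwnProperty.call(authenticityTerms, setting)) {
        return [];
    }
    // The terms are plain words, so they are safe to embed in a RegExp.
    return authenticityTerms[setting].filter(
        (term) => new RegExp(`\\b${term}\\b`, 'i').test(text)
    );
}

const narration = "The tavern keeper eyes your sword and mutters about a quest.";
console.log(findGenreTerms(narration, 'medieval-fantasy')); // [ 'sword', 'tavern', 'quest' ]
console.log(findGenreTerms(narration, 'sci-fi-space'));     // []
```

The per-term regex is created without the g flag, so test() carries no state between calls.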

✨ Creative Flair (10%)

High Score Indicators:

Creativity Patterns:

Integration with AI System

Automatic Evaluation

// In AI response processing
if (this.enableEvaluation && this.dmEvaluator) {
    const evaluation = this.dmEvaluator.evaluateResponse(response, {
        playerAction: actionData.action,
        setting: campaign.setting,
        character: character,
        context: memoryContext
    });
    
    if (evaluation.overallScore < this.improvementThreshold) {
        // Trigger improvement process
        response = this.improveResponse(response, evaluation.suggestedImprovements);
    }
}
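
The snippet above applies a single improvement pass and does not re-score the result. If a re-check is wanted, a bounded loop is one option. The following is a sketch only: improveUntilPassing, its evaluate and improve callbacks, and the maxAttempts cap are all hypothetical and stand in for whatever the AI system actually provides.

```javascript
// Hypothetical helper: re-evaluates after each improvement, up to maxAttempts.
function improveUntilPassing(response, evaluate, improve, threshold = 6.5, maxAttempts = 2) {
    let current = response;
    let evaluation = evaluate(current);
    let attempts = 0;
    while (evaluation.overallScore < threshold && attempts < maxAttempts) {
        current = improve(current, evaluation.suggestedImprovements);
        evaluation = evaluate(current);
        attempts += 1;
    }
    return { response: current, evaluation, attempts };
}

// Stand-in callbacks for demonstration; a real setup would call the evaluator
// and the AI system instead.
const stubEvaluate = (text) => ({
    overallScore: text.includes('[detail]') ? 7 : 5,
    suggestedImprovements: ['Add sensory detail']
});
const stubImprove = (text) => `${text} [detail]`;

const outcome = improveUntilPassing('You enter a dark room.', stubEvaluate, stubImprove);
console.log(outcome.attempts, outcome.evaluation.overallScore); // 1 7
```

The cap prevents an endless loop when a response never reaches the threshold.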

Quality Monitoring

// Track evaluation history
this.responseHistory.push({
    response: response,
    evaluation: evaluation,
    timestamp: Date.now(),
    context: context
});

// Update performance metrics
this.updateAverageScore(evaluation.overallScore);
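
A history like this grows without bound unless it is capped, and the average can be maintained incrementally instead of re-summing the history. The class below is a minimal sketch of both ideas, assuming a cap like the maxHistorySize option; the class name and its internals are hypothetical, not the evaluator's actual implementation.

```javascript
// Hypothetical tracker illustrating a capped history and a running average.
class EvaluationHistory {
    constructor(maxHistorySize = 50) {
        this.maxHistorySize = maxHistorySize;
        this.entries = [];
        this.totalEvaluations = 0;
        this.averageScore = 0;
    }

    record(response, evaluation, context) {
        this.entries.push({ response, evaluation, timestamp: Date.now(), context });
        // Drop the oldest entry once the cap is exceeded.
        if (this.entries.length > this.maxHistorySize) {
            this.entries.shift();
        }
        this.updateAverageScore(evaluation.overallScore);
    }

    // Incremental mean: covers every evaluation, including evicted ones.
    updateAverageScore(score) {
        this.totalEvaluations += 1;
        this.averageScore += (score - this.averageScore) / this.totalEvaluations;
    }
}

const history = new EvaluationHistory(2);
for (const score of [6, 7, 8]) {
    history.record(`response scoring ${score}`, { overallScore: score }, {});
}
console.log(history.entries.length, history.totalEvaluations, history.averageScore); // 2 3 7
```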

Performance Analytics

const performanceReport = {
    averageScore: 7.2,
    totalEvaluations: 156,
    trendDirection: 'improving',
    strongestCriteria: 'immersion',
    weakestCriteria: 'creativity',
    improvementRate: '+0.3 points over last 10 responses'
};

Detailed Metrics

const detailedMetrics = {
    criteriaAverages: {
        immersion: 8.1,
        personality: 7.8,
        engagement: 6.9,
        flow: 7.2,
        authenticity: 7.5,
        creativity: 6.3
    },
    passingRate: 0.78, // 78% of responses meet threshold
    improvementTriggers: 34,
    averageImprovementGain: 1.2
};
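
Fields such as strongestCriteria and weakestCriteria can be derived from per-criterion averages by sorting them. A minimal sketch, assuming averages shaped like criteriaAverages above; rankCriteria is a hypothetical helper.

```javascript
const criteriaAverages = {
    immersion: 8.1,
    personality: 7.8,
    engagement: 6.9,
    flow: 7.2,
    authenticity: 7.5,
    creativity: 6.3
};

// Hypothetical helper: criterion names ordered from highest to lowest average.
function rankCriteria(averages) {
    return Object.entries(averages)
        .sort(([, a], [, b]) => b - a)
        .map(([name]) => name);
}

const ranked = rankCriteria(criteriaAverages);
const strongestCriteria = ranked[0];
const weakestCriteria = ranked[ranked.length - 1];

console.log(strongestCriteria); // immersion
console.log(weakestCriteria);   // creativity
```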

Usage Examples

Basic Evaluation

// Initialize evaluator
const dmEvaluator = new DMEvaluator();

// Evaluate a response
const response = "You enter a dark room. What do you do?";
const evaluation = dmEvaluator.evaluateResponse(response, {
    playerAction: "Open the door",
    setting: "medieval-fantasy"
});

console.log(`Overall Score: ${evaluation.overallScore}/10`);
console.log(`Improvements needed: ${evaluation.suggestedImprovements.join('; ')}`);

Advanced Analysis

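// Get detailed breakdown (reuses dmEvaluator and response from Basic Evaluation)
const setting = "medieval-fantasy";
const context = { playerAction: "Open the door", setting: setting };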
const detailedAnalysis = {
    immersion: dmEvaluator.analyzeImmersion(response),
    personality: dmEvaluator.analyzePersonality(response),
    engagement: dmEvaluator.analyzeEngagement(response),
    flow: dmEvaluator.analyzeFlow(response, context),
    authenticity: dmEvaluator.analyzeAuthenticity(response, setting),
    creativity: dmEvaluator.analyzeCreativity(response)
};

// Generate improvement suggestions
const improvements = dmEvaluator.getImprovementSuggestions(detailedAnalysis);

Performance Monitoring

// Track quality over time
setInterval(() => {
    const report = dmEvaluator.getPerformanceReport();
    if (report.averageScore < 6.5) {
        console.warn('Response quality below threshold - review needed');
    }
}, 100000); // Check every 100 seconds (100,000 ms)
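
setInterval fires on elapsed time, not on the number of responses. If the goal is to check quality after a fixed number of responses, a counter can be used instead. The sketch below assumes that goal; createResponseCountMonitor and its parameters are hypothetical.

```javascript
// Hypothetical helper: returns a function to call once per evaluated response.
// Every `interval` calls it reads the average and reports if it is too low.
function createResponseCountMonitor(interval, getAverageScore, onLowQuality, threshold = 6.5) {
    let sinceLastCheck = 0;
    return function recordEvaluation() {
        sinceLastCheck += 1;
        if (sinceLastCheck < interval) {
            return false; // no check performed on this call
        }
        sinceLastCheck = 0;
        const average = getAverageScore();
        if (average < threshold) {
            onLowQuality(average);
        }
        return true; // a check was performed
    };
}

// Demonstration with a fixed average; a real setup might pass
// () => dmEvaluator.getPerformanceReport().averageScore instead.
const warnings = [];
const recordEvaluation = createResponseCountMonitor(3, () => 6.0, (avg) => warnings.push(avg));
for (let i = 0; i < 7; i += 1) {
    recordEvaluation();
}
console.log(warnings.length); // 2 (checks ran after the 3rd and 6th responses)
```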

Configuration Options

Evaluation Thresholds

const evaluationConfig = {
    improvementThreshold: 6.5,
    maxHistorySize: 50,
    enableAutoImprovement: true,
    detailedLogging: true,
    criteriaWeights: {
        immersion: 0.25,
        personality: 0.20,
        engagement: 0.20,
        flow: 0.15,
        authenticity: 0.10,
        creativity: 0.10
    }
};
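
When criteriaWeights are customized, it helps to confirm they still sum to 1.0 so the overall score stays on the same scale as the individual criteria. A minimal sketch of such a check follows; validateCriteriaWeights is a hypothetical helper, and the tolerance accounts for floating-point rounding.

```javascript
// Hypothetical helper: true when all weights are non-negative finite numbers
// and their sum is 1 within a small floating-point tolerance.
function validateCriteriaWeights(weights, tolerance = 1e-9) {
    const values = Object.values(weights);
    const allValid = values.every((w) => Number.isFinite(w) && w >= 0);
    const sum = values.reduce((acc, w) => acc + w, 0);
    return allValid && Math.abs(sum - 1) <= tolerance;
}

const defaultWeights = {
    immersion: 0.25,
    personality: 0.20,
    engagement: 0.20,
    flow: 0.15,
    authenticity: 0.10,
    creativity: 0.10
};

// Raising one weight without lowering another breaks the sum.
const unbalancedWeights = { ...defaultWeights, creativity: 0.30 };

console.log(validateCriteriaWeights(defaultWeights));    // true
console.log(validateCriteriaWeights(unbalancedWeights)); // false
```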

Quality Standards

const qualityStandards = {
    excellent: 8.5,
    good: 7.0,
    acceptable: 6.5,
    needsImprovement: 5.0,
    poor: 3.0
};
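
These values can be used to label a score. The sketch below assumes each value is the minimum score for its label and that anything under the lowest value still falls into the lowest label; that interpretation and the classifyScore helper are assumptions, not documented behavior.

```javascript
const qualityStandards = {
    excellent: 8.5,
    good: 7.0,
    acceptable: 6.5,
    needsImprovement: 5.0,
    poor: 3.0
};

// Hypothetical helper: returns the highest label whose minimum the score meets.
function classifyScore(score, standards = qualityStandards) {
    const ordered = Object.entries(standards).sort(([, a], [, b]) => b - a);
    for (const [label, minimum] of ordered) {
        if (score >= minimum) {
            return label;
        }
    }
    // Below every minimum: fall back to the lowest label.
    return ordered[ordered.length - 1][0];
}

console.log(classifyScore(9.0)); // excellent
console.log(classifyScore(7.2)); // good
console.log(classifyScore(6.5)); // acceptable
console.log(classifyScore(5.5)); // needsImprovement
console.log(classifyScore(2.0)); // poor
```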

Best Practices

Evaluation Guidelines

  1. Context Matters: Always provide relevant context for accurate evaluation
  2. Regular Monitoring: Track evaluation trends over time
  3. Threshold Tuning: Adjust improvement thresholds based on campaign needs
  4. Balanced Criteria: Ensure evaluation criteria weights match campaign priorities

Quality Improvement

  1. Focus on Weakest Areas: Prioritize improvement suggestions for lowest-scoring criteria
  2. Consistency: Maintain consistent evaluation standards across sessions
  3. Player Feedback: Combine automated evaluation with player satisfaction metrics
  4. Iterative Enhancement: Use evaluation data to refine AI prompt engineering

Troubleshooting

Common Issues

Inconsistent Scoring:

Low Evaluation Scores:

Performance Impact:

Debug Functions

// Test evaluation system
dmEvaluator.debugEvaluation(sampleResponse, sampleContext);

// Validate scoring consistency
dmEvaluator.validateScoringConsistency(responseSet);

// Export evaluation data
const evaluationData = dmEvaluator.exportEvaluationHistory();

Future Enhancements

Planned Features

  1. Machine Learning Integration: AI-powered evaluation improvement
  2. Player Feedback Integration: Combined automated and human evaluation
  3. Custom Criteria: User-defined evaluation dimensions
  4. Real-time Suggestions: Live improvement recommendations during generation
  5. Comparative Analysis: Benchmarking against high-quality examples

Technical Improvements

  1. Performance Optimization: Faster evaluation processing
  2. Evaluation Accuracy: Enhanced pattern recognition and analysis
  3. Context Awareness: Deeper understanding of campaign and character context
  4. Quality Prediction: Predictive quality modeling for proactive improvement

Conclusion

The DM Evaluator helps DiceTales maintain storytelling quality by providing automated, multi-criteria assessment of AI-generated content. Its scores and improvement feedback are intended to make responses more immersive, engaging, and human-like, closer to what a skilled human Dungeon Master would deliver.