DiceTales

DiceTales DM Evaluator Documentation

Overview

The DM Evaluator (advanced/js/dmEvaluator.js) is an intelligent response quality assessment system that scores and improves AI-generated content to feel more like a human Dungeon Master. This system analyzes AI responses across multiple dimensions and provides feedback for continuous improvement of storytelling quality.

Architecture

Core Components

DMEvaluator Class: Central evaluation and scoring system
Multi-Criteria Analysis: Comprehensive response assessment framework
Quality Metrics: Quantitative scoring across storytelling dimensions
Improvement Suggestions: Actionable feedback for response enhancement
Performance Tracking: Historical analysis and trend monitoring

Evaluation Framework

The system evaluates responses across six key criteria:

evaluationCriteria: {
    immersion: {
        weight: 0.25,
        name: "Immersion & Atmosphere",
        description: "Rich sensory details, vivid descriptions, world-building"
    },
    personality: {
        weight: 0.20,
        name: "DM Personality", 
        description: "Human-like warmth, enthusiasm, unique voice"
    },
    engagement: {
        weight: 0.20,
        name: "Player Engagement",
        description: "Specific story developments, concrete events, avoiding open-ended questions"
    },
    flow: {
        weight: 0.15,
        name: "Narrative Flow",
        description: "Natural transitions, pacing, story coherence"
    },
    authenticity: {
        weight: 0.10,
        name: "D&D Authenticity",
        description: "Rules knowledge, genre conventions, terminology"
    },
    creativity: {
        weight: 0.10,
        name: "Creative Flair",
        description: "Unexpected twists, memorable NPCs, unique situations"
    }
}

Key Features

📊 Multi-Dimensional Scoring

Comprehensive Assessment:

Immersion & Atmosphere (25%): Sensory details, vivid descriptions, world-building depth
DM Personality (20%): Human-like warmth, enthusiasm, unique narrative voice
Player Engagement (20%): Specific developments, concrete events, meaningful choices
Narrative Flow (15%): Natural transitions, proper pacing, story coherence
D&D Authenticity (10%): Rules knowledge, genre conventions, terminology usage
Creative Flair (10%): Unexpected twists, memorable NPCs, unique situations

Weighted Scoring System:

// Example score calculation
const totalScore = (
    immersionScore * 0.25 +
    personalityScore * 0.20 +
    engagementScore * 0.20 +
    flowScore * 0.15 +
    authenticityScore * 0.10 +
    creativityScore * 0.10
);

🎯 Detailed Quality Analysis

Immersion & Atmosphere Assessment:

Sensory detail richness (sight, sound, smell, touch, taste)
Environmental description quality
Mood and atmosphere establishment
World-building consistency and depth

DM Personality Evaluation:

Human-like conversational tone
Enthusiasm and energy level
Consistent narrative voice
Warmth and approachability

Player Engagement Metrics:

Concrete story developments
Meaningful choice presentation
Action consequence clarity
Avoidance of generic responses

🔧 Automatic Improvement System

Quality Threshold Monitoring:

Target score: 6.5/10 minimum
Automatic improvement triggers
Response regeneration recommendations
Quality trend tracking

Improvement Suggestions:

Specific enhancement recommendations
Common improvement patterns
Best practice guidance
Quality benchmark examples

API Reference

Constructor

const dmEvaluator = new DMEvaluator();

Core Methods

`evaluateResponse(response, context)`

Evaluates an AI response and returns detailed scoring.

const evaluation = dmEvaluator.evaluateResponse(
    "The ancient door creaks open, revealing...", 
    {
        playerAction: "Open the door",
        setting: "medieval-fantasy",
        character: characterData
    }
);

Returns:

{
    overallScore: 7.2,
    scores: {
        immersion: 8.0,
        personality: 7.5,
        engagement: 6.8,
        flow: 7.0,
        authenticity: 7.2,
        creativity: 6.5
    },
    feedback: {
        strengths: ["Rich atmospheric description", "Good character voice"],
        improvements: ["Add more specific story elements", "Include clearer choices"]
    },
    passesThreshold: true,
    suggestedImprovements: [...]
}

`analyzeImmersion(response)`

Analyzes response for immersive qualities.

const immersionScore = dmEvaluator.analyzeImmersion(response);

`analyzePersonality(response)`

Evaluates DM personality and voice.

const personalityScore = dmEvaluator.analyzePersonality(response);

`analyzeEngagement(response)`

Assesses player engagement factors.

const engagementScore = dmEvaluator.analyzeEngagement(response);

`analyzeFlow(response, context)`

Evaluates narrative flow and coherence.

const flowScore = dmEvaluator.analyzeFlow(response, context);

`analyzeAuthenticity(response, setting)`

Checks D&D/RPG authenticity and genre appropriateness.

const authenticityScore = dmEvaluator.analyzeAuthenticity(response, "medieval-fantasy");

`analyzeCreativity(response)`

Measures creative and unique elements.

const creativityScore = dmEvaluator.analyzeCreativity(response);

Analysis and Reporting

`getPerformanceReport()`

Returns comprehensive performance analysis.

const report = dmEvaluator.getPerformanceReport();

`getImprovementSuggestions(scores)`

Provides specific improvement recommendations.

const suggestions = dmEvaluator.getImprovementSuggestions(evaluationScores);

`generateQualityReport(responseHistory)`

Creates detailed quality analysis report.

const qualityReport = dmEvaluator.generateQualityReport(responseHistory);

Scoring Criteria Details

🌟 Immersion & Atmosphere (25%)

High Score Indicators:

Rich sensory descriptions (sounds, smells, textures)
Vivid environmental details
Atmospheric mood establishment
Consistent world-building elements

Evaluation Patterns:

const immersionIndicators = [
    /\b(?:smell|scent|aroma|stench)\b/gi,
    /\b(?:sound|noise|echo|whisper|rumble)\b/gi,
    /\b(?:texture|rough|smooth|cold|warm)\b/gi,
    /\b(?:shadows|light|darkness|glow|shimmer)\b/gi
];

Example High-Quality Response:

“The musty smell of ancient parchment fills your nostrils as you step into the forgotten library. Dust motes dance in the pale moonlight streaming through cracked windows, and somewhere in the darkness, you hear the faint scratching of… something moving among the shelves.”

🎭 DM Personality (20%)

High Score Indicators:

Conversational and warm tone
Enthusiasm for the story
Consistent narrative voice
Human-like expressions and reactions

Personality Markers:

const personalityIndicators = [
    /\b(?:Oh|Ah|Well|Now|Hmm)\b/g,
    /[!]{1,2}$/gm,
    /\b(?:interesting|fascinating|exciting|wonderful)\b/gi,
    /you (?:notice|feel|hear|see|sense)/gi
];

Example High-Quality Response:

“Oh, this is interesting! As you approach the mysterious figure, you can’t help but notice their eyes seem to shimmer with an otherworldly intelligence. There’s definitely more to this encounter than meets the eye…”

🎮 Player Engagement (20%)

High Score Indicators:

Specific story developments
Clear action consequences
Meaningful choice presentation
Concrete events and outcomes

Engagement Patterns:

const engagementIndicators = [
    /suddenly|unexpectedly|meanwhile|however/gi,
    /you (?:must|can|could|might) (?:decide|choose|determine)/gi,
    /what (?:do you|will you) do/gi,
    /three (?:paths|options|choices)/gi
];

Example High-Quality Response:

“Your successful intimidation causes the bandit to drop his weapon immediately! He backs away, fear evident in his eyes, and points toward a hidden path through the woods. ‘The treasure… it’s in the old mill by the river,’ he stammers. You now have a clear lead, but you must decide whether to trust this information or interrogate him further.”

📖 Narrative Flow (15%)

High Score Indicators:

Smooth scene transitions
Logical story progression
Appropriate pacing
Coherent narrative structure

Flow Analysis:

Transition word usage
Sentence variety and rhythm
Logical cause-and-effect relationships
Balanced description and action

⚔️ D&D Authenticity (10%)

High Score Indicators:

Proper D&D terminology usage
Rules-appropriate scenarios
Genre-consistent elements
Authentic fantasy/sci-fi/horror atmosphere

Authenticity Markers:

const authenticityTerms = {
    'medieval-fantasy': ['spell', 'magic', 'sword', 'armor', 'tavern', 'quest'],
    'modern-urban': ['technology', 'city', 'investigation', 'conspiracy'],
    'sci-fi-space': ['starship', 'alien', 'technology', 'galaxy', 'quantum'],
    'eldritch-horror': ['ancient', 'forbidden', 'cosmic', 'sanity', 'investigation']
};

✨ Creative Flair (10%)

High Score Indicators:

Unexpected plot twists
Memorable NPC characteristics
Unique problem-solving scenarios
Original world-building elements

Creativity Patterns:

Unusual adjective combinations
Unexpected story developments
Novel problem presentations
Creative use of setting elements

Integration with AI System

Automatic Evaluation

// In AI response processing
if (this.enableEvaluation && this.dmEvaluator) {
    const evaluation = this.dmEvaluator.evaluateResponse(response, {
        playerAction: actionData.action,
        setting: campaign.setting,
        character: character,
        context: memoryContext
    });
    
    if (evaluation.overallScore < this.improvementThreshold) {
        // Trigger improvement process
        response = this.improveResponse(response, evaluation.suggestedImprovements);
    }
}

Quality Monitoring

// Track evaluation history
this.responseHistory.push({
    response: response,
    evaluation: evaluation,
    timestamp: Date.now(),
    context: context
});

// Update performance metrics
this.updateAverageScore(evaluation.overallScore);

Performance Analytics

Quality Trends

const performanceReport = {
    averageScore: 7.2,
    totalEvaluations: 156,
    trendDirection: 'improving',
    strongestCriteria: 'immersion',
    weakestCriteria: 'creativity',
    improvementRate: '+0.3 points over last 10 responses'
};

Detailed Metrics

const detailedMetrics = {
    criteriaAverages: {
        immersion: 8.1,
        personality: 7.8,
        engagement: 6.9,
        flow: 7.2,
        authenticity: 7.5,
        creativity: 6.3
    },
    passingRate: 0.78, // 78% of responses meet threshold
    improvementTriggers: 34,
    averageImprovementGain: 1.2
};

Usage Examples

Basic Evaluation

// Initialize evaluator
const dmEvaluator = new DMEvaluator();

// Evaluate a response
const response = "You enter a dark room. What do you do?";
const evaluation = dmEvaluator.evaluateResponse(response, {
    playerAction: "Open the door",
    setting: "medieval-fantasy"
});

console.log(`Overall Score: ${evaluation.overallScore}/10`);
console.log(`Improvements needed: ${evaluation.suggestedImprovements}`);

Advanced Analysis

// Get detailed breakdown
const detailedAnalysis = {
    immersion: dmEvaluator.analyzeImmersion(response),
    personality: dmEvaluator.analyzePersonality(response),
    engagement: dmEvaluator.analyzeEngagement(response),
    flow: dmEvaluator.analyzeFlow(response, context),
    authenticity: dmEvaluator.analyzeAuthenticity(response, setting),
    creativity: dmEvaluator.analyzeCreativity(response)
};

// Generate improvement suggestions
const improvements = dmEvaluator.getImprovementSuggestions(detailedAnalysis);

Performance Monitoring

// Track quality over time
setInterval(() => {
    const report = dmEvaluator.getPerformanceReport();
    if (report.averageScore < 6.5) {
        console.warn('Response quality below threshold - review needed');
    }
}, 100000); // Check every 100 responses

Configuration Options

Evaluation Thresholds

const evaluationConfig = {
    improvementThreshold: 6.5,
    maxHistorySize: 50,
    enableAutoImprovement: true,
    detailedLogging: true,
    criteriaWeights: {
        immersion: 0.25,
        personality: 0.20,
        engagement: 0.20,
        flow: 0.15,
        authenticity: 0.10,
        creativity: 0.10
    }
};

Quality Standards

const qualityStandards = {
    excellent: 8.5,
    good: 7.0,
    acceptable: 6.5,
    needsImprovement: 5.0,
    poor: 3.0
};

Best Practices

Evaluation Guidelines

Context Matters: Always provide relevant context for accurate evaluation
Regular Monitoring: Track evaluation trends over time
Threshold Tuning: Adjust improvement thresholds based on campaign needs
Balanced Criteria: Ensure evaluation criteria weights match campaign priorities

Quality Improvement

Focus on Weakest Areas: Prioritize improvement suggestions for lowest-scoring criteria
Consistency: Maintain consistent evaluation standards across sessions
Player Feedback: Combine automated evaluation with player satisfaction metrics
Iterative Enhancement: Use evaluation data to refine AI prompt engineering

Troubleshooting

Common Issues

Inconsistent Scoring:

Verify context data completeness
Check evaluation criteria configuration
Review scoring algorithm parameters

Low Evaluation Scores:

Analyze specific criteria breakdowns
Review improvement suggestion patterns
Check AI prompt engineering quality

Performance Impact:

Monitor evaluation processing time
Consider evaluation frequency adjustment
Optimize evaluation algorithm efficiency

Debug Functions

// Test evaluation system
dmEvaluator.debugEvaluation(sampleResponse, sampleContext);

// Validate scoring consistency
dmEvaluator.validateScoringConsistency(responseSet);

// Export evaluation data
const evaluationData = dmEvaluator.exportEvaluationHistory();

Future Enhancements

Planned Features

Machine Learning Integration: AI-powered evaluation improvement
Player Feedback Integration: Combined automated and human evaluation
Custom Criteria: User-defined evaluation dimensions
Real-time Suggestions: Live improvement recommendations during generation
Comparative Analysis: Benchmarking against high-quality examples

Technical Improvements

Performance Optimization: Faster evaluation processing
Evaluation Accuracy: Enhanced pattern recognition and analysis
Context Awareness: Deeper understanding of campaign and character context
Quality Prediction: Predictive quality modeling for proactive improvement

Conclusion

The DM Evaluator ensures that DiceTales maintains high standards of storytelling quality by providing comprehensive, automated assessment of AI-generated content. Through detailed multi-criteria analysis and continuous improvement feedback, it helps create more immersive, engaging, and human-like gaming experiences that rival the best human Dungeon Masters.