The DM Evaluator (advanced/js/dmEvaluator.js) is a response quality assessment system that scores AI-generated content and suggests improvements so that it reads more like a human Dungeon Master. It analyzes each AI response across multiple dimensions and provides feedback for improving storytelling quality over time.
The system evaluates responses across six key criteria:
evaluationCriteria: {
  immersion: {
    weight: 0.25,
    name: "Immersion & Atmosphere",
    description: "Rich sensory details, vivid descriptions, world-building"
  },
  personality: {
    weight: 0.20,
    name: "DM Personality",
    description: "Human-like warmth, enthusiasm, unique voice"
  },
  engagement: {
    weight: 0.20,
    name: "Player Engagement",
    description: "Specific story developments, concrete events, avoiding open-ended questions"
  },
  flow: {
    weight: 0.15,
    name: "Narrative Flow",
    description: "Natural transitions, pacing, story coherence"
  },
  authenticity: {
    weight: 0.10,
    name: "D&D Authenticity",
    description: "Rules knowledge, genre conventions, terminology"
  },
  creativity: {
    weight: 0.10,
    name: "Creative Flair",
    description: "Unexpected twists, memorable NPCs, unique situations"
  }
}
Comprehensive Assessment:
Weighted Scoring System:
// Example score calculation
const totalScore = (
  immersionScore * 0.25 +
  personalityScore * 0.20 +
  engagementScore * 0.20 +
  flowScore * 0.15 +
  authenticityScore * 0.10 +
  creativityScore * 0.10
);
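The weighted sum above can also be driven by the weight table rather than hard-coded factors. The following is a minimal sketch, assuming each per-criterion score is a number on a 0-10 scale; `computeOverallScore` and the sample scores are hypothetical and not part of the documented API. The source does not say how the final score is rounded, so the sketch leaves it unrounded.

```javascript
// Weights copied from the evaluationCriteria definition.
const weights = {
  immersion: 0.25,
  personality: 0.20,
  engagement: 0.20,
  flow: 0.15,
  authenticity: 0.10,
  creativity: 0.10
};

// Hypothetical helper: weighted sum of per-criterion scores (assumed 0-10).
function computeOverallScore(scores, criteriaWeights) {
  let total = 0;
  for (const [criterion, weight] of Object.entries(criteriaWeights)) {
    const score = scores[criterion];
    if (!Number.isFinite(score)) {
      throw new Error(`Missing or invalid score for criterion: ${criterion}`);
    }
    total += score * weight;
  }
  return total;
}

// Illustrative scores only, not taken from real evaluations.
const sampleScores = {
  immersion: 8,
  personality: 7,
  engagement: 6,
  flow: 7,
  authenticity: 8,
  creativity: 5
};

const overall = computeOverallScore(sampleScores, weights);
console.log(overall.toFixed(2));
```

Iterating over the weight table means a criterion cannot be silently dropped from the sum: a missing score raises an error instead of contributing zero.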
Immersion & Atmosphere Assessment:
DM Personality Evaluation:
Player Engagement Metrics:
Quality Threshold Monitoring:
Improvement Suggestions:
const dmEvaluator = new DMEvaluator();
evaluateResponse(response, context)
Evaluates an AI response and returns detailed scoring.
const evaluation = dmEvaluator.evaluateResponse(
  "The ancient door creaks open, revealing...",
  {
    playerAction: "Open the door",
    setting: "medieval-fantasy",
    character: characterData
  }
);
Returns:
{
  overallScore: 7.2,
  scores: {
    immersion: 8.0,
    personality: 7.5,
    engagement: 6.8,
    flow: 7.0,
    authenticity: 7.2,
    creativity: 6.5
  },
  feedback: {
    strengths: ["Rich atmospheric description", "Good character voice"],
    improvements: ["Add more specific story elements", "Include clearer choices"]
  },
  passesThreshold: true,
  suggestedImprovements: [...]
}
analyzeImmersion(response)
Analyzes response for immersive qualities.
const immersionScore = dmEvaluator.analyzeImmersion(response);
analyzePersonality(response)
Evaluates DM personality and voice.
const personalityScore = dmEvaluator.analyzePersonality(response);
analyzeEngagement(response)
Assesses player engagement factors.
const engagementScore = dmEvaluator.analyzeEngagement(response);
analyzeFlow(response, context)
Evaluates narrative flow and coherence.
const flowScore = dmEvaluator.analyzeFlow(response, context);
analyzeAuthenticity(response, setting)
Checks D&D/RPG authenticity and genre appropriateness.
const authenticityScore = dmEvaluator.analyzeAuthenticity(response, "medieval-fantasy");
analyzeCreativity(response)
Measures creative and unique elements.
const creativityScore = dmEvaluator.analyzeCreativity(response);
getPerformanceReport()
Returns comprehensive performance analysis.
const report = dmEvaluator.getPerformanceReport();
getImprovementSuggestions(scores)
Provides specific improvement recommendations.
const suggestions = dmEvaluator.getImprovementSuggestions(evaluationScores);
generateQualityReport(responseHistory)
Creates detailed quality analysis report.
const qualityReport = dmEvaluator.generateQualityReport(responseHistory);
High Score Indicators:
Evaluation Patterns:
const immersionIndicators = [
  /\b(?:smell|scent|aroma|stench)\b/gi,
  /\b(?:sound|noise|echo|whisper|rumble)\b/gi,
  /\b(?:texture|rough|smooth|cold|warm)\b/gi,
  /\b(?:shadows|light|darkness|glow|shimmer)\b/gi
];
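As a rough illustration of how indicator patterns like these might be applied, the sketch below counts matches per sensory category. `countIndicatorHits` and the sample sentence are hypothetical; how the real `analyzeImmersion` turns matches into a score is not described in the source.

```javascript
const immersionIndicators = [
  /\b(?:smell|scent|aroma|stench)\b/gi,
  /\b(?:sound|noise|echo|whisper|rumble)\b/gi,
  /\b(?:texture|rough|smooth|cold|warm)\b/gi,
  /\b(?:shadows|light|darkness|glow|shimmer)\b/gi
];

// Hypothetical helper: number of matches for each pattern, in pattern order.
// String.prototype.match with a global regex returns every match (or null),
// and does not carry lastIndex state between calls the way regex.test() does.
function countIndicatorHits(text, patterns) {
  return patterns.map((pattern) => (text.match(pattern) || []).length);
}

const sample =
  "A cold wind carries the smell of smoke, and a faint whisper echoes from the darkness.";

const hits = countIndicatorHits(sample, immersionIndicators);
const categoriesMatched = hits.filter((count) => count > 0).length;

console.log(hits, categoriesMatched);
```

Note that the `\b` word boundaries make the patterns match whole words only: "whisper" counts, but the inflected form "echoes" does not match `echo`.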
Example High-Quality Response:
“The musty smell of ancient parchment fills your nostrils as you step into the forgotten library. Dust motes dance in the pale moonlight streaming through cracked windows, and somewhere in the darkness, you hear the faint scratching of… something moving among the shelves.”
High Score Indicators:
Personality Markers:
const personalityIndicators = [
  /\b(?:Oh|Ah|Well|Now|Hmm)\b/g,
  /[!]{1,2}$/gm,
  /\b(?:interesting|fascinating|exciting|wonderful)\b/gi,
  /you (?:notice|feel|hear|see|sense)/gi
];
Example High-Quality Response:
“Oh, this is interesting! As you approach the mysterious figure, you can’t help but notice their eyes seem to shimmer with an otherworldly intelligence. There’s definitely more to this encounter than meets the eye…”
High Score Indicators:
Engagement Patterns:
const engagementIndicators = [
  /suddenly|unexpectedly|meanwhile|however/gi,
  /you (?:must|can|could|might) (?:decide|choose|determine)/gi,
  /what (?:do you|will you) do/gi,
  /three (?:paths|options|choices)/gi
];
Example High-Quality Response:
“Your successful intimidation causes the bandit to drop his weapon immediately! He backs away, fear evident in his eyes, and points toward a hidden path through the woods. ‘The treasure… it’s in the old mill by the river,’ he stammers. You now have a clear lead, but you must decide whether to trust this information or interrogate him further.”
High Score Indicators:
Flow Analysis:
High Score Indicators:
Authenticity Markers:
const authenticityTerms = {
  'medieval-fantasy': ['spell', 'magic', 'sword', 'armor', 'tavern', 'quest'],
  'modern-urban': ['technology', 'city', 'investigation', 'conspiracy'],
  'sci-fi-space': ['starship', 'alien', 'technology', 'galaxy', 'quantum'],
  'eldritch-horror': ['ancient', 'forbidden', 'cosmic', 'sanity', 'investigation']
};
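One way a term table like this could be consulted is a case-insensitive, whole-word lookup per setting. The sketch below is an assumption about usage, not the evaluator's actual logic; `findSettingTerms` is a hypothetical helper, and the terms are assumed to be plain words that need no regex escaping.

```javascript
const authenticityTerms = {
  'medieval-fantasy': ['spell', 'magic', 'sword', 'armor', 'tavern', 'quest'],
  'modern-urban': ['technology', 'city', 'investigation', 'conspiracy'],
  'sci-fi-space': ['starship', 'alien', 'technology', 'galaxy', 'quantum'],
  'eldritch-horror': ['ancient', 'forbidden', 'cosmic', 'sanity', 'investigation']
};

// Hypothetical helper: returns the setting terms that occur in the response,
// in the order they are listed for that setting. Unknown settings yield [].
function findSettingTerms(response, setting, termTable = authenticityTerms) {
  const terms = termTable[setting] || [];
  return terms.filter((term) => new RegExp(`\\b${term}\\b`, 'i').test(response));
}

const matched = findSettingTerms(
  'You push open the tavern door, one hand resting on your Sword.',
  'medieval-fantasy'
);

console.log(matched);
```

Each regex is created fresh and without the global flag, so `test()` has no leftover `lastIndex` state between calls.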
High Score Indicators:
Creativity Patterns:
// In AI response processing
if (this.enableEvaluation && this.dmEvaluator) {
  const evaluation = this.dmEvaluator.evaluateResponse(response, {
    playerAction: actionData.action,
    setting: campaign.setting,
    character: character,
    context: memoryContext
  });
  if (evaluation.overallScore < this.improvementThreshold) {
    // Trigger improvement process
    response = this.improveResponse(response, evaluation.suggestedImprovements);
  }
}
// Track evaluation history
this.responseHistory.push({
  response: response,
  evaluation: evaluation,
  timestamp: Date.now(),
  context: context
});
// Update performance metrics
this.updateAverageScore(evaluation.overallScore);
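The history tracking above could be backed by a bounded buffer so that memory use stays flat. This is a minimal sketch under stated assumptions: `EvaluationHistory` is a hypothetical class, the default cap of 50 mirrors the `maxHistorySize` configuration value, and the average is recomputed from the retained entries rather than maintained incrementally.

```javascript
// Hypothetical bounded history: keeps only the most recent maxSize entries.
class EvaluationHistory {
  constructor(maxSize = 50) {
    this.maxSize = maxSize;
    this.entries = [];
  }

  // Append an entry and evict the oldest one once the cap is exceeded.
  push(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.maxSize) {
      this.entries.shift();
    }
  }

  // Mean overallScore across retained entries; 0 when the history is empty.
  averageScore() {
    if (this.entries.length === 0) return 0;
    const sum = this.entries.reduce(
      (acc, entry) => acc + entry.evaluation.overallScore,
      0
    );
    return sum / this.entries.length;
  }
}

const history = new EvaluationHistory(3);
for (const score of [6.0, 7.0, 8.0, 9.0]) {
  history.push({ evaluation: { overallScore: score }, timestamp: Date.now() });
}

// The first entry (6.0) has been evicted, leaving 7.0, 8.0 and 9.0.
console.log(history.entries.length, history.averageScore());
```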
const performanceReport = {
  averageScore: 7.2,
  totalEvaluations: 156,
  trendDirection: 'improving',
  strongestCriteria: 'immersion',
  weakestCriteria: 'creativity',
  improvementRate: '+0.3 points over last 10 responses'
};
const detailedMetrics = {
  criteriaAverages: {
    immersion: 8.1,
    personality: 7.8,
    engagement: 6.9,
    flow: 7.2,
    authenticity: 7.5,
    creativity: 6.3
  },
  passingRate: 0.78, // 78% of responses meet threshold
  improvementTriggers: 34,
  averageImprovementGain: 1.2
};
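Fields such as `strongestCriteria`, `weakestCriteria`, and `passingRate` can be derived from simpler data. The sketch below shows one plausible derivation; `findExtremes` and `computePassingRate` are hypothetical helpers, the list of overall scores is made up, and "meets threshold" is assumed to mean greater than or equal to the threshold.

```javascript
const criteriaAverages = {
  immersion: 8.1,
  personality: 7.8,
  engagement: 6.9,
  flow: 7.2,
  authenticity: 7.5,
  creativity: 6.3
};

// Hypothetical helper: highest- and lowest-scoring criteria by average.
function findExtremes(averages) {
  const ranked = Object.entries(averages).sort(([, a], [, b]) => b - a);
  return {
    strongestCriteria: ranked[0][0],
    weakestCriteria: ranked[ranked.length - 1][0]
  };
}

// Hypothetical helper: fraction of overall scores at or above the threshold.
function computePassingRate(overallScores, threshold) {
  if (overallScores.length === 0) return 0;
  const passing = overallScores.filter((score) => score >= threshold).length;
  return passing / overallScores.length;
}

const extremes = findExtremes(criteriaAverages);
const rate = computePassingRate([7.0, 6.0, 8.0, 6.5], 6.5); // illustrative scores

console.log(extremes, rate);
```

With the averages shown, this yields immersion as the strongest criterion and creativity as the weakest, matching the sample performance report.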
// Initialize evaluator
const dmEvaluator = new DMEvaluator();
// Evaluate a response
const response = "You enter a dark room. What do you do?";
const setting = "medieval-fantasy";
const context = {
  playerAction: "Open the door",
  setting: setting
};
const evaluation = dmEvaluator.evaluateResponse(response, context);
console.log(`Overall Score: ${evaluation.overallScore}/10`);
console.log(`Improvements needed: ${evaluation.suggestedImprovements}`);
// Get detailed breakdown (reuses the context and setting defined above)
const detailedAnalysis = {
  immersion: dmEvaluator.analyzeImmersion(response),
  personality: dmEvaluator.analyzePersonality(response),
  engagement: dmEvaluator.analyzeEngagement(response),
  flow: dmEvaluator.analyzeFlow(response, context),
  authenticity: dmEvaluator.analyzeAuthenticity(response, setting),
  creativity: dmEvaluator.analyzeCreativity(response)
};
// Generate improvement suggestions
const improvements = dmEvaluator.getImprovementSuggestions(detailedAnalysis);
// Track quality over time
setInterval(() => {
  const report = dmEvaluator.getPerformanceReport();
  if (report.averageScore < 6.5) {
    console.warn('Response quality below threshold - review needed');
  }
}, 100000); // Check every 100 seconds (100,000 ms)
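The monitoring snippet above polls on a timer. If the goal is instead to check quality after a fixed number of responses, a counter-based variant is one option. The sketch below is an assumption about how that might look: `createQualityMonitor` is a hypothetical helper, and the report getter is a stub standing in for `dmEvaluator.getPerformanceReport()`.

```javascript
// Hypothetical helper: returns a function to call once per evaluated response.
// Every `checkEvery` calls it fetches a report and fires onLowQuality if the
// average score is below the threshold.
function createQualityMonitor({ getReport, checkEvery = 100, threshold = 6.5, onLowQuality }) {
  let evaluationsSinceCheck = 0;
  return function recordEvaluation() {
    evaluationsSinceCheck += 1;
    if (evaluationsSinceCheck < checkEvery) return;
    evaluationsSinceCheck = 0;
    const report = getReport();
    if (report.averageScore < threshold) {
      onLowQuality(report);
    }
  };
}

const warnings = [];
const recordEvaluation = createQualityMonitor({
  getReport: () => ({ averageScore: 6.1 }), // stub report, below the 6.5 threshold
  checkEvery: 100,
  onLowQuality: (report) => warnings.push(report.averageScore)
});

// Simulate 250 evaluated responses: checks fire after the 100th and 200th.
for (let i = 0; i < 250; i += 1) {
  recordEvaluation();
}

console.log(warnings.length);
```

Tying the check to response count keeps the warning rate proportional to actual usage, whereas a timer keeps firing even when no responses are being generated.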
const evaluationConfig = {
  improvementThreshold: 6.5,
  maxHistorySize: 50,
  enableAutoImprovement: true,
  detailedLogging: true,
  criteriaWeights: {
    immersion: 0.25,
    personality: 0.20,
    engagement: 0.20,
    flow: 0.15,
    authenticity: 0.10,
    creativity: 0.10
  }
};
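When the weights are customized, it may be worth checking that they still sum to 1 so that the overall score stays on the same 0-10 scale as the individual criteria. The following sketch assumes that invariant is desired; `validateCriteriaWeights` is a hypothetical helper, not a documented method.

```javascript
// Hypothetical helper: throws if any weight is outside [0, 1] or if the
// weights do not sum to 1 within a small floating-point tolerance.
function validateCriteriaWeights(weights, tolerance = 1e-9) {
  const entries = Object.entries(weights);
  for (const [name, weight] of entries) {
    if (!Number.isFinite(weight) || weight < 0 || weight > 1) {
      throw new RangeError(`Weight for "${name}" must be a number between 0 and 1`);
    }
  }
  const sum = entries.reduce((acc, [, weight]) => acc + weight, 0);
  if (Math.abs(sum - 1) > tolerance) {
    throw new RangeError(`Criteria weights must sum to 1, got ${sum}`);
  }
  return sum;
}

const criteriaWeights = {
  immersion: 0.25,
  personality: 0.20,
  engagement: 0.20,
  flow: 0.15,
  authenticity: 0.10,
  creativity: 0.10
};

validateCriteriaWeights(criteriaWeights); // passes for the default weights
```

The tolerance matters because decimal fractions such as 0.1 are not exactly representable in binary floating point, so a strict `sum === 1` comparison could reject valid weights.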
const qualityStandards = {
  excellent: 8.5,
  good: 7.0,
  acceptable: 6.5,
  needsImprovement: 5.0,
  poor: 3.0
};
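A score can be mapped to one of these labels with a simple tier lookup. The sketch below assumes each value is the minimum score for its label and that anything below the lowest bound still counts as the lowest tier; the source does not state either rule, and `classifyScore` is a hypothetical helper.

```javascript
const qualityStandards = {
  excellent: 8.5,
  good: 7.0,
  acceptable: 6.5,
  needsImprovement: 5.0,
  poor: 3.0
};

// Hypothetical helper: returns the label of the highest tier whose minimum
// the score reaches. Scores below every minimum fall back to the lowest tier.
function classifyScore(score, standards = qualityStandards) {
  const tiers = Object.entries(standards).sort(([, a], [, b]) => b - a);
  for (const [label, minimum] of tiers) {
    if (score >= minimum) return label;
  }
  return tiers[tiers.length - 1][0];
}

console.log(classifyScore(7.2)); // good
console.log(classifyScore(6.5)); // acceptable
console.log(classifyScore(2.0)); // poor
```

Sorting the tiers by threshold inside the function means the lookup does not depend on the key order of the configuration object.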
Inconsistent Scoring:
Low Evaluation Scores:
Performance Impact:
// Test evaluation system
dmEvaluator.debugEvaluation(sampleResponse, sampleContext);
// Validate scoring consistency
dmEvaluator.validateScoringConsistency(responseSet);
// Export evaluation data
const evaluationData = dmEvaluator.exportEvaluationHistory();
The DM Evaluator helps DiceTales maintain its storytelling quality through automated, multi-criteria assessment of AI-generated content. Its per-criterion scores and improvement feedback are intended to make responses more immersive, engaging, and human-like, closer to what players expect from a skilled human Dungeon Master.