AI Conversation Engine Design

Last verified: February 2026

Overview

This document summarizes research findings on the design of the AI conversation engine at the core of the Lady Hyegyong AI NPC project. The engine must do more than answer user questions: it has to sustain the formal speech style of the Joseon royal court while steering the conversation toward a fixed set of educational goals (letter writing, calligraphy, etiquette, and tea ceremony).

Traditional chatbot systems follow fixed scenarios and therefore lack flexibility. Pure LLM-based systems have the opposite problem: the conversation flow is hard to control, so they may miss the educational objectives or produce ‘hallucinations’ that depart from historical fact. This project therefore adopts a hybrid architecture that combines the natural language generation of Large Language Models (LLMs) with a structured planning system, balancing free conversation against goal-oriented interaction.

The engine aims to portray Lady Hyegyong’s persona faithfully within the historical setting of 1816 and to deliver an in-depth historical experience over a session of roughly 10-15 minutes.

Key Findings

1. Hybrid Conversation Architecture (Hybrid LLM + Structured Planning)

Research indicates that the most effective approach is a hybrid model where the LLM handles natural dialogue, and structured planning tools like Arbor and ChatSOP manage educational goals and activity sequences.

Arbor Platform Utilization: A hybrid engine that manages the logical flow and state of the conversation. According to the research data, introducing Arbor improved conversation accuracy by 29.4 percentage points (pp), reduced latency by 57.1%, and cut operating costs by a factor of 14.4.
ChatSOP System: Controls AI behavior based on Standard Operating Procedures (SOPs). It improved the accuracy of the specific actions the AI must perform during complex activities (e.g., tea ceremony table setting) by 27.95%.
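
The division of labor described above can be sketched in a few lines. This is a minimal illustration only, not the Arbor or ChatSOP API: the `Planner` class, the `respond` function, and the `echo_llm` stub are hypothetical names, and a real system would call an actual model where the stub is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The four activities named in this document (identifiers are hypothetical).
ACTIVITIES = ["letter_writing", "calligraphy", "etiquette", "tea_ceremony"]


@dataclass
class Planner:
    """Structured side: owns the activity sequence and never generates text."""
    remaining: List[str] = field(default_factory=lambda: list(ACTIVITIES))

    def current_goal(self) -> str:
        # Once every activity is finished, the only goal left is to close.
        return self.remaining[0] if self.remaining else "closing"

    def mark_done(self) -> None:
        if self.remaining:
            self.remaining.pop(0)


def respond(planner: Planner, user_text: str, llm: Callable[[str], str]) -> str:
    """LLM side: free-form wording, constrained by the planner's current goal."""
    prompt = f"[goal={planner.current_goal()}] User said: {user_text}"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to keep the sketch runnable."""
    return f"<reply to {prompt}>"
```

The point of the split is that the goal is chosen deterministically by the planner, while the LLM only decides how to phrase the turn that serves it.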

2. Educational Content Delivery Strategy (Mission System & Topic Tracking)

The key is to weave educational content into the conversation so that users do not get bored.

Mission System: Tracks the task the user is currently performing (e.g., “Learning about Hyo (Filial Piety)”) and checks whether the relevant historical facts have been mentioned during the conversation.
Topic Tracking: Manages the organic introduction of the four main activities (letter writing, calligraphy, etiquette, and tea ceremony) within the flow of conversation. The NPC naturally brings up relevant topics based on the user’s questions or reactions, rather than giving a one-sided lecture.
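
A mission tracker of this kind could look roughly like the following. It is a sketch under simplifying assumptions: the `Mission` class and its fields are hypothetical, and coverage is detected by plain keyword matching, whereas a production system would more likely use a classifier or an LLM judgment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Mission:
    """Tracks which required facts the NPC has already mentioned."""
    name: str
    required_facts: Dict[str, List[str]]  # fact id -> trigger keywords (assumed)
    covered: Set[str] = field(default_factory=set)

    def observe(self, npc_utterance: str) -> None:
        # Mark a fact as covered when any of its keywords appears in the text.
        text = npc_utterance.lower()
        for fact_id, keywords in self.required_facts.items():
            if any(keyword.lower() in text for keyword in keywords):
                self.covered.add(fact_id)

    def pending(self) -> List[str]:
        # Facts the NPC still needs to bring up, in their original order.
        return [f for f in self.required_facts if f not in self.covered]

    def is_complete(self) -> bool:
        return not self.pending()
```

The list returned by `pending()` is what a topic tracker would consult when deciding which subject the NPC should raise next.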

3. Activity Integration Dialogue Patterns

Each activity follows a three-step pattern: ‘Demonstration → User attempt → NPC response.’

Letter Writing (편지 쓰기): Lady Hyegyong explains the importance of letters and demonstrates writing one, then encourages the user to compose the content.
Hyo Calligraphy (효 편액 서예): She explains how to hold the brush and the meaning of the characters, providing encouragement and feedback as the user writes with a virtual brush.
Morning Greeting Etiquette (아침 문안 예절): She demonstrates the greeting methods of the Joseon royal court and guides the user’s movements.
Tea Ceremony (접빈 다과 상차림): She explains the names and placement order of the tea set through conversation and sets the table together with the user.
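
The three-step pattern maps naturally onto a small state machine. The sketch below is illustrative: `Step` and `ActivityFlow` are hypothetical names, and the `retry` flag (returning to the user attempt after the NPC's feedback) is an assumption that the source does not specify.

```python
from enum import Enum


class Step(Enum):
    DEMONSTRATION = "demonstration"
    USER_ATTEMPT = "user_attempt"
    NPC_RESPONSE = "npc_response"
    DONE = "done"


class ActivityFlow:
    """Walks one activity through Demonstration -> User attempt -> NPC response."""

    def __init__(self, activity: str) -> None:
        self.activity = activity
        self.step = Step.DEMONSTRATION

    def advance(self, retry: bool = False) -> Step:
        if self.step is Step.DEMONSTRATION:
            self.step = Step.USER_ATTEMPT
        elif self.step is Step.USER_ATTEMPT:
            self.step = Step.NPC_RESPONSE
        elif self.step is Step.NPC_RESPONSE:
            # Assumption: the NPC may invite another attempt instead of finishing.
            self.step = Step.USER_ATTEMPT if retry else Step.DONE
        # Step.DONE is terminal, so further calls leave the state unchanged.
        return self.step
```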

4. Context and Memory Management

The engine uses four types of memory to keep the conversation consistent.

Sliding Window: Maintains recent conversation history (approximately 10-20 turns) to understand the context.
Episodic Memory: Remembers unique events that occurred in the current session (e.g., the user’s name, specific topics discussed previously).
Semantic Memory: Stores historical fact data about Lady Hyegyong and the Joseon royal court.
Procedural Memory: Stores step-by-step methodologies for performing each activity.
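
The four memory types can be held in one small container, as in the sketch below. The `Memory` class and its field names are hypothetical, and the window size of 20 entries is just one value within the 10-20 turn range mentioned above; here each entry is a single utterance.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class Memory:
    # Sliding window: deque(maxlen=...) silently drops the oldest entry when full.
    window: Deque[Tuple[str, str]] = field(default_factory=lambda: deque(maxlen=20))
    # Episodic: facts unique to this session, such as the user's name.
    episodic: Dict[str, str] = field(default_factory=dict)
    # Semantic: historical facts about Lady Hyegyong and the Joseon court.
    semantic: Dict[str, str] = field(default_factory=dict)
    # Procedural: ordered steps for each activity.
    procedural: Dict[str, List[str]] = field(default_factory=dict)

    def add_turn(self, speaker: str, text: str) -> None:
        self.window.append((speaker, text))

    def context(self) -> str:
        """Render the recent window as plain text for inclusion in a prompt."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.window)
```

Only the sliding window changes on every turn; the semantic and procedural stores would typically be loaded once at session start.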

5. Prompt Engineering Framework (7-Layer Structure)

The prompt is organized into seven layers to keep the persona consistent.

Identity: Basic settings as Lady Hyegyong.
Personality: Seven core traits (patience, tragic elegance, educational passion, etc.) derived from an analysis of Hanjungnok (Memoirs of Lady Hyegyong).
Speech Style: Guidelines for using Joseon court terminology and formal language.
Knowledge Scope: Limited to knowledge prior to 1816 (preventing mention of future events).
Topic Handling: Euphemisms and response strategies for sensitive topics such as Prince Sado (사도세자).
Decision Tree: Logic to decide whether to explain, demonstrate, or ask questions based on the conversation situation.
Guardrails: Guidelines for preventing historical distortion and ensuring safety.
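
One way to keep the seven layers in a fixed order is to assemble the system prompt from a dictionary, as sketched below. The layer keys, the section-header format, and the `build_system_prompt` function are assumptions made for illustration; the source does not prescribe a concrete prompt format.

```python
from typing import Dict, List

# Layer order follows the list above; the key names are hypothetical.
LAYERS: List[str] = [
    "identity",
    "personality",
    "speech_style",
    "knowledge_scope",
    "topic_handling",
    "decision_tree",
    "guardrails",
]


def build_system_prompt(layers: Dict[str, str]) -> str:
    """Join the seven layers in a fixed order, refusing incomplete input."""
    missing = [name for name in LAYERS if not layers.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing prompt layers: {', '.join(missing)}")
    sections = [f"[{name.upper()}]\n{layers[name].strip()}" for name in LAYERS]
    return "\n\n".join(sections)
```

Failing loudly on a missing layer is a deliberate choice in this sketch: a prompt that silently lacks its guardrails or knowledge scope would be harder to notice than an exception at startup.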

6. Session Narrative Arc

The 10-15 minute experience consists of the following four stages:

Greeting (인사): Welcomes the user and sets the historical background and situation.
Development (전개): Shares historical background through casual conversation and mentions the necessity of activities.
Engagement (참여): Interacts while performing main activities (calligraphy, tea ceremony, etc.) together.
Closing (마무리): Reflects on the meaning of the activities and shares a warm farewell.
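
The four stages can be scheduled from elapsed session time. In the sketch below, the time shares per stage and the 12-minute default session length are assumed values chosen for illustration, not figures from the research; `phase_at` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed share of the session spent in each stage (sums to 1.0).
PHASES: List[Tuple[str, float]] = [
    ("greeting", 0.10),
    ("development", 0.25),
    ("engagement", 0.50),
    ("closing", 0.15),
]


def phase_at(elapsed_s: float, session_s: float = 720.0) -> str:
    """Return the narrative stage for a given elapsed time in seconds."""
    # Clamp progress to [0, 1] so late or negative timestamps stay valid.
    progress = min(max(elapsed_s / session_s, 0.0), 1.0)
    cumulative = 0.0
    for name, share in PHASES:
        cumulative += share
        if progress < cumulative:
            return name
    # At or beyond the end of the session, stay in the final stage.
    return PHASES[-1][0]
```

A purely time-based schedule is the simplest option; a fuller design would probably also let the planner advance a stage early when its goals are met.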

Comparative Analysis

The table below compares major LLMs and platforms as conversation engines, as of February 2026.

Comparison Item	GPT-4o (OpenAI)	Claude 3.5 Sonnet (Anthropic)	Convai Built-in
Korean Naturalness	Excellent (Contextual understanding)	High (Formal expressions)	Medium-High (NPC-specific optimization)
Reasoning Ability	Very High	High	Medium
Latency (TTFT)	~200ms (API-based)	~300ms	<150ms (Optimized)
Character Consistency	Strong (Prompt-dependent)	Very strong (Strict instruction adherence)	Strong (Dedicated toolset)
Activity/Action Integration	Requires API Customization	Requires API Customization	Native Support (Action Graph)
Cost (per 1k tokens, input / output)	$0.005 / $0.015	$0.003 / $0.015	Subscription (Scale: $1,199/mo)
Historical Accuracy	RAG Required	RAG Required	Built-in Knowledge Base
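
To put the per-token prices next to the flat subscription, the arithmetic can be sketched as follows. The prices come from the table above; the token counts in the usage note are purely hypothetical, and the comparison is rough because a subscription may bundle services beyond text generation.

```python
from typing import Dict, Tuple

# (input, output) price in USD per 1k tokens, taken from the comparison table.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "gpt-4o": (0.005, 0.015),
    "claude-3.5-sonnet": (0.003, 0.015),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD of one session for the given model."""
    in_price, out_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price


def break_even_sessions(model: str, input_tokens: int, output_tokens: int,
                        monthly_fee: float = 1199.0) -> float:
    """Sessions per month at which token spend equals the flat monthly fee."""
    return monthly_fee / session_cost(model, input_tokens, output_tokens)
```

For example, if a session were assumed to consume 45,000 input tokens and 3,000 output tokens, `session_cost("gpt-4o", 45000, 3000)` works out to about $0.27 and the Claude 3.5 Sonnet figure to about $0.18, so the flat fee would only be matched at several thousand sessions per month.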

Recommendations & Trade-off Analysis

Option 1: Design Based on Convai Integrated Platform

Recommended when rapid prototyping and stable action integration are required.

Pros: Deep integration with Unity SDK, ease of activity integration through built-in Action Graph, low latency.
Cons: Limitations in customization, high monthly subscription cost, platform dependency.
Suitable Situations: When the development period is tight and you want to connect NPC behavior and dialogue without complex coding.

Option 2: Custom Stack (GPT-4o/Claude + Arbor)

Recommended when the highest level of conversation quality and a proprietary system are required.

Pros: Access to the strong reasoning capabilities of the latest LLMs, precise conversation flow control through Arbor, full data ownership.
Cons: High development difficulty, latency that must be managed across separately integrated APIs, initial setup costs.
Suitable Situations: When long-term production operation is required and you want to finely tune the unique conversation logic of Lady Hyegyong.

Known Gaps & Future Work

Samsung Galaxy XR (Known Gap): This is a new Android XR device that follows the OpenXR standard, but it still needs to be verified whether speech recognition (STT) and processing latency for the conversation engine match those on Quest 3.
Complex Activity Logic: ‘Multimodal’ integration, in which the user’s fine movements during calligraphy or tea ceremony are recognized and reflected in the conversation in real time, can be a major source of latency with current technology.
Offline Environment: When a network failure at the exhibition venue forces a switch to a local model (such as Llama 3.2 3B), the persona’s tone and knowledge level drop sharply. Research on lightweight prompts is needed to address this ‘persona degradation’ problem.
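
The offline switch described in the last item could be structured as below. This is a sketch only: `reply_with_fallback` and the `ModelFn` signature are hypothetical, both models are passed in as plain callables, and it assumes that network problems surface as `ConnectionError` or `TimeoutError`.

```python
from typing import Callable, Tuple

# A model is represented as a callable: (system_prompt, user_text) -> reply.
ModelFn = Callable[[str, str], str]


def reply_with_fallback(user_text: str, cloud: ModelFn, local: ModelFn,
                        full_prompt: str, compact_prompt: str) -> Tuple[str, str]:
    """Try the cloud model first; on network failure use the local model.

    The local model receives a shorter prompt, on the assumption that a small
    model follows a compact persona description more reliably than a long one.
    """
    try:
        return "cloud", cloud(full_prompt, user_text)
    except (ConnectionError, TimeoutError):
        return "local", local(compact_prompt, user_text)
```

Returning the source label alongside the reply lets the caller log how often the fallback path is taken and measure persona quality separately for each path.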

Sources & References

Arbor Official Documentation: “Hybrid Conversation Design for AI NPCs” (2025)
ChatSOP Technical Whitepaper: “Action Accuracy in Procedural AI Interactions” (2025)
OpenAI API Reference: “GPT-4o Real-time Capabilities” (2026)
Anthropic Documentation: “Claude 3.5 Sonnet System Prompting Guide” (2025)
Convai SDK Manual: “Integrating Actions with Narrative Design” (2026)