Exhibition Operations Infrastructure / 전시 운영 인프라

Last verified: 2026년 2월 / February 2026

한국어

개요

본 문서는 ‘혜경궁 홍씨 AI NPC’ 프로젝트의 성공적인 전시 운영을 위한 하드웨어 및 소프트웨어 인프라 설계를 다룹니다. 박물관이나 전시관과 같은 공공장소에서 다수의 사용자가 동시에 고품질의 MR(Mixed Reality) 체험을 안정적으로 즐기기 위해서는 클라우드 의존도를 낮추고 현장 처리 능력을 강화하는 엣지 컴퓨팅(Edge Computing) 전략이 필수적입니다.

전시 인프라는 단순히 기기를 연결하는 것을 넘어, 네트워크 지연 시간(Latency) 최소화, 인터넷 단절 시의 대응 체계(Fallback), 다수 기기의 동시 운영 관리, 그리고 관람객의 개인정보 보호를 포괄하는 종합적인 시스템 아키텍처를 지향합니다. 특히 본 프로젝트는 Quest 3 헤드셋을 주력 기기로 사용하며, 향후 Samsung Galaxy XR과의 호환성까지 고려한 유연한 인프라 구성을 목표로 합니다.

핵심 발견

리서치를 통해 확인된 전시 운영 인프라의 주요 핵심 사항은 다음과 같습니다.

엣지 서버의 필요성: 실시간 대화형 AI NPC의 응답 속도를 보장하기 위해 NVIDIA Jetson AGX Orin을 활용한 로컬 추론 서버 구축이 가장 효율적입니다. 이는 클라우드 장애 시에도 기본적인 대화 기능을 유지할 수 있게 합니다.
Wi-Fi 6E 표준 채택: 5대 이상의 헤드셋이 동시에 고해상도 패스쓰루(Passthrough) 및 음성 데이터를 주고받는 환경에서는 6GHz 대역을 사용하는 Wi-Fi 6E가 간섭 최소화와 대역폭 확보를 위해 필수적입니다.
4단계 폴백(Fallback) 전략: 클라우드(GPT-4o) → 엣지(Local LLM) → 온디바이스(Whisper Sentis) → 프리스크립트(Pre-scripted)로 이어지는 계층적 대응 체계를 통해 어떠한 상황에서도 전시가 중단되지 않도록 설계해야 합니다.
실시간 모니터링 체계: vLLM 엔진과 Prometheus, Grafana를 연동하여 AI 모델의 추론 속도, 에러율, 서버 부하 상태를 운영 인력이 실시간으로 파악할 수 있는 대시보드가 필요합니다.
개인정보 보호 준수: 한국의 개인정보보호법(PIPA)에 따라 관람객의 음성 데이터를 서버에 저장하지 않는 ‘No Data Retention’ 정책과 API 키의 안전한 관리가 필수적입니다.

비교 분석

1. 엣지 서버 하드웨어 비교 (2026년 2월 기준)

항목	NVIDIA Jetson AGX Orin	Mini PC (RTX 4070)	비고
가격	약 $2,500	약 $1,500	Orin은 산업용 모듈 포함가
AI 성능	275 TOPS (INT8)	고성능 CUDA 코어 활용	Orin은 전력 효율 우수
전력 소모	15W ~ 60W (가변)	200W ~ 400W	Orin은 장시간 전시 운영에 유리
폼팩터	소형 임베디드 (팬리스 가능)	일반 데스크탑/미니 PC	Orin은 전시물 내 매립 용이
소프트웨어	JetPack SDK, TensorRT-LLM	표준 Windows/Linux 환경	Orin은 최적화 난이도 높음

2. 로컬 LLM 옵션 비교 (한국어 성능 중심)

모델명	파라미터 수	한국어 품질	추론 속도 (Orin 기준)	특징
Llama 3.2	3B / 8B	보통 (튜닝 필요)	매우 빠름	글로벌 생태계 지원 우수
HyperCLOVA X	비공개 (API/On-prem)	최상	보통 (네트워크 의존)	한국어 문맥 및 문화 이해도 최고
OPEN-SOLAR-KO	10.7B	우수	빠름	한국어 특화 오픈소스 모델

4단계 오프라인 폴백(Fallback) 전략

전시장의 인터넷 환경은 가변적일 수 있으므로, 다음과 같은 계층적 대응 체계를 구축합니다.

Tier 1: Cloud (Primary)
- 구성: GPT-4o + ElevenLabs (Cloud)
- 특징: 최고의 대화 품질과 자연스러운 음성 제공. 인터넷 연결이 정상일 때 기본 사용.
Tier 2: Edge (Local LLM)
- 구성: Jetson AGX Orin + Local LLM (Llama 3.2) + Local TTS
- 특징: 인터넷 단절 시 즉시 전환. 클라우드 대비 품질은 약간 낮으나 실시간 대화 유지 가능.
Tier 3: On-device (Minimal)
- 구성: Quest 3 + Whisper Sentis + Minimal Response Logic
- 특징: 엣지 서버 통신 장애 시 헤드셋 자체에서 구동. “잠시만 기다려 주십시오”와 같은 안내 및 기본 상호작용 수행.
Tier 4: Pre-scripted (Emergency)
- 구성: Hardcoded Dialogue Data
- 특징: 모든 AI 시스템 불능 시 미리 저장된 시나리오 기반의 대화로 전환하여 체험의 흐름을 유지.

멀티 헤드셋 운영 및 네트워크 설계

5대의 헤드셋을 동시에 운영하기 위한 상세 설계안입니다.

세션 격리: 각 헤드셋은 독립적인 세션 ID를 부여받아 엣지 서버의 개별 컨텍스트 메모리를 점유합니다.
대역폭 관리: 헤드셋당 평균 5Mbps, 피크 시 10Mbps를 할당하여 총 25~50Mbps의 안정적인 대역폭을 확보합니다.
VLAN 분리: 전시용 헤드셋 망(VLAN 10), 운영 관리용 망(VLAN 20), 일반 관람객용 공용 Wi-Fi를 논리적으로 분리하여 보안과 성능을 보장합니다.
QoS(Quality of Service): 네트워크 스위치에서 XR 트래픽(음성 및 제어 데이터)에 최우선 순위(DSCP 0x28 등)를 부여하여 데이터 패킷 손실을 방지합니다.

보안 및 개인정보 보호

개인정보보호법 준수: 관람객의 음성 데이터는 텍스트 변환(STT) 후 즉시 파기하며, 어떠한 개인 식별 정보도 서버에 저장하지 않는 ‘Zero-Retention’ 아키텍처를 적용합니다.
WPA3 보안: 무선 네트워크 보안을 위해 최신 WPA3 암호화 표준을 적용합니다.
API 키 관리: 클라우드 서비스의 API 키는 환경 변수나 보안 볼트(Vault) 시스템을 통해 관리하며 정기적으로 갱신합니다.

하드웨어 BOM (Bill of Materials)

2026년 2월 기준 예상 견적

품목	상세 사양	수량	단가 (예상)	합계
엣지 서버	NVIDIA Jetson AGX Orin 64GB	1	$2,500	$2,500
XR 헤드셋	Meta Quest 3 128GB	5	$500	$2,500
무선 공유기	Wi-Fi 6E 지원 (Tri-band)	1	$300	$300
네트워크 스위치	L2 Managed Switch (PoE 지원)	1	$200	$200
무정전 전원장치	UPS (1500VA급)	1	$300	$300
운영용 PC	모니터링 및 관리용 노트북	1	$1,500	$1,500
기타 소모품	케이블, 랙, 냉각 팬 등	1	$500	$500
총계				약 $7,800

참고: HMS(Huawei Mobile Services) 등 일부 플랫폼 서비스는 2030년까지 무료 이용이 가능한 것으로 파악되어 운영비 산정에서 제외함.

일일 운영 워크플로우

개장 전 점검 (30분): 엣지 서버 부팅, 네트워크 상태 확인, 헤드셋 배터리 완충 확인, AI 모델 로딩 테스트.
운영 중 모니터링: Grafana 대시보드를 통한 실시간 레이턴시 및 에러 모니터링, 관람객 착용 지원.
폐장 후 관리: 데이터 백업(비식별 통계 데이터), 기기 소독 및 충전, 시스템 업데이트 확인.

알려진 갭 및 향후 과제

Samsung Galaxy XR 호환성: 현재 Quest 3 중심으로 설계되었으나, Samsung Galaxy XR의 구체적인 네트워크 요구사항 및 SDK 최적화 데이터가 부족함. 이는 향후 Android XR 플랫폼 출시 시점에 맞춰 업데이트가 필요함.
대규모 동시 접속: 5대 이상의 대규모 운영 시 엣지 서버의 부하 분산(Load Balancing) 기술에 대한 추가 검증이 필요함.

출처 및 참고문헌

NVIDIA Jetson AGX Orin Technical Documentation (2025)
vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (2024)
Meta Quest for Business: Network Setup Guide (2025)
한국 개인정보보호위원회: AI 시대 개인정보 보호 가이드라인 (2024)

English

Overview

This document outlines the design of the hardware and software infrastructure for the successful exhibition operation of the ‘Lady Hyegyong AI NPC’ project. To ensure that multiple users can stably enjoy high-quality MR (Mixed Reality) experiences in public spaces such as museums or exhibition halls, an edge computing strategy that reduces cloud dependency and strengthens on-site processing capabilities is essential.

Exhibition infrastructure goes beyond simply connecting devices; it aims for a comprehensive system architecture that encompasses minimizing network latency, establishing a fallback system for internet outages, managing the simultaneous operation of multiple devices, and protecting visitor privacy. In particular, this project uses the Quest 3 headset as the primary device and aims for a flexible infrastructure configuration that considers compatibility with the Samsung Galaxy XR in the future.

Key Findings

The key findings regarding exhibition operation infrastructure identified through research are as follows:

Necessity of Edge Servers: To guarantee the response speed of real-time conversational AI NPCs, building a local inference server using NVIDIA Jetson AGX Orin is the most efficient approach. This allows basic conversation functions to be maintained even during cloud failures.
Adoption of Wi-Fi 6E Standard: In an environment where five or more headsets simultaneously exchange high-resolution passthrough and voice data, Wi-Fi 6E using the 6GHz band is essential for minimizing interference and securing bandwidth.
4-Tier Fallback Strategy: A hierarchical response system consisting of Cloud (GPT-4o) → Edge (Local LLM) → On-device (Whisper Sentis) → Pre-scripted ensures that the exhibition is not interrupted under any circumstances.
Real-time Monitoring System: A dashboard is needed that allows operation personnel to grasp the inference speed, error rate, and server load status of AI models in real-time by linking the vLLM engine with Prometheus and Grafana.
Privacy Compliance: In accordance with the Korean Personal Information Protection Act (PIPA), a ‘No Data Retention’ policy that does not store visitor voice data on the server and secure management of API keys are mandatory.

Comparative Analysis

1. Edge Server Hardware Comparison (As of February 2026)

Item	NVIDIA Jetson AGX Orin	Mini PC (RTX 4070)	Remarks
Price	Approx. $2,500	Approx. $1,500	Orin price includes industrial module
AI Performance	275 TOPS (INT8)	High-performance CUDA cores	Orin has superior power efficiency
Power Consumption	15W ~ 60W (Variable)	200W ~ 400W	Orin is advantageous for long-term operation
Form Factor	Small Embedded (Fanless possible)	Standard Desktop/Mini PC	Orin is easy to embed in exhibits
Software	JetPack SDK, TensorRT-LLM	Standard Windows/Linux	Orin has higher optimization difficulty

2. Local LLM Options Comparison (Focusing on Korean Performance)

Model Name	Parameters	Korean Quality	Inference Speed (Orin)	Features
Llama 3.2	3B / 8B	Average (Tuning needed)	Very Fast	Excellent global ecosystem support
HyperCLOVA X	Undisclosed (API/On-prem)	Excellent	Average (Network dependent)	Best understanding of Korean context
OPEN-SOLAR-KO	10.7B	Good	Fast	Korean-specialized open-source model

Recommendations & Trade-off Analysis

1. Recommended Edge Server: NVIDIA Jetson AGX Orin

Reason for Recommendation: In an exhibition environment, heat management in narrow spaces and stability over long hours (8+ hours a day) are crucial. Jetson AGX Orin is highly power-efficient and possesses industrial-grade reliability, making it the most suitable for embedding within exhibits.

Detailed Configuration: Deploy vLLM based on Docker containers to improve management efficiency.


# vLLM execution example (Llama 3.2 3B)
vllm serve meta-llama/Llama-3.2-3B-Instruct \
  --gpu-memory-utilization 0.7 \
  --max-model-len 4096 \
  --max-num-seq 16

Trade-off: The initial cost is higher than a standard PC, and specialized personnel are required to optimize AI models in a Linux environment based on the ARM architecture.

2. Recommended Network Configuration: Dedicated Wi-Fi 6E Network

Reason for Recommendation: When five Quest 3 headsets each occupy more than 5Mbps of bandwidth and transmit real-time voice data, existing 2.4GHz or 5GHz bands are vulnerable to surrounding interference. Wi-Fi 6E, which uses the 6GHz band, has almost no channel interference, guaranteeing stable latency.
Trade-off: The price of Wi-Fi 6E supported routers is high, and the signal reach may be shortened depending on the material of the exhibition walls, requiring precise AP (Access Point) placement design.

3. Recommended Monitoring Stack: vLLM + Prometheus + Grafana

Reason for Recommendation: It is necessary to immediately detect situations where AI NPC responses slow down or errors occur. Since vLLM natively provides metrics in Prometheus format, these can be visualized on a Grafana dashboard for at-a-glance management at the operations desk.
Key Metrics: Monitor TTFT (Time to First Token), Inter-token Latency, and KV Cache Usage.
Trade-off: Operating a separate monitoring server increases infrastructure complexity.

4-Tier Offline Fallback Strategy

Since the internet environment of the exhibition hall can be variable, the following hierarchical response system is established:

Tier 1: Cloud (Primary)
- Configuration: GPT-4o + ElevenLabs (Cloud)
- Features: Provides the highest conversation quality and natural voice. Used as the default when the internet connection is normal.
Tier 2: Edge (Local LLM)
- Configuration: Jetson AGX Orin + Local LLM (Llama 3.2) + Local TTS
- Features: Immediate switch upon internet disconnection. Quality is slightly lower than the cloud, but real-time conversation can be maintained.
Tier 3: On-device (Minimal)
- Configuration: Quest 3 + Whisper Sentis + Minimal Response Logic
- Features: Runs on the headset itself in case of edge server communication failure. Performs basic interactions and announcements such as “Please wait a moment.”
Tier 4: Pre-scripted (Emergency)
- Configuration: Hardcoded Dialogue Data
- Features: Switches to conversations based on pre-stored scenarios in case of total AI system failure to maintain the flow of the experience.

Multi-Headset Operations & Network Design

Detailed design for operating five headsets simultaneously:

Session Isolation: Each headset is assigned a unique session ID and occupies individual context memory on the edge server.
Bandwidth Management: An average of 5Mbps and a peak of 10Mbps are allocated per headset to secure a stable total bandwidth of 25-50Mbps.
VLAN Segmentation: The exhibition headset network (VLAN 10), operation management network (VLAN 20), and public Wi-Fi for general visitors are logically separated to ensure security and performance.
QoS (Quality of Service): Top priority (e.g., DSCP 0x28) is given to XR traffic (voice and control data) on the network switch to prevent data packet loss.

Security & Privacy

Compliance with PIPA: Visitor voice data is destroyed immediately after text conversion (STT), and a ‘Zero-Retention’ architecture is applied where no personally identifiable information is stored on the server.
WPA3 Security: The latest WPA3 encryption standard is applied for wireless network security.
API Key Management: API keys for cloud services are managed through environment variables or security vault systems and updated regularly.

Hardware BOM (Bill of Materials)

Estimated Quote as of February 2026

Item	Detailed Specification	Quantity	Unit Cost (Est.)	Total
Edge Server	NVIDIA Jetson AGX Orin 64GB	1	$2,500	$2,500
XR Headset	Meta Quest 3 128GB	5	$500	$2,500
Wireless Router	Wi-Fi 6E Support (Tri-band)	1	$300	$300
Network Switch	L2 Managed Switch (PoE Support)	1	$200	$200
UPS	UPS (1500VA Class)	1	$300	$300
Operation PC	Laptop for Monitoring & Management	1	$1,500	$1,500
Miscellaneous	Cables, Rack, Cooling Fans, etc.	1	$500	$500
Total				Approx. $7,800

Note: Some platform services like HMS (Huawei Mobile Services) are identified as free to use until 2030 and are excluded from the operating cost calculation.

Daily Operations Workflow

Pre-opening Check (30 min): Boot edge server, check network status, ensure headsets are fully charged, perform AI model loading test.
During Exhibition Monitoring: Monitor real-time latency and errors via Grafana dashboard, assist visitors with wearing devices.
Post-closing Management: Back up data (de-identified statistical data), sanitize and charge devices, check for system updates.

Known Gaps & Future Work

Samsung Galaxy XR Compatibility: Currently designed around Quest 3, but specific network requirements and SDK optimization data for Samsung Galaxy XR are lacking. This needs to be updated in line with the release of the Android XR platform.
Large-scale Concurrent Access: Further verification of load balancing technology for edge servers is required for large-scale operations of more than five units.

Sources & References

NVIDIA Jetson AGX Orin Technical Documentation (2025)
vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (2024)
Meta Quest for Business: Network Setup Guide (2025)
Personal Information Protection Commission: AI Era Personal Information Protection Guidelines (2024)