Visual Novel Guide

This guide is for visual novels, galgames, ADV titles, and other dialogue-heavy reading workflows. It explains how the pieces of Mask's current implementation fit together in practice: Focus, OCR Region, Dual Mask Regions, auto-triggering, auto page-turn, and coordinate calibration.

If you have not finished the baseline setup yet, read Usage Guide (Quick Start) first.

1. Best-fit scenarios and prerequisites

This guide is most useful when you are reading visual novels in one of these environments:

Native macOS windows
Windows visual novels running through CrossOver or Whisky
Long-form dialogue reading with choices and recurring UI prompts

Before you start, make sure:

Required permissions are granted, especially Screen Recording
A provider and API key are already configured
You can complete at least one small test translation

2. Recommended workflow

Do not enable everything at once. The stable order is:

Click Focus and bind the target game window.
Open OCR Region and define the main dialogue box first.
Add option regions only if the game uses separate choice areas.
Configure Dual Mask Regions only when you need translated overlays in fixed positions.
Choose the Auto Capture Scope.
Then add trigger detection, global click triggering, auto page-turn, and calibration as needed.

Think of this as two levels:

Minimum usable setup: Focus + OCR Region, then manually run Translate Once
Stable advanced setup: add dual masks, trigger regions, auto page-turn, and calibration if your environment needs it

3. Select the window first

Focus opens a shareable window list. The selected window becomes the bound target for OCR, VLM, selection preview, and auto page-turn.

This matters because Mask does not simply act on whichever window happens to be frontmost. It binds explicitly to a windowID.

That bound window is used for:

Translate Once
VLM Translation
The live background preview inside region editors

If no window is selected first, those flows are blocked and the app asks you to go back and choose a window.
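The blocking behavior described above can be pictured as a guard that every capture flow passes through first. This is a minimal sketch, not Mask's actual code: `Session`, `require_bound_window`, `translate_once`, and `NoWindowBoundError` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class NoWindowBoundError(Exception):
    """Raised when a capture flow starts before a window is selected."""


@dataclass
class Session:
    # Hypothetical session state; None means Focus has not been used yet.
    bound_window_id: Optional[int] = None


def require_bound_window(session: Session) -> int:
    """Return the bound window ID, or refuse the flow if none is selected."""
    if session.bound_window_id is None:
        raise NoWindowBoundError("Select a window with Focus first.")
    return session.bound_window_id


def translate_once(session: Session) -> str:
    # The flow acts on the explicitly bound window, not the frontmost one.
    window_id = require_bound_window(session)
    # A real implementation would capture and translate here.
    return f"capturing window {window_id}"
```

The point of the sketch is that the window ID comes from the explicit binding, so switching the frontmost window does not change what gets captured.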

Practical advice:

Start the game and enter a text-visible scene before clicking Focus
If the game restarts, changes process, or switches to another real window, verify the binding again
If the region editor preview stays blank or gray, confirm the selected window is correct

4. Define OCR regions

OCR Region is not limited to one rectangle. The current implementation supports:

1 main dialogue region
0 or more option regions

That means you can separate the main dialogue from choice text instead of mixing them into one OCR pass.

Main dialogue region

Keep the box tight around the actual dialogue text. Avoid including:

character portraits
animated effects
decorative subtitles
clickable UI buttons

This reduces OCR noise and makes automatic triggering more stable.

Option regions

If the game presents choices outside the main dialogue box, add dedicated option regions.

The editor currently supports:

adding option regions
deleting the current option region
renaming each option region
reordering option regions

Interaction behavior:

drag on empty space to create a box
drag the box body to move it
drag the edge or corner handles to resize it

If your goal is to translate dialogue and choices separately, this is the key setup step.

5. Dual mask regions

Dual Mask Regions controls where translated overlays are shown. It does not control where OCR reads from.

Keep this distinction clear:

OCR Region decides what gets read
Dual Mask Regions decides where translated text is displayed

With single-region OCR, one dialogue mask, or even no mask at all, is acceptable.

But once multi-region OCR is enabled, meaning you have separate option OCR regions, the current implementation requires exactly two masks:

1 dialogue mask
1 option mask

Otherwise the translation flow is rejected.
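The mask-count rule can be expressed as a small pre-flight check. This is a sketch under assumptions: the function name `validate_masks` and the string labels `"dialogue"` and `"option"` are hypothetical, and the single-region branch is deliberately left permissive because this guide only states that zero or one mask is acceptable there.

```python
from typing import Sequence


def validate_masks(option_region_count: int, mask_kinds: Sequence[str]) -> None:
    """Raise ValueError if the masks do not fit the OCR setup.

    option_region_count: number of option OCR regions (0 = single-region OCR).
    mask_kinds: one label per configured mask (hypothetical labels).
    """
    multi_region = option_region_count > 0
    if not multi_region:
        # Single-region OCR: no strict mask requirement is enforced here.
        return
    # Multi-region OCR: exactly one dialogue mask and one option mask.
    if sorted(mask_kinds) != ["dialogue", "option"]:
        raise ValueError(
            "Multi-region OCR needs exactly 1 dialogue mask and 1 option mask; "
            f"got {list(mask_kinds)}"
        )
```

Running a check like this mentally before you start a session explains most "translation refused" cases in split-choice setups.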

Recommended mapping:

dialogue mask aligned to the in-game dialogue box
option mask aligned to the in-game choice area

This keeps dialogue and option translations from colliding visually.

6. Auto capture scope and trigger strategy

Auto Capture Scope changes what Mask actually screenshots during automatic translation.

OCR region only

Automatic translation captures only the main OCR region.

Best for:

stable dialogue box layouts
workflows focused only on dialogue text
minimizing interference from unrelated UI and visual effects

Important: if this mode is selected but no main OCR region is configured, the automatic path fails immediately.

Full window

Automatic translation captures the whole bound window.

Best for:

layouts where dialogue position changes often
cases where broader, window-level visual context matters
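The difference between the two scopes, including the immediate failure when OCR region only is selected without a main region, can be sketched as follows. The names `CaptureScope` and `resolve_capture_rect` are hypothetical, and rectangles are assumed to be `(x, y, width, height)` in window coordinates.

```python
from enum import Enum
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)


class CaptureScope(Enum):
    OCR_REGION_ONLY = "ocr_region_only"
    FULL_WINDOW = "full_window"


def resolve_capture_rect(
    scope: CaptureScope,
    window_size: Tuple[int, int],
    main_ocr_region: Optional[Rect],
) -> Rect:
    """Pick the rectangle that automatic translation would screenshot."""
    if scope is CaptureScope.FULL_WINDOW:
        width, height = window_size
        return (0, 0, width, height)
    # OCR region only: there is nothing to capture without a main region.
    if main_ocr_region is None:
        raise ValueError("OCR region only is selected but no main OCR region is set.")
    return main_ocr_region
```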

Screen change detection region

Screen Change Detection Region affects only change detection. It is not the same thing as the final translation screenshot area.

This is especially useful in visual novels because many games include:

blinking portraits
slight character movement
particle effects
flashing UI elements

Keeping the detection region tight around the dialogue area reduces false triggers.
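To see why a tight detection region helps, consider a simplified change detector that compares two frames only inside the region. This is an illustrative sketch, not Mask's detection algorithm: frames are assumed to be 2D lists of grayscale values, and `changed_fraction` and its `tolerance` parameter are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]           # rows of grayscale pixel values (assumed)
Rect = Tuple[int, int, int, int]  # x, y, width, height


def changed_fraction(prev: Frame, curr: Frame, region: Rect, tolerance: int = 8) -> float:
    """Fraction of pixels inside the region that differ by more than tolerance."""
    x, y, w, h = region
    if w <= 0 or h <= 0:
        raise ValueError("region must have positive width and height")
    changed = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            if abs(prev[row][col] - curr[row][col]) > tolerance:
                changed += 1
    return changed / (w * h)
```

With a detector like this, a blinking portrait outside the region contributes nothing to the result, while new dialogue text inside the region does. A trigger would then fire only when the fraction crosses some threshold.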

Click detection region

Click Detection Region limits where a global click should count as a meaningful trigger candidate.

This is useful when:

you only want clicks near the dialogue box or page-turn hotspot to matter
you do not want unrelated UI clicks to trigger a new detection pass

A safe default for visual novels is:

keep the detection region near the dialogue box
keep the click detection region near the reading or page-turn interaction area
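Conceptually, the click detection region is a point-in-rectangle filter applied to each global click. The sketch below assumes clicks and regions share one coordinate space and uses a hypothetical function name, `click_is_trigger_candidate`.

```python
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # x, y, width, height


def click_is_trigger_candidate(click: Point, region: Rect) -> bool:
    """Return True only when the click lands inside the detection region."""
    cx, cy = click
    x, y, w, h = region
    return x <= cx < x + w and y <= cy < y + h
```

For example, with a region covering the dialogue box, a click on a settings button elsewhere in the window returns `False` and would not start a new detection pass.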

7. Auto page-turn and coordinate calibration

If the current environment supports page-turn injection, Auto Page-Turn Region appears.

One important limitation: because of restrictions imposed by Apple, the Mac App Store version does not currently support auto page-turn. If you use the store build, you can still follow the rest of this guide for window binding, OCR regions, detection regions, and overlay setup, then rely on manual translation or other trigger methods instead.

This region is the click target used for automatic progression, typically:

next line
next page
the game's text-advance hotspot

Recommended workflow:

Define the real clickable page-turn hotspot first.
If the game has transition frames or afterimages, increase the post-click buffer.
Only then decide whether to enable translate-after-click.

This reduces the chance of capturing half-transition frames.
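The ordering matters more than the exact numbers: click, wait for the buffer, then capture. A minimal sketch of that sequence follows, with hypothetical names (`advance_and_capture`, `buffer_seconds`) and the click and capture steps passed in as plain callables.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def advance_and_capture(
    click: Callable[[], None],
    capture: Callable[[], T],
    buffer_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Click the page-turn hotspot, wait out the transition, then capture."""
    click()
    # The post-click buffer gives fades and afterimages time to finish.
    sleep(buffer_seconds)
    return capture()
```

If captures still show half-drawn text, raising `buffer_seconds` is the first knob to turn in this model, which mirrors the advice above.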

When calibration is necessary

Coordinate Calibration mainly exists for CrossOver and Whisky setups.

Use it if you see patterns like:

the box looks correct but the actual click lands off target
auto page-turn clicks miss the hotspot
captured regions are consistently offset from the intended area

Recommended order:

visualize and define the auto page-turn region first
open Coordinate Calibration
fine-tune offsetX, offsetY, scaleX, and scaleY

If your environment already aligns correctly, you do not need calibration.
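One way to reason about the four calibration values is as a per-axis scale followed by an offset. The formula below is an assumption made for illustration; this guide does not specify how Mask combines the values or in what order, and `Calibration` and `calibrate_point` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Calibration:
    # Defaults describe an environment that needs no correction.
    offset_x: float = 0.0
    offset_y: float = 0.0
    scale_x: float = 1.0
    scale_y: float = 1.0


def calibrate_point(x: float, y: float, cal: Calibration) -> Tuple[float, float]:
    """Map an intended coordinate to a corrected one.

    Assumed model: corrected = raw * scale + offset, applied per axis.
    """
    return (x * cal.scale_x + cal.offset_x, y * cal.scale_y + cal.offset_y)
```

Under this model, a constant miss in one direction suggests adjusting an offset, while a miss that grows toward one edge of the window suggests adjusting a scale.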

8. Transient snip and VLM

Transient snip translation

Transient snip translation is best for occasional cases, for example:

a system prompt outside the main dialogue box
a menu or pop-up that you only need once
a one-off translation without changing the main OCR workflow

It is a supplemental tool, not the core visual novel reading path.

VLM translation

VLM Translation is better understood as a full-window, scene-aware enhancement, not as a replacement for everyday OCR dialogue reading.

A good mental model is:

everyday continuous reading: use the OCR workflow first
complex scenes, unusual layout, or cases that need extra context: try VLM

In the current implementation, manual VLM Translation always captures the full window and does not follow Auto Capture Scope.

9. Three practical setup patterns

Setup A: Basic dialogue mode

Best for stable main-dialogue reading.

bind the game window with Focus
set only the main dialogue OCR Region
enable screen change detection
optionally use one main dialogue mask

Notes:

this is the most stable entry-level setup
if false triggers happen often, tighten the detection region first

Setup B: Split-choice mode

Best for games that show choice text separately from the dialogue, when you want the two translations displayed separately as well.

use the main dialogue box as the primary OCR region
add dedicated OCR regions for options
configure Dual Mask Regions
make sure the mask count is exactly 2

Notes:

multi-region OCR currently requires dialogue mask + option mask
if the mask count is wrong, translation is refused

Setup C: Auto-advance mode

Best when you want long reading sessions with minimal manual intervention.

define Auto Page-Turn Region
enable auto page-turn
optionally enable translate-after-click
add coordinate calibration only if offsets exist

Notes:

first make page-turn clicks accurate, then automate the rest
if screenshots often catch transition frames, increase the page-turn buffer first

10. When to debug configuration before changing models

If results feel unstable, do not assume the model is the first problem. Check these items first:

whether the bound window is still correct
whether the main OCR region is too large
whether the detection region includes portraits or effects
whether multi-region OCR already has exactly two masks
whether the auto page-turn target really lands on the game's advance hotspot

In many visual novel workflows, accurate configuration matters more than switching to a stronger model.
