
Visual Novel Guide

This guide is for visual novels, galgames, ADV titles, and other dialogue-heavy reading workflows. It explains how Mask's current implementation fits together in practice: Focus, OCR Region, Dual Mask Regions, auto-triggering, auto page-turn, and coordinate calibration.

If you have not finished the baseline setup yet, read the Usage Guide (Quick Start) first.

1. Best-fit scenarios and prerequisites

This guide is most useful when you are reading visual novels in one of these environments:

  • Native macOS windows
  • Windows visual novels running through CrossOver or Whisky
  • Long-form dialogue reading with choices and recurring UI prompts

Before you start, make sure:

  • Required permissions are granted, especially Screen Recording
  • A provider and API key are already configured
  • You can complete at least one small test translation

2. Recommended setup order

Do not enable everything at once. Follow this order:

  1. Click Focus and bind the target game window.
  2. Open OCR Region and define the main dialogue box first.
  3. Add option regions only if the game uses separate choice areas.
  4. Configure Dual Mask Regions only when you need translated overlays in fixed positions.
  5. Choose the Auto Capture Scope.
  6. Then add trigger detection, global click triggering, auto page-turn, and calibration as needed.

Think of this as two levels:

  • Minimum usable setup: Focus + OCR Region, then manually run Translate Once
  • Stable advanced setup: add dual masks, trigger regions, auto page-turn, and calibration if your environment needs it

3. Select the window first

Focus opens a list of shareable (capturable) windows. The selected window becomes the bound target for OCR, VLM, selection preview, and auto page-turn.

This matters because Mask does not simply act on whichever window happens to be frontmost. It binds explicitly to a windowID.

That bound window is used for:

  • Translate Once
  • VLM Translation
  • The live background preview inside region editors

If no window is selected first, those flows are blocked and the app asks you to go back and choose a window.
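Conceptually, the binding works like a guard that every capture flow passes through. The following Python sketch illustrates only that rule. `Session`, `bind_window`, `translate_once`, and `NoWindowBoundError` are hypothetical names, not Mask's code, and the window ID is modeled as a plain integer (on macOS, window IDs are `CGWindowID` values).

```python
from dataclasses import dataclass
from typing import Optional


class NoWindowBoundError(RuntimeError):
    """Raised when a capture flow runs before a window was chosen via Focus."""


@dataclass
class Session:
    # Hypothetical model: the window chosen in Focus, stored by ID.
    bound_window_id: Optional[int] = None

    def bind_window(self, window_id: int) -> None:
        # Binding is explicit; it never follows whichever window is frontmost.
        self.bound_window_id = window_id

    def require_window(self) -> int:
        if self.bound_window_id is None:
            raise NoWindowBoundError("Select a window with Focus first.")
        return self.bound_window_id

    def translate_once(self) -> str:
        window_id = self.require_window()
        # A real app would capture and OCR this specific window here.
        return f"captured window {window_id}"


session = Session()
try:
    session.translate_once()
except NoWindowBoundError as err:
    print(err)  # Select a window with Focus first.

session.bind_window(4242)
print(session.translate_once())  # captured window 4242
```

The same guard would sit in front of VLM Translation and the region-editor preview, which is why all of them stop working together when the binding is missing.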

Practical advice:

  • Start the game and enter a text-visible scene before clicking Focus
  • If the game restarts, changes process, or switches to another real window, verify the binding again
  • If the region editor preview stays blank or gray, confirm the selected window is correct

4. Define OCR regions

OCR Region is not limited to one rectangle. The current implementation supports:

  • 1 main dialogue region
  • 0 or more option regions

That means you can separate the main dialogue from choice text instead of mixing them into one OCR pass.

Main dialogue region

Keep the box tight around the actual dialogue text. Avoid including:

  • character portraits
  • animated effects
  • decorative subtitles
  • clickable UI buttons

This reduces OCR noise and makes automatic triggering more stable.

Option regions

If the game presents choices outside the main dialogue box, add dedicated option regions.

The editor currently supports:

  • adding option regions
  • deleting the currently selected option region
  • renaming each option region
  • reordering option regions
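The region layout above can be modeled as one main rectangle plus an ordered list of named option regions. This is a hypothetical Python sketch of that data model and the four editor operations; `Rect`, `OptionRegion`, `OcrRegions`, and the method names are illustrative assumptions, not Mask's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float


@dataclass
class OptionRegion:
    name: str
    rect: Rect


@dataclass
class OcrRegions:
    main: Optional[Rect] = None  # the single main dialogue region
    options: List[OptionRegion] = field(default_factory=list)  # zero or more

    def add_option(self, rect: Rect, name: Optional[str] = None) -> None:
        # Default names are numbered by position at creation time.
        self.options.append(OptionRegion(name or f"Option {len(self.options) + 1}", rect))

    def delete_option(self, index: int) -> None:
        del self.options[index]

    def rename_option(self, index: int, name: str) -> None:
        self.options[index].name = name

    def move_option(self, old_index: int, new_index: int) -> None:
        # Reordering: take the region out of one slot and insert it at another.
        item = self.options.pop(old_index)
        self.options.insert(new_index, item)

    @property
    def is_multi_region(self) -> bool:
        return len(self.options) > 0


regions = OcrRegions(main=Rect(80, 560, 1120, 140))
regions.add_option(Rect(400, 200, 480, 60))
regions.add_option(Rect(400, 280, 480, 60))
regions.rename_option(1, "Lower choice")
regions.move_option(1, 0)
print([o.name for o in regions.options])  # ['Lower choice', 'Option 1']
```

Keeping the main region as a separate field, rather than as the first list entry, mirrors the rule that there is exactly one main dialogue region.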

Interaction behavior:

  • drag on empty space to create a box
  • drag the box body to move it
  • drag the edge or corner handles to resize it
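These three drag behaviors amount to a hit test on where the drag starts. The sketch below assumes a hypothetical grab tolerance of 6 points around each edge; the names and the tolerance value are illustrative, not taken from Mask.

```python
from enum import Enum
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

HANDLE_TOLERANCE = 6.0  # hypothetical grab distance around edges, in points


class DragAction(Enum):
    CREATE = "create"
    MOVE = "move"
    RESIZE = "resize"


def classify_drag(px: float, py: float, box: Optional[Box],
                  tol: float = HANDLE_TOLERANCE) -> Tuple[DragAction, Optional[str]]:
    """Return the action for a drag starting at (px, py), plus the handle name when resizing."""
    if box is None:
        return DragAction.CREATE, None
    x, y, w, h = box
    inside_grab_area = (x - tol <= px <= x + w + tol) and (y - tol <= py <= y + h + tol)
    if not inside_grab_area:
        return DragAction.CREATE, None  # empty space: draw a new box
    vertical = "top" if abs(py - y) <= tol else "bottom" if abs(py - (y + h)) <= tol else ""
    horizontal = "left" if abs(px - x) <= tol else "right" if abs(px - (x + w)) <= tol else ""
    handle = "-".join(part for part in (vertical, horizontal) if part)
    if handle:
        return DragAction.RESIZE, handle  # edge or corner handle
    return DragAction.MOVE, None  # box body


box = (100, 100, 200, 50)
for point in [(150, 120), (100, 100), (300, 125), (10, 10)]:
    action, handle = classify_drag(*point, box)
    print(point, action.name, handle)
# (150, 120) MOVE None
# (100, 100) RESIZE top-left
# (300, 125) RESIZE right
# (10, 10) CREATE None
```

Checking handles before the body matters: a drag that starts on an edge should resize, even though that point is also inside the box.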

If your goal is to translate dialogue and choices separately, this is the key setup step.

5. Dual mask regions

Dual Mask Regions controls where translated overlays are shown. It does not control where OCR reads from.

Keep this distinction clear:

  • OCR Region decides what gets read
  • Dual Mask Regions decides where translated text is displayed

With single-region OCR, one dialogue mask, or even no mask at all, is acceptable.

Once multi-region OCR is enabled (that is, once you have separate option OCR regions), the current implementation requires exactly two masks:

  • 1 dialogue mask
  • 1 option mask

Otherwise the translation flow is rejected.
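The mask-count rule can be written as a small check. This hypothetical sketch returns True only for the layouts this guide describes as working: zero masks or one dialogue mask for single-region OCR, and exactly one dialogue mask plus one option mask for multi-region OCR. `MaskKind` and `masks_ok` are illustrative names, not Mask's code.

```python
from enum import Enum
from typing import List


class MaskKind(Enum):
    DIALOGUE = "dialogue"
    OPTION = "option"


def masks_ok(option_region_count: int, masks: List[MaskKind]) -> bool:
    """True if the mask layout matches one this guide describes as working."""
    if option_region_count == 0:
        # Single-region OCR: no mask, or a single dialogue mask.
        return masks == [] or masks == [MaskKind.DIALOGUE]
    # Multi-region OCR: exactly one dialogue mask and one option mask, in any order.
    return sorted(m.value for m in masks) == ["dialogue", "option"]


print(masks_ok(0, []))                                      # True
print(masks_ok(2, [MaskKind.OPTION, MaskKind.DIALOGUE]))    # True
print(masks_ok(2, [MaskKind.DIALOGUE]))                     # False: option mask missing
```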

Recommended mapping:

  • dialogue mask aligned to the in-game dialogue box
  • option mask aligned to the in-game choice area

This keeps dialogue and option translations from colliding visually.

6. Auto capture scope and trigger strategy

Auto Capture Scope changes what Mask actually screenshots during automatic translation.

OCR region only

Automatic translation captures only the main OCR region.

Best for:

  • stable dialogue box layouts
  • workflows focused only on dialogue text
  • minimizing interference from unrelated UI and visual effects

Important: if this mode is selected but no main OCR region is configured, the automatic path fails immediately.

Full window

Automatic translation captures the whole bound window.

Best for:

  • layouts where dialogue position changes often
  • cases where broader visual context matters
  • scenes where you want more window-level context
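The scope setting reduces to picking a capture rectangle. In this hypothetical sketch, rectangles are `(x, y, width, height)` in window coordinates, and `CaptureScope` and `auto_capture_rect` are illustrative names. Note that there is deliberately no fallback: OCR region only without a main region fails, matching the behavior described above.

```python
from enum import Enum
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in window coordinates


class CaptureScope(Enum):
    OCR_REGION_ONLY = "ocr_region_only"
    FULL_WINDOW = "full_window"


def auto_capture_rect(scope: CaptureScope, window_size: Tuple[int, int],
                      main_region: Optional[Rect]) -> Rect:
    """Return the area an automatic translation pass would screenshot."""
    if scope is CaptureScope.OCR_REGION_ONLY:
        if main_region is None:
            # Fail immediately instead of silently falling back to the full window.
            raise ValueError("OCR region only scope needs a main OCR region")
        return main_region
    width, height = window_size
    return (0, 0, width, height)


print(auto_capture_rect(CaptureScope.FULL_WINDOW, (1280, 720), None))  # (0, 0, 1280, 720)
print(auto_capture_rect(CaptureScope.OCR_REGION_ONLY, (1280, 720), (80, 560, 1120, 140)))
# (80, 560, 1120, 140)
```

Manual VLM Translation does not go through this choice; it always captures the full window.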

Screen change detection region

Screen Change Detection Region affects only change detection. It is not the same thing as the final translation screenshot area.

This is especially useful in visual novels because many games include:

  • blinking portraits
  • slight character movement
  • particle effects
  • flashing UI elements

Keeping the detection region tight around the dialogue area reduces false triggers.
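A simple frame-difference detector shows why a tight region helps. This is a generic technique used as an illustration, not Mask's actual detector: the pixel threshold (24) and the minimum changed fraction (2%) are hypothetical values, and frames are tiny grayscale grids.

```python
from typing import List, Tuple

Frame = List[List[int]]             # grayscale pixels, frame[row][col] in 0..255
Region = Tuple[int, int, int, int]  # (x, y, width, height)


def changed_fraction(prev: Frame, curr: Frame, region: Region, pixel_threshold: int = 24) -> float:
    """Fraction of pixels inside region whose brightness changed by more than the threshold."""
    x, y, w, h = region
    changed = 0
    for row in range(y, y + h):
        for col in range(x, x + w):
            if abs(curr[row][col] - prev[row][col]) > pixel_threshold:
                changed += 1
    return changed / (w * h)


def should_trigger(prev: Frame, curr: Frame, region: Region, min_fraction: float = 0.02) -> bool:
    return changed_fraction(prev, curr, region) >= min_fraction


def blank(w: int = 10, h: int = 10) -> Frame:
    return [[0] * w for _ in range(h)]


prev = blank()
blink = blank()
for r in range(3):          # a "portrait" blinking in the top-left corner
    for c in range(3):
        blink[r][c] = 255

full = (0, 0, 10, 10)
dialogue = (0, 6, 10, 4)    # bottom four rows act as the dialogue box
print(should_trigger(prev, blink, full))      # True: the blink counts as a change
print(should_trigger(prev, blink, dialogue))  # False: the blink is outside the tight region

new_line = blank()
for c in range(1, 9):       # new text appears inside the dialogue box
    new_line[7][c] = 255
print(should_trigger(prev, new_line, dialogue))  # True
```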

Click detection region

Click Detection Region limits which global clicks count as trigger candidates: only clicks inside it can start a new detection pass.

This is useful when:

  • you only want clicks near the dialogue box or page-turn hotspot to matter
  • you do not want unrelated UI clicks to trigger a new detection pass

A safe default for visual novels is:

  • keep the screen change detection region tight around the dialogue box
  • keep the click detection region near the reading or page-turn interaction area
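A click filter like this is essentially a point-in-rectangle test. The sketch assumes that global clicks arrive in screen coordinates and are converted to window coordinates by subtracting the window's origin, and that with no region set every click is a candidate. Both are assumptions, and `to_window_coords` and `is_trigger_click` are hypothetical names.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (x, y, width, height) in window coordinates


def to_window_coords(screen_point: Point, window_origin: Point) -> Point:
    # Assumption: a pure translation between screen and window space.
    return screen_point[0] - window_origin[0], screen_point[1] - window_origin[1]


def is_trigger_click(click: Point, click_region: Optional[Region]) -> bool:
    """True if the click should start a new detection pass."""
    if click_region is None:
        return True  # assumption: no region configured means every click counts
    x, y, w, h = click_region
    cx, cy = click
    return x <= cx <= x + w and y <= cy <= y + h


window_origin = (200.0, 100.0)
click_region = (80.0, 540.0, 1120.0, 180.0)  # around the dialogue box
dialogue_click = to_window_coords((840.0, 720.0), window_origin)  # (640.0, 620.0)
menu_click = to_window_coords((1400.0, 140.0), window_origin)     # (1200.0, 40.0)
print(is_trigger_click(dialogue_click, click_region))  # True
print(is_trigger_click(menu_click, click_region))      # False
```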

7. Auto page-turn and coordinate calibration

If the current environment supports page-turn injection, the Auto Page-Turn Region setting appears.

One important limitation: due to Apple's restrictions, the Mac App Store version does not currently support auto page-turn. If you use the store build, you can still follow the rest of this guide for window binding, OCR regions, detection regions, and overlay setup, and rely on manual translation or other trigger methods instead.

This region is the click target used for automatic progression, typically:

  • next line
  • next page
  • the game's text-advance hotspot

Recommended workflow:

  1. Define the real clickable page-turn hotspot first.
  2. If the game has transition frames or afterimages, increase the post-click buffer.
  3. Only then decide whether to enable translate-after-click.

This reduces the chance of capturing half-transition frames.
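The ordering is what matters in this workflow: click, wait out the buffer, then capture. The sketch below injects the click, capture, translate, and sleep steps as callables so the sequence can be checked without a real game. `advance_and_translate` and its parameters are hypothetical, and the 0.6-second default buffer is an arbitrary example value, not Mask's default.

```python
import time
from typing import Callable, Optional


def advance_and_translate(
    click_hotspot: Callable[[], None],
    capture: Callable[[], bytes],
    translate: Callable[[bytes], str],
    post_click_buffer_s: float = 0.6,
    translate_after_click: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """One auto-advance step: click the page-turn hotspot, then optionally translate."""
    click_hotspot()                  # inject the page-turn click
    if not translate_after_click:
        return None                  # leave translation to other triggers
    sleep(post_click_buffer_s)       # wait out transitions and afterimages
    return translate(capture())      # capture only after the buffer


events = []
result = advance_and_translate(
    click_hotspot=lambda: events.append("click"),
    capture=lambda: events.append("capture") or b"frame",
    translate=lambda image: "translated line",
    post_click_buffer_s=0.5,
    sleep=lambda seconds: events.append(f"sleep {seconds}"),
)
print(events)  # ['click', 'sleep 0.5', 'capture']
print(result)  # translated line
```

If screenshots still catch transition frames, the only value to change here is `post_click_buffer_s`, which is why the guide suggests tuning the buffer before enabling translate-after-click.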

When calibration is necessary

Coordinate Calibration mainly exists for CrossOver and Whisky setups.

Use it if you see patterns like:

  • the box looks correct but the actual click lands off target
  • auto page-turn clicks miss the hotspot
  • captured regions are consistently offset from the intended area

Recommended order:

  1. visualize and define the auto page-turn region first
  2. open Coordinate Calibration
  3. fine-tune offsetX, offsetY, scaleX, and scaleY
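This guide does not specify how the four parameters combine. A common model, assumed here, is a per-axis linear map: corrected = raw * scale + offset. Under that assumption, two reference points per axis are enough to solve for scale and offset, which can give you starting values before fine-tuning by hand. `Calibration` and `fit_axis` are hypothetical names; the field names mirror the UI labels.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Calibration:
    offsetX: float = 0.0
    offsetY: float = 0.0
    scaleX: float = 1.0
    scaleY: float = 1.0

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        # Assumed model: scale first, then shift, independently per axis.
        return x * self.scaleX + self.offsetX, y * self.scaleY + self.offsetY


def fit_axis(raw1: float, target1: float, raw2: float, target2: float) -> Tuple[float, float]:
    """Solve target = raw * scale + offset from two reference points on one axis."""
    if raw1 == raw2:
        raise ValueError("reference points must differ on this axis")
    scale = (target2 - target1) / (raw2 - raw1)
    offset = target1 - scale * raw1
    return scale, offset


# Example: clicks aimed at x=100 and x=500 must actually be sent to 205 and 1005;
# clicks aimed at y=100 and y=300 must be sent to 90 and 290.
scale_x, offset_x = fit_axis(100, 205, 500, 1005)  # (2.0, 5.0)
scale_y, offset_y = fit_axis(100, 90, 300, 290)    # (1.0, -10.0)
cal = Calibration(offsetX=offset_x, offsetY=offset_y, scaleX=scale_x, scaleY=scale_y)
print(cal.apply(640, 360))  # (1285.0, 350.0)
```

A scale far from 1.0 usually points to a resolution or HiDPI mismatch, while a pure offset with scale near 1.0 points to a shifted origin, such as a title bar or window border.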

If your environment already aligns correctly, you do not need calibration.

8. Transient snip and VLM

Transient snip translation

Transient snip translation is best for occasional cases, for example:

  • a system prompt outside the main dialogue box
  • a menu or pop-up that you only need once
  • a one-off translation without changing the main OCR workflow

It is a supplemental tool, not the core visual novel reading path.

VLM translation

VLM Translation is better understood as full-window scene-aware enhancement, not as a replacement for everyday OCR dialogue reading.

A good mental model is:

  • everyday continuous reading: use the OCR workflow first
  • complex scenes, unusual layout, or cases that need extra context: try VLM

In the current implementation, manual VLM Translation always captures the full window and does not follow Auto Capture Scope.

9. Three practical setup patterns

Setup A: Basic dialogue mode

Best for stable main-dialogue reading.

  • bind the game window with Focus
  • set only the main dialogue OCR Region
  • enable screen change detection
  • optionally use one main dialogue mask

Notes:

  • this is the most stable entry-level setup
  • if false triggers happen often, tighten the detection region first

Setup B: Split-choice mode

Best for games that show choice text separately from the dialogue, when you want the two translations displayed separately.

  • use the main dialogue box as the primary OCR region
  • add dedicated OCR regions for options
  • configure Dual Mask Regions
  • make sure the mask count is exactly 2

Notes:

  • multi-region OCR currently requires dialogue mask + option mask
  • if the mask count is wrong, translation is refused

Setup C: Auto-advance mode

Best when you want long reading sessions with minimal manual intervention.

  • define Auto Page-Turn Region
  • enable auto page-turn
  • optionally enable translate-after-click
  • add coordinate calibration only if offsets exist

Notes:

  • first make page-turn clicks accurate, then automate the rest
  • if screenshots often catch transition frames, increase the page-turn buffer first

10. When to debug configuration before changing models

If results feel unstable, do not assume the model is the first problem. Check these items first:

  • whether the bound window is still correct
  • whether the main OCR region is too large
  • whether the detection region includes portraits or effects
  • whether multi-region OCR already has exactly two masks
  • whether the auto page-turn target really lands on the game's advance hotspot
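Most of this checklist can be expressed as a preflight function over the configuration. This hypothetical sketch covers only the items a program could check; whether the detection region overlaps portraits or effects, and whether clicks truly land on the advance hotspot, still need a visual check. The `Setup` fields and the 50% size threshold for the main region are illustrative assumptions, not Mask settings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Setup:
    bound_window_id: Optional[int]
    live_window_ids: List[int]          # windows that currently exist
    window_size: Tuple[float, float]
    main_region: Optional[Rect]
    option_region_count: int
    dialogue_masks: int
    option_masks: int
    auto_page_turn: bool
    page_turn_region: Optional[Rect]


def preflight(s: Setup, max_main_fraction: float = 0.5) -> List[str]:
    """Return human-readable configuration problems; an empty list means none were found."""
    problems = []
    if s.bound_window_id is None or s.bound_window_id not in s.live_window_ids:
        problems.append("bound window is missing or stale; re-select it with Focus")
    if s.main_region is not None:
        area = s.main_region[2] * s.main_region[3]
        if area > max_main_fraction * s.window_size[0] * s.window_size[1]:
            problems.append("main OCR region covers most of the window; tighten it")
    if s.option_region_count > 0 and (s.dialogue_masks, s.option_masks) != (1, 1):
        problems.append("multi-region OCR needs exactly 1 dialogue mask and 1 option mask")
    if s.auto_page_turn and s.page_turn_region is None:
        problems.append("auto page-turn is on but no page-turn region is defined")
    # Portrait overlap and hotspot accuracy need a visual check; not modeled here.
    return problems


setup = Setup(
    bound_window_id=1, live_window_ids=[2], window_size=(1280, 720),
    main_region=(80, 560, 1120, 140), option_region_count=2,
    dialogue_masks=1, option_masks=0, auto_page_turn=False, page_turn_region=None,
)
for problem in preflight(setup):
    print("-", problem)
# - bound window is missing or stale; re-select it with Focus
# - multi-region OCR needs exactly 1 dialogue mask and 1 option mask
```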

In many visual novel workflows, accurate configuration matters more than switching to a stronger model.