Visual Novel Guide
This guide is for visual novels, galgames, ADV titles, and other dialogue-heavy reading workflows. It explains how Mask's current implementation fits together in practice: Focus, OCR Region, Dual Mask Regions, auto-triggering, auto page-turn, and coordinate calibration.
If you have not finished the baseline setup yet, read Usage Guide (Quick Start) first.
1. Best-fit scenarios and prerequisites
This guide is most useful when you are reading visual novels in one of these environments:
- Native macOS windows
- Windows visual novels running through CrossOver or Whisky
- Long-form dialogue reading with choices and recurring UI prompts
Before you start, make sure:
- Required permissions are granted, especially Screen Recording
- A provider and API key are already configured
- You can complete at least one small test translation
2. Recommended workflow
Do not enable everything at once. The stable order is:
- Click Focus and bind the target game window.
- Open OCR Region and define the main dialogue box first.
- Add option regions only if the game uses separate choice areas.
- Configure Dual Mask Regions only when you need translated overlays in fixed positions.
- Choose the Auto Capture Scope.
- Then add trigger detection, global click triggering, auto page-turn, and calibration as needed.
Think of this as two levels:
- Minimum usable setup: Focus + OCR Region, then manually run Translate Once
- Stable advanced setup: add dual masks, trigger regions, auto page-turn, and calibration if your environment needs it
3. Select the window first
Focus opens a shareable window list. The selected window becomes the bound target for OCR, VLM, selection preview, and auto page-turn.
This matters because Mask does not simply act on whichever window happens to be frontmost. It binds explicitly to a windowID.
That bound window is used for:
- Translate Once
- VLM Translation
- The live background preview inside region editors
If no window is selected first, those flows are blocked and the app asks you to go back and choose a window.
Practical advice:
- Start the game and enter a text-visible scene before clicking Focus
- If the game restarts, changes process, or switches to another real window, verify the binding again
- If the region editor preview stays blank or gray, confirm the selected window is correct
4. Define OCR regions
OCR Region is not limited to one rectangle. The current implementation supports:
- 1 main dialogue region
- 0 or more option regions
That means you can separate the main dialogue from choice text instead of mixing them into one OCR pass.
Main dialogue region
Keep the box tight around the actual dialogue text. Avoid including:
- character portraits
- animated effects
- decorative subtitles
- clickable UI buttons
This reduces OCR noise and makes automatic triggering more stable.
Option regions
If the game presents choices outside the main dialogue box, add dedicated option regions.
The editor currently supports:
- adding option regions
- deleting the current option
- renaming each option region
- reordering option regions
Interaction behavior:
- drag on empty space to create a box
- drag the box body to move it
- drag the edge or corner handles to resize it
If your goal is to translate dialogue and choices separately, this is the key setup step.
5. Dual mask regions
Dual Mask Regions controls where translated overlays are shown. It does not control where OCR reads from.
Keep this distinction clear:
- OCR Region decides what gets read
- Dual Mask Regions decides where translated text is displayed
In single-region OCR, one dialogue mask or even no mask can be acceptable.
But once multi-region OCR is enabled, meaning you have separate option OCR regions, the current implementation requires exactly two masks:
- 1 dialogue mask
- 1 option mask
Otherwise the translation flow is rejected.
Recommended mapping:
- dialogue mask aligned to the in-game dialogue box
- option mask aligned to the in-game choice area
This keeps dialogue and option translations from colliding visually.
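The mask-count rule above can be sketched as a small validation step. This is only an illustration of the rule as described in this guide, not Mask's actual code: the RegionConfig class, its field names, and mask_layout_error are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RegionConfig:
    # Hypothetical container; field names are illustrative, not Mask's real model.
    option_ocr_regions: list = field(default_factory=list)
    dialogue_masks: int = 0
    option_masks: int = 0


def mask_layout_error(config):
    """Return None if the mask layout is acceptable, else a short reason string."""
    multi_region = len(config.option_ocr_regions) > 0
    if not multi_region:
        # Single-region OCR: the guide treats one dialogue mask, or no mask,
        # as acceptable, so this sketch does not check any further.
        return None
    if config.dialogue_masks != 1 or config.option_masks != 1:
        return "multi-region OCR needs exactly 1 dialogue mask and 1 option mask"
    return None
```

A caller would run a check like this before starting translation and refuse to continue when a reason string comes back, which mirrors the "translation flow is rejected" behavior described above.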
6. Auto capture scope and trigger strategy
Auto Capture Scope changes what Mask actually screenshots during automatic translation.
OCR region only
Automatic translation captures only the main OCR region.
Best for:
- stable dialogue box layouts
- workflows focused only on dialogue text
- minimizing interference from unrelated UI and visual effects
Important: if this mode is selected but no main OCR region is configured, the automatic path fails immediately.
Full window
Automatic translation captures the whole bound window.
Best for:
- layouts where dialogue position changes often
- cases where broader visual context matters
- scenes where you want more window-level context
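To make the difference between the two scopes concrete, here is a minimal sketch of how a capture rectangle could be chosen. The CaptureScope enum, the resolve_capture_rect function, and the (x, y, width, height) tuple layout are assumptions made for illustration; they are not Mask's real API.

```python
from enum import Enum


class CaptureScope(Enum):
    # Hypothetical names for the two scopes described in this section.
    OCR_REGION_ONLY = "ocr_region_only"
    FULL_WINDOW = "full_window"


def resolve_capture_rect(scope, window_rect, main_ocr_region):
    """Pick the rectangle to screenshot for automatic translation.

    Rectangles are (x, y, width, height) tuples; main_ocr_region may be None.
    """
    if scope is CaptureScope.FULL_WINDOW:
        return window_rect
    if main_ocr_region is None:
        # Matches the guide: OCR-region-only scope without a main OCR region
        # fails immediately instead of silently falling back to the window.
        raise ValueError("OCR-region-only scope needs a main OCR region")
    return main_ocr_region
```

Failing early keeps the misconfiguration visible; a silent fallback to the full window would hide the missing region and change what gets translated.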
Screen change detection region
Screen Change Detection Region affects only change detection. It is not the same thing as the final translation screenshot area.
This is especially useful in visual novels because many games include:
- blinking portraits
- slight character movement
- particle effects
- flashing UI elements
Keeping the detection region tight around the dialogue area reduces false triggers.
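The effect of a tight detection region can be shown with a toy frame comparison. This sketch assumes grayscale frames stored as lists of rows and a simple per-pixel tolerance; Mask's actual change detection may work quite differently, and the names crop and changed_fraction are hypothetical.

```python
def crop(frame, region):
    """Cut an (x, y, width, height) region out of a frame (a list of pixel rows)."""
    x, y, w, h = region
    return [row[x:x + w] for row in frame[y:y + h]]


def changed_fraction(before, after, region, pixel_tolerance=8):
    """Fraction of pixels inside the region whose brightness changed noticeably."""
    total = 0
    changed = 0
    for row_a, row_b in zip(crop(before, region), crop(after, region)):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > pixel_tolerance:
                changed += 1
    return changed / total if total else 0.0


# A 4x4 frame where only the top-left pixel flickers (think: a blinking portrait).
before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[0][0] = 255

whole_frame = (0, 0, 4, 4)
dialogue_area = (2, 2, 2, 2)
print(changed_fraction(before, after, whole_frame))    # flicker is counted
print(changed_fraction(before, after, dialogue_area))  # flicker is ignored
```

With the whole frame as the detection region the flicker registers as a change; with the region limited to the dialogue area it does not, which is the false-trigger reduction described above.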
Click detection region
Click Detection Region limits where a global click should count as a meaningful trigger candidate.
This is useful when:
- you only want clicks near the dialogue box or page-turn hotspot to matter
- you do not want unrelated UI clicks to trigger a new detection pass
A safe default for visual novels is:
- keep the detection region near the dialogue box
- keep the click detection region near the reading or page-turn interaction area
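Conceptually, the click detection region is a point-in-rectangle filter applied to global clicks. The sketch below assumes (x, y, width, height) rectangles in the same coordinate space as the click; the function name click_counts_as_trigger is hypothetical and is not taken from Mask.

```python
def click_counts_as_trigger(click, region):
    """Return True if a global click at (x, y) falls inside the region."""
    cx, cy = click
    x, y, w, h = region
    # Left/top edges are inclusive, right/bottom edges are exclusive.
    return x <= cx < x + w and y <= cy < y + h


# Example: a region covering the lower part of a 1280x720 window.
reading_area = (0, 480, 1280, 240)
print(click_counts_as_trigger((640, 600), reading_area))  # click on the dialogue box
print(click_counts_as_trigger((1200, 40), reading_area))  # click on a menu button
```

Only the first click would start a new detection pass; the menu click is ignored.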
7. Auto page-turn and coordinate calibration
If the current environment supports page-turn injection, Auto Page-Turn Region appears.
One important limitation: because of restrictions imposed by Apple, the Mac App Store version does not currently support auto page-turn. If you use the store build, you can still follow the rest of this guide for window binding, OCR regions, detection regions, and overlay setup, then rely on manual translation or other trigger methods instead.
This region is the click target used for automatic progression, typically:
- next line
- next page
- the game's text-advance hotspot
Recommended workflow:
- Define the real clickable page-turn hotspot first.
- If the game has transition frames or afterimages, increase the post-click buffer.
- Only then decide whether to enable translate-after-click.
This reduces the chance of capturing half-transition frames.
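The ordering in the workflow above (click, wait out the transition, then capture) can be sketched as follows. The function advance_once and its parameters are hypothetical, the 0.3 second default is an arbitrary placeholder rather than a recommended value, and click and translate stand in for whatever the app actually does.

```python
import time


def advance_once(click, translate, buffer_seconds=0.3,
                 translate_after_click=True, sleep=time.sleep):
    """Click the page-turn hotspot, then optionally translate after a buffer."""
    click()
    if not translate_after_click:
        return None
    # Wait for transition frames or afterimages to settle before capturing.
    sleep(buffer_seconds)
    return translate()
```

Passing sleep as a parameter is just a convenience so the ordering can be checked without real delays. If captures still catch half-rendered frames, raising buffer_seconds is the first knob to turn.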
When calibration is necessary
Coordinate Calibration mainly exists for CrossOver and Whisky setups.
Use it if you see patterns like:
- the box looks correct but the actual click lands off target
- auto page-turn clicks miss the hotspot
- captured regions are consistently offset from the intended area
Recommended order:
- visualize and define the auto page-turn region first
- open Coordinate Calibration
- fine-tune offsetX, offsetY, scaleX, and scaleY
If your environment already aligns correctly, you do not need calibration.
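As a rough mental model, the four calibration values can be read as a per-axis scale plus offset applied to a point. The formula below (scale first, then offset) is an assumption made for illustration; this guide does not state how Mask actually combines offsetX, offsetY, scaleX, and scaleY, and the Calibration class and calibrate_point function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    # Defaults describe "no correction": zero offset, unit scale.
    offset_x: float = 0.0
    offset_y: float = 0.0
    scale_x: float = 1.0
    scale_y: float = 1.0


def calibrate_point(x, y, cal):
    """Map a configured point to a corrected point (assumed: scale, then offset)."""
    return (x * cal.scale_x + cal.offset_x, y * cal.scale_y + cal.offset_y)


# With default values the point is unchanged, which matches the advice that
# an already aligned environment needs no calibration.
print(calibrate_point(100, 200, Calibration()))
print(calibrate_point(100, 200, Calibration(offset_x=10, offset_y=-5,
                                            scale_x=2.0, scale_y=0.5)))
```

Under this model, a constant miss in one direction points at the offsets, while a miss that grows with distance from the window origin points at the scale values.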
8. Transient snip and VLM
Transient snip translation
Transient snip translation is best for occasional cases, for example:
- a system prompt outside the main dialogue box
- a menu or pop-up that you only need once
- a one-off translation without changing the main OCR workflow
It is a supplemental tool, not the core visual novel reading path.
VLM translation
VLM Translation is better understood as full-window scene-aware enhancement, not as a replacement for everyday OCR dialogue reading.
A good mental model is:
- everyday continuous reading: use the OCR workflow first
- complex scenes, unusual layout, or cases that need extra context: try VLM
In the current implementation, manual VLM Translation always captures the full window and does not follow Auto Capture Scope.
9. Three practical setup patterns
Setup A: Basic dialogue mode
Best for stable main-dialogue reading.
- bind the game window with Focus
- set only the main dialogue OCR Region
- enable screen change detection
- optionally use one main dialogue mask
Notes:
- this is the most stable entry-level setup
- if false triggers happen often, tighten the detection region first
Setup B: Split-choice mode
Best for games with separate choice text and where you want separate translation display.
- use the main dialogue box as the primary OCR region
- add dedicated OCR regions for options
- configure Dual Mask Regions
- make sure the mask count is exactly 2
Notes:
- multi-region OCR currently requires dialogue mask + option mask
- if the mask count is wrong, translation is refused
Setup C: Auto-advance mode
Best when you want long reading sessions with minimal manual intervention.
- define Auto Page-Turn Region
- enable auto page-turn
- optionally enable translate-after-click
- add coordinate calibration only if offsets exist
Notes:
- first make page-turn clicks accurate, then automate the rest
- if screenshots often catch transition frames, increase the page-turn buffer first
10. When to debug configuration before changing models
If results feel unstable, do not assume the model is the first problem. Check these items first:
- whether the bound window is still correct
- whether the main OCR region is too large
- whether the detection region includes portraits or effects
- whether multi-region OCR already has exactly two masks
- whether the auto page-turn target really lands on the game's advance hotspot
In many visual novel workflows, accurate configuration matters more than switching to a stronger model.