■ Compiler

What this is

Compiler takes the CMS bulk export of case notes (one note per page, some spanning multiple) plus a folder of loose stage documents, and emits a single bound PDF that interleaves them on one chronological timeline: oldest first, with same-day docs preceding same-day notes so a reader sees the work product before the time-recording entry that bills for it.

It's a sister tool to Filer (outbound case-file builder, sectioned output) and Auditor (review companion). All three are matter-agnostic, PDF-in / PDF-out, and read the same 01 LH/ Stage folder. They differ in what they emit: Filer produces a recipient package, Compiler a journal, Auditor reports on the inputs.

Workflow

  1. Case Details: set the case's matter reference, client name, and dates (date of request / date actioned / case closed on). Tick Include cover page, Mark as draft, and Lock for distribution as required.
  2. Compiler: drop the bulk case-notes PDF into zone 1 and the Stage folder into zone 3 (zone 2 takes the optional correspondence PDF). Tick Include LAA/ subfolder if admin paperwork should join the timeline (off by default). Hit Compile.
  3. Output is a single bound PDF, Case-Compiler-<matter-ref>.pdf, downloaded to your browser-default folder. Drop it into +Filing/ manually if you want filing.py to pick it up.

The tabs

🗂️ Case Details

Per-build identity + dates + cover toggles. Matter reference is matter-agnostic: Compiler doesn't pattern-match the ref shape.

🗓️ Compiler

The two drop zones, the LAA-include toggle, and the Compile button. This tab is short and re-runnable: drop new files, hit Compile.

⚙️ Config

Per-caseworker preferences: cover row toggles, filename pattern, LAA-include default, SEQ tie-break default. (Wired in Phase 4.)

🎨 Settings

App appearance, font, tab visibility, embedded canonical spec.

Confidentiality: this tool runs entirely in your browser; nothing is uploaded. The files you load, the case references and the client names are confidential, so keep Compiler output on this machine, or filed via the normal route.

Case Summary

Identity

Cover, stamp and footer defaults live in Config (ifyiConfig.compiler.{cover,confidentialStamp,footer}.*). Position vocabulary is the canonical 6-anchor enum from PDF-OUTPUT-SPEC §7a. The per-build ticks and selects above override.

Compiler

Drop the CMS case-notes PDF, the Stage folder (01 LH/) and (optionally) the AP letters bundle. Compiler interleaves the lot on one chronological timeline; letters that duplicate an 01 Stage doc are dropped automatically.

1 · Case notes PDF

Drop the bulk case-notes PDF here

2 · Correspondence PDF (optional)

Drop the bulk letters / correspondence PDF here

3 · Stage folder

Drop the Stage folder here

4 · Documents folder (advanced)

Drop the Documents folder here

Ready.

Sources

Advanced settings

Config (not yet wired)

App font

App appearance

Dracula themes are dark-only; switching to Light keeps them dark.

Tab visibility

Syncs across every GMIAU Shell tool via ifyi_hide_guide_tab + ifyi_hide_config_tab.

Canonical spec

The current COMPILER-SPEC.md is inlined at build time. Edit the spec at gmiau-specs/COMPILER-SPEC.md and run immigrationfyi-tools update to refresh every tool that embeds it.

# GMIAU Compiler Spec

**Version 0.1 · 2026-05-18 · DRAFT: third sister to Filer + Auditor**

- **v0.1 · 2026-05-18**: Initial spec. Phase 0 of the rollout plan.
  Targets the bulk-case-notes-PDF + loose-stage-docs use case only;
  bulk-letters-PDF input (`Bestford Ryan (RB12345) Letters - …pdf`) and
  01 Stage dedup are deferred to v0.2+.

Compiler takes the CMS bulk export of case notes (one note per page,
some spanning multiple) and a folder of loose stage documents named per
`EXPORT-SPEC` (`YYMMDD CLIENTREF SEQ Description.ext`), and emits a
single bound PDF that interleaves them on one chronological timeline:
oldest first, with same-day docs preceding same-day notes so a reader
sees the work product before the time-recording entry that bills for it.

**Lean-by-default (scope-reset 2026-05-18).** Compiler's primary job is
consolidation: letters / emails / file notes / attendance notes
spliced into one chronological PDF with duplicates removed (the
bulk-letters PDF deduped against the Stage folder per §6.2). The
Filer-style packaging (**Cover, Case Summary page, Index, section
dividers, Documents folder**) is **hidden by default** and revealed
via a `Show Filer-style options` (Advanced settings) tickbox in
Config. Class on `<html>`: `compiler-advanced-on`; localStorage key:
`ifyi_compiler_advanced` ('0' default | '1'); CSS hook:
`.advanced-only` elements are hidden when the class is absent.
Rationale: those packaging features overlap with Filer's purpose
(Filer is the proper sectioned-case-file builder); Compiler should
default to the consolidation+dedup job and let caseworkers opt in
when they want Filer-shaped output without leaving the tool. **GMIAU
logo on the cover defaults to OFF** (Filer parity, same rationale:
most outbound packages don't want the letterhead on the cover; opt-in
via Config when sending externally).
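
A minimal sketch of the toggle wiring described above, assuming a helper named `applyAdvancedMode` (hypothetical; the spec only fixes the class name and the localStorage key). The storage and class list are passed in so the logic can be exercised outside a browser.

```javascript
// Names taken from the spec.
const ADVANCED_KEY = "ifyi_compiler_advanced";
const ADVANCED_CLASS = "compiler-advanced-on";

// Hypothetical helper: `storage` needs getItem(key), `classList` needs
// toggle(name, force). In a browser these would be window.localStorage and
// document.documentElement.classList.
function applyAdvancedMode(storage, classList) {
  // Anything other than the string '1' (including a missing key) means off.
  const on = storage.getItem(ADVANCED_KEY) === "1";
  classList.toggle(ADVANCED_CLASS, on);
  return on;
}
```

In the page this would pair with a CSS rule along the lines of `html:not(.compiler-advanced-on) .advanced-only { display: none; }`; that selector is an assumption, not quoted from the tool.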

This is the **timeline-level companion** to:

- [`FILER-SPEC.md`](./FILER-SPEC.md): outbound case-file builder
  (section-organised: Admin & Financial / Casework / Documents).
- [`AUDITOR-SPEC.md`](./AUDITOR-SPEC.md): Filer's review companion
  (ownership / completeness checks on the same source folders).

Filer, Auditor, and Compiler are matter-agnostic PDF-in / PDF-out
tools that share the same `01 LH/` Stage folder as their primary input.
They differ in what they emit: Filer produces a *recipient* package
(sectioned by source folder), Compiler produces a *journal* (one
timeline). Auditor produces neither; it reports on the inputs.

Cross-links:

- [`FILER-SPEC.md`](./FILER-SPEC.md): sister tool (sectioned output).
- [`AUDITOR-SPEC.md`](./AUDITOR-SPEC.md): sister tool (review/audit).
- [`EXPORT-SPEC.md`](./EXPORT-SPEC.md): filename pattern for stage docs.
- [`GMIAU-STYLE-GUIDE.md`](./GMIAU-STYLE-GUIDE.md): GMIAU Shell scaffold.
- [`PDF-OUTPUT-SPEC.md`](./PDF-OUTPUT-SPEC.md): PDF visual rules.

---

## §1 Architecture

- **Standalone HTML tool** at `immigrationfyi-tools/source/compiler.html`.
  Offline-first; no backend.
- **Sister-tool link** in the canonical header: `■ Filer` next to the
  `■ Index` back-link. (Auditor stays paired with Filer in *its* header;
  Compiler pairs to Filer as the closer of the two siblings.)
- **GMIAU Shell scaffold** per STYLE-GUIDE §2A: `<header>` + 5-tab
  `<nav class="tabbar">` (📖 Guide → 🗂️ Case Details → 🗓️ Compiler →
  ⚙️ Config → ⚙️ Settings, matching §2) plus the header-level Index back-link.
- **Sentinels:** `@@IFYI_THEME@@`, `@@IFYI_SETTINGS@@`, `@@IFYI_LOGO@@`,
  `@@SPEC:COMPILER-SPEC.md@@` in `<head>` (the spec sentinel inlines
  the current spec into a Settings-tab `<pre class="gmiau-spec">` per
  the canonical pattern).
- **No client data leaves the machine.** pdf.js extracts text and slices
  pages in-tab; pdf-lib assembles the bound output; qpdf-wasm encrypts
  it; everything stays local.

---

## §2 Tabs (5)

| Tab | Icon | Role |
|---|---|---|
| **Guide** | 📖 | Authored how-to per STYLE-GUIDE §2B. First tab, **not** default-active. |
| **Case Details** | 🗂️ | Per-build identity + dates + cover/footer toggles. Matter reference (matter-agnostic), client name, dates (request / actioned / closed), case description, **Import Case Summary** button (`gmiau-case-summary/1` JSON; reuses Filer's `applySummary` helper), "Include cover page" tick, three **per-page footer** ticks (Show client reference / Show client name / Show page number; defaults come from `ifyiConfig.compiler.footer.*`), "Mark as draft" tick, "Lock for distribution" tick. Filer-parity shape so a caseworker who knows Filer recognises this immediately. |
| **Compiler** | 🗓️ | Three drop zones (**Case notes PDF** · **Stage folder** · optional **Letters bundle PDF**; see §3.1) + LAA-include toggle + same-day collision review + **Compile** button. **Default-active tab.** |
| **Config** | ⚙️ | Cover row toggles, filename pattern, LAA-include default, SEQ-tie-break default. Hideable via Settings. |
| **Settings** | ⚙️ | Canonical 6-toggle Settings pane per STYLE-GUIDE §11 + §2C. |

Case Details and Compiler are deliberately split so per-build identity
edits don't share a pane with per-build file drops; the Compiler tab
is then short and re-runnable (drop new files, hit Compile).

---

## §3 Input/output contract

### 3.1 Inputs

Three drop zones on the Compiler tab:

1. **Case notes PDF**: a single PDF produced by the CMS's bulk
   "Print case notes" feature. Filename typically `Print Notes.pdf` or
   `<CLIENTREF> Notes.pdf`. Notes are printed **newest-first**; each
   note starts on a fresh page; long notes spill onto continuation
   pages with no marker.
2. **Stage folder**: the case's `01 LH/` (or whatever the Stage
   subfolder is named; Compiler doesn't pattern-match the folder
   name, only its contents). Drag-drop or `webkitdirectory` pick. Files
   matching `YYMMDD CLIENTREF SEQ Description.ext` (see EXPORT-SPEC)
   are parsed; non-matching files are surfaced in an "unparsed"
   sub-list and excluded from the timeline by default.
3. **Letters bundle PDF** *(optional)*: the CMS's bulk-letters export.
   Filename pattern `CLIENTSURNAME CLIENTFIRSTNAME (CASEREF) Letters - DD-MM-YYYY.pdf`
   (e.g. `Bestford Ryan (RB12345) Letters - 18-05-2026.pdf`). Page 1 is
   a `[No Client Contacts]` administrative cover (discarded); each
   subsequent letter is sliced by the §4.2 boundary rule and placed in
   the timeline at its own date. Letters that duplicate a 01 Stage doc
   are dropped via the §6.2 dedup rule. If the bundle is missing or
   skipped, Compiler still produces a valid bundle from just the notes
   PDF + stage folder.

The Stage folder may contain subfolders (e.g. `LAA/`). The **LAA
include-toggle** in the Compiler pane controls whether `LAA/` content
joins the timeline; default is **off** (admin-setup paperwork lives
elsewhere conceptually). Other subfolders flatten into the same
timeline with their first-level subsection recorded for the index.

### 3.2 Output

A single bound PDF, downloaded to the user's browser-default folder.

- **Filename:** `Case-Compiler-<matter-ref>.pdf` (matches Filer's
  `Case-File-<matter-ref>.pdf` slug shape so the two outputs sort
  adjacently on disk).
- **Confidentiality stamp:** PRIVATE & CONFIDENTIAL on every body page
  (ampersand glyph, never "AND", per `feedback_private_and_confidential_glyph`).
  Per-build tickbox in Case Details (`Show PRIVATE & CONFIDENTIAL stamp`,
  default on). **Position is caseworker-selectable**: one of the six
  canonical anchors documented in
  [`PDF-OUTPUT-SPEC §7a`](./PDF-OUTPUT-SPEC.md#7a-body-page-overlay-positions);
  default `top-centre`. Applied by the canonical `gmiauApplyOverlays`
  helper (see `ref_ifyi_pdf_protection`); cover + index pages are
  always skipped regardless of the position chosen.
- **DRAFT watermark** when "Mark as draft" is on in Case Details:
  faint diagonal (110 pt bold Helvetica, 30° rotation, grey rgb(0.78,
  0.78, 0.78), opacity 0.30, roughly page-centred), drawn on **every
  page** including cover, case-summary, and index, because a draft
  bundle is draft end-to-end. Drawn AFTER the per-page footer + confidential
  stamp so it visually overlays them. Renderer (`_compilerDrawDraftWatermark`)
  is lifted verbatim from Filer's `gmiauDrawDraftWatermark`, for parity
  with Filer per `feedback_private_and_confidential_glyph`.
- **PDF outline / bookmarks**: emitted automatically (no tickbox; the
  caseworker can ignore the bookmarks pane if they don't want it).
  Optional Cover entry at the top (labelled `<matter ref> <client
  name>`, omitted when both are blank), then one entry per timeline
  item using the same shape as the index row (`<UK-long date>`, a
  spaced em dash (U+2014), then `<Entry>`).
  Target: top of the corresponding body page. Skipped items (no body
  page) emit no bookmark. PageMode set to `UseOutlines` so PDF viewers
  open with the bookmarks pane visible.
  - **Merged-doc sub-bookmarks** (Header 2 under Header 1, added
    2026-05-18). When a 01 Stage PDF is a merged document carrying its
    own outline (e.g. a letter + enclosures bundled by Evidence
    Exhibitor or pdftk), Compiler extracts the top-level entries of
    that outline via pdf.js (`getOutline`/`getDestination`/`getPageIndex`)
    and adds them as nested children under the parent timeline-item
    bookmark. Source-outline page-0 entries are dropped (they'd just
    duplicate the parent). Deeper nesting in the source outline is
    flattened: only one level of children is surfaced. Letter bundles
    are unaffected, as the CMS doesn't add outlines to bundle PDFs.
    Renderer pattern lifted from Filer's Section→Document hierarchy
    (`_filerAddOutlines`).
- **qpdf-wasm encryption** when "Lock for distribution" is on. AES-256
  via `@neslinesli93/qpdf-wasm@0.3.0`, the same engine + version as Filer.
  Default permissions: print allowed, accessibility allowed, modify
  blocked, copy blocked, annotate blocked. Two passwords in Case
  Details (`Read (User)` / `Settings (Owner)`); when only one is
  entered it's mirrored into the other (file openable, settings
  locked behind the same password). Compile throws if Lock is on but
  both password fields are blank, since silently delivering an
  unencrypted PDF would be a real bug. The init machinery + `window.gmiauLockPdfBytes`
  are lifted verbatim from Filer; Compiler calls it as the final
  step of the build pipeline before download.
- **Per-page footer:** body pages only (cover and index unmarked). Up to
  three elements joined by a spaced em dash (U+2014): client reference,
  client name, `p.N`.
  Each element independently toggleable from Case Details; element
  defaults in `ifyiConfig.compiler.footer.*`. **Position is
  caseworker-selectable**: same six-anchor enum from
  [`PDF-OUTPUT-SPEC §7a`](./PDF-OUTPUT-SPEC.md#7a-body-page-overlay-positions);
  default `bottom-centre`. Page numbering is body-only: cover and index
  don't count toward `N`. Font: small/muted (8 pt,
  `--text-muted`-equivalent) so it doesn't compete with content. Full
  mechanics in §8.3.
- **Metadata title:** `Case Chronology of <client name>, <UK-long date
  of compile>`; falls back to `Case Chronology, <date>` if client name
  is blank. Subject = `<matter ref>`. Author / Producer per the shared
  `gmiauApplyPdfMetadata` helper.
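
The password rule in the encryption bullet above (mirror a single password, refuse to compile when Lock is on with both blank) can be sketched as follows. `resolveLockPasswords` and its argument names are hypothetical; the real build hands the result to Filer's `window.gmiauLockPdfBytes`, whose signature is not shown in this spec.

```javascript
// Hypothetical helper illustrating the mirroring rule only.
function resolveLockPasswords({ lock, userPassword = "", ownerPassword = "" }) {
  if (!lock) return null; // "Lock for distribution" is off: no encryption

  if (userPassword === "" && ownerPassword === "") {
    // Never fall through to an unencrypted download.
    throw new Error("Lock for distribution is on but no password was entered");
  }
  return {
    user: userPassword || ownerPassword,   // mirror owner into user if blank
    owner: ownerPassword || userPassword,  // mirror user into owner if blank
  };
}
```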

---

## §4 Note boundary detection

A note's **first page** is detected by per-page text extraction
(pdf.js `getTextContent`). The first non-whitespace text on page `n`
is matched against:

- `^Date Created:` marks the first page of every note from the second
  note onwards in the source PDF.
- `^Case Notes Report` applies only to page 1 of the source PDF,
  which carries the CMS report header above the first note's
  `Date Created:` line. Treated as part of the same first-note page.

A page that matches neither marker is a **continuation page** of the
previously-started note.

Slicing: once boundaries `[p1, p2, …, pN, totalPages+1]` are known,
note `i` is page-range `[pi, p(i+1) − 1]` and is extracted via pdf-lib
`copyPages` into its own in-memory PDFDocument.
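
As an illustration of the two rules above, here is a sketch that turns per-page text into 1-based page ranges. `findNoteRanges` is a hypothetical name, and the input is assumed to be an array of already-extracted page strings (the pdf.js extraction and the pdf-lib `copyPages` step are left out).

```javascript
// Hypothetical helper: pageTexts[i] is the extracted text of page i + 1.
function findNoteRanges(pageTexts) {
  const starts = [];
  pageTexts.forEach((text, i) => {
    const isStart =
      /^\s*Date Created:/.test(text) ||
      // The CMS report header only counts on page 1 of the source PDF.
      (i === 0 && /^\s*Case Notes Report/.test(text));
    if (isStart) starts.push(i + 1);
  });

  const sentinel = pageTexts.length + 1; // the totalPages+1 boundary
  // Note i spans [p_i, p_(i+1) - 1]; pages before the first start are ignored.
  return starts.map((start, i) => ({
    startPage: start,
    endPage: (i + 1 < starts.length ? starts[i + 1] : sentinel) - 1,
  }));
}
```

Pages that match neither marker extend the previous note, which is how continuation pages are absorbed without any marker of their own.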

**Headers on continuation pages.** The CMS does not repeat the
`Date Created:` line on continuation pages, so no defensive logic is
needed. If a future CMS template adds a recurring header, Compiler
will need a stricter rule (e.g. require `Date Created:` *plus* a
specific Y-position).

**CMS report header.** The `Case Notes Report` + `Client Name:` +
`Case Identifier:` block on page 1 is **dropped** in the output, because
the Compiler cover supplies the same identity information once, in the
right place. (See [[project_compiler]] for the rationale.)

### 4.2 Letters-bundle boundary detection (added 2026-05-18; was v0.2)

The CMS exports a **bulk letters PDF** alongside the case-notes PDF:
filename pattern `CLIENTSURNAME CLIENTFIRSTNAME (CASEREF) Letters - DD-MM-YYYY.pdf`
(e.g. `Bestford Ryan (RB12345) Letters - 18-05-2026.pdf`). Each letter
inside is a full document with its own date header. The bundle's first
page is a `[No Client Contacts]` administrative cover, which is discarded.

**Important asymmetry:** in CMS-generated bundles, only the **first**
letter carries the GMIAU letterhead (logo + firm contact line) above
the `PRIVATE & CONFIDENTIAL` banner. Subsequent letters skip the
letterhead and start directly with the banner. The boundary rule must
handle both.

**Boundary rule:** a page is a letter's first page iff its first ~800
characters contain BOTH `PRIVATE & CONFIDENTIAL` (case-insensitive)
AND a `Date:\s+<D> <Month> <YYYY>` pattern in close proximity. Page 1
of the bundle is always excluded (cover). Continuation pages
(signature blocks, embedded skeletons, schedules) typically carry one
marker or the other in isolation, not both adjacent, so the combined
rule rejects them. Pages where `PRIVATE & CONFIDENTIAL` appears as
quoted content (inside a witness statement embedded in a skeleton)
won't have a fresh `Date:` line right beside it; they pass through as
continuation.
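
A sketch of the combined-marker test, under stated assumptions: the spec says "close proximity" without a number, so the 300-character gap below is a placeholder, and `isLetterFirstPage` is a hypothetical name. Only the first banner and first `Date:` occurrence in the page head are compared.

```javascript
const BANNER_RE = /PRIVATE\s*&\s*CONFIDENTIAL/i;
const DATE_LINE_RE = /Date:\s+\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]+\s+\d{4}/;
const HEAD_CHARS = 800; // "first ~800 characters" from the rule
const MAX_GAP = 300;    // ASSUMPTION: the spec gives no figure for proximity

// Hypothetical helper: pageNumber is 1-based within the bundle PDF.
function isLetterFirstPage(pageText, pageNumber) {
  if (pageNumber === 1) return false; // administrative cover, always excluded
  const head = pageText.slice(0, HEAD_CHARS);
  const banner = BANNER_RE.exec(head);
  const date = DATE_LINE_RE.exec(head);
  if (!banner || !date) return false; // one marker alone is a continuation page
  return Math.abs(date.index - banner.index) <= MAX_GAP;
}
```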

Letter slicing then works the same way as note slicing: every page
between two first-pages belongs to the earlier letter; the final
letter runs to the last bundle page, **except** when an attachment
marker (see §4.2.1) is detected, in which case the current letter
closes early and the attachment pages are dropped from the output.

### 4.2.1 Bundle attachment filter (added 2026-05-18)

CMS letter bundles frequently include non-letter attachments after a
letter: a Case Summary Export, a File Opening Form, a Conflict of
Interests check, GMIAU Authorities, an attached CCL, the OAL, etc.
These attachments aren't letters in their own right; they're
duplicates of the authoritative files already in `01 Stage/` (or, for
the case summary, of the JSON imported via Case Details).

**Detection.** A page is treated as an **attachment start** when its
first ~200 characters match one of these markers (case-insensitive
word match):

- `CASE SUMMARY EXPORT`
- `FILE OPENING FORM`
- `CONFLICT OF INTEREST` (with or without `S CHECK` suffix)
- `GMIAU AUTHORITIES`
- `CLIENT CARE LETTER` (the CCL, when attached as a document)
- `OPENING ADVICE LETTER` (the OAL)
- `SET(P) APPLICATION`
- `CW1` / `CW2` / `CW3` / `ECF<n>`
- `EVIDENCE OF MEANS`

The 200-char top-of-page restriction stops false positives when a
real letter quotes these phrases in body prose.

**Behaviour.** When an attachment start is detected, the in-progress
letter closes (its `endPage` becomes the previous page) and Compiler
enters an "attachment-drop" mode: every subsequent page is dropped
from the output until the next letter boundary (PRIVATE & CONFIDENTIAL
+ Date:) is hit. Dropped pages are counted on
`state.lettersParsed._droppedAttachmentPages` and surfaced in the
status line (`N bundle attachment page(s) dropped`).
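
The detection and drop behaviour above amount to a small state machine. The sketch below is illustrative only: `sliceLetters` is a hypothetical name, the letter-start test is injected as a callback, and giving a letter boundary precedence over an attachment marker on the same page is an assumption.

```javascript
// Marker list from the Detection paragraph, matched in the first 200 chars.
const ATTACHMENT_RE = new RegExp(
  [
    "\\bCASE SUMMARY EXPORT\\b",
    "\\bFILE OPENING FORM\\b",
    "\\bCONFLICT OF INTEREST", // with or without the "S CHECK" suffix
    "\\bGMIAU AUTHORITIES\\b",
    "\\bCLIENT CARE LETTER\\b",
    "\\bOPENING ADVICE LETTER\\b",
    "\\bSET\\(P\\) APPLICATION\\b",
    "\\bCW[123]\\b",
    "\\bECF\\d+\\b",
    "\\bEVIDENCE OF MEANS\\b",
  ].join("|"),
  "i"
);

// Hypothetical helper: isLetterStart(text, pageNumber) applies the 4.2 rule.
function sliceLetters(pageTexts, isLetterStart) {
  const letters = [];
  let current = null; // letter being accumulated; null = attachment-drop mode
  let dropped = 0;
  for (let p = 2; p <= pageTexts.length; p++) { // page 1 is the cover
    const text = pageTexts[p - 1];
    if (isLetterStart(text, p)) {
      if (current) letters.push(current);
      current = { startPage: p, endPage: p };
    } else if (ATTACHMENT_RE.test(text.slice(0, 200))) {
      if (current) letters.push(current); // close at the previous page
      current = null;
      dropped++;
    } else if (current) {
      current.endPage = p; // ordinary continuation page
    } else {
      dropped++; // still inside an attachment
    }
  }
  if (current) letters.push(current);
  return { letters, droppedAttachmentPages: dropped };
}
```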

**Rationale.** The user's authoritative case-summary source is the
imported JSON (§6.1); the authoritative form / letter source is the
01 Stage folder. Re-including bundle copies of these introduces both
duplication and visual collisions (the per-page PRIVATE & CONFIDENTIAL
stamp overlays the top-of-page text on a case summary, for example).
Dropping them entirely keeps the output clean.

### 4.1 Attendance-note detection

Each parsed note is classified as either a **file note** (default) or
an **attendance note**. Detection is purely text-based: when the note's
concatenated body text (every page in its `startPage..endPage` range)
matches the regex `/\bATTENDANCE NOTE\b/i` (the heading the CMS
template emits at the top of an attendance-style note's body), the
note's `isAttendance` flag is set to `true`. Otherwise it's `false`
(file note).

This single-rule detection has no false-negative recovery: if a
caseworker writes an attendance-style note without the canonical
heading, it'll show in the index as a file note. The CMS template
should consistently emit the heading; any drift becomes a CMS-template
issue, not a Compiler-parser issue.

The flag flows from `parseNotesPdf` → `buildTimeline` → `_compilerDrawIndex`,
where it picks `File note` vs `Attendance note` for the kind label
(see §8.2).
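
The single-rule classification is small enough to show whole. `classifyNote` is a hypothetical name; the regex and the two labels are the ones given in this section.

```javascript
const ATTENDANCE_RE = /\bATTENDANCE NOTE\b/i;

// Hypothetical helper: pageTexts holds the text of every page in the note's
// startPage..endPage range.
function classifyNote(pageTexts) {
  const body = pageTexts.join("\n");
  const isAttendance = ATTENDANCE_RE.test(body);
  return { isAttendance, kindLabel: isAttendance ? "Attendance note" : "File note" };
}
```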

---

## §5 Date parsing

### 5.3 Letter dates (added 2026-05-18)

Inside the bulk letters PDF, each letter's first page carries a date
in UK long form: `Date: 9 May 2026`. Variants with leading zeros and
with ordinals (`Date: 9th May 2026`) are also seen; accept both. Regex:

```
Date:\s+(\d{1,2})(?:st|nd|rd|th)?\s+
  (Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|
   Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
  \s+(\d{4})
```

No time component: letters carry day-resolution only. The parsed
date is the LETTER'S date (the day it was sent / drafted), not the
bundle's export date. Letters that fail the regex are unparsed and
tail the timeline like unparsed notes (per §6's unparsed-tail rule).
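
A sketch of the letter-date parse using the regex above, collapsed onto one pattern. `parseLetterDate` is a hypothetical name; returning an ISO `YYYY-MM-DD` string and rejecting impossible calendar dates are both assumptions layered on the spec.

```javascript
const MONTH =
  "Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|" +
  "Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?";
const LETTER_DATE_RE = new RegExp(
  "Date:\\s+(\\d{1,2})(?:st|nd|rd|th)?\\s+(" + MONTH + ")\\s+(\\d{4})"
);
const MONTH_KEYS = ["jan", "feb", "mar", "apr", "may", "jun",
                    "jul", "aug", "sep", "oct", "nov", "dec"];

// Hypothetical helper: returns "YYYY-MM-DD", or null when the letter is unparsed.
function parseLetterDate(firstPageText) {
  const m = LETTER_DATE_RE.exec(firstPageText);
  if (!m) return null;
  const day = Number(m[1]);
  const month = MONTH_KEYS.indexOf(m[2].slice(0, 3).toLowerCase()) + 1;
  const year = Number(m[3]);
  // ASSUMPTION: impossible dates (e.g. 31 Feb) are treated as unparsed.
  const check = new Date(Date.UTC(year, month - 1, day));
  if (check.getUTCMonth() !== month - 1 || check.getUTCDate() !== day) return null;
  const pad = (n) => String(n).padStart(2, "0");
  return year + "-" + pad(month) + "-" + pad(day);
}
```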

The bundle PDF's filename date (`Bestford Ryan (RB12345) Letters -
DD-MM-YYYY.pdf`) is the *export* date: useful for status reporting
("loaded bundle exported 18 May 2026") but not used for sorting.

### 5.4 Letter addressee (added 2026-05-18)

The line directly under the `Date:` line (typically a single
right-aligned full name) is captured as the letter's `addressee`.
Used by §6.2 for dedup matching. Best-effort: if the addressee can't
be parsed, dedup falls back to date-only matching with a soft
warning.

### 5.1 Case-notes timestamps

Regex on the first-page text of each note (case-insensitive,
whitespace-tolerant):

```
Date Created:\s+
  (?<dow>Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+
  (?<day>\d{1,2})\s+
  (?<mon>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+
  (?<year>\d{4})\s+
  (?<hh>\d{1,2}):(?<mm>\d{2})(?<ampm>AM|PM)
```

- `dow` is verified against the parsed date (mismatch ⇒ unparsed,
  flagged in the UI). Cheap sanity check; catches OCR drift on scanned
  print-outs without false positives on clean CMS output.
- `hh`/`mm`/`ampm` are required: every CMS note has a time component.
- A note that fails the regex is **unparsed** and lands at the tail of
  the timeline (matches Filer's §4.3 default for unparsed loose docs).
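
The parse plus day-of-week check can be sketched like this. `parseNoteTimestamp` is a hypothetical name, the regex is the one above written on a single line, and converting to a 24-hour `hh` is an assumption about how the sort key is stored.

```javascript
const NOTE_TS_RE =
  /Date Created:\s+(?<dow>Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+(?<day>\d{1,2})\s+(?<mon>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(?<year>\d{4})\s+(?<hh>\d{1,2}):(?<mm>\d{2})(?<ampm>AM|PM)/i;
const DOW = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"];
const MON = ["jan", "feb", "mar", "apr", "may", "jun",
             "jul", "aug", "sep", "oct", "nov", "dec"];

// Hypothetical helper: returns the {Y, M, D, hh, mm} key, or null (unparsed).
function parseNoteTimestamp(firstPageText) {
  const m = NOTE_TS_RE.exec(firstPageText);
  if (!m) return null;
  const g = m.groups;
  const year = Number(g.year);
  const month = MON.indexOf(g.mon.toLowerCase()); // 0-based
  const day = Number(g.day);
  const date = new Date(Date.UTC(year, month, day));
  if (date.getUTCMonth() !== month || date.getUTCDate() !== day) return null;
  // Day-of-week sanity check: a mismatch means the note is unparsed.
  if (DOW[date.getUTCDay()] !== g.dow.toLowerCase()) return null;
  let hh = Number(g.hh) % 12; // 12AM -> 0, 12PM -> 12 after the PM shift
  if (g.ampm.toUpperCase() === "PM") hh += 12;
  return { Y: year, M: month + 1, D: day, hh, mm: Number(g.mm) };
}
```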

### 5.2 Loose-doc filenames

Reused verbatim from Filer's `_FILER_NAME_RE` + `_parseFilerName` (see
[`FILER-SPEC §3`](./FILER-SPEC.md) and `filer.html`). Output fields:
`{date YYMMDD, seq, name, ext, ref?}`. CLIENTREF is optional.

---

## §6 Chronological ordering rule

The output timeline is built by merging two lists:

1. Parsed notes β€” keyed on `{Y, M, D, hh, mm}`.
2. Parsed loose docs β€” keyed on `{Y, M, D, seq, name, creationDate?}`.

Sort precedence (ascending throughout):

| Level | Notes | Docs |
|---|---|---|
| 1 | Date (`Y, M, D`) | Date (parsed from filename) |
| 2 | *Type*: **docs sort before notes on the same day** | (same) |
| 3 | `hh:mm` | `seq` |
| 4 | (not used) | Filename alpha (collision tiebreak) |

**Same-day docs-before-notes** is the load-bearing rule. The reader sees
the work product (CCL, OAL, brief, letter) before the time-recording
entry that bills for it. *The ordering itself is not configurable.*

**Note-like items always tail their day** *(clarified 0.3)*. "Notes"
for ordering means both CMS Print Notes entries **and** Stage-folder
PDFs whose description matches `/\b(?:attendance\s*note|file\s*note|
casework\s*note)\b/i`. A note file exported into `01 LH/` with, say,
SEQ `004` still sorts to the **end** of its day, behind every letter
and document of that date, regardless of the SEQ it carries. So a day
with `001 Letter · 002 Letter · 003 Email · 004 File Note` and a day with
`001 Letter · 002 Attendance Note · 003 File Note` both place the notes
last. Within the day's note pile the order is forward-chronological
(CMS notes by `hh:mm`; note files by SEQ). Implementation: each item
carries a `noteLike` flag and the same-day order rank is
`noteLike ? 2 : (letter ? 1 : 0)`.

**Reverse the CMS print.** The CMS prints newest-first; Compiler
reverses to oldest-first. The reversal is implicit in the ascending
sort, so no special handling is needed.
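
A comparator sketch for the precedence above. The item shape (`date` as an ISO string, `kind` of `"note"` or `"doc"`, `noteLike`, `letter`, `hh`, `mm`, `seq`, `name`) is hypothetical, and so is one tie-break: the spec does not say how Stage note files interleave with CMS notes inside the day's note pile, so this sketch places note files first.

```javascript
// Same-day rank, copied from the expression in the spec.
function sameDayRank(item) {
  return item.noteLike ? 2 : (item.letter ? 1 : 0);
}

// CMS notes order by time of day; filename-based items order by SEQ.
function primaryKey(item) {
  return item.kind === "note" ? item.hh * 60 + item.mm : Number(item.seq || 0);
}

function compareTimeline(a, b) {
  if (a.date !== b.date) return a.date < b.date ? -1 : 1; // ISO strings sort by date
  const rank = sameDayRank(a) - sameDayRank(b);
  if (rank !== 0) return rank;
  // ASSUMPTION: within one rank, "doc" items precede "note" items, so minutes
  // and SEQ values are never compared with each other.
  if (a.kind !== b.kind) return a.kind < b.kind ? -1 : 1;
  const primary = primaryKey(a) - primaryKey(b);
  if (primary !== 0) return primary;
  return (a.name || "").localeCompare(b.name || ""); // filename tiebreak
}

function buildOrder(items) {
  return [...items].sort(compareTimeline); // ascending sort reverses the CMS print
}
```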

**Unparsed items.** Unparsed notes and unparsed docs both tail the
timeline, after the last parsed entry, in their respective original
orders. They're listed in the index under a final `Unparsed (excluded
from timeline)` section so the caseworker knows they're there.

### 6.2 Letter dedup against the 01 Stage folder (added 2026-05-18)

When a bulk letters PDF is loaded alongside the 01 Stage folder, many
of the letters in the bundle are **duplicates** of letters already
saved as standalone PDFs in `01 LH/` (the caseworker exports the
letter once for the file, and AP also bundles a copy into the per-case
letters export). Compiler detects these duplicates and **excludes the
bundle copy from the timeline + body + index**.

**Match rule:**

1. Restrict candidates: only 01 LH/ docs whose parsed name contains
   `Letter` or `Email` (case-insensitive) are eligible. Filenames like
   `Letter to Ryan`, `Letter to David`, `Email to David`, `Email from
   Hassoun` all qualify; `Case Summary Export` (already excluded by
   §6.1) and arbitrary applications don't.
2. **Date + addressee match:** a bundle letter matches a stage doc iff
   their parsed dates are equal (same `YYYY-MM-DD`) AND the bundle
   letter's addressee name appears as a substring (case-insensitive)
   within the stage doc's parsed description, OR vice versa.
   - Example: bundle letter dated `9 May 2026` addressed to `Ryan
     John Bestford` matches `260509 12345 001 Letter to Ryan.pdf`
     (the word `Ryan` from the bundle addressee appears in the stage
     filename `Letter to Ryan`).
   - Example: same-day same-recipient won't false-positive on
     `260509 12345 002 Brief to Counsel.pdf` because `Ryan` isn't in
     the filename.
3. **No match → keep:** bundle letters with no candidate stage doc
   stay in the timeline (they're outbound letters that weren't filed
   separately, so the bundle is the only source).
4. **Match → drop:** the bundle letter is annotated
   `it.duplicateOf = <stage doc id>` and excluded from the body
   splicing pass. It does NOT appear in the index either, because the
   index reflects what's in the body.

**Status reporting:** `reportTimeline` surfaces the dedup count
("N bundle letter(s) deduplicated against 01 Stage folder") so the
caseworker sees the rule fire. Per-letter detail (which bundle letter
matched which stage doc) is logged to the console for debugging but
not exposed in the UI in v0.2.

**Tie-breakers:** the match is **one-to-one**: each stage doc can
claim AT MOST ONE bundle letter. Two scenarios fall out of this:
- *Bundle letter matches multiple stage docs* (e.g. two `Letter to
  Ryan` exported on the same day with different SEQs): dedup matches
  the first-by-SEQ stage doc; that stage doc is then "claimed" and not
  eligible to claim any subsequent bundle letter.
- *Multiple bundle letters match one stage doc* (e.g. three bundle
  letters dated 21 April 2026 all addressed to `Ryan John Bestford`,
  but only one `Letter to Ryan` stage doc on that day): the first
  bundle letter (by page order) claims the stage doc and dedupes; the
  other two bundle letters **stay in the timeline**, since they're
  presumably different letters with the same recipient on the same
  day, not copies of the one stage doc. This is deliberately
  conservative: missed duplicates are recoverable (the caseworker sees
  the dupes in the output and re-runs after correcting filenames),
  false dedupes are not.

Edge cases logged to the console for review (`[compiler] dedup: ...`).
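
A sketch of the match-and-claim loop, with hypothetical names throughout (`dedupLetters`, `namesOverlap`, and the item fields). It reads the addressee rule at word level, as the `Ryan` example in the match rule implies, and skips the date-only fallback for letters with no parsed addressee.

```javascript
// ASSUMPTION: word-level overlap, ignoring short words and the words that
// make a doc eligible in the first place.
const IGNORED = new Set(["letter", "email", "from"]);
function words(s) {
  return (s || "").toLowerCase().split(/[^a-z0-9]+/)
    .filter((w) => w.length >= 3 && !IGNORED.has(w));
}
function namesOverlap(addressee, description) {
  const desc = new Set(words(description));
  return words(addressee).some((w) => desc.has(w));
}

// bundleLetters: [{ date, addressee }] in page order.
// stageDocs: [{ id, date, seq, name }] with dates as "YYYY-MM-DD".
function dedupLetters(bundleLetters, stageDocs) {
  const eligible = stageDocs
    .filter((d) => /letter|email/i.test(d.name))
    .sort((a, b) => Number(a.seq) - Number(b.seq)); // first-by-SEQ wins
  const claimed = new Set();
  const kept = [];
  for (const letter of bundleLetters) {
    const match = eligible.find((d) =>
      !claimed.has(d.id) && d.date === letter.date &&
      namesOverlap(letter.addressee, d.name));
    if (match) {
      claimed.add(match.id); // each stage doc claims at most one letter
      letter.duplicateOf = match.id;
    } else {
      kept.push(letter);
    }
  }
  return { kept, dedupCount: bundleLetters.length - kept.length };
}
```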

### 6.3 Administration & Financial section (added 2026-05-18)

When the LAA toggle is on AND the Stage folder contains a `LAA/`
subfolder, the LAA-folder docs are routed into an **Administration &
Financial** section that sits BEFORE the chronological body. Mirrors
Filer's three-section pattern (see FILER-SPEC §2; for parity, Filer
also ships a default `LAA*` → `admin-financial` routing rule, added on
user request 2026-05-18).

**Output structure with both sections present:**

```
Cover  ·  Case Summary page  ·  Index
Administration & Financial   ← divider page (28 pt centred accent)
  LAA doc 1
  LAA doc 2
  …
Chronological case file       ← divider page
  Stage doc                  (oldest)
  Note
  Letter
  Stage doc
  …                           (newest)
```

When only one section has items (LAA toggle off, or all docs are in
LAA), the divider page is skipped, as there's nothing to divide from.

**Index treatment.** Index rows are grouped by section with a small
accent-purple heading row between groups (`ADMINISTRATION & FINANCIAL`,
`CHRONOLOGICAL CASE FILE`). The heading rows count toward the index
pagination but carry no link annotation. Single-section indexes
render without a heading.

**Implementation.** Each timeline item carries a `section` field:
`'admin'` for stage docs with `inLaa: true`, `'body'` for everything
else (notes, letters, non-LAA stage docs). Sort precedence becomes:
section (admin < body) → date → kind → primary → secondary → label.
Section dividers are drawn in the body splicing pass when the section
changes.

### 6.1 Case-summary file exclusion (added 2026-05-18)

Stage-folder docs whose parsed `name` matches `/^case summary\b/i`
(case-insensitive, start-of-name word match; catches `Case Summary
Export.pdf`, `Case Summary.pdf`; does **not** catch unrelated names
like `Case File Summary.pdf`) are filtered out of the timeline before
sort. They don't appear in the body, the index, the page count, or the
collision check.
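
The filter is a one-line regex test. In this sketch `excludeCaseSummaries` is a hypothetical name, and `name` is assumed to be the parsed description without date, ref, SEQ or extension.

```javascript
const CASE_SUMMARY_RE = /^case summary\b/i;

// Hypothetical helper: returns the docs that stay, plus the count for the
// status line.
function excludeCaseSummaries(stageDocs) {
  const kept = stageDocs.filter((d) => !CASE_SUMMARY_RE.test(d.name));
  return { kept, excludedCount: stageDocs.length - kept.length };
}
```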

Rationale: the case-summary content is rendered on §8.1a's Case Summary
page from a single, explicit source of truth (the JSON imported via
Case Details). Treating a case-summary PDF found in the Stage folder
as just-another-document would put duplicate / potentially stale
identity content into the bundle alongside the authoritative version.

The Compiler pane's status line reports the excluded count when
non-zero (`N Case Summary file(s) excluded (use Case Details → Import
Case Summary)`) so the caseworker sees that the filter fired.

### 6.4 Prefer-CMS note dedup (added 0.3, 2026-05-21)

Caseworkers sometimes save a file note or attendance note into `01 LH/`
as its own PDF **and** have the same note in the CMS Print Notes export.
Compiler can drop the Stage-folder copy so the note isn't bundled twice,
treating Print Notes as the source of truth for note content.

**This is now a content comparison, not a blanket filter.** *(Before
0.3, any Stage-folder file matching the note regex was dropped
unconditionally and silently, even when no Print Notes PDF was loaded
and even when the export didn't actually contain that note. That lost
notes from the bundle with no signal. The 0.3 behaviour replaces it.)*

**Option.** A sticky per-browser toggle **Prefer the CMS Print Notes
export over Stage-folder note files** lives in the Config pane's
*Sources* group (`localStorage['ifyi_compiler_prefer_cms_notes']`,
default from `ifyiConfig.compiler.preferCmsNotesDefault`, default
**on**). Off ⇒ every Stage note is kept.

**When a drop happens.** Only when the option is **on** *and* a Print
Notes PDF is loaded. With no notes PDF, nothing is dropped.

**Match rule** (mirrors the §6.2 letter dedup: same-day, bijective,
confident-only):

1. Candidates are Stage-folder docs matching the note regex (§6 list).
2. For each, Compiler compares against unclaimed CMS notes **on the
   same date**. Comparison is distinctive-token overlap between the
   Stage file's extracted text (first ≤3 pages; filename description as
   fallback if the PDF won't read) and the CMS note's body + work type.
   Boilerplate / short words are stripped before scoring.
3. A match requires `shared / min(tokenset sizes) ≥ 0.5`. The Stage
   file is annotated `duplicateOfNote` and excluded from the timeline,
   body, index and page count. Each CMS note claims **at most one**
   Stage file.
4. A Stage note the export doesn't cover scores below threshold and is
   **kept** (and still sorts to the end of its day per §6).
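
The match rule above can be sketched roughly as follows. This is a minimal illustration, not Compiler's actual code: the function names, the stop-word list, the 4-character minimum word length, the choice of the best-scoring candidate, and storing a CMS note index in `duplicateOfNote` are all assumptions made for the example.

```javascript
// Hypothetical boilerplate list; the real list is not given in this spec.
const STOP_WORDS = new Set(["the", "and", "for", "with", "that", "this", "from", "note", "file"]);

// Lower-case, split on non-alphanumerics, drop short and boilerplate words.
function distinctiveTokens(text) {
  const words = String(text || "").toLowerCase().split(/[^a-z0-9]+/);
  return new Set(words.filter((w) => w.length >= 4 && !STOP_WORDS.has(w)));
}

// shared / min(token-set sizes); 0 when either side has no tokens.
function overlapScore(a, b) {
  const smaller = Math.min(a.size, b.size);
  if (smaller === 0) return 0;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared += 1;
  return shared / smaller;
}

// stageNotes: [{ name, date, text }]; cmsNotes: [{ date, body, workType }].
// Returns the Stage notes to keep and the ones dropped as duplicates.
function dedupStageNotes(stageNotes, cmsNotes, threshold = 0.5) {
  const claimed = new Set(); // indices of CMS notes already matched
  const kept = [];
  const dropped = [];
  for (const stage of stageNotes) {
    const stageTokens = distinctiveTokens(stage.text);
    let best = -1;
    let bestScore = 0;
    cmsNotes.forEach((cms, i) => {
      // Same-day, unclaimed CMS notes only.
      if (claimed.has(i) || cms.date !== stage.date) return;
      const score = overlapScore(stageTokens, distinctiveTokens(`${cms.body} ${cms.workType}`));
      if (score >= threshold && score > bestScore) {
        best = i;
        bestScore = score;
      }
    });
    if (best >= 0) {
      claimed.add(best); // each CMS note claims at most one Stage file
      dropped.push({ ...stage, duplicateOfNote: best });
    } else {
      kept.push(stage); // not covered by the export: stays in the bundle
    }
  }
  return { kept, dropped };
}
```

With no CMS notes loaded, `dedupStageNotes(stageNotes, [])` keeps every Stage note, which matches the "nothing is dropped without a notes PDF" rule.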

The Compiler pane's status line reports the dropped count when non-zero
(`N Stage note file(s) dropped (duplicated in CMS Print Notes)`).

---

## §7 SEQ collision handling

Real-world `01 LH/` folders occasionally contain two files with the
same `YYMMDD CLIENTREF SEQ` triple — a filing slip. Compiler's
behaviour:

1. **Detect** during the parse step; group collisions by `{date, seq}`.
2. **Default-order** within each group by PDF `CreationDate` (read via
   pdf.js's `getMetadata`), ascending. This is a *best guess* — it
   reflects when the PDF was saved, not when the work was done, and
   caseworkers often re-export PDFs out of work order.
3. **Surface a soft warning** in the Compiler pane: a yellow
   call-out per collision listing the colliding files in their default
   order, with **drag-reorder** handles and a "Confirm order" button.
   No collision blocks the Compile — the default is always a valid
   fallback.
4. **Mention in the output index.** Collision entries are not visually
   marked in the body, but the index footnotes the collision (`†`) so
   a reader knows the order was caseworker-confirmed rather than
   purely date-derived.
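
Steps 1 and 2 could look something like the sketch below. It assumes each parsed file already carries a `creationDate` (a `Date` or `null`) read elsewhere; the function name, the record shape, sorting files with no creation date last, and the filename tie-break are illustrative assumptions rather than documented behaviour.

```javascript
// files: [{ name, date, seq, creationDate }], creationDate is a Date or null.
// Returns one entry per {date, seq} group that has two or more members.
function groupSeqCollisions(files) {
  const groups = new Map();
  for (const f of files) {
    const key = `${f.date}|${f.seq}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(f);
  }
  const collisions = [];
  for (const [key, members] of groups) {
    if (members.length < 2) continue; // unique SEQ: nothing to resolve
    const ordered = [...members].sort((a, b) => {
      // Assumption: a missing CreationDate sorts after any real date.
      const ta = a.creationDate ? a.creationDate.getTime() : Infinity;
      const tb = b.creationDate ? b.creationDate.getTime() : Infinity;
      // NaN (Infinity - Infinity) and 0 both fall through to the name tie-break.
      return ta - tb || a.name.localeCompare(b.name);
    });
    // confirmed stays false until the caseworker presses "Confirm order".
    collisions.push({ key, files: ordered, confirmed: false });
  }
  return collisions;
}
```

Because the default order is always computed, a build can proceed with `confirmed: false`; the flag only records whether a caseworker has reviewed the order.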

Auditor will (separately) flag SEQ collisions as a filing issue. The
two tools' responsibilities don't overlap: Auditor *reports* the
collision; Compiler *resolves* it for one specific build.

---

## §8 Cover + index renderers

### 8.1 Cover

Single page, **shape lifted verbatim from Filer's Page 1** (the
2026-05-11 title-page redesign) — see `_filerDrawCoverPage1` for the
reference renderer. Caseworker preference, confirmed 2026-05-18:
Compiler should not visually diverge from Filer here.

Layout (top → bottom):

- **PRIVATE & CONFIDENTIAL** banner (always on; centred, 14 pt bold).
- **No "Case Chronology" headline** under the banner — removed per
  user request 2026-05-18; the title-page identity (client name +
  matter ref) carries enough context.
- Centred **GMIAU logo** (toggleable; ~260 pt wide).
- **Client name** (24 pt bold, centred).
- **Matter reference** (16 pt regular, centred).
- (Big gap)
- **Firm + contact block** (toggleable; left-aligned LABEL: value rows
  at 10/11 pt — `FIRM:`, `CONTACT NAME:`, `CONTACT EMAIL:`,
  `CONTACT NUMBER:`; anchored near the bottom).
- **No `DATE OF REQUEST` / `DATE ACTIONED` footer.** Compiler
  diverges from Filer's cover here (user request 2026-05-18) — those
  dates are workflow tracking and add visual noise on the cover.
  Readers who need them still see `CASE OPENED:` / `CASE ACTIONED:`
  on the Case Summary page (§8.1a).

Page 2 of Filer's cover is **not used as Page 2 of the Compiler
cover**; instead, its layout is repurposed as a standalone **Case
Summary page** that sits between Cover and Index — see §8.1a.

### 8.1a Case Summary page (added 2026-05-18)

A single A4 page rendered between Cover and Index, layout lifted from
Filer's `_filerDrawCoverPage2`. PRIVATE & CONFIDENTIAL banner → `CASE
SUMMARY` title (22 pt bold caps centred) → `LABEL: value` rows in four
visual groups separated by a small vertical gap, followed by the case
description:

| Group | Fields |
|---|---|
| Identity | `CLIENT NAME`, `CLIENT REFERENCE NUMBER` |
| Bio | `CLIENT DATE OF BIRTH`, `COUNTRY OF ORIGIN`, `NATIONALITY`, `PLACE OF BIRTH` |
| Contact | `ADDRESS` (multi-line collapsed to comma-joined), `EMAIL`, `MOBILE` |
| Case / LAA | `MATTER REFERENCE` (only if differs from `CLIENT REFERENCE NUMBER`), `CASE OPENED`, `CASE ACTIONED` *(always the compile date — see note below)*, `CASE OWNER`, `MATTER TYPE`, `FUNDING`, `UCN`, `UFN`, `HO REFERENCE` |
| Description | `CASE DESCRIPTION:` heading + greedy word-wrapped body (10 pt) |

**Empty fields drop silently** — no orphan labels, no `[Not Specified]`
sentinels. If a whole group is empty, its gap collapses too. The page
carries no DATE OF REQUEST / DATE ACTIONED footer (in Filer those
belong on the cover; Compiler's cover omits them too, see §8.1).
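
The empty-field and empty-group rules, plus the comma-joined address, can be sketched as below. The function names and the `{ name, rows }` shape are hypothetical, chosen only to illustrate the filtering; the real renderer's data structures are not described in this section.

```javascript
// Collapse a multi-line address into one comma-joined line.
function collapseAddress(address) {
  return String(address || "")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean)
    .join(", ");
}

// groups: [{ name, rows: [[label, value], ...] }]
// Drops rows with no value, then drops groups left with no rows,
// so neither orphan labels nor empty gaps reach the page.
function visibleGroups(groups) {
  return groups
    .map((g) => ({
      name: g.name,
      rows: g.rows.filter(([, value]) => value != null && String(value).trim() !== ""),
    }))
    .filter((g) => g.rows.length > 0);
}
```

A renderer would then draw one vertical gap between consecutive entries of the returned array, which makes the "gap collapses too" behaviour fall out of the filtering rather than needing a special case.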

**Source-of-truth rule (load-bearing):** the data comes from the
imported `gmiau-case-summary/1` JSON cached on
`window.gmiauCompiler.caseSummary` (Case Details → Import Case
Summary button) — **not** from any case-summary PDF found in the
01 Stage folder, the bulk notes PDF, or (when v0.2 lands) the bulk
letters PDF. Case-summary files in the Stage folder are filtered out
of the timeline upstream — see §6.1.

Case Details DOM fields (matter ref, client name, dates, description)
take precedence over the imported JSON for the fields they cover, so a
caseworker who typed past an import sees their typed values on the
page. Fields the form doesn't carry (bio, contact, LAA, funding) come
straight from the imported JSON.
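
A minimal sketch of that precedence, assuming both sources have already been flattened to plain objects keyed by the same field names. The function name and keys are hypothetical, and treating a blank form field as "not typed" (so the imported value shows through) is an assumption the spec does not state.

```javascript
// formValues: values read from the Case Details inputs (strings).
// imported:   fields taken from the imported case-summary JSON.
// Typed, non-blank form values win; everything else comes from the import.
function resolveSummaryFields(formValues, imported) {
  const out = { ...imported };
  for (const [key, value] of Object.entries(formValues || {})) {
    if (typeof value === "string" && value.trim() !== "") {
      out[key] = value.trim();
    }
  }
  return out;
}
```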

**CASE ACTIONED is auto-set to the compile date** (changed 2026-05-18,
on user request). No caseworker override. The `Date actioned` field
has been dropped from Case Details — the value caseworkers actually
want there is "the date this bundle was compiled", and pre-setting it
risks staleness if the bundle is rebuilt later. CASE OPENED still
comes from the imported case-summary JSON / the `Date of request`
input.

If no Case Summary was imported and no Case Details fields were
typed, the page renders as just the title + an empty space below. The
page is **always** drawn (no toggle in v0.1) — caseworkers who want it
gone can suppress it later via a Config switch (Phase 7 follow-up).

### 8.2 Index

Clickable index, one entry per timeline item, in chronological order.
Per-entry format:

- **Doc:** `<parsed name>, <UK-long date>` (matches FILER-SPEC §4.1).
- **Note:** `<kind> on <Work type>` where `<kind>` is either
  `File note` (default) or `Attendance note` (when an `ATTENDANCE NOTE`
  heading is found anywhere in the note's body text — see §4.1), and
  `<Work type>` is the `Work: …` substring from the note's
  `Record Description` line (e.g. `Telephone Call`, `FSU Appt Booking`).
  - If no `Work:` prefix is found, the row collapses to just `<kind>`
    (the trailing ` on …` is elided).
  - **Generic-bucket elision:** when the Work type is `Work Package`
    (case-insensitive — the CMS's generic time-recording bucket label),
    the tail is also elided. So a `Work Package`-typed note renders as
    just `File note` (or `Attendance note`), not the redundant
    `File note on Work Package`. Only `Work Package` is treated as
    generic in v0.1; specific work-types (`Legal Submissions`, `OAL`,
    `Telephone Call`, etc.) always show the tail.
  - The note's `HH:MM` does **not** appear in the index — time is used
    internally for same-day ordering per §6, but the date column
    already shows the day so the time would be visual noise.
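
The note-label rules above reduce to a small function. This sketch assumes the note has already been parsed into `body` and `recordDescription` strings (hypothetical field names), that the work type runs to the end of the `Record Description` line, and that the heading check is a case-sensitive substring match; the actual detection rules live in §4.1.

```javascript
// note: { body, recordDescription }
function noteIndexLabel(note) {
  // Kind: "Attendance note" if the heading text appears anywhere in the body.
  const kind = /ATTENDANCE NOTE/.test(note.body || "") ? "Attendance note" : "File note";

  // Work type: whatever follows "Work:" on the Record Description line.
  const match = /Work:\s*(.+)/.exec(note.recordDescription || "");
  const workType = match ? match[1].trim() : "";

  // Elide the tail when there is no work type or it is the generic bucket.
  if (!workType || workType.toLowerCase() === "work package") return kind;
  return `${kind} on ${workType}`;
}
```

For example, a note whose body contains `ATTENDANCE NOTE` and whose description is `Work: Telephone Call` yields `Attendance note on Telephone Call`, while `Work: Work Package` yields just the kind.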

Each entry's right-hand column shows the destination page number, and
the **whole row is a clickable /Link annotation** — clicking jumps to
the first page of the corresponding body item. Link rects span Date →
Page column inclusive (~ROW_H tall, centred on the row baseline);
borders are zero-width so the rows look like plain text until the
cursor hovers. Implementation: index renders first-pass with row
placeholders that record `{page, y, itemIdx, rect}`; after body
splicing assigns each item its `bodyPage`, `_compilerWireIndexLinks` walks
the placeholders and attaches `/Annots` entries to the index pages
pointing at the resolved body-page refs. Pattern lifted verbatim from
Filer's `_filerAddLinkAnnotations`. Items that were skipped (e.g.
non-PDF stage doc) render `—` in the Page column and carry no link.
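
The two-pass idea can be illustrated without any PDF library. The sketch below only resolves placeholders into plain link descriptors; it is not `_compilerWireIndexLinks` itself, and the function name, the descriptor shape and the use of `null` for a skipped item's `bodyPage` are assumptions. Writing the actual `/Annots` entries is left to whatever PDF layer the tool uses.

```javascript
// placeholders: [{ page, y, itemIdx, rect }] recorded while drawing the index.
// items:        timeline items; items[i].bodyPage is set after body splicing,
//               or left null/undefined when the item was skipped.
function resolveIndexLinks(placeholders, items) {
  const links = [];
  for (const ph of placeholders) {
    const item = items[ph.itemIdx];
    if (!item || item.bodyPage == null) continue; // skipped item: no link
    links.push({
      indexPage: ph.page,        // index page that will carry the annotation
      rect: ph.rect,             // clickable area spanning the row
      targetPage: item.bodyPage, // first page of the body item
      borderWidth: 0,            // zero-width border, so the row reads as plain text
    });
  }
  return links;
}
```

Recording placeholders first and resolving them later is what lets the index be laid out before the body page numbers are known.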

The index does **not** group by day — every entry is a flat row. A
caseworker who wants per-day grouping can read the dates.

### 8.3 Per-page footer

A small line is drawn on every **body** page (not cover, not index —
see §3.2) at the caseworker-selected anchor (default `bottom-centre`;
six-token enum from
[`PDF-OUTPUT-SPEC §7a`](./PDF-OUTPUT-SPEC.md#7a-body-page-overlay-positions)).
The footer is composed of up to three elements, joined by ` — `
(em-dash with spaces):

1. **Client reference** — value of `cd-matter-ref` from Case Details.
2. **Client name** — value of `cd-client-name`.
3. **Page number** — `p.N` where `N` is the 1-indexed body-page number
   (cover and index don't count). Final body page does NOT show
   `N of M` in v0.1.

Each element is on a per-build tickbox in Case Details, defaulting to
`ifyiConfig.compiler.footer.{ref,name,page}` (all `true` by default).
If all three are off, the footer is omitted entirely (no empty line
reserved). If a single element evaluates to empty (no matter ref
entered, no client name), it's skipped from the join — no orphan
em-dashes.
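
The join rules can be sketched as a pure function. `composeFooter` and its argument shapes are hypothetical; the em-dash is written as the escape `\u2014` so the separator survives any re-encoding of the source file.

```javascript
// values: { ref, name, pageNumber }; ticks: { ref, name, page } per-build tickboxes.
// Returns the footer text, or null when nothing should be drawn.
function composeFooter(values, ticks = { ref: true, name: true, page: true }) {
  const parts = [];
  const ref = (values.ref || "").trim();
  const name = (values.name || "").trim();
  if (ticks.ref && ref) parts.push(ref);
  if (ticks.name && name) parts.push(name);
  if (ticks.page && Number.isInteger(values.pageNumber)) {
    parts.push(`p.${values.pageNumber}`); // 1-indexed body-page number, no "of M"
  }
  // Empty elements were never pushed, so the join cannot leave orphan separators.
  return parts.length > 0 ? parts.join(" \u2014 ") : null;
}
```

Returning `null` rather than an empty string gives the caller an unambiguous signal to skip the footer entirely instead of reserving an empty line.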

Drawn via the canonical `gmiauApplyOverlays` extension, in the same
pass as the PRIVATE & CONFIDENTIAL header. Font: 8 pt
`--text-muted`-equivalent. Bottom margin: ~12 pt from the page edge.

### 8.4 Case-summary import

The Case Details pane carries an **Import Case Summary** button that
reads a `gmiau-case-summary/1` JSON file (the same format Filer + the
other case-scoped tools consume — see `ref_ifyi_case_summary_consumers`).
The handler reuses Filer's `applySummary` logic, populating:

- `cd-matter-ref` from `client.case_reference` (with fallback chain
  `matter_reference` → `appeal_reference`).
- `cd-client-name` from `client.client_name` (fallback `appellant_name`,
  composed `Mr, First, Middle, Surname` → `Surname, First`).
- `cd-date-requested` from `client.case_open_date` (Filer parity).
- Other identity fields stored on `window.shared.*` for the cover
  renderer.

Import does **not** touch the per-build cover/footer ticks — those
remain caseworker-controlled per build. Re-import overwrites the
identity fields (no merge UI in v0.1; matches Filer behaviour).
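
The fallback chains can be illustrated as below. This is not Filer's `applySummary`; it is a sketch that assumes every fallback key sits on the same `client` object, and it leaves out the name recomposition step, whose exact rules are not spelled out here. The function names are hypothetical.

```javascript
// Return the first argument that is a non-blank string, trimmed; else "".
function firstNonEmpty(...values) {
  for (const v of values) {
    if (typeof v === "string" && v.trim() !== "") return v.trim();
  }
  return "";
}

// summary: parsed gmiau-case-summary/1 JSON.
// Returns values keyed by the Case Details input ids they would populate.
function summaryToCaseDetails(summary) {
  const c = (summary && summary.client) || {};
  return {
    "cd-matter-ref": firstNonEmpty(c.case_reference, c.matter_reference, c.appeal_reference),
    "cd-client-name": firstNonEmpty(c.client_name, c.appellant_name),
    "cd-date-requested": firstNonEmpty(c.case_open_date),
  };
}
```

Since re-import overwrites rather than merges, a caller would assign these values straight into the inputs without checking what was there before.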

---

## §9 Configuration

`ifyiConfig.compiler.*` (additive within `gmiau-config/1`).
Independent of `ifyiConfig.filer.*` — different tools, different
defaults.

- `cover.included` — bool, default `true`. Master cover toggle.
- `cover.page1.{logo,clientName,matterRef,subtitle,firm,dateRange}` —
  bool, defaults all `true` except `dateRange` which is `false` until
  the date-range computation lands.
- `confidentialStamp.included` — bool, default `true`. Master toggle
  for the PRIVATE & CONFIDENTIAL body-page stamp.
- `confidentialStamp.position` — one of the six tokens from
  [`PDF-OUTPUT-SPEC §7a`](./PDF-OUTPUT-SPEC.md#7a-body-page-overlay-positions),
  default `top-centre`.
- `footer.{ref,name,page}` — bool, defaults all `true`. Per-element
  defaults for the body-page footer (see §8.3). Per-build ticks in
  Case Details override.
- `footer.position` — one of the six tokens from
  [`PDF-OUTPUT-SPEC §7a`](./PDF-OUTPUT-SPEC.md#7a-body-page-overlay-positions),
  default `bottom-centre`.
- `laaIncludeDefault` — bool, default `false`. Initial state of the
  Compiler-pane LAA-include toggle on load.
- `preferCmsNotesDefault` — bool, default `true`. Initial state of the
  §6.3 "Prefer CMS Print Notes" toggle on load.
- `seqTieBreak` — `"creationDate"` (default) | `"alpha"`. Picks the
  default ordering when SEQ collides; the per-build drag-reorder UI
  overrides regardless.
- `unparsedPlacement` — `"tail"` (default) | `"omit"`. `tail` lists
  unparsed items in a final index section but excludes them from the
  body; `omit` drops them silently. (Note: there is intentionally no
  `interleave` option — Compiler refuses to guess.)
- `filenamePattern` — string, default Filer's. Same caseworker
  override mechanism as FILER-SPEC §5.
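
For orientation, the defaults listed above can be written out as one object, with a small helper that layers a caseworker's partial config over them. `COMPILER_DEFAULTS` and `withDefaults` are illustrative names, not identifiers from the codebase, and `filenamePattern` is deliberately left out because its default (Filer's pattern) is not reproduced in this section.

```javascript
// Defaults transcribed from the list above; filenamePattern intentionally omitted.
const COMPILER_DEFAULTS = {
  cover: {
    included: true,
    page1: { logo: true, clientName: true, matterRef: true, subtitle: true, firm: true, dateRange: false },
  },
  confidentialStamp: { included: true, position: "top-centre" },
  footer: { ref: true, name: true, page: true, position: "bottom-centre" },
  laaIncludeDefault: false,
  preferCmsNotesDefault: true,
  seqTieBreak: "creationDate",
  unparsedPlacement: "tail",
};

// Recursively overlay `overrides` on `defaults` without mutating either.
function withDefaults(defaults, overrides) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(overrides || {})) {
    const base = defaults[key];
    const bothObjects =
      value && typeof value === "object" && !Array.isArray(value) &&
      base && typeof base === "object";
    out[key] = bothObjects ? withDefaults(base, value) : value;
  }
  return out;
}
```

A config that only sets `{ footer: { page: false } }` would then keep `footer.ref`, `footer.name` and every other default intact.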

Browser-local preferences (not part of the One File config):

- `localStorage['ifyi_compiler_laa_include']` — sticky state of the
  Compiler-pane LAA-include toggle, `'1'` / `'0'`. Defaults to the
  `laaIncludeDefault` value above on first load.
- `localStorage['ifyi_compiler_prefer_cms_notes']` — sticky state of
  the §6.3 prefer-CMS-notes toggle, `'1'` / `'0'`. Defaults to
  `preferCmsNotesDefault` on first load.
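
Reading a sticky toggle with its config fallback might look like this. The helpers take any object with `getItem` / `setItem` (in the browser, `window.localStorage`), which keeps them testable; the function names are hypothetical, and falling back to the config default on an unrecognised stored value is an assumption.

```javascript
// Read a '1' / '0' toggle; fall back to the config default when unset.
function readStickyToggle(storage, key, configDefault) {
  const raw = storage.getItem(key);
  if (raw === "1") return true;
  if (raw === "0") return false;
  return Boolean(configDefault); // first load, or an unrecognised value
}

// Persist the toggle in the same '1' / '0' encoding.
function writeStickyToggle(storage, key, on) {
  storage.setItem(key, on ? "1" : "0");
}
```

In the browser this would be called as `readStickyToggle(localStorage, 'ifyi_compiler_prefer_cms_notes', ifyiConfig.compiler.preferCmsNotesDefault)`.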

---

## §10 Out of scope (v0.1)

- **Editing the timeline after compile.** Output is a one-way bound
  PDF. To change order, fix the inputs and re-compile.
- **Multi-case batch.** One case at a time.
- **Per-day dividers / month headings in the body.** Index is enough
  in v0.1; revisit if caseworkers find the flat stream hard to scan.
- **Photo-EXIF dates as a fallback for unparsed images.** Filer's
  image→A4 wrap doesn't apply here yet (no image inputs).
- **Filing inbox routing.** Compiler output downloads to the
  browser-default folder; the caseworker drops it into `+Filing/`
  manually if they want filing.py to pick it up. (Filer parity.)

---

## §11 Versioning

- **0.3 · 2026-05-21** — Note-like ordering + prefer-CMS dedup.
  Stage-folder file/attendance-note PDFs now sort to the **end of their
  day** like CMS notes regardless of SEQ (§6 `noteLike` rank). The
  blanket, silent drop of note-named Stage files (an undocumented
  pre-0.3 extension of §6.1) is replaced by §6.3: a per-build,
  content-comparison dedup gated on the new **Prefer CMS Print Notes**
  toggle (`ifyiConfig.compiler.preferCmsNotesDefault`, default on) and
  on a Print Notes PDF actually being loaded. Status line reports the
  dropped count.
- **0.2 · 2026-05-18** — Added bulk-letters-PDF input (third drop
  zone) + letter dedup against the 01 Stage folder. Letter boundary
  detection per §4.2 (PRIVATE & CONFIDENTIAL + Date: combined),
  letter date / addressee parsing per §5.3 + §5.4, bijective dedup
  per §6.2. Same-day kind order: doc < letter < note. Removed two
  out-of-scope bullets that v0.2 closed. Renamed from Chronology to
  Compiler same day (tool identity change; mechanics unchanged).
- **0.1 · 2026-05-18** — Initial spec. Captures the case-notes-PDF +
  `01 LH/` input model, same-day docs-before-notes rule, SEQ collision
  UX, cover + index renderers, and a 4-tab GMIAU Shell. Drafted as
  CHRONOLOGY-SPEC.md.