Transcripts and captions: institutional tier.
Audio-and-video to text deliverables — Thai ↔ English — across the five substantive transcript/caption forms: verbatim transcription for court, arbitration, and regulatory record; cleaned transcripts for board, AGM, audit committee, investor day; closed captions for pre-recorded video and conference recordings under WCAG 2.2 accessibility; translated subtitles for bilingual video deliverables; and live captioning / CART for real-time event accessibility. All operate under named in-house bench review against ISO/IEC 20071-23 and WCAG 2.2 accessibility floors, with AI-assisted first-pass disclosure baked into the engagement letter — Othello does not deliver raw machine transcripts as final product.
Five forms. One operating discipline.
Transcription and captioning cover the substantive production of text deliverables from audio and video source — five distinct deliverable forms, each with its own accuracy floor, time-coding discipline, and downstream use. The forms differ; the operating discipline does not. Named in-house bench review, NDA from first email, engagement-letter accountability, AI-disclosure transparency, termbase continuity with the interpretation and translation columns.
The five forms answer five distinct procurement questions. Verbatim transcription serves court, arbitration, deposition, and regulatory record purposes where every word — including “uh,” “um,” false starts, and overlapping speech — is preserved on the record. Cleaned transcripts serve board minutes, AGM record, audit committee documentation, and investor-day archives where the readable speaker intent matters more than the verbatim utterance. Closed captions deliver pre-recorded video accessibility under WCAG 2.2 and ISO/IEC 20071-23 — different formatting, timing, and reading-speed discipline from prose transcripts.
Translated subtitles bridge the transcription and translation columns: Thai-language source video with English subtitles, or English-language source with Thai subtitles, under ISO 17100 translation discipline combined with subtitling readability standards. Live captioning / CART (Communication Access Realtime Translation) delivers real-time captions during live events — hybrid AGM accessibility, conference accessibility, training programme accessibility — operating at simultaneous-grade cognitive load on the captioner.
Across all five forms, the substantive procurement question is accuracy floor, formatting discipline, and disclosure of AI assistance. The Othello operating discipline answers all three: named in-house bench reviews every deliverable before final delivery, AI-assisted first pass is disclosed in the engagement letter, accuracy floors are documented by deliverable type, and termbase continuity with interpretation and translation engagements is operationally built-in. The bench that transcribes a board meeting is often the bench that translated the prior quarter’s 56-1 One Report — and works from the same termbase.
Othello uses ASR (automatic speech recognition) as a first-pass tool for non-confidential general-content transcription where it accelerates throughput. The engagement letter discloses specifically where ASR is used and where the workflow is human-only. For confidential, privileged, or sensitive content — counsel-engaged matters, regulatory interviews, board recordings, M&A working sessions — the workflow operates human-only with no third-party AI tool ingestion. For content where ASR first-pass is used, named in-house bench reviews, corrects, and signs off on every deliverable before final delivery. Raw AI output is never the final product.
Accessibility and translation stacks. Independently verifiable.
Transcription and captioning sit at the intersection of two standards stacks — the accessibility stack (ISO/IEC 20071-23 for caption presentation, WCAG 2.2 Time-based media for web video, and broadcasting captioning conventions where applicable) and the translation stack (ISO 17100 for translated subtitles and bilingual transcripts). For live captioning at events, AIIC professional practice for the simultaneous-grade cognitive discipline overlays. The full stack is independently verifiable through the ISO Online Browsing Platform, W3C, and aiic.org.
International Organization for Standardization · IEC
Visual presentation of audio information · captions on televisions
The substantive ISO/IEC standard governing visual presentation of caption content — font, size, positioning, contrast, reading speed, line length, and synchronisation with source audio. Originally framed for broadcast television but operationally the reference for any pre-recorded video captioning where accessibility floor matters. The procurement-grade caption presentation reference for institutional video deliverables.
World Wide Web Consortium
Web Content Accessibility — time-based media
W3C Web Content Accessibility Guidelines 2.2, Guideline 1.2 Time-based Media, governs captioning of pre-recorded and live video on the web. Three success criteria are most relevant: 1.2.1 Audio-only and Video-only (Prerecorded), which requires an alternative such as a transcript for audio-only content (Level A); 1.2.2 Captions (Prerecorded) (Level A); and 1.2.4 Captions (Live), which requires captions for live audio in synchronized media (Level AA). The substantive accessibility procurement anchor for web-delivered video: SET-listed issuer investor relations video, multinational global town halls, capacity-building programme materials.
International Organization for Standardization
Translation services — requirements
The ISO standard for translation services, applied to translated subtitle and bilingual transcript deliverables. Covers translator qualifications, revision discipline (independent second-translator review), terminological consistency, and quality management. For Othello, the same ISO 17100 lockstep that anchors the Technical Translation column applies to translated subtitles and bilingual transcripts — bench continuity built in, not bolted on.
Communication Access Realtime Translation tradition
AIIC discipline · CART live-captioning practice
AIIC professional practice and the CART (Communication Access Realtime Translation) tradition govern live real-time captioning at events. The cognitive form is simultaneous-grade — the captioner listens, parses, and renders to written form continuously at the source language’s natural cadence. AIIC working-time limits apply; paired delivery is the standard for sessions of any length. Live captioning at AGMs and conferences operates at the same cognitive-load and working-time discipline as booth simultaneous.
ISO/IEC 20071-23 and ISO 17100 are verifiable through the ISO Online Browsing Platform, WCAG 2.2 through w3.org/WAI/standards-guidelines/wcag/, and AIIC discipline through aiic.org. Among the caption file formats, WebVTT and TTML are published by W3C and EBU-TT by the European Broadcasting Union; SRT is a de facto format with no governing standards body. Procurement evaluation panels can cross-reference every layer of Othello’s compliance claim against the published standards without going through the vendor.
Pick the form. Lock the discipline.
The five deliverable forms below answer five substantively different procurement questions. Choosing the right form is the first scoping question — different accuracy floors, different time-coding discipline, different downstream platforms, different engagement-letter terms. The form drives everything else in the engagement.
The substantive distinctions are not merely formatting. Verbatim transcription demands preservation of every utterance including hesitations and false starts, because the on-record value is fidelity to what was actually said. Cleaned transcripts remove the hesitations to deliver readable speaker intent — substantively useful for board minutes and AGM record but operationally wrong for court use. Closed captions add time-coding discipline (reading speed, line length, on-screen positioning) absent from prose transcripts. Translated subtitles compound the timing discipline with cross-language rendering at ISO 17100 lockstep. Live captioning compounds all of the above with simultaneous-grade cognitive delivery.
For procurement: the engagement letter specifies the form at scoping, and the form determines the bench skill set, accuracy floor, file format deliverables, and pricing tier. Form mismatches, such as asking for “a transcript” of a hearing when the use case is appellate record review, produce expensive corrective re-engagement.
Verbatim transcription
True verbatim transcript — every utterance preserved including hesitations, false starts, interjections, overlapping speech, and speaker attributions. The substantive form for court hearings, arbitration tribunals, regulatory interviews, deposition records, and any setting where the on-record value is fidelity to what was actually said. Accuracy floor ≥99.5%; speaker attribution mandatory; time-stamping at speaker turns; certified transcript option available for court-record use.
Cleaned transcripts
Readable speaker-intent transcript — hesitations and false starts removed, speaker attributions preserved, content fidelity maintained. The substantive form for board minutes preparation, AGM record archives, audit committee documentation, investor-day archives, and corporate-governance record-keeping where readability matters more than verbatim utterance. Accuracy floor ≥98% on substantive content; speaker attribution preserved; section markers at agenda transitions.
Closed captions (CC)
Time-coded captions for pre-recorded video deliverables under WCAG 2.2 Time-based Media accessibility. The substantive form for SET-listed investor relations video, multinational global town halls, capacity-building programme materials, training and education video, podcast-and-interview video deliverables. ISO/IEC 20071-23 caption presentation discipline — reading speed ≤170 words per minute, max 32 characters per line, max 2 lines on-screen, synchronised to ≤500ms.
Translated subtitles
Translated subtitles for cross-language video deliverables — Thai-language source with English subtitles, English source with Thai subtitles, or multilingual subtitle tracks for broader distribution. Substantively a compound deliverable combining captioning timing discipline with ISO 17100 translation lockstep. Substantive form for SET-listed issuer investor videos for global audience, multinational training material localisation, government bilateral communications, capacity-building video materials.
Live captioning · CART
Real-time captioning during live events: hybrid AGM accessibility, conference live-captioning, training programme accessibility, multilateral hearing real-time text. The substantive form for events where accessibility floor requires synchronised text captions delivered live, or where deaf and hard-of-hearing participants are present. Cognitive form is simultaneous-grade: the captioner listens, parses, and renders to written form at the source language’s natural cadence. AIIC working-time limits apply; paired delivery is the standard with rotation at agreed transitions. In many engagements the CART captioner works alongside the simultaneous interpreter, with shared briefing and termbase, in a single coordinated bench.
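The closed-caption limits quoted above (reading speed ≤170 words per minute, at most 32 characters per line, at most 2 lines on screen) lend themselves to an automated pre-check before bench QC. The following is a minimal sketch, not Othello's tooling: the `Cue` type and `check_cue` function are hypothetical names, and the whitespace word count only suits English captions (Thai text has no spaces between words and would need a word segmenter).

```python
from dataclasses import dataclass

# Limits as quoted in this section for closed captions.
MAX_CHARS_PER_LINE = 32
MAX_LINES = 2
MAX_WPM = 170


@dataclass
class Cue:
    """One caption cue (hypothetical structure for illustration)."""
    start_s: float
    end_s: float
    lines: list[str]


def check_cue(cue: Cue) -> list[str]:
    """Return a list of human-readable problems; empty means the cue passes."""
    problems = []
    if len(cue.lines) > MAX_LINES:
        problems.append(f"{len(cue.lines)} lines (max {MAX_LINES})")
    for line in cue.lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line has {len(line)} chars (max {MAX_CHARS_PER_LINE})")
    duration = cue.end_s - cue.start_s
    if duration <= 0:
        problems.append("non-positive duration")
        return problems
    # Whitespace tokenisation: an assumption that holds for English only.
    words = sum(len(line.split()) for line in cue.lines)
    wpm = words * 60 / duration
    if wpm > MAX_WPM:
        problems.append(f"reading speed {wpm:.0f} wpm (max {MAX_WPM})")
    return problems
```

A check like this flags candidates for the bench to re-time or re-break; it does not replace human review of synchronisation or positioning.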
Six stages. Named bench at every one.
The substantive production pipeline runs from audio/video intake through final deliverable across six discrete stages. Named in-house bench operates at every stage — there is no point in the pipeline where the work is left to an anonymous platform or unsupervised AI. Stage 02 (ASR first-pass) is the optional acceleration layer for non-confidential content; for confidential content the workflow runs human-only throughout. The engagement letter discloses where stage 02 is used and where it is skipped.
Why pipeline transparency matters at procurement tier
Institutional clients procuring transcription and captioning are not buying a finished file — they are buying a chain-of-custody-traceable production process. The substantive procurement question is: what touched the audio, in what sequence, with what review, by whom, under what confidentiality regime. Vendors that obscure the production pipeline force the client to either trust on faith or run an audit-style due diligence. Othello discloses the pipeline upfront so neither is necessary.
The disclosure goes in the engagement letter. Stage-by-stage chain-of-custody, where ASR first-pass is applied and where it is not, who reviews at each stage (named bench identifiers), and what client-side audit access is available. This is operational discipline aligned with Big Four audit, Big Law privilege regime, and SEC/SET regulated content handling.
How the pipeline scales across deliverable forms
All five deliverable forms share the same six-stage pipeline structure — but the substantive work at each stage varies by form. Verbatim transcription emphasises stage 03 (named-bench review at 2-pass discipline) and stage 05 (QC against ≥99.5% floor). Closed captions emphasise stage 04 (time-coding and synchronisation discipline) and the reading-speed floor. Translated subtitles add an ISO 17100 lockstep revision layer between stages 03 and 04. Live CART compresses stages 01-05 into real-time delivery — the captioner operates all stages concurrently under simultaneous-grade cognitive load.
For procurement: the pipeline is form-agnostic; the engagement letter specifies the deliverable form, the form determines which stages get emphasised, the standards stack applies at the stage where it is operationally engaged.
Six-stage production flow — audio in, deliverable out
Each stage runs under named in-house bench. The chain-of-custody log is retained under engagement-letter discipline.
Stage 01: Audio intake. Source media received under NDA · format compatibility verified (WAV / MP3 / MP4 / MOV / M4A / OGG / FLAC) · audio quality assessed (signal-to-noise, intelligibility, background noise) · speaker count enumerated · chain-of-custody log opened.
Stage 02: ASR first-pass (optional). Where the engagement letter permits, ASR (automatic speech recognition) generates a first-pass machine transcript as working draft for bench review, never as final deliverable. Skipped entirely for confidential/privileged content · counsel-engaged matters · regulatory interviews · board recordings.
Stage 03: Named-bench review. Human bench reviews entire transcript against source audio · corrects errors, hesitation handling, false starts, speaker attribution · resolves overlapping speech and unclear utterances · 2-pass review for verbatim transcription (court / arbitration / regulatory) · 1-pass for cleaned transcripts.
Stage 04: Time-coding and formatting. For captions and subtitles: time-coding to ≤500ms synchronisation, line-break discipline (≤32 char, ≤2 lines, ≤170 wpm), positioning markers. For transcripts: agenda-section markers, speaker-turn timestamps, certified-transcript formatting. ISO/IEC 20071-23 reference for caption presentation.
Stage 05: QC. QC pass against documented accuracy floor (verbatim ≥99.5%, cleaned ≥98% substantive, captions WCAG 2.2 conformance, subtitles ISO 17100 revision discipline) · final spell-check, punctuation, formatting consistency · QC log entry retained in chain-of-custody.
Stage 06: Delivery. Final deliverable in client-specified format: DOCX for transcripts (with optional PDF certified), SRT / WebVTT / TTML for captions, EBU-TT for broadcast, JSON for platform-integrated deployments. Termbase carry-forward committed to engagement record for continuity with subsequent engagement.
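The stage rules described in this section (ASR first-pass only for non-confidential content with engagement-letter permission, 2-pass review for verbatim and 1-pass for cleaned, an ISO 17100 revision layer for translated subtitles) can be expressed as a small planning function. This is an illustrative sketch under stated assumptions: the `Engagement` type, the tier names, and the "03b" label are hypothetical, and no review-pass count is assumed for forms where the text does not state one.

```python
from dataclasses import dataclass

# Hypothetical tier names standing in for the confidential categories listed
# in this document (counsel-engaged, regulatory, board and audit, restricted ESG).
CONFIDENTIAL_TIERS = {"counsel", "regulatory", "board_audit", "restricted_esg"}

# Review depth as stated in the text; other forms are left unspecified.
REVIEW_PASSES = {"verbatim": 2, "cleaned": 1}


@dataclass(frozen=True)
class Engagement:
    form: str                 # "verbatim", "cleaned", "captions" or "subtitles"
    content_tier: str         # "general" or one of CONFIDENTIAL_TIERS
    client_permits_asr: bool  # as recorded in the engagement letter


def plan_stages(e: Engagement) -> list[str]:
    """Return the ordered stage list for a pre-recorded engagement."""
    stages = ["01 intake"]
    # ASR runs only on general content AND only with explicit permission.
    if e.content_tier == "general" and e.client_permits_asr:
        stages.append("02 ASR first-pass")
    passes = REVIEW_PASSES.get(e.form)
    stages.append("03 bench review" + (f" ({passes}-pass)" if passes else ""))
    if e.form == "subtitles":
        # Revision layer sits between stages 03 and 04 for translated subtitles.
        stages.append("03b ISO 17100 revision")
    stages += ["04 time-coding and formatting", "05 QC", "06 delivery"]
    return stages
```

Making the ASR decision a function of both content tier and recorded permission means a confidential engagement can never reach stage 02, whatever the permission flag says.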
Client format-of-record at delivery stage
DOCX / PDF: Verbatim and cleaned transcripts. Speaker attribution, timestamps, agenda markers, certified-transcript PDF option.
SRT: SubRip subtitle. Universal subtitle format. Most video editing software and platforms accept SRT natively.
WebVTT: Web Video Text Tracks. W3C-published format for HTML5 video. WCAG 2.2 web-accessibility default.
TTML: Timed Text Markup Language. W3C/SMPTE-published format with advanced positioning and styling. Broadcast and OTT delivery.
EBU-TT: EBU Timed Text. European Broadcasting Union format for broadcast captioning. Used in European TV and broadcast workflows.
JSON: Platform-integrated. Structured time-coded captions for direct platform integration (LMS, CMS, video platform APIs).
SCC / STL: Broadcast legacy. SCC (US broadcast) and STL (EBU subtitling) for legacy broadcast pipelines where required.
Live CART: Real-time text stream. Platform-delivered live text via StreamText, 1CapApp, Zoom captioning, or client-platform integration.
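Because several of these formats carry the same cue data, delivering in more than one format is largely a conversion step. As a hedged illustration, the sketch below converts plain SRT to WebVTT: WebVTT requires a `WEBVTT` header and uses a dot rather than a comma before the milliseconds in timestamps. The function name `srt_to_webvtt` is hypothetical, and SRT styling tags or positioning are not handled here.

```python
import re

# SRT timestamps look like 00:00:01,000; WebVTT uses 00:00:01.000.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert plain SRT text to WebVTT (no styling or positioning support)."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = []
    for line in body.split("\n"):
        # Only rewrite timing lines, so commas in caption text are untouched.
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    # SRT's numeric cue indexes are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(out) + "\n"
```

For example, a cue timed `00:00:01,000 --> 00:00:03,500` in SRT comes out as `00:00:01.000 --> 00:00:03.500` under a `WEBVTT` header, with the caption text unchanged.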
Documented floors. Disclosed AI use.
The substantive procurement discipline for transcription and captioning is (1) documented accuracy floors by deliverable form, and (2) transparent disclosure of where AI assistance is used in the pipeline. Both go in the engagement letter at scoping. The accuracy floor determines bench review depth and QC effort; the AI disclosure determines what content moves through ASR first-pass and what does not. Vendors that decline to specify either are not operating at institutional tier.
Why floors are documented, not assumed
The phrase “high-accuracy transcription” without a documented floor is operationally meaningless. What does it mean: 95%? 99%? 99.9%? Each level corresponds to substantively different bench effort, review discipline, and downstream usability. A 95% transcript can have one substantive error every 20 words; a 99.5% transcript can have one error every 200 words. For court-record purposes the difference is determinative; for podcast captioning a lower floor may be suitable; and for AGM minutes the floor matters less than the cleaned-presentation work.
Othello documents the accuracy floor in the engagement letter by deliverable form. Floors are not aspirational targets — they are contractual commitments verifiable by sampling QC at delivery. Client-side sampling QC at handover is welcome under engagement-letter terms.
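The arithmetic behind these floors, and the sampling check a client might run at handover, can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, errors are counted per word, and exact fractions are used so that a sample sitting precisely on the floor is not rejected by floating-point rounding.

```python
from fractions import Fraction


def words_per_error(accuracy: str) -> Fraction:
    """Average words per error implied by an accuracy figure such as '0.995'."""
    error_rate = 1 - Fraction(accuracy)
    if error_rate <= 0:
        raise ValueError("accuracy must be below 1")
    return 1 / error_rate


def sample_meets_floor(errors: int, words: int, floor: str) -> bool:
    """True if a QC sample of `words` words with `errors` errors meets the floor."""
    if words <= 0:
        raise ValueError("sample must contain at least one word")
    return Fraction(words - errors, words) >= Fraction(floor)


print(words_per_error("0.95"))    # 20
print(words_per_error("0.995"))   # 200
print(sample_meets_floor(5, 1000, "0.995"))  # True
print(sample_meets_floor(6, 1000, "0.995"))  # False
```

On a 1,000-word sample, the ≥99.5% verbatim floor therefore tolerates at most five errors.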
AI assistance · where it operates, where it does not
The engagement letter discloses ASR first-pass use stage-by-stage. Two operative principles follow.
Non-confidential general content
- Publicly-recorded events: Public conferences, published interviews, broadcast media transcripts
- Marketing & communications: Podcast captioning for marketing channels, public webinar transcripts
- Training & education: Already-public capacity-building programme materials, MOOCs, published academic lectures
- Client-permitted general content: Where the client engagement letter explicitly permits ASR first-pass for the specific content tier
Confidential · privileged · regulated
- Counsel-engaged matters: Arbitration recordings, deposition tapes, witness preparation, M&A working sessions under privilege
- Regulatory content: SEC investigation interviews, SET disciplinary hearings, BoT regulatory bilateral, MFA Saranrom diplomatic
- Board & audit recordings: Board meeting recordings, audit committee deliberations, executive compensation discussions
- Restricted ESG content: Pre-disclosure 56-1 One Report drafts, IFRS S2 pre-release sustainability content, internal climate-risk deliberations
The verbatim-vs-cleaned distinction determines the bench discipline.
Verbatim preserves
- Hesitations and filler — “uh,” “um,” “well,” “you know” all preserved on the record
- False starts and self-corrections — “I think we should — no, actually we need to” preserved as said
- Overlapping speech — speakers talking simultaneously, marked with attribution and overlap notation
- Interjections and side comments — “[crosstalk],” “[inaudible],” “[unintelligible]” markers where preserved
- Non-verbal annotations — “[pause],” “[laughter],” “[paper shuffle]” where contextually material
- Repetitions — “the the company” preserved if that’s what was said
- Speaker attribution — every utterance labelled with speaker identifier
Cleaned removes
- Hesitations and filler — “uh,” “um,” “well” removed for readable flow
- False starts — completed thought rendered, abandoned starts removed
- Minor self-corrections — corrected version rendered, original removed
- Non-substantive repetitions — “the the company” rendered as “the company”
- Side comments — non-substantive interjections removed unless agenda-relevant
- Paralinguistic noise — “[paper shuffle],” “[cough]” removed unless material
- Preserved: substantive content, speaker attribution, agenda flow — readable speaker intent
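To make the distinction concrete, the sketch below applies a few of the "cleaned removes" rules to a verbatim utterance. It is a rough illustration, not a production cleaner: the function name and the marker list are assumptions, only "uh" and "um" are handled (a word like "well" is context-dependent and left to the reviewer), and the repetition rule would also collapse legitimate doubles such as "had had", which is one reason a human bench reviews the result.

```python
import re

# Non-verbal markers dropped in a cleaned transcript unless material (assumed list).
NOISE_MARKER = re.compile(r"\[(?:pause|cough|paper shuffle)\]", re.IGNORECASE)
# Fillers, together with the commas that usually set them off.
FILLER = re.compile(r",?\s*\b(?:uh|um)\b,?", re.IGNORECASE)
# Immediate word repetitions such as "the the".
REPEAT = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)


def clean_utterance(text: str) -> str:
    """Render a verbatim English utterance in cleaned form (first draft only)."""
    text = NOISE_MARKER.sub(" ", text)
    text = FILLER.sub(" ", text)
    text = REPEAT.sub(r"\1", text)
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([,.;:?!])", r"\1", text)  # no space before punctuation
    return text[:1].upper() + text[1:]


print(clean_utterance("Um, the the company [pause] will, uh, report on Friday."))
# The company will report on Friday.
```

The verbatim form keeps the input string exactly as spoken; only the cleaned form passes through a function like this, and always ahead of human review.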
Briefing materials. Bench preparation. Termbase lock.
Preparation for transcription and captioning engagements parallels the interpretation column on glossary depth and termbase carry-forward, adds a substantive layer for source-audio quality assessment and turnaround scoping, and locks the deliverable form, accuracy floor, AI-disclosure policy, and final format at engagement-letter signing.
Scoping + briefing intake
For pre-recorded content: source media intake, deliverable form confirmed, accuracy floor documented, AI-disclosure policy locked. For live CART: agenda, participant bios, briefing materials. NDA from first email applies throughout.
Glossary simultaneous-grade build
Bilingual termbase drafted from briefing materials. For live CART: simultaneous-grade depth, same as Mode 01 booth simultaneous. For pre-recorded transcription / captioning: form-aligned glossary depth, sectoral terminology, named-entity index.
Glossary client review · locked
Termbase circulated for client preference confirmation. House-preferred terminology, regulatory choices, named-entity spellings. Locked at T-5. For multi-engagement clients on recurring transcription cycle, prior-engagement termbase carries forward.
Tech check · audio source · platform
For live CART: hub or qualified-home tech check per Remote setup discipline. For pre-recorded transcription: source-audio QC pre-pass, format compatibility confirmed, delivery platform verified.
Live CART · final tech check
For live CART only: 60-minute pre-engagement final check — platform login, paired captioner coordination on rotation, last-mile glossary review, client briefing absorbed. For pre-recorded: stage 01 audio intake.
Active production
Pre-recorded: pipeline stages 01–06 sequenced under engagement-letter timeline. Live CART: real-time paired delivery under AIIC working-time discipline. Chain-of-custody log maintained throughout.
Same six phases. Pipeline layer at Phase 05.
Othello operates transcription and captioning under the same 6-phase methodology applied across the interpretation modes, with substantive variation at Phase 03 (deliverable form confirmation replaces equipment site visit) and Phase 05 (the 6-stage production pipeline runs inside Phase 05 delivery).
Form selection + accuracy floor
Deliverable form confirmed against use case — see Section 03 five forms. Accuracy floor documented. AI-disclosure policy locked. Final delivery format specified (DOCX / SRT / WebVTT / TTML / EBU-TT / JSON).
Glossary build + termbase carry-forward
Bilingual termbase drafted from briefing materials at form-aligned depth. Prior-engagement termbase carry-forward for recurring institutional clients on multi-engagement cycle. Locked at T-5.
Bench assignment + source QC
Named bench assigned per deliverable form (verbatim specialists for legal · captioning specialists for accessibility · CART specialists for live · translation specialists for subtitles). Source-audio QC pre-pass for pre-recorded content.
Final readiness
Last-mile briefing absorbed · glossary updates integrated · for live CART, paired captioner coordination on rotation and hand-over signalling · for pre-recorded, pipeline kick-off scheduled.
Six-stage production pipeline
For pre-recorded: pipeline stages 01–06 sequenced. For live CART: real-time paired delivery under AIIC discipline with chain-of-custody log maintained. See Section 04 pipeline for stage detail.
Termbase carry-forward · QC log retained
Updated bilingual termbase committed under engagement-letter confidentiality. Chain-of-custody and QC log retained as operational continuity asset for recurring institutional engagement (quarterly board cycle · annual AGM · monthly audit committee · ongoing video content channel).
Where transcription & captioning actually deploys.
The use cases below map to engagement types substantively present in Othello’s institutional client roster — SET-listed Thai corporates, international Big Law arbitration teams, ESG advisory clients, capacity-building programmes, and US/UK/EU government bilateral. Each use case anchors a specific deliverable form and accuracy floor.
Verbatim transcripts · TAI · THAC · ICC · SIAC
Verbatim transcription of arbitration hearings at TAI · THAC · ICC · SIAC, deposition records, witness sessions, and pre-arbitration counsel proceedings. ≥99.5% accuracy floor, 2-pass bench review, speaker attribution, certified-transcript PDF on counsel request. Mode-cross-reference: pairs with Mode 03 Court & Legal interpretation and Mode 04 Remote for hybrid hearings.
Cleaned transcripts · SET-listed AGM · board minutes
Cleaned transcripts of SET-listed Thai issuer AGMs, board meetings, audit committee deliberations, and risk committee sessions — readable speaker-intent rendering for minutes preparation and corporate governance archive. ≥98% substantive accuracy, 1-pass bench review, agenda-section markers, speaker attribution preserved. Recurring engagement pattern: quarterly board cycle, annual AGM, monthly audit committee.
Closed captions · SET issuer IR video
Closed-caption delivery for SET-listed issuer investor relations video archive — earnings call video, post-earnings briefing, ESG investor day video, IFRS S2 sustainability disclosure video. WCAG 2.2 conformance for global investor accessibility. Often paired with translated subtitles for cross-language IR deliverables (Thai source with English subs for global investor base).
Translated subtitles · multinational comms
Translated subtitles for cross-language video deliverables — multinational corporate global town-hall video, Thai-language CEO message to global workforce, multinational training video localisation, government bilateral video communications. ISO 17100 translation lockstep combined with subtitling timing discipline. Multi-track subtitle delivery for broader linguistic distribution.
Live CART · hybrid AGM accessibility
Live real-time captioning at hybrid AGM events for accessibility floor under WCAG 1.2.4 live captions in synchronized media. Paired CART captioners deliver simultaneous-grade text rendering with rotation at agenda transitions. Often co-deployed with Mode 01 booth simultaneous or Mode 04 RSI — coordinated bench shares briefing and termbase.
Verbatim · Thai SEC · BoT · MFA
Verbatim transcription of regulatory interviews and proceedings — Thai SEC investigation interviews, Bank of Thailand regulatory bilateral, MFA Saranrom diplomatic record, SET disciplinary proceedings, ministerial bilateral working sessions. Workflow runs human-only (no ASR first-pass) under engagement-letter privilege regime. ≥99.5% accuracy floor with regulatory-content-specific terminology lockstep.
Cleaned transcripts · AA1000AS · ISO 14064 interviews
Transcription of ESG assurance interviews under AA1000AS, ISO 14064 climate inventory verification interviews, and CSDDD supply-chain due diligence interviews. Substantive content overlay — climate accounting (Panit-Thitaree pairing), sustainability reporting (Kittichai cross-anchor). Cross-engagement termbase continuity with the firm’s ESG advisory column deliverables.
Captions + subtitles · capacity-building
Closed-caption and translated-subtitle delivery for capacity-building programme video — GIZ programme deliverables, UK PACT training materials, UN Women programme video, World Bank technical assistance video, US State Department programme video. WCAG 2.2 accessibility conformance plus ISO 17100 translation lockstep for cross-language distribution. Often programme-deliverable specification in donor procurement.
Procurement-grade questions answered.
Substantive answers to the questions procurement evaluation panels, in-house counsel, IR teams, and accessibility programme managers ask when scoping transcription and captioning at institutional tier.
Q.01 Which of the five forms do we need: verbatim, cleaned, captions, subtitles, or live CART?
Form selection follows from the substantive use case. Verbatim transcription for court / arbitration / regulatory record where every utterance matters on the record. Cleaned transcript for board minutes / AGM record / audit committee / IR archive where readable speaker intent matters more than fidelity to hesitations. Closed captions for pre-recorded video accessibility under WCAG 2.2. Translated subtitles for cross-language video distribution. Live CART for real-time event accessibility at AGMs, conferences, training.
The scoping question that resolves form selection: “what is the downstream use of this deliverable?” If the deliverable is part of a court record or arbitration tribunal submission, verbatim. If it’s for internal minutes or external IR archive, cleaned. If it’s a video on the company website or LMS, closed captions. If it’s a video crossing language boundaries, translated subtitles. If it’s a live event with accessibility floor, CART. See Section 03 for full form detail.
Q.02 Where exactly do you use AI / ASR, and where don’t you?
The engagement letter discloses this stage-by-stage. ASR (automatic speech recognition) is permitted as a first-pass working draft for bench review on non-confidential general content — public conferences, marketing communications, already-public training material, client-permitted general content. ASR is skipped entirely for confidential, privileged, or regulated content: counsel-engaged matters (arbitration, deposition, witness preparation, M&A under privilege), regulatory content (SEC investigation interviews, BoT bilateral, MFA Saranrom diplomatic record, SET disciplinary), board and audit recordings, restricted ESG content pre-disclosure.
Where ASR first-pass is used, named in-house bench reviews, corrects, and signs off on every deliverable before final delivery. Raw AI output is never the final product at Othello. The substantive procurement test: ask the vendor where AI is used, and require the answer in the engagement letter. Vendors that decline to specify are not operating at institutional tier. See Section 05 for the AI-discipline framework.
Q.03 What accuracy floor do you guarantee, and how is it measured?
Accuracy floors are documented in the engagement letter by deliverable form: verbatim ≥99.5%, cleaned ≥98% substantive content, closed captions WCAG 2.2 conformance, translated subtitles ISO 17100 lockstep, live CART ≥96% real-time floor. These are contractual commitments, not aspirational targets, and are verifiable by client-side sampling QC at handover under engagement-letter terms.
Measurement is on a per-word substantive accuracy basis for transcripts, against WCAG 2.2 conformance criteria for captions, against ISO 17100 revision discipline for translated subtitles. Word Error Rate (WER) is the industry-standard measurement for transcription accuracy — Othello commits to the documented floor and supports client-side WER audit at handover. For live CART, the ≥96% real-time floor is calibrated to NCRA (National Court Reporters Association) live captioning practice. See Section 05 accuracy table.
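For readers who want to run a client-side audit, Word Error Rate is the word-level edit distance between a trusted reference transcript and the delivered transcript, divided by the number of reference words; accuracy is then 1 minus WER. The following is a minimal sketch of that standard calculation. The function name is hypothetical, and whitespace tokenisation is an assumption that suits English; Thai text would need word segmentation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the board approved the dividend",
                      "the board approve the dividend")
print(wer)      # 0.2
print(1 - wer)  # 0.8
```

In practice the comparison is case- and punctuation-sensitive unless both texts are normalised first, so the normalisation rules are worth agreeing before the audit.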
Q.04 How does confidentiality work, and what about platform-side recording for live CART?
NDA from first email applies throughout — same discipline as the interpretation and translation columns. For pre-recorded transcription, the engagement letter covers source-media chain-of-custody, AI-disclosure stage-by-stage, named-bench identifiers, and client-side audit access. For confidential / privileged content, ASR is skipped entirely (Section 05) — no third-party AI tool ingestion of source audio.
For live CART at events, platform-side recording controls are confirmed before the engagement opens — where the CART text stream is recorded, who has access, what happens to it after the event. For hybrid AGMs with WCAG 1.2.4 live caption requirement, the engagement letter typically specifies that the live CART text stream is delivered to the platform but is not retained by Othello post-event unless the client requests it as a deliverable. Counsel privilege regime applies where the CART is at a counsel-engaged matter (rare for CART but possible for arbitration with accessibility needs).
Q.05 Can you deliver in our specific file format: SRT, WebVTT, TTML, EBU-TT, JSON?
Yes. Pipeline stage 06 delivers in the client format-of-record. Standard caption file formats: SRT (universal subtitle, broadest compatibility), WebVTT (W3C format for HTML5 video, the usual choice for web delivery under WCAG 2.2), TTML (W3C/SMPTE advanced positioning and styling), EBU-TT (European Broadcasting Union broadcast format), SCC (US broadcast legacy), and STL (EBU subtitling legacy). Transcript formats: DOCX with formatting (speaker attribution, timestamps, agenda markers) and PDF certified for court-record use.
Platform-integrated formats: JSON for direct platform API integration (LMS, CMS, video platform APIs). For live CART: real-time text stream delivered via StreamText, 1CapApp, Zoom captioning channel, MS Teams live captions API, Webex captions, or client-platform integration. The engagement letter specifies which format(s) the deliverable arrives in — multiple format delivery is supported at no operational overhead beyond the format-conversion stage.
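To illustrate what a format-conversion step involves, here is a minimal sketch of an SRT to WebVTT conversion. It is not Othello's stage 06 pipeline. It covers only the basic differences between the two formats (the `WEBVTT` header and a period instead of a comma before the milliseconds), drops SRT cue numbers, and ignores styling and positioning. The function name `srt_to_webvtt` is illustrative.

```python
import re

# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,200"
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert basic SRT cues to WebVTT; raises ValueError on a malformed timing line."""
    # Normalise line endings and drop a leading byte-order mark if present.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", text):
        if not block.strip():
            continue
        lines = block.split("\n")
        # SRT puts a numeric cue index on the first line; WebVTT does not need it.
        if lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        timing = lines[0].strip()
        if _TIMING.match(timing) is None:
            raise ValueError(f"unrecognised SRT timing line: {timing!r}")
        # WebVTT uses a period, not a comma, before the milliseconds.
        lines[0] = _TIMING.sub(r"\1.\2 --> \3.\4", timing)
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Conversions to TTML, EBU-TT, or SCC involve positioning, styling, and frame-rate handling that this sketch does not attempt.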
Q.06 How does pricing work: per word, per minute of audio, per hour of bench?
Engagement-letter basis — not consumer-grade per-minute auto-pricing. Cost drivers: deliverable form (verbatim is denser than cleaned; CART carries simultaneous-grade discipline), accuracy floor (≥99.5% requires 2-pass bench review), source audio quality (noisy or multi-speaker audio takes longer), language pair (Thai ↔ English at institutional tier vs general English-only), AI-disclosure policy (human-only is denser than ASR-assisted), delivery turnaround, and volume / continuity (recurring engagement cycles attract continuity pricing).
Standard turnaround for pre-recorded transcription is 3–5 business days from intake for verbatim and 2–3 business days for cleaned, with captions and subtitles on a parallel timeline. Expedited (next-business-day) and rush (same-day) delivery are available at engagement-letter terms. Live CART is scheduled per engagement. Volume discounts apply to recurring institutional engagements: quarterly board cycle, annual AGM, monthly audit committee, ongoing video content channel. See How We Quote for the substantive cost framework.
Q.07 Do you certify transcripts for court or arbitration use?
Yes. Certified transcripts are available for court and arbitration submission. The certification is a signed declaration by the named bench transcriber, attesting under engagement-letter discipline that the transcript is a true and accurate verbatim record of the source audio to the documented ≥99.5% accuracy floor, with 2-pass bench review applied and the chain-of-custody log retained for the engagement record. This is procurement-grade certification, suitable for tribunal submission and counsel use.
For Thai court hearings (Civil Court of Bangkok, Central IP & IT Court, Central Labour Court, Central Tax Court, Court of Appeal, Supreme Court / San Dika), the verbatim transcript is typically delivered alongside separately certified interpreter testimony under Mode 03 Court & Legal. For arbitration tribunals (TAI, THAC, ICC, SIAC), the certified verbatim transcript supports the tribunal’s procedural order on record-keeping. Thailand has no national sworn-interpreter or sworn-transcriber registry, so Othello’s certification is on engagement-letter basis with named bench attestation, aligned with tribunal procedural requirements.
Q.08 How do you handle Thai-language audio specifically, and what are the operational challenges?
Thai-language transcription has substantive operational challenges that generic transcription vendors often handle poorly. Thai script does not mark word boundaries with spaces, so segmentation is contextual, and ASR systems trained on Western languages frequently mis-segment. Thai tonal phonology (five lexical tones) means homophone disambiguation depends on context that surface-level ASR misses. Code-switching between Thai and English is pervasive in institutional Bangkok content (board meetings, AGMs, bilateral regulatory meetings) and breaks single-language ASR pipelines.
Othello’s bench handles Thai-language audio with native-Thai bench review at every stage of the pipeline. For ASR-assisted workflows (where permitted), the first-pass machine output is treated as a starting point only, not a credible draft, and substantive correction at stage 03 is operationally mandatory. For pure-Thai or heavily code-switched audio, the workflow runs human-only across all stages, because ASR first-pass adds friction rather than acceleration. The engagement letter specifies which workflow applies. A bilingual native-Thai-and-native-English bench is the substantive operational answer to Thai-language audio at institutional tier.
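As a rough illustration of the code-switching point, the sketch below splits a transcript line into Thai-script and Latin-script runs using the Unicode Thai block (U+0E00 to U+0E7F) and flags lines that contain both. It is a hypothetical triage heuristic, not Othello's workflow and not a word segmenter: a Thai run comes back as one unsegmented string, which is exactly the word-boundary problem described above. The function names are illustrative, and Latin detection is limited to ASCII letters for simplicity.

```python
import re
from typing import List, Tuple

# One alternative per script; the name of the group that matched identifies the run.
_RUN = re.compile(r"(?P<thai>[\u0E00-\u0E7F]+)|(?P<latin>[A-Za-z]+)")


def script_runs(text: str) -> List[Tuple[str, str]]:
    """Return (script, run) pairs in reading order; digits and punctuation are skipped."""
    return [(m.lastgroup, m.group()) for m in _RUN.finditer(text)]


def is_code_switched(text: str) -> bool:
    """True when the line contains both Thai-script and Latin-script runs."""
    scripts = {script for script, _ in script_runs(text)}
    return {"thai", "latin"} <= scripts
```

For the line `ประชุม AGM วันนี้`, `script_runs` yields a Thai run, the Latin run `AGM`, and a second Thai run, and `is_code_switched` returns `True`. A check like this could help decide at intake whether a recording is a candidate for the human-only workflow, though English terms spoken in a Thai sentence will only appear in script form once someone has transcribed them.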
Q.09 How does this integrate with the broader Othello engagement?
Transcription and captioning sits in the interpretation column alongside the five interpretation sub-modes (Simultaneous, Consecutive, Court & Legal, Remote, Whispered), operating under the same NDA discipline, named in-house bench, and engagement-letter accountability. The bench that transcribes is the bench that interprets — termbase continuity is operationally built in.
Cross-column continuity: a SET-listed issuer client engaging Othello for quarterly investor day RSI interpretation + earnings call video captions + 56-1 One Report translation + IFRS S2 sustainability disclosure translation works from one termbase, one bench, one engagement-letter framework. Transcription and captioning amplifies the cross-column continuity rather than fragmenting it. Recurring institutional engagement on multi-column scope reduces marginal cost per deliverable and improves terminological consistency across the client’s external-facing language stack.
Q.10 Can you accept our standing platform contract: Zoom recording, MS Teams, Webex, video CMS?
Yes. Othello operates platform-agnostically across Zoom recordings, MS Teams meeting recordings, Cisco Webex recordings, Google Meet recordings, dedicated event platforms (Interprefy, KUDO, Boostlingo Events), and client-side video CMS / LMS integrations. Source media intake at pipeline stage 01 handles any modern audio or video format. Delivery at stage 06 integrates with the client’s video CMS, learning management system, or platform API where the engagement letter specifies a platform-integrated deliverable.
For live CART, the platform-side text-stream integration is confirmed pre-engagement: Zoom captioning channel, MS Teams live captions API, Webex captions integration, StreamText for general-platform delivery, 1CapApp for accessibility-focused events. Where the client has a non-standard platform, the technical check at T-2 days (Phase 04) confirms operational compatibility. See Contact Pathway 02 — Pre-RFP Scoping for a 30-minute scoping call to discuss platform integration.
Pick a form. Lock the floor.
Transcription and captioning engagements scope most efficiently when the deliverable form, accuracy floor, AI-disclosure policy, and final delivery format are specified at engagement-letter signing. For recurring institutional cycles (quarterly board, monthly audit committee, ongoing video channel), continuity terms are negotiated upfront. NDA from first email applies throughout.