UTF-8

© nemo 2010-2022

These are Beta releases of parts of the UTF-8 support for RISC OS 3 and later.

They are for testing and evaluation purposes only, and NOT FOR DISTRIBUTION (yet).

Unless you are a software author, the contents of this page are unlikely to be particularly useful to you at the moment.

Available for evaluation:

Complete:

Nearly finished:

UTF8Alphabet

This module provides the UTF-8 alphabet, together with some Asian countries, ISO 3166-1 country codes, IETF country and language codes, and Unicode tables, for versions of the OS that do not have them built in.

It also implements the *FallbackAlphabet command used to control fallback behaviour when operating in the UTF-8 alphabet, and *MasterCountry to allow the legacy pseudo-countries to behave sensibly with newer APIs.

The zip file above contains an extensive ReadMe, which I will not duplicate here. In particular, it documents Service_International 260, 264, 268, 276, 280 and 284.

Here’s the changelog:

; 1.00 (08 Apr 2018) First version
; 1.01 (06 May 2018) All the ISO3166-1 codes for &43,9
; 1.02 (07 May 2018) All the Unicodes for the RO4 alphabets, plus
; &43,260 to set fallback alphabet (and 264 to read)
; *FallbackAlphabet implemented - is this the right place?
; 1.03 (21 May 2018) Also defines fallback chr set for UTF-8. Plus "None".
; 1.04 (21 May 2018) Adds 'Auto', and reuses NameToNum.
; 1.05 (21 May 2018) Fix for 5.24 move of Serv_SysChains. :-[
; 1.06 (22 Apr 2019) Fix for BFont Unicodes, and correct ISO3166-1 'gb' code
; Configurable code for Master & Compact 'territories'
; 1.07 (31 Aug 2019) &43,268 to notify/read alphabets
; 1.08 (05 Sep 2019) Revert to "uk" instead of "gb", &43,276 & 280
; Also noticed it was terminating names, which is wrong
; UniTables and SI268 removed as build options - always on
; 1.09 (15 Sep 2019) SpriteOp,51 intercepted when alphabet is UTF-8
; 1.10 (22 Jan 2020) Prevent reinstantiation, implemented &43,280. SpriteOp
; greatly improved, and compatibility with Print Drivers.
; 1.11 (03 Mar 2020) NotifyChange uses configured Fallback unless it's Auto
; when it uses the actual Fallback instead. Bugfix in
; &43,268,0 which was returning Fallback only. D'oh.
; Unbelievably wasn't reading current Alpha on Init. >8-(
; Correction to length of Welsh IETF name.
; 1.12 (11 Aug 2020) Aaaargh. Unicode tables need to be invariant, so we
; must cache them all separately.
; 1.13 (22 Sep 2020) Disable the SpriteOp for now (breaks UniEdit). Needs
; WrchV support which isn't enabled yet.
; Add *MasterCountry [UNALLOCATED]
; 1.14 (01 Nov 2020) Fix for Russia2 IETF language, Canada's is deliberately
; wrong - should be mixed English/French, but I've made it
; Inuktitut tee hee.
; 1.20 (26 Apr 2021) Fixes TerritoryManager too. Wimp task. Major update.
; 1.21 (27 May 2021) Bugfix to callback
; 1.22 (11 Jun 2021) Bugfix to Wimp version, workspace alignment and
; Fallback overwrite.
; 1.23 (07 Jul 2021) Correction to BFont Åå
; 1.24 (06 Sep 2021) Handle Byte70 and Byte240 too. Though there's code here
; to make Master Alphabet follow MasterCountry, that's
; not the right thing to do so it's disabled.
; 1.25 (08 Sep 2021) Fixes to ja,el,pl,sk,ru-cyrl2,IN,KR,LK.
; 1.26 (01 Dec 2021) Use of ADRPC instead of ADR for tail-calls.
; 1.27 (05 Mar 2022) High-vector compatible ServiceCall furtling. en-CY(!)
; 1.28 (16 Jul 2022) Correction to Belgium (Dutch, not French)

Elastic

0.07 (02 Jun 2018) Full range of Unicode zero-width characters suppressed.

This is an interesting text utility, somewhat like *Print, that implements Elastic Tabstops for text files containing tab characters.

It is also UTF-8 aware: when running under the UTF-8 alphabet it performs its tab expansion and formatting correctly, and it implements some special behaviour for particular Unicode characters.
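For readers unfamiliar with Elastic Tabstops, the general technique can be sketched roughly as follows. This is not the code from Elastic, only a minimal Python illustration of the idea: each tab ends a cell, and a run of consecutive lines that all have a cell in the same column share one width for that column. The names `expand_elastic`, `display_width`, `ZERO_WIDTH` and the `padding` parameter are hypothetical, and the set of zero-width characters shown is a small illustrative subset, not the list Elastic actually uses.

```python
import unicodedata

# Illustrative subset only (an assumption): zero-width space, non-joiner,
# joiner, word joiner and the byte order mark.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def display_width(cell):
    """Count the characters that occupy a column, skipping zero-width ones."""
    return sum(
        1
        for ch in cell
        if ord(ch) not in ZERO_WIDTH and not unicodedata.combining(ch)
    )


def expand_elastic(text, padding=2):
    """Replace tabs with spaces so that cells line up within each column block."""
    rows = [line.split("\t") for line in text.split("\n")]
    # widths[i][c] will hold the padded width of tab-terminated cell c on line i.
    # The text after the last tab on a line is not a cell and is never padded.
    widths = [[0] * (len(row) - 1) for row in rows]
    max_cols = max(len(w) for w in widths)

    for c in range(max_cols):
        i = 0
        while i < len(rows):
            if len(widths[i]) <= c:
                i += 1
                continue
            # Find the run of consecutive lines that have a cell in column c.
            j = i
            while j < len(rows) and len(widths[j]) > c:
                j += 1
            block = max(display_width(rows[k][c]) for k in range(i, j)) + padding
            for k in range(i, j):
                widths[k][c] = block
            i = j

    out = []
    for row, w in zip(rows, widths):
        cells = [
            cell + " " * (w[c] - display_width(cell))
            for c, cell in enumerate(row[:-1])
        ]
        out.append("".join(cells) + row[-1])
    return "\n".join(out)
```

Calling `expand_elastic("a\tbb\tc\nlong\tx\ty")` pads both lines to the same column positions, while a line without tabs in between would start a fresh block with its own widths.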

NOTE HOWEVER that you will need UTF-8 support in your TaskWindow application or at the command line for the results to display correctly when using UTF-8. If you understand UTF-8 sequences, you will be able to confirm that the results are correct, I hope. Command-line UTF-8 support will be released separately.
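If you want to check UTF-8 output by hand without a UTF-8 capable display, something like the following Python sketch may help. It is not part of Elastic; `utf8_sequences` is a hypothetical helper that splits a byte string into its UTF-8 sequences and reports the code point each one encodes, rejecting malformed, overlong and surrogate sequences.

```python
# Smallest code point that may legally be encoded at each sequence length.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def utf8_sequences(data):
    """Yield (raw_bytes, code_point) for each UTF-8 sequence in data.

    Raises ValueError on anything that is not well-formed UTF-8.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead          # 0xxxxxxx
        elif 0xC2 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F   # 110xxxxx
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F   # 1110xxxx
        elif 0xF0 <= lead <= 0xF4:
            length, cp = 4, lead & 0x07   # 11110xxx
        else:
            raise ValueError(f"bad lead byte {lead:#04x} at offset {i}")

        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any(b & 0xC0 != 0x80 for b in tail):
            raise ValueError(f"bad continuation bytes at offset {i}")
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)   # each 10xxxxxx byte adds six bits

        if cp < MIN_FOR_LENGTH[length]:
            raise ValueError(f"overlong sequence at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X} at offset {i}")

        yield data[i:i + length], cp
        i += length


if __name__ == "__main__":
    for raw, cp in utf8_sequences("é€".encode("utf-8")):
        print(raw.hex(" "), f"U+{cp:04X}")
```

Feeding it the bytes captured from a TaskWindow or a spooled file lets you compare the code points against what you expected to see.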

The zip file above contains a ReadMe and some example files, some of which feature UTF-8.