
Thursday, October 30, 2025

Unicode CLDR 48 available

Unicode CLDR 48 is now available and has been integrated into version 78 of ICU.


Some of the most significant changes in this release are the following (for more detail, see the CLDR 48 release note page):

  • Updated for Unicode 17, including new names and search terms for new emoji, new sort order, and Han→Latin romanization additions for many characters.

  • Updated to the latest external standards and data sources, such as the language subtag registry, UN M49 macro regions, ISO 4217 currencies, etc.

  • Many enhancements to the CLDR specification (LDML)

  • Many additions to language data including:

    • Likely Subtags, for deriving the likely script and region from the language (used in many processes)

  • New formatting options:

    • Rational number formats added (in tech preview), allowing for formats like “5½”

    • For time zones, usesMetazone gains two new attributes, stdOffset and dstOffset, so that implementations can use either the “main” or the “rearguard” TZDB data

    • Combination formats added for relative dates + times, such as “tomorrow at 12:30”

    • Additional units added for scientific contexts (coulombs, farads, teslas, etc.) and for English systems (fortnights, imperial pints, etc.)

  • Many corrections and updates for Metazone data and calendar eras (including removal of eras and fixes to start dates)

  • This is the first release in which the new CLDR Organization process is in place for digitally disadvantaged languages (DDLs). As a result, several locales were able to reach higher coverage levels (see below).
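To make the Likely Subtags item concrete: "maximizing" a language tag means filling in its most likely script and region. The Rust sketch below assumes a tiny hand-written table (a few entries in the spirit of CLDR's likely-subtags data) and a simplified lookup order (language-region, then language-script, then language alone). The real algorithm also handles "und" and further lookup combinations, so treat this as an illustration only; the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical miniature likely-subtags table; the real CLDR data is far larger.
fn likely_table() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("en", "en-Latn-US"),
        ("ja", "ja-Jpan-JP"),
        ("sr", "sr-Cyrl-RS"),
        ("zh", "zh-Hans-CN"),
        ("zh-Hant", "zh-Hant-TW"),
        ("zh-TW", "zh-Hant-TW"),
    ])
}

/// Simplified "add likely subtags": fill in a missing script and/or region
/// from the most specific matching entry; subtags already present always win.
fn maximize(tag: &str) -> String {
    let table = likely_table();
    let parts: Vec<&str> = tag.split('-').collect();
    let lang = parts[0];
    let mut script: Option<&str> = None;
    let mut region: Option<&str> = None;
    // Classify subtags by shape: 4 letters = script; 2 letters or 3 digits = region.
    for &p in &parts[1..] {
        if p.len() == 4 && p.chars().all(|c| c.is_ascii_alphabetic()) {
            script = Some(p);
        } else if (p.len() == 2 && p.chars().all(|c| c.is_ascii_alphabetic()))
            || (p.len() == 3 && p.chars().all(|c| c.is_ascii_digit()))
        {
            region = Some(p);
        }
    }
    // Build lookup keys from most to least specific.
    let mut keys = Vec::new();
    if let Some(r) = region {
        keys.push(format!("{}-{}", lang, r));
    }
    if let Some(s) = script {
        keys.push(format!("{}-{}", lang, s));
    }
    keys.push(lang.to_string());
    for key in &keys {
        if let Some(full) = table.get(key.as_str()) {
            // Table values always have the shape "lang-Script-REGION".
            let f: Vec<&str> = full.split('-').collect();
            return format!("{}-{}-{}", lang, script.unwrap_or(f[1]), region.unwrap_or(f[2]));
        }
    }
    tag.to_string() // no data for this language: leave the tag unchanged
}
```

For example, "en-GB" keeps its explicit region and becomes "en-Latn-GB", while "zh-Hant" matches its own entry and becomes "zh-Hant-TW". The full CLDR table contains many region- and script-specific entries like that one, which is why a lookup against the complete data matters.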

See the CLDR 48 release note page for information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.


CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and modern mobile phones use CLDR for language support. (See Who uses CLDR?)


Via the Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems. 

Locale Coverage Levels

| Level | Count | With Script | Regional Variants | Usage |
|---|---|---|---|---|
| Modern | 104 | 5 | 305 | Suitable for full UI internationalization |
| Moderate | 13 | 0 | 1 | Suitable for “document content” internationalization, e.g., in spreadsheets |
| Basic | 57 | 10 | 22 | Suitable for locale selection, e.g., choosing the language on a mobile phone |

Changes in coverage

| ± | New Level | Locales |
|---|---|---|
| 📈 | Modern | Akan, Bashkir, Chuvash, Kazakh (Arabic), Romansh, Shan, Quechua |
| 📈 | Moderate | Anii, Esperanto |
| 📈 | Basic | Buriat, Piedmontese, Sicilian, Tuvinian |
| 📉 | Basic* | Baluchi (Latin), Kurdish |

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock


ICU4X 2.1 released!

The ICU4X Technical Committee is happy to announce ICU4X 2.1, an update to our modular, portable, and secure i18n library.

ICU4X is Unicode's modern, lightweight, portable, and secure i18n library. Built from the ground up, it has a binary size and memory footprint 50-90% smaller than ICU4C's. It is memory-safe and written in Rust, with interfaces for C++, JavaScript, TypeScript, and Dart, and other languages are on the roadmap. Mozilla Firefox, Google Chrome, Google Pixel Watch, core Android, numerous Flutter apps, and more clients already use ICU4X.


Important changes since ICU4X 2.0 include:


  1. Latest i18n data: This release includes an update to CLDR 48.

  2. Calendar improvements: ICU4X is now being used to implement Temporal in V8 and SpiderMonkey via temporal_rs. icu_calendar has received many fixes and improvements in service of that, including new experimental arithmetic APIs.

  3. Normalizer optimizations: icu_normalizer has received a lot of optimization work, with some more to come. Optimizations made to shared data structures will benefit other components as well.

  4. Collation sort keys: It is now possible to use icu_collator to extract the sort key of a given string to amortize the cost of collation operations.
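A sort key turns a string into a byte sequence whose plain byte-wise order matches the collation order, so each string is processed once and every later comparison is a cheap byte comparison. The Rust sketch below shows only that amortization pattern. It uses a hypothetical toy_sort_key that folds case and appends the raw bytes as a tie-breaker; it is not icu_collator's API, and its ordering is not real collation.

```rust
/// Hypothetical stand-in for a collator's sort-key function. Real collation
/// keys encode several strength levels; this toy only folds case at the
/// "primary" level and uses the raw bytes as a tie-breaker.
fn toy_sort_key(s: &str) -> Vec<u8> {
    let mut key = s.to_lowercase().into_bytes();
    key.push(0); // level separator, sorts below every other byte
    key.extend_from_slice(s.as_bytes());
    key
}

/// Sort by computing each key exactly once (n key computations), then
/// comparing keys byte-wise, instead of re-running a full collation
/// comparison for each of the O(n log n) pairs.
fn sort_with_keys(words: &mut [String]) {
    words.sort_by_cached_key(|w| toy_sort_key(w));
}
```

The same idea applies when keys are stored, for example in a database index: compute them once, then compare bytes.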


When updating ICU4X crates to 2.1, you may run into errors caused by incompatibilities between older and newer crates around the alloc feature. If so, run cargo update for each crate that shows up in the errors.


See the full changelog for more information.


Check out our quickstart tutorial, interactive demo, or C++, TypeScript, and (experimental) Dart documentation.


As before, the Rust crate is available at crates.io, with documentation at docs.rs.


Please post any questions via GitHub Discussions.





ICU 78 Released


Unicode® ICU 78 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). 

ICU 78 updates to Unicode 17 (blog), including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations. 

It also updates to CLDR 48 (beta blog) locale data, with new locales and various additions and corrections.

In Java, there is a new draft Segmenter API that is easier and safer to use than BreakIterator. In C++, there is a new set of APIs for code point iteration over Unicode strings (UTF-8/16/32) that works seamlessly with modern C++ iterators and ranges.
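For a sense of what code point iteration has to handle, here is a Rust sketch that uses only the Rust standard library, not ICU's new C++ API. UTF-16 and UTF-8 input are decoded to code points, and ill-formed sequences, such as an unpaired surrogate or a stray 0xFF byte, become U+FFFD REPLACEMENT CHARACTER. The helper names are hypothetical.

```rust
/// Decode UTF-16 code units to code points; unpaired surrogates become U+FFFD.
fn code_points_utf16(units: &[u16]) -> Vec<u32> {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER) as u32)
        .collect()
}

/// Decode UTF-8 bytes to code points; invalid sequences become U+FFFD.
fn code_points_utf8(bytes: &[u8]) -> Vec<u32> {
    String::from_utf8_lossy(bytes).chars().map(|c| c as u32).collect()
}
```

Note that U+1F600 occupies one code point but two UTF-16 code units (0xD83D 0xDE00) and four UTF-8 bytes; an iterator that yields code points hides that difference from the caller.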

The Java implementation of the CLDR MessageFormat 2.0 specification has been updated to CLDR 48. The core API has been upgraded to “draft”, while the Data Model API remains in technology preview. 

The C++ implementation of MessageFormat 2.0 is at CLDR 47 level and remains in technology preview. 

ICU 78 and CLDR 48 are major releases, including a new version of Unicode and major locale data improvements.





Thursday, October 2, 2025

Unicode CLDR 48 Beta available for specification review

The Unicode CLDR 48 Beta is now available for specification review and integration testing. The release is planned for October 29th, 2025, but any feedback on the specification needs to be submitted well in advance of that date. The beta specification is available at Draft LDML Modifications. See also the Migration section of the new release page.


CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)


Via the Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.


The beta has already been integrated into the development versions of ICU 78 and ICU4X. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Requesting Changes.

The following are some of the most significant changes to the specification (LDML).

Locale Identifiers and Names
  • Display Name Elements - Described the usage of the language element menu values core and extension, and alt="menu". Also revamped the description of how to construct names for locale IDs, for clarity.
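As a rough illustration of constructing a name for a locale ID, the Rust sketch below uses the English localePattern "{0} ({1})" and localeSeparator "{0}, {1}" with a hypothetical, tiny table of language, script, and region names. It ignores the menu, core, and extension alternatives mentioned above, as well as compound dialect names like "American English".

```rust
use std::collections::HashMap;

/// Build a display name such as "English (Latin, United States)".
/// The name tables are a hypothetical excerpt; only the two patterns
/// follow CLDR's English localeDisplayPattern data.
fn display_name(id: &str) -> String {
    let languages = HashMap::from([("en", "English"), ("sr", "Serbian")]);
    let scripts = HashMap::from([("Latn", "Latin"), ("Cyrl", "Cyrillic")]);
    let regions = HashMap::from([("US", "United States"), ("RS", "Serbia")]);
    let locale_pattern = "{0} ({1})";
    let locale_separator = "{0}, {1}";

    let mut parts = id.split('-');
    let lang = parts.next().unwrap_or("");
    let base = languages.get(lang).copied().unwrap_or(lang).to_string();
    // Name each remaining subtag, falling back to the raw code if unknown.
    let quals: Vec<String> = parts
        .map(|p| scripts.get(p).or_else(|| regions.get(p)).copied().unwrap_or(p).to_string())
        .collect();
    if quals.is_empty() {
        return base;
    }
    // Join qualifiers with the separator pattern, then wrap with the locale pattern.
    let joined = quals
        .into_iter()
        .reduce(|acc, q| locale_separator.replace("{0}", &acc).replace("{1}", &q))
        .unwrap();
    locale_pattern.replace("{0}", &base).replace("{1}", &joined)
}
```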

Misc.
  • Character Elements - Added new exemplar types.

  • Person Name Validation - Added guidance for validating person names.

DateTime formats
  • Element dateTimeFormat - Added a new type relative for relative date/times, such as “tomorrow at 10:00”, and updated the guidelines for using the different dateTimeFormat types.

  • timeZoneNames Elements Used for Fallback - Added the gmtUnknownFormat to indicate when the timezone is unknown.

  • Metazone Names - Added usesMetazone to specify which offset is considered standard time and which offset is considered daylight.

  • Time Zone Format Terminology - Added the Localized GMT format (and removed the Specific location format). This affects the behavior of the z timezone format symbol. There is also now a mechanism for finding the region code from a short timezone identifier, which is used for the non-location formats (generic or specific).

  • Calendar Data - Specified more precisely the meaning of the era attributes in supplemental data, and how to determine the transition point in time between eras.
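As in the other dateTimeFormat types, the new relative type is a pattern in which {1} stands for the date and {0} for the time. A minimal Rust sketch of the substitution step, assuming an English-style pattern "{1} 'at' {0}" (apostrophes quote literal text, and '' is a literal apostrophe). Choosing the right pattern and producing a word like "tomorrow" are out of scope, and the function name is hypothetical.

```rust
/// Substitute a date ({1}) and a time ({0}) into a CLDR-style dateTimeFormat
/// pattern. Quoted text is copied literally; "''" yields an apostrophe.
fn combine_date_time(pattern: &str, date: &str, time: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars().peekable();
    let mut in_quote = false;
    while let Some(c) = chars.next() {
        match c {
            '\'' => {
                if chars.peek() == Some(&'\'') {
                    chars.next();
                    out.push('\''); // doubled quote: literal apostrophe
                } else {
                    in_quote = !in_quote; // open or close a literal section
                }
            }
            '{' if !in_quote => {
                // Read the argument number up to (and consuming) the closing brace.
                let arg: String = chars.by_ref().take_while(|&ch| ch != '}').collect();
                match arg.as_str() {
                    "0" => out.push_str(time),
                    "1" => out.push_str(date),
                    other => {
                        out.push('{');
                        out.push_str(other);
                        out.push('}');
                    }
                }
            }
            _ => out.push(c),
        }
    }
    out
}
```

For example, the assumed pattern "{1} 'at' {0}" with "tomorrow" and "12:30" yields "tomorrow at 12:30".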

Numbers
  • Plural rules syntax - Added substantial clarifications and new examples. The order of execution is also clearly specified.

  • Compact Number Formats - Specified the mechanism for formatting compact numbers more precisely.

  • Rational Numbers - Added support for formatting fractions like 5½.
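For rational numbers, here is a minimal Rust sketch under our own assumptions, not the LDML algorithm: reduce the fraction, use a precomposed Unicode vulgar fraction such as ½ when one exists, and otherwise fall back to a plain "n/d" form. The function name and the fallback format are hypothetical.

```rust
/// Format a whole number plus a proper fraction, e.g. (5, 1, 2) -> "5½".
/// Hypothetical simplification; assumes 0 <= num < den.
fn format_rational(whole: u64, num: u64, den: u64) -> String {
    assert!(den != 0 && num < den, "expects a proper fraction");
    fn gcd(a: u64, b: u64) -> u64 {
        if b == 0 { a } else { gcd(b, a % b) }
    }
    // Reduce first so that, e.g., 6/8 is shown as ¾.
    let g = gcd(num, den);
    let (n, d) = (num / g, den / g);
    let precomposed = match (n, d) {
        (1, 2) => Some('½'),
        (1, 4) => Some('¼'),
        (3, 4) => Some('¾'),
        (1, 3) => Some('⅓'),
        (2, 3) => Some('⅔'),
        (1, 8) => Some('⅛'),
        (3, 8) => Some('⅜'),
        (5, 8) => Some('⅝'),
        (7, 8) => Some('⅞'),
        _ => None,
    };
    match (n, precomposed) {
        (0, _) => whole.to_string(),
        (_, Some(c)) if whole == 0 => c.to_string(),
        (_, Some(c)) => format!("{whole}{c}"),
        (_, None) if whole == 0 => format!("{n}/{d}"),
        (_, None) => format!("{whole} {n}/{d}"),
    }
}
```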

Units of Measurement
  • Unit Syntax - Simplified the EBNF product_unit and added an additional well-formedness constraint for mixed units.

  • Unit Identifier Normalization - Modified the normalization process.

  • Mixed Units - Modified the guidance for handling precision.
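On precision for mixed units: one approach that avoids results like "5 feet 12 inches" is to round the total in the smallest unit first, then split it into the larger units. The Rust sketch below assumes a conversion from meters to feet and inches, rounded to the nearest inch; the helper name is hypothetical, and the Mixed Units section remains the normative guidance.

```rust
/// Convert a non-negative length in meters to (feet, inches), rounding only
/// the smallest unit and carrying into feet, so inches are always 0..=11.
fn meters_to_feet_inches(meters: f64) -> (u64, u64) {
    const METERS_PER_INCH: f64 = 0.0254; // exact by definition
    // Round the total in the smallest unit first...
    let total_inches = (meters / METERS_PER_INCH).round() as u64;
    // ...then split it into the larger unit and the remainder.
    (total_inches / 12, total_inches % 12)
}
```

With a naive truncate-feet-then-round-inches approach, 1.82 m becomes 5 ft plus 11.65 in, which rounds to "5 ft 12 in"; rounding the total first gives 6 ft 0 in.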

MessageFormat
  • Syntax and data model errors - Prioritized over other errors.

  • Default Bidi Strategy - Required and default.

  • Function :offset - Made Stable. (It was previously draft, and named :math.)

  • Draft functions :datetime, :date, and :time - Updated to build on top of semantic skeletons.

  • Draft function :percent - Added.
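The bidi strategy matters because a right-to-left value inserted into left-to-right text (or vice versa) can visually reorder its neighbors. A simplified Rust sketch, assuming we always wrap each placeholder value in U+2068 FIRST STRONG ISOLATE (FSI) and U+2069 POP DIRECTIONAL ISOLATE (PDI). As we understand it, the MessageFormat 2.0 default strategy instead chooses LRI, RLI, FSI, or no isolate from the message and placeholder directionality. The helper names are hypothetical.

```rust
/// Wrap a formatted value in FSI ... PDI so that its direction is resolved
/// in isolation and cannot reorder the surrounding message text.
fn isolate(value: &str) -> String {
    format!("\u{2068}{value}\u{2069}")
}

/// Hypothetical helper: substitute an isolated value for "{name}" in a template.
fn format_with_isolation(template: &str, name: &str, value: &str) -> String {
    template.replace(&format!("{{{name}}}"), &isolate(value))
}
```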

There are many more changes that are important to implementations, such as changes to certain identifier syntax and various algorithms. See the Modifications section of the specification for details.

For more details, see the draft CLDR 48 release page, which has information on the changes to data and structure, accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
