Minimal UTF-8 support in C code

Minimal UTF-8 support

A public-domain library with a small set of functions for manipulating UTF-8 encoded text.

Features

  • Test whether a sequence of bytes is a valid UTF-8 character
  • Represent UTF-8 characters as a vector of structs
  • Convert a UTF-8 character to a 32-bit Unicode code point
  • Convert a 32-bit Unicode code point to a UTF-8 character
  • Calculate the space required to convert a C string into a UTF-8 vector
  • Calculate the space required to convert a UTF-8 vector into a C string
  • Convert a C string to a UTF-8 vector
  • Convert a UTF-8 vector to a C string
  • Advance through text one UTF-8 character at a time
  • Send a UTF-8 character to a file descriptor
  • Send a UTF-8 vector to a file descriptor
  • Search for a UTF-8 character in a vector
  • Search for a sequence of characters in a vector
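The library's own API is documented in utf8.h and is not reproduced here. To illustrate what the validation, conversion and "advance one character at a time" features involve, here is a standalone sketch that follows the RFC 3629 rules (reject overlong forms, UTF-16 surrogates and values above U+10FFFF). The names utf8_decode_sketch, utf8_encode_sketch and utf8_count_sketch are hypothetical and are not functions of this library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (NOT part of utf8.h): decodes one UTF-8 character from
 * s, reading at most len bytes. On success it stores the code point in *cp
 * and returns the number of bytes consumed (1 to 4). It returns 0 when the
 * bytes are not a valid UTF-8 character: bad lead byte, truncated sequence,
 * bad continuation byte, overlong form, surrogate, or value > U+10FFFF. */
size_t utf8_decode_sketch(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n = 0;
    uint32_t c = 0, min = 0;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }                          /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation byte or 0xF8..0xFF */

    if (len < n) return 0;          /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* must be 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min) return 0;                     /* overlong encoding */
    if (c > 0x10FFFF) return 0;                /* outside Unicode range */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;  /* UTF-16 surrogates */
    *cp = c;
    return n;
}

/* Hypothetical helper: writes the UTF-8 encoding of cp into out and returns
 * the byte count, or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf8_encode_sketch(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: counts the characters in a NUL-terminated string by
 * advancing one UTF-8 character at a time. Returns (size_t)-1 on the first
 * invalid sequence. */
size_t utf8_count_sketch(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str), pos = 0, count = 0;

    while (pos < len) {
        uint32_t cp;
        size_t n = utf8_decode_sketch(s + pos, len - pos, &cp);
        if (n == 0) return (size_t)-1;
        pos += n;
        count++;
    }
    return count;
}
```

For example, utf8_count_sketch("a\xC3\xB1\xE2\x82\xAC") reports 3 characters stored in 6 bytes. Checking the decoded value against the minimum for its sequence length, instead of only checking bit patterns, is what rejects overlong forms such as C0 80.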

Goals and priorities

  • Be helpful
  • Be correct
  • Be easy to reuse
  • Facilitate text manipulation

Non-goals

  • Implement large functions
  • Be the fastest available implementation or use vectorized code

Installation

To use this library in your project, you can:

  • add the utf8.[ch] files directly to your project (this library is public-domain software);
  • load libminiutf8.so dynamically at runtime;
  • link libminiutf8.a or utf8.o into your native executables.

Code examples

Coming soon.

Using, editing and notes

Each function is documented with a descriptive comment in utf8.h.

I use Leo Editor as my IDE.

This software is versioned using Monotonic Versioning. For more information, see The Monotonic Versioning Manifesto.