Provides both low-level and high-level interfaces for handling UTF-8, serving
as a complement to the functionality provided by com.google.common.base.Utf8
(Guava) and Java's built-in StandardCharsets.UTF_8.
Maven projects can use this library with a simple POM dependency:
<project>
  ...
  <dependencies>
    ...
    <dependency>
      <groupId>com.everlaw</groupId>
      <artifactId>utf8</artifactId>
      <version>1.0.1</version>
    </dependency>
    ...
  </dependencies>
  ...
</project>

A low-level utility class that provides static methods for testing,
encoding, and decoding UTF-8. The principal methods are:
- isValid(codepoint): Returns true iff the given codepoint is valid UTF-8.
- toPackedInt(cseq, i): Encodes the 1- or 2-char Unicode codepoint starting at cseq[i] to 1-4 bytes of UTF-8, packed into a single int. This enables incremental encoding of any CharSequence without heap allocations.
- toPackedInt(codepoint): Encodes the given codepoint as UTF-8 packed into an int as described above.
- isContinuationByte(byte): Returns true iff the given byte is a UTF-8 continuation byte.
- numContinuationBytes(byte): Returns the number of continuation bytes that follow the given first byte of a possibly-multibyte UTF-8-encoded codepoint.
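
To make the packed-int idea concrete, here is a minimal sketch of how such methods could work. It is not the library's source: the class name Utf8PackSketch and the method name pack are hypothetical, and the byte order inside the int (first UTF-8 byte in the lowest-order byte) is an assumption that may differ from what the library does.

```java
/** Hypothetical sketch of the low-level operations; not this library's code. */
final class Utf8PackSketch {
    private Utf8PackSketch() {}

    /** True iff b has the bit pattern 10xxxxxx. */
    static boolean isContinuationByte(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /**
     * Number of continuation bytes implied by a leading byte, or -1 if b
     * cannot start a sequence. This only inspects the length prefix; it does
     * not reject the always-invalid leads 0xC0, 0xC1, and 0xF5..0xF7.
     */
    static int numContinuationBytes(byte b) {
        if ((b & 0x80) == 0x00) return 0; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 1; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 2; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 3; // 11110xxx
        return -1;                        // continuation byte or invalid lead
    }

    /**
     * Packs the UTF-8 encoding of a codepoint into one int.
     * Assumed layout: first byte in bits 0-7, second in bits 8-15, and so on.
     */
    static int pack(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {
            return cp;
        }
        if (cp < 0x800) {
            return (0xC0 | (cp >> 6))
                 | ((0x80 | (cp & 0x3F)) << 8);
        }
        if (cp < 0x10000) {
            return (0xE0 | (cp >> 12))
                 | ((0x80 | ((cp >> 6) & 0x3F)) << 8)
                 | ((0x80 | (cp & 0x3F)) << 16);
        }
        return (0xF0 | (cp >> 18))
             | ((0x80 | ((cp >> 12) & 0x3F)) << 8)
             | ((0x80 | ((cp >> 6) & 0x3F)) << 16)
             | ((0x80 | (cp & 0x3F)) << 24);
    }

    public static void main(String[] args) {
        // U+20AC (euro sign) encodes as E2 82 AC, so this prints "ac82e2".
        System.out.println(Integer.toHexString(pack(0x20AC)));
    }
}
```

Because a codepoint needs at most four UTF-8 bytes, the whole encoding fits in a 32-bit int, which is what lets a caller encode one codepoint at a time without allocating a byte array.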
A high-level class for iterating over the UTF-8 bytes of a CharSequence,
implementing Java 8's PrimitiveIterator.OfInt. It allows for simple,
space-efficient iteration:
Utf8Iterator utf8 = new Utf8Iterator(string);
while (utf8.hasNext()) {
    byte b = utf8.nextByte(); // convenience method for (byte) utf8.nextInt()
    // do something with b
}

This is functionally equivalent to:
ByteBuffer utf8 = StandardCharsets.UTF_8.encode(string);
while (utf8.hasRemaining()) {
    byte b = utf8.get();
    // do something with b
}

The main benefits of using Utf8Iterator are:
- It operates on CharSequences of all types, not just String.
- It uses constant space, even for large strings, whereas the buffer returned from UTF_8.encode is proportional to the size of the string.
- It encodes incrementally, so no work is wasted if the loop is exited early.
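
The constant-space, incremental behavior described above can be illustrated with a small self-contained sketch. This is not the library's Utf8Iterator: the class Utf8BytesSketch is hypothetical, it returns each byte as an unsigned value from 0 to 255, and it replaces unpaired surrogates with '?' (an assumption chosen to mirror what String.getBytes does for UTF-8).

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/** Hypothetical sketch of incremental UTF-8 iteration; not this library's code. */
final class Utf8BytesSketch implements PrimitiveIterator.OfInt {
    private final CharSequence cseq;
    private int index = 0;        // index of the next char to encode
    private int pending = 0;      // bytes of the current codepoint, next byte lowest
    private int pendingCount = 0; // how many bytes remain in 'pending'

    Utf8BytesSketch(CharSequence cseq) {
        this.cseq = cseq;
    }

    @Override
    public boolean hasNext() {
        return pendingCount > 0 || index < cseq.length();
    }

    @Override
    public int nextInt() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (pendingCount == 0) {
            // Decode one codepoint (1 or 2 chars) and encode only that codepoint.
            int cp = Character.codePointAt(cseq, index);
            index += Character.charCount(cp);
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = '?'; // unpaired surrogate: substitute, as an assumption
            }
            load(cp);
        }
        int b = pending & 0xFF;
        pending >>>= 8;
        pendingCount--;
        return b;
    }

    /** Stores the UTF-8 bytes of cp in 'pending', first byte in the low bits. */
    private void load(int cp) {
        if (cp < 0x80) {
            pending = cp;
            pendingCount = 1;
        } else if (cp < 0x800) {
            pending = (0xC0 | (cp >> 6)) | ((0x80 | (cp & 0x3F)) << 8);
            pendingCount = 2;
        } else if (cp < 0x10000) {
            pending = (0xE0 | (cp >> 12))
                    | ((0x80 | ((cp >> 6) & 0x3F)) << 8)
                    | ((0x80 | (cp & 0x3F)) << 16);
            pendingCount = 3;
        } else {
            pending = (0xF0 | (cp >> 18))
                    | ((0x80 | ((cp >> 12) & 0x3F)) << 8)
                    | ((0x80 | ((cp >> 6) & 0x3F)) << 16)
                    | ((0x80 | (cp & 0x3F)) << 24);
            pendingCount = 4;
        }
    }

    public static void main(String[] args) {
        // Works on any CharSequence, here a StringBuilder rather than a String.
        Utf8BytesSketch it = new Utf8BytesSketch(new StringBuilder("h\u00e9llo"));
        int count = 0;
        while (it.hasNext()) {
            it.nextInt();
            count++;
        }
        System.out.println(count); // prints 6: 'é' takes two bytes, the rest one each
    }
}
```

The only state is a char index and one int of pending bytes, so memory use does not grow with the input, and breaking out of the loop early skips the encoding work for the rest of the sequence.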
This project uses Semantic Versioning.
We are happy to receive Pull Requests. If you are planning a big change, it's probably best to discuss it as an Issue first.
In the root directory, run mvn install. That will build everything.