Package org.apache.fop.util
Class CharUtilities
java.lang.Object
    org.apache.fop.util.CharUtilities
This class provides utilities to distinguish various kinds of Unicode
whitespace and to get character widths in a given FontState.
Field Summary
Modifier and Type  Field                         Description
static final char  CARRIAGE_RETURN               carriage return
static final char  CODE_EOT                      Character code used to signal a character boundary in inline content, such as an inline with borders and padding or a nested block object.
static final int   EOT                           Character class: boundary between text runs
static final char  IDEOGRAPHIC_SPACE             ideographic space
static final char  LINE_SEPARATOR                line separator
static final int   LINEFEED                      Character class: line feed
static final char  LINEFEED_CHAR                 linefeed character
static final char  LRE                           left-to-right embedding
static final char  LRM                           left-to-right mark
static final char  LRO                           left-to-right override
static final char  MISSING_IDEOGRAPH             missing ideograph
static final char  NBSPACE                       non-breaking space
static final char  NEXT_LINE                     next line control character
static final int   NONWHITESPACE                 Character class: non-whitespace
static final char  NOT_A_CHARACTER               Unicode value indicating that the character is "not a character".
static final char  NULL_CHAR                     null character
static final char  OBJECT_REPLACEMENT_CHARACTER  object replacement character
static final char  PARAGRAPH_SEPARATOR           paragraph separator
static final char  PDF                           pop directional formatting
static final char  RLE                           right-to-left embedding
static final char  RLM                           right-to-left mark
static final char  RLO                           right-to-left override
static final char  SOFT_HYPHEN                   soft hyphen
static final char  SPACE                         normal space
static final char  TAB                           normal tab
static final int   UCWHITESPACE                  Character class: Unicode white space
static final char  WORD_JOINER                   word joiner
static final int   XMLWHITESPACE                 Character class: XML whitespace
static final char  ZERO_WIDTH_JOINER             zero-width joiner
static final char  ZERO_WIDTH_NOBREAK_SPACE      zero-width no-break space (= byte order mark)
static final char  ZERO_WIDTH_SPACE              zero-width space
Constructor Summary
Modifier   Constructor
    Description
protected  CharUtilities()
    Utility class: the constructor prevents instantiation, even through a subclass.
Method Summary
Modifier and Type  Method
    Description
static String charToNCRef(int c)
    Convert a single Unicode scalar value to an XML numeric character reference.
static int classOf(int c)
    Return the appropriate character class constant for the type of the passed character.
codepointsIter(CharSequence s)
    Creates an iterator over the code points of a CharSequence.
codepointsIter(CharSequence s, int beginIndex, int endIndex)
    Creates an iterator over the code points of a sub-sequence of a CharSequence.
static boolean containsSurrogatePairAt(CharSequence chars, int index)
    Tells whether a surrogate pair starts at the given index in the CharSequence.
static String format(int c)
    Format a character for debugging output: prefixed with "0x" and left-padded with '0' to 4 or 6 hex digits, depending on whether it is in the BMP.
static int incrementIfNonBMP(int codePoint)
    Returns 1 if codePoint is not in the BMP, 0 otherwise.
static boolean isAdjustableSpace(int c)
    Determine if the character is an adjustable space.
static boolean isAlphabetic(int c)
    Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
static boolean isAnySpace(int c)
    Determines if the character represents any kind of space.
static boolean isBmpCodePoint(int codePoint)
    Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP).
static boolean isBreakableSpace(int c)
    Helper method to determine if the character is a space with normal (breakable) behavior.
static boolean isExplicitBreak(int c)
    Indicates whether the given character is an explicit break character.
static boolean isFixedWidthSpace(int c)
    Determine if the character is a (breakable) fixed-width space.
static boolean isNonBreakableSpace(int c)
    Determine if the character is a non-breaking space.
static boolean isSameSequence(CharSequence cs1, CharSequence cs2)
    Determine if two character sequences contain the same characters.
static boolean isSurrogatePair(char ch)
    Determine if the given character is part of a surrogate pair.
static boolean isZeroWidthSpace(int c)
    Determine if the character is a zero-width space.
static String padLeft(String s, int width, char pad)
    Pad string s on the left to width characters using the padding character pad.
static String toNCRefs(String s)
    Convert a string to a sequence of ASCII characters or XML numeric character references.
Field Details

CODE_EOT
public static final char CODE_EOT
Character code used to signal a character boundary in inline content, such as an inline with borders and padding or a nested block object.
See Also:

UCWHITESPACE
public static final int UCWHITESPACE
Character class: Unicode white space
See Also:

LINEFEED
public static final int LINEFEED
Character class: line feed
See Also:

EOT
public static final int EOT
Character class: boundary between text runs
See Also:

NONWHITESPACE
public static final int NONWHITESPACE
Character class: non-whitespace
See Also:

XMLWHITESPACE
public static final int XMLWHITESPACE
Character class: XML whitespace
See Also:

NULL_CHAR
public static final char NULL_CHAR
null character
See Also:

LINEFEED_CHAR
public static final char LINEFEED_CHAR
linefeed character
See Also:

CARRIAGE_RETURN
public static final char CARRIAGE_RETURN
carriage return
See Also:

TAB
public static final char TAB
normal tab
See Also:

SPACE
public static final char SPACE
normal space
See Also:

NBSPACE
public static final char NBSPACE
non-breaking space
See Also:

NEXT_LINE
public static final char NEXT_LINE
next line control character
See Also:

ZERO_WIDTH_SPACE
public static final char ZERO_WIDTH_SPACE
zero-width space
See Also:

WORD_JOINER
public static final char WORD_JOINER
word joiner
See Also:

ZERO_WIDTH_JOINER
public static final char ZERO_WIDTH_JOINER
zero-width joiner
See Also:

LRM
public static final char LRM
left-to-right mark
See Also:

RLM
public static final char RLM
right-to-left mark
See Also:

LRE
public static final char LRE
left-to-right embedding
See Also:

RLE
public static final char RLE
right-to-left embedding
See Also:

PDF
public static final char PDF
pop directional formatting
See Also:

LRO
public static final char LRO
left-to-right override
See Also:

RLO
public static final char RLO
right-to-left override
See Also:
ZERO_WIDTH_NOBREAK_SPACE
public static final char ZERO_WIDTH_NOBREAK_SPACEzero-width no-break space (= byte order mark)- See Also:
-
SOFT_HYPHEN
public static final char SOFT_HYPHENsoft hyphen- See Also:
-
LINE_SEPARATOR
public static final char LINE_SEPARATORline-separator- See Also:
-
PARAGRAPH_SEPARATOR
public static final char PARAGRAPH_SEPARATORparagraph-separator- See Also:
-
MISSING_IDEOGRAPH
public static final char MISSING_IDEOGRAPHmissing ideograph- See Also:
-
IDEOGRAPHIC_SPACE
public static final char IDEOGRAPHIC_SPACEIdeogreaphic space- See Also:
-
OBJECT_REPLACEMENT_CHARACTER
public static final char OBJECT_REPLACEMENT_CHARACTERObject replacement character- See Also:
-
NOT_A_CHARACTER
public static final char NOT_A_CHARACTERUnicode value indicating the the character is "not a character".- See Also:
-
-

Constructor Details

CharUtilities
protected CharUtilities()
Utility class: the constructor prevents instantiation, even through a subclass.

Method Details

classOf
public static int classOf(int c)
Return the appropriate character class constant (one of the "Character class" fields: UCWHITESPACE, LINEFEED, EOT, NONWHITESPACE or XMLWHITESPACE) for the type of the passed character.
Parameters:
c - character to inspect
Returns:
the determined character class
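To picture how a classifier of this kind can work, here is a hypothetical stand-alone sketch (not FOP's source). The class name, the numeric values of the five class constants, the use of 0 as the boundary marker, and the rule that every other Unicode space separator (category Zs) counts as Unicode whitespace are all assumptions made for illustration.

```java
/** Hypothetical classOf-style classifier; all constant values are assumed. */
final class CharClassSketch {
    // Assumed values for the five character-class constants.
    static final int UCWHITESPACE = 0;
    static final int LINEFEED = 1;
    static final int EOT = 2;
    static final int NONWHITESPACE = 3;
    static final int XMLWHITESPACE = 4;

    // Assumed text-run boundary marker, standing in for CODE_EOT.
    static final char CODE_EOT = 0;

    static int classOf(int c) {
        switch (c) {
            case CODE_EOT:
                return EOT;
            case '\n':
                return LINEFEED;
            case ' ':
            case '\t':
            case '\r':
                // Remaining XML whitespace characters (LF is handled above).
                return XMLWHITESPACE;
            default:
                // Assumption: other space separators (category Zs) are Unicode whitespace.
                return Character.getType(c) == Character.SPACE_SEPARATOR
                        ? UCWHITESPACE : NONWHITESPACE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classOf('\n'));   // 1 (LINEFEED)
        System.out.println(classOf(0x00A0)); // 0 (UCWHITESPACE)
        System.out.println(classOf('x'));    // 3 (NONWHITESPACE)
    }
}
```

Handling the few special characters first and falling back to a general Unicode test keeps the common path (ordinary letters) to a single category lookup.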

isBreakableSpace
public static boolean isBreakableSpace(int c)
Helper method to determine if the character is a space with normal behavior. Normal behavior means that the space is not non-breaking, so a line break may occur at it.
Parameters:
c - character to inspect
Returns:
true if the character is a normal (breakable) space

isZeroWidthSpace
public static boolean isZeroWidthSpace(int c)
Determine if the character is a zero-width space.
Parameters:
c - the character to check
Returns:
true if the character is a zero-width space

isFixedWidthSpace
public static boolean isFixedWidthSpace(int c)
Determine if the character is a (breakable) fixed-width space.
Parameters:
c - the character to check
Returns:
true if the character is a fixed-width space
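The zero-width, fixed-width and breakable checks can be expressed as membership tests on code points. The following is a hypothetical sketch, not FOP's implementation: the class name is invented, and the chosen sets (U+200B, U+2060 and U+FEFF as zero-width; en quad U+2000 through hair space U+200A plus ideographic space U+3000 as fixed-width) are assumptions for illustration.

```java
/** Hypothetical sketch of the zero-width, fixed-width and breakable space checks. */
final class SpaceKindsSketch {
    static boolean isZeroWidthSpace(int c) {
        return c == 0x200B    // zero-width space
            || c == 0x2060    // word joiner
            || c == 0xFEFF;   // zero-width no-break space (byte order mark)
    }

    static boolean isFixedWidthSpace(int c) {
        // Assumed set: en quad (U+2000) through hair space (U+200A), plus ideographic space.
        return (c >= 0x2000 && c <= 0x200A) || c == 0x3000;
    }

    static boolean isBreakableSpace(int c) {
        // A normal space, or one of the fixed-width spaces above.
        return c == ' ' || isFixedWidthSpace(c);
    }

    public static void main(String[] args) {
        System.out.println(isBreakableSpace(0x2003)); // em space: true
        System.out.println(isBreakableSpace(0x00A0)); // no-break space: false
        System.out.println(isZeroWidthSpace(0x2060)); // word joiner: true
    }
}
```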

isNonBreakableSpace
public static boolean isNonBreakableSpace(int c)
Determine if the character is a non-breaking space.
Parameters:
c - character to check
Returns:
true if the character is a non-breaking space

isAdjustableSpace
public static boolean isAdjustableSpace(int c)
Determine if the character is an adjustable space, that is, a space whose width may be stretched or shrunk (for example during justification).
Parameters:
c - character to check
Returns:
true if the character is an adjustable space

isAnySpace
public static boolean isAnySpace(int c)
Determines if the character represents any kind of space.
Parameters:
c - character to check
Returns:
true if the character represents any kind of space
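The non-breaking, adjustable and "any space" checks can be composed from smaller predicates. This is a hypothetical sketch rather than FOP's code: the class name is invented, and the sets used (U+00A0, U+202F, U+2060 and U+FEFF as non-breaking; only U+0020 and U+00A0 as adjustable; normal space, U+2000 to U+200A and U+3000 as breakable) are assumptions.

```java
/** Hypothetical sketch composing the non-breaking, adjustable and "any space" checks. */
final class SpaceCompositionSketch {
    static boolean isNonBreakableSpace(int c) {
        return c == 0x00A0    // no-break space
            || c == 0x202F    // narrow no-break space
            || c == 0x2060    // word joiner
            || c == 0xFEFF;   // zero-width no-break space
    }

    static boolean isAdjustableSpace(int c) {
        // Assumption: only the normal and no-break spaces stretch during justification.
        return c == ' ' || c == 0x00A0;
    }

    static boolean isBreakableSpace(int c) {
        // Assumed set: normal space, U+2000..U+200A and the ideographic space.
        return c == ' ' || (c >= 0x2000 && c <= 0x200A) || c == 0x3000;
    }

    static boolean isAnySpace(int c) {
        // Every space is treated as either breakable or non-breakable.
        return isBreakableSpace(c) || isNonBreakableSpace(c);
    }

    public static void main(String[] args) {
        System.out.println(isAnySpace(0x202F));        // true
        System.out.println(isAdjustableSpace(0x2003)); // false
    }
}
```

Defining "any space" as the union of the breakable and non-breaking sets means a new kind of space only has to be added to one of the two underlying predicates.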

isAlphabetic
public static boolean isAlphabetic(int c)
Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
Parameters:
c - the character
Returns:
true if the character is "Alphabetic"
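Unicode derives the Alphabetic property from the general categories Lu, Ll, Lt, Lm, Lo and Nl plus the extra Other_Alphabetic list, and Java 7 and later expose the full property as Character.isAlphabetic(int). The sketch below shows the category-based approximation (it ignores Other_Alphabetic); the class and method names are hypothetical, and whether FOP uses this approach is not stated here.

```java
/** Hypothetical approximation of the Unicode "Alphabetic" property. */
final class AlphabeticSketch {
    /** True for letters (Lu, Ll, Lt, Lm, Lo) and letter numbers (Nl); ignores Other_Alphabetic. */
    static boolean isAlphabeticByCategory(int c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:  // Lu
            case Character.LOWERCASE_LETTER:  // Ll
            case Character.TITLECASE_LETTER:  // Lt
            case Character.MODIFIER_LETTER:   // Lm
            case Character.OTHER_LETTER:      // Lo
            case Character.LETTER_NUMBER:     // Nl
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        int romanTwelve = 0x216B; // ROMAN NUMERAL TWELVE, category Nl
        System.out.println(isAlphabeticByCategory(romanTwelve)); // true
        System.out.println(Character.isAlphabetic(romanTwelve)); // true
    }
}
```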

isExplicitBreak
public static boolean isExplicitBreak(int c)
Indicates whether the given character is an explicit break character.
Parameters:
c - the character to check
Returns:
true if the character represents an explicit break
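A plausible reading is that the explicit break characters are the line-ending characters among the fields above. The sketch below assumes exactly that set (LF, CR, NEXT_LINE U+0085, LINE_SEPARATOR U+2028, PARAGRAPH_SEPARATOR U+2029); the class name is hypothetical and the real method may accept a different set.

```java
/** Hypothetical explicit-break check; the set of break characters is an assumption. */
final class ExplicitBreakSketch {
    static boolean isExplicitBreak(int c) {
        return c == '\n'      // line feed
            || c == '\r'      // carriage return
            || c == 0x0085    // next line (NEL)
            || c == 0x2028    // line separator
            || c == 0x2029;   // paragraph separator
    }

    public static void main(String[] args) {
        System.out.println(isExplicitBreak(0x2029)); // true
        System.out.println(isExplicitBreak(' '));    // false
    }
}
```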

charToNCRef
public static String charToNCRef(int c)
Convert a single Unicode scalar value to an XML numeric character reference. Code points in the BMP are written with four hexadecimal digits, all others with six.
Parameters:
c - a Unicode scalar value
Returns:
a string representing a numeric character reference
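The four-or-six-digit rule maps directly onto a zero-padded hexadecimal format. A minimal sketch follows; the class name is hypothetical, and upper-case hex digits are an assumption (the description does not specify the case).

```java
/** Hypothetical sketch of an XML numeric character reference formatter. */
final class NcRefSketch {
    /** Formats c as &#xHHHH; inside the BMP and &#xHHHHHH; outside it. */
    static String charToNCRef(int c) {
        String digits = (c > 0xFFFF) ? "%06X" : "%04X";
        return "&#x" + String.format(digits, c) + ";";
    }

    public static void main(String[] args) {
        System.out.println(charToNCRef(0x20AC));  // &#x20AC;
        System.out.println(charToNCRef(0x1F600)); // &#x01F600;
    }
}
```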

toNCRefs
public static String toNCRefs(String s)
Convert a string to a sequence of ASCII characters or XML numeric character references.
Parameters:
s - a Java string (encoded in UTF-16)
Returns:
a string representing a sequence of numeric character references or ASCII characters
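A string can be converted by walking it one code point at a time and keeping printable ASCII while escaping everything else. This sketch is hypothetical: the class name is invented, iterating by code point (so a surrogate pair yields one six-digit reference) is a design choice, and escaping '&' and '<' as references so the output stays XML-safe is an assumption, not something the description states.

```java
/** Hypothetical sketch: printable ASCII kept, everything else as numeric character references. */
final class NcRefsSketch {
    static String charToNCRef(int c) {
        return "&#x" + String.format(c > 0xFFFF ? "%06X" : "%04X", c) + ";";
    }

    static String toNCRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);                   // combines surrogate pairs
            boolean printableAscii = cp >= 0x20 && cp < 0x7F;
            if (printableAscii && cp != '&' && cp != '<') {
                sb.append((char) cp);
            } else {
                sb.append(charToNCRef(cp));
            }
            i += Character.charCount(cp);                // 2 for supplementary code points
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNCRefs("caf\u00E9 <\uD83D\uDE00>")); // caf&#x00E9; &#x003C;&#x01F600;>
    }
}
```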

padLeft
public static String padLeft(String s, int width, char pad)
Pad string s on the left to width characters using the padding character pad.
Parameters:
s - string to pad
width - width of the field to pad to
pad - character to use for padding
Returns:
padded string
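Left padding amounts to emitting width minus length copies of the pad character before the string. In this hypothetical sketch (class name invented), a string that is already at least width characters long is returned unchanged rather than truncated; that behavior is an assumption.

```java
/** Hypothetical left-padding helper. */
final class PadLeftSketch {
    /** Prepends pad until the result is width chars long; longer input is returned as-is. */
    static String padLeft(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder(Math.max(width, s.length()));
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(padLeft("7F", 4, '0'));    // 007F
        System.out.println(padLeft("12345", 4, '0')); // 12345
    }
}
```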

format
public static String format(int c)
Format a character for debugging output: the result is prefixed with "0x" and left-padded with '0' to a width of 4 or 6 hex digits, depending on whether the character is in the BMP.
Parameters:
c - character code
Returns:
formatted character string
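The debugging format can be produced with a single zero-padded format string whose width depends on the BMP test. The class name is hypothetical, lower-case hex digits are an assumption, and the sketch assumes c lies in 0 to 0x10FFFF.

```java
/** Hypothetical debug formatter for code points in the range 0..0x10FFFF. */
final class FormatSketch {
    static String format(int c) {
        // 4 hex digits inside the BMP, 6 outside it.
        return String.format(c <= 0xFFFF ? "0x%04x" : "0x%06x", c);
    }

    public static void main(String[] args) {
        System.out.println(format(0x41));    // 0x0041
        System.out.println(format(0x1F600)); // 0x01f600
    }
}
```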

isSameSequence
public static boolean isSameSequence(CharSequence cs1, CharSequence cs2)
Determine if two character sequences contain the same characters.
Parameters:
cs1 - first character sequence
cs2 - second character sequence
Returns:
true if both sequences have the same length and the same characters
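Comparing CharSequence instances is useful because equals() on different implementations (String versus StringBuilder, for example) does not compare contents. A minimal sketch with a hypothetical class name, assuming non-null arguments; in the JDK, String.contentEquals(CharSequence) and, since Java 11, CharSequence.compare(cs1, cs2) == 0 give equivalent results.

```java
/** Hypothetical content comparison of two CharSequences (arguments assumed non-null). */
final class SameSequenceSketch {
    static boolean isSameSequence(CharSequence cs1, CharSequence cs2) {
        if (cs1.length() != cs2.length()) {
            return false;                      // different lengths can never match
        }
        for (int i = 0; i < cs1.length(); i++) {
            if (cs1.charAt(i) != cs2.charAt(i)) {
                return false;                  // first differing char decides
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSameSequence("abc", new StringBuilder("abc"))); // true
    }
}
```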

isBmpCodePoint
public static boolean isBmpCodePoint(int codePoint)
Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.
Parameters:
codePoint - the character (Unicode code point) to be tested
Returns:
true if the specified code point is between Character.MIN_VALUE and Character.MAX_VALUE inclusive; false otherwise
See Also:
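Since Character.MIN_VALUE is 0 and Character.MAX_VALUE is 0xFFFF, the test reduces to checking that no bit above bit 15 is set. The sketch below (hypothetical class name) uses an unsigned shift so that negative values are rejected too; Java 7 and later provide the same check as Character.isBmpCodePoint(int).

```java
/** Hypothetical BMP test. */
final class BmpSketch {
    static boolean isBmpCodePoint(int codePoint) {
        // Zero after shifting out the low 16 bits means 0x0000..0xFFFF;
        // negative values keep high bits set and are rejected.
        return codePoint >>> 16 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isBmpCodePoint(0xFFFF));  // true
        System.out.println(isBmpCodePoint(0x10000)); // false
    }
}
```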

incrementIfNonBMP
public static int incrementIfNonBMP(int codePoint)
Returns 1 if codePoint is not in the BMP, 0 otherwise. This function is particularly useful in for loops over strings: in the presence of surrogate pairs, adding its result to the loop index skips the second char of each pair.
Parameters:
codePoint - the code point to test
Returns:
1 if codePoint > 0xFFFF, 0 otherwise
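The loop idiom described above can be sketched as follows. The class name and the countCodePoints helper are hypothetical; the helper only exists to show how the extra increment skips the low surrogate, and its result matches String.codePointCount for well-formed input.

```java
/** Hypothetical sketch: using incrementIfNonBMP to step over surrogate pairs. */
final class CodePointLoopSketch {
    static int incrementIfNonBMP(int codePoint) {
        return codePoint > 0xFFFF ? 1 : 0;
    }

    /** Hypothetical helper: counts code points, skipping the low surrogate of each pair. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            int cp = s.codePointAt(i);         // reads a full pair when i is at a high surrogate
            i += incrementIfNonBMP(cp);        // the loop's i++ then moves past the pair
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";                           // 'a', U+1F600, 'b'
        System.out.println(countCodePoints(s));               // 3
        System.out.println(s.codePointCount(0, s.length()));  // 3
    }
}
```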

isSurrogatePair
public static boolean isSurrogatePair(char ch)
Determine if the given character is part of a surrogate pair.
Parameters:
ch - character to be checked
Returns:
true if ch is a high surrogate or a low surrogate
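High surrogates occupy U+D800 to U+DBFF and low surrogates U+DC00 to U+DFFF, so "part of a surrogate pair" is a single contiguous range check. The class name is hypothetical; the JDK equivalent (Java 7+) is Character.isSurrogate(char). Note that this check looks at one char only and cannot tell whether its partner is actually present.

```java
/** Hypothetical surrogate test over the combined high/low range. */
final class SurrogateSketch {
    static boolean isSurrogatePair(char ch) {
        // High surrogates U+D800..U+DBFF followed directly by low surrogates U+DC00..U+DFFF.
        return ch >= 0xD800 && ch <= 0xDFFF;
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE00";
        System.out.println(isSurrogatePair(smiley.charAt(0))); // true
        System.out.println(isSurrogatePair('a'));              // false
    }
}
```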

containsSurrogatePairAt
public static boolean containsSurrogatePairAt(CharSequence chars, int index)
Tells whether a surrogate pair starts at the given index in the CharSequence. If the character at index is a high surrogate, the character at index + 1 is checked to be a low surrogate. If a malformed surrogate pair is encountered, an IllegalArgumentException is thrown. High surrogates lie in the range [0xD800, 0xDC00) and low surrogates in [0xDC00, 0xE000).
Parameters:
chars - CharSequence to check
index - index in the CharSequence where the check starts
Returns:
true if there is a well-formed surrogate pair at index
Throws:
IllegalArgumentException - if a malformed surrogate pair is encountered at index
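The check described above can be sketched with the JDK's surrogate predicates. In this hypothetical version (class name invented), a high surrogate at the end of the sequence or followed by a non-low surrogate is rejected, and treating a lone low surrogate at index as malformed is an assumption.

```java
/** Hypothetical well-formedness check for a surrogate pair starting at index. */
final class SurrogatePairAtSketch {
    static boolean containsSurrogatePairAt(CharSequence chars, int index) {
        char ch = chars.charAt(index);
        if (Character.isHighSurrogate(ch)) {
            // A high surrogate must be followed by a low surrogate.
            if (index + 1 >= chars.length()
                    || !Character.isLowSurrogate(chars.charAt(index + 1))) {
                throw new IllegalArgumentException("isolated high surrogate at index " + index);
            }
            return true;
        }
        if (Character.isLowSurrogate(ch)) {
            // Assumption: a low surrogate cannot start a pair, so it is malformed here.
            throw new IllegalArgumentException("isolated low surrogate at index " + index);
        }
        return false;                           // ordinary BMP character
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";
        System.out.println(containsSurrogatePairAt(s, 1)); // true
        try {
            containsSurrogatePairAt(s.substring(0, 2), 1); // high surrogate with nothing after it
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```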

codepointsIter
Creates an iterator over the code points of a CharSequence.
Parameters:
s - the CharSequence to iterate over
Returns:
a code point iterator for the given CharSequence
See Also:

codepointsIter
Creates an iterator over the code points of a sub-sequence of a CharSequence.
Parameters:
s - the CharSequence to iterate over
beginIndex - lower bound of the range
endIndex - upper bound of the range
Returns:
a code point iterator for the given sub-sequence
See Also:
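Both codepointsIter variants can be sketched on top of CharSequence.codePoints() (Java 8+). The class name is hypothetical, as are the Iterable<Integer> return type and the reading of endIndex as exclusive (following the subSequence convention); neither is confirmed by the descriptions above. Returning an Iterable lets callers use the enhanced for loop directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical code point iteration; return type and index convention are assumptions. */
final class CodePointIterSketch {
    /** Code points of s in [beginIndex, endIndex); each iterator() call starts a fresh pass. */
    static Iterable<Integer> codepointsIter(CharSequence s, int beginIndex, int endIndex) {
        return () -> s.subSequence(beginIndex, endIndex).codePoints().iterator();
    }

    static Iterable<Integer> codepointsIter(CharSequence s) {
        return codepointsIter(s, 0, s.length());
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        for (int cp : codepointsIter("a\uD83D\uDE00b")) {
            out.add(Integer.toHexString(cp));
        }
        System.out.println(out); // [61, 1f600, 62]
    }
}
```

A range that splits a surrogate pair yields the isolated surrogate as its own code point, so callers that care about well-formedness may want to validate the boundaries first.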