Text.WordBreak Class

Defined in: text/js/text-wordbreak.js:9

Module: text-wordbreak
Parent Module: text

Provides utility methods for splitting strings on word breaks and determining whether a character index represents a word boundary, using the generic word breaking algorithm defined in the Unicode Text Segmentation guidelines (Unicode Standard Annex #29).

This algorithm provides a reasonable default for many languages. However, it does not cover language- or context-specific requirements, and it does not produce meaningful results for languages that don't put spaces between words, such as Chinese, Japanese, Thai, Lao, and Khmer. Server-based word-breaking services usually provide significantly better results with better performance.

Item Index

Methods

_classify static
_isWordBoundary static
getUniqueWords static
getWords static
isWordBoundary static

Methods

`_classify` (string) Array protected static

Defined in text/js/text-wordbreak.js:196

Returns a character classification map for the specified string.

Parameters:

string String

String to classify.

Returns:

Array: Classification map.

`_isWordBoundary` (map, index) Boolean protected static

Defined in text/js/text-wordbreak.js:234

Returns true if there is a word boundary between the specified character index and the next character index (or the end of the string).

Note that there are always word breaks at the beginning and end of a string, so for the strings '' and 'a', _isWordBoundary(map, 0) returns true when map is the classification map generated by _classify for that string.

Parameters:

map Array

Character classification map generated by _classify.

index Number

Character index to test.

Returns:

Boolean: true for a word boundary, false otherwise.
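
The split between _classify and _isWordBoundary suggests that a string is classified once and the resulting map is then tested at as many indexes as needed. A minimal sketch of that two-step shape follows. The function names, the class labels ('word', 'space', 'other'), and the ASCII-only rules are assumptions made for illustration; they are not the library's actual classes or the full UAX #29 rule set.

```javascript
// Hypothetical classifier: one label per character (ASCII-only approximation).
function sketchClassify(string) {
    var map = [];
    for (var i = 0; i < string.length; i++) {
        var ch = string.charAt(i);
        if (/[A-Za-z0-9_]/.test(ch)) {
            map.push('word');
        } else if (/\s/.test(ch)) {
            map.push('space');
        } else {
            map.push('other');
        }
    }
    return map;
}

// Hypothetical boundary test: true if a break falls between index and index + 1.
// Assumes index is 0 or a valid position in the map.
function sketchIsBoundary(map, index) {
    // The end of the string (including an empty map) is always a boundary.
    if (index >= map.length - 1) {
        return true;
    }
    // In this sketch, only two adjacent 'word' characters stay together.
    return !(map[index] === 'word' && map[index + 1] === 'word');
}

var map = sketchClassify('hi yo');
console.log(map);                      // ['word', 'word', 'space', 'word', 'word']
console.log(sketchIsBoundary(map, 0)); // false
console.log(sketchIsBoundary(map, 1)); // true
```

Classifying up front keeps each boundary test down to a pair of array lookups, which matters when every index of a long string is tested.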

`getUniqueWords` (string, options) Array static

Defined in text/js/text-wordbreak.js:154

Returns an array containing only unique words from the specified string. For example, the string 'foo bar baz foo' would result in the array ['foo', 'bar', 'baz'].

Parameters:

string String

String to split.

options Object

(optional) Options (see getWords() for details).

Returns:

Array: Array of unique words.
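
The deduplication described above can be sketched as follows. Both function names are hypothetical, and the splitter is a whitespace-only stand-in rather than the UAX #29 algorithm, so treat this as an illustration of the "keep the first occurrence, preserve order" behavior shown in the example, not as the library's implementation.

```javascript
// Hypothetical stand-in for getWords(): splits on runs of whitespace only.
function simpleWords(string) {
    return string.split(/\s+/).filter(function (word) {
        return word.length > 0;
    });
}

// Keeps the first occurrence of each word and preserves the original order.
function simpleUniqueWords(string) {
    var seen = Object.create(null); // no inherited keys such as "constructor"
    return simpleWords(string).filter(function (word) {
        if (seen[word]) {
            return false;
        }
        seen[word] = true;
        return true;
    });
}

console.log(simpleUniqueWords('foo bar baz foo')); // ['foo', 'bar', 'baz']
```

Note that the comparison in this sketch is case-sensitive, so 'Foo' and 'foo' would count as different words unless the input is lowercased first.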

`getWords` (string, options) Array static

Defined in text/js/text-wordbreak.js:75

Splits the specified string into an array of individual words.

Parameters:

string String

String to split.

options Object

(optional) Options object containing zero or more of the following properties:

ignoreCase (Boolean)

If true, the string will be converted to lowercase before being split. Default is false.

includePunctuation (Boolean)

If true, the returned array will include punctuation characters. Default is false.

includeWhitespace (Boolean)

If true, the returned array will include whitespace characters. Default is false.

Returns:

Array: Array of words.
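
To make the effect of the three options concrete, here is a simplified sketch. The function name sketchGetWords and its regular expressions are assumptions: the tokenizer is ASCII-only and treats every non-word, non-whitespace character as punctuation, so it will not match the UAX #29 behavior for inputs such as contractions or non-Latin text.

```javascript
// Hypothetical approximation of getWords(); not the library's implementation.
function sketchGetWords(string, options) {
    options = options || {};

    if (options.ignoreCase) {
        string = string.toLowerCase();
    }

    // Tokens: a run of word characters, one whitespace character,
    // or one punctuation character.
    var tokens = string.match(/[A-Za-z0-9_]+|\s|[^A-Za-z0-9_\s]/g) || [];

    return tokens.filter(function (token) {
        if (/^\s$/.test(token)) {
            return !!options.includeWhitespace;
        }
        if (/^[^A-Za-z0-9_\s]$/.test(token)) {
            return !!options.includePunctuation;
        }
        return true; // word tokens are always kept
    });
}

console.log(sketchGetWords('Hello, World!'));
// ['Hello', 'World']
console.log(sketchGetWords('Hello, World!', { ignoreCase: true }));
// ['hello', 'world']
console.log(sketchGetWords('Hello, World!', { includePunctuation: true }));
// ['Hello', ',', 'World', '!']
console.log(sketchGetWords('Hello, World!', { includeWhitespace: true }));
// ['Hello', ' ', 'World']
```

The options are independent of each other, so they can be combined in a single options object.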

`isWordBoundary` (string, index) Boolean static

Defined in text/js/text-wordbreak.js:170

Returns true if there is a word boundary between the specified character index and the next character index (or the end of the string).

Note that there are always word breaks at the beginning and end of a string, so isWordBoundary('', 0) and isWordBoundary('a', 0) will both return true.

Parameters:

string String

String to test.

index Number

Character index to test within the string.

Returns:

Boolean: true for a word boundary, false otherwise.
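
The "between this index and the next" convention is easy to misread, so the sketch below walks a string and collects every index after which a break occurs. The functions sketchIsWordBoundary and boundaryIndexes are hypothetical, the rule used (only two adjacent ASCII word characters stay together) is a deliberate simplification of UAX #29, and the index is assumed to be 0 or a valid position in the string.

```javascript
// Hypothetical stand-in for isWordBoundary(); ASCII letters, digits, and
// underscore count as word characters. Not the library's implementation.
function sketchIsWordBoundary(string, index) {
    var wordChar = /[A-Za-z0-9_]/;

    // The end of the string (including an empty string) is always a boundary.
    if (index >= string.length - 1) {
        return true;
    }
    return !(wordChar.test(string.charAt(index)) &&
             wordChar.test(string.charAt(index + 1)));
}

// Collects every index after which a break occurs.
function boundaryIndexes(string) {
    var result = [];
    for (var i = 0; i < string.length; i++) {
        if (sketchIsWordBoundary(string, i)) {
            result.push(i);
        }
    }
    return result;
}

console.log(sketchIsWordBoundary('', 0));  // true
console.log(sketchIsWordBoundary('a', 0)); // true
console.log(boundaryIndexes('foo bar'));   // [2, 3, 6]
```

In 'foo bar', index 2 is the last 'o' (break before the space), index 3 is the space (break before 'b'), and index 6 is the final character.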
