public class DeEntifyStrings
extends java.lang.Object
See Also: DeEntify, DeEntifyStrings, Entify, EntifyStrings, Flatten
Modifier and Type | Field and Description |
---|---|
static int | LONGEST_ENTITY: The longest an entity can be, at least in our tables, including the leading & and trailing ;. |
static int | SHORTEST_ENTITY: The shortest an entity can be (4 characters), at least in our tables, including the leading & and trailing ;. |
static char | UNICODE_NBSP_160_0x0a: Unicode non-breaking space (nbsp) char, decimal 160, hex 0xa0. |
Constructor and Description |
---|
DeEntifyStrings() |
Modifier and Type | Method and Description |
---|---|
static char | bareHTMLEntityToChar(java.lang.String bareEntity, char howToTranslateNbsp): Converts an entity to a single char. |
static java.lang.String | deEntifyHTML(java.lang.String text, char translateNbspTo): Converts HTML to text, converting entities such as &quot; back to " and &lt; back to <. Ordinary text passes through unchanged. |
static java.lang.String | deEntifyXML(java.lang.String text): Converts XML to text, converting entities such as &quot; back to " and &lt; back to <. Ordinary text passes through unchanged. |
static java.lang.String | flattenHTML(java.lang.String text, char translateNbspTo): Strips tags and entities from HTML. |
static java.lang.String | flattenXML(java.lang.String text): Strips tags and entities from XML. |
protected static char | possBareHTMLEntityWithSemicolonToChar(java.lang.String possBareEntityWithSemicolon, char translateNbspTo): Checks a number of gauntlet conditions to ensure this is a valid entity. |
static char | possEntityToChar(java.lang.String possBareEntityWithSemicolon): Checks a number of gauntlet conditions to ensure this is a valid entity. |
static java.lang.String | stripHTMLTags(java.lang.String html): Removes tags from HTML, leaving just the raw text. |
static java.lang.String | stripXMLTags(java.lang.String xml): Removes tags from XML, leaving just the raw text. |
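To make the tag-stripping entries above concrete, here is a minimal sketch of one way to remove tags while keeping the raw text. It is not the DeEntifyStrings implementation: the class name TagStripSketch and the method stripTags are hypothetical, and the sketch assumes simple markup in which every < opens a tag and the next > closes it (no comments, scripts, or > inside attribute values).

```java
// Hypothetical illustration only; not the library's stripHTMLTags/stripXMLTags.
final class TagStripSketch {

    /** Drops everything between '<' and the next '>' and keeps the rest. */
    static String stripTags(String markup) {
        StringBuilder sb = new StringBuilder(markup.length());
        boolean inTag = false;
        for (int i = 0; i < markup.length(); i++) {
            char c = markup.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;      // tag ends; resume copying
                }
            } else if (c == '<') {
                inTag = true;           // tag starts; stop copying
            } else {
                sb.append(c);           // ordinary text passes through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // prints Hello world
    }
}
```

Entities such as &lt; are left untouched by this sketch. Going by the summaries above, the flatten methods strip entities as well as tags.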
public static final char UNICODE_NBSP_160_0x0a
public static final int LONGEST_ENTITY
public static final int SHORTEST_ENTITY
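The nbsp constant is character 160. The source does not say why callers are offered a choice of translation, but one plausible reason (an assumption) is that standard Java whitespace handling treats char 160 differently from an ordinary space. The sketch below uses only java.lang calls. NbspSketch and NBSP are hypothetical names, not part of this class.

```java
// Demonstrates standard java.lang behaviour for char 160 (U+00A0).
// Uses no DeEntifyStrings code; the names here are illustrative only.
final class NbspSketch {
    static final char NBSP = (char) 160;

    public static void main(String[] args) {
        System.out.println((int) NBSP == 0xa0);             // true: 160 decimal is a0 hex
        System.out.println(Character.isWhitespace(NBSP));   // false: non-breaking spaces are excluded
        System.out.println(Character.isSpaceChar(NBSP));    // true: it is a Unicode space separator
        System.out.println(("x" + NBSP).trim().length());   // 2: trim() leaves the nbsp in place
    }
}
```

Passing ' ' as the translation char gives text that trim() and whitespace tests treat as ordinary spaces. Passing (char) 160 preserves the non-breaking behaviour.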
public static char bareHTMLEntityToChar(java.lang.String bareEntity, char howToTranslateNbsp)
bareEntity
- String entity to convert. Must have the leading & and trailing ; stripped; may be a #x12ff, #123, lt or nbsp style entity. Works faster if the entity is in lower case.
howToTranslateNbsp
- char you would like &nbsp; translated to, usually ' ' or (char) 160.
public static java.lang.String deEntifyHTML(java.lang.String text, char translateNbspTo)
text
- raw text to be processed. Must not be null.
translateNbspTo
- char you would like &nbsp; translated to, usually ' ' or (char) 160.
public static java.lang.String deEntifyXML(java.lang.String text)
text
- raw XML text to be processed. Must not be null.
public static java.lang.String flattenHTML(java.lang.String text, char translateNbspTo)
text
- to flatten
translateNbspTo
- char you would like &nbsp; translated to, usually ' ' or (char) 160.
public static java.lang.String flattenXML(java.lang.String text)
text
- to flatten
public static char possEntityToChar(java.lang.String possBareEntityWithSemicolon)
possBareEntityWithSemicolon
- string that may hold an entity. The leading & must be stripped, but it may optionally contain text past the ;
public static java.lang.String stripHTMLTags(java.lang.String html)
html
- input HTML
public static java.lang.String stripXMLTags(java.lang.String xml)
xml
- input XML
protected static char possBareHTMLEntityWithSemicolonToChar(java.lang.String possBareEntityWithSemicolon, char translateNbspTo)
possBareEntityWithSemicolon
- string that may hold an entity. The leading & must be stripped, but it may optionally contain text past the ;
translateNbspTo
- char you would like nbsp translated to, usually ' ' or (char) 160.
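The parameter descriptions above suggest a pipeline: text following a & is run through a gauntlet of checks, the bare entity (numeric like #123 or #x12ff, or named like lt or nbsp) is converted to a char, and everything else passes through unchanged. A minimal sketch of that idea follows. It is not the library's code or its entity tables. EntitySketch, MAX_BARE_LENGTH, the five-entity table, the case-sensitive lookup, and the use of 0 as a "not an entity" result are all assumptions made for illustration.

```java
// Hypothetical stand-in; NOT the DeEntifyStrings implementation or its tables.
final class EntitySketch {
    // Assumed bound for this sketch only; the real LONGEST_ENTITY value is not given above.
    static final int MAX_BARE_LENGTH = 8;

    /** Bare entity (no '&', no ';') to char. Returns 0 when not recognised (assumed convention). */
    static char bareEntityToChar(String bare, char translateNbspTo) {
        if (bare.startsWith("#")) {
            boolean hex = bare.length() > 1 && Character.toLowerCase(bare.charAt(1)) == 'x';
            try {
                int code = Integer.parseInt(bare.substring(hex ? 2 : 1), hex ? 16 : 10);
                // Numeric 160 is returned as-is here; only the named nbsp is translated.
                return (code > 0 && code <= Character.MAX_VALUE) ? (char) code : 0;
            } catch (NumberFormatException e) {
                return 0;
            }
        }
        switch (bare) {              // tiny, case-sensitive table for illustration
            case "lt":   return '<';
            case "gt":   return '>';
            case "amp":  return '&';
            case "quot": return '"';
            case "nbsp": return translateNbspTo;
            default:     return 0;
        }
    }

    /** Input has the leading '&' stripped and may carry text past the ';'. */
    static char possEntityToChar(String poss, char translateNbspTo) {
        int semi = poss.indexOf(';');
        if (semi < 1 || semi > MAX_BARE_LENGTH) {
            return 0;                // no ';', empty name, or too long
        }
        for (int i = 0; i < semi; i++) {
            char c = poss.charAt(i);
            if (!(Character.isLetterOrDigit(c) || (i == 0 && c == '#'))) {
                return 0;            // stray character: not an entity
            }
        }
        return bareEntityToChar(poss.substring(0, semi), translateNbspTo);
    }

    /** Replaces recognised entities; all other text passes through unchanged. */
    static String deEntify(String text, char translateNbspTo) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                // Look only far enough ahead to cover the longest entity we accept.
                int end = Math.min(text.length(), i + 2 + MAX_BARE_LENGTH);
                String rest = text.substring(i + 1, end);
                char decoded = possEntityToChar(rest, translateNbspTo);
                if (decoded != 0) {
                    sb.append(decoded);
                    i += rest.indexOf(';') + 2;   // skip '&', the name, and ';'
                    continue;
                }
            }
            sb.append(c);
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(deEntify("5 &lt; 7 &amp;&nbsp;done", ' ')); // prints 5 < 7 & done
    }
}
```

A lone & or an unrecognised name such as the T; in AT&T; fails the checks and is copied through verbatim, which mirrors the "ordinary text passes unchanged" behaviour described in the method summary.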