public class Dom2JSON
extends java.lang.Object
This class contains routines to convert a standard XML DOM into a JSON structure. It is useful for the kind of XML that has been used to store DATA. Here, data always means a name-value pair: the value can be stored in an attribute, OR it can be stored in scalar form (one element whose name is the key and whose contents are the value). If you have a set of values, you use a set of elements that all have the same name.
* An element can come in two forms: a scalar value or an object. An object has members, and a member is one of two types: scalar or vector.
The simple scalar form can ONLY have text within it, nothing else: no attributes and no sub-elements. It is translated to the following JSON:
{
"elementA": value
}
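The scalar rule above can be sketched with nothing but the JDK DOM API. This is a minimal illustration, not the library's own `elementIsSimple` (which also takes a hint value); the class name `ScalarFormSketch`, its helper methods, and the sample XML strings are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ScalarFormSketch {

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Hypothetical check: scalar form means no attributes and no child elements.
    static boolean isScalarForm(Element ele) {
        if (ele.getAttributes().getLength() > 0) {
            return false;
        }
        for (Node n = ele.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Element scalar = parse("<elementA>value</elementA>");
        Element withAttribute = parse("<point x=\"1\"/>");
        System.out.println(scalar.getTagName() + " scalar: " + isScalarForm(scalar));
        System.out.println(withAttribute.getTagName() + " scalar: " + isScalarForm(withAttribute));
    }
}
```

Anything that fails this check would fall into the object form described next.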
In case B, we have an object, and must structure it as a JSON object. Each attribute becomes a single named string member. Each sub-element is converted depending on whether it is simple or complex, and whether there is one or several. In this first case, we have one attribute and one simple sub-element:
{
"elementB": {
"attrib": "aval",
"sub": "xxx"
}
}
Attributes always map to simple values, and are treated the same as simple subelements. The order is assumed to be unimportant.
The next case to consider is multiple simple sub-elements. If multiple tags with the same name appear among the children, then the output needs to map that name to an array value.
Then the result needs to be:
{
"elementC": {
"sub": ["xxx", "yyy", "zzz"]
}
}
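As a rough sketch of this grouping rule, the code below collects simple children by tag name and emits a list only when a name occurs more than once. It uses plain `Map` and `List` rather than the library's `JSONObject`, and the class name `RepeatedChildSketch`, its methods, and the input XML (reconstructed from the JSON shown above) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RepeatedChildSketch {

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // A name seen once maps to a String; a name seen several times maps to a List<String>.
    static Map<String, Object> groupSimpleChildren(Element parent) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                byName.computeIfAbsent(n.getNodeName(), k -> new ArrayList<>())
                      .add(n.getTextContent());
            }
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : byName.entrySet()) {
            List<String> values = e.getValue();
            Object value = values.size() == 1 ? (Object) values.get(0) : values;
            result.put(e.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Element c = parse("<elementC><sub>xxx</sub><sub>yyy</sub><sub>zzz</sub></elementC>");
        System.out.println(groupSimpleChildren(c));
    }
}
```

Because the decision is based only on how many same-named children are present, a single `sub` comes out as a plain string, which is the limitation the hints described later are meant to address.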
The next case to consider is a single complex sub-element:
Because the sub-element is complex, the result must use object notation:
{
"elementD": {
"sub": {
"x": 23,
"y": 48
}
}
}
The final case to consider is multiple complex sub-elements, which produce an array of JSON objects:
{
"elementD": {
"sub": [
{
"x": 23,
"y": 48
},
{
"x": 23,
"y": 48
}
]
}
}
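The cases above can be combined into one recursive walk. The following is a minimal sketch under stated assumptions, not the library's implementation: it builds nested `Map` and `List` values instead of a `JSONObject`, keeps every value as a string, and assumes the `sub` elements carry `x` and `y` as attributes. The class name `NestedConversionSketch` and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NestedConversionSketch {

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Simple means: no attributes and no child elements.
    static boolean isSimple(Element ele) {
        if (ele.getAttributes().getLength() > 0) {
            return false;
        }
        for (Node n = ele.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    // Returns a String for a simple element, otherwise a Map of its members.
    @SuppressWarnings("unchecked")
    static Object valueOf(Element ele) {
        if (isSimple(ele)) {
            return ele.getTextContent();
        }
        Map<String, Object> obj = new LinkedHashMap<>();
        NamedNodeMap attrs = ele.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            obj.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        // Only child elements are visited; text between them is skipped.
        for (Node n = ele.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Object value = valueOf((Element) n);
            Object previous = obj.get(n.getNodeName());
            if (previous == null) {
                obj.put(n.getNodeName(), value);            // first occurrence
            } else if (previous instanceof List) {
                ((List<Object>) previous).add(value);       // third or later occurrence
            } else {
                List<Object> list = new ArrayList<>();      // second occurrence: promote to array
                list.add(previous);
                list.add(value);
                obj.put(n.getNodeName(), list);
            }
        }
        return obj;
    }

    public static void main(String[] args) {
        String xml = "<elementD><sub x=\"23\" y=\"48\"/><sub x=\"23\" y=\"48\"/></elementD>";
        System.out.println(valueOf(parse(xml)));
    }
}
```

One simplification to be aware of: an attribute and a child element that share a name would be merged into one array here, which may not match what the real converter does.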
If an element is supposed to be an array, but the source document has only a single item, then the introspection approach has no way to determine that it is supposed to be an array. With only one (or zero) elements present, it cannot know.
To address this, hints can be supplied: Hints is a map from element name to an integer (one of the HINT_ values listed in the field summary).
In this example:
<books><book/></books>
A hint of 3 (HINT_OBJECT_ARRAY) is placed on "book" so that book will be interpreted as an object, and also to say that there can potentially be multiple books. Unfortunately, if the XML has zero book elements, there is no element to apply the hint to, and so no empty array can be generated. To solve this we would need a more complete schema definition of all the things that could appear within any given element.
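To make the effect of a hint concrete, here is a small sketch of how a name-to-integer map could force array and object treatment for a single child. It is an illustration only and does not call the real `Dom2JSON` methods: the class `HintSketch`, the method `convertChildren`, and the one-level-deep object (attributes only) are assumptions, and the constants simply mirror the documented values 0 to 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class HintSketch {
    // Local copies of the documented hint values (assumed to match the library's constants).
    static final int HINT_SIMPLE = 0;
    static final int HINT_SIMPLE_ARRAY = 1;
    static final int HINT_OBJECT = 2;
    static final int HINT_OBJECT_ARRAY = 3;

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Converts the direct children of parent, letting hints override what is seen.
    @SuppressWarnings("unchecked")
    static Map<String, Object> convertChildren(Element parent, Hashtable<String, Integer> hints) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String name = n.getNodeName();
            int hint = hints.getOrDefault(name, -1); // -1 means no hint was given
            boolean asObject = hint == HINT_OBJECT || hint == HINT_OBJECT_ARRAY;
            boolean asArray = hint == HINT_SIMPLE_ARRAY || hint == HINT_OBJECT_ARRAY;

            Object value;
            if (asObject) {
                // Simplified object: attributes only, one level deep.
                Map<String, Object> obj = new LinkedHashMap<>();
                NamedNodeMap attrs = n.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    obj.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
                }
                value = obj;
            } else {
                value = n.getTextContent();
            }

            Object previous = out.get(name);
            if (previous instanceof List) {
                ((List<Object>) previous).add(value);
            } else if (previous != null || asArray) {
                // A repeated name, or an array hint on the first occurrence, starts a list.
                List<Object> list = new ArrayList<>();
                if (previous != null) {
                    list.add(previous);
                }
                list.add(value);
                out.put(name, list);
            } else {
                out.put(name, value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> hints = new Hashtable<>();
        hints.put("book", HINT_OBJECT_ARRAY);
        System.out.println(convertChildren(parse("<books><book/></books>"), hints));
        System.out.println(convertChildren(parse("<books/>"), hints));
    }
}
```

The second call in `main` shows the limitation noted above: with no `book` element in the input, the loop never sees the name, so the result has no `book` member at all rather than an empty array.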
An important thing to remember is that when an element is determined to be non-simple, all of the text directly within it is ignored. That is, all the text between the start and end tag that is NOT within a sub-element. For data usage in XML, the text around the sub-elements is usually whitespace for indenting the tags for display. This transformation strips that out. (There is no way in XML to distinguish content text from text that is used for layout, and there is only a weak mechanism to consider all white space collapsible into a single character, or even no characters. This is a weakness of XML and why you should generally prefer JSON for data.)
In the example below, all of the 'x' characters will be ignored. In real XML this might be white space, or it might be other characters, but in this conversion they will always be ignored.
<elementF>xx xxxxx<sub>Hello</sub>xx xxxxx<dub> xxxxxxxx<dubdub>World</dubdub>xxx xxxxx</dub>xxx </elementF>
The implication is that this is NOT useful for converting HTML to JSON. In HTML, you can have text, and within that text a word might be marked bold or similar. In this conversion, all the text except for the small span marked bold would be ignored. Another way of thinking about this is that text for data can only exist at the leaves of the tree. Text that is not at a leaf will be ignored.
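The leaf-only rule can be checked against the elementF example with a short sketch. This uses only the JDK DOM API and a flat `Map` of tag name to text, which is simpler than the nested structure the converter produces; the class `LeafTextSketch` and its methods are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LeafTextSketch {

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Records text only for elements with no child elements; other text nodes are skipped.
    static void collectLeafText(Element ele, Map<String, String> out) {
        boolean hasChildElement = false;
        for (Node n = ele.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collectLeafText((Element) n, out);
            }
        }
        if (!hasChildElement) {
            out.put(ele.getTagName(), ele.getTextContent());
        }
    }

    static Map<String, String> leafText(String xml) {
        Map<String, String> out = new LinkedHashMap<>();
        collectLeafText(parse(xml), out);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<elementF>xx xxxxx<sub>Hello</sub>xx xxxxx<dub> xxxxxxxx"
                + "<dubdub>World</dubdub>xxx xxxxx</dub>xxx </elementF>";
        System.out.println(leafText(xml)); // prints {sub=Hello, dubdub=World}
    }
}
```

Running the same walk over an HTML fragment such as a paragraph with one bold word would keep only the bold word, which is why the approach does not suit mixed-content documents.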
| Modifier and Type | Field and Description |
|---|---|
| static int | HINT_OBJECT: HINT_OBJECT (2) means it is an object. Even if no attributes or sub-elements appear, always treat this as an object and return an object, whether empty or not. |
| static int | HINT_OBJECT_ARRAY: HINT_OBJECT_ARRAY (3) means that this can appear multiple times in the context, and that each time it is expected to have something complicated in it, so associate the name with an array, and put the contents into objects in that array. |
| static int | HINT_SIMPLE: HINT_SIMPLE (0) means it is a simple value. Just take whatever string is found in the XML, and associate it with the attribute name as a string. |
| static int | HINT_SIMPLE_ARRAY: HINT_SIMPLE_ARRAY (1) means this element will appear multiple times in the current context and make an array of simple values; that is, associate this name with an array that is then filled with simple string values. Used in the hint hashtable parameter of convertDomToJSON and convertElementToJSON. |
| Constructor and Description |
|---|
| Dom2JSON() |
| Modifier and Type | Method and Description |
|---|---|
| static JSONObject | convertDomToJSON(org.w3c.dom.Document doc): Pass in a DOM Document, and you get a JSON object that represents the entire contents. |
| static JSONObject | convertDomToJSON(org.w3c.dom.Document doc, java.util.Hashtable<java.lang.String,java.lang.Integer> hints): Pass in a DOM Document, and you get a JSON object that represents the entire contents. |
| static JSONObject | convertElementToJSON(org.w3c.dom.Element rootEle): Pass in an element and make a stand-alone JSON structure for this one element. |
| static JSONObject | convertElementToJSON(org.w3c.dom.Element rootEle, java.util.Hashtable<java.lang.String,java.lang.Integer> hints): Pass in an element and make a stand-alone JSON structure for this one element. |
| static boolean | elementIsSimple(org.w3c.dom.Element ele, java.lang.Integer hintValue): Determines whether this element is simple or not. |
| static JSONObject | getElementObj(org.w3c.dom.Element ele, java.util.Hashtable<java.lang.String,java.lang.Integer> hints): Gets the content of the element as an object that includes members for each of the attributes, and for each of the sub-elements. |
| static java.lang.String | getElementText(org.w3c.dom.Element ele): Returns the string contents of a simple element. |
public static final int HINT_SIMPLE
public static final int HINT_SIMPLE_ARRAY
public static final int HINT_OBJECT
public static final int HINT_OBJECT_ARRAY
public static JSONObject convertDomToJSON(org.w3c.dom.Document doc) throws java.lang.Exception
public static JSONObject convertDomToJSON(org.w3c.dom.Document doc, java.util.Hashtable<java.lang.String,java.lang.Integer> hints) throws java.lang.Exception
public static JSONObject convertElementToJSON(org.w3c.dom.Element rootEle) throws java.lang.Exception
public static JSONObject convertElementToJSON(org.w3c.dom.Element rootEle, java.util.Hashtable<java.lang.String,java.lang.Integer> hints) throws java.lang.Exception
public static boolean elementIsSimple(org.w3c.dom.Element ele, java.lang.Integer hintValue)
public static JSONObject getElementObj(org.w3c.dom.Element ele, java.util.Hashtable<java.lang.String,java.lang.Integer> hints) throws java.lang.Exception
public static java.lang.String getElementText(org.w3c.dom.Element ele) throws java.lang.Exception