Carlos Rueda edited this page Jul 27, 2014 · 36 revisions

Toward an RDF representation of the UDUNITS-2 unit definitions

Preliminary review of the XML structure

Is there a schema somewhere?

Here are some unit definitions found in the XML files:

(1) Typical simple definition:

        <unit>
            <def>60 min</def>
            <name><singular>hour</singular></name>
            <symbol>h</symbol>
            <aliases> <symbol>hr</symbol> </aliases>
        </unit>

(2) Multiple aliases:

        <unit>
            <def>&#xB0;/60</def>                <!-- DEGREE SIGN -->
            <name><singular>arc_minute</singular></name>
            <symbol>'</symbol>
            <symbol>&#x2032;</symbol>           <!-- PRIME -->
            <aliases>
                <name><singular>angular_minute</singular></name>
                <name><singular>arcminute</singular></name>
                <name><singular>arcmin</singular></name>
            </aliases>
        </unit>

Note that multiple <symbol> elements appear at the first level of the <unit>, outside <aliases>.

(3) Alias with only associated symbol:

    <unit>
        <def>V/A</def>
        <name><singular>ohm</singular></name>
        <symbol>&#x3A9;</symbol>        <!-- Greek capital letter omega
                                             (preferred) -->
        <aliases>
            <symbol>&#x2126;</symbol>   <!-- OHM SIGN -->
        </aliases>
    </unit>

(4) Unit with no (primary) name:

        <unit>
            <def>3.7e10 Bq</def>
            <aliases>
                <name><singular>curie</singular></name>
                <symbol>Ci</symbol>
            </aliases>
        </unit>
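
The four cases above can be read with a small parser. The following is only a sketch using Python's standard `xml.etree.ElementTree`; the function names (`names_of`, `parse_unit`) and the shape of the returned dictionary are assumptions made for illustration and are not part of UDUNITS-2. It also assumes that a `<name>` may contain a `<plural>` child alongside `<singular>`.

```python
import xml.etree.ElementTree as ET

# Case (2) from above, copied as a string so the sketch is self-contained.
UNIT_XML = """
<unit>
    <def>&#xB0;/60</def>                <!-- DEGREE SIGN -->
    <name><singular>arc_minute</singular></name>
    <symbol>'</symbol>
    <symbol>&#x2032;</symbol>           <!-- PRIME -->
    <aliases>
        <name><singular>angular_minute</singular></name>
        <name><singular>arcminute</singular></name>
        <name><singular>arcmin</singular></name>
    </aliases>
</unit>
"""


def names_of(parent):
    """Return (cardinality, text) pairs for the <name> children of parent."""
    pairs = []
    for name in parent.findall("name"):
        for form in name:  # expected to be <singular> or <plural>
            if form.tag in ("singular", "plural") and form.text:
                pairs.append((form.tag, form.text.strip()))
    return pairs


def symbols_of(parent):
    """Return the text of the direct <symbol> children of parent."""
    return [s.text.strip() for s in parent.findall("symbol") if s.text]


def parse_unit(unit):
    """Collect definition, primary names/symbols and alias names/symbols."""
    aliases = unit.find("aliases")
    return {
        "def": (unit.findtext("def") or "").strip(),
        "names": names_of(unit),      # empty in case (4)
        "symbols": symbols_of(unit),  # may hold several, as in case (2)
        "alias_names": names_of(aliases) if aliases is not None else [],
        "alias_symbols": symbols_of(aliases) if aliases is not None else [],
    }


record = parse_unit(ET.fromstring(UNIT_XML))
print(record)
```

Keeping primary and alias entries in separate lists preserves the distinction that cases (3) and (4) rely on: an alias may consist of a symbol only, and a unit may have no primary name at all.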

Modeling

Initial idea

  • Define class Unit
  • Define properties for "hasDefinition", "hasAlternate", "hasSymbol", "hasCardinality":
    • hasDefinition: string
    • hasAlternate: Unit
    • hasSymbol: string
    • hasCardinality: "singular" | "plural"

In the following, italicized name refers to the semantic name concept, while non-italicized name refers to the actual string instances in the vocabulary.

  • Each unit name from the XML definitions will be captured in a corresponding instance of the class Unit.
  • For this purpose, the extracted names are all explicit names (singular and plural) and all alias names (singular and plural) from the unit definition.
  • For all names associated with a particular <unit> definition, the corresponding Unit instances are related to each other using the "hasAlternate" property.

I think of the relationship between unit and definition as 1-1, which is why I don't like making 1 name -> 1 unit. I don't think 'ampere' is a unit and 'amp' is another unit, I think they are two names for the same unit. Whereas your hasAlternate wants to specify a relation between Units, I think it really is for connecting different names.

yes, good point.

Example

The following XML unit definition:

        <unit>
            <def>'/60</def>
            <name><singular>arc_second</singular></name>
            <symbol>"</symbol>
            <symbol>&#x2033;</symbol>           <!-- DOUBLE PRIME -->
            <aliases>
                <name><singular>angular_second</singular></name>
                <name><singular>arcsecond</singular></name>
                <name><singular>arcsec</singular></name>
            </aliases>
        </unit>

Will generate the following RDF Unit instances:

@prefix :        <http://mmisw.org/ont/mmitest/udunits2-accepted/> .
@prefix prop:    <http://mmisw.org/ont/mmitest/udunits2-prop/> .

:arc_second
      a       :Unit ;
      prop:hasCardinality  "singular";
      prop:hasAlternate  :arcsec , :angular_second , :arcsecond ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol "\"" , "″" .

:arcsec
      a       :Unit ;
      prop:hasCardinality  "singular";
      prop:hasAlternate  :arc_second , :angular_second , :arcsecond ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol "\"" , "″" .

:angular_second
      a       :Unit ;
      prop:hasCardinality  "singular";
      prop:hasAlternate  :arc_second , :arcsec, :arcsecond ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol "\"" , "″" .

:arcsecond
      a       :Unit ;
      prop:hasCardinality  "singular";
      prop:hasAlternate  :arc_second , :arcsec, :angular_second ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol "\"" , "″" .

Note that although the example above explicitly reflects all associated properties for each Unit instance, not all of these would need to be actually materialized internally as some of this information could be inferred with appropriate modeling. More concretely:

  • One "master" Unit instance is designated to have all characterization explicitly associated including links to all its alternate names via the prop:hasAlternate property, which is symmetric.
  • Each of the other instances will basically only indicate prop:hasCardinality.
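
As a rough illustration of this inference, the sketch below starts from the links stated on the "master" instance only and derives the rest. Note that symmetry alone yields just the alias-to-master links; recovering the alias-to-alias links shown in the example (for instance :arcsec to :angular_second) also requires treating hasAlternate as transitive, which is an assumption of this sketch. The function name `close_symmetric_transitive` is hypothetical.

```python
def close_symmetric_transitive(pairs):
    """Expand (subject, object) hasAlternate pairs into a full link table.

    The relation is treated as symmetric and transitive, but not
    reflexive: a name is never listed as its own alternate.
    """
    links = {}
    # Symmetry: every stated link also holds in the other direction.
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    # Transitivity: repeat until no new link can be added.
    changed = True
    while changed:
        changed = False
        for a in list(links):
            for b in list(links[a]):
                new = links[b] - links[a] - {a}
                if new:
                    links[a] |= new
                    changed = True
    return links


# Only the master instance states its alternates explicitly.
master_links = [
    ("arc_second", "arcsec"),
    ("arc_second", "angular_second"),
    ("arc_second", "arcsecond"),
]
links = close_symmetric_transitive(master_links)
for name in sorted(links):
    print(name, "hasAlternate", sorted(links[name]))
```

In an RDF setting the same effect would normally come from declaring the property's characteristics and letting a reasoner do the work; the loop here only makes the expected result explicit.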

Example Alternate -- proposed for first release (graybeal)

In this case, we treat this as a vocabulary, in which the strings are the first-class objects and everything is related to them. It distinguishes between the primary unit string and the aliases. Otherwise, this is not a normalized model; information is repeated everywhere. (So if arc_second ever gets a new symbol, a lot of terms will change.) For what it's worth, an ideal UDUNITS exploration tool would be able to present alphabetized lists not just of the Unit and Alias terms, but also of the definition and symbol strings.

The example will generate the following RDF unit instances:

@prefix :        <http://mmisw.org/ont/mmitest/udunits2-accepted/> .
@prefix prop:    <http://mmisw.org/ont/mmitest/udunits2-prop/> .

:arc_second
      a       :Unit ;
      prop:hasCardinality  "singular";
      prop:hasSingularAlias  :arcsec , :angular_second , :arcsecond ;
      prop:hasDefinition    "'/60" ;
      prop:hasComment       "DOUBLE PRIME" ;
      prop:hasSymbol        "\"", "″" .

:arcsec
      a       :Alias ;
      prop:hasCardinality  "singular";
      prop:hasUnit          :arc_second  ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol        "\"", "″" .

:angular_second
      a       :Alias ;
      prop:hasCardinality  "singular";
      prop:hasUnit          :arc_second  ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol        "\"", "″" .

:arcsecond
      a       :Alias ;
      prop:hasCardinality  "singular";
      prop:hasUnit          :arc_second  ;
      prop:hasDefinition    "'/60" ;
      prop:hasSymbol        "\"", "″" .
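
The repetition in this variant is easy to generate mechanically. The following is a minimal sketch, not an agreed implementation: the function names (`turtle_literal`, `term_block`, `vocabulary_terms`) are hypothetical, only singular names are handled, comments such as "DOUBLE PRIME" are left out, and unit names are assumed to be usable as Turtle local names without escaping.

```python
PREFIXES = (
    "@prefix :        <http://mmisw.org/ont/mmitest/udunits2-accepted/> .\n"
    "@prefix prop:    <http://mmisw.org/ont/mmitest/udunits2-prop/> .\n"
)


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def term_block(subject, lines):
    """Render one subject with its predicate-object lines."""
    return ":" + subject + "\n      " + " ;\n      ".join(lines) + " ."


def vocabulary_terms(definition, name, symbols, aliases):
    """Render one :Unit term plus one :Alias term per alias name."""
    # Properties repeated on every term, as in the example above.
    shared = [
        'prop:hasCardinality "singular"',
        "prop:hasDefinition " + turtle_literal(definition),
    ]
    if symbols:
        shared.append(
            "prop:hasSymbol " + " , ".join(turtle_literal(s) for s in symbols)
        )
    unit_lines = ["a :Unit"] + shared
    if aliases:
        unit_lines.append(
            "prop:hasSingularAlias " + " , ".join(":" + a for a in aliases)
        )
    blocks = [term_block(name, unit_lines)]
    for alias in aliases:
        blocks.append(
            term_block(alias, ["a :Alias", "prop:hasUnit :" + name] + shared)
        )
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


ttl = vocabulary_terms(
    definition="'/60",
    name="arc_second",
    symbols=['"', "\u2033"],
    aliases=["arcsec", "angular_second", "arcsecond"],
)
print(ttl)
```

Because the shared properties are copied onto every term, a change to the symbols of arc_second means regenerating all four terms, which is the normalization cost mentioned above.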

Example Alternate Properly Modeled (graybeal)

In this case, we treat the unit itself as the primary entity. It is actually the definition that is the unique 'key' for each unit (as in the original document), but the fact that some units have URL-unfriendly definitions means we do the right thing and use an opaque code for each unit. A tool that works with this ontology will have to be able to recognize and present the name strings associated with the units; is the label the right way to do this systematically?

Now the example will generate the following RDF unit instance:

@prefix :        <http://mmisw.org/ont/mmitest/udunits2-accepted/> .
@prefix prop:    <http://mmisw.org/ont/mmitest/udunits2-prop/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .

:2a1369
      a       :Unit ;
      prop:hasDefinition     "'/60" ;
      rdfs:label             "arc_second" ;     # I am not sure I've used the right property
      prop:hasSingularName   "arc_second" ;
      prop:hasSingularAlias  "arcsec", "angular_second", "arcsecond" ;
      prop:hasSymbol         "\"", "″" ;
      prop:hasSymbolComment  "DOUBLE PRIME"  .

This would require special handling to display as a vocabulary. One could choose to augment this model with the concepts for Names and Aliases, something like the following. (Using the strings as subjects is appropriate here, because the entire concept of the alias is built around the string itself; change the string and you have a different alias.)

:arcsec
      a        :Alias  ;
      prop:referencesUnit    :2a1369;
      rdfs:label             "arcsec" ;
      prop:hasCardinality    "singular" .

If you wanted to build a complete model, you might create entries for all the symbols, like the following. But I think there is not an important use case for doing so.

:39f2c1
      a       :Symbol ;
      prop:referencesUnit    :2a1369;
      rdfs:label             "″";
      rdfs:comment           "DOUBLE PRIME"  .
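
The page does not say how the opaque codes (such as :2a1369 or :39f2c1) are chosen. One possible approach, shown here purely as an assumption, is to derive a short code from a hash of the definition string, since the definition is described above as the unique key for each unit. The function names `opaque_code` and `assign_codes`, the use of SHA-1, and the six-character length are all illustrative choices; this sketch is not expected to reproduce the codes used in the examples.

```python
import hashlib


def opaque_code(definition, length=6):
    """Return a short hexadecimal code derived from the definition string."""
    digest = hashlib.sha1(definition.encode("utf-8")).hexdigest()
    return digest[:length]


def assign_codes(definitions, length=6):
    """Map each definition to its code, refusing to continue on a collision."""
    by_code = {}
    codes = {}
    for definition in definitions:
        code = opaque_code(definition, length)
        # Short prefixes can collide; fail loudly rather than merge two units.
        if code in by_code and by_code[code] != definition:
            raise ValueError(
                "code %s collides for %r and %r"
                % (code, by_code[code], definition)
            )
        by_code[code] = definition
        codes[definition] = code
    return codes


codes = assign_codes(["'/60", "V/A", "60 min"])
for definition, code in codes.items():
    print(":" + code, "prop:hasDefinition", repr(definition))
```

A hash-based code stays stable as long as the definition string is unchanged, but it changes whenever the definition is edited; a sequential or registry-assigned identifier would avoid that at the cost of keeping a lookup table.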