Observation Redesign #148
Comments
Can you expand on what you mean by this:
JSON allows any values, so is it solely when you are converting to an SPL schema that causes problems?
Yes, this is mainly a problem in SPL. We define the same type in Java, SPL, and Python.
Doesn't whether a field is a number or text depend on context, that is, on the values of other fields in the object?
If one treats JSON (or XML, if you belong to that camp) as nothing but a hierarchy of name-value pairs, then from the point of view of Streams' internal representation, why don't we just model it as such? For example, a sensor reading normally converted to int32 device_id, rstring notes, int32 heart_beat, int32 temperature can instead be converted to intValues[enum.device_id], intValues[enum.heart_beat], intValues[enum.temperature], stringValues[enum.notes], where intValues and stringValues are declared as map<enum, int32> and map<enum, rstring>. This dumbed-down data model, much like JSON or XML, gives maximal flexibility at the cost of runtime efficiency.
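As a rough illustration of the map-based model described above, here is a minimal Python sketch. It is not SPL and not project code: the names `Field`, `GenericReading`, and `from_json_object` are hypothetical, and only the four field names come from the example in the comment.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Field(Enum):
    """Hypothetical key enum, standing in for the SPL enum in the comment."""
    DEVICE_ID = "device_id"
    HEART_BEAT = "heart_beat"
    TEMPERATURE = "temperature"
    NOTES = "notes"


@dataclass
class GenericReading:
    """One map per value type, analogous to map<enum, int32> and map<enum, rstring>."""
    int_values: Dict[Field, int] = field(default_factory=dict)
    string_values: Dict[Field, str] = field(default_factory=dict)


def from_json_object(obj: dict) -> GenericReading:
    """Route each name-value pair of a flat JSON object into the map for its type."""
    reading = GenericReading()
    for name, value in obj.items():
        key = Field(name)  # raises ValueError for names not in the enum
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise TypeError(f"unsupported value type for {name!r}: bool")
        if isinstance(value, int):
            reading.int_values[key] = value
        elif isinstance(value, str):
            reading.string_values[key] = value
        else:
            raise TypeError(f"unsupported value type for {name!r}: {type(value).__name__}")
    return reading
```

The tuple schema never changes when a new field appears (only the enum grows), which is the flexibility being argued for. The price is a map lookup and a type dispatch on every access.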
To prototype NLP support, I added a valueStr attribute to Reading so that non-numeric observations can be represented.
Part of NLP support is also about the flexibility of carrying tags throughout the processing pipeline. Tags may be further organized as flat lists or lexical trees. The key, however, is a flexible data payload structure.
I'd like to start a discussion on how to redesign the Observation Type.
The Observation Type is our universal data type and is currently defined as follows:
The challenge is that the Observation type is too numeric-centric. In some cases, such as when we ingest data from clinical notes, the values are non-numeric, and there is currently no way to represent them. If we look into the FHIR specification, Observation can have any of the following value types:
The question is how to represent all these different value types in Streams. Should we have one data type per value type? Should we extend Observation to handle all the different value types? Or should we add valueString to Observation and put anything that cannot be represented as a numeric value in that string?
The simplest thing to do is to change Observation to the following:
Or should we be more elaborate and simply duplicate the FHIR observation specification?
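To make the trade-off concrete, the "add valueString" option could look roughly like the Python sketch below. This is only an illustration under stated assumptions: it does not reproduce the actual Observation or Reading definition, and the names `Reading`, `value`, `value_string`, and `reading_from_raw` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Hypothetical reading: a numeric value, or a string fallback for everything else."""
    value: Optional[float] = None        # numeric observations
    value_string: Optional[str] = None   # non-numeric observations, e.g. from clinical notes

    def __post_init__(self) -> None:
        # Exactly one of the two fields must be populated.
        if (self.value is None) == (self.value_string is None):
            raise ValueError("exactly one of value or value_string must be set")


def reading_from_raw(raw: object) -> Reading:
    """Keep numbers numeric; store anything else as its string form."""
    # bool is a subclass of int in Python, so treat it as non-numeric here.
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        return Reading(value=float(raw))
    return Reading(value_string=str(raw))
```

With this shape, numeric consumers keep working unchanged, but every structured FHIR value type (ranges, ratios, coded concepts) collapses into an opaque string that downstream operators have to parse again. Mirroring the FHIR specification avoids that at the cost of a much larger type.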