Make formphrase macros importable/configurable #27

Open
2 tasks
RieksJ opened this issue Jan 9, 2024 · 7 comments

@RieksJ
Member

RieksJ commented Jan 9, 2024

To make formphrase macros usable when terminologies are developed in different languages, it must be possible to specify them outside the source code of the tools. A curator who wants to adjust the macros can then do so as well. It is also handy for testing new regex candidates.

This issue calls for:

  • devising a way for users to specify the macros they want to use, and documenting that in the specifications.
  • modifying the MRGT so that it gives the user the following options for expanding formphrase macros:
    • only apply the implemented/coded macro-map (this would be the default),
    • only apply the custom specified macro-map,
    • first use the custom specified macro-map and then the implemented/coded default one.
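The three options above could be modelled roughly as follows. This is only a sketch: the names `MacroMode`, `DEFAULT_MACROS` and `select_macro_maps` are made up for illustration and are not taken from the MRGT code, and the built-in map shown is just a two-entry stand-in.

```python
from enum import Enum

# Hypothetical stand-in for the macro map that is coded into the MRGT.
DEFAULT_MACROS = {
    "{ss}": ["", "s", "'s", "(s)"],
    "{yies}": ["y", "y's", "ies"],
}


class MacroMode(Enum):
    DEFAULT_ONLY = "default"                      # only the coded macro map
    CUSTOM_ONLY = "custom"                        # only the user-specified map
    CUSTOM_THEN_DEFAULT = "custom-then-default"   # custom first, then coded


def select_macro_maps(mode, custom):
    """Return the macro maps to apply, in the order they should be applied."""
    if mode is MacroMode.DEFAULT_ONLY:
        return [DEFAULT_MACROS]
    if mode is MacroMode.CUSTOM_ONLY:
        return [custom]
    return [custom, DEFAULT_MACROS]


# A made-up custom macro, e.g. for a Dutch plural form.
custom = {"{en}": ["", "en"]}
maps = select_macro_maps(MacroMode.CUSTOM_THEN_DEFAULT, custom)
```

Returning an ordered list of maps rather than one merged map keeps the "first custom, then default" order explicit; whether the tool should merge them instead is a separate design choice.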

As a starting point for the specifications, I think the macros should either be specified in a (new) section of the SAF (one that doesn't get copied into MRGs), or be given as a command-line option for the MRGT (so that they can also be listed in the MRGT configuration file).

@RieksJ RieksJ added enhancement New feature or request impact: MRGT labels Jan 9, 2024
@RieksJ RieksJ changed the title Make formphrase macro regexmap importable from SAF Make formphrase macros importable/configurable Jan 12, 2024
@RieksJ
Member Author

RieksJ commented Jan 22, 2024

Decisions:

  1. We let go of the idea that all stuff that we can put in the config file of a tool must also be available on the command-line.
  2. The (part of the) config file for MRGT will have a section that allows for specifying (possibly empty) formphrase macros. If a (possibly empty) formphrase macro is specified, it will override the predefined macros, so you can 'adjust', and even 'remove' predefined macros, as well as add your own.
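To make decision 2 concrete, here is a minimal sketch of the override behaviour. It assumes that an "empty" macro means a macro with an empty list of alternatives and that such a macro removes the predefined one; the function name `effective_macros` is hypothetical and not part of the MRGT.

```python
def effective_macros(predefined, configured):
    """Overlay the configured macros on the predefined ones.

    A configured macro replaces the predefined macro with the same key.
    Assumption: a configured macro with an empty list of alternatives
    removes that macro altogether.
    """
    merged = dict(predefined)
    for key, alternatives in configured.items():
        if alternatives:
            merged[key] = list(alternatives)
        else:
            merged.pop(key, None)
    return merged


predefined = {"{ss}": ["", "s", "'s", "(s)"], "{able}": ["able", "ability"]}
configured = {
    "{ss}": ["", "s"],    # adjust a predefined macro
    "{able}": [],         # remove a predefined macro
    "{en}": ["", "en"],   # add a new macro (made up for illustration)
}
result = effective_macros(predefined, configured)
```

With these inputs, `result` contains the adjusted `{ss}` macro and the new `{en}` macro, and no longer contains `{able}`.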

@RieksJ
Member Author

RieksJ commented Jan 29, 2024

@Ca5e:

  • write the code
  • update tev2-specifications

Ca5e added a commit that referenced this issue Feb 1, 2024
Ca5e added a commit to tno-terminology-design/tev2-specifications that referenced this issue Feb 4, 2024
@Ca5e
Member

Ca5e commented Feb 4, 2024

  • Added the macros key to the configuration file documentation.
  • Added note within MRGT documentation regarding options being only available within configuration file.
  • Added functionality to MRGT v1.0.4.

@RieksJ, please check the documentation.

@RieksJ
Member Author

RieksJ commented Feb 5, 2024

  • @RieksJ to check the documentation
  • @Ca5e to allow default form phrase macros to be specifiable in the same section in the SAF in which the (scope-wide) TermRef interpreter is specified

@RieksJ RieksJ self-assigned this Feb 19, 2024
Ca5e added a commit that referenced this issue Feb 22, 2024
@RieksJ
Member Author

RieksJ commented Feb 27, 2024

@Ca5e Can you have a look at the specification of form phrase macro maps, and particularly the section on how they work?

If you are convinced the specifications and the operation of the tools agree, you may close this issue. If not, please comment what the (remaining) issues are.

@RieksJ RieksJ removed their assignment Feb 29, 2024
@RieksJ RieksJ added checks needed Some checks are needed to close the issue and removed enhancement New feature or request impact: MRGT labels Mar 1, 2024
@Ca5e
Member

Ca5e commented Mar 2, 2024

Some things I believe should be looked into...

  • The TRRT doesn't actually use the form phrase macro map (as stated here); it only sees the result of the MRGT's conversion within an MRG.
  • The ability to specify the macro map within a config file was removed after moving the code to the SAF class. @RieksJ should this be moved to the MRGT tool again so we can combine both sources?
  • This section seems somewhat vague:

"Form phrases are used to refer to a particular semantic unit as known in a particular terminology."

I'd say form phrases aren't used to refer to a semantic unit, but instead enable a semantic unit to be referred to.

Here is how a form phrase is matched against:

Considering we're using termid to match where possible, I believe this section should be rethought. Within the MRGT there isn't much of a difference between searching in curated text or MRGs either. When the tool first recognizes that the curated texts are supposed to be used, it loads all of the curated texts as a 'normal' list of MRG entries.

  • I appreciate the way the form phrase documentation is kept quite universal, but it may be useful to refer back to the processing steps within the MRGT documentation, as that is the only tool that uses the macro maps.
  • The documented yaml is not valid.
macros:
- "{ss}":   ["", "s", "'s", "(s)"],      // "act{ss}" --> "act", "acts", "act's", "act(s)"
- "{ess}":  ["", "es", "'s", "(es)"],    // "regex{es}" --> "regex", "regexes", "regex's", "regex(es"
- "{yies}": ["y", "y's", "ies"],         // "part{yies}" --> "party", "party's", "parties"
- "{ying}": ["y", "ying", "ies", "ied"], // "identif{ying}" --> "identify", "identifying", "identifies", "identified"
- "{es}":   ["e", "es", "ed", "ing"],    // "mangag{es}" --> "manage", "manages", "managed", "managing"
- "{able}": ["able", "ability"]          // "cap{able}" --> "capable", "capability"

should actually be (remove the dashes that turn the dictionary into a list, drop the trailing commas, and change the comment format to #)

macros:
  "{ss}":   ["", "s", "'s", "(s)"]       # "act{ss}" --> "act", "acts", "act's", "act(s)"
  "{ess}":  ["", "es", "'s", "(es)"]     # "regex{ess}" --> "regex", "regexes", "regex's", "regex(es)"
  "{yies}": ["y", "y's", "ies"]          # "part{yies}" --> "party", "party's", "parties"
  "{ying}": ["y", "ying", "ies", "ied"]  # "identif{ying}" --> "identify", "identifying", "identifies", "identified"
  "{es}":   ["e", "es", "ed", "ing"]     # "manag{es}" --> "manage", "manages", "managed", "managing"
  "{able}": ["able", "ability"]          # "cap{able}" --> "capable", "capability"

I have however changed the interpreting code so the format that does use the dashes is also supported in the next release.
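For illustration, supporting both formats and then expanding a form phrase could look something like the sketch below. It is not the MRGT implementation: `normalize_macros` and `expand` are hypothetical names, and the inputs are written as the plain Python objects a YAML loader would typically produce (a mapping for the dash-less format, a list of single-key mappings for the format with dashes).

```python
def normalize_macros(raw):
    """Accept either a mapping or a list of single-key mappings."""
    if isinstance(raw, dict):
        return dict(raw)
    merged = {}
    for item in raw:          # each item looks like {"{ss}": [...]}
        merged.update(item)
    return merged


def expand(form_phrase, macros):
    """Expand every macro that occurs in the form phrase into its alternatives."""
    results = [form_phrase]
    for key, alternatives in macros.items():
        expanded = []
        for phrase in results:
            if key in phrase:
                expanded.extend(phrase.replace(key, alt) for alt in alternatives)
            else:
                expanded.append(phrase)
        results = expanded
    return results


as_mapping = {"{ss}": ["", "s", "'s", "(s)"], "{yies}": ["y", "y's", "ies"]}
as_list = [{"{ss}": ["", "s", "'s", "(s)"]}, {"{yies}": ["y", "y's", "ies"]}]

macros = normalize_macros(as_list)
acts = expand("act{ss}", macros)
parties = expand("part{yies}", macros)
```

Here `acts` is `["act", "acts", "act's", "act(s)"]` and `parties` is `["party", "party's", "parties"]`, matching the comments in the macro map above.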

@RieksJ
Member Author

RieksJ commented Apr 1, 2024

@Ca5e Thanks for all the comments, which I have used to improve the documentation.

And yes, please move it back to the MRGT so we can combine both sources.
