Reconcile unicode characters in record attributes/searchable terms #261

Open
jsstevenson opened this issue Apr 13, 2022 · 2 comments
Labels
bug Something isn't working ux Changes that help the user experience

Comments

@jsstevenson
Member

jsstevenson commented Apr 13, 2022

We handle non-ASCII characters in database records by converting them to HTML escape sequences:

(Note that GitHub's Markdown renderer displays the value below as Unicode; the actual text of the value is & followed by alpha followed by ;)

"label": "CXCL12α",

http://normalize.cancervariants.org/therapy/normalize?q=pubchem.substance:135651881

However, these values aren't handled properly either in the SwaggerUI interface (the label value above is converted to CXCL12&alpha) or via an HTTP request (the escape sequence is cut entirely, so the final query is CXCL12, which is an entirely different drug).
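The truncation described above is consistent with the unencoded & being read as a query-string parameter separator. The following is a minimal sketch of that behavior using only the Python standard library; it assumes the server parses query strings the way `urllib.parse.parse_qs` does, which this thread has not confirmed, and the variable names are illustrative only.

```python
from urllib.parse import parse_qs, quote

# The stored label as it appears in the record (HTML escape sequence).
stored_label = "CXCL12&alpha;"

# Sending the label unencoded: "&" starts a new parameter, and the
# trailing "alpha;" has no "=", so it is silently dropped.
raw_query = f"q={stored_label}"
print(parse_qs(raw_query))

# Percent-encoding the label first keeps "&" and ";" inside the value.
encoded_query = f"q={quote(stored_label, safe='')}"
print(encoded_query)
print(parse_qs(encoded_query))
```

If this assumption holds, clients would need to percent-encode the escape sequence before sending it, or the service would need to accept the Unicode form of the term directly.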

@korikuzma
Member

korikuzma commented Apr 25, 2022

  • html.unescape --> urllib.parse.unquote in main.py
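One possible reading of the note above is that incoming terms should be percent-decoded before being compared with stored values. The sketch below combines `urllib.parse.unquote` and `html.unescape` to reconcile three input forms into the entity form used in records. The helper `to_entity_form` is hypothetical and is not taken from main.py, and it assumes records use named HTML entities where one exists.

```python
import html
from html.entities import codepoint2name
from urllib.parse import unquote


def to_entity_form(term: str) -> str:
    """Hypothetical helper: reconcile a query term to HTML-entity form."""
    # Percent-decode first, then resolve any HTML entities to Unicode,
    # so "%CE%B1", "%26alpha%3B", and a raw alpha all become the same character.
    decoded = html.unescape(unquote(term))
    parts = []
    for ch in decoded:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # ASCII passes through unchanged
        elif code in codepoint2name:
            parts.append(f"&{codepoint2name[code]};")  # named entity
        else:
            parts.append(f"&#{code};")  # numeric fallback (assumption)
    return "".join(parts)


for term in ("CXCL12%CE%B1", "CXCL12%26alpha%3B", "CXCL12\u03b1"):
    print(term, "->", to_entity_form(term))
```

Note that this sketch also rewrites an escaped ampersand (&amp;) to a bare &, so it would need adjusting if stored values keep that sequence.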

@jsstevenson jsstevenson self-assigned this Sep 15, 2022
@mcannon068nw
Contributor

mcannon068nw commented Sep 15, 2022

From a recent analysis of concept mismatches, some concepts' child terms are failing to normalize to anything. A number of these failures, though not all, may be due to special characters.
concept-mismatches-nan-ddb20220915.xlsx

@jsstevenson jsstevenson added the bug (Something isn't working) and ux (Changes that help the user experience) labels and removed the bug label Dec 24, 2022
@jsstevenson jsstevenson removed their assignment Aug 27, 2023