[trac import 11/03/14] Translation tool should be able to exclude OCR #1239

Open
devinbalkind opened this issue Mar 6, 2016 · 7 comments

Comments

@devinbalkind
Member

Currently the OCR-related strings are in 'core', so they can't be deselected in the UI and must be removed manually from the xls afterwards.
OCR-related strings should only be visible if the OCR module (or 'All Modules') is enabled/selected.

@devinbalkind changed the title from "[trac 11/03/14] Translation tool should be able to exclude OCR" to "[trac import 11/03/14] Translation tool should be able to exclude OCR" on Mar 6, 2016

@sajanrav

This bug seems to be resolved. Can this be closed?

Following steps were done to verify:

  1. Un-commented OCR module section in /module/templates/default/config.py
  2. Navigated to admin/translate/create?opt=1
  3. OCR was found in list of modules for selection.
  4. Re-commented OCR module section in /module/templates/default/config.py
  5. Navigated to admin/translate/create?opt=1
  6. OCR was not found in list of modules for selection.

@flavour
Member

flavour commented Nov 24, 2016

What I think is still needed is to exclude at least some of modules/s3/s3pdf.py

To test, you need to export a translation file for a new language code (even a dummy one), and you'll probably see some strings that relate to OCR even if the OCR module is disabled.

@sajanrav

Two tests were done with the OCR module disabled, and on each occasion OCR-related strings were present when core was included (i.e. "include core" was checked). When core was not included, no OCR strings were found in the exported file.

Another observation was that the OCR-related strings found were independent of the language and module selected. In both instances where core was included, the same OCR strings were found in the exported files.

Test 1:

  1. Navigated to admin/translate/create?opt=1
  2. Selected module - organizations
  3. Selected language code - es
  4. Checked and unchecked "include core" and exported for each instance.

Test 2:

  1. Navigated to admin/translate/create?opt=1
  2. Selected module - hospitals
  3. Selected language code - fr
  4. Checked and unchecked "include core" and exported for each instance.

@flavour
Member

flavour commented Nov 24, 2016

Exactly what I'd expect.
OCR strings in s3pdf currently are included in 'core' but should be included in the 'ocr' module from a translation perspective

@hallamoore
Contributor

hallamoore commented Dec 10, 2019

@flavour do you have any ideas on how to best approach this?

I see that the existing logic to handle special files uses the function names to determine which modules the strings belong to, but it seems like S3PDF mixes OCR and non-OCR in the same functions.

My first instinct is to declare which module a string should belong to on a per-string basis, like declare_string_module(T('My OCR string'), 'ocr'). Then we could use something similar to the existing special parsers, but modified to also get the declared module name. Then we would compare with that instead of with the function name.

One concern I have is that declare_string_module would be a no-op, useful only for the translation tool and not for the code itself.
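
A minimal sketch of how the translation tool could read such per-string declarations. Everything here is an assumption for illustration: declare_string_module is the hypothetical no-op helper proposed above, collect_declared_strings is an invented name, and Python's standard ast module stands in for the existing special parsers.

```python
import ast


def declare_string_module(translated, module):
    """Hypothetical no-op marker: returns the translated string unchanged."""
    return translated


def _is_str_constant(node):
    """True if the AST node is a plain string literal."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def collect_declared_strings(source):
    """Return {module name: [string literals]} for every
    declare_string_module(T("..."), "module") call found in the source."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        # Only direct calls to declare_string_module with two positional args
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "declare_string_module"
                and len(node.args) == 2):
            continue
        t_call, module = node.args
        # First argument must be T("literal"), second a literal module name
        if not (isinstance(t_call, ast.Call)
                and isinstance(t_call.func, ast.Name)
                and t_call.func.id == "T"
                and t_call.args
                and _is_str_constant(t_call.args[0])
                and _is_str_constant(module)):
            continue
        found.setdefault(module.value, []).append(t_call.args[0].value)
    return found


# Illustrative input only, not taken from s3pdf.py
SAMPLE = '''
title = declare_string_module(T("My OCR string"), "ocr")
label = T("Plain core string")
'''

declared = collect_declared_strings(SAMPLE)
```

Strings wrapped only in T() are not picked up by this sketch, so they would still need to go through whatever path the tool already uses for 'core'.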

So another possible direction is to pull the strings out into some sort of data structure that also includes the module the string should belong to. Then the translation tool can pull both the strings and the module from that, and the application code can pull just the strings. This might look something like

STRINGS = Storage(
    MY_OCR=Storage(value="My OCR string", module="ocr"),
    # ... (further entries)
)

# application code
T(STRINGS.MY_OCR.value)

# translation tool
for item in STRINGS.values():
    if item.module == desired_module:
        strings.append(item.value)

which seems okay, but it may be inconsistent with the rest of the code, and it would be easy to forget to use STRINGS and pass in a literal string instead?
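
For reference, a self-contained version of the catalog idea that runs outside the application. The Storage class below is a minimal stand-in (a dict with attribute access), not the framework's own class; T is a placeholder translator; and the MY_CORE entry and strings_for_module name are made up for the example.

```python
class Storage(dict):
    """Minimal stand-in: a dict whose keys can also be read as attributes."""
    __getattr__ = dict.get


def T(text):
    """Placeholder translator that returns its input unchanged."""
    return text


STRINGS = Storage(
    MY_OCR=Storage(value="My OCR string", module="ocr"),
    MY_CORE=Storage(value="My core string", module="core"),  # hypothetical entry
)


def strings_for_module(catalog, desired_module):
    """Translation-tool side: keep only the strings owned by one module."""
    return [item.value for item in catalog.values()
            if item.module == desired_module]


# Application side: look the string up in the catalog, then translate it
ocr_label = T(STRINGS.MY_OCR.value)

# Translation-tool side: export only what belongs to 'ocr'
ocr_strings = strings_for_module(STRINGS, "ocr")
```

The catalog makes module ownership explicit, but nothing stops a later T("literal") call from bypassing it, which is the consistency concern raised above.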

@nursix
Member

nursix commented Dec 10, 2019

@hallamoore S3PDF is an obsolete module except for the OCR parts (all other PDF codec functionality has long been moved to s3/codecs/pdf.py).

So it's safe to assume that every T() in s3pdf.py belongs to the OCR module, or in fact, that s3pdf.py is the OCR module (and thus in dire need of a re-implementation removing all the PDF codec parts).

@nursix
Member

nursix commented Dec 10, 2019

Or in other words: my recommendation here would be to drop modules/s3/s3pdf.py from "core" entirely, and only include its T's when "ocr" has been selected.
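
A rough sketch of what that could look like as a per-file override, assuming the translation tool works from a list of file paths it treats as "core". FILE_MODULE_OVERRIDES, module_for_file and files_to_scan are hypothetical names, not existing functions in the tool, and modules/s3/other.py is a placeholder path.

```python
# Hypothetical table: files that live alongside core code but whose
# T() strings should be attributed to another module for translation.
FILE_MODULE_OVERRIDES = {
    "modules/s3/s3pdf.py": "ocr",
}


def module_for_file(path, default="core"):
    """Return the module that owns the strings in the given file."""
    return FILE_MODULE_OVERRIDES.get(path, default)


def files_to_scan(core_files, selected_modules, include_core):
    """Filter the core file list down to what the export should scan."""
    selected = set(selected_modules)
    result = []
    for path in core_files:
        owner = module_for_file(path)
        if owner == "core":
            # Ordinary core file: follow the "include core" checkbox
            if include_core:
                result.append(path)
        elif owner in selected:
            # Overridden file: only when its owning module is selected
            result.append(path)
    return result


# Placeholder file list for illustration
CORE_FILES = ["modules/s3/s3pdf.py", "modules/s3/other.py"]

without_ocr = files_to_scan(CORE_FILES, ["org"], include_core=True)
with_ocr = files_to_scan(CORE_FILES, ["ocr"], include_core=True)
```

With this shape no per-string annotation is needed: the whole file follows the 'ocr' selection regardless of the "include core" setting.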
