[trac import 11/03/14] Translation tool should be able to exclude OCR #1239

devinbalkind · 2016-03-06T22:02:23Z

Currently the OCR-related strings are in 'core', so they can't be deselected in the UI and must be removed manually from the xls afterwards.
OCR-related strings should only be visible if the OCR module (or 'All Modules') is enabled/selected.

sajanrav · 2016-11-24T13:49:11Z

This bug seems to be resolved. Can this be closed?

The following steps were performed to verify:

Un-commented OCR module section in /module/templates/default/config.py
Navigated to admin/translate/create?opt=1
OCR was found in list of modules for selection.
Re-commented OCR module section in /module/templates/default/config.py
Navigated to admin/translate/create?opt=1
OCR was not found in list of modules for selection.

flavour · 2016-11-24T13:57:56Z

What I think is still needed is to exclude at least some of modules/s3/s3pdf.py

To test, you need to export a translation file for a new language code (even a dummy one), and you'll probably see some strings which relate to OCR even if the OCR module is disabled.

sajanrav · 2016-11-24T14:53:04Z

Two tests were done with the OCR module disabled. In both, OCR-related strings were present in the exported file when core was included (i.e. "include core" was checked). When core was not included, no OCR strings were found in the exported file.

Another observation was that the OCR-related strings found were independent of the language and module selected. In both tests, the same OCR strings were found in the exported files when core was included.

Test 1:

Navigated to admin/translate/create?opt=1
Selected module - organizations
Selected language code - es
Checked and unchecked "include core" and exported for each instance.

Test 2:

Navigated to admin/translate/create?opt=1
Selected module - hospitals
Selected language code - fr
Checked and unchecked "include core" and exported for each instance.

flavour · 2016-11-24T14:56:37Z

Exactly what I'd expect.
OCR strings in s3pdf are currently included in 'core', but should be included in the 'ocr' module from a translation perspective.

hallamoore · 2019-12-10T02:47:49Z

@flavour do you have any ideas on how to best approach this?

I see that the existing logic to handle special files uses the function names to determine which modules the strings belong to, but it seems like S3PDF mixes OCR and non-OCR in the same functions.

My first instinct is to declare which module a string should belong to on a per-string basis, like declare_string_module(T('My OCR string'), 'ocr'). Then we could use something similar to the existing special parsers, but modified to also get the declared module name. Then we would compare with that instead of with the function name.

One concern I have is that declare_string_module would be a no-op, useful only for the translation tool and not for the code itself.
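To make the per-string idea concrete, here is a minimal sketch. It assumes Python 3's standard ast module rather than the translation tool's existing special parsers, and strings_for_module is a hypothetical helper name; only declare_string_module and T come from the proposal above.

```python
import ast


def declare_string_module(translated, module):
    """No-op at runtime: returns the translated string unchanged.

    The module argument exists only so a static scan can read it.
    """
    return translated


def strings_for_module(source, wanted):
    """Hypothetical helper: collect the literals from calls shaped like
    declare_string_module(T("..."), "<module>") whose module equals `wanted`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Only consider two-argument calls to declare_string_module
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "declare_string_module"
                and len(node.args) == 2):
            continue
        inner, module = node.args
        # The second argument must be the literal module name we want
        if not (isinstance(module, ast.Constant) and module.value == wanted):
            continue
        # The first argument must be T("<string literal>")
        if (isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Name)
                and inner.func.id == "T"
                and inner.args
                and isinstance(inner.args[0], ast.Constant)
                and isinstance(inner.args[0].value, str)):
            found.append(inner.args[0].value)
    return found


# Source text is only parsed, never executed, so T need not be defined here
SAMPLE = '''
a = declare_string_module(T("My OCR string"), "ocr")
b = T("Plain core string")
'''

ocr_strings = strings_for_module(SAMPLE, "ocr")
```

The scan never imports or runs the scanned file, which is why the wrapper can stay a no-op in application code.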

So another possible direction is to pull the strings out into some sort of data structure that also includes the module the string should belong to. Then the translation tool can pull both the strings and the module from that, and the application code can pull just the strings. This might look something like

STRINGS = Storage(
    MY_OCR=Storage(value="My OCR string", module="ocr"),
    ...
    )

# application code
T(STRINGS.MY_OCR.value)

# translation tool
for item in STRINGS.values():
    if item.module == desired_module:
        strings.append(item.value)

which seems okay, but it may be inconsistent with the rest of the code, and it would be easy to forget to use STRINGS instead of passing in a direct string.

nursix · 2019-12-10T07:10:26Z

@hallamoore S3PDF is an obsolete module except for the OCR parts (all other PDF codec functionality has long been moved to s3/codecs/pdf.py).

So it's safe to assume that every T() in s3pdf.py belongs to the OCR module, or in fact, that s3pdf.py is the OCR module (and thus in dire need of a re-implementation removing all the PDF codec parts).

nursix · 2019-12-10T07:19:51Z

Or in other words: my recommendation here would be to drop modules/s3/s3pdf.py from "core" entirely, and only include its T's when "ocr" has been selected.
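A rough sketch of what that file-level rule could look like, purely as an illustration: FILE_MODULE_OVERRIDES, module_for_file and files_for_selection are hypothetical names, not part of the existing translation tool, and modules/s3/example.py is a made-up path standing in for any ordinary core file.

```python
# Hypothetical override table: files whose T() strings should be
# attributed to a module other than 'core'
FILE_MODULE_OVERRIDES = {
    "modules/s3/s3pdf.py": "ocr",
}


def module_for_file(path, default="core"):
    """Return the translation module a file's strings belong to."""
    normalized = path.replace("\\", "/")  # tolerate Windows separators
    return FILE_MODULE_OVERRIDES.get(normalized, default)


def files_for_selection(files, selected):
    """Keep only files whose module is among the selected modules."""
    return [f for f in files if module_for_file(f) in selected]


files = ["modules/s3/s3pdf.py", "modules/s3/example.py"]
core_only = files_for_selection(files, {"core"})
core_and_ocr = files_for_selection(files, {"core", "ocr"})
```

With this shape, exporting with only "include core" checked would skip s3pdf.py entirely, and its strings would only appear once "ocr" is among the selected modules.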

devinbalkind added Minor Admin labels Mar 6, 2016

devinbalkind changed the title ~~[trac 11/03/14] Translation tool should be able to exclude OCR~~ [trac import 11/03/14] Translation tool should be able to exclude OCR Mar 6, 2016

devinbalkind added the Bug label Mar 6, 2016
