Thank you for sharing this corpus.
Creating ground truth (GT) is not an easy job. I looked at a random sample of the PAGE files in the pageXmlTranskribusCorrected folders.
I noticed the following problems:
The entire text of a line is encoded at the Word level, as a single Word.
Solution: convert the Word into a line, i.e. move its text to the line level.
Drop capitals are often annotated as Graphic.
Many separators appear to be so-called fake separators and should be corrected.
One wish: Transkribus does not create valid PAGE instances. Of course, annotations such as
<TranskribusMetadata docId="188203" .../> can be commented out,
but:
empty type="" attributes and
empty id="" attributes should be corrected too.
The ALTO files contain very deeply structured data; unfortunately, this information was not carried over when converting to PAGE XML.
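Two of the points above (whole lines stored as a single Word, and empty id/type attributes) could be checked automatically. The following is only a minimal sketch using the Python standard library. The function names are hypothetical, and the rule "a single Word whose text contains whitespace is really a whole line" is my assumption, not something taken from the corpus tooling.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def namespace(root):
    """Return the namespace URI of the root element, or '' if it has none."""
    return root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""


def empty_attributes(root, names=("id", "type")):
    """List (element name, attribute name) pairs whose attribute value is empty."""
    return [
        (local(el.tag), name)
        for el in root.iter()
        for name in names
        if el.get(name) == ""
    ]


def promote_single_words(root):
    """Move the text of a lone multi-word Word up to its TextLine.

    Assumption: a TextLine with exactly one Word whose text contains
    whitespace is a whole line stored at Word level. If the line already
    has its own TextEquiv, that one is kept and the Word is simply dropped.
    Returns the number of lines changed.
    """
    ns = namespace(root)

    def q(name):
        return f"{{{ns}}}{name}" if ns else name

    changed = 0
    # Materialise the list first, because the tree is modified in the loop.
    for line in list(root.iter(q("TextLine"))):
        words = line.findall(q("Word"))
        if len(words) != 1:
            continue
        word = words[0]
        word_equiv = word.find(q("TextEquiv"))
        unicode_el = word_equiv.find(q("Unicode")) if word_equiv is not None else None
        text = (unicode_el.text or "") if unicode_el is not None else ""
        if not any(ch.isspace() for ch in text):
            continue  # a genuine one-word line, leave it alone
        if line.find(q("TextEquiv")) is None:
            # Put the TextEquiv where the Word was, to keep the element order.
            line.insert(list(line).index(word), word_equiv)
        line.remove(word)
        changed += 1
    return changed
```

A file could then be processed with `ET.parse(path)`, the two functions applied to `tree.getroot()`, and the result written back with `tree.write(...)`. The output should still be validated against the PAGE schema, since this sketch does not do that.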
I would be very happy to help you improve the data, as far as I am able.
Thanks again for everything