

Section: New Results

Computational Humanities and ancient texts

Participants: Daniel Stökl Ben Ezra, Marc Bui.

In collaboration with Jérémie Bosom and Dogu Kaan Eraslan (PhD students (co-)supervised by Marc Bui at EPHE).

Ancient languages of interest: Ancient Egyptian (hieroglyphics, hieratic, demotic), Ancient Greek, Aramaic, Elamite, Biblical Hebrew, Classical Arabic, Hán Nôm (ancient Vietnamese), Old Persian.

Computational approaches in the humanities make it possible to address, in a systematic way, the problems that philologists encounter when reading, analyzing, and archiving old texts. We base our research on algorithms, their implementations, and human expertise in ancient languages to automate these difficult tasks.

In 2017, our research focused on historical documents and manuscripts available as images. Our work program (work in progress) includes:

  • Document layout analysis for ancient manuscripts using computer vision techniques and machine learning

  • Script identification that takes into account the environment in which the trace is located: image, artefact, and noise due to deterioration of the writing medium. By stacking auto-encoding neural networks, our approach provides an alternative representation of the input data.

  • Text recognition (handwritten text recognition) enhanced with LSTM (long short-term memory) networks

  • Palaeographic classification of manuscripts and ancient inscriptions. Classification of historical document images can be addressed through script identification. In that case, our proposed method uses Convolutional Auto-Encoders (CAE) stacked in several layers to obtain fine-grained features and to automatically learn representations of the written or drawn lines of a script.

  • Cross-language information retrieval and information retrieval applied to ancient languages.
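
The idea of stacking auto-encoders mentioned in the list above can be illustrated with a minimal sketch. It is not the implementation used in this work: it swaps the convolutional layers for small dense layers, assumes NumPy is available, and trains on random data standing in for flattened image patches. All function names and parameters (train_autoencoder, stack_autoencoders, encode, the layer sizes) are hypothetical. Each auto-encoder is trained to reconstruct the output of the previous one, and the stacked encoders then yield an alternative, lower-dimensional representation of the input.

```python
import numpy as np


def train_autoencoder(x, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one auto-encoder (tanh encoder, linear decoder) by gradient descent.

    Returns the encoder weights, the encoder bias, and the per-epoch
    mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
    b_dec = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w_enc + b_enc)   # encode
        x_hat = h @ w_dec + b_dec        # decode
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        g_out = 2.0 * err / err.size
        g_w_dec = h.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ w_dec.T) * (1.0 - h ** 2)
        g_w_enc = x.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        w_dec -= lr * g_w_dec
        b_dec -= lr * g_b_dec
        w_enc -= lr * g_w_enc
        b_enc -= lr * g_b_enc
    return w_enc, b_enc, losses


def stack_autoencoders(x, layer_sizes, epochs=200, lr=0.1):
    """Greedy layer-wise training: each layer learns to reconstruct
    the codes produced by the layer below it."""
    layers, history, h = [], [], x
    for i, n_hidden in enumerate(layer_sizes):
        w, b, losses = train_autoencoder(h, n_hidden, epochs, lr, seed=i)
        layers.append((w, b))
        history.append(losses)
        h = np.tanh(h @ w + b)           # input for the next layer
    return layers, history


def encode(x, layers):
    """Map inputs through the stacked encoders to the final representation."""
    h = x
    for w, b in layers:
        h = np.tanh(h @ w + b)
    return h


# Synthetic stand-in for 64 flattened 4x4 grayscale patches (assumption:
# real input would be patches cut from manuscript images).
patches = np.random.default_rng(42).random((64, 16))
layers, history = stack_autoencoders(patches, layer_sizes=[8, 4])
features = encode(patches, layers)
print(features.shape)
```

In a script-identification setting, the resulting features would then be passed to a classifier; that step, like the convolutional architecture itself, is left out of this sketch.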