Consider this fairly minimal document, which AFAIK is the recommended way of typesetting Devanagari-script Sanskrit-language text:
```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{sanskrit}
\newfontfamily\devanagarifont[Script=Devanagari]{Chandas}
\begin{document}
किंबहुना।परस्परंद्वैधम्उत्पन्नम्।
\end{document}
```
When I typeset this, copying the text from the PDF gives incorrect results every time, even when the output looks visually fine. I've tried both `xelatex` and `lualatex`, with four fonts, all freely available online: Chandas, Noto Sans Devanagari, Noto Serif Devanagari and Adishila.
Correct text:
- किंबहुना।परस्परंद्वैधम्उत्पन्नम्।
`xelatex`:
- कंबहुना।परɕपरंजैधम्उɊपਯम्। (Chandas)
- ɫकʌबहुना।परȺरंद्वैधम्उत्पȡम्। (Noto Sans Devanagari)
- ȫकबहुना।परस्परंद्वैधम्उत्पन्नम्। (Noto Serif Devanagari)
- िकंबहुना।परस्परंद्वैधम्उत्पन्नम्। (Adishila)
`lualatex`:
- िकंबहुना।पर�परंद्वैधम्उ�पन्नम्। (Chandas)
- िकंबहुना।परस्परंद्वैधम्उत्पन्नम्। (Noto Sans Devanagari; the typeset output is also broken)
- िकंबहुना।परस्परंद्वैधम्उत्पन्नम्। (Noto Serif Devanagari; the typeset output is also broken)
- िकंबzना।परस्परंद्वैधम्उत्पन्नम्। (Adishila)
So none of these are correct, though for some combinations, only the first syllable was problematic. (It doesn't matter that it's the first syllable; किं anywhere has the same issue.)
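To narrow this down, a minimal test file that typesets only the problem syllable किं once in each of the four fonts might help, so that a single copy from one PDF shows which fonts misbehave. This is only a sketch: the `\test…` command names are hypothetical, and it assumes the fonts are installed under exactly these names (the Adishila family name in particular may differ on a given system).

```latex
% Hypothetical test file: typesets the syllable किं once per font,
% in the order Chandas, Noto Sans, Noto Serif, Adishila.
% Assumes all four fonts are installed under these exact family names.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{sanskrit}
\newfontfamily\devanagarifont[Script=Devanagari]{Chandas}
% One family per font under test (command names are illustrative only)
\newfontfamily\testchandas[Script=Devanagari]{Chandas}
\newfontfamily\testnotosans[Script=Devanagari]{Noto Sans Devanagari}
\newfontfamily\testnotoserif[Script=Devanagari]{Noto Serif Devanagari}
\newfontfamily\testadishila[Script=Devanagari]{Adishila}
\begin{document}
{\testchandas किं}\par
{\testnotosans किं}\par
{\testnotoserif किं}\par
{\testadishila किं}\par
\end{document}
```

Compiling this once with `xelatex` and once with `lualatex` and pasting the copied text would give a four-line comparison per engine.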
(Aside: this was with TeX Live 2020, so `lualatex` uses LuaHBTeX… yet, unlike `xelatex`, its typeset output is broken for two of the fonts.)
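One thing possibly worth checking: as far as I know, luaotfload shapes text with its own Lua-based "node" renderer by default, even when running on LuaHBTeX, and HarfBuzz has to be requested explicitly. A sketch of that variant, assuming a fontspec version recent enough to accept the `Renderer=HarfBuzz` option (I have not confirmed that it changes either the rendering or the copied text):

```latex
% Sketch for lualatex only: request HarfBuzz shaping instead of the
% default node-mode renderer. Assumes LuaHBTeX (TeX Live 2020) and a
% fontspec/luaotfload that support Renderer=HarfBuzz.
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{sanskrit}
\newfontfamily\devanagarifont[Script=Devanagari, Renderer=HarfBuzz]{Noto Sans Devanagari}
\begin{document}
किंबहुना।परस्परंद्वैधम्उत्पन्नम्।
\end{document}
```

If the Noto rendering is fixed this way, the copy/paste behaviour could then be compared directly with the `xelatex` results above.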
Is there a way of getting the correct text to be copied?
I also tried wrapping every word using the `accsupp` package, like `\BeginAccSupp{ActualText=किं}किं\EndAccSupp{}` and so on, but that results in complete gibberish.
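For reference, here is the same `accsupp` approach packaged as a macro, so each word only has to be written once; `\copyable` is a hypothetical name. It switches `accsupp` to `method=pdfstringdef`, which routes the ActualText through hyperref's `\pdfstringdef` instead of accsupp's default conversion. Whether this yields correct Devanagari ActualText under `xelatex` or `lualatex` is an untested assumption, not a known fix.

```latex
% Hypothetical helper: wraps its argument in an ActualText span.
% method=pdfstringdef needs hyperref's \pdfstringdef; the unicode option
% asks hyperref for Unicode PDF strings. Untested for Devanagari.
\documentclass{article}
\usepackage{fontspec}
\usepackage[unicode]{hyperref}  % provides \pdfstringdef
\usepackage{accsupp}
\usepackage{polyglossia}
\setmainlanguage{sanskrit}
\newfontfamily\devanagarifont[Script=Devanagari]{Chandas}
% Typeset #1 normally, but attach #1 as the text to use when copying
\newcommand*{\copyable}[1]{%
  \BeginAccSupp{method=pdfstringdef,ActualText={#1}}#1\EndAccSupp{}}
\begin{document}
\copyable{किं}\copyable{बहुना}।
\end{document}
```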