Channel: Active questions tagged copy-paste - TeX - LaTeX Stack Exchange

LuaHBTeX: \copy contents behave differently from \box on copy-paste, either turning reordered glyphs into tofu 􀈽 or adding spurious spaces

Here's an example of a TeX \vbox with some Devanagari text in it. My code adds the contents of the \vbox to the document twice: first a copy of it (using \copy), and then the box itself (using \box). Both look the same when typeset in the PDF, which is great. But when I copy-paste the text from the PDF,[1] the text added by \copy differs from the text added by \box. The former has some glyphs turned into tofu 􀈽; it looks like the only glyph affected is the reordered Devanagari vowel sign ि, i.e. U+093F. The latter has no tofu, but adds a spurious space after each consonant-vowel pair whose vowel sign is ि.

[1] I used Adobe Reader because it seems to be the only PDF reader that fully supports the ActualText feature of the PDF format, which the HarfBuzz renderer relies on.
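For reference, the two primitives differ only in what happens to the register afterwards: \copy appends a copy of the box and leaves the register untouched, while \box appends the box itself and leaves the register void. A minimal sketch of that difference (the register name \demobox is hypothetical, chosen only for illustration):

```latex
\documentclass{article}
\begin{document}
\newbox\demobox % hypothetical register, for illustration only
\setbox\demobox=\hbox{demo}

% \copy appends a copy; the register still holds the box afterwards
Copy: \copy\demobox\ \ifvoid\demobox void\else not void\fi % typesets "Copy: demo not void"

% \box appends the box itself and empties the register
Box: \box\demobox\ \ifvoid\demobox void\else not void\fi % typesets "Box: demo void"
\end{document}
```

This only illustrates the register semantics of the two primitives; it does not by itself explain why the PDF text layer differs between them.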

Code:

```latex
% >> lualatex copyhb.tex
\documentclass{article}
\pagenumbering{gobble}
\usepackage{fontspec}
\newfontfamily\devanagarifont[Script=Devanagari, Renderer=HarfBuzz, Ligatures=TeX]{Noto Sans Devanagari}
\begin{document}
\newbox\tempbox
\setbox\tempbox=\vbox{{\devanagarifont
  किं बहुना। परस्परं द्वैधम् उत्पन्नम्। पिताजी ज्योतिषी सुकरात\endgraf}}
Box copy: \copy\tempbox
Box contents: \box\tempbox
\end{document}
```

Screenshot: [screenshot of the PDF output]

Result of copy-paste:

Box copy: 􀈽कंबहुना।परस्परंद्वैधम्उत्पन्नम्।􀈱पताजीज्यो􀈱तषीसुकरात

Box contents: किंबहुना।परस्परंद्वैधम्उत्पन्नम्।पिताजीज्योतिषीसुकरात

As can be seen, the three tofu characters in the "Box copy" text (􀈽 once, 􀈱 twice) stand in for the vowel sign ि of the consonant-vowel pairs कि, पि, and ति. Note also that this tofu substitution swallows the word space, turning किं बहुना into the single word 􀈽कंबहुना. The "Box contents" text has the correct characters, but has extra spaces after the same consonant-vowel pairs कि, पि, and ति; fortunately, it does not swallow the space between the words किं and बहुना. So, in my opinion, this copy-paste problem should be resolved if (1) the \box output loses the extra space after these consonant-vowel pairs, and (2) \copy writes the same content to the PDF as \box does.
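To confirm exactly which code points end up in the clipboard, the pasted text can be dumped one code point at a time. A minimal sketch, assuming LuaLaTeX with the luacode package, a LuaTeX recent enough to embed Lua 5.3 (for `utf8.codes`), and that the pasted sample is saved into the source file as UTF-8; the file name codepoints.tex is hypothetical. For the source cluster किं it writes U+0915 U+093F U+0902 (क, ि, ं in logical order), so any private-use code point in the pasted sample stands out against it.

```latex
% >> lualatex codepoints.tex   (hypothetical file name)
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
  -- Samples to inspect: the cluster as typed in the source, and the
  -- start of the "Box copy" line as pasted from the PDF (UTF-8).
  local samples = { "किं", "􀈽कं" }
  for _, s in ipairs(samples) do
    local cps = {}
    -- utf8.codes (Lua 5.3) yields each code point as an integer
    for _, cp in utf8.codes(s) do
      cps[#cps + 1] = string.format("U+%04X", cp)
    end
    -- Write "sample: U+.... U+...." to the terminal and the .log file
    texio.write_nl(s .. ": " .. table.concat(cps, " "))
  end
\end{luacode*}
\begin{document}
\end{document}
```

The document body is left empty on purpose, so LaTeX will only report "No pages of output"; the results are in the terminal output and the .log file.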

Side note on Renderer=Node: the tofu and spurious-space problems do not occur with the Node renderer, though it has its own known problem: in the copy-pasted text, ि and the consonant of its consonant-vowel pair come out in swapped order.
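The comparison in the side note can be reproduced by changing only the Renderer option in the MWE. A sketch, assuming the same font and text as in the question (copynode.tex is a hypothetical file name):

```latex
% >> lualatex copynode.tex   (hypothetical file name)
\documentclass{article}
\pagenumbering{gobble}
\usepackage{fontspec}
% Same as the HarfBuzz MWE, except Renderer=Node (luaotfload's node mode)
\newfontfamily\devanagarifont[Script=Devanagari, Renderer=Node, Ligatures=TeX]{Noto Sans Devanagari}
\begin{document}
\newbox\tempbox
\setbox\tempbox=\vbox{{\devanagarifont
  किं बहुना। परस्परं द्वैधम् उत्पन्नम्। पिताजी ज्योतिषी सुकरात\endgraf}}
Box copy: \copy\tempbox
Box contents: \box\tempbox
\end{document}
```

Copy-pasting both lines from this PDF and from the HarfBuzz one, in the same viewer, isolates the renderer as the only variable.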

