Re: Problem defining \begin{CJK} . . . \end{CJK} in a macro



ipsi/Andrew wrote:

I define the following Macro:

\newcommand{\HelloGrandma}{%
\begin{CJK}{UTF8}{song}%
你好奶奶%
\end{CJK}%
}%
....
I'm utterly confused by this.

Now, I eventually realised that I could put the \begin{CJK} immediately
after the \begin{document}, and similar for \end, and everything works,
but I'd like to know *why* this happens.

The problem is about category-codes.

If you intend to make intensive use of the CJK-package, you need
to know about category-codes. So I'll try to elaborate a bit:

When (La)TeX reads a line of text from a file, one of the first
things that happens is that everything is transformed into
so-called "tokens".
A token can be, e.g., a control-sequence or a character/category-
code-pair.
The process of transforming input into tokens while reading it
is called "tokenizing".
When (La)TeX reads a character from the input-file and forms a
token from it, the resulting token will be a character with a
specific category-code. There are sixteen category-codes (0 to 15),
e.g., catcode 11 means that the character is an ordinary letter.
Catcode 13 means that the character is "active", i.e., it is to be
treated like a control-sequence.
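
To see which category-codes are currently assigned, you can let
(La)TeX typeset them. The following is only a minimal sketch for a
plain article-document with LaTeX's default settings; the values in
the comments are those defaults:

\documentclass{article}
\begin{document}
% \the\catcode`\X typesets the current category-code of character X
Letter A: \the\catcode`\A  % 11 (letter)

Tilde: \the\catcode`\~  % 13 (active)

Backslash: \the\catcode`\\  % 0 (escape character)

Percent sign: \the\catcode`\%  % 14 (comment character)
\end{document}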

You can, e.g., write:

\catcode`\A=13\relax  -> now "A" is to be treated like a macro
                         when it is read from input.
\defA{test}           -> as A is to be treated like a macro,
                         you can use it as macro-name-argument
                         when defining...
A                     -> this yields the phrase "test"
\catcode`\A=11\relax  -> now "A" is to be treated like a letter
                         when it is read from input.
A                     -> this yields the letter "A"
\catcode`\A=13\relax  -> now "A" is to be treated like a macro
                         when it is read from input.
A                     -> this yields the phrase "test" as the
                         macro/catcode13-A is already defined.
\catcode`\A=11\relax  -> now "A" is to be treated like a letter
                         when it is read from input.
A                     -> this yields the letter "A"
...

Be aware that I always added the phrase "when it is read from
input." I did so because reading input and macro-defining/
expansion are different concepts.

If I write e.g.,

\catcode`\A=11\relax
\def\macrowitha{A}
\catcode`\A=13\relax
\defA{test}
\macrowitha

the last line will yield just the letter "A". Defining the
macro "\macrowitha" took place while A had letter-catcode, so
the definition of \macrowitha contains a letter-A-token and it
will always expand to a letter-A-token, no matter how future
"A"-characters that are read from the input-file get tokenized.
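
If you want to try this yourself, here is a small compilable sketch
of the same experiment. The group (\begingroup ... \endgroup) is my
addition; it only serves to undo the catcode change and the
definition of the active "A" afterwards:

\documentclass{article}
\begin{document}
\def\macrowitha{A}% "A" is tokenized here as a letter (catcode 11)
\begingroup
\catcode`\A=13\relax % from now on, "A" read from input is active
\defA{test}% define the active character "A"
A % typesets "test": this "A" is read from input while active
\macrowitha % typesets "A": the stored token is still a letter
\endgroup
\end{document}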

The CJK-package also takes advantage of changing category-codes:

Within the CJK-environment category-codes are changed so that
characters, while being read from the input-file, are treated like
macros from which these nice Chinese/Japanese/Korean/Whatsoever
letters get created.
That means: "\begin{CJK}..." is just a directive for changing
            how category-codes get assigned to characters when
            future input is read/tokenized from the text-file.
            "\end{CJK}" is there for undoing the change/for
            restoring "the old way" of assigning catcodes to
            characters when reading input-files.

The key point here is: Putting these directives into a macro-
definition does not "execute" them. These directives do not get
executed when defining takes place but when the defined macro
gets expanded.
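
The same effect can be observed without CJK. The following sketch
uses the tilde, which is an active character (catcode 13) in LaTeX by
default; the macro name \tildeinmacro is made up for this
illustration only:

\documentclass{article}
\begin{document}
% Direct input: the catcode change is executed before "~" is
% tokenized, so "~" becomes an ordinary character and a tilde
% glyph is typeset between the two x.
x{\catcode`\~=12 ~}x

% Inside a macro: "~" was tokenized as an active character when the
% definition was read; the \catcode change that is executed at
% expansion time cannot alter the token that is already stored, so
% a non-breaking space is typeset between the two x.
\newcommand\tildeinmacro{{\catcode`\~=12 ~}}
x\tildeinmacro x
\end{document}

The catcode-directive inside \tildeinmacro comes too late for the
"~" in the same way as \begin{CJK} inside \HelloGrandma comes too
late for the Chinese characters.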

So writing something like

1 \newcommand{\HelloGrandma}{%
2 \begin{CJK}{UTF8}{song}%
3 你好奶奶%
4 \end{CJK}%
5 }%

yields:

1 - Define a new macro \HelloGrandma to do the following:
2 - Start a CJK-environment
3 - Write some tokens "你好奶奶"
4 - End the CJK-environment.

The crucial point is: The tokens of line 3 get tokenized
according to the rules/catcode-settings which are valid when
_defining_ takes place, i.e., when the definition-text is read
from the input-file. If defining does not itself take place
within a CJK-environment, the likelihood of unexpected results
is very high.

Expanding \HelloGrandma yields:

1. Spit out the tokens: \begin{CJK}{UTF8}{song}
2. Spit out the tokens: 你好奶奶
3. Spit out the tokens: \end{CJK}

1. means: Change directives for tokenizing characters
          that get read from input-file.
2. means: Some tokens that do not come from consecutive
          input-file-reading but do come from expanding
          a macro the definition of which was tokenized
          long ago, not according to CJK-rules.
3. means: Reset directives for tokenizing characters
          that get read from input-file.

So - if you want the correct result - you have to take care
that tokenizing the CJK-characters (item 2 above, line 3 of the
definition) also takes place according to the CJK-directives.

In order to get that, it is sufficient to let defining
take place within a CJK-environment as well.
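
In its simplest form this is the workaround you found yourself: the
CJK-environment encloses both the definition and the use of the
macro. A minimal sketch, assuming that the file is saved UTF-8-
encoded and that the font "song" from your posting is installed:

\documentclass{article}
\usepackage{CJK}
\begin{document}
\begin{CJK}{UTF8}{song}%
% The definition-text is read while the CJK-catcodes are in effect.
\newcommand{\HelloGrandma}{你好奶奶}%
Dear Grandma \HelloGrandma
\end{CJK}
\end{document}

Be aware that the environment forms a group, so a macro defined this
way is gone after \end{CJK}.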

Another point is that you might wish to make \HelloGrandma
robust 8-) . Usually stuff that gets written to file (e.g., toc)
is fully expanded before writing actually takes place.
With CJK a problem arises:
Stuff from within CJK gets fully expanded and characters get
written to the toc-file. In the next run the toc-file is read and
the characters get tokenized with the wrong catcodes while reading
that file, and thus they no longer yield Chinese/Japanese/Korean/
Whatsoever-stuff in the table of contents. You can prevent
this full expansion by LaTeX's \protect- and robustness-
mechanisms. If you declare a robust command, it will no longer be
fully expanded when writing to the toc-file etc. takes place;
instead it will be written to the file as is, unexpanded.

If you define e.g. \newcommand\test{bla bla} and write
\section{\test}, you will find in the toc-file the phrase/
the characters "bla bla".
If defining \test and calling \section took place within a
CJK-environment, the section-head in the text might contain
Chinese, while the table of contents, which is not read within
a CJK-environment, would contain some unexpected/weird
character-conglomerate.

If you write \DeclareRobustCommand\test{bla bla} and write
\section{\test}, you will find in the toc-file the token
"\test".
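
You can compare both cases in one small document. This is only a
sketch; the names \fragiletest and \robusttest are made up for the
illustration. After a LaTeX-run, have a look at the toc-file:

\documentclass{article}
% ordinary macro: gets expanded when the toc-entry is written
\newcommand\fragiletest{bla bla}
% robust macro: gets written to the toc-file unexpanded
\DeclareRobustCommand\robusttest{bla bla}
\begin{document}
\tableofcontents
\section{\fragiletest}% toc-entry contains the characters "bla bla"
\section{\robusttest}% toc-entry contains the token "\robusttest"
\end{document}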

Ulrich


\documentclass{article}
\usepackage{CJK}

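% \DeclareRobustCommand\foo defines two macros: \foo itself and an
% internal one whose name is "foo " (with a trailing space). Inside
% an environment both are local to the group, so \globalrobust
% re-assigns both of them globally.
\newcommand*\globalrobust[1]{%
  {%
    \escapechar=-1\relax % \string#1 now yields the name without backslash
    \expandafter\global
    \expandafter\let\csname\string#1 \expandafter
    \endcsname\csname\string#1 \endcsname
    \global\let#1#1%
  }%
}%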

\begin{CJK}{UTF8}{song}%
\DeclareRobustCommand*\HelloGrandma{%
\begin{CJK}{UTF8}{song}%
你好奶奶%
\end{CJK}%
}%
\globalrobust\HelloGrandma
\end{CJK}%


\begin{document}
\tableofcontents

\section{Dear Grandma \HelloGrandma}

Dear Grandma \HelloGrandma

\end{document}

