Question

Based on my code, I assume each Greek character is stored in 2 bytes. sizeof returns 4 for each character (i.e. sizeof(int)).

How does strlen return 16? That makes me think each character occupies 2 bytes. Shouldn't it be 4*8 = 32, since strlen counts the number of bytes?

Also, how does printf("%c",bigString[i]); print each character properly? Because of %c, shouldn't it read 1 byte (a char) and display that? Why is the Greek character not split in this case?

    strcpy(bigString,"ειδικούς");//greek
    sLen = strlen(bigString);
    printf("Size is %d\n ",sizeof('ε')); //printing for each character similarly
    printf("%s is of length %d\n",bigString,sLen);
    int k1 = 0 ,k2 = sLen - 2;

    for(i=0;i<sLen;i++)
        printf("%c",bigString[i]);

Output:

    Size is 4
     ειδικούς is of length 16
    ειδικούς

Solution

  1. Character literals in C have type int, so sizeof('ε') is the same as sizeof(int). That statement is a bit risky, though. In a UTF-8 source file 'ε' is two bytes, so it becomes a multicharacter literal, whose value is implementation-defined and therefore not portable. Be careful about relying on behaviour like this. Clang, for example, won't accept this program with that literal in it. GCC gives a warning, but will still compile it.

  2. strlen returns 16, since that's the number of bytes in your string before the null terminator. Each of your Greek characters is 2 bytes (16 bits) long in UTF-8, and there are 8 of them, so 8 * 2 = 16. Your string looks something like:

    c0c0 c1c1 c2c2 c3c3 c4c4 c5c5 c6c6 c7c7 0
    

    in memory, where c0c0, for example, is the two bytes of the first character. There is a single null-termination byte in your string.

  3. The printf appears to work because your terminal is UTF-8 aware. You are printing each byte separately, but the terminal is interpreting the first two prints as a single character, and so on. If you change that printf call to:

    printf("%d: %02x\n", i, (unsigned char)bigString[i]);
    

    You'll see the byte-by-byte behaviour you're expecting.
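
The points above can be checked with a small sketch. The helper names `char_literal_size` and `utf8_code_points` are made up for illustration, and the word is written out as explicit UTF-8 bytes on the assumption that the original source file was saved as UTF-8. The counter also assumes its input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The word from the question spelled out as UTF-8 bytes, so the example
   does not depend on the encoding of this source file (assumption: the
   original file was UTF-8). Each Greek letter here takes two bytes. */
static const char greek_word[] =
    "\xce\xb5\xce\xb9\xce\xb4\xce\xb9\xce\xba\xce\xbf\xcf\x8d\xcf\x82";

/* Hypothetical helper: in C a character literal has type int, so this
   returns sizeof(int), not 1. (In C++ the result would differ.) */
size_t char_literal_size(void)
{
    return sizeof('a');
}

/* Hypothetical helper: count code points in a valid UTF-8 string by
   skipping continuation bytes, which always have the form 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With these definitions, strlen(greek_word) gives the byte count (16), while utf8_code_points(greek_word) gives the number of characters a reader would count (8).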

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow