Character sets Flashcards

1
Q

What is ASCII?

A

It’s an agreed-upon mapping between characters and the numbers 0 to 127. The characters covered are the English alphabet (upper and lower case), the digits, punctuation and symbols such as ! or &, and a set of control characters.

Each number fits in 7 bits, but in practice each character is stored as a pattern of 8 0s and 1s (bits) with the leading bit set to 0. Remember that 8 bits is a byte, meaning each ASCII character takes 1 byte.
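As a minimal sketch of that mapping, the Python snippet below looks up the number for a few characters and prints each one as an 8-bit pattern. The sample characters and variable names are chosen only for illustration.

```python
# Look up the ASCII number for a few sample characters.
chars = "A!&"
codes = [ord(ch) for ch in chars]

# Show each number as an 8-bit pattern (one byte per character).
bits = [format(code, "08b") for code in codes]

for ch, code, pattern in zip(chars, codes, bits):
    print(ch, code, pattern)

# Encoding ASCII text yields exactly one byte per character.
encoded = "Hi!".encode("ascii")
print(len(encoded))
```

Every pattern starts with 0 because ASCII only uses the values 0 to 127.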

2
Q

What is Unicode?

A

It’s a standard meant to encompass the characters of all the world’s writing systems, well over a hundred thousand of them, including accented letters and symbols such as emojis.

The characters as a reader perceives them are called graphemes.

Graphemes are represented by code points. Each grapheme is made up of one or more code points. Each code point has a numeric value (usually written like U+00E9) and a name.
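To make the grapheme versus code point distinction concrete, here is a small Python sketch using the standard `unicodedata` module. The helper `describe` is a hypothetical name introduced for this example. It lists the numeric value and name of every code point in a string, for two ways of writing the same grapheme "é".

```python
import unicodedata


def describe(text):
    """Return (numeric value, name) for every code point in text."""
    return [(f"U+{ord(cp):04X}", unicodedata.name(cp)) for cp in text]


single = "\u00e9"     # é stored as one precomposed code point
combined = "e\u0301"  # e followed by a combining acute accent

print(describe(single))
print(describe(combined))

# Both strings display as the same grapheme; NFC normalization
# turns the two-code-point form into the one-code-point form.
same_after_nfc = unicodedata.normalize("NFC", combined) == single
print(same_after_nfc)
```

The two strings look identical on screen, yet one holds a single code point and the other holds two.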

3
Q

What is a grapheme?

A

A single unit of a human writing system, such as a letter, a Chinese character, or an emoji.

4
Q

Explain the problem with the string manipulation libraries of many programming languages, including JavaScript

A

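Take string.length as an example. Many languages measure the number of bytes or fixed-size code units in the string rather than the number of graphemes (JavaScript counts UTF-16 code units). That gives the right answer for ASCII, where one character is one byte, but not for Unicode in general. In UTF-8 a code point takes 1 to 4 bytes, in UTF-16 it takes 1 or 2 code units, and a single grapheme can be made of multiple code points. So the reported length can be larger than the number of characters a reader sees.

Functions like this are called Unicode-unaware functions.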
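The mismatch can be sketched in Python, used here only for illustration. The helper `utf16_code_units` is a hypothetical name that mimics what JavaScript’s string.length reports by counting 2-byte UTF-16 code units. Each sample below is a single grapheme, except the plain ASCII word.

```python
def utf16_code_units(text):
    """Count UTF-16 code units, the quantity JavaScript's .length reports."""
    return len(text.encode("utf-16-le")) // 2


samples = {
    "ascii": "hello",       # 5 graphemes, all ASCII
    "accent": "e\u0301",    # 1 grapheme: e plus a combining acute accent
    "emoji": "\U0001F600",  # 1 grapheme: grinning face emoji
}

# For each sample: code points, UTF-8 bytes, UTF-16 code units.
counts = {
    label: (len(text), len(text.encode("utf-8")), utf16_code_units(text))
    for label, text in samples.items()
}

for label, (code_points, utf8_bytes, utf16_units) in counts.items():
    print(label, code_points, utf8_bytes, utf16_units)
```

Only the ASCII word gives the same number in every column. For the other two samples, none of the three counts reliably equals the one grapheme a reader sees, which is why a grapheme-aware function is needed to count user-visible characters.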
