CS50 Basics Flashcards
(32 cards)
What are Bits?
Binary digits: the smallest units of data, each of which is either a 1 or a 0.
What is a Byte?
A binary string of eight bits (1’s and 0’s)
What is positional notation?
A system of expressing numbers in which the digits are arranged in succession, the position of each digit has a place value, and the number is equal to the sum of the products of each digit by its place value.
Deconstruct 123-
100 10 1 (PLACE VALUE INDEX)
1 2 3 (DIGITS)
100x1 + 10x2 + 1x3 (DIGITS MULTIPLIED BY PLACE VALUE TO FIND PRODUCTS)
100 + 20 + 3 (FIND THE SUM OF THE PRODUCTS)
123 (EXPRESSED NUMBER)
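The deconstruction above can be sketched in Python. This is only an illustration: the function name place_value_terms is made up for this example, and it assumes a non-negative whole number.

```python
def place_value_terms(number, base=10):
    """Return (digit, place_value) pairs, most significant digit first."""
    if number == 0:
        return [(0, 1)]
    terms = []
    place = 1
    while number > 0:
        # divmod peels off the rightmost digit in the given base
        number, digit = divmod(number, base)
        terms.append((digit, place))
        place *= base
    return list(reversed(terms))


terms = place_value_terms(123)
print(terms)                                        # [(1, 100), (2, 10), (3, 1)]
print(sum(digit * place for digit, place in terms)) # 123
```

Summing digit x place value for every pair gives back the original number, which is the definition of positional notation.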
What is a Base-10 System?
There are 10 symbols we use for all of our numbers (0-9). Through positional notation, we reuse these same symbols repeatedly to represent any number that there is, instead of continuously making new symbols for every number.
When you reach a value of 10 (or any higher power of 10), you need to add another digit to the left of the current digits. Each new digit has a place value 10 times greater than the digit to its right because there are 10 symbols.
Base-10 System is also called the decimal system and is the system we typically use.
What is the Decimal System?
A number system in which each number is expressed in base 10 by using one of the ten digits (0-9) in each place, and each place value is a power of 10.
100000 10000 1000 100 10 1 (PLACE VALUE INDEX)
10^5 10^4 10^3 10^2 10^1 10^0 (POWERS OF 10)
6 5 4 3 2 1 (DIGITS, SIXTH DIGIT ON THE LEFT, FIRST DIGIT ON THE RIGHT)
6x100000 + 5x10000 + 4x1000 + 3x100 + 2x10 + 1x1 (DIGITS MULTIPLIED BY PLACE VALUE TO FIND PRODUCTS)
600000 + 50000 + 4000 + 300 + 20 + 1 (FIND THE SUM OF THE PRODUCTS)
654321 (EXPRESSED NUMBER)
Six hundred fifty-four thousand three hundred twenty-one
What is the Binary System?
A number system that uses a notation in which each number is expressed in base 2 by using 0 or 1 in each place, and each place value is a power of 2.
When you reach a value of 2 (or any higher power of 2), you need to add another digit to the left of the current digits. Because there are two symbols, each new digit has a place value 2 times greater than the digit to its right.
Counting in Binary:
0 = 0       1 = 1       10 = 2
11 = 3      100 = 4     101 = 5
110 = 6     111 = 7     1000 = 8
1001 = 9    1010 = 10   1011 = 11
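8 4 2 1 (PLACE VALUE INDEX)
2^3 2^2 2^1 2^0 (POWERS OF 2)
1x2=2 2x2=4 4x2=8 (EACH PLACE VALUE IS 2x THE ONE TO ITS RIGHT)
Place values   Bits      Decimal
1              0         0
1              1         1
2 1            1 0       2
2 1            1 1       3
4 2 1          1 0 0     4
4 2 1          1 0 1     5
4 2 1          1 1 0     6
4 2 1          1 1 1     7
8 4 2 1        1 0 0 0   8
Examples: 0 is 1x0=0. 1 is 1x1=1. 10 is 2x1=2 and 1x0=0, so 2+0=2.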
By multiplying each bit by its place value, then adding the results together, you get the decimal value of the binary number. In the chart above, we counted from 0 to 7 using at most three bits. To reach 8, you have to add another bit on the left, as we did. The new bit has a place value of 8, which is 2^3.
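A minimal Python sketch of the same counting procedure follows. The helper name binary_to_decimal is an assumption made for illustration. Python's built-in format(n, "b") is used only to produce the bit strings.

```python
def binary_to_decimal(bits):
    """Add up bit x place value, where place values are powers of 2."""
    total = 0
    # The rightmost bit has place value 2**0, the next 2**1, and so on.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


for n in range(9):
    bits = format(n, "b")  # built-in conversion to a binary string
    print(bits, "=", binary_to_decimal(bits))
```

The last line printed is 1000 = 8, the first value that needs a fourth bit.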
Any number raised to the power of 1 is….
Any number raised to the power of one is still itself.
10^1=10
225^1= 225
9^1=9
Any number raised to the power of 0 is…
Any nonzero number raised to the power of zero is 1.
10^0=1
225^0=1
9^0=1
18654234^0=1
What is an Alphanumeric System?
Alphanumeric systems use letters to count past 9, so 10 = A, 11 = B, 12 = C, etc. Base-36 uses the symbols 0-9, followed by the 26 capital letters of the alphabet.
The Base-62 system uses 0-9, followed by the capital letters of the alphabet, followed by the lower-case letters of the alphabet.
Both Base-36 and Base-62 use positional notation in the same way as the decimal and binary systems. They simply have many more symbols available for each digit.
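As a rough sketch, the same repeated-division idea works for any base up to 62. The names SYMBOLS and to_base are hypothetical, and the symbol order (digits, capitals, lower-case) follows the description above.

```python
import string

# 0-9, then A-Z, then a-z: 62 symbols in total
SYMBOLS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def to_base(number, base):
    """Write a non-negative integer in the given base (2 to 62)."""
    if not 2 <= base <= len(SYMBOLS):
        raise ValueError("base must be between 2 and 62")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(SYMBOLS[remainder])
    return "".join(reversed(digits))


print(to_base(10, 36))  # A
print(to_base(35, 36))  # Z
print(to_base(36, 36))  # 10
print(to_base(61, 62))  # z
```

For bases up to 36, Python's built-in int(text, base) performs the reverse conversion, so int("Z", 36) gives 35.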
What is ASCII?
ASCII is an encoding strategy that maps basic western characters to numbers between 0 and 127 (128 characters).
String (sequence to encode): H e l l o
First, find the ASCII value for each letter: 72 101 108 108 111
Next, convert each ASCII value to binary: 01001000 01100101 01101100 01101100 01101111
Each character becomes 8 bits, or 1 byte of binary data.
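The two steps can be reproduced in Python with the built-in ord and format functions. The example assumes the string is "Hello" with a capital H and lower-case remaining letters, since those are the characters that the values 72 101 108 108 111 correspond to.

```python
text = "Hello"

# Step 1: look up the numeric value of each character
codes = [ord(ch) for ch in text]
print(codes)  # [72, 101, 108, 108, 111]

# Step 2: write each value as 8 binary digits (one byte)
bits = [format(code, "08b") for code in codes]
print(" ".join(bits))
# 01001000 01100101 01101100 01101100 01101111
```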
What is a string?
A sequence of characters.
What is Unicode?
The Unicode Standard assigns a unique number, called a code point, to each character. It was created to cover over 100 languages (over 100,000 characters).
Can you define encoding?
The process of transforming numerical values, or code points, to their binary representation (1’s and 0’s).
What is a grapheme, and what are they made up of?
A grapheme is a single unit of a human writing system, such as the letter “d” or a Chinese character. Words are not graphemes because they can be broken down into letters.
A grapheme is represented by one or more code points. If you have a list of code points, you can transform them into their binary representation.
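A short Python sketch shows one grapheme built from more than one code point. Python's len counts code points, not graphemes. The variable names are made up for this example.

```python
import unicodedata

precomposed = "\u00e9"   # é stored as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent

print(len(precomposed))  # 1
print(len(decomposed))   # 2, although it displays as one grapheme
# NFC normalization combines the pair into the single code point
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# A thumbs-up emoji followed by a skin-tone modifier: one grapheme,
# two code points
thumbs_up = "\U0001F44D\U0001F3FE"
code_points = [ord(ch) for ch in thumbs_up]
print(code_points)  # [128077, 127998]
```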
How many bits is an ASCII value encoded down to?
8 bits, aka 1 byte.
What are two common encoding strategies for Unicode?
UTF-32 and UTF-8.
What is UTF-32?
UTF-32 takes each code point value (base 10) and converts it to binary using exactly 4 bytes (32 bits, hence the name UTF-32). ASCII encodes each character as 1 byte, while UTF-32 encodes each code point as 4 bytes, so the same text takes up 4x the space.
String   Code Point   UTF-32 Encoding (hex)
H        72           00 00 00 48
e        101          00 00 00 65
l        108          00 00 00 6C
l        108          00 00 00 6C
o        111          00 00 00 6F
!        33           00 00 00 21
👍       128077       00 01 F4 4D
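The rows of this table can be checked with Python's str.encode. The helper name utf32_hex is an assumption for illustration. The "utf-32-be" codec is chosen because it matches the byte order shown in the table and does not prepend a byte order mark, which plain "utf-32" does. The sep argument to bytes.hex needs Python 3.8 or later.

```python
def utf32_hex(ch):
    """Return the big-endian UTF-32 bytes of a character as hex pairs."""
    return ch.encode("utf-32-be").hex(" ").upper()


for ch in ["H", "e", "l", "o", "!", "\U0001F44D"]:
    # Every character takes exactly 4 bytes
    print(ord(ch), utf32_hex(ch))
```

The first line printed is 72 00 00 00 48 and the last is 128077 00 01 F4 4D.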
What is UTF-8?
UTF-8 takes each code point value (base 10) and converts it to binary, using 1 to 4 bytes depending on the size of the code point value.
Code Point   UTF-8 Encoding (hex)
100          64
233          C3 A9
2357         E0 A4 B5
128077       F0 9F 91 8D
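The same four code points can be encoded in Python to confirm the byte counts. The helper name utf8_hex is hypothetical, chr turns a code point number into a character, and the sep argument to bytes.hex needs Python 3.8 or later.

```python
def utf8_hex(code_point):
    """Return the UTF-8 bytes of a code point as hex pairs."""
    return chr(code_point).encode("utf-8").hex(" ").upper()


for code_point in [100, 233, 2357, 128077]:
    encoded = utf8_hex(code_point)
    byte_count = len(encoded.split())
    print(code_point, encoded, f"({byte_count} bytes)")
```

Larger code point values need more bytes: 100 fits in 1 byte, while 128077 needs 4.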
Which takes up more space, UTF-32 or UTF-8?
UTF-32 takes up more space. UTF-32 encodes code points down to 4 bytes no matter the character size.
UTF-8 encodes code points down to between 1-4 bytes depending on the size of the character/value.
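A quick size comparison in Python, using an arbitrary sample string chosen for this example:

```python
text = "Hello!"

utf8_size = len(text.encode("utf-8"))
utf32_size = len(text.encode("utf-32-be"))  # "-be" avoids the extra byte order mark

print(utf8_size)   # 6  (1 byte per character for this text)
print(utf32_size)  # 24 (4 bytes per character, always)
```

For text made only of ASCII characters, UTF-32 is four times larger. The two encodings take the same space only for code points that need 4 bytes in UTF-8.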
True or False: UTF-8 and ASCII have the same encoding because their Unicode code point values are the same.
True.
True or False: UTF-32 and ASCII have the same encoding because their Unicode code point values are the same.
False.
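Both answers can be checked in Python for a single ASCII character. The letter "A" is an arbitrary choice for this sketch.

```python
ascii_bytes = "A".encode("ascii")
utf8_bytes = "A".encode("utf-8")
utf32_bytes = "A".encode("utf-32-be")

print(ascii_bytes == utf8_bytes)   # True: the same single byte, 0x41
print(ascii_bytes == utf32_bytes)  # False: UTF-32 pads to 4 bytes
print(utf32_bytes.hex(" "))        # 00 00 00 41
```

The code point value (65) is the same in all three cases. Only the number of bytes used to store it differs.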
True or False: With ASCII, 1 grapheme = 1 byte.
True.
True or False: With Unicode 1 grapheme = 1 byte.
False.
This is because some graphemes map to multiple code points, and some code points map to multiple bytes. If you read bytes with the wrong encoding (for example, UTF-8 data as UTF-32), you will get strange, unreadable output called mojibake.
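Mojibake is easy to reproduce in Python. Note the substitution: this sketch decodes UTF-8 bytes as Latin-1 rather than as UTF-32, because Latin-1 accepts any byte sequence and so always produces visible garbage, whereas a strict decoder for another encoding may raise an error instead.

```python
original = "\u00e9"                    # é
data = original.encode("utf-8")        # two bytes: C3 A9

garbled = data.decode("latin-1")       # wrong encoding: each byte becomes a character
print(garbled)                         # Ã©

# Knowing the original encoding lets you recover the text
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)           # True
```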
What must you know to decode bytes into graphemes?
The original encoding method used.