Lecture 4 Flashcards
(14 cards)
Lexical structures
The lexical structure of a language concerns the forms of its individual symbols.
E.g., the symbol :=, keywords, and identifiers.
Syntax
Defines the structure of the components of a language, e.g., the structure of programs, statements, expressions, terms, etc.
Semantics
Defines the meaning and usage of structures, along with requirements that cannot be described by a grammar.
E.g., checking type consistency; applying the + operator to two numerical values means evaluating their arithmetic sum.
Semantic Description problem
Semantics is more difficult to define than syntax, and there are no well-accepted methods for semantic definition.
Language Analysis
A language implementation must analyse source code: both its lexical and its syntactic structure.
Language Analysis: two parts
The low-level part is the lexical analyser, which is mathematically a finite automaton based on a regular grammar.
The high-level part is the syntax analyser, or parser, which is mathematically a push-down automaton based on a context-free grammar.
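As a minimal sketch of the low-level part, a finite automaton can be written as a transition table. The state names, the character classes and the identifier pattern (a letter followed by letters or digits) are assumptions chosen for illustration; they are not taken from the lecture.

```python
# Hypothetical finite automaton accepting identifiers:
# one letter followed by any number of letters or digits.
# State names and character classes are illustrative assumptions.

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# TRANSITIONS[state][input class] -> next state; a missing entry means "reject"
TRANSITIONS = {
    "start": {"letter": "in_ident"},
    "in_ident": {"letter": "in_ident", "digit": "in_ident"},
}
ACCEPTING = {"in_ident"}

def accepts(text):
    """Run the automaton over text; accept if it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts("sum"))   # True
print(accepts("x1"))    # True
print(accepts("1x"))    # False
```

The automaton needs only its current state, which is why it suits lexical analysis; a parser for nested structures additionally needs a stack, which is what the push-down automaton provides.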
Lexical analyser (a.k.a. scanner)
Reads source code character-by-character
Groups characters into tokens
Removes whitespace and comments
Sends tokens to the parser (next phase)
Identifies substrings of the source program as lexemes.
E.g., sum is a lexeme; its token may be IDENT.
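The scanner's job described above can be sketched in a few lines. This is an illustration under assumptions: only IDENT comes from the notes, while the other token names, the patterns and the '#' comment syntax are hypothetical. It also uses Python's re module to match each lexeme rather than a hand-written character-by-character loop.

```python
import re

# Hypothetical token names and patterns; only IDENT appears in the notes.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),       # assumed comment syntax: '#' to end of line
    ("SKIP", r"\s+"),              # whitespace
    ("INT_LIT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN_OP", r":="),
    ("OP", r"[+\-*/()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (token, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("sum := sum + 42  # add"))
# [('IDENT', 'sum'), ('ASSIGN_OP', ':='), ('IDENT', 'sum'), ('OP', '+'), ('INT_LIT', '42')]
```

Each pair keeps the lexeme next to its token, so the parser can work with token categories while later phases can still see the original text.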
Lexical Analysis (Alphabet)
An alphabet Σ is a finite non-empty set (of symbols).
– the set Σab = {a, b} is an alphabet comprising the symbols a and b;
– the set Σaz = {a, …, z} is the alphabet of lowercase English letters;
– the set Σasc of all ASCII characters is an alphabet.
Lexical Analysis (String)
A string or word over an alphabet Σ is a finite concatenation (or juxtaposition) of symbols from Σ. For example,
– abba, aaa and baaaa are strings over Σab;
– hello, abacab, and baaaa are strings over Σaz;
– h$(e'lo, PjM#;, and baaaa are strings over Σasc.
The length of a string w (that is, the number of symbols it has) is denoted |w|. E.g., |abba| = 4.
The empty or null string is denoted ε, and so |ε| = 0.
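These definitions translate directly into code. The sketch below models an alphabet as a Python set of one-character strings; the names SIGMA_AB, SIGMA_AZ, SIGMA_ASC and is_string_over are made up for illustration.

```python
# Alphabets as finite, non-empty sets of single-character symbols
# (the names below are illustrative, not from the lecture).
SIGMA_AB = {"a", "b"}
SIGMA_AZ = set("abcdefghijklmnopqrstuvwxyz")
SIGMA_ASC = {chr(code) for code in range(128)}

def is_string_over(word, alphabet):
    """True if every symbol of word is in alphabet (the empty string always qualifies)."""
    return all(symbol in alphabet for symbol in word)

print(is_string_over("abba", SIGMA_AB))    # True
print(is_string_over("hello", SIGMA_AB))   # False
print(is_string_over("hello", SIGMA_AZ))   # True
print(len("abba"))                         # 4, i.e. |abba| = 4
print(len(""))                             # 0, i.e. |ε| = 0
```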
Set of all strings…
The set of all strings over Σ is denoted Σ∗.
E.g., Σab∗ = {ε, a, b, aa, ab, ba, bb, aab, …}
For any symbol or string x, x^n denotes the concatenation of n copies of x.
E.g.
a^4 = aaaa
(ab)^4 = abababab
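Because Σ∗ is infinite, code can only enumerate it lazily. The sketch below (function names are illustrative) yields strings over an alphabet shortest first and computes x^n with Python's string repetition.

```python
from itertools import count, islice, product

def all_strings(alphabet):
    """Lazily yield every string over alphabet, shortest first (the set is infinite)."""
    symbols = sorted(alphabet)
    for length in count(0):
        for combo in product(symbols, repeat=length):
            yield "".join(combo)

def power(x, n):
    """Return x^n, the concatenation of n copies of x."""
    return x * n

print(list(islice(all_strings({"a", "b"}), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(power("a", 4))    # aaaa
print(power("ab", 4))   # abababab
```

The first yielded string is '', which plays the role of ε; x^0 is likewise the empty string.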
Regular Expressions
Regular expressions specify patterns of strings of symbols.
A regular expression r matches (or is matched by) a set of strings if the patterns of those strings are the ones specified by r.
The set of strings matched by an RE r is denoted L(r) ⊆ Σ∗ (if the strings are over the alphabet Σ) and is called the language determined by r.
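To make L(r) concrete, the sketch below lists the members of L(r) up to a given length by testing every candidate string with re.fullmatch. Note the assumptions: Python's re syntax is richer than the formal regular expressions of the notes, and language_up_to is a hypothetical helper name.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """Return the strings of L(pattern) over alphabet with length <= max_len."""
    symbols = sorted(alphabet)
    matched = []
    for length in range(max_len + 1):
        for combo in product(symbols, repeat=length):
            word = "".join(combo)
            # fullmatch requires the whole string to match, as L(r) does
            if re.fullmatch(pattern, word):
                matched.append(word)
    return matched

print(language_up_to(r"ab*", {"a", "b"}, 3))
# ['a', 'ab', 'abb']
```

Using fullmatch rather than search matters here: search would also accept strings that merely contain a match, which is a different set from L(r).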
Regular Expression Definition
∅ (the empty set symbol) is a regular expression. It matches no strings, so it is not useful in practice.
ε (the empty string symbol) is a regular
expression. This matches just the empty string ε.
∅ and ε
The empty string ε should not be confused with the empty language ∅.
∅ is a formal language (i.e., a set of strings) that contains no strings, not even the empty string.
The empty string ε is a string that has the properties:
– ε + s = s + ε = s, i.e., the empty string is the identity element of the concatenation operation (written + here);
– |ε| = 0, i.e., its length is zero.
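The difference between ε and ∅ is easy to see in code, where a string and a set of strings are different types. In this sketch the constant names are illustrative; Python's + on strings stands in for concatenation.

```python
EMPTY_STRING = ""          # ε: a string of length zero
EMPTY_LANGUAGE = set()     # ∅: a language containing no strings
ONLY_EPSILON = {""}        # {ε}: a language containing exactly one string

s = "abba"
print(EMPTY_STRING + s == s == s + EMPTY_STRING)  # True: ε is the identity for concatenation
print(len(EMPTY_STRING))                          # 0: |ε| = 0
print(len(EMPTY_LANGUAGE), len(ONLY_EPSILON))     # 0 1
print(EMPTY_STRING in EMPTY_LANGUAGE)             # False: ∅ does not even contain ε
```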