08 - External data representation Flashcards

1
Q

Explain the difference between internal and transferred data representation.

A
  • Internal data is represented as data structures, arrays, or objects in programming languages like C, Java, or Python.
  • Transferred data must be flattened into a byte sequence before it can be transmitted.
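A minimal C sketch of flattening, assuming a hypothetical struct person whose name is reached through a pointer. The pointed-to characters are copied into the buffer behind a 4-byte big-endian length, because the pointer value itself would mean nothing to the receiver. The layout (length, name bytes, age) is an illustrative choice, not a standard format.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* In-memory form: the name is reached through a pointer, an address
   that means nothing once the bytes leave this process. */
struct person {
    const char *name;
    uint32_t age;
};

/* Writes v as 4 bytes, most significant byte first. */
static void put_be32(uint8_t *b, uint32_t v) {
    b[0] = (uint8_t)(v >> 24);
    b[1] = (uint8_t)(v >> 16);
    b[2] = (uint8_t)(v >> 8);
    b[3] = (uint8_t)v;
}

/* Flattens a person into [4-byte name length][name bytes][4-byte age].
   Returns the number of bytes written, or 0 if cap is too small. */
size_t flatten_person(const struct person *p, uint8_t *buf, size_t cap) {
    size_t len = strlen(p->name);
    size_t need = 4 + len + 4;
    if (cap < need) return 0;
    put_be32(buf, (uint32_t)len);
    memcpy(buf + 4, p->name, len);   /* copy the pointed-to data, not the pointer */
    put_be32(buf + 4 + len, p->age);
    return need;
}
```

For { "Ana", 30 } this produces 11 bytes: 00 00 00 03, 'A' 'n' 'a', 00 00 00 1E.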
2
Q

What are the key aspects of data transmission format?

A
  • Sender and receiver must agree on the format of the transmitted data.
  • Option 1: agree on a common format; the transmitter converts its data into it and the receiver converts from it.
  • Option 2: transmit the data in the sender’s format and let the receiver convert it.
3
Q

What are data serialization and deserialization in the context of data transfer, and what are their requirements?

A
  • Serialization involves converting data structures into a format that can be stored or transmitted.
  • Deserialization is the reverse process, where the received or stored data is converted back into usable data structures.
  • Requirements: correctness, efficiency, interoperability, and ease of use.
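A minimal round-trip sketch in C, assuming a hypothetical struct point and a wire layout of two 32-bit big-endian integers. Serialization fixes a machine-independent byte layout, and deserialization is its exact inverse, which is what the correctness requirement asks for.

```c
#include <stdint.h>
#include <stddef.h>

struct point { int32_t x; int32_t y; };

static void put_be32(uint8_t *b, uint32_t v) {
    b[0] = (uint8_t)(v >> 24); b[1] = (uint8_t)(v >> 16);
    b[2] = (uint8_t)(v >> 8);  b[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *b) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Serialization: struct -> 8 bytes in a fixed layout (x then y, big-endian). */
void point_serialize(const struct point *p, uint8_t out[8]) {
    put_be32(out, (uint32_t)p->x);
    put_be32(out + 4, (uint32_t)p->y);
}

/* Deserialization: the inverse, so deserialize(serialize(p)) == p.
   Converting back to int32_t assumes two's complement for negative values. */
void point_deserialize(const uint8_t in[8], struct point *p) {
    p->x = (int32_t)get_be32(in);
    p->y = (int32_t)get_be32(in + 4);
}
```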
4
Q

Discuss the limitations of CORBA, Java Object Serialization, DCOM/COM+, JSON, plain text, and XML in data representation.

A
  • CORBA: Over-designed and heavyweight.
  • Java Object Serialization: Tailored to the Java environment.
  • DCOM/COM+: Tailored to the Windows environment.
  • JSON, plain text, XML: Lack a formal protocol description, have high parsing overhead, and require manual maintenance by the programmer.
5
Q

Explain the concept of endianness in data representation.

A
  • Endianness refers to the order of bytes in a multi-byte number.
  • Big-endian stores higher-order bytes at lower memory addresses.
  • Little-endian stores lower-order bytes at lower memory addresses.
  • In networking, multi-byte protocol header fields such as addresses and port numbers are always sent big-endian (network byte order).
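A small C sketch of both orders on the current host. host_is_little_endian (a hypothetical helper) inspects the byte stored at the lowest address, and store_network_order uses the POSIX htonl function to produce big-endian bytes whatever the host order is.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl / ntohl (POSIX) */

/* Returns 1 if the host stores the least significant byte at the lowest address. */
int host_is_little_endian(void) {
    uint32_t v = 0x01020304;
    uint8_t first;
    memcpy(&first, &v, 1);          /* byte at the lowest address */
    return first == 0x04;
}

/* Stores v in network (big-endian) byte order regardless of the host. */
void store_network_order(uint32_t v, uint8_t out[4]) {
    uint32_t n = htonl(v);          /* byte swap on little-endian hosts, no-op on big-endian */
    memcpy(out, &n, 4);
}
```

On an x86-64 machine host_is_little_endian() returns 1, yet store_network_order(0x01020304, b) still yields the bytes 01 02 03 04.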
6
Q

What challenges arise in data representation due to heterogeneity in communication?

A
  • Data layout details such as alignment, data sizes, and pointer representation vary across machines.
  • Alignment on word boundaries can change the size of a structure between machines.
  • Pointers, while convenient, have no meaning outside the machine (address space) that defines them.
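A short C sketch of how alignment changes a structure's size, using a hypothetical struct record with one char and one int. On typical x86-64 compilers int is 4-byte aligned, so this usually reports 3 bytes of padding and sizeof(struct record) == 8 rather than 5; other ABIs may differ, which is why copying a struct's raw memory onto the wire is not portable.

```c
#include <stddef.h>

/* Logically 1 + sizeof(int) bytes of data. */
struct record {
    char tag;
    int value;
};

/* Padding the compiler placed after 'tag' so that 'value' is int-aligned. */
size_t record_inner_padding(void) {
    return offsetof(struct record, value) - offsetof(struct record, tag) - sizeof(char);
}

/* All bytes of the in-memory struct that carry no data (inner + trailing padding). */
size_t record_total_padding(void) {
    return sizeof(struct record) - sizeof(char) - sizeof(int);
}
```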
7
Q

How should data be represented for efficient communication?

A
  • Decide on the data types to support: base types, flat types, complex types.
  • Determine encoding methods for data transmission and decoding methods for data reception.
  • Use self-describing encoding (tags carried with the data) or implicit description (both ends already know the types) for data encoding.
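A C sketch contrasting the two description styles, with hypothetical tag values. The implicit encoding writes only the 4-byte value and relies on the receiver already knowing it is an int32; the tagged (self-describing) encoding spends one extra byte so the decoder can check what it received.

```c
#include <stdint.h>
#include <stddef.h>

enum { TAG_INT32 = 1, TAG_BOOL = 2 };   /* hypothetical tag values */

/* Implicit: value only, big-endian; the receiver must already know the type. */
size_t encode_implicit_int32(int32_t v, uint8_t *out) {
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(u >> (24 - 8 * i));
    return 4;
}

/* Self-describing: a one-byte type tag precedes the value. */
size_t encode_tagged_int32(int32_t v, uint8_t *out) {
    out[0] = TAG_INT32;
    return 1 + encode_implicit_int32(v, out + 1);
}

/* Tagged decoder: returns 0 on success, -1 if the tag is not TAG_INT32. */
int decode_tagged_int32(const uint8_t *in, int32_t *v) {
    if (in[0] != TAG_INT32) return -1;
    uint32_t u = 0;
    for (int i = 1; i <= 4; i++) u = (u << 8) | in[i];
    *v = (int32_t)u;
    return 0;
}
```

The trade-off is the usual one: tags cost space and processing, while implicit descriptions require both ends to share the type information in advance.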
8
Q

What is the role of stub generation in data representation?

A
  • Systems generate stub code from a language-independent interface specification written in an IDL.
  • IDL (Interface Description Language) describes an interface in a language-neutral way.
  • Stub generation separates logical data description from dispatching, marshaling/unmarshaling, and data wire format.
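A C sketch of the kind of code a stub generator emits, assuming a hypothetical IDL procedure number 1, int32 add(int32 a, int32 b). The message layout is invented for illustration and is not any real RPC wire format. The client stub marshals the procedure number and arguments; the server stub unmarshals them, dispatches to the programmer's implementation, and marshals the result.

```c
#include <stdint.h>
#include <stddef.h>

static void put32(uint8_t *b, uint32_t v) {
    b[0] = (uint8_t)(v >> 24); b[1] = (uint8_t)(v >> 16);
    b[2] = (uint8_t)(v >> 8);  b[3] = (uint8_t)v;
}

static uint32_t get32(const uint8_t *b) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Client stub: request = [procedure number][a][b], each 4 bytes big-endian. */
size_t add_client_marshal(int32_t a, int32_t b, uint8_t req[12]) {
    put32(req, 1);
    put32(req + 4, (uint32_t)a);
    put32(req + 8, (uint32_t)b);
    return 12;
}

/* The actual procedure, written by the programmer. */
static int32_t add_impl(int32_t a, int32_t b) { return a + b; }

/* Server stub: unmarshal, dispatch, marshal the reply.
   Returns the reply length, or 0 for an unknown procedure. */
size_t server_dispatch(const uint8_t *req, uint8_t reply[4]) {
    switch (get32(req)) {
    case 1: {
        int32_t r = add_impl((int32_t)get32(req + 4), (int32_t)get32(req + 8));
        put32(reply, (uint32_t)r);
        return 4;
    }
    default:
        return 0;
    }
}
```

In a real system the request bytes travel over the network between the two stubs; the programmer only writes add_impl and calls the client-side stub.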
9
Q

Describe the key features of XDR’s approach to standardizing data representations.

A
  • XDR defines a single byte order (big-endian) and a single floating-point representation (IEEE 754).
  • It decouples programs creating/sending portable data from those using/receiving it.
  • New machines or languages don’t affect existing programs; they just need to convert between standard and local representations.
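A C sketch of XDR's single float representation: the float's IEEE 754 bit pattern is written as 4 big-endian bytes. It assumes the host float is IEEE 754 single precision (true on mainstream platforms), and the function names are hypothetical, not the XDR library's own xdr_float filter.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits");

/* Host float -> standard representation (IEEE 754 bits, big-endian). */
void xdr_style_put_float(float f, uint8_t out[4]) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bit pattern */
    out[0] = (uint8_t)(bits >> 24);
    out[1] = (uint8_t)(bits >> 16);
    out[2] = (uint8_t)(bits >> 8);
    out[3] = (uint8_t)bits;
}

/* Standard representation -> host float. */
float xdr_style_get_float(const uint8_t in[4]) {
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                    ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 1.0f is always sent as 3F 80 00 00, whatever the sending machine's native byte order.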
10
Q

What are the canonical data types defined by XDR?

A
  • Basic data: Integer, Unsigned Integer, Hyper Integer and Unsigned Hyper Integer (64-bit), Floating-Point, Void.
  • Variable size data: Fixed-Length Opaque Data, Variable-Length Opaque Data, String.
  • Composed data: Fixed-Length Array, Variable-Length Array, Structure, Discriminated Union.
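A C sketch of one canonical type, the XDR string, following RFC 4506: a 4-byte big-endian length, the bytes themselves, then zero bytes up to the next multiple of 4 (every XDR item occupies a multiple of four bytes). The function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Encodes s as an XDR-style string. Returns bytes written, or 0 if cap is too small. */
size_t xdr_style_put_string(const char *s, uint8_t *out, size_t cap) {
    size_t n = strlen(s);
    size_t pad = (4 - n % 4) % 4;          /* zero bytes to reach a multiple of 4 */
    size_t total = 4 + n + pad;
    if (cap < total) return 0;
    out[0] = (uint8_t)(n >> 24); out[1] = (uint8_t)(n >> 16);
    out[2] = (uint8_t)(n >> 8);  out[3] = (uint8_t)n;
    memcpy(out + 4, s, n);
    memset(out + 4 + n, 0, pad);
    return total;
}
```

So "abc" becomes 00 00 00 03 61 62 63 00 (8 bytes), and "abcd" also takes 8 bytes with no padding.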
11
Q

What is the function of the XDR Library in data representation?

A
  • XDR library solves data portability problems by transforming data to/from a canonical format.
  • It allows reading and writing arbitrary C constructs in a consistent and well-documented manner.
  • The library consists of functions, known as filters, that encode and decode each defined data type.
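A minimal sketch of the filter idea, using a hypothetical xstream type that mimics, but is not, the <rpc/xdr.h> API. One filter function both encodes and decodes depending on the stream's direction, and a filter for a composite type is built by calling basic filters in a fixed order.

```c
#include <stdint.h>
#include <stddef.h>

enum xop { XENC, XDEC };   /* direction of the stream */

struct xstream {
    enum xop op;
    uint8_t *buf;
    size_t pos, cap;
};

/* Basic filter for a 32-bit unsigned integer: writes *v when encoding,
   fills *v when decoding. Returns 1 on success, 0 if the buffer is too short. */
int xf_u32(struct xstream *x, uint32_t *v) {
    if (x->pos + 4 > x->cap) return 0;
    uint8_t *b = x->buf + x->pos;
    if (x->op == XENC) {
        b[0] = (uint8_t)(*v >> 24); b[1] = (uint8_t)(*v >> 16);
        b[2] = (uint8_t)(*v >> 8);  b[3] = (uint8_t)*v;
    } else {
        *v = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
             ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }
    x->pos += 4;
    return 1;
}

struct pair { uint32_t id; uint32_t qty; };

/* Composite filter: the order of the calls defines the wire layout. */
int xf_pair(struct xstream *x, struct pair *p) {
    return xf_u32(x, &p->id) && xf_u32(x, &p->qty);
}
```

Because xf_pair is the only description of the layout, the sender and receiver cannot drift apart as long as both call it.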
12
Q

Describe the types of data filters provided by XDR.

A
  • Basic Data Filters: Handle types like char, short, int, long, float, double, and void.
  • Variable Size Data Filters: For handling fixed-length opaque data, variable-length opaque data, and strings.
  • Composed Data Filters: Used for fixed-length arrays, variable-length arrays, structures, and discriminated unions.
  • Pointer Filters: Manage data structures involving pointers, with functions for transformation and memory allocation.
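A C sketch of a variable-length array encoder/decoder pair with hypothetical names. The encoder writes a 4-byte count followed by the elements; the decoder allocates memory for the elements, similar in spirit to how XDR's array and pointer filters allocate storage when decoding. The maximum count guards against corrupted or malicious input.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

static void put32(uint8_t *b, uint32_t v) {
    b[0] = (uint8_t)(v >> 24); b[1] = (uint8_t)(v >> 16);
    b[2] = (uint8_t)(v >> 8);  b[3] = (uint8_t)v;
}

static uint32_t get32(const uint8_t *b) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Encodes [count][elem 0]...[elem n-1]. Returns bytes written or 0 if cap is too small. */
size_t put_u32_array(const uint32_t *a, uint32_t n, uint8_t *out, size_t cap) {
    size_t total = 4 + 4 * (size_t)n;
    if (cap < total) return 0;
    put32(out, n);
    for (uint32_t i = 0; i < n; i++) put32(out + 4 + 4 * (size_t)i, a[i]);
    return total;
}

/* Decodes into freshly allocated memory; the caller frees *a.
   Returns the element count, or -1 on short input, count > maxn, or allocation failure. */
long get_u32_array(const uint8_t *in, size_t len, uint32_t maxn, uint32_t **a) {
    if (len < 4) return -1;
    uint32_t n = get32(in);
    if (n > maxn || len - 4 < 4 * (size_t)n) return -1;
    uint32_t *p = malloc((n ? n : 1) * sizeof *p);
    if (!p) return -1;
    for (uint32_t i = 0; i < n; i++) p[i] = get32(in + 4 + 4 * (size_t)i);
    *a = p;
    return (long)n;
}
```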
13
Q

How is XDR used in data transfer?

A
  • XDR data structures don’t contain metadata, so type determination from binary data is impossible.
  • Client and server must agree on the format of transferred data.
  • XDR specification files (.x) are compiled by rpcgen, the RPC (Remote Procedure Call) stub generator, which produces matching encoder/decoder routines for client and server.
  • The same code is used for both encoding and decoding data, and functions must be called in the same order.
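A short C illustration of why both sides must agree in advance: the same 8 bytes decode cleanly as two unsigned ints or as one unsigned hyper, and nothing in the data says which was meant. The helper names are hypothetical.

```c
#include <stdint.h>

/* Reads n bytes (n <= 8) as one big-endian unsigned value. */
static uint64_t read_be(const uint8_t *b, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++) v = (v << 8) | b[i];
    return v;
}

/* Receiver A believes the data holds two unsigned ints. */
void as_two_uints(const uint8_t b[8], uint32_t *x, uint32_t *y) {
    *x = (uint32_t)read_be(b, 4);
    *y = (uint32_t)read_be(b + 4, 4);
}

/* Receiver B believes the data holds one unsigned hyper. */
uint64_t as_uhyper(const uint8_t b[8]) {
    return read_be(b, 8);
}
```

Given 00 00 00 01 00 00 00 02, receiver A sees (1, 2) and receiver B sees 4294967298; both decodes "succeed", so only the shared format definition can tell which is correct.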
14
Q

What are the main features of Google Protocol Buffers?

A
  • Defined by Google and widely used internally and externally.
  • Supports common types and service definitions.
  • Natively generates C++, Java, and Python code, with over 20 other languages supported by third parties.
  • Efficient binary encoding and readable text encoding, significantly smaller and faster to process than XML.
  • It’s not a full RPC system but handles marshalling, with many third-party RPC implementations available.
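A C sketch of the base-128 varint that Protocol Buffers uses for integer fields, one reason its binary encoding is compact: each byte carries 7 bits of the value, least significant group first, and the high bit marks that more bytes follow. Function names are hypothetical; in practice the generated code and runtime library do this.

```c
#include <stdint.h>
#include <stddef.h>

/* Encodes v as a varint into out (at most 10 bytes). Returns bytes written. */
size_t varint_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);   /* low 7 bits + continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;                /* last byte: high bit clear */
    return n;
}

/* Decodes a varint. Returns 1 and sets *v and *used on success,
   0 on truncated or over-long (more than 10 bytes) input. */
int varint_decode(const uint8_t *in, size_t len, uint64_t *v, size_t *used) {
    uint64_t r = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        r |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if (!(in[i] & 0x80)) { *v = r; *used = i + 1; return 1; }
    }
    return 0;
}
```

For example, 1 encodes as the single byte 01 and 300 as AC 02, whereas a fixed-width encoding would always spend 4 or 8 bytes.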
15
Q

What properties characterize Google Protocol Buffers?

A
  • Efficient binary serialization.
  • Support for protocol evolution, allowing addition of new parameters.
  • The order of specified parameters is not important, and non-essential parameters can be skipped.
  • Supports somewhat complex structures and provides compile-time error checking.
  • Used for RPC calls, serializing data to non-relational databases, and as a long-term storage format due to its backward compatibility.
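A C sketch of why field order does not matter and unknown fields can be skipped: in the Protocol Buffers binary format each field is prefixed by a key, (field_number << 3) | wire_type. The decoder below assumes a hypothetical message with field 1 = id and field 2 = qty (both varints); it accepts them in any order, skips unknown varint and length-delimited fields, and does not handle the other wire types.

```c
#include <stdint.h>
#include <stddef.h>

/* Reads one varint at *pos. Returns 0 if truncated. */
static int rd_varint(const uint8_t *b, size_t len, size_t *pos, uint64_t *v) {
    uint64_t r = 0;
    for (int shift = 0; *pos < len && shift < 64; shift += 7) {
        uint8_t c = b[(*pos)++];
        r |= (uint64_t)(c & 0x7F) << shift;
        if (!(c & 0x80)) { *v = r; return 1; }
    }
    return 0;
}

struct order { uint64_t id, qty; };   /* hypothetical message */

/* Returns 1 on success, 0 on malformed input or an unsupported wire type. */
int decode_order(const uint8_t *b, size_t len, struct order *o) {
    size_t pos = 0;
    o->id = o->qty = 0;                       /* defaults for absent fields */
    while (pos < len) {
        uint64_t key, val;
        if (!rd_varint(b, len, &pos, &key)) return 0;
        uint64_t field = key >> 3, wire = key & 7;
        if (wire == 0) {                      /* varint */
            if (!rd_varint(b, len, &pos, &val)) return 0;
            if (field == 1) o->id = val;
            else if (field == 2) o->qty = val;
            /* any other field number: value read and ignored */
        } else if (wire == 2) {               /* length-delimited: skip payload */
            if (!rd_varint(b, len, &pos, &val) || val > len - pos) return 0;
            pos += (size_t)val;
        } else {
            return 0;                         /* other wire types not handled here */
        }
    }
    return 1;
}
```

The bytes 10 05 1A 02 'h' 'i' 08 07 carry qty first, an unknown field 3 ("hi"), then id, and still decode to id = 7, qty = 5; this is what lets old readers tolerate messages from newer writers.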
16
Q

What is the primary goal of Protocol Buffers?

A
  • Provide a language- and platform-neutral way to specify and serialize data.
  • Ensure the serialization process is efficient, extensible, and simple to use.
  • Allow serialized data to be stored or transmitted over the network.
17
Q

Describe the features of the Protocol Buffer Language.

A

  • Messages contain uniquely numbered fields.
  • Each field is described by a field rule (field-type), a data-type, a field-name, a field number (encoding-value), and an optional default value.
  • Supports primitive, enumerated, and nested message data-types.
  • Enables structuring data into a hierarchy.

18
Q

What are the different field types in Protocol Buffers?

A
  • Required fields: Must be present exactly once in a well-formed message.
  • Optional fields: Can appear zero or one time in a well-formed message.
  • Repeated fields: Can appear any number of times (including zero) in a well-formed message.
19
Q

What is the role of a .proto file in Protocol Buffers?

A
  • The .proto file contains the specification of the message.
  • It is compiled by the protoc tool, which generates code allowing programmers to manipulate the message type.
20
Q

Explain the function of protoc-c in Protocol Buffers.

A
  • The programmer writes a .proto file following the language syntax, for example:
message M1 {
    required string str = 1;
    optional int32 i = 2;
}
  • protoc-c compiles it into C files (.pb-c.h and .pb-c.c) containing the structures and functions for manipulating the message.
  • For M1 these include functions such as m1__init, m1__get_packed_size, m1__pack, m1__unpack, and m1__free_unpacked.
21
Q

What are the field rules in Protoc-c?

A
  • Required fields must appear exactly once in a well-formed message.
  • Optional fields may appear zero or one time, with a boolean flag indicating their presence.
  • Repeated fields can occur multiple times in a message, and their count is tracked.
22
Q

How are optional fields handled in Protoc-c?

A
  • Optional fields can have default values defined in the .proto file.
  • If no value is assigned, the field takes a type-specific default value.
  • The generated C structure includes a boolean flag (has_<field>) indicating whether an optional field is set and will be transmitted.
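A hand-written C sketch mirroring the shape of what protoc-c generates for optional int32 i = 2; (a has_ flag next to the value). The struct name, encoder, and default helper are illustrative, not protoc-c output. An absent field produces no bytes on the wire, and the reader falls back to a default.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative stand-in for a generated message structure. */
struct m1_like {
    int has_i;      /* nonzero: i is set and will be transmitted */
    int32_t i;
};

/* Encodes field 2 only when present: key 0x10 (field 2, varint wire type),
   then the value as a varint. Negative int32 values are sign-extended to
   64 bits, so they take 10 bytes. Returns bytes written (0 if absent). */
size_t encode_optional_i(const struct m1_like *m, uint8_t *out) {
    if (!m->has_i) return 0;
    size_t n = 0;
    out[n++] = (uint8_t)((2 << 3) | 0);
    uint64_t v = (uint64_t)(int64_t)m->i;
    while (v >= 0x80) { out[n++] = (uint8_t)(v | 0x80); v >>= 7; }
    out[n++] = (uint8_t)v;
    return n;
}

/* Reading side: an absent optional field yields its default value. */
int32_t get_i_or_default(const struct m1_like *m, int32_t dflt) {
    return m->has_i ? m->i : dflt;
}
```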
23
Q

Explain the handling of repeated fields in Protoc-c.

A
  • Repeated fields represent arrays or lists in a message.
  • The C structure includes a count of the number of elements and a pointer to the array.
  • Memory management for repeated fields must be handled manually in C.
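A hand-written C sketch of the count-plus-pointer layout used for repeated fields, e.g. repeated int32 values = 3; the names are illustrative, not protoc-c output. Growing and freeing the array is the programmer's job when building a message by hand.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-in: element count plus pointer to the elements. */
struct list_like {
    size_t n_values;
    int32_t *values;
};

/* Appends one element, growing the array by hand.
   Returns 0 on allocation failure (the existing array stays valid). */
int list_append(struct list_like *l, int32_t v) {
    int32_t *p = realloc(l->values, (l->n_values + 1) * sizeof *p);
    if (!p) return 0;
    p[l->n_values++] = v;
    l->values = p;
    return 1;
}

/* Releases the elements and resets the field to empty. */
void list_clear(struct list_like *l) {
    free(l->values);
    l->values = NULL;
    l->n_values = 0;
}
```

Growing by one element per call keeps the sketch short; real code would usually grow the capacity geometrically to avoid repeated reallocations.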
24
Q

Describe the use of enumerated fields in Protoc-c.

A
  • Enumerated fields allow a field to have one of a predefined list of values.
  • Enums are defined in the .proto file and translated into C enums.
  • Fields can be defined to use these enumerated types, providing a set of allowed values.
25
Q

How are message types used in Protoc-c?

A
  • Message types are created for each specific use case, like RPC calls or data storage.
  • Each message type has its own structure and set of fields.
  • Fields can be required, optional, or repeated, and are manipulated using generated C code.