Computer Science Flashcards

Flashcards in Computer Science Deck (57)
What is...

Data

Signals that can be transmitted and potentially interpreted by a computer or a human. Data is commonly a stream of binary information that needs further processing to be understood.

What is...

Information

Information may be synonymous with data. It can also be argued that information is data that can be interpreted meaningfully – that is, it is more than just a nonsense stream of bytes; it has tangible meaning to the computer or human being interpreting it.

What is...

Information Theory

The study of the quantification, storage, and communication of information. Information theory is crucial to the compression techniques used in various hardware and software applications, such as signal transmission and JPEG compression.

What is a...

Bit

A single binary signal – on or off – represented as a one or a zero.

What is a...

Byte

An encoding of eight binary signals – eight ones or zeros – represented as a single integer, which then needs further interpretation by a computer, or user, by looking up an appropriate encoding scheme.
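
A minimal Python sketch of this idea: the same byte is just a raw integer until an encoding scheme (here, ASCII) is used to interpret it.

```python
# One byte holds an integer from 0 to 255; an encoding scheme gives it meaning.
b = bytes([72])           # a single byte with the integer value 72

print(b[0])               # 72 - the raw integer value
print(b.decode("ascii"))  # 'H' - the same byte interpreted via the ASCII encoding
```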

What is a...

Bitstream

A contiguous stream of bytes that requires further interpretation by a user or computer.

What is an...

Integer

A whole number – zero, positive, or negative – e.g. 0, 1, 2, 4, 8, 16, 32, 64, 128.

What is...

Compression (General)

The use of redundancy (low entropy) in a digital object to enable it to be re-encoded in a way such that the resulting bitstream is smaller than the original, but that the original file, or an approximation of the original file, can still be presented back to the user. A file that has been compressed must be uncompressed before it can be rendered or used.

What is...

Compression (Lossless)

A method of encoding data so that the resulting bitstream is smaller than the original, e.g. for transmission or storage, but when the data is uncompressed it is exactly the same as the original byte-for-byte.
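
A short sketch of the round trip, using Python's standard-library zlib as one example of a lossless compressor:

```python
import zlib

# Highly repetitive data compresses well because of its redundancy.
original = b"to be or not to be, that is the question " * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed stream is smaller
assert restored == original            # byte-for-byte identical after decompression
```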

What is...

Compression (Lossy)

  • Lossy compression is a term usually applied to files that have been transformed into something smaller through the removal of information, but which can be played back to the user in a way that is approximately the same.
  • The MP3 algorithm ‘compresses’ audio streams by removing high-frequency signals that, theoretically, human beings cannot hear, transforming the signal, and then re-encoding it.
  • The loss of high-frequency signals equates to a loss of information, and the compression is therefore lossy.
  • Should a user recompress a lossy file, it will be compressed even further, resulting in even more information loss – think photocopy of a photocopy.
  • N.B. it is a myth that simply opening a lossy file, e.g. a JPG, can make it lose even more information. The user must actively choose to resave the file, and even then, choose lossy options when doing so.

What is...

Uncertainty

The state of being uncertain, for example, not knowing when a project is expected to be completed.

What is...

Management of Uncertainty

The use of data about an uncertain topic or event to simulate a range of potential outcomes that can be used to manage projects, risks, and costs by giving stakeholders an evidence-based projection of what may happen.
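
One common way to simulate a range of outcomes is a Monte Carlo run. This is a minimal sketch with a hypothetical three-task project; the duration ranges are invented for illustration only.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical project: each task's duration is uncertain, modelled
# here as a uniform range of (best case, worst case) days.
tasks = [(5, 10), (3, 8), (10, 20)]

def simulate_project():
    """One simulated outcome: draw a duration for each task and sum them."""
    return sum(random.uniform(lo, hi) for lo, hi in tasks)

runs = sorted(simulate_project() for _ in range(10_000))

# Percentiles of the simulated outcomes give stakeholders an
# evidence-based projection rather than a single guess.
print(f"median completion: {runs[5000]:.1f} days")
print(f"90th percentile:   {runs[9000]:.1f} days")
```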

What is a...

Heuristic

A set of rules or principles that can be used to derive an outcome. For example, if asked to determine which direction is east or west, one might look at the time of day and the position of the sun, and estimate accordingly.

Heuristics are often employed in programming where a formal algorithm does not exist but where an outcome still needs to be derived, e.g. in reverse engineering a file format from a sample corpus.
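
A sketch of a file-format heuristic: guess a format from its leading "magic" bytes. The signature list is illustrative, not exhaustive, and a real identification tool would apply many more rules.

```python
# Known leading byte signatures ("magic numbers") and the formats they suggest.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container"),
]

def guess_format(data: bytes) -> str:
    """Apply the rules in order; the first match wins."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"  # no rule matched; the heuristic offers no answer

print(guess_format(b"%PDF-1.7 ..."))  # PDF document
```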

What is an...

Algorithm

  • A formal, or codified, description of a set of rules for determining an outcome from one or more inputs.
  • Any set of rules combined to generate an outcome can be described as an algorithm, for example, the set of rules for baking a certain type of cake.
  • An algorithm could be created to sort a set of numbers as efficiently as possible.
  • Algorithms are used in many of today’s online services, for example YouTube, to determine the type of content that may be most interesting to viewers.
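
The sorting example above can be sketched as a classic algorithm – insertion sort, shown here for illustration rather than as the most efficient choice:

```python
def insertion_sort(items):
    """A fixed set of rules that turns any input list into a sorted list."""
    result = list(items)
    for i in range(1, len(result)):
        value = result[i]
        j = i - 1
        # Shift larger elements right until value's slot is found.
        while j >= 0 and result[j] > value:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = value
    return result

print(insertion_sort([16, 2, 64, 1, 8]))  # [1, 2, 8, 16, 64]
```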


What are...

Dependencies

The components of our computer architectures that make it possible to achieve an outcome or result. For example, to run Microsoft Word, a dependency may be Microsoft Windows. Configured in Microsoft Windows may be a number of software libraries that enable it to interact with your computer’s hardware configuration. As we dissect a file or piece of software, we begin to understand what other technology it depends on to run.

What is...

Unit Testing

The automated testing of source code by breaking it down into its smallest functional components – units. Testing is done by controlling inputs and testing the output and state of the program at various stages.
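
A minimal sketch using Python's built-in unittest module: a tiny function (the "unit") is tested by controlling its inputs and checking its outputs. The function and test names are invented for the example.

```python
import unittest

def to_hex(value: int) -> str:
    """The unit under test: format an integer as a two-digit hex string."""
    return f"{value:02x}"

class ToHexTests(unittest.TestCase):
    def test_small_value_is_zero_padded(self):
        self.assertEqual(to_hex(10), "0a")

    def test_byte_maximum(self):
        self.assertEqual(to_hex(255), "ff")

# Run the tests programmatically and report the overall outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToHexTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```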

What is...

Semantic Versioning

A method of controlling the version numbers of software in a way that both makes it clear to users what changes they can expect and makes software developers more accountable for the breadth and depth of their changes in any one release.

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes.
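
The three rules above can be sketched in Python; the parse and bump helpers are illustrative, not part of any semver library.

```python
def parse(version):
    """Split a MAJOR.MINOR.PATCH string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def bump(version, change):
    """Apply one semantic-versioning rule to a version string."""
    major, minor, patch = parse(version)
    if change == "major":   # incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":   # backwards-compatible new functionality
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # backwards-compatible bug fix

print(bump("1.4.2", "minor"))  # 1.5.0
```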

What is an...

Executable

A synonym for program: a file that can be run, or executed, by a user of a computer system.

What is a...

Symbolic Link/Shortcut

The most common use of either a symbolic link (symlink) or shortcut, on Linux or Windows respectively, is to point to a file or executable at some other location on the hard disk than where the symlink itself is positioned, e.g. to make it easier to run a given application from a particular location.
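
A small sketch, assuming a POSIX system where Python's os.symlink is available: create a file, then a symlink elsewhere that points back to it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "original.txt")
    link = os.path.join(tmp, "shortcut.txt")

    with open(target, "w") as f:
        f.write("hello")

    os.symlink(target, link)     # the link lives at one path, points to another
    print(os.path.islink(link))  # True
    with open(link) as f:        # opening the link transparently reads the target
        print(f.read())          # hello
```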

What is a...

Programming Language

A set of instructions and rules that can be combined to perform a computational task or set of tasks. A programming language is just a flavour of instructions that all need to be boiled down to something that the processor can understand – usually machine language. Programming languages usually differ in terms of abstraction, meaning that low-level languages work much closer to the hardware (closer to machine code) than high-level languages.

What is a...

Scripting Language

A high-level language that is interpreted at run-time by a program called an interpreter. Because a large number of complex yet common functions and procedures are built into the interpreter, the user is free to call those functions using fewer commands in a ‘script’. A user can interact with data, for example, without having to worry about the computer’s underlying memory model. A scripting language cannot run in the absence of its interpreter, so a dependency of running such code, for example Ruby or Python, is that the interpreter must be pre-installed on the host machine.

What is...

Data Encoding

A method of structuring information in a way that can be processed further by a user or computer, e.g. Extensible Markup Language (XML), JavaScript Object Notation (JSON), and Comma Separated Values (CSV). File formats such as Microsoft Excel, Microsoft Word, or JPEG are also data encodings, albeit considerably more complex.
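
A sketch of the same information under two of the encodings named above, using Python's standard-library json and csv modules; the record itself is invented for the example.

```python
import csv
import io
import json

record = {"title": "Computer Science", "cards": 57}

# The same information, structured as JSON...
as_json = json.dumps(record)
print(as_json)

# ...and as CSV (a header row, then a data row).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(record.keys())
writer.writerow(record.values())
print(buffer.getvalue())

# A decoder can recover the structured information from the encoding.
assert json.loads(as_json) == record
```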

What is...

Character Encoding

The mapping of binary numbers to lexical or numerical characters. Numerous character encodings exist, including ASCII and EBCDIC. The widest range of characters can be represented using a standard called Unicode.
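
A two-line sketch of the mapping: the same characters become different binary numbers under different character encodings.

```python
text = "Aé"

# 'A' maps to 65 in both encodings; 'é' maps to two bytes in UTF-8
# but a single byte (233) in Latin-1.
print(list(text.encode("utf-8")))    # [65, 195, 169]
print(list(text.encode("latin-1")))  # [65, 233]
```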

What is...

ASCII

American Standard Code for Information Interchange: the mapping of computer control signals and Latin alphanumeric characters to the 128 numbers (0–127) that can be represented using seven bits of a single byte. ASCII is heavily biased toward Western writing systems, and as such Unicode was created to make it easier to work with other writing systems in a computing environment.

What is...

Unicode

Unicode (maintained by the Unicode Consortium) is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The Unicode Standard contains a repertoire of more than 128,000 characters covering 135 modern and historic scripts. Unicode can be implemented by different character encodings including UTF-8 and UTF-16.

What is...

UTF-8

UTF-8 is an encoding for Unicode and uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters.
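
This variable-width behaviour is easy to demonstrate in Python:

```python
# ASCII characters keep their one-byte values in UTF-8...
print(len("A".encode("utf-8")))                    # 1 byte
print("A".encode("utf-8") == "A".encode("ascii"))  # True - same code value

# ...while other characters take up to four bytes.
print(len("€".encode("utf-8")))   # 3 bytes
print(len("🙂".encode("utf-8")))  # 4 bytes
```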

What is...

UTF-16

UTF-16 is an encoding for Unicode and uses one 16-bit unit for the characters that were representable in a prior character encoding called UCS-2, and two 16-bit units (32 bits, a surrogate pair) to handle each of the additional characters in the Unicode standard.
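
The two cases in Python (using the little-endian variant so no byte-order mark is added):

```python
# A character from the old UCS-2 range fits in one 16-bit unit (2 bytes);
# a character beyond it needs two 16-bit units (4 bytes, a surrogate pair).
print(len("A".encode("utf-16-le")))   # 2 bytes
print(len("🙂".encode("utf-16-le")))  # 4 bytes
```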

What is...

EBCDIC

EBCDIC is a legacy character encoding used on IBM computers from the 1960s. EBCDIC could be used internationally through the use of code pages – in effect, other EBCDIC-like character encodings. One would need to look up the Japanese code page, CCSID 930, to understand how to decode an EBCDIC message encoded using that variant.
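
Python ships a cp037 codec (a US/Canada EBCDIC code page), which can illustrate why the code page matters; the byte values below are chosen for the example.

```python
# Five EBCDIC-encoded bytes...
ebcdic_bytes = b"\xc8\x85\x93\x93\x96"

# ...decode sensibly only under the right code page.
print(ebcdic_bytes.decode("cp037"))    # Hello
print(ebcdic_bytes.decode("latin-1"))  # gibberish under the wrong encoding
```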

What is...

Version Control

A mechanism for the storage of text-based digital files and all subsequent changes made to them – literally, their versions. Version control systems such as Git, Subversion, and Mercurial are key to software development workflows. Version control enables users to create ‘branches’ on which to work, and to create ‘releases’ that aid in the maintenance of software released to the public.