BIOINFORMATICS Flashcards by Kyla Salgado

Concerned with knowledge and the flow of knowledge in biological systems using computational methods in genetics and genomics

BIOINFORMATICS

Study of genomes (an organism's complete set of DNA, including all of its genes)

Genomics

Large-scale study of proteins (the proteome)

Proteomics

A collection of related information that is:
○ Structured
○ Searchable → index
○ Updated periodically
○ Cross-referenced → hyperlinks

DATABASES
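The four properties above can be illustrated with a toy in-memory database. This is a minimal sketch, assuming hypothetical record IDs, field names, and cross-references; real biological databases sit on a DBMS with far richer indexes.

```python
from collections import defaultdict

# Structured: every record has the same fields (IDs and values are hypothetical).
records = {
    "REC1": {"name": "hemoglobin subunit alpha", "organism": "Homo sapiens", "xrefs": ["REC2"]},
    "REC2": {"name": "hemoglobin subunit beta", "organism": "Homo sapiens", "xrefs": ["REC1"]},
    "REC3": {"name": "insulin", "organism": "Mus musculus", "xrefs": []},
}

def build_index(recs):
    """Searchable: map each lowercase word in name/organism to the IDs containing it."""
    index = defaultdict(set)
    for rec_id, rec in recs.items():
        for field in ("name", "organism"):
            for word in rec[field].lower().split():
                index[word].add(rec_id)
    return index

def search(index, word):
    """Look a keyword up in the index instead of scanning every record."""
    return sorted(index.get(word.lower(), set()))

def follow_xrefs(recs, rec_id):
    """Cross-referenced: return the names of the records this entry links to."""
    return [recs[target]["name"] for target in recs[rec_id]["xrefs"]]

def add_record(recs, rec_id, rec):
    """Updated: add a new entry and rebuild the index so searches can find it."""
    recs[rec_id] = rec
    return build_index(recs)

index = build_index(records)
print(search(index, "Hemoglobin"))    # ['REC1', 'REC2']
print(follow_xrefs(records, "REC1"))  # ['hemoglobin subunit beta']
index = add_record(records, "REC4",
                   {"name": "insulin receptor", "organism": "Homo sapiens", "xrefs": ["REC3"]})
print(search(index, "insulin"))       # ['REC3', 'REC4']
```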

○ These are programs that keep the database working behind the scenes
○ Computerized data-keeping system

Tier 1: Database management system

○ Facilitates communications between applications or databases
○ Extracts information from either local or remote databases

Tier 2: Middleware layer

○ Enables users to access the database from anywhere without downloading or installing any code
○ The layer that users see: the graphical user interface.

Tier 3: Web interface
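The three tiers can be sketched in Python, assuming the standard-library sqlite3 module as a stand-in DBMS. The table, columns, accession values, and function names are hypothetical, and tier 3 only returns an HTML string instead of serving a real web page.

```python
import html
import sqlite3

# Tier 1: database management system (stores and manages the data).
def make_dbms():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, seq TEXT)")
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [("SEQ001", "Homo sapiens", "ATGGCC"), ("SEQ002", "Mus musculus", "ATGAAA")],
    )
    conn.commit()
    return conn

# Tier 2: middleware (turns a user request into a query and hands back plain results).
def middleware_lookup(conn, organism):
    rows = conn.execute(
        "SELECT accession, seq FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    ).fetchall()
    return [{"accession": acc, "seq": seq} for acc, seq in rows]

# Tier 3: web interface (what the user sees; here just an HTML fragment).
def render_page(organism, results):
    items = "".join(
        f"<li>{html.escape(r['accession'])}: {html.escape(r['seq'])}</li>" for r in results
    )
    return f"<h1>{html.escape(organism)}</h1><ul>{items}</ul>"

conn = make_dbms()
page = render_page("Homo sapiens", middleware_lookup(conn, "Homo sapiens"))
print(page)  # <h1>Homo sapiens</h1><ul><li>SEQ001: ATGGCC</li></ul>
```

Keeping the query logic in the middleware means the interface never talks to the DBMS directly, so either side can change without breaking the other.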

CLASSIFICATION OF DATABASES
1. Scope of data coverage
give me the 2

● Comprehensive
● Specialized

CLASSIFICATION OF DATABASES
2. Methods of biocuration
give me the 2

● Expert-curated (RefSeq)
● Community-curated (Gene Wiki)

CLASSIFICATION OF DATABASES
3. Level of biocuration
give me the 3

● Primary
● Secondary
● Composite

CLASSIFICATION OF DATABASES
4. Type of data managed
give me the 3

● DNA/RNA/Protein
● Disease
● Nomenclature/Literature

● Information on sequence or structure alone
● Experimentally derived data submitted directly
● Archival in nature

PRIMARY DATABASE

● Combines a variety of primary databases, allowing an ‘all-in-one’ search across multiple resources

COMPOSITE DATABASE

● Derived from primary databases
● Based on analysis of the data from the primary databases

SECONDARY DATABASE

“Google” of bioinformatics

COMPOSITE DATABASE

● The most widely used is PubMed
● Contains entries for >11 million abstracts of scientific publications

LITERATURE DATABASE
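PubMed can also be searched programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL; the search term is a hypothetical example, and actually sending the request (commented out) needs network access and should respect NCBI's usage limits.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def pubmed_search_url(term, retmax=20):
    """Build an ESearch URL asking PubMed for up to `retmax` matching PMIDs, returned as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

url = pubmed_search_url("bioinformatics databases", retmax=5)
print(url)

# To run the search (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     pmids = json.load(resp)["esearchresult"]["idlist"]
```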

● GenBank, EMBL-Bank, and DDBJ exchange data to ensure comprehensive worldwide coverage; accession numbers are managed consistently among the three centers

NUCLEIC ACID DATABASE

● Contains publicly available DNA sequences from >100,000 organisms
● Also contains derived protein sequences, and annotations describing biological, structural, and other relevant features

GENBANK

● Contains nucleotide sequences from all public sources.
● Accessible through the Sequence Retrieval System (SRS), which allows keyword searching.
● Sequence similarity search tools: BLAST, Blitz, FASTA

EMBL

● Contains curated data on proteins, including their motifs and interactions with other substances.

PROTEIN DATABASE

● >18,000 macromolecular structures of proteins, peptides, viruses, protein/nucleic acid complexes, nucleic acids, and carbohydrates.
● Structures determined by X-ray diffraction (crystallography) and NMR spectroscopy.

PROTEIN DATA BANK

○ Curated protein sequence database with a high level of annotation (function, structure, post-translational modifications, variants).
○ Non-redundant and reviewed.

● SWISS-PROT

○ Computer-annotated supplement to SWISS-PROT.
○ Redundant and unreviewed.

TrEMBL

● Secondary database of protein families, domains, and functional sites that contains manually curated information.
● Provides tools for analysis of protein sequences and motifs.

PROSITE
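PROSITE patterns use a compact syntax (x for any residue, [..] for allowed residues, {..} for excluded ones, (n) or (n,m) for repeats, < and > for the sequence ends) that maps naturally onto regular expressions. A minimal sketch of that conversion, assuming well-formed patterns; it ignores rarer forms such as '>' inside brackets, and the pattern and protein in the example are illustrative only.

```python
import re

def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (e.g. 'C-x(2,4)-C') into a Python regular expression.

    Handles x, [..], {..}, (n) and (n,m) repeats, and the < / > terminal anchors.
    Assumes a well-formed pattern; '>' inside brackets (e.g. '[G>]') is not supported.
    """
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):          # anchored at the N-terminus
            regex += "^"
            element = element[1:]
        at_c_term = element.endswith(">")    # anchored at the C-terminus
        if at_c_term:
            element = element[:-1]
        residue, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if residue == "x":
            part = "."                                # any amino acid
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"         # any amino acid except those listed
        else:
            part = residue                            # single residue or [..] class
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        regex += part + ("$" if at_c_term else "")
    return regex

pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."  # zinc-finger-style pattern, illustrative
regex = prosite_to_regex(pattern)
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

protein = "MKTCPECGKSFSRSDELTRHIRTHTGE"  # made-up sequence
match = re.search(regex, protein)
print(match.start(), match.group())  # 3 CPECGKSFSRSDELTRHIRTH
```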

● Protein family fingerprints (groups of conserved motifs).
● Detects distant relatives of large and highly divergent protein superfamilies by looking at conserved regions in alignments.

PRINTS

● Protein families and domains represented as multiple sequence alignments.

PFAM

PFAM ___: Automatically Generated, Lower-Quality Entries

Pfam-B

PFAM ___: Manually Curated, High-Quality Entries

Pfam-A

● Collection of ungapped multiple alignments of segments of related protein sequences (blocks)
● For: protein family classification, protein structure prediction

BLOCKS
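Because BLOCKS entries are ungapped, every column of a block lines up across all segments, so per-column conservation can be read off directly. A minimal sketch, assuming a hypothetical block and a simple majority threshold; practical tools typically use weighted position-specific scoring rather than a plain consensus.

```python
from collections import Counter

def column_profile(block):
    """For each column of an ungapped block, return (most common residue, fraction carrying it)."""
    length = len(block[0])
    if any(len(seg) != length for seg in block):
        raise ValueError("segments in an ungapped block must all have the same length")
    profile = []
    for i in range(length):
        residue, count = Counter(seg[i] for seg in block).most_common(1)[0]
        profile.append((residue, count / len(block)))
    return profile

def consensus(block, threshold=0.5):
    """Keep a column's residue if at least `threshold` of segments share it, else write 'x'."""
    return "".join(res if frac >= threshold else "x" for res, frac in column_profile(block))

block = ["GKSTL", "GKTTL", "GRSTL", "AKSTV"]  # hypothetical aligned segments
print(consensus(block))                 # GKSTL
print(consensus(block, threshold=0.8))  # xxxTx
```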

● Contain data regarding structures of nucleic acids and proteins.

STRUCTURAL DATABASES

Easy-to-use website for aligning multiple sequences supplied in FASTA format.

MULTALIN
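Alignment websites take FASTA input: each record is a '>' header line followed by one or more sequence lines. Multiple-alignment programs commonly build on pairwise alignment, so this sketch parses FASTA text and scores one pair with the Needleman-Wunsch global alignment algorithm. It is not the website's own method; the scoring values (match +1, mismatch -1, gap -2) and the sequences are illustrative assumptions.

```python
def parse_fasta(text):
    """Return {header: sequence}; sequence lines under one header are joined and uppercased."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.
    Keeps only the previous row of the dynamic-programming table."""
    prev = [j * gap for j in range(len(b) + 1)]      # row 0: prefixes of b against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)              # column 0: prefix of a against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

fasta_text = """>seq1
AC
GT
>seq2
agt
"""
seqs = parse_fasta(fasta_text)
print(seqs)                                                # {'seq1': 'ACGT', 'seq2': 'AGT'}
print(global_alignment_score(seqs["seq1"], seqs["seq2"]))  # 1 (A-A, C-gap, G-G, T-T)
```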

Translates DNA or RNA sequences into protein sequences.

EXPASY
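Translation reads a sequence three bases at a time and looks each codon up in the genetic code. A minimal sketch, assuming the standard genetic code and treating U as T so DNA and RNA share one table; the function names are hypothetical and this is not ExPASy's implementation. Because the correct reading frame is not known in advance, translation tools commonly report all six frames (three per strand), which six_frames mimics.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the 1st, 2nd, then 3rd codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq, frame=0, to_stop=False):
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop codon."""
    dna = seq.upper().replace("U", "T")               # RNA -> DNA alphabet
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")   # X for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

def six_frames(seq):
    """Translate three frames on the given strand and three on its reverse complement."""
    dna = seq.upper().replace("U", "T")
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna, f) for f in range(3)] + [translate(reverse_complement, f) for f in range(3)]

rna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
print(translate(rna))                # MAIVMGR*KGAR*
print(translate(rna, to_stop=True))  # MAIVMGR
```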

Predicts a protein's 3D structure from its amino acid sequence.

I-TASSER