Importing Data in Python Flashcards Preview

Python > Importing Data in Python > Flashcards

Flashcards in Importing Data in Python Deck (38):
1

How do you read a text file?

How do you print the text file?

filename = 'textfile.txt'

file = open(filename, mode = 'r') # 'r' is to read the file

text = file.read()

file.close()

To print:

print(text)

2

When reading a text file, how do you avoid having to use file.close()?

Use the context manager with

e.g. 

with open('textfile.txt', 'r') as file:

print(file.read())

 

The file will only be open within the with context, so you don't have to worry about closing it.

3

How do you check whether a file is closed?

file = open('filename.txt', mode = 'r')

print(file.closed)

This will print True if the file is closed and False if it's still open.

4

How do you print out the first 3 lines of a text file?

with open('textfile.txt') as file

Use the readline() method

print(file.readline())

print(file.readline())

print(file.readline())

Each time print(file.readline()) executes, it will print the next line. 

5

What is a flat file?

  • Text file containing records
  • That is, table data without structured relationships (in contrast to relational database)
  • Consist of records= row of fields or attributes, each of which contains at most 1 item of information
  • Columns = feature or attribute, e.g. name, age
  • Could have file extension .csv (comma separated values) or .txt (text file)
  • Values in flat files can be separated by characters or sequence of characters (delimiters)

6

How do you import flat files with a)numerical data b)numerical and string data?

a) NumPy

b) pandas

7

Import flat file using NumPy

import numpy as np

filename = 'textfile.txt'

data = np.loadtxt(filename, delimiter = ' , ')

*Note default delimiter is any white space

8

When you import a flat file using NumPy, how do you skip the first row? e.g. when the first row is a header

 

import numpy as np

filename = 'textfile.txt'

data = np.loadtxt(filename, delimiter = ' , ', skiprows = 1)

9

When you import a flat file using NumPy, how do you select specific columns? e.g. select first and third columns

import numpy as np

filename = 'textfile.txt'

data = np.loadtxt(filename, delimiter = ' , ', usecols=[0, 2])

10

When you import a flat file using Numpy, how do you make sure all data is imported as strings?

import numpy as np

filename = 'textfile.txt'

data = np.loadtxt(filename, delimiter = ' , ', dtype = str)

11

what is the symbol for the delimiter in a tab-delimited file?

delimiter = '\t'

12

How do you import mixed datatypes with Numpy? e.g. different columns have strings or floats

Use genfromtxt()

data = np.genfromtxt('filename.csv', delimiter = ',', names = True, dtype = None)

*names tells whether there's a header

*dtype = None will make the function figure out what types each column should be

 

Can also use recfromcsv()

data = np.recfromcsv('filename.csv')

*Note that default delimiter is comma, default names=True, and default dtype = None

13

How do you import a flat file using pandas?

import pandas as pd

filename = 'filename.csv'

data = pd.read_csv(filename)

 

14

How do you check the first 5 rows of a DataFrame including the header?

use data.head()

 

15

How do you convert a DataFrame into a numpy array?

data = pd.read_csv(filename)

data_array = data.values

16

How do you import only 5 rows of a file into a DataFrame?

data = pd.read_csv(filename, nrows=5)

17

What is the pandas version of delimiter?

sep=

e.g. 

data = pd.read_csv(file, sep='\t')

18

The file has comments after the character '#' and missing values as 'Nothing', how do you modify the the following code?

data = pd.read_csv(file)

data = pd.read_csv(file, comment = '#', na_values = 'Nothing')

19

What are pickled files?

  • File type native to Python
  • It serializes (converts object to bytestream) 
  • Used for datatypes for which it isn't obvious how to store them

20

How do you import a file that has already been pickled?

How do you print it?

import pickle

with open('filename.pkl', 'rb') as file: #rb specifies that file is read-only and binary (computer readable--not human readable)

data = pickle.load(file)

print(data)

21

how do you import an excel spreadsheet?

import pandas as pd

file = 'filename.xlsx'

data = pd.ExcelFile(file)

22

How do you print the sheet names of an excel file? Complete the code below

import pandas as pd

file = 'filename.xlsx'

data = pd.ExcelFile(file)

print(data.sheet_names)

23

How do you load a excel sheet as a DataFrame?

import pandas as pd

file = 'filename.xlsx'

data = pd.ExcelFile(file)

Two ways:

df1 = data.parse('sheetname')

df2 = data.parse(sheetindex)

24

How do you import a SAS file as a DataFrame?

import pandas as pd

from sas7bdat import SAS7BDAT

with SAS7BDAT('SASfile.sas7bdat') as file:

df_sas = file.to_data_frame()

25

How do you import a Stata file as a DataFrame?

import pandas as pd

data = pd.read_stata('StataFile.dta')

26

What are HDF5 files?

Hierarchical Data Format version 5

Standard for storing large quantities of numerical data 

Data can be hundreds of gigabytes or terabytes

Can scale up to exabytes

27

How do you import an HDF5 file?

How do you explore the structure of the file (i.e. the categories it contains)?

Say one of the keys you printed was 'meta'. How do you print the keys within 'meta' (to see 'meta's subcategories)?

Say one of the keys within 'meta' was 'Description', how do you print the value of 'Description'?

import h5py

filename = 'filename.hdf5'

data = h5py.File(filename, 'r')

 

To explore structure, use .keys():

for key in data.keys()

print(key)

 

for key in data['meta'],keys():

print(key)

 

print(data['meta']['Description'].value)

28

Import a MATLAB file

import scipy.io

filename = 'filename.mat'

mat = scipy.io.loadmat(filename)

29

When you import a MATLAB file, what type of data it is?

It's a dictionary

keys = MATLAB variable names

values = objects assigned to variables

30

Create a database engine from a SQLite database

 

Print the table names

from sqlalchemy import create_engine

engine = create_engine('sqlite:///databasename.sqlite')

 

table_names = engine.table_names()

print(table_names)

31

What does this basic SQL query do?

SELECT * FROM Table_Name

Returns all columns of all rows of the table

32

You've created a database engine. How do you connect to the engine?

engine = create_engine('sqlite:///databasename.sqlite')

How do you query the database?

Turn the results object into a DataFrame

Close the connection

con = engine.connect()

 

rs = con.execute("SELECT * FROM Orders")

*This creates a result object assigned to rs

 

df = pd.DataFrame(rs.fetchall())

*fetchall fetches all rows

 

con.close()

33

You've created a dataframe from a SQLite database. How do you make sure the DataFrame contains the correct column names?

con = engine.connect()

rs = con.execute("SELECT * FROM Orders")

df = pd. DataFrame(rs.fetchall())

df.columns = rs.keys()

34

Use the context manager with to connect to the engine

engine = create_engine('sqlite:///databasename.sqlite')

*Select only column1 and column 2, and fetch only the first 5 rows

with engine.connect() as con:

rs = con.execute("SELECT column1, column2 from Table)

df = pd.DataFrame(rs.fetchmany(size=5))

 

*Using the context manager avoids having to close the connection

35

When importing a database, how do you select observations from a table that meet a certain condition

with engine.connect() as con:

rs = con.execute("SELECT * from Table WHERE condition”)

36

When importing an SQL database, how do you order records by a certain column?

with engine.connect() as con:

rs = con.execute("SELECT * from Table ORDER BY Column1”)

37

What is the pandas way to query a database?

df = pd.read_sql_query(“SELECT * FROM Table”, engine)

#engine is the engine you want to connect to

 

This replaces:

with engine.connect() as con:

rs = con.execute("SELECT * FROM Table”)

df = pd.DataFrame(rs.fetchall())

df.columns = rs.keys()

38

Join OrderID from the Orders Table with CompanyName from the Customers table (SQLite Database)

Q image thumb

df = pd.read_sql_query(“SELECT OrderID, CompanyName FROM Orders INNER JOIN Customers on Orders.CustomerID = Customers.CustomerID”, engine)