- Empowerment
- In Soviet Russia, your data is in charge of you
- With Python, you are in charge of your data
- Be comfortable asking questions about how to access your data
- Give a broad survey of what is possible and enough concepts so you can dig into whatever might be applicable to your environment
- Think of software development in three stages: get it working, get it working right, get it working quickly
- Here, we focus on making it possible to get it working by providing samples of different methods of data access
- Learn concepts that enable asking more informed questions when you undertake a data project
- Awareness that "a beginning is a very delicate time" (Dune, Frank Herbert)
- Two phases of a software process to consider: development and operations
- How you think about accessing your data influences the rest of your project
- Be comfortable as you develop, but don't forget you might need to live with what you wrote for a while
- Where data is stored, organized, and persisted... what does this mean?
- Method for abstracting storage details from your application
- lots of different concerns/use cases result in lots of different ways to read and write data
- Does this matter? Yes and no. If you're reading data, there are some things you might care about:
- Where it comes from
- What its shape is
- Performance/storage
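To make "abstracting storage details" concrete, here is a minimal sketch using only the standard library. The table name `scores`, the sample rows, and the helper names `load_from_csv`, `load_from_sqlite`, and `top_scorer` are all made up for illustration. The same application logic runs whether the records came from a CSV file or a SQL database, but the shape differs slightly: CSV hands back strings, SQLite hands back integers.

```python
import csv
import os
import shutil
import sqlite3
import tempfile


def load_from_csv(path):
    """Read a CSV file into a list of dicts (every value arrives as a string)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_from_sqlite(connection):
    """Read the hypothetical 'scores' table into a list of dicts."""
    connection.row_factory = sqlite3.Row
    rows = connection.execute("SELECT name, score FROM scores ORDER BY name")
    return [dict(row) for row in rows]


def top_scorer(records):
    """Application logic: it does not care where the records were stored."""
    return max(records, key=lambda record: int(record["score"]))["name"]


# Source 1: a CSV file in a temporary directory (sample data, not real).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "scores.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as handle:
    csv.writer(handle).writerows([["name", "score"], ["ada", 90], ["bob", 85]])

# Source 2: an in-memory SQLite database holding the same rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
connection.executemany("INSERT INTO scores VALUES (?, ?)", [("ada", 90), ("bob", 85)])

csv_records = load_from_csv(csv_path)
sql_records = load_from_sqlite(connection)

print(top_scorer(csv_records), top_scorer(sql_records))
print(type(csv_records[0]["score"]).__name__, type(sql_records[0]["score"]).__name__)

connection.close()
shutil.rmtree(tmp_dir)
```

The `int(...)` call inside `top_scorer` is where the difference in shape gets absorbed; without it, comparing CSV scores would compare strings rather than numbers.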
- Files (stored in formats - e.g. CSV, XML, JSON, binary, etc. - see below)
- SQL databases
- NoSQL databases
- API
- API == Application Programming Interface
- A way for a program (application) to talk to another program programmatically
- Memory-based (e.g. programs importing logic/capabilities from other programs)
- importing libraries
- talking to the operating system
- Network/protocols (there is a difference between a protocol and an API; we'll pretend there is none for the sake of this discussion, but I'm happy to discuss more offline)
- standard protocols (e.g. HTTP, ODBC, FTP, etc.)
- Custom formats accessed via networking protocols
- typically data you might get over the Internet for a certain dataset or website, tailored for accessing that content
- this brings us to data formats (which applies to files as well)
- Web pages (plug for web scraping)
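The memory-based case is the easiest one to try first. This is a small sketch, with made-up sample values, of a program using two APIs without touching the network: one exposed by an imported library (`statistics`) and one exposed by the operating system through the `os` and `platform` modules.

```python
import os
import platform
import statistics

# API of an imported library: we call its functions without knowing how
# they are implemented.
readings = [2, 4, 6]  # sample values, purely illustrative
average = statistics.mean(readings)

# API of the operating system, reached through Python's os/platform modules.
current_dir = os.getcwd()          # ask the OS where the process is running
entries = os.listdir(current_dir)  # ask the OS what is stored there
system_name = platform.system()    # e.g. "Linux", "Darwin", or "Windows"

print(average)
print(current_dir, len(entries), system_name)
```

Network APIs follow the same pattern: the program asks through a defined interface and gets data back, except the other program sits on the far side of a protocol such as HTTP.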
- Assumes you have access to your pile of data
- Formats are how you turn that pile of bytes into something intelligible
- Based on data format, you may want some form of library to consume it
pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
pip install requests
- sqlite3 ships with Python's standard library, so it needs no pip install
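As a rough illustration of formats turning a pile of bytes into something intelligible, the sketch below decodes three small byte strings with standard-library modules only, so it runs before any third-party package is installed. The sample payloads and the binary layout (a little-endian integer followed by a 32-bit float) are assumptions invented for this example.

```python
import csv
import io
import json
import struct

# JSON: bytes -> text -> nested Python objects with real types.
json_bytes = b'{"city": "Oslo", "temp_c": 4.5}'
record = json.loads(json_bytes.decode("utf-8"))

# CSV: bytes -> text -> rows; every value stays a string until converted.
csv_bytes = b"city,temp_c\nOslo,4.5\nLima,19.0\n"
rows = list(csv.DictReader(io.StringIO(csv_bytes.decode("utf-8"))))

# Binary: the bytes mean nothing without the layout. Here we assume
# "<if": a little-endian 4-byte int followed by a 4-byte float.
binary_bytes = struct.pack("<if", 7, 4.5)
station_id, temp_c = struct.unpack("<if", binary_bytes)

print(record["temp_c"], rows[1]["temp_c"], station_id, temp_c)
print(len(binary_bytes), "bytes in the binary record")
```

The same temperature shows up as a float from JSON, a string from CSV, and a packed 32-bit value in the binary record, which is the kind of detail a format-aware library handles for you.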
