I realized that I was a bit ambitious about making an blog this intricate so that everyone could understand without some additional background. I had grand visions of artfully explaining how all the pieces fit together so the average Junior High/Middle School student would understand. I had a mental list of my favorite stories about databases and programming. I would bring a smile to your face as I illuminated the dark corners of information systems. Well some of those things may still occur, but for the immediate I am just going to release some basic ideas and then get into the meat of it. I am sorry but I have given up my egalitarian dream of bringing Computer Science into the home.
What I really want to do is get back to my root desire which is to build a DBMS and write about my experience. So lets begin with our basic concepts and we can go from there.
The most basic question is probably What does a database do? In the most general terms a database stores and retrieves data so that the data can be used to create information. I am defining information as a meaningful collection of data. There is also the term intelligence. This doesn't refer to how smart something or someone is in this case. In this context it means information over time.
What is buried in that very basic statement about storing and retrieving data is a whole collection of tasks though. It must receive the data, structurally convert it into something a computer can use, receives queries for the data, and provide the requested data. Even that abstraction hides a great number of tasks. It also omits some important things like performance and security. I think security is obvious. We don't want to share our inventory with everyone. If our database held our CD collection, we might not want people to know we own a Hall & Oates CD, for obvious reasons. Retrieving data is one part of security, storing is another. Would we want an employee stealing inventory by just changing the count in the database? Would we want someone to insert an entry for Hall & Oates into our CD collection? Why would a database be concerned with performance when computers are so fast? The main reason is that the faster you make a database the more information it can process at once.
What are the parts of a database system? So a complete list of things I want my DBMS to do, notice I am hedging this by saying I want, is to store data, receive requests for data, verify the request is valid in structure and security, optimize the request, return the results. It must also receive data, verify the data is valid within the DBMS, and then store it.
What are some other parts that are sometimes associated with a database system? There are other things that are sometimes shipped with a DBMS that I will ignore for now. I may get another itch when this is done, but for now I am mentioning them for completeness only. Microsoft SQL Server has the most things bundled in that I know of. It has a tool for running and scheduling jobs. Jobs is a term referring to predefined processes that are started automatically when some event happens, like a particular time or the completion of another job. SQL Server also ships with an ETL tool. ETL stands for Extract, Translate, and Load. In general extract the data from someplace, translate it into something else, and then load it into the database. This aren't absolutely required for a DBMS so we ignore them.
First lets talk about ACID. It is an acronym for Atomicity, Consistency, Isolation, and durability.
Atomicity is the foundations of transactions. An example would be making a deposit at a bank. When you deposit a check you want the whole thing to go into your account. You don't expect that part of your check will get lost. Database transactions are the same way. In the inventory control database we need to make sure the entire shipping manifest is processed. We cant do a couple of lines, have a catastrophe, and then just throw up our hands and so oh well. What if our database didn't check data before it was inserted. Let's say John, a very reliable employee most of the time, presses the letter "o" instead of the number 0 for a numeric field. Our database, being the prototype it is failed to check that the data is the correct type. We insert a character into the field and the transaction fails. Sparks fly from John's computer and only the fields before the field with an "o" in it are stored.
Consistency is related to John's problem above, but from another point of view. Consistency means that a transaction will not only be completely processed, but that the "o" will not crash the database. A proper DBMS will verify that the value is appropriate for the field. This will also be important when we start talking about keys and relations. Consistency is what ensures that a foreign key in one table will relate to primary key in another. I understand that some may not know what keys are in a database. Just let it go or use Wikipedia. A transaction will take the database from one consistent state to another.
Isolation means that my transactions wont muck up yours. It wouldn't be good if our database allowed two people to ship the same item. Our widget with serial number 010 should not be shipped out by Kevin and John. It needs to be one or another.
Durability can be best explained, at least for me, by saying what goes in the database stays in the database. We will add and remove items from our warehouse and adjust the tables in our database accordingly. The tables should not be adjusted because Fred knocked his coffee onto the server. Fred will be our rebel throughout the series because of an irritating error when I learned how to program in Logo. I was looking at my first program, a simple little bit of code that moved the turtle, trying to figure out why the interpreter was giving me the following error message "You didn't tell me what to do with Fred" Oh I told it alright. Just not in a way that it would understand. Anyway after a few minutes I found the problem, if only all programming problems were that easy. I still use Fred for the name of my nemesis. Now back to durability. The basic idea is that data can only be changed by a transaction, not Fred's careless and lazy accident. An accident he should be fired for by the way. That's what you can do with Fred. So enough about my subconscious hostility about a programming issue I had when I was a teenager. Durability means our database needs to be a control freak. "Ain't nobody changing my data but me." Another example would be if a rogue program or service/daemon overwrote the hard drive or memory where that table was stored. Let's imagine that it wrote a random "Q" in someone's name because it thought it was inserting the "Q" in a text file it was editing. A database needs to defend against that.
What does relational mean? Relational means that data is stored in such a way as to capitalize on the relations between the data. For the purposes of this blog I will use a basic inventory control database. In our inventory control database we will need the ability to receive items, store items, find items, ship items, and dispose of items. Those of us who know a little bit about databases would say this is a trivial example. You only need to have tables for receiving items, storing items, finding items, shipping items, and disposing of items. While this would seem to be a very simple and elegant solution it really does ignore all the power of relational databases and creates a very error prone database. I recall one project I had where the goal was to improve on a database that was built like this. We ended up scrapping the whole database and starting over. So in the example of the inventory control database we would capitalize on the relations between requests for widgets, the widgets, the money moving around, and the organizations or people that interact with our inventory.
Relational is not the only way. Look at http://nosql-database.org/ for examples of database systems that are not relational and do really well.
Like it on Facebook, +1 on Google, Tweet it or share this article on other bookmarking websites.