Scattershot. Collect all the information that you could possibly. Start by looking at the sample reports and any prior attempts at data collection as described in the preceding section. Don’t worry about organizing it at this point. Try to think of everything, you can always eliminate unnecessary details later. It is better to have something that you don’t use later than to leave out something that might be important.
Atomization - breaking the data into its smallest usable parts: Go over the list of data points that you have, examine each piece of data and see if you can break it down to its smallest amount of usable parts. For example, a Full name like John T. Jefferson, CPA could be broken down into First name, Last Name, MI and Title. An address could be broken down into street address, suite number, city, state or province, country and postal code. Once you have all the information that you will need for your reports, you are ready for the next step, organizing the data into tables.
Designing the Tables
The guiding rule of a database is to eliminate data redundancy. This means that a single data point should only be stored in one place in the entire database. The process of designing the Back end of the database consists of two parts, Construction and Normalization. The construction part is where the data is examined and then grouped into tables that have a single common characteristic. Normalization is the process whereby each table is re-examined and determined if it can be split into more tables. The goal is that there should be no duplications of rows (records) or columns (fields) and in a single table there should be no duplication of data in the fields. Each data point (fact) should exist only once in a well constructed database. A poorly designed table is inefficient because it contains duplicates of the same information and mistakes in data entry will decrease the accuracy of the database. Since the same data must be manually entered each time, there the possibility for misspellings and when it comes time to retrieve the information, it will be missed. The figure on the left is the example used by MS Access to demonstrate a flawed table. It is flawed because it contains information about two completely different types of data, Products (Things) and Suppliers (Organizations). Think about how difficult it would be to correct this table if the suppliers address changed. It would have to be changed in multiple locations. Every time the same piece of data is entered, there is the possibility of misspelling the name. Then when you want to search for the products of a particular supplier, the misspelled name will not be found. When constructing the tables from the data that has been collected, start by grouping the data by a common characteristic, say People, Organizations, Places and Things. Then break out each type by another common characteristic.
If the data point does not fit into any group, then it may need a table of its own. Once you have formed a table, put in some sample data. If you see any repeating datapoints, like in Figure 1, this is a "Red Flag" that another table is needed.
(Street address, Suite number, City, State/Province, Country, Postal Code)
(Shippers, Suppliers, Departments)
(Products, Courses, Orders)
Think about the fields and whether you will want to search on them. For example, you may want to sort the data by city or state. You may want to extract data by Employee and Manager. Always keep in mind what you want to get out of the database when determining what data belongs in a table. Each field in the table should be a single attribute of each record and should contain only of one data type, for example, Text, Number, Boolean (Yes/No), Hyperlink, Attachment). Please note that Calculated is not included in this list. It is usually unnecessary to have a field for calculated data in a table as the calculations can be done on the Front-end of the database from the data in the fields, saving space in the Back-end of the database. Also, if more than one data type is in a field, it needs to be separated into its own column. Keep dividing and separating the tables until each table row describes one record or individual in the group, each field is a single attribute of that record, and each cell (the intersection of the record and the field) should only contain one data point.
You will have reached the ” least common denominator” for constructing tables when each record in the table can be described on the attributes or fields, and when every data point that you will need for a complete report to the User is included in a table, and no data point is duplicated anywhere else in the database. Then you will be ready for the next step. Determining the relationships between the tables.
The above discussion assumes that you will be starting from scratch. What if you already have spreadsheets or data tables that could possibly be used in the database. The same rules are used to deconstruct the spreadsheets into individual tables. The figure on the left demonstrates how this is done. Here, the first table as been split into two tables, Products and Suppliers. Now, If the supplier’s address changed, it would only have to be updated once. The duplicated rows in the first table have now been moved to the Supplier’s table and the SupplierID is now used to refer back to the Products table. In the next part of this blog, we will use integers for the table ID as this is more space efficient and can be indexed and searched rapidly. At this point, for practice I am attaching a spreadsheet of sample data that I found on the internet. The first 5 designers who can deconstruct this spreadsheet down to its minimalist tables and submit their work in the comments section will receive double kudos for doing it successfully. Please proceed to the next article Relational-Database-Principles-and-PowerApps-Step-3-Keys-and relationshipsNeed Keys?