cancel
Showing results for 
Search instead for 
Did you mean: 
Drrickryp

Relational database design and PowerApps – Step 2 Data Gathering, Constructing & Normalizing Tables

normal.gif

This post is the third in a series following the steps in designing the database that will provide the Back-end of a PowerApp.  The terminology assumes that you have reviewed the preceding two posts.  Hopefully, following this review, the prospective designer will be ready to take all of the data necessary for the application, break each data point down to its smallest usable components and then construct the tables.  The designer will then examine each table to eliminate redundancy. This process is called Normalization.  

Data Gathering

  • Scattershot.  Collect all the information that you could possibly need.  Start by looking at the sample reports and any prior attempts at data collection as described in the preceding section.  Don’t worry about organizing it at this point.  Try to think of everything, you can always eliminate unnecessary details later.  It is better to have something that you don’t use later than to leave out something that might be important. 
  • Atomization - breaking the data into its smallest usable parts: Go over the list of data points that you have, examine each piece of data and see if you can break it down to its smallest amount of usable parts.  For example, a Full name like John T. Jefferson, CPA could be broken down into First name, Last Name, MI and Title.  An address could be broken down into street address, suite number, city, state or province, country and postal code.  Once you have all the information that you will need for your reports, you are ready for the next step, organizing the data into tables.

 

Designing the Tables

 

The guiding rule of a database is to eliminate data redundancy. This means that a single data point should only be stored in one place in the entire database.  The process of designing the Back end of the database consists of two parts, Construction and Normalization.  The construction part is where the data is examined and then grouped into tables that have a single common characteristic.  Normalization is the process whereby each table is re-examined and determined if it can be split into more tables.  The goal is that there should be no duplication of rows (records) or columns (fields) and in a single table there should be no duplication of data in the fields. Each data point (fact) should exist only once in a well constructed database.  A poorly designed table is inefficient because it contains duplicates of the same information and mistakes in data entry will decrease the accuracy of the database.  Since the same data must be manually entered each time, there the possibility for misspellings and Capture.PNGwhen it comes time to retrieve the information, it will be missed. The figure on the left is the example used by MS Access in the Northwind Traders sample database  to demonstrate a flawed table. It is flawed because it contains information about two completely different types of data, Products (Things) and Suppliers (Organizations). Think about how difficult it would be to correct this table if the suppliers address changed. It would have to be changed in multiple locations.  Every time the same piece of data is entered, there is the possibility of misspelling the name.  Then when you want to search for the products of a particular supplier, the misspelled name will not be found.  When constructing the tables from the data that has been collected, start by grouping the data by a common characteristic, say People, Organizations, Places and Things.  Then break out each type by another common characteristic.  

If the data point does not fit into any group, then it may need a table of its own.  Once you have formed a table, put in some sample data.  If you see any repeating datapoints, like in Figure 1, this is a "Red Flag" that another table is needed.  

  • People
    • (Employees, Customers, Students, Faculty, Administration, etc.).
  • Places
    • (Street address, Suite number, City, State/Province, Country, Postal Code)
  • Organizations
    • (Shippers, Suppliers, Departments)
  • Things
    • (Products, Courses, Orders) 

Think about the fields and whether you will want to search on them.  For example, you may want to sort the data by city or state. You may want to extract data by Employee and Manager. Always keep in mind what you want to get out of the database when determining what data belongs in a table.  Each field in the table should be a single attribute of each record and should contain only of one data type, for example, Text, Number, Boolean (Yes/No), Hyperlink, Attachment). Please note that Calculated is not included in this list.  It is usually unnecessary to have a field for calculated data in a table as the calculations can be done on the Front-end of the database from the data in the fields, saving space in the Back-end of the database. Also, if more than one data type is in a field, it needs to be separated into its own column.  Keep dividing and separating the tables until each table row describes one record or individual in the group,  each field is a single attribute of that record, and each cell (the intersection of the record and the field) should only contain one data point. 

You will have reached  the ” least common denominator” for constructing tables when each record in the table can be described on the attributes or fields, and when every data point that you will need for a complete report to the User is included in a table, and no data point is duplicated anywhere else in the database.  Then you will be ready for the next step. Determining the relationships between the tables.

 

Capture1.PNGThe above discussion assumes that you will be starting from scratch.  What if you already have spreadsheets or data tables  that could possibly be used in the database.  The same rules are used to deconstruct the spreadsheets into individual tables.  The figure on the left demonstrates how this is done.  Here, the first table as been split into two tables, Products and Suppliers. Now, If the supplier’s address changed, it would only have to be updated once.  The duplicated rows in the first table have now been moved to the Supplier’s table and the SupplierID is now used to refer back to the Products table.  In the next part of this blog, we will use integers for the table ID as this is more space efficient and can be indexed and searched rapidly. 

Please proceed to the next article Relational-Database-Principles-and-PowerApps-Step-3-Keys-and relationshipsNeed Keys?Need Keys?

 

 

 

Comments