Article originally posted on Data Science Central.
This is a general-purpose question, because database management presents a variety of solutions. Data stored on hard disks has different optimization concerns than data managed by a DBMS. Centralized data calls for different optimization strategies than data distributed across networks. Data stored for a website or desktop application has different implications than corporate data. We are therefore talking about different strategies for different people. Here is how to tell the difference.
Let us start with the simple case. Data access from a website or desktop application is served by a simple standalone application with built-in data files. For Microsoft Visual Studio developers: you create a web application and attach a data file to the project solution from the Add New Item menu in Solution Explorer. There, data is stored in the data types supported by the SQL Server Native Client and queried in the SQL language. Once the website or desktop application is up and running, overhead is often imposed by a multilayered application architecture. A simple dynamic website normally does not need the three-tiered architecture taught in most programming courses; three-tiered architecture is meant for enterprise-level software. PHP and ASP.NET applications with direct data access perform better in these cases. Not so when the application is big.
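As a minimal sketch of what "direct data access" means here, the page code can query the data file in one step, with no separate data-access or business-logic tier in between. The example below uses Python with an in-memory SQLite database standing in for the project's attached data file; the `products` table and its contents are hypothetical, chosen only for illustration.

```python
import sqlite3

# An in-memory database stands in for the data file attached to the project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)",
                 [("Pen", 1.50), ("Notebook", 3.25)])
conn.commit()

def product_list(connection):
    """Direct data access: one query, straight from the page code to the database."""
    return connection.execute(
        "SELECT name, price FROM products ORDER BY name").fetchall()

print(product_list(conn))  # the page renders these rows directly
```

For a small dynamic site, this single round trip is the whole "architecture" — there is no extra layer to add overhead.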
Now let us consider an enterprise-level application. Consider an accounting application where ledgers, reporting, accounts receivable, liabilities, and owner's equity each have a separate interface, with dedicated resources providing access to the users of each domain. Here, things do not scale up: normal PHP and simple dynamic websites may wear out. We need an enterprise-level web application, with desktop and mobile modules integrated, built together as a distributed client-server database. We are no longer talking about simple CRUD operations (Create, Read, Update, and Delete records). The distributed database(s) attached to each subject area may maintain their own transaction histories as Online Transaction Processing (OLTP) software working at the back end. Committing a new addition to an employee account may need to be reflected in the respective department's database, the payroll account, the ledger, cost accounting reports, attendance management systems, and human resource records. The data problems in such huge systems are integration, concurrent access, data redundancy, data anomalies, and join-operation overheads.
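The "commit reflected everywhere" idea can be sketched as a single atomic transaction. The example below is a simplified, hypothetical illustration in Python with SQLite (a real OLTP back end would be a distributed RDBMS): hiring an employee either updates every dependent table or none of them, which is the transactional guarantee that distinguishes this from ad-hoc CRUD. The table and column names are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
CREATE TABLE payroll    (emp_id INTEGER, salary REAL);
CREATE TABLE attendance (emp_id INTEGER, days_present INTEGER);
""")

def hire_employee(connection, name, dept, salary):
    """Commit the new record to every dependent table, or to none of them."""
    with connection:  # one transaction; rolled back automatically on any error
        cur = connection.execute(
            "INSERT INTO employees (name, dept) VALUES (?, ?)", (name, dept))
        emp_id = cur.lastrowid
        connection.execute(
            "INSERT INTO payroll (emp_id, salary) VALUES (?, ?)", (emp_id, salary))
        connection.execute(
            "INSERT INTO attendance (emp_id, days_present) VALUES (?, 0)", (emp_id,))
    return emp_id

emp_id = hire_employee(conn, "Alice", "Accounting", 52000.0)
```

If any one of the three inserts failed, the `with connection:` block would roll all of them back, so no table is ever left half-updated.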
In this second scenario, a number of database flavors provide increasingly flexible and scalable solutions to the problems just stated. Data integration is provided by data mart applications; concurrent access by high-end RDBMSs such as Teradata, DB2, Oracle, and SQL Server. Data redundancy is kept only at the level needed to optimize distributed access, while replication overhead is minimized by centralizing the database servers. Data anomalies are the problems where consistency is compromised. For example, when an employee leaves the organization, the payroll system, the attendance management system, and the HR management system all have to update their references to that employee in their respective databases. In the process, some references may not be updated with the employee's current status or with the replacement assigned to their duties. This is where normalization works effectively.
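A brief sketch of how normalization avoids this anomaly: if employee details live in one master table and the other systems hold only foreign-key references to it, deleting the employee cannot leave stale copies of their data scattered across subsystems. The example below is a hypothetical illustration using SQLite foreign keys with `ON DELETE CASCADE`; the schema is invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
CREATE TABLE payroll (
    emp_id INTEGER REFERENCES employees(id) ON DELETE CASCADE,
    salary REAL
);
""")
conn.execute("INSERT INTO employees (id, name, dept) VALUES (1, 'Bob', 'Sales')")
conn.execute("INSERT INTO payroll (emp_id, salary) VALUES (1, 48000.0)")

# When Bob leaves, deleting the single master row removes the dependent
# payroll row as well -- no dangling reference, no stale duplicate to forget.
conn.execute("DELETE FROM employees WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM payroll").fetchone()[0]
print(remaining)  # 0
```

The point is that the employee's name and department exist in exactly one place, so there is no second copy that could fall out of sync when they leave.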
Next time we will cover normalization in detail: why it is needed, what its advantages are, and what the aims are when normalizing database structures.