Feeds:
Posts

## A Summary on Noise Addition for Data Privacy

By Kato Mivule

• Noise addition: A stochastic value is added to confidential numeric attributes.

• The stochastic value is chosen from a normal distribution with zero mean and a diminutive standard deviation. First publicized by Jay Kim (1986) with the expression that

• Z = X + ɛ

• Where Z is the transformed data point

• X is the original data point.

• ɛ (epsilon)is the random variable (noise) with a distribution ε∼ N(0, σ2 ).

• The X is then replaced with the Z for the data set to be published, Z = X + ɛ

Gaussian Noise Distribution

• The Normal Distribution( Gaussian distribution), is a bell shaped probability distribution depicting real-valued stochastic variables clustered around a single mean…

• μ (mu) is the mean

• σ2 (Sigma) is the variance

• N(μ, σ2) is the normal distribution with mean μ and variance σ2

• Transformed data has to keep the same statistical properties as the original data.
• Covariance:Cov(X, Y): How affiliated the deviations between points X and Y.

• If Cov(X, Y) is positive, X and Y increase together, otherwise they don’t.
• If Cov(X, Y) is zero, X and Y are each autonomous.
• Correlation rxy calculates tendency of linear relation between two data points.

• If -1, then rxy is a negative linear relation between x and y,
• if 0, no linear relation,
• if +1, a strong linear relation.

Notes

[1] Jay Kim, A Method For Limiting Disclosure in Microdata Based Random Noise and Transformation, Proceedings of the Survey Research Methods, American Statistical Association, Pages 370-374, 1986.

[2] J. Domingo-ferrer, F. Sebe, and J. Castella-Roca, “On the security of noise addition for privacy in statistical databases,” in Privacy in Statistical Databases 2004, 2004, pp. 149-161. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.4575

## Applying Data Privacy Techniques on Tabular Data in Uganda

Kato Mivule and Claude Turner

Abstract
The growth of Information Technology(IT) in Africa has led to an increase in the utilization of communication networks for data transaction across the continent. A growing number of entities in the private sector, academia, and government, have deployed the Internet as a medium to transact in data, routinely posting statistical and non statistical data online and thereby making many in Africa increasingly dependent on the Internet for data transactions. In the country of Uganda, exponential growth in data transaction has presented a new challenge: What is the most efficient way to implement data privacy. This article discusses data privacy challenges faced by the country of Uganda and implementation of data privacy techniques for published tabular data. We make the case for data privacy, survey concepts of data privacy, and implementations that could be employed to provide data privacy in Uganda.

## The Need to Map ER-diagrams to the Relational Model

By Kato Mivule

Database Systems

During a database development process, often database developers will engage with their clients and users to solicit the requirements that the new database is supposed to accomplish. One of the best ways to communicate with clients and users is with the use of ER diagrams to check if the developers have fully incorporated all the requirements in the design as dictated by the client.

The ER diagrams become easy to study, visualize, and amend during the requirements soliciting process with the client. Therefore, ER diagrams are conceptional and abstract presentation of the database, fully giving a formal description of database components such as schema, entities, relationships, and attributes. [1] [2]

At the same time the Relational Model which was first introduced in 1969 by E.F Codd, is a conceptual view of a database based on first-order predicate logic, depicting a database as a set of predicate variants. The model stipulates a declaratory way for defining data and queries by requiring users to straightaway state what data they need to get from the data and place into the database.[4] Relational Models represent data as a collection of relations, depicting the database as a schema, the table as a relation, the columns as attributes, and rows as tuples.[2]

The Relational Model is declarative and therefore easy to implement using declarative languages like SQL, making it a major difference between ER Diagrams and the Relational Model. ER diagrams tend to be highly conceptual and though could be easily understood by clients and users who are not experts in database terminology, ER diagrams are not declarative and straight forward when it comes to implementation. It is therefore important to have the Relational Model interface between the ER diagrams and the SQL declarative language so as to make implementation faster and easier.

Secondly, any errors that come up during the design phase with the ER diagrams, can be captured when mapping to the Relational Model. Yet even more, wile both the ER and the Relational Model are mathematical, we can still map ER diagrams to the Relational Model, thus making implementation of the conceptual view to physical view much easier.

Yet still, the ease of use between the ER Model and Relational Model is still debatable. Peter P. Chen, the author of the ER model in the 1970s notes three differences between the ER Model and E.F. Codd’s Relational Model: [6] [7]

• ER models employ mathematical relation constructs to show relationships between entities…
• ER model incorporates more semantic data than the Relational Model…
• ER model uses explicit linkage between entities while the Relational model uses implicit linkages between entities…

Yet still despite the advantages of more semantic information with the Chen’s ER model, Codd’s Relational model is closer to SQL implementation and has the declarative advantage. Therefore, for the conceptual view, the ER diagram work best but will work better when mapped to the Relational model for optimized physical view implementation.

Sources

[1] Seyed M. M. Tahaghoghi, Hugh E. Williams, “Learning MySQL”, O’Reilly Media, Inc., 2006, ISBN 0596008643, 9780596008642, Page 112-118

[2] Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems, Edition 6, Addison Wesley Pub Co Inc, 2010, ISBN 0136086209, 9780136086208, Page 199-350

[3] Heidi Gregersen and Leo Mark and Christian S. Jensen, “Mapping Temporal ER Diagrams to Relational Schemas”, IN PROC. OF THE 9TH INT. CONF, 1998.

[4] “Relational model – Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Relational_model. [Accessed: 02-Nov-2010].

[5] “Entity-relationship model – Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Entity-relationship_model. [Accessed: 02-Nov-2010].

[6] Peter Chen, “Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned”, Software Pioneers: Contributions to Software Engineering, Springer, 2002.

[7] E.F. Codd, “A relational model of data for large shared data banks. 1970.,” M.D. computing : computers in medical practice, vol. 15, 1970, pp. 162-6.

## SQL, Relational Algebra, and Relational Calculus, the Differences

By Kato Mivule

Database Systems

Outline

In this article we take a look at the differences between SQL, Relational Algebra, and Relational Calculus. I this article, we focus on the main differences between Relational Algebra and Relational Calculus. Since SQL is mainly an implementation language, we take note of some major differences between Relational Algebra and Relational Calculus.

Introduction

Relational Algebra (RA) and Relational Calculus (RC) are formal languages for the database relational model while SQL is the practical language in the database relational model. [1] In these formal languages a conceptual database model is expressed in mathematical terms and notations while in the practical language – SQL, the mathematical expressions of the functionality and transaction of the database operations are implemented physically. Formal languages provide a medium through which to optimize and implement queries in database transactions. [1]

Of course the first notable differences between these languages in the syntax and notation used in the expressions. Each language, that RA, RC, and SQL have their own notations to express their notations.

RA Notations – Procedural Expression Language

We begin with the notations used in RA and that are unique to RA and are procedural, requiring a sequence of operations to express a query transaction as described by Elmasri et al: [1]

SELECT

The SELECT opeartor is σ (sigma) symbol and used as an expression to choose tuples that meet the selection condition…

σ<Selection condition>(R)

PROJECT

The PROJECT operator in RA is ∏ (pi) symbol used to choose attributes from a relation.

<attribute list>(R )

RENAME

The RENAME operator is symbolized by ρ (rho) and is used to express a naming of query results, attributes, and relations.

ρ s(B1, B2, B3,….Bn)(R )

UNION

RA notation for UNION is symbolized by ∪ , and includes all tuples that are in A or in B, eliminating duplicate tuples, therefore set A UNION set B would be expressed as:

RESULT ← A ∪ B

INTERSECTION

The INTERSECTION operation on a relation A INTERSECTION relation B, is symbolized by A B, includes tuples that are only in A and B.

RESULT ← A B

MINUS

the MINUS operation includes tuples from one Relation that are not in another Relation and symbolized by the – (minus) symbol. Therefore A – B would be expressed as…

RESULT ← A – B

CARTERSIAN PRODUCT

Creates a relation that has all the attributes of A and B, allowing all the attainable combinations of tuples from A and B in the result. The notation used is X.

C = A X B

JOIN

The JOIN operation is denoted by the ⟗ symbol and is used to compound similar tuples from two Relations into single longer tuples.

A ⟗ <join condition> B

NATURAL JOIN

The NATURAL JOIN operation returns results that does not include the JOIN attributes of the second Relation B and is symbolized by the * symbol.

A * ⟗ <join condition> B,

OR A * ⟗ (<join attributes 1>),

(<join attributes 2>) B

OR A * B

DIVISION

The DIVISION operation will return a Relation R(X) that includes all tuples t[X] in R(Z) that appear in R1 in combination with every tuple from R2(Y), where Z = X ∪ Y. The DIVISION operator is symbolized by ∻ symbol

R1(Z) ∻ R2(Y)

RC Notation – Non Procedural Expression Language

As Elmasri et al note, RC is a non procedural expression language that does not require any sequence to be followed in the expression of query transactions. One declarative statement could be used express a query transaction. [1]

The calculus query syntax as noted by Elmasri et al is:

{t | COND(t)}

Where t is the tuple variable.

COND(t) is the conditional expression.

The AND, OR, and NOT logical operators are also used in the expressions. An example of a RC expression is:

{t | STUDENT(t) AND t. GPA > 3.0}

This query would return all students whose GPA is greater than 3.0.

Expressions and Formulas in RC as noted by Elmasri et al are in the form:

{t1.Aj, t2.Ak,…, tn.Am | COND(t1, t2, …, tn, tn+1, tn+2, …, tn+m)}

Where COND is the condition or formula.

Existential and Universal Quantifiers

Elmasri et al further note that in RC quantifies that are largely unique to RC can appear in fomulas. The quantifiers include:

Universal quantifier (∀): Is true if all tuples make the formula true.

Existential quantifier (∃): Is true if there exists some tuple that makes the formula true

Example would include:

List the first name, last name and address of Students whose Major is Computer Science.

Q1: {s.Fname, s.Lname, s.Address | STUDENTS(s) AND (∃ m) (MAJOR(m)

AND m.major=’Computer Science’ AND m.studentnumb=t.studentnumb)}

SQL Notation

SQL is a structured query language, a comprehensive database language with data definition, queries, and updates.[1] It is with SQL that RA and RC are implemented physically in a database management program. Therefore this is the major difference between RA, RC and SQL, in that SQL will take the conceptual expressions of RA and RC and implement them at the computer level or physical level. SQL includes many declarative statements that it would take another article to write all of them but below are some few included here:

• Create: used to create schema and tables
• Insert: used to insert data to the tables
• Select: used to select tuples that satisfy a condition, also used to select attributes.
• Select fname From STUDENTS Where major = ‘computer science’: selects first name of students from the students table whose major is computer science.

It is noted that RC is much easier to implement in SQL as the RC notation is similar to SQL syntax.

Conclusion

Therefore RA and RC major differences are that RC is non procedural while RA is procedural and rigid as such. However, another notable difference is that it is easy to implement RC expressions directly to SQL language.

Sources

[1] Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems, Edition 6, Addison Wesley Pub Co Inc, 2010, ISBN 0136086209, 9780136086208, Page 145 – 186

## An Overview of Relational Algebra Operators and Their SQL Translations

By Kato Mivule

Database Systems

Outline

Relational Algebra Operators are indispensable mathematical tools that can be used to retrieve queries by describing a sequence operations on relations and schemas. In this discussion, we take a look at various Relational Algebra operations and the SQL implementation of assorted Relational Algebraic operators.

Introduction

Relational Algebra Operators are mathematical functions used to retrieve queries by describing a sequence operations on tables or even databases(schema) involved.[1] With relational algebra operators, a query is always composed of a number of operators, which each in turn are composed of relations as variables and return an individual abstraction as the end product. [2]

The following are the main relational algebra operators as applied to SQL:

The SELECT Operator

The SELECT operator is used to choose a subset of the tuples(rows) from a relation that satisfies a selection condition, acting as a filter to retain only tuples that fulfills a qualifying requirement. [3]

• The SELECT operator is relational algebra is denoted by the symbol σ (sigma).
• The syntax for the SELECT statement is then as follows: [3]

σ<Selection condition>(R)

• The σ would represent the SELECT command
• The <selection condition> would represent the condition for selection.
• The (R) would represent the Relation or the Table from which we are making a selection of the tuples.

To implement the SELECT statement in SQL, we take a look at an example in which we would like to select the EMPLOYEE tuples whose employee number is 7, or those whose date of birth is before 1980…

σempno=7(EMPLOYEE)

σdob<’01-Jan-1980′(EMPLOYEE)

The SQL implementation would translate into:

SELECT empno

FROM EMPLOYEE

WHERE empno=7

SELECT dob

FROM EMPLOYEE

WHERE DOB < ’01-Jan-1980′

The PROJECT Operator

This operator is used to reorder, select and get rid of attributes from a table. At some point we might want only certain attributes in a relation and eliminate others from our query result. Therefore the PROJECT operator would be used in such operations. [2] [3]

• The symbol used for the PROJECT operation is ∏ (pi).
• The general syntax for the PROJECT operator is:

<attribute list>(R )

• ∏ would represent the ROJECT.
• <attribute list> would represent the attributes(columns) we want from a relational.
• (R ) would represent the relation or table we want to choose the attributes from.

To implement the PROJECT statement in SQL, we take a look at an example in which we would like to choose the Date of Birth (dob) and Employee Number (empno) from the relation EMPLOYE…

• dob, empno(EMPLOYEE )

In SQL this would translate to:

SELECT dob, empno

FROM EMPLOYEE

The RENAME Operator

The RENAME operator is used to give a name to results or output of queries, returns of selection statements, and views of queries that we would like to view at some other point in time: [3] [4] [5] [6]

• The RENAME operator is symbolized by ρ (rho).
• The general syntax for RENAME operator is: ρ s(B1, B2, B3,….Bn)(R )
• ρ is the RENAME operation.
• S is the new relation name.
• B1, B2, B3, …Bn are the new renamed attributes (columns).
• R is the relation or table from which the attributes are chosen.

To implement the RENAME statement in SQL, we take a look at an example in which we would like to choose the Date of Birth and Employee Number attributes and RENAME them as ‘Birth_Date’ and ‘Employee_Number’ from the EMPLOYEE relation…

ρ s(Birth_Date, Employee_Number)(EMPLOYEE ) ← ∏dob, empno(EMPLOYEE )

• The arrow symbol ← means that we first get the PROJECT operation results on the right side of the arrow then apply the RENAME operation on the results on the left side of the arrow.

In SQL we would translate the RENAME operator using the SQL ‘AS’ statement:

SELECT dob AS ‘Birth_Date’, empno AS ‘Employee_Number’

FROM EMPLOYEE

The UNION, INTERSECTION, and MINUS Operators

UNION: the UNION operation on relation A UNION relation B designated as A B, joins or includes all tuples that are in A or in B, eliminating duplicate tuples. [3] [7] The SQL implementation of the UNION operations would be as follows:

UNION

RESULT ← A B

SQL Statement:

SELECT * From A

UNION

SELECT * From B

INTERSECTION: the INTERSECTION operation on a relation A INTERSECTION relation B, designated by A B, includes tuples that are only in A and B. In other words only tuples belonging to A and B, or shared by both A and B are included in the result. [3] [7] The SQL implementation of the INTERSECTION operations would be as follows:

INTERSECTION

RESULT ← A B

SQL Statement:

SELECT dob From A

INTERSECT

SELECT dob from B

MINUS Operations: the MINUS operation includes tuples from one Relation that are not in another Relation. Let the Relations be A and B, the MINUS operation A MINUS B is denoted by A – B, that results in tuples that are A and not in B. [3] [7] The SQL implementation of the MINUS operations would be as follows:

MINUS

RESULT ← A – B

SQL Statement

SELECT dob From A

MINUS

SELECT dob from B

CARTESIAN PRODUCT Operator

The CARTERSIAN PRODUCT operator, also referred to as the cross product or cross join, creates a relation that has all the attributes of A and B, allowing all the attainable combinations of tuples from A and B in the result.[3] [7] [8] The CARTERSIAN PRODUCT A and B is symbolized by X as in A X B.

Let there be Relation A(A1, A2) and Relation B(B1, B2)

The CARTERSIAN PRODUCT C of A and B which is A X B is

C = A X B

C = (A1B1, A1B2 , A2B1, A2B2 )

The SQL implementation would be something like:

SELECT A.dob, B.empno

from A, B

JOIN Operator

The JOIN operation is denoted by the ⟗ symbol and is used to compound similar tuples from two Relations into single longer tuples. Every row of the first table is joined to every row of the second table. The result is tuples taken from both tables. [3] [9]

• The general syntax would be A ⟗ <join condition> B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT A.dob, A.empno

from employee

JOIN B on B.empno=A.empno

THETA JOIN Operator

This operation results in all combinations of tuples from Relation A and Relation B satisfying a join requirement. The THETA JOIN is designated by:[3] The SQL implementation would be the same as for the JOIN example above.

• A ⟗ <join condition> B

EQUIJOIN Operator

The EQUIJOIN operation returns all combinations of tuples from Relation A and Relation B satisfying a join requirement with only equality comparisons. The EQUIJOIN operation is symbolized by : [3]

A ⟗ <join condition> B, OR

• A ⟗ (<join attributes 1>),

(<join attributes 2>) B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT * from A

INNER JOIN B

on A.empno=B.empno

NATURAL JOIN Operator

The NATURAL JOIN operation returns results that does not include the JOIN attributes of the second Relation B. It is not required that attributes with the same name be mentioned. The NATURAL JOIN operator is symbolized by: [3] [10]

• A * ⟗ <join condition> B,

OR A * ⟗ (<join attributes 1>),

(<join attributes 2>) B

OR A * B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT A.dob, B.empno

FROM A

NATURAL JOIN B

//where depno =5

We can always use the ‘where’ clause to further restrict our output and stop a Cartesian product output.

DIVISION Operator

The DIVISION operation will return a Relation R(X) that includes all tuples t[X] in R(Z) that appear in R1 in combination with every tuple from R2(Y), where Z = X ∪ Y. The DIVISION operator is symbolized by: [3]

• R1(Z) ∻ R2(Y)

The DIVISION operator is the most difficult to implement in SQL as no SQL command is given for DIVISION operation. The DIVISION operator would be seen as the opposite of the CARTERSIAN PRODUCT operator; just as in standard math, the relation between division and multiplication.[11] Therefore a series of current SQL commands have to be utilized in implementation of the DIVISION operator. An example of the SQL implementation of DIVISION operator: [12]

SELECT surname, forenames

FROM employee X

WHERE NOT EXISTS

(SELECT ‘X’

FROM employee y

WHERE NOT EXISTS

(SELECT ‘X’

FROM employee z

WHERE x.empno = z.empno

AND y.surname = z.surname))

ORDER BY empno

Conclusion

Relational Algebra Operators are critical mathematical tools used to retrieve queries by describing a sequence operations on tables or even databases(schema) involved mathematically. With relational algebra operators, a query is always composed of a number of operators, which each in turn are composed of relations as variables and return an individual abstraction as the end product. Relational Algebra operations can easily be translated into SQL commands to retrieve query results, making it a powerful tool in the hands of any Database designer, user, and administrator.

References

[1] Clare Churcher, Beginning SQL Queries: From Novice to Professional, Apress Series, Books for Professionals by Professionals, Expert’s voice in databases, Edition illustrated, Apress, 2008, ISBN1590599438, 9781590599433, Page 12-13

[2] Peter Z. Revesz, Introduction to constraint databases, Texts in computer science, Edition illustrated, Springer, 2002, ISBN 0387987290, 9780387987293, Page 26

[3] Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems, Edition 6, Addison Wesley Pub Co Inc, 2010, ISBN 0136086209, 9780136086208, Page 145 – 164

[4] “Relational Algebra.” [Online]. Available: http://technet.microsoft.com/en-us/library/cc966426.aspx. [Accessed: 02-Oct-2010].

[5] “Rename (relational algebra) – Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Rename_(relational_algebra). [Accessed: 02-Oct-2010].

[6] “Relational algebra – Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Relational_algebra. [Accessed: 02-Oct-2010].

[7] Rajesh Narang, Database Management Systems, PHI Learning Pvt. Ltd., 2006, ISBN 8120326458, 9788120326453, Page 223 – 239

[8] P. S. Deshpande, Sql/Pl Sql For Oracle 9I (W/Cd), Dreamtech Press, 2003, ISBN 8177224719, 9788177224719, Page 45

[9] Allen G. Taylor, SQL All-in-one Desk Reference for Dummies, For Dummies Computers Series, For Dummies, 2007, ISBN 0470119284, 9780470119280, Page 303 – 314

[10] “Oracle simplifies SQL with ISO 99 Syntax.” [Online]. Available: http://www.dba-oracle.com/art_sql_iso_99.htm. [Accessed: 03-Oct-2010].

[11] L. I. McCann and W. I. Kenosha, “On making relational division comprehensible,” in FRONTIERS IN EDUCATION CONFERENCE, vol. 2, pp. 2–6, 2003.

[12] Sikha Bagui, Sikha Saha Bagui, Richard Earp, Essential SQL on SQL Server 2008, Edition revised, Jones & Bartlett Learning, 2009,ISBN 076378138X, 9780763781385, Page 314

## Oracle 11g SQL Performance Analyzer Overview

By Kato Mivule

Database  Systems

Oracle 11g’s SQL Performance Analyzer (SPA) also referred to as the “Fully Automatic SQL Tuning” is a technological feature provided in Oracle 11g that analyzes queries and checks the queries so as to better optimize them for better efficiency. The SPA will then rewrite the query for better, faster, and optimized output. [1]

According to Donald K. Burleson, a renowned expert in Oracle tuning, with books published on the subject, the concept of tuning in Oracle is not new as this technology has come in various formats in from Oracle 7 in which the database was optimized by tuning the following: Bitmap Indexes, Partitioned Views, Sequential prefetch for full table scans, alter index xx rebuild syntax, and Advanced replication. [2]

In Oracle 8, tuning was advanced further to focus on areas that include: Table and Index partitioning enhancements, reverse key indexes, updatable views, and Oracle parallel query for Data Manipulation Language (DML) such as insert, update, and delete. [2]

In Oracle 8i, tuning was further advanced and saw new tools that allowed tuning for keeping track of statistics in the database. As such the STATPACK tool allowed the DBA to access query history and then apply scientific methods to achieve better performance. [3]

In Oracle 9 saw advancement of the STATPACK tuning tool with additional tools such as the network, server, disk tuning, Instance and Object tuning, Design tuning to optimize database designs, and the SQL tuning. [4]

Oracle 10g saw a major improvement of the SQL tuning technology in which the SQL tuning became a central improvement of the query calibration technology in Oracle, offering solutions to the challenges of time constraints, heavy workloads, and complexity when it came to tune SQL statement in large databases. [5] Oracle 10g was able to identify high load or top SQL statements using the SQL history data, attempted to find ways to improve execution plan of SQL statements by optimization, and suggest and apply better execution plans for poor performing queries. [5].

Oracle 11g has the latest updated technology when it comes to SQL tuning. The SQL tuning in Oracle 11g, also known as the SQL Tuning Adviser, is an automatic tuning optimizer that employs Artificial Intelligent technology to self correct and self tune the database. The SQL Tuning Adviser carries out statistical analysis by gathering SQL statistics, creates new SQL profiles, carries out access path analysis by managing indexes, and analyzing SQL structures to make enhancements for better performance.[6] [7]

The most exciting development is the implementation of artificial intelligence in SQL optimization and self correction of large databases. The gathering of database performance statistics, analyzing the statistics, suggesting and implementing enhancements and optimizations is really a breakthrough in database technology, leaving the future role of the DBA in question.

References

[1] “Automatic SQL Tuning using SQL Tuning Advisor — DatabaseJournal.com.” [Online]. Available: http://www.databasejournal.com/features/oracle/article.php/3492521/Automatic-SQL-Tuning-using-SQL-Tuning-Advisor.htm. [Accessed: 29-Sep-2010].

[2] “Inside Oracle 11g fully automated SQL tuning.” [Online]. Available: http://www.dba-oracle.com/t_inside_fully_automated_sql_tuning.htm. [Accessed: 29-Sep-2010].

[3] Donald K. Burleson, Oracle Tuning: The Definitive Reference, Oracle In-Focus series, Edition 2, Rampant TechPress, 2010, ISBN 0979795192, 9780979795190, Page 744.

[4] Donald K. Burleson. Oracle9i high-performance tuning with STATSPACK. Oracle Press, McGraw-Hill Professional, 2002. ISBN 007222360X, 9780072223606, Page 8.

[5] Mario A. Nascimento, Proceedings: 30th International Conference on Very Large DataBases. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATABASES. Morgan Kaufmann, 2004, ISBN 0120884690, 9780120884698, Page 1098

[6] “Inside the 11g SQL Performance Analyzer (SPA).” [Online]. Available: http://www.dba-oracle.com/oracle11g/oracle_11g_sql_performance_advisor.htm. [Accessed: 29-Sep-2010].

[7] Sam R Alapati. OCP Oracle Database 11g: New Features for Administrators Exam Guide (Exam 1Z0-050). McGraw-Hill Professional, 2008. ISBN 0071496831, 9780071496834, Page 215.

## A Discussion on Amazon’s SimpleDB and Relational Databases

By Kato Mivule

Database Systems

Online retailer Amazon sent shock waves into the Database tech community in 2007 with the launching of the Amazon SimpleDB [1] that led some to question the future of Relational Databases and in particular the future of SQL. [2] Microsoft also launched launched a service similar to Amazon’s SimpleDB in 2008, signaling a great interest in the new database architecture that Amazon employed with the SimpleDB. [3]

SimpleDB is based on a Distributed System Architecture that stores data in excess on devices across the Distributed Architecture of the same Sphere.[4] In the case of Amazon’s SimpleDB, data is stored on various distributed servers in the Amazon Domain.

SimpleDB is based on Amazon’s S3 – Simple Storage Service in which users are granted unlimited data storage capacity at very inexpensive rates. Data in the S3 system is stored across a number of servers or storage devices in the Amazon scalable distributed network.[5] SimpleDB and S3 are extensions of the Cloud Computing Architecture in which computing resources, software applications, and data are shared across the web on a demand basis.

Software resources such as Applications and Data are stored on distributed servers so when a user demands use of a Word Processing Application, that instance is provided to the user via the web browser. So, the user need not keep or store applications or data on their computing devices but will depend on the reliability and availability of the internet for access. [6] [7]

Amazon defines Cloud Computing Architectures as the inherent computing base in which Applications are utilized only when as demanded or needed. Once the demanded application is no longer needed or done with, the resources being utilized are discarded. [8] It is in this type of Cloud Computing architecture that SimpleDB is built on that Amazon refers to as the Elastic Compute Cloud or EC2 because the on-demand application utilizes resources elastically. [8]

Therefore SimpleDB is mainly a distributed Database tool that stores and retrieves user data across Amazon’s distributed servers or cloud computing architecture giving it the scalability advantage. SimpleDB does not follow the traditional database relational model in which data is stored on a central server.

As InformIT notes in their description of Amazon’s SimpleDB, [1] there are a number of problems with the SimpleDB database Technology, with the most trouble being a lack of schema. [1] This lack of structure in database definitions and constraints provides problems when it comes to data manipulation. For example since data is not defined, it becomes difficult carrying out mathematical calculations on data, even as InformIT notes that data in SimpleDB is stored as a variable characters. [1] This is the greatest setback to SimpleDB that out weighs any advantages.

As InformIT notes, SimpleDB provides scalability. [1] SimpleDB therefore would be best suited for quick data retrieval mechanisms rather than as a serious data management application. As one computing blogger noted, data integrity is never warranted, database aggregate transactions are not fostered and very difficult to code, and that Relational Database Technology has greatly improved in the area of scalability to the point that web applications like Facebook and LiveJournal use employ Relational Database Technology like MySQL. [9]

The issue of Security and Privacy are also of great concern when it comes to SimpleDB. It is easy to attain some degree of privacy when data is managed of specific known servers. Yet with the SimpleDB architecture, data is stored across many devices that should one device be compromised, some portion of data is at risk.

In conclusion, Relational Database Models will stay and while Amazon’s SimpleDB might not be the next big thing, it could be employed for non critical data management for fast storage and retrieval. Say for example, if one would simply want to store web address that are none critical, SimpleDB could be utilized in such circumstances.

References

[1] “Amazon Launches New Database Service — Database — InformationWeek.” [Online]. Available: http://www.informationweek.com/news/software/soa/showArticle.jhtml?articleID=204803008. [Accessed: 10-Sep-2010].

[2] “InformIT: Introducing Amazon SimpleDB > Abandoning the Relational Model?.” [Online]. Available: http://www.informit.com/articles/article.aspx?p=1619309&seqNum=3. [Accessed: 10-Sep-2010].

[3] “Microsoft launches its alternative to Amazon’s SimpleDB | ZDNet.” [Online]. Available: http://www.zdnet.com/blog/microsoft/microsoft-launches-its-alternative-to-amazons-simpledb/1245. [Accessed: 10-Sep-2010].

[4] James Murty, “Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB, O’Reilly Series, Editionillustrated”. O’Reilly Media, Inc., 2008, ISBN 0596515812, 9780596515812, page 52.

[6] “Amazon Simple Storage Service (Amazon S3).” [Online]. Available: http://aws.amazon.com/s3/. [Accessed: 10-Sep-2010].

[5] “Amazon S3 – Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Amazon_S3. [Accessed: 10-Sep-2010].

[6] Gregory Wilshusen, “Information Security: Federal Guidance Needed to Address Control Issues with Implementing Cloud Computing”. DIANE Publishing, ISBN 1437935648, 9781437935646, Page 10.

[7] “Cloud computing – Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Cloud_computing#cite_note-23. [Accessed: 11-Sep-2010].

[8] “Amazon Web Services Developer Community : Building GrepTheWeb in the Cloud, Part 1: Cloud Architectures.” [Online]. Available: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1632&categoryID=100. [Accessed: 11-Sep-2010].

[9] “Top 10 Reasons to Avoid the SimpleDB Hype — ryanpark.org.” [Online]. Available: http://www.ryanpark.org/2008/04/top-10-avoid-the-simpledb-hype.html. [Accessed: 11-Sep-2010].