Database Normalization

Hello All,

I am looking to explore more about the 1NF, 2NF, and 3NF sections of DBMS normalization. I am confused about how it works and what the exact process of normalization in a DBMS is. I have mentioned some points below; if anyone knows, please explain them to me.

  1. When is the process of normalization used?
  2. What is Boyce and Codd Normal Form?
  3. Difference between 1NF, 2NF, and 3NF?

Comments

  • jsg Member, Resident Benchmarker
    edited October 2019

    @shivambhatele said:
    Hello All,

    I am looking to explore more about the 1NF, 2NF, and 3NF sections of DBMS normalization. I am confused about how it works and what the exact process of normalization in a DBMS is. I have mentioned some points below; if anyone knows, please explain them to me.

    1. When is the process of normalization used?

    When a DB is designed, or when an existing DB needs to be normalized, e.g. for practical reasons like one company buying another.

    2. What is Boyce and Codd Normal Form?

    BCNF ("Boyce Codd Normal Form") is, in practice, the most desired NF. It's more stringent than 3NF.

    3. Difference between 1NF, 2NF, and 3NF?

    Increasing stringency.

    To really understand normalization you must stop seeing a DB as a "collection of records" and look at it with "formal eyes". Then you'll understand that normalization pretty much describes the path from "a pile of records" towards a relational DB.
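
    To make BCNF concrete, here is a minimal sketch of the classic textbook case of a table that is in 3NF but not in BCNF (all table and column names are hypothetical, not from this thread). Assume each teacher teaches exactly one subject, and each student has one teacher per subject:

    ```sql
    -- 3NF but not BCNF: teacher -> subject holds, yet teacher is not a key,
    -- so a teacher's subject is stored once per student (redundancy).
    CREATE TABLE tuition (
        student text NOT NULL,
        subject text NOT NULL,
        teacher text NOT NULL,
        PRIMARY KEY (student, subject)
    );

    -- BCNF decomposition: every determinant is now a key.
    CREATE TABLE teacher_subject (
        teacher text PRIMARY KEY,       -- teacher -> subject
        subject text NOT NULL
    );
    CREATE TABLE student_teacher (
        student text NOT NULL,
        teacher text NOT NULL,
        PRIMARY KEY (student, teacher)
    );
    ```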

  • Some normalization can be a good thing, but you're most likely wasting your time if you're trying to figure out all of the different forms of it.

    Fun fact: normalizing your data can make your database slower, but it can save a lot of space.

  • Normalizing your database just for the sake of doing it may not really be helpful in the long run. Normalization can add complexity, so you should weigh the advantages carefully along with the other features the database offers to meet your goal.

  • From how the original question was written, I thought it was more of a study question about database theory. In practice it's common to use some denormalization, and databases support it more and more. For example, postgres and sqlite3 both let you put json arrays into columns, and let you write SQL queries that examine their contents.
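
    As a hedged illustration (hypothetical table and data; Postgres syntax - sqlite3 uses json functions such as json_each instead of the @> operator):

    ```sql
    -- Denormalized on purpose: a whole json array lives in one column.
    CREATE TABLE person (
        id   serial PRIMARY KEY,
        name text NOT NULL,
        tags jsonb NOT NULL DEFAULT '[]'
    );

    INSERT INTO person (name, tags)
    VALUES ('alice', '["admin", "billing"]');

    -- Find everyone whose tags array contains "admin".
    SELECT name FROM person WHERE tags @> '["admin"]';
    ```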

  • jsg Member, Resident Benchmarker
    edited October 2019

    @cirrus_cloud said:
    Fun fact: normalizing your data can make your database slower, but it can save a lot of space.

    No. I've yet to see a BCNF-normalized relational DB that got slower. I don't say that's impossible in some exotic corner cases, but I've not yet seen an actually used real-world DB that got slower after normalization.
    Reason: code that can rely on data consistency is always faster than code that must deal with inconsistency, as most database engines have to, simply because most real-world DBs are not consistent. Consistency (a normalized DB) leads e.g. to much faster array processing as well as much less cache thrashing.

    It seems there is a major misunderstanding. A DB that does not conform to at least 3NF is not a relational DB - simple as that. It's just a pile of records which may happen to work and look like an RDB, but it isn't really one.
    But I concede that the term "normalization" is a bit misleading, as it seems to suggest that xNF or BCNF is some kind of academic enhancement that might or might not be desirable and that anyway is not important for mere mortal business DBs.
    But still, the term "relational DB" is not arbitrary; it implies, and theoretically demands, a well-defined mathematical construct.

  • jsg said: No. I've yet to see a BCNF-normalized relational DB that got slower.

    Meh, sites use memcached etc. precisely because doing stuff through the db all the time gets too slow. That is the ne plus ultra of denormalization.

  • Can someone provide a real example of normalization?

  • When is your homework due?

  • jackb Member, Host Rep
    edited October 2019

    @LTniger said:
    Can someone provide a real example of normalization?

    One example scenario is - let's say you've got a table which holds contacts. Each contact has a name and a contact type (e.g. person). You realise at design time that the contact type (person) is going to be duplicated, and if you ever wanted to rename it to human, you'd have to update all the contact records.

    So, you split it out into a definition table (contact type) and insert a record for person, referencing it as a foreign key on the contact table. Now it's not duplicated, and if you ever want to rename it you only update one row (see the sketch below).

    There are so many more scenarios covered by normalisation though.

    Usually the end result is better data integrity and better performance, but eventually views become an absolute necessity for human readability and data presentation (you end up with tables that just contain IDs / uuids / guids)
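
    A minimal sketch of that split (hypothetical names):

    ```sql
    CREATE TABLE contact_type (
        id   integer PRIMARY KEY,
        name text NOT NULL UNIQUE
    );

    CREATE TABLE contact (
        id              integer PRIMARY KEY,
        name            text NOT NULL,
        contact_type_id integer NOT NULL REFERENCES contact_type (id)
    );

    INSERT INTO contact_type (id, name) VALUES (1, 'person');
    INSERT INTO contact (id, name, contact_type_id) VALUES (1, 'Alice', 1);

    -- The rename now touches exactly one row instead of every contact.
    UPDATE contact_type SET name = 'human' WHERE id = 1;
    ```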

  • jackb said: you end up with tables that just contain IDs / uuids / guids

    Mmm, yes, I've seen this a few times. And it was disgusting. Poor DBAs...

  • Those sorts of tables are usually for the purpose of doing many-to-many joins.
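
    For example, a minimal sketch of such a join table (hypothetical names):

    ```sql
    CREATE TABLE student (id integer PRIMARY KEY, name  text NOT NULL);
    CREATE TABLE course  (id integer PRIMARY KEY, title text NOT NULL);

    -- The "IDs only" table: one row per student/course pair.
    CREATE TABLE student_course (
        student_id integer NOT NULL REFERENCES student (id),
        course_id  integer NOT NULL REFERENCES course (id),
        PRIMARY KEY (student_id, course_id)
    );

    -- All courses a given student is enrolled in.
    SELECT c.title
    FROM course c
    JOIN student_course sc ON sc.course_id = c.id
    WHERE sc.student_id = 42;
    ```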

  • jackb Member, Host Rep
    edited October 2019

    @LTniger said:

    jackb said: you end up with tables that just contain IDs / uuids / guids

    Mmm, yes, I've seen this a few times. And it was disgusting. Poor DBAs...

    Tbh once you get used to it, it's a really good way to arrange things from a dev perspective.

    DBAs love it too, since (generally) they just need to run the reindex job on a schedule and job's a good 'un, vs something with a less technically correct design that might develop bigger performance issues, at which point they get bothered by users or management.

    The reason for this is that most implementations of SQL are really good at seeking keys from an index (in this case the primary key). They aren't so good at looking for content (scanning).
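
    A quick way to see the difference, sketched with sqlite3's query planner (hypothetical table; planner output abbreviated and version-dependent):

    ```sql
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);

    -- Seek: the planner jumps straight to the key.
    EXPLAIN QUERY PLAN SELECT * FROM contact WHERE id = 42;
    -- SEARCH contact USING INTEGER PRIMARY KEY (rowid=?)

    -- Scan: name isn't indexed, so every row is visited.
    EXPLAIN QUERY PLAN SELECT * FROM contact WHERE name = 'Alice';
    -- SCAN contact
    ```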

  • BCNF (Boyce Codd Normal Form) DBs apparently can make some guarantees about data consistency that non-normalized DBs can't. I'll try to study this since I'm trying to understand SQL better.

  • @jsg said:

    @cirrus_cloud said:
    Fun fact: normalizing your data can make your database slower, but it can save a lot of space.

    No. I've yet to see a BCNF-normalized relational DB that got slower. I don't say that's impossible in some exotic corner cases, but I've not yet seen an actually used real-world DB that got slower after normalization.
    Reason: code that can rely on data consistency is always faster than code that must deal with inconsistency, as most database engines have to, simply because most real-world DBs are not consistent. Consistency (a normalized DB) leads e.g. to much faster array processing as well as much less cache thrashing.

    It seems there is a major misunderstanding. A DB that does not conform to at least 3NF is not a relational DB - simple as that. It's just a pile of records which may happen to work and look like an RDB, but it isn't really one.
    But I concede that the term "normalization" is a bit misleading, as it seems to suggest that xNF or BCNF is some kind of academic enhancement that might or might not be desirable and that anyway is not important for mere mortal business DBs.
    But still, the term "relational DB" is not arbitrary; it implies, and theoretically demands, a well-defined mathematical construct.

    Since when is a join faster than a flat table where the data already exists? Constraints make your database slower.

  • jsg Member, Resident Benchmarker

    @cirrus_cloud said:
    Since when is a join faster than a flat table where the data already exists? Constraints make your database slower.

    That was not the question.

  • @jsg said:

    @cirrus_cloud said:
    Since when is a join faster than a flat table where the data already exists? Constraints make your database slower.

    That was not the question.

    I was just responding to you stating that normalization doesn't slow a database down. Normalization is by definition adding more joins, which are going to be slower than data with fewer constraints.

  • jsg Member, Resident Benchmarker

    @cirrus_cloud said:
    I was just responding to you stating that normalization doesn't slow a database down. Normalization is by definition adding more joins, which are going to be slower than data with fewer constraints.

    No. Normalization usually makes the DB faster due to consistency. This is even more true when there is lots of memory and/or cache (which is usually the case with DBs of significant size).

  • jackb Member, Host Rep
    edited October 2019

    @cirrus_cloud said:

    @jsg said:

    @cirrus_cloud said:
    Since when is a join faster than a flat table where the data already exists? Constraints make your database slower.

    That was not the question.

    I was just responding to you stating that normalization doesn't slow a database down. Normalization is by definition adding more joins, which are going to be slower than data with fewer constraints.

    Index size is a lot smaller with no duplication. Consider you have 1m rows - let's say a list of addresses, each with a column e.g. town. There are 10,000 different towns in your database.

    If duplicated, all of your duplicated towns need to be indexed on your address table, or you get a table scan when doing select * from address where town='bumfuck' and region='nowhere'

    If you've normalised your data, the 10,000 towns are indexed in the town table. This means an index seek is used rather than a table scan, and the index is 100x smaller than it otherwise would be if you had indexed the field in the address table.

    Indexes significantly improve database performance, especially when you're dealing with normalised data.

    Note: some of the above is SQL server terminology, but should equally apply to most other implementations
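
    A rough sketch of that layout (hypothetical names):

    ```sql
    CREATE TABLE town (
        id     integer PRIMARY KEY,
        name   text NOT NULL,
        region text NOT NULL
    );
    CREATE INDEX town_name_region ON town (name, region);

    CREATE TABLE address (
        id      integer PRIMARY KEY,
        street  text NOT NULL,
        town_id integer NOT NULL REFERENCES town (id)
    );
    CREATE INDEX address_town ON address (town_id);

    -- Seek the small town index first, then seek matching addresses by key.
    SELECT a.*
    FROM address a
    JOIN town t ON t.id = a.town_id
    WHERE t.name = 'bumfuck' AND t.region = 'nowhere';
    ```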

  • cirrus_cloud Member
    edited October 2019

    @jackb said:

    @cirrus_cloud said:

    @jsg said:

    @cirrus_cloud said:
    Since when is a join faster than a flat table where the data already exists? Constraints make your database slower.

    That was not the question.

    I was just responding to you stating that normalization doesn't slow a database down. Normalization is by definition adding more joins, which are going to be slower than data with fewer constraints.

    Index size is a lot smaller with no duplication. Consider you have 1m rows - let's say a list of addresses, each with a column e.g. town. There are 10,000 different towns in your database.

    If duplicated, all of your duplicated towns need to be indexed on your address table, or you get a table scan when doing select * from address where town='bumfuck' and region='nowhere'

    If you've normalised your data, the 10,000 towns are indexed in the town table. This means an index seek is used rather than a table scan, and the index is 100x smaller than it otherwise would be if you had indexed the field in the address table.

    Indexes significantly improve database performance, especially when you're dealing with normalised data.

    Note: some of the above is SQL server terminology, but should equally apply to most other implementations

    Right, that is an instance where it might make sense to normalize. So yes, my statement of "all joins are bad and slow down queries" isn't quite what I was going for. Normalizing common data in an RDBMS makes sense and could make queries faster. But an application using a NoSQL solution could force users to select from a list of states / provinces or towns in the first place to keep the indexes in your example slimmer.

    @jsg said:

    @cirrus_cloud said:
    I was just responding to you stating that normalization doesn't slow a database down. Normalization is by definition adding more joins, which are going to be slower than data with fewer constraints.

    No. Normalization usually makes the DB faster due to consistency. This is even more true when there is lots of memory and/or cache (which is usually the case with DBs of significant size).

    There are NoSQL solutions where records aren't stored in a normalized way like with an RDBMS. But yes, normalization could definitely make an RDBMS faster. My point is that joins between tables with lots of data that can't be reduced to something trivial like states / provinces or towns are slow. If there were a "Person" table with 1 million entries and a "PersonHistory" table joined to Person, with on average 100 history records per person for a total of 100 million entries, is it faster to query the PersonHistory table for a single person, or is it faster to have a "Person" collection with that person's history already in the record?

    I would expect the NoSQL solution to perform better for this query.

    But answering the question: I have never seen someone evaluate whether a table met 3NF or BCNF. Has anyone seen someone make a table for first names or something? So there would be a Person.FirstNameId or something, and a join would have to be done to get the first name? That, in my mind, is the sort of normalization I have not yet witnessed. But normalizing state by having a Person.StateId field instead of storing a string on every person for their state is common.
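
    For reference, a sketch of the relational side of that comparison (hypothetical names). With the index below, fetching one person's ~100 rows is an index seek rather than a scan of 100 million, while a document store would return the embedded history in a single read:

    ```sql
    CREATE TABLE person (
        id   integer PRIMARY KEY,
        name text NOT NULL
    );

    CREATE TABLE person_history (
        id        integer PRIMARY KEY,
        person_id integer NOT NULL REFERENCES person (id),
        event     text NOT NULL
    );
    CREATE INDEX person_history_person ON person_history (person_id);

    -- One person's history: ~100 rows located via the index.
    SELECT h.event FROM person_history h WHERE h.person_id = 12345;
    ```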

  • Here is your answer to all the questions:
    1. When is the process of normalization used?
    Answer: Normalization is a systematic approach to decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.

    2. What is Boyce and Codd Normal Form?
    Answer: Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.

    3. Difference between 1NF, 2NF, and 3NF?
    Answer: 1NF, 2NF, and 3NF are normal forms used in relational databases to minimize redundancy in tables. 2NF is a stronger normal form than 1NF, and 3NF is a stronger normal form than 2NF (see the sketch below).

    To go further, read more about DBMS normalization.
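
    To make those differences concrete, a hedged sketch of the usual textbook progression (all names hypothetical):

    ```sql
    -- 1NF: atomic values, no repeating groups - but student and department
    -- facts are repeated on every enrolment row.
    CREATE TABLE enrolment_1nf (
        student_id   integer,
        student_name text,    -- depends on student_id alone (2NF violation)
        course_id    integer,
        course_title text,    -- depends on course_id alone (2NF violation)
        dept_name    text,
        dept_phone   text,    -- depends on dept_name, not the key (3NF violation)
        PRIMARY KEY (student_id, course_id)
    );

    -- 2NF: attributes depending on only part of the composite key move out.
    -- 3NF: dept_phone, which depends transitively via dept_name, moves out too.
    CREATE TABLE student    (student_id integer PRIMARY KEY, student_name text);
    CREATE TABLE department (dept_name  text    PRIMARY KEY, dept_phone   text);
    CREATE TABLE course     (course_id  integer PRIMARY KEY, course_title text,
                             dept_name  text REFERENCES department (dept_name));
    CREATE TABLE enrolment  (student_id integer REFERENCES student (student_id),
                             course_id  integer REFERENCES course (course_id),
                             PRIMARY KEY (student_id, course_id));
    ```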
