Saturday, April 20, 2019

Data Persistence



Information systems process data and convert it into information. Data must be kept for later use, for example to keep track of state and to support further processing and the generation of new data. Data can be stored, read, updated/modified, and deleted. While a software system is running, its data is held in main memory, which is volatile. For the data to persist, it must be written to non-volatile storage. There are two main ways to store data:
• Files
• Databases

Data can be stored in many formats:

- Plain text, XML, JSON, tables, images, etc.
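As a minimal sketch of file-based persistence, the following Java class writes a string to a file and reads it back, which is what a later run of the program would do. The class name `FilePersistenceSketch`, its method names, and the sample record text are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: persists a string to a file and reads it back.
class FilePersistenceSketch {

    // Write the text to non-volatile storage (a file on disk).
    static void save(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the text back from the file.
    static String load(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Save to a temporary file, load it again, then clean up.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("persistence-sketch", ".txt");
            try {
                save(file, text);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("id=111;name=MyName"));
    }
}
```

The data survives only because it was written to disk; the `String` held in memory is gone as soon as the program exits.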

What is data?

- Data is raw facts that can be processed (by the application components) and converted into meaningful information.

- There are many types of data, which fall into two groups:

Quantitative
- Numeric
- Textual
- Boolean
- Date/time
Qualitative
- Visibility
- Modifiability
- Usability

Working data is held in the computer's main memory, which is volatile, so it must be saved to non-volatile storage to persist.

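What is a database?

Databases are created and managed in database servers. SQL is used to define and manipulate relational databases.

DDL (Data Definition Language): create, alter, and drop databases and their structures, such as tables
DML (Data Manipulation Language): CRUD operations on the data in databases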

Database types

• Hierarchical databases
• Network databases
• Relational databases
• Non-relational databases (NoSQL)
• Object-oriented databases
• Graph databases
• Document databases

What is a database server?

A database server is like a storehouse where websites and applications keep their data and information. It is a computer on a network (for example, a LAN) that is dedicated to database storage and retrieval. The database server hosts the database management system and the databases.

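What is a database management system?

A database management system (DBMS) is the software used to create and manage databases and the data in them. Administration tools such as the following connect to the DB servers to manage the DBs:
• phpMyAdmin
• MySQL Workbench

DBMS examples include:
• MySQL
• SQL Server
• Oracle
• dBASE
• FoxPro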

Data arrangement

• Data warehouses
• Big Data
      - Volume
      - Variety
      - Velocity

Data Warehouse

A data warehouse is mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources and helps generate analytic reports. It mainly handles structured data.
Organizations that want to make informed decisions tend to choose data warehousing, because reports of this kind need reliable, trustworthy data from the sources.

Big Data

Big data is mainly a technology, characterized by the volume, velocity, and variety of the data.
Volume is the amount of data coming from different sources.
Velocity is the speed at which data arrives and is processed.
Variety is the number of different types of data.
Organizations that need to analyze large amounts of data containing valuable information, in order to make better decisions (for example, how to increase revenue, profitability, or the number of customers), tend to prefer the big data approach.

Let's look at how the application components communicate with files and databases. Files and DBs are external components: they exist outside the software system. Software connects to the files/DBs to perform CRUD operations on data.

• File: file path or URL
• DB: connection string

To process data in a DB, an application can use:

• SQL statements
• Prepared statements
• Callable statements


SQL statements
Execute standard SQL statements from the application.
    Statement stmt = con.createStatement();
    // The string value must be quoted inside the SQL text.
    stmt.executeUpdate("update STUDENT set NAME = '" + name + "' where ID = " + id);

Prepared statements
The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters.
    PreparedStatement pstmt = con.prepareStatement("update STUDENT set NAME = ? where ID = ?");
    pstmt.setString(1, "MyName");
    pstmt.setInt(2, 111);
    pstmt.executeUpdate();

Callable statements
Execute stored procedures.
    CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}");
    // Set the three parameters (cstmt.setXxx(...)) before executing.
    cstmt.execute();
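To make the difference between the first two approaches concrete, here is a hedged sketch. `concatenatedSql` builds the SQL text by string concatenation, so a quote inside the name becomes part of the query. `rename` binds the same values as parameters instead. The class and method names are hypothetical, and `rename` assumes the caller supplies an open `Connection` to a database that contains the `STUDENT` table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical class contrasting concatenated SQL with bound parameters.
class StudentUpdateSketch {

    // Concatenation: the value becomes part of the SQL text itself.
    static String concatenatedSql(String name, int id) {
        return "update STUDENT set NAME = '" + name + "' where ID = " + id;
    }

    // Parameter binding: the SQL text is fixed and the values are sent separately.
    // Assumes an open Connection to a database that has a STUDENT table.
    static int rename(Connection con, int id, String name) throws SQLException {
        String sql = "update STUDENT set NAME = ? where ID = ?";
        try (PreparedStatement pstmt = con.prepareStatement(sql)) {
            pstmt.setString(1, name);
            pstmt.setInt(2, id);
            return pstmt.executeUpdate(); // number of rows changed
        }
    }

    public static void main(String[] args) {
        // The apostrophe in the name ends the SQL string literal early.
        System.out.println(concatenatedSql("O'Brien", 111));
    }
}
```

With concatenation, the statement text is no longer valid SQL for a name such as O'Brien, and a crafted value could change what the statement does. Bound parameters avoid both problems.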

The need for ORM: development with and without ORM

Data is held in different structures at runtime:

• The application holds data in objects
• The database uses tables (entities)
Mapping the data in objects to the tables is easier with Object-Relational Mapping (ORM).
Mismatches between the relational and object models
• Granularity: The object model can be more fine-grained than the relational model, so there may be more classes than tables.
• Subtypes: Subtypes (inheritance) are not supported by all types of relational databases.
• Identity: The object model distinguishes identity (a == b) from equality (a.equals(b)), while the relational model identifies a row only by its primary key.
• Associations: Objects are linked by references, while the relational model links rows with foreign keys, so the relationships in an object domain model cannot be read directly from the tables.
• Data navigation: Objects are navigated by following references from one object to another, whereas relational data is retrieved with queries and joins.
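Without an ORM, the application has to translate between objects and rows by hand. The sketch below shows that translation for a single class. A `Map` stands in for a table row, and the `ID` and `NAME` column names follow the `STUDENT` table used in the JDBC snippets. `StudentRecord` and `ManualMappingSketch` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain class that mirrors one row of the STUDENT table.
class StudentRecord {
    private final int id;
    private final String name;

    StudentRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

// Hand-written mapping code of the kind an ORM generates or replaces.
class ManualMappingSketch {

    // Object -> row: one map entry per column.
    static Map<String, Object> toRow(StudentRecord s) {
        Map<String, Object> row = new HashMap<>();
        row.put("ID", s.getId());
        row.put("NAME", s.getName());
        return row;
    }

    // Row -> object: read each column and cast it to the field type.
    static StudentRecord fromRow(Map<String, Object> row) {
        return new StudentRecord((Integer) row.get("ID"), (String) row.get("NAME"));
    }

    public static void main(String[] args) {
        StudentRecord copy = fromRow(toRow(new StudentRecord(111, "MyName")));
        System.out.println(copy.getId() + " " + copy.getName());
    }
}
```

Every new field or class needs another pair of hand-written conversions like these, which is the repetitive work an ORM is meant to remove.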

POJO, JavaBeans, and JPA: similarities and differences
POJO

POJO stands for Plain Old Java Object. It is an ordinary Java object, not bound by any special restriction other than those imposed by the Java Language Specification, and it does not require anything special on the classpath. POJOs are used to improve the readability and reusability of a program, and they are widely accepted because they are easy to write and understand. The term was coined in 2000 by Martin Fowler, Rebecca Parsons, and Josh MacKenzie, and POJO-based programming was later adopted in EJB 3.0 by Sun Microsystems.
A POJO should not:

•Extend pre-specified classes.
•Implement pre-specified interfaces.
•Contain pre-specified annotations.

JavaBeans

• JavaBeans are a special type of POJO. A POJO must meet some restrictions to be a bean.
• All JavaBeans are POJOs, but not all POJOs are JavaBeans.
• Beans should be serializable, i.e. they should implement the Serializable interface. A class that implements Serializable is still generally regarded as a POJO, because Serializable is a marker interface and therefore not much of a burden.
• Fields should be private. This gives the class complete control over its fields.
• Fields should have getters, setters, or both.
• A bean should have a no-arg constructor.
• Fields are accessed only through the constructor or the getters and setters.
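The rules above can be sketched as a small class. `StudentBean` and its fields are hypothetical, and `BeanSketch.copyViaSerialization` is only there to show that the bean can be serialized and restored. A real bean would normally be a public class in its own file. It is left package-private here so the example fits in one file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical bean: Serializable, private fields, no-arg constructor, getters and setters.
class StudentBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private int id;
    private String name;

    public StudentBean() { }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class BeanSketch {

    // Serialize the bean to bytes and read it back as a new object.
    static StudentBean copyViaSerialization(StudentBean bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bean);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StudentBean) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StudentBean bean = new StudentBean();
        bean.setId(111);
        bean.setName("MyName");
        StudentBean copy = copyViaSerialization(bean);
        System.out.println(copy.getId() + " " + copy.getName());
    }
}
```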

Java Persistence API (JPA)
• An API/specification for ORM
• Uses
  • POJO classes
  • An XML-based mapping file (represents the DB)
  • A provider (an implementation of JPA)

ORM tools are available for different platforms such as Java, PHP, and .NET.
Java ORM tools:
• Hibernate
• iBATIS/MyBatis
• TopLink

PHP ORM tools:
• Doctrine

Next, the need for NoSQL: its benefits and the different types of NoSQL databases.
Not Only SQL (NoSQL)
• Relational DBs are good for structured data
• For semi-structured and unstructured data, other types of DBs can be used:
  • Key-value stores
  • Document databases
  • Wide-column stores
  • Graph stores
Benefits of NoSQL
• Compared to relational databases, NoSQL databases are often easier to scale out and can perform better for some workloads, and their data models address issues that the relational model is not designed to address, such as:
  • Large volumes of rapidly changing structured, semi-structured, and unstructured data
NoSQL DB servers
• MongoDB
• Cassandra
• Redis
• Amazon DynamoDB
• HBase

What is Hadoop?
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
•It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
• Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures.

Hadoop core concepts
• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
• Hadoop YARN: A framework for job scheduling and cluster resource management.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
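The MapReduce idea can be illustrated without a cluster. The sketch below is plain Java and does not use the Hadoop API. It only mimics the three stages (map, shuffle, reduce) for a word count on a single machine, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine illustration of map, shuffle, and reduce. Not Hadoop code.
class WordCountSketch {

    // Map: split one line into lower-case words (each word stands for a (word, 1) pair).
    static List<String> mapPhase(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Shuffle: group the emitted 1s by word.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : mapPhase(line)) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Reduce: sum the values collected for each word.
    static Map<String, Integer> reducePhase(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) {
                sum += one;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        return reducePhase(shuffle(lines));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("big data", "Big Hadoop data")));
    }
}
```

In Hadoop the map and reduce steps run in parallel on many machines and the framework performs the shuffle. Here everything runs in one loop purely to show the data flow.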

Information retrieval (IR)

• Stored data should be fetched, converted into information, and presented for proper use
• Information is retrieved via search queries
      • Keyword search
      • Full-text search
• The output can be
      • Text
      • Multimedia
• The information retrieval process should be
      • Fast (high performance)
      • Scalable
      • Efficient
      • Reliable/correct
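A common way to make keyword search fast is an inverted index, which maps each term to the documents that contain it. The following is a minimal in-memory sketch under that assumption. It is not taken from any particular IR tool, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal in-memory inverted index for case-insensitive keyword search.
class KeywordSearchSketch {
    private final List<String> documents = new ArrayList<>();
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Store the document and record each of its terms in the index.
    void add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look the keyword up in the index instead of scanning every document.
    List<String> search(String keyword) {
        Set<Integer> ids = index.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
        List<String> hits = new ArrayList<>();
        for (int docId : ids) {
            hits.add(documents.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordSearchSketch search = new KeywordSearchSketch();
        search.add("Hadoop stores big data");
        search.add("Redis is a key-value store");
        System.out.println(search.search("hadoop"));
    }
}
```

A lookup costs one map access regardless of how many documents are stored, which addresses the "fast" and "scalable" goals. Full-text search engines build on the same structure and add ranking, stemming, and similar features.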
