Data Persistence
In information systems, data is persisted so that it can later be processed and converted into information. Data must be kept for later use, for example to maintain logs and status records for further processing and for generating new data. Data can be created (stored), read, updated (modified) and deleted. While a software system is running, its data is held in main memory, which is volatile, so for persistence the data must be written to nonvolatile storage. There are two main ways to store data:
• Files
• Databases
There are many formats for storing data: plain text, XML, JSON, tables, images, etc.
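As a minimal sketch of persisting data in a plain-text file, the class below performs create, read, update and delete on a single file using java.nio.file.Files (Java 11 or later, for writeString/readString). The class name TextFileStore and the file location are hypothetical; a real application would also choose a structured format such as JSON or XML and handle concurrent access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: CRUD operations on one UTF-8 text file.
final class TextFileStore {
    private final Path file;

    TextFileStore(Path file) {
        this.file = file;
    }

    // Create: write the content, replacing the file if it already exists.
    void create(String content) {
        try {
            Files.writeString(file, content, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read: load the whole file back from nonvolatile storage.
    String read() {
        try {
            return Files.readString(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Update: append one line to the existing content.
    void append(String line) {
        try {
            Files.writeString(file, line + System.lineSeparator(), StandardCharsets.UTF_8,
                    StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Delete: remove the file; returns false if it did not exist.
    boolean delete() {
        try {
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the data lives in a file rather than in memory, it survives after the program exits, which is exactly the persistence described above.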
What is data?
- Data is raw facts that can be processed (by the application components) and converted into meaningful information.
- There are many types of data, which can be separated into two groups:
Quantitative - digital (numeric), textual, Boolean, date/time
Qualitative - visibility, modifiability, usability
While a program is running, the data it works with is held in the computer's memory, which is volatile. Data must be saved to nonvolatile storage to persist.
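What is a database?
Databases are created and managed on database servers, and SQL is used to work with them.
DDL (Data Definition Language) – create, alter and drop databases and their objects, such as tables
DML (Data Manipulation Language) – insert, read (select), update and delete the data in databases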
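To make the DDL/DML split concrete, the sketch below labels SQL statements by their leading keyword. The keyword sets follow the grouping used above, with SELECT counted as DML; some references list SELECT separately as DQL, and statements such as GRANT and REVOKE are usually grouped as DCL, so they fall into OTHER here. SqlKind is a hypothetical helper, not a JDBC API.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: classifies a SQL statement by its first keyword.
final class SqlKind {
    private static final Set<String> DDL = Set.of("CREATE", "ALTER", "DROP", "TRUNCATE");
    private static final Set<String> DML = Set.of("INSERT", "SELECT", "UPDATE", "DELETE");

    static String classify(String sql) {
        // Take the first word, ignoring leading whitespace and letter case.
        String first = sql.strip().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        if (DDL.contains(first)) return "DDL";   // changes the structure
        if (DML.contains(first)) return "DML";   // works on the data
        return "OTHER";
    }
}
```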
Database types
• Hierarchical databases
• Network databases
• Relational databases
• Non-relational databases (NoSQL)
• Object-oriented databases
• Graph databases
• Document databases
What is a database server?
A database server is like a storehouse in which websites and applications store and maintain their data. More precisely, it is a computer on a network (for example, a LAN) that is dedicated to storing and retrieving data; it hosts the database management system and the databases.
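What is a database management system?
A database management system (DBMS) is the software that stores and manages databases and the data in them. Administration tools connect to the database server to manage the databases and their data, for example:
• phpMyAdmin
• MySQL Workbench
Examples of DBMSs include:
• MySQL
• SQL Server
• Oracle
• dBASE
• FoxPro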
Data arrangement
• Data warehouses
• Big data
  - Volume
  - Variety
  - Velocity
Data Warehouse
A data warehouse is mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources and helps generate analytical reports. It handles mainly structured data.
When an organization wants to make informed decisions, it usually chooses data warehousing, because such reports need reliable, trustworthy data from the sources.
Big Data
Big data is mainly a technology, characterized by the volume, velocity and variety of the data.
Volume refers to the amount of data coming from different sources.
Velocity refers to the speed at which data arrives and must be processed.
Variety refers to the number of different types of data (structured, semi-structured and unstructured).
If an organization needs to analyze large amounts of data that contain valuable information and can help it make better decisions (for example, how to increase revenue, improve profitability or gain more customers), it will usually prefer a big data approach.
Let's analyze how the application components communicate with files and databases. Files and databases are external components: they exist outside the software system. Software connects to files or databases to perform CRUD operations on the data, and locates each of them by:
• File – file path or URL
• DB – connection string
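The three kinds of locator above can be sketched in Java: a Path locates a local file, a URI locates a remote resource, and a JDBC URL is the connection string passed to DriverManager.getConnection. The jdbc:mysql://host:port/database format assumes the MySQL Connector/J driver; other databases use other URL formats. The class name DataLocators, the file name, the example.com address and the database name are placeholders.

```java
import java.net.URI;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical examples of the locators an application uses for external data.
final class DataLocators {
    // File: a file path, resolved against a base directory.
    static Path studentFile(Path baseDir) {
        return baseDir.resolve("students.csv");
    }

    // URL: a resource reached over the network (placeholder address).
    static URI studentFeed() {
        return URI.create("https://example.com/data/students.json");
    }

    // DB: a connection string in MySQL Connector/J format.
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Opening the connection needs a running server and the driver on the classpath.
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }
}
```

For example, mysqlUrl("localhost", 3306, "school") yields jdbc:mysql://localhost:3306/school.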
To process data in a DB from Java (JDBC), the application can use three kinds of statements:
• SQL statements
• Prepared statements
• Callable statements

SQL statements: execute standard SQL statements from the application.
Statement stmt = con.createStatement();
// String values must be quoted inside the SQL text
stmt.executeUpdate("update STUDENT set NAME = '" + name + "' where ID = " + id);

Prepared statements: the query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters.
PreparedStatement pstmt = con.prepareStatement("update STUDENT set NAME = ? where ID = ?");
pstmt.setString(1, "MyName"); // first ? (NAME)
pstmt.setInt(2, 111);         // second ? (ID)
pstmt.executeUpdate();

Callable statements: execute stored procedures.
CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}");
// set (or register as output) each of the three ? parameters before executing
cstmt.execute();
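One practical reason to prefer prepared statements is that concatenating user input into SQL, as in the Statement example, lets the input change the query itself (SQL injection). The sketch below shows the SQL text that concatenation produces, next to a helper that binds the same values as parameters and closes the statement with try-with-resources. StudentDao and the STUDENT table with NAME and ID columns are assumptions carried over from the examples above; calling updateName needs a real Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical data-access helper for the STUDENT table.
final class StudentDao {
    // Unsafe: builds the SQL by string concatenation (shown for comparison only).
    static String concatenatedUpdate(String name, int id) {
        return "update STUDENT set NAME = '" + name + "' where ID = " + id;
    }

    // Safe: the SQL text is fixed and the values are sent as parameters.
    static int updateName(Connection con, int id, String name) throws SQLException {
        String sql = "update STUDENT set NAME = ? where ID = ?";
        try (PreparedStatement pstmt = con.prepareStatement(sql)) {
            pstmt.setString(1, name);
            pstmt.setInt(2, id);
            return pstmt.executeUpdate(); // number of rows changed
        }
    }
}
```

With the name x' where 1=1 --  (note the trailing space), concatenatedUpdate produces update STUDENT set NAME = 'x' where 1=1 -- ' where ID = 7, where the -- comments out the real condition so every row would be updated. With updateName, the same text is simply stored as a name value.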
The need for ORM: development with and without ORM
At runtime, data is held in different structures:
• The application holds data in objects
• The database uses tables (entities)
Mapping data between objects and tables by hand means writing conversion code for every table; Object Relational Mapping (ORM) makes this much easier by performing the mapping automatically.
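Without ORM, the application code itself copies every column of a row into an object field and back again. The sketch below does that by hand for a hypothetical Student class, using a Map to stand in for a database row (with JDBC the values would come from a ResultSet). An ORM tool performs this mapping from metadata instead, so it does not have to be rewritten for every table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain object held by the application.
final class Student {
    private final int id;
    private final String name;

    Student(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

// Hand-written mapping between a STUDENT row and a Student object.
final class StudentRowMapper {
    // Row -> object: each column is read and converted explicitly.
    static Student fromRow(Map<String, Object> row) {
        return new Student((Integer) row.get("ID"), (String) row.get("NAME"));
    }

    // Object -> row: each field is copied back to its column.
    static Map<String, Object> toRow(Student s) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", s.getId());
        row.put("NAME", s.getName());
        return row;
    }
}
```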
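Mismatches between relational and object models
• Granularity: The object model can be more fine-grained than the relational model; for example, an Address class may be stored as columns of a PERSON table rather than in a table of its own.
• Subtypes: Inheritance (subtyping) is a core feature of the object model, but most relational databases do not support it directly.
• Identity: Java distinguishes object identity (a == b) from object equality (a.equals(b)), whereas the relational model identifies a row only by its primary key.
• Associations: Objects refer to each other through references, which are directional; the relational model uses foreign keys, and a many-to-many association needs a separate link table.
• Data navigation: In the object model, related data is reached by navigating from object to object (e.g. student.getCourse()); in a database it is fetched with queries and joins, and navigating one object at a time can lead to many separate queries.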
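The identity mismatch can be seen in plain Java: two objects created for the same database row are different objects (== is false) unless equals is defined in terms of the primary key. The Employee class and its id field are hypothetical. ORM tools typically also guarantee a single instance per row within one unit of work (Hibernate's session does this), but that behaviour is tool-specific.

```java
// Hypothetical entity whose equality is based on its primary key.
final class Employee {
    private final long id;      // primary key column
    private final String name;

    Employee(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long getId() { return id; }
    String getName() { return name; }

    // Two objects are "the same row" if their primary keys match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        return id == ((Employee) o).id;
    }

    // Must be consistent with equals, so it also uses only the key.
    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```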
POJO, JavaBeans and JPA: similarities and differences
POJO
POJO stands for Plain Old Java Object. It is an ordinary Java object, not bound by any special restriction other than those imposed by the Java Language Specification, and not requiring any special classpath. POJOs increase the readability and reusability of a program, and they have gained wide acceptance because they are easy to write and understand. EJB 3.0, from Sun Microsystems, made POJOs the basis of its programming model.
A POJO should not:
• Extend pre-specified classes
• Implement pre-specified interfaces
• Contain pre-specified annotations
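A minimal POJO, as a sketch: the hypothetical Book class below extends nothing but Object, implements no framework interface and carries no annotations. It is not a JavaBean, because it has no no-arg constructor and no setters.

```java
// Hypothetical POJO: no framework superclass, interface or annotation.
final class Book {
    private final String isbn;
    private final String title;

    Book(String isbn, String title) {
        this.isbn = isbn;
        this.title = title;
    }

    String getIsbn() { return isbn; }
    String getTitle() { return title; }
}
```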
JavaBeans
• Beans are a special type of POJO: a POJO must satisfy some additional restrictions to be a bean.
• All JavaBeans are POJOs, but not all POJOs are JavaBeans.
• Serializable: beans should implement the Serializable interface. Some classes that do not implement Serializable are still called beans, because Serializable is only a marker interface and therefore not much of a burden.
• Fields should be private, giving the class complete control over them.
• Fields should have getters, setters, or both.
• A bean should have a public no-arg constructor.
• Fields are accessed only through the constructor or the getters and setters.
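A minimal sketch of a JavaBean that follows the rules above; the class name CustomerBean and its properties are hypothetical. It is declared package-private only so the sketch compiles as a single file; bean classes are normally public.

```java
import java.io.Serializable;

// Hypothetical JavaBean: Serializable, private fields, no-arg constructor, getters/setters.
class CustomerBean implements Serializable {
    private static final long serialVersionUID = 1L;

    // Private fields, accessed only through the getters and setters.
    private String name;
    private int age;

    // Public no-arg constructor, used by tools that create beans reflectively.
    public CustomerBean() {
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```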
Java Persistence API (JPA)
• An API/specification for ORM
• Uses:
  • POJO classes
  • An XML-based mapping file (which describes the database mapping)
  • A provider (an implementation of JPA)
ORM tools are available for different platforms, such as Java, PHP and .NET.
Java ORM tools:
• Hibernate
• iBATIS/MyBatis
• TopLink
PHP ORM tools:
• Doctrine
Now we will look at the need for NoSQL, its benefits, and the different types of NoSQL databases.
Not Only SQL (NoSQL)
• Relational databases are good for structured data.
• For semi-structured and unstructured data, other types of databases can be used:
  • Key-value stores
  • Document databases
  • Wide-column stores
  • Graph stores
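To contrast two of these data models, the sketch below keeps a key-value store as a map from key to an opaque value, and a document collection as a list of nested maps whose documents need not share the same fields. It is an in-memory illustration with hypothetical class names, not the API of any NoSQL server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Key-value store: values are opaque; the only access path is the key.
final class KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) { data.put(key, value); }
    String get(String key) { return data.get(key); }
    void delete(String key) { data.remove(key); }
}

// Document store: each document is a nested map, and documents in the
// same collection may have different fields (no fixed schema).
final class DocumentCollection {
    private final List<Map<String, Object>> docs = new ArrayList<>();

    void insert(Map<String, Object> doc) { docs.add(doc); }

    // Query by a field value, skipping documents that lack the field.
    List<Map<String, Object>> findBy(String field, Object value) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (value.equals(doc.get(field))) result.add(doc);
        }
        return result;
    }
}
```

In a relational table both students would need the same columns; here one document holds a list of courses and the other a nested address.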
Benefits of NoSQL
• Compared with relational databases, NoSQL databases are generally more scalable and can provide better performance, and their data models address several issues that the relational model was not designed to address, such as:
  • Large volumes of rapidly changing structured, semi-structured and unstructured data
NoSQL DB servers
• MongoDB
• Cassandra
• Redis
• Amazon DynamoDB
• HBase
Hadoop
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
• Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures.
Hadoop core concepts
• Hadoop Distributed File System (HDFS™): a distributed file system that provides high-throughput access to application data.
• Hadoop YARN: a framework for job scheduling and cluster resource management.
• Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
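The MapReduce model can be illustrated without a cluster. The sketch below runs the phases of a word count in one Java process: map emits a (word, 1) pair per word, shuffle groups the pairs by key, and reduce sums each group. It does not use Hadoop's Mapper/Reducer API; on Hadoop the same phases run in parallel across nodes, with input read from HDFS and tasks scheduled by YARN. WordCountSketch is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {
    // Map phase: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (kept sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, ones) -> counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line)); // on a cluster, each split would be a separate map task
        }
        return reduce(shuffle(pairs));
    }
}
```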
Information Retrieval (IR)
• Data in storage should be fetched, converted into information, and presented for proper use.
• Information is retrieved via search queries:
  • Keyword search
  • Full-text search
• The output can be:
  • Text
  • Multimedia
• The information retrieval process should be:
  • Fast (good performance)
  • Scalable
  • Efficient
  • Reliable/correct
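Keyword search is commonly implemented with an inverted index, which maps each term to the documents that contain it. The sketch below builds one in memory and answers queries whose terms must all appear (AND semantics). The class name, the simple tokenizer and the document IDs are assumptions; full-text search libraries such as Apache Lucene add stemming, relevance ranking and on-disk index structures on top of this idea.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index for keyword search.
final class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lower-case the text and split it on non-word characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Index a document: record its ID under every term it contains.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
            }
        }
    }

    // Keyword query: sorted IDs of documents that contain every query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) continue;
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs); // intersect: all terms must match
        }
        return result == null ? Set.of() : result;
    }
}
```

Because each lookup goes straight to the posting set for a term instead of scanning every document, the index supports the fast and scalable retrieval listed above.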