What is Database Replication?
Database replication is a technique where multiple copies of the database
tables are spread around servers on a network. Even though the word
replication is fairly new, the concept is not. It has its roots in
mainframe-era technology. As records are added and modified in one
copy of a database table (a replica), those changes are propagated
off-line to the other replicas when the remote networks are periodically
connected together. For many years, this has been considered
an excellent alternative to remote session applications like PcAnywhere™,
which only send screens and keystrokes over a remote connection and which
require a full-time connection while data entry or querying/reporting occurs.
For example, suppose you have a traveling salesman who moves from city
to city and stays in a hotel at night. He carries his notebook computer
with him at all times. He enters orders throughout the day into his
notebook using a program that saves them in a local table. At night,
when he arrives at the hotel, he connects his notebook computer and modem
to the phone line, and calls the home office. The home office then
synchronizes its copy of this database table with his, effectively uploading
all new orders the salesman has created from the notebook computer for
the day. At the same time, any orders that were created by other
salesmen are downloaded into the notebook computer. Also, any pricing
changes made by home office employees are replicated into local tables
on the notebook computer. Now the salesman is ready to go for the
next day. Note that all new information is entered “off line”.
Another example would be a remote office that must communicate routinely
with the home office. Perhaps this company has several shipping points
around the country. At night, each remote office can “poll” the home
office for the new orders that came in that day. At the end of the
synchronizing process, all sites have a consistent data set that is
fully up-to-date.
A third use for database replication is for data warehousing or datamarts.
These are centralized repositories for databases that make it easy for
any number of individuals to perform complex analysis of vast amounts of
data. CEOs and board executives might want to “mine” this data to
extract tidbits of information that can help make the company more competitive
or have an edge. This data could easily be processed by mainframes
or supercomputers, or using servers running SQL-based databases.
Database replication comes in many different forms and flavors.
Some types of replication involve a “two-phase commit” (2PC), where all
of the replicas must agree to accept a change to the database before a
transaction will be replicated. 2PC requires that all servers be
live and communicating with each other at all times. Unless a full-time,
high-performance WAN is already in place and has the spare bandwidth
to accommodate the extra activity, 2PC is not usually cost-effective,
as WAN links are expensive, and in most cases slower than LAN links.
Therefore, most organizations choose a “warm” update procedure which updates
records between the branches in periodic batches, rather than a “hot” update,
where all transactions are available immediately after entry. For
the purposes of this document, we will only be discussing this "warm" update
method of database replication from this point forward.
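To make the "hot" two-phase commit idea concrete before we leave it behind, here is a minimal sketch in Python. It is an illustration only, not the protocol of any particular database product: the Replica class, its online flag, and the two_phase_commit function are hypothetical names invented for this example, and real implementations also handle timeouts, logging, and crash recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that keeps one copy of a table as a dict."""
    name: str
    online: bool = True
    rows: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: vote. A replica that cannot be reached cannot vote yes.
        if not self.online:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        # Phase 2 (success path): make the staged change permanent.
        self.rows[key] = self.pending.pop(key)

    def abort(self, key):
        # Phase 2 (failure path): discard the staged change.
        self.pending.pop(key, None)


def two_phase_commit(replicas, key, value):
    """Apply the change on every replica, or on none of them."""
    votes = [replica.prepare(key, value) for replica in replicas]
    if all(votes):
        for replica in replicas:
            replica.commit(key)
        return True
    for replica in replicas:
        replica.abort(key)
    return False


home = Replica("home")
tampa = Replica("tampa")
miami = Replica("miami", online=False)  # e.g. the WAN link is down

print(two_phase_commit([home, tampa], "price:widget", 9.99))          # True
print(two_phase_commit([home, tampa, miami], "price:widget", 10.49))  # False
print(home.rows["price:widget"])                                      # 9.99
```

The second call fails because one replica is unreachable, so no replica keeps the new price. This is why 2PC needs every server to be live and connected at all times.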
Warm updates (also referred to as "polling") use a replication scheme
where the added and changed records are sent between the replicas periodically.
This precludes the need to have full-time WAN access, if in fact, a WAN
is needed at all. Much of the file transfer can be handled using
modems and dedicated computers at each remote server. Their job is
to read the databases periodically and transfer the replicated set over
the telephone line back and forth to each other and to other peers on other
remote networks. They might be programmed to dial once per hour or
once per day. In some cases, this form of polling is used to update
the primary server only and is referred to as a one-way polling update.
This technique is typically used by small operations that only require
certain data for things like re-ordering inventory, cash settlement, and
data mining.
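As a rough illustration of a one-way polling update, the Python sketch below copies only the rows that were added or changed since the previous polling run. It assumes that the application stamps every row with a "modified" value (a hypothetical field; real products track changes in their own ways), and the names changes_since and poll are made up for this example.

```python
def changes_since(table, last_sync):
    """Return the rows added or changed after the previous polling run."""
    return {key: row for key, row in table.items() if row["modified"] > last_sync}


def poll(source, target, last_sync):
    """Copy the changed rows from source into target; return how many were sent."""
    batch = changes_since(source, last_sync)
    for key, row in batch.items():
        target[key] = dict(row)  # copy, so the two replicas do not share objects
    return len(batch)


# The salesman's notebook holds one order the home office has not seen yet.
laptop = {
    "order-1": {"qty": 5, "modified": 100},
    "order-2": {"qty": 2, "modified": 230},
}
home_office = {"order-1": {"qty": 5, "modified": 100}}

sent = poll(laptop, home_office, last_sync=200)
print(sent)                    # 1
print(sorted(home_office))     # ['order-1', 'order-2']
```

A two-way update would run the same step in the other direction as well, which is where the pitfalls discussed later in this document begin.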
When it comes down to it, polling for database replication is
popular for one reason: it's cheap. It usually gets the job done
without reliance on expensive ISDN, Frame Relay, or ATM WAN links.
Above, the diagram shows a primary server (usually representing your
main location) and several secondary servers that are remotely located
in other buildings, cities, states, etc. These secondary servers could be
satellite store locations, branches, or even a salesman with a laptop computer.
There was a time when many multi-store operations and franchises used
polling techniques exclusively to update their branches. In the early
1990s and before, there really weren't many options for getting quick data
access from one point to another. Technology only offered a few choices
for real-time connection between computers, and performance was not very
good. Only larger institutions could afford the monthly fees of Frame
Relay or T1 data circuits for a real-time computer network experience.
Many businesses had to settle for standard POTS phone lines that could
only reach speeds of 28.8kbps.
Due to these limitations, it is no wonder that many point-of-sale systems
of yesteryear offered some form of "polling" to gather information about
all of a company's satellite locations. The retail marketplace has used a
one-way polling technique for years, and it has been one of the most cost-effective
ways for the headquarters of a company to keep tabs on inventory and sales
at each remote outlet. Simple retail operations which do not deal
with customer databases, accounts receivable, accounts payable, billing,
CRM, and other office functions at the remote locations have found one-way
polling satisfactory compared to the very expensive and limited alternatives
available during the 70's, 80's, and early 90's.
However, many retail businesses have had to settle for the shortcomings
of polling versus real-time connectivity because of the expense of the data
circuits and hardware necessary to connect computers over long distances.
The demand for two-way polling (true database replication) has increased,
and so have the problems that come with such a technique. Read on as
we discuss some of the major pitfalls of two-way polling.
What are the Pitfalls of Polling?
As we move deeper into the 21st century, businesses are realizing the
importance of data and its impact on their ability to better serve their customers.
One-way polling is not good enough for many businesses because it does
not give the whole picture of business activity to all of the satellite
locations. Customers have demanded access to
all information at all locations, and have initially turned
to two-way polling as a solution. It seems logical; after all, one-way
polling worked fine, so how much more difficult could two-way polling be?
Unfortunately, there is a big difference. Let us investigate further.
The number of satellite locations and amount of daily transactions
could be excessive and sometimes formidable for nightly updates.
With two-way polling, each location has to contact the primary server,
and then be contacted back in order for each satellite office to gather
information from all of the other locations. As an operation grows
from two to five, twenty, and even fifty locations, the number of phone calls
and back-and-forth data traffic can take hours. Additionally, if
something goes wrong, re-synchronizing a location can be tedious and cause
unnecessary downtime.
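A back-of-the-envelope calculation shows how quickly the nightly window fills up. The Python sketch below assumes two calls per satellite per night (one call in, one call back, as described above). The six minutes per call and the single modem line at the primary server are illustrative assumptions only, not measured figures.

```python
def nightly_polling_minutes(locations, minutes_per_call=6.0, modem_lines=1):
    """Estimate the total polling time for one night, in minutes.

    minutes_per_call and modem_lines are assumed values for illustration.
    """
    calls = 2 * locations  # one call in and one call back per satellite
    return calls * minutes_per_call / modem_lines


for count in (2, 5, 20, 50):
    print(count, "locations:", nightly_polling_minutes(count), "minutes")
```

Under these assumptions, fifty locations need 600 minutes (ten hours) of calls on a single line, which no longer fits into one night.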
Synchronization errors become more common as your business grows.
You are much more susceptible to hardware failure, power outages, phone
line problems, and human error as you attempt to poll several or more locations.
Because the process happens at night, there is nobody there to ensure that
all locations are properly set up for the update. This means that
you don't find out about the problems until the next morning... right when
you are ready to open up for business. Periodically, two-way polling
causes your operations to come to a halt until the branches re-sync the
database.
Security issues become a major problem when using this form of database
replication. Imagine each satellite location having your entire
database of information on its own computer or network. This means
that you must hire IT staff to ensure that each location is properly secured
from unscrupulous personnel attempting to copy your database from an unprotected
server. This also means that every location must protect itself
with anti-virus software, firewall protection and backup software to ensure
security from outside computer hackers while connected to the internet.
Worse yet, a small satellite office must also be well guarded from potential
criminals that may break into a business after hours and steal your computer
system and all the valuable data along with it.
Software upgrades become a tedious and difficult task in some cases.
As improvements are made to your software system, updating all satellite
offices at the same exact time is sometimes critical for polling to operate
properly. One satellite office using an older software revision could
spell trouble for the entire company.
Database conflict is quite commonplace for two-way polling and usually
requires daily manual adjustments from knowledgeable staff. For
example, suppose that your employee John modifies the zip code of a customer
record at his Tampa office, and Janet modifies that same customer’s phone
number from her Miami office. John replicates his change first and the change
takes. Now Janet replicates her change, BUT SHE STILL HAS THE OLD
ZIP CODE. Alas, a conflict has just sprung up! Janet must not
be allowed to continue her replication of that particular record until
the conflict has been resolved by an administrator! Even more complicated
scenarios arise when dealing with monies received at two different locations
for the same customer account. Different systems handle these conflicts
in different ways, but regardless of the conflict detection method, you
must still deal with the conflict manually. This and many other similar
scenarios happen more often than you would like to imagine.
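The John and Janet scenario can be sketched in Python as follows. This example assumes one common detection method, a per-record version number, which is only one of the ways different systems detect conflicts. The function name replicate_row, the record key, and the field values are hypothetical.

```python
def replicate_row(master, key, edited_row, base_version, conflicts):
    """Apply an edit only if it was based on the master's current version."""
    current = master[key]
    if current["version"] != base_version:
        # Someone else changed the record first: hold this edit for review.
        conflicts.append((key, edited_row))
        return False
    master[key] = {**edited_row, "version": base_version + 1}
    return True


master = {"cust-42": {"zip": "33601", "phone": "555-0100", "version": 1}}
conflicts = []

# Both employees start from version 1 of the same customer record.
john = {"zip": "33602", "phone": "555-0100"}   # Tampa: new zip code
janet = {"zip": "33601", "phone": "555-0199"}  # Miami: new phone, old zip code

print(replicate_row(master, "cust-42", john, 1, conflicts))   # True
print(replicate_row(master, "cust-42", janet, 1, conflicts))  # False
print(master["cust-42"]["zip"])                               # 33602
print(len(conflicts))                                         # 1
```

Janet's edit is set aside rather than applied, so John's zip code survives, but someone still has to review the held record and merge her phone number by hand.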
Critical information lags using polling techniques compared to real-time
connectivity strategies.
Today, it is crucial for multi-location operations to work together as
one entity. In many cases, this means that certain information can't
wait until the next day to appear at other locations. For example,
a customer visits one of your offices within the city, only to find out
that you are out-of-stock on the popular Widget Plus product. With
real-time connectivity, your staff can easily point the customer to one of your
other locations that has stock without ever having to call the other location.
Databases using polling techniques would only know of inventory levels
from the night before and could mislead your staff about stock status and
other vital information.
Software solutions that offer polling techniques are often more expensive
than investing in programs that utilize real-time connectivity.
Software programs that use polling techniques are often quite expensive
because of the licensing required for each of your remote locations to
utilize such technology. In the past, software vendors were able
to get away with charging considerable licensing fees and annual support
for applications which could perform data replication because the alternative
real-time strategies were still more expensive in the long run. However,
this has changed dramatically within the last five years. This is
discussed in more detail below.
The Demise of Polling
There have been many advancements in the computer field that are quickly
changing the way businesses connect computers, gather information, and
communicate with their customers. Some of the biggest factors reshaping
computer networking are the proliferation of the Internet; increased speed
of desktop computers; dramatic improvements in remote connectivity
software; and the availability of inexpensive high-speed data circuits like
ISDN and DSL throughout the country.
Businesses now have truly affordable options that can bring locations
together, regardless of the distance between them. The best part
is that once your organization's computer network is all connected in real-time,
you will have the luxury of sharing all business applications installed
and available on your main office computer server (e.g., database programs,
word processors, spreadsheet applications, etc.). In
fact, more than 90% of Fortune 500 companies use real-time connectivity
solutions for their branches to communicate and share data across their organizations
worldwide. Microsoft has even recognized the changes in remote computing
and has integrated Terminal Services into its latest OS, just for this
purpose.
There are many different terms which all refer to real-time
connectivity. Thin client, remote session, remote office, and server-based
computing are just a few of the terms commonly used when referring to the
operation of remote applications. There is even a new type of business
that has formed because of the growing popularity of this technology.
Application Service Providers (ASPs) are companies which can quickly
get you up and running with your online applications and perform administrative
backups, server redundancy, and a list of other computer-related services.
Today, many database replication technologies (including two-way polling
techniques) are quickly being replaced by the convenience, flexibility,
and lower total cost of ownership (TCO) of real-time solutions. The following
are just some of the benefits of real-time connectivity as it relates to
business software solutions like ManageMore™:
1. Information is real-time and can be viewed by all locations the instant
you record it. Any number of branches can simultaneously view or
change a customer's information without database conflict. A
customer can now drive from one location to another, and know that your
establishment has all the information about his purchases, returns, credits,
etc. that occurred on that same day from any office. You can also
view inventory levels at all locations at any given time.
2. Security issues with regard to your database are minimal.
All information resides at the primary server only. Database backups,
firewall protections, and administration can be handled at the main office
of your business. Remote locations only need a simple workstation
and data circuit for connecting to the primary server.
3. No special software licensing is required by the software vendor in
order to connect your locations together (other than typical concurrent
user software licenses).
4. Through the use of the internet, one can connect locations nearly
anywhere in the world without costly long-distance telephone charges.
5. Since all locations are connected real-time, processes like credit
card authorization can be done from the primary server, eliminating the
need for costly POS Terminals and dedicated phone lines at the remote branches.
6. Communication between employees of your remote locations can now
be accomplished easily through internal email and other groupware features,
without the need for time-consuming and disruptive phone calls between
branches.
7. Cost of computer equipment is dramatically reduced because only a
thin client workstation is necessary to connect to the host server.
These trouble-free types of smart terminal PCs have no hard drives, fans,
or moving parts. They also have an extremely small footprint, so
you have plenty of desk space and leg room.
For more information, articles, discussions, and facts about real-time
connectivity technologies, visit some of these web sites:
www.thinplanet.com
www.ncd.com
www.thinclientzone.com
www.thin-world.com
Conclusion
Rapid advancements in remote computing are certainly reshaping how SMEs
interconnect their offices and do business with their customers. Technology
options are plentiful and cheap for communicating between two points at
high speed: ISDN, DSL, cable modem, satellite, line-of-sight wireless
connections, and other broadband solutions are making it possible for any
organization to act like a big business. The Internet has also simplified
matters: businesses can take advantage of the world's largest computer network
to connect computers at the furthest corners of the earth.
However, database replication will not entirely disappear anytime soon.
Some parts of this world are still not as advanced as the United States
when it comes to telecommunication data circuits and must still rely on
slower POTS phone lines for any form of data transmission. There
are also some remote regions where internet service is not a local telephone
call or is metered and not cost-effective for businesses that need to be connected
all day long.
Please check out ManageMore's requirement
page for more information on products we recommend for true real-time
connectivity solutions.