Distributed File System (DFS)
Many companies are introducing Microsoft's Distributed File System
(DFS) nowadays, usually in migration and standardization projects.
Many people think that DFS is just a simple file service that
requires no great conceptual work. But without planning and a
sound concept, you will hardly achieve the result you expect from
DFS.
You can rely on the experience of our specialists, gained in many
projects with intercontinental DFS installations.
Basics of DFS-N and DFS-R
The product DFS consists of two independent parts.
- DFS-N: virtual tree view of multiple file services
- DFS-R: data replication between multiple file servers
The two parts can be used independently. A DFS tree does not
require DFS replication. More importantly, DFS replication can
take place without a DFS tree.
The DFS tree is the original function of DFS; it existed before
additional functions such as replication were added. The DFS tree
summarizes distributed file service resources in an abstracted
tree view.
Planning a DFS tree requires a good understanding of how people in
the company work. The DFS tree is visible to the end users, so it
will influence their daily work, hopefully in a positive way.
The point where the DFS tree and the file service meet is the
file system share. A folder in the DFS tree has one or more
folder targets, which in turn correspond to shares on file
servers. These shares do not necessarily have to be provided by
Windows servers. Almost every share based on CIFS (Common Internet
File System) can be included in the DFS tree, for example various
NAS filers, Samba shares or CIFS-enabled NetWare services. However,
this applies only to the DFS tree. DFS Replication requires a
Windows server, because it relies on a Microsoft service that is
installed there.
An essential function of the DFS tree is switching between DFS
folder targets. A DFS folder can have one share or multiple shares
as its destination (referrals). These target servers may reside in
different locations. The DFS client can choose the best folder
target based on the Sites and Services information in Active
Directory. You can influence the choice by configuring preferred
folder targets. This setting instructs all clients to use the same
folder target. Only when the primary target is not reachable does
the client switch to the secondary target.
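The selection logic described above can be sketched roughly as follows. This is a simplified illustration, not the actual Windows referral algorithm: the class `FolderTarget`, the function `order_targets` and the server names are hypothetical, and real referral ordering offers more settings than a single "preferred" flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderTarget:
    unc_path: str            # share on a file server (hypothetical names)
    site: str                # Active Directory site of that server
    preferred: bool = False  # administrator override (assumed to be one flag)
    reachable: bool = True


def order_targets(targets, client_site):
    """Return the reachable targets, best candidate first."""
    candidates = [t for t in targets if t.reachable]
    # False sorts before True, so the "good" properties are negated:
    # preferred targets first, then targets in the client's own site.
    return sorted(
        candidates,
        key=lambda t: (not t.preferred, t.site != client_site),
    )


targets = [
    FolderTarget(r"\\fs-ham\marketing", site="Hamburg"),
    FolderTarget(r"\\fs-fra\marketing", site="Frankfurt", preferred=True),
]

# A client in Hamburg is still sent to the preferred Frankfurt target.
print(order_targets(targets, "Hamburg")[0].unc_path)
```

If the preferred target is marked as unreachable, the same function returns the remaining target first, which corresponds to the fallback to the secondary target.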
At this point, we need to talk about DFS-R.
Switching between folder targets only makes sense if the targets
hold identical data. DFS-R can synchronize data between multiple
Windows servers and is based on a highly efficient, WAN-optimized
protocol.
Unfortunately, there are a few limitations for the dynamic
switching between folder targets. How quickly DFS-R replicates
depends on the available bandwidth and the rate of change in the
file system. Changes are put into a queue (backlog queue) and
processed sequentially. This means that replication cannot be
guaranteed to complete within a certain time. An automatic switch
between folder targets can therefore cause clients to access
different versions of the data. There is unfortunately no clear
support statement from Microsoft about
it. A few good notes on DFS Replication can be found
In one of our current projects, Microsoft refused to support a
scenario with two active folder targets on two servers
synchronized by DFS-R. In that project the inactive folder target
has been disabled and is activated manually in case of a failure.
Data synchronization between Windows file servers can be done
with a special transfer protocol: DFS-R. DFS-R does not require a
DFS tree. It is independent of any existing Distributed File
System structure. A directory replicated by DFS-R may even contain
subdirectories that are linked in various DFS trees.
WAN Optimization for DFS-R
DFS-R has been highly optimized for use in WAN environments.
The original TCP specification limits the TCP window size to 64
KBytes. On WAN connections, which usually have higher latency,
the 64 KByte TCP window is a limiting factor.
Data packets need to be acknowledged before further packets can be
sent. At high latency this leads to very low throughput.
RFC 1323 introduced the "TCP Window Scale" option. It allows the
TCP window to be enlarged far beyond 64 KBytes (up to 1 GByte
according to the RFC). Using the TCP Window Scale option, DFS-R is
no longer very sensitive to latency. You can still use up to 80% of
the bandwidth at latency times of 500 ms.
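The effect of the window size can be checked with a small calculation. The sketch below assumes the usual rule of thumb that a sender can have at most one full window in flight per round trip; the function names and the 10 Mbit/s link are illustrative assumptions, not values from a real installation.

```python
def max_throughput_mbit(window_bytes, rtt_seconds):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def window_needed_bytes(link_mbit, rtt_seconds):
    """Window size required to keep a link of the given speed full."""
    return link_mbit * 1_000_000 / 8 * rtt_seconds


# Unscaled 64 KByte window at 500 ms round-trip time:
print(round(max_throughput_mbit(65_535, 0.5), 2))   # about 1.05 Mbit/s

# Window needed to fill an assumed 10 Mbit/s link at 500 ms:
print(window_needed_bytes(10, 0.5))                 # 625000.0 bytes
```

With the unscaled window, a high-latency link is capped at roughly 1 Mbit/s no matter how much bandwidth is available, which is why window scaling matters for WAN replication.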
The DFS-R protocol includes the compression algorithm RDC (Remote
Differential Compression) in order to save transmission bandwidth.
RDC detects changes in Office documents and transmits only the
changes. As soon as one of the two replication partners runs the
Enterprise Edition of Windows Server, the advanced feature Cross
File RDC is used automatically. If parts of the data already exist
in other files on the target server, Cross File RDC uses these
parts to build the replicated files locally.
Replicating ordinary Office documents with Cross File RDC can
save 50% to 80% of the transfer volume.
Example: a Word document
Word saves changes to a temporary file while you are editing. When
you save your work, a new file is created. For the RDC protocol
this is a new file. Changes cannot be detected, and the file is
transferred completely.
Only Cross File RDC, which is active with the Enterprise Edition
of the server, can help you here. Cross File RDC compares the file
with already replicated data and assembles the file on the target
server from the replicated parts.
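The idea of reusing data that already exists on the target can be illustrated with a strongly simplified sketch. It is not the real RDC algorithm: it assumes fixed-size chunks and SHA-256 hashes purely for illustration, and the names `chunks` and `plan_transfer` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example stays readable


def chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_transfer(new_file, files_on_target, size=CHUNK_SIZE):
    """Return only those chunks of new_file that the target does not hold."""
    known = {
        hashlib.sha256(chunk).digest()
        for existing in files_on_target
        for chunk in chunks(existing, size)
    }
    return [
        chunk for chunk in chunks(new_file, size)
        if hashlib.sha256(chunk).digest() not in known
    ]


old_version = b"AAAABBBBCCCC"   # already replicated under another name
new_version = b"AAAAXXXXCCCC"   # "new" file created when saving in Word

print(plan_transfer(new_version, [old_version]))  # [b'XXXX']
```

Although the saved document is a new file, only the changed part has to cross the WAN in this sketch; the rest is assembled from data that is already present on the target.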
DFS-R is a multi-master replication. Changes can be made on all
replication partners. Only the lock status of the files is not
replicated.
If a Word document is changed on both replication partners, the
file with the most recent timestamp becomes the new version
(last writer wins). Therefore, it is not recommended to give
several replication partners write access. A good example of
read-only replication is the Sysvol directory of an Active
Directory domain. Multiple replication partners are available, but
only with read access. There can be no change conflicts, as Sysvol
is read-only.
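The "last writer wins" rule can be expressed in a few lines. This is only a sketch of the rule as described above; the class `FileVersion`, the function `resolve_conflict` and the server names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileVersion:
    server: str         # replication partner where the change was made
    modified: datetime  # last-modified timestamp of this version
    content: str


def resolve_conflict(a, b):
    """Return (winner, loser); the more recent timestamp wins."""
    return (a, b) if a.modified >= b.modified else (b, a)


hamburg = FileVersion("fs-ham", datetime(2024, 5, 2, 9, 15), "edit from Hamburg")
munich = FileVersion("fs-muc", datetime(2024, 5, 2, 9, 20), "edit from Munich")

winner, loser = resolve_conflict(hamburg, munich)
print(winner.server)  # fs-muc
```

The edit from Hamburg is not merged into the winning version, which is why write access on several replication partners is risky.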
Other examples are hub-and-spoke implementations, which are used
for data backup. The spoke server is enabled for write access; the
hub server only receives changes and has no active folder target.
The most common case is to use a DFS tree to simplify the access
to distributed file services for the end user. The DFS-R
replication is often used for data backups at remote sites that
don't have a local backup infrastructure.
Designing a DFS tree needs some preparation. It is important to
know how the end users work.
A good example is a DFS tree for a company that has several sites.
Without DFS, network drives are connected to the file server at
each site. The necessary UNC paths are difficult for most end
users to understand and to remember. A DFS tree can summarize
these UNC paths in an abstract tree view and hide the individual
servers from the end user. The next design step combines related
data into virtual nodes.
The marketing department of our example company operates in
various locations: Hamburg, Frankfurt and Munich. A virtual DFS
folder "marketing" has subfolders named after the locations. These
subfolders point to the file servers at the sites, and this is
completely transparent to the user. That way, all data of the
marketing department is neatly summarized in one node.
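The mapping from the virtual "marketing" folder to the site servers can be pictured as a simple lookup table. The sketch below is only a model of the idea; the domain `example.local`, the server names and the function `resolve` are hypothetical.

```python
# Virtual DFS folders mapped to their folder targets (all names assumed).
NAMESPACE = {
    r"\\example.local\dfs\marketing\Hamburg": [r"\\fs-ham\marketing"],
    r"\\example.local\dfs\marketing\Frankfurt": [r"\\fs-fra\marketing"],
    r"\\example.local\dfs\marketing\Munich": [r"\\fs-muc\marketing"],
}


def resolve(dfs_path, namespace=NAMESPACE):
    """Translate a path in the DFS tree into a path on a file server."""
    lowered = dfs_path.lower()
    for folder, folder_targets in namespace.items():
        prefix = folder.lower()
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            # Keep the part of the path below the DFS folder unchanged.
            return folder_targets[0] + dfs_path[len(folder):]
    raise KeyError(f"no DFS folder matches {dfs_path}")


print(resolve(r"\\example.local\dfs\marketing\Hamburg\plans\q1.docx"))
```

The user only ever sees the path below `marketing`; which server actually delivers the file is decided by the table.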
You should be careful with the naming of the folders. Department
codes are not recommended, since each reorganization in the
company would require an adjustment of the DFS tree.
Let us have a look at DFS-R. In our example, the IT
infrastructure of the sites is to be consolidated. The
consolidation focuses on the local backup servers. DFS-R will be
used to replicate the data to the central office, where it is
backed up. The most important design parameter here is the
expected rate of change. A good starting point may be an
evaluation of the latest daily backups.
Rough estimate: replicating 500 GByte of data with a daily change
rate of 3% and an RDC saving of 50% would require a long-term
average bandwidth of about 0.7 Mbit/s.
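The rough estimate can be reproduced with a short calculation. The sketch assumes decimal units (1 GByte = 10^9 bytes) and that the changes are spread evenly over 24 hours; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_bandwidth_mbit(data_gbyte, daily_change_rate, rdc_saving):
    """Long-term average bandwidth needed to replicate the daily changes."""
    changed_bytes = data_gbyte * 1e9 * daily_change_rate
    sent_bytes = changed_bytes * (1 - rdc_saving)
    return sent_bytes * 8 / SECONDS_PER_DAY / 1e6


# 500 GByte, 3% daily change, 50% saved by RDC:
print(round(average_bandwidth_mbit(500, 0.03, 0.5), 2))  # 0.69
```

This is an average only. Changes usually arrive in bursts during working hours, so the backlog queue grows at peak times and drains later.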
The employees in the marketing department have transparent access
to all data of their department regardless of where they are
located. In addition, the availability of the file service is
increased: the data is replicated to headquarters, and the DFS
links there are activated in the event of a failure. Finally, you
can save backup infrastructure at external locations, because the
data is replicated to headquarters by DFS-R and backed up there.