Distributed File System (DFS)

Many companies are introducing Microsoft's Distributed File System (DFS), usually as part of migration and standardization projects.

A lot of people think that DFS is just a simple file service that requires no great conceptual work. But without planning and a sound concept, you will hardly achieve the result you expect from DFS.

You can rely on the experience of our specialists, gained in many projects with intercontinental DFS installations.

 

Basics of DFS-N and DFS-R


DFS consists of two independent components:

  • DFS-N: a virtual tree view of multiple file services
  • DFS-R: data replication between multiple file servers

You can use both components independently. A DFS tree does not require DFS replication. More importantly, DFS replication also works without a DFS tree.

 

DFS-N

The DFS tree is the original function of DFS and predates additions such as replication. It presents distributed file service resources in a single abstracted view.

Planning a DFS tree requires a good understanding of how people in the company work. The DFS tree is visible to end users, so it will influence their daily work, hopefully for the better.

The interface between the DFS tree and the file service is the file share. A folder in the DFS tree has one or more folder targets, which correspond to shares on file servers. These shares do not have to be provided by Windows servers. Almost any CIFS (Common Internet File System) share can be included in the DFS tree, for example various NAS filers, Samba shares or CIFS-enabled NetWare services. However, this applies only to the DFS tree. DFS Replication requires a Windows server, because it runs as a Microsoft service installed on that server.

Figure: DFS-N (Distributed File System)

An essential function of the DFS tree is switching between DFS folder targets. A DFS folder can have one or more shares as its destination (referrals). These target servers may reside in different locations. The DFS client chooses the best folder target based on the Sites and Services information in Active Directory. You can influence the choice by configuring preferred folder targets. This setting instructs all clients to use the same folder target. Only when the primary target is unreachable does the client switch to the secondary target.
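
The selection logic described above can be sketched in a few lines of Python. This is a simplified model for illustration, not the actual DFS client algorithm: the server names, the site cost table and the `preferred` flag are hypothetical stand-ins for the Sites and Services data and the preferred-target setting.

```python
from dataclasses import dataclass


@dataclass
class FolderTarget:
    unc_path: str            # share behind the DFS folder (hypothetical name)
    site: str                # Active Directory site of the target server
    preferred: bool = False  # stands in for a "preferred target" setting
    reachable: bool = True


def order_referrals(targets, client_site, site_cost):
    """Return the reachable targets in the order a client would try them."""
    def sort_key(target):
        if target.site == client_site:
            cost = 0
        else:
            cost = site_cost.get((client_site, target.site), float("inf"))
        # Preferred targets sort first; within each group the cheapest site wins.
        return (not target.preferred, cost)

    return sorted((t for t in targets if t.reachable), key=sort_key)


# Assumed example data: two targets, client located in Munich.
targets = [
    FolderTarget(r"\\fs-ham\marketing", "Hamburg"),
    FolderTarget(r"\\fs-fra\marketing", "Frankfurt"),
]
site_cost = {("Munich", "Hamburg"): 200, ("Munich", "Frankfurt"): 100}

for target in order_referrals(targets, "Munich", site_cost):
    print(target.unc_path)
```

Marking the Hamburg target as preferred moves it to the front for every client, whatever the site cost. Marking it as unreachable removes it from the list, so the client falls back to Frankfurt.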

At this point, we need to talk about DFS-R replication.
Switching between folder targets only makes sense if the targets hold identical data.
DFS-R can synchronize data between multiple Windows servers and is based on a highly efficient, WAN-optimized protocol.

Unfortunately, there are a few limitations on dynamic switching between folder targets. DFS-R replication speed depends on the available bandwidth and the rate of change in the file system. Changes are put into a queue (the backlog) and processed sequentially. This means that replication cannot be guaranteed within a certain time. An automatic switch between folder targets can therefore cause clients to access differing data sets. Unfortunately, there is no clear support statement from Microsoft on this. A few good notes on DFS Replication can be found here:

http://blogs.technet.com/b/askds/archive/2010/09/01/microsoft-s-support-statement-around-replicated-user-profile-data.aspx

In one of our current projects, Microsoft refused to support a scenario with two active folder targets on two servers synchronized by DFS-R. There, the inactive folder target has been disabled and is enabled manually in case of failure.

 

DFS-R

Data synchronization between Windows file servers can be done with a dedicated replication protocol: DFS-R. DFS-R does not require a DFS tree and is independent of any existing DFS tree structure. A DFS-R replicated directory may even contain subdirectories that are linked in different DFS trees.

Figure: DFS-R (Distributed File System)

WAN Optimization for DFS-R

DFS-R is highly optimized for use in WAN environments. The original TCP specification limits the TCP window to 64 KB. On WAN connections, which usually have higher latency, the 64 KB TCP window is a problem.

Data packets need to be acknowledged before further packets can be sent. At high latency, this leads to very low throughput.
RFC 1323 introduced the TCP Window Scale option. This allows the TCP window to be increased to 16 MB and beyond (the option permits windows of up to 1 GB). With the TCP Window Scale option, DFS-R is no longer very sensitive to latency. You can still use up to 80% of the bandwidth at latencies of 500 ms.
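
The effect of the window size can be checked with simple arithmetic: a single TCP connection cannot send more than one window of data per round trip. The sketch below treats the 500 ms latency as the round-trip time, which is an assumption, and it ignores packet loss and protocol overhead.

```python
def max_throughput_mbit(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


RTT = 0.5  # seconds; assumes the 500 ms latency is the round-trip time

classic = max_throughput_mbit(65_535, RTT)           # 16-bit window field
scaled = max_throughput_mbit(16 * 1024 * 1024, RTT)  # 16 MB scaled window

print(f"64 KB window: {classic:.2f} Mbit/s")
print(f"16 MB window: {scaled:.1f} Mbit/s")
```

With the unscaled window, a link with 500 ms round-trip time is capped at roughly 1 Mbit/s regardless of its nominal bandwidth. With a 16 MB window the cap moves to several hundred Mbit/s, so the link itself becomes the limiting factor again.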

 

DFS-R Compression

The DFS-R protocol uses the compression algorithm RDC (Remote Differential Compression) to save transmission bandwidth. RDC detects changes in files such as Office documents and transmits only the changed parts. As soon as one of the two replication partners runs the Enterprise Edition of Windows Server, the advanced feature Cross-File RDC is enabled automatically. If data blocks already exist in other files on the target server, Cross-File RDC uses these parts to build the replicated files locally.
Replicating ordinary Office documents with Cross-File RDC can save 50% to 80% of the transferred data.

Example: a Word document

Word saves changes to a temporary file while you edit. When you save your work, a new file is created. To the RDC protocol, this is a new file. Changes cannot be detected, and the file is transferred completely.

Only Cross-File RDC, which is active with the Enterprise Edition of the server, can help here. Cross-File RDC compares the file with already replicated data and assembles the file on the target server from the replicated parts.
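
The idea behind Cross-File RDC can be illustrated with a small Python sketch. It is a deliberately simplified model: it uses fixed-size blocks and SHA-256 signatures, whereas the actual RDC algorithm is considerably more elaborate. All names and the tiny block size are chosen for illustration only.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example stays readable


def blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signature(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def plan_transfer(new_file: bytes, files_on_target):
    """Decide which blocks of new_file must cross the WAN.

    Blocks whose signature already occurs in ANY file on the target
    can be copied locally there instead of being transmitted.
    """
    known = {signature(b) for f in files_on_target for b in blocks(f)}
    to_send, reused = [], 0
    for block in blocks(new_file):
        if signature(block) in known:
            reused += 1
        else:
            to_send.append(block)
    return to_send, reused


# The "new" file Word created on save differs from the old one in one block.
old_version = b"AAAABBBBCCCCDDDD"
saved_as_new_file = b"AAAABBBBXXXXDDDD"

to_send, reused = plan_transfer(saved_as_new_file, [old_version])
print(f"send {len(to_send)} block(s), reuse {reused} block(s) on the target")
```

Although the saved document is a new file from the replication point of view, only the changed block needs to be transmitted in this model. The remaining blocks are taken from the old version that already exists on the target server.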

 

File Locking

DFS-R is a multi-master replication. Changes can be made on all replication partners. However, the file locking status of the files is not replicated.

If a Word document is changed on both replication partners, the file with the most recent timestamp becomes the new version (last writer wins). Therefore, it is not recommended to allow write access on several replication partners. A good example of read-only replication is the Sysvol directory of an Active Directory domain. Multiple replication partners are available, but clients only read from them, so change conflicts are unlikely.

Other examples are hub-and-spoke implementations used for data backup. The spoke server is enabled for write access; the hub server only receives changes and does not even have an active folder target.
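
The "last writer wins" rule can be sketched as follows. This is a minimal illustration of the rule itself, not of how DFS-R implements it; the class and the server names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileVersion:
    server: str         # replication partner where the change was made
    modified: datetime  # last-modified timestamp of this version
    content: str


def resolve_conflict(a: FileVersion, b: FileVersion):
    """Return (winner, loser): the most recent timestamp wins."""
    return (a, b) if a.modified >= b.modified else (b, a)


hamburg = FileVersion("fs-ham", datetime(2016, 3, 1, 10, 0), "edit from Hamburg")
frankfurt = FileVersion("fs-fra", datetime(2016, 3, 1, 10, 5), "edit from Frankfurt")

winner, loser = resolve_conflict(hamburg, frankfurt)
print(f"kept: {winner.content!r}, replaced: {loser.content!r}")
```

The edit made in Hamburg is replaced even though the user there saved successfully. Since no lock prevented the second edit, this is the practical reason to allow write access on only one replication partner.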

 

DFS Implementation

The most common case is to use a DFS tree to simplify access to distributed file services for end users. DFS-R replication is often used to back up data from remote sites that have no local backup infrastructure.

Designing a DFS tree needs some preparation. It is important to know how the end users work.

A good example is a DFS tree for a company with several sites:
Without DFS, network drives are mapped to the file server at each site. The required UNC paths are hard for most end users to understand and remember. A DFS tree can combine these UNC paths into an abstract tree view that hides the physical servers from the end user. The next design step groups related data into virtual nodes.

The marketing department of our example company operates in several locations: Hamburg, Frankfurt and Munich. A virtual DFS folder "marketing" has subfolders named after the locations. These subfolders point to the file servers at the sites, and this is fully transparent to the user. That way, all data of the marketing department is neatly combined in one node.
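
The mapping from the virtual tree to the physical shares in this example can be sketched as a simple lookup. The namespace root and the server names below are hypothetical, and the resolver is only a model of what the DFS client does when it follows a referral.

```python
NAMESPACE_ROOT = r"\\example.com\dfs"  # hypothetical DFS namespace root

# Virtual DFS folder -> folder targets (shares on the site file servers).
FOLDER_TARGETS = {
    r"marketing\Hamburg": [r"\\fs-ham\marketing"],
    r"marketing\Frankfurt": [r"\\fs-fra\marketing"],
    r"marketing\Munich": [r"\\fs-muc\marketing"],
}


def resolve(dfs_path: str) -> str:
    """Translate a path in the DFS tree into the UNC path of its first target."""
    prefix = NAMESPACE_ROOT + "\\"
    if not dfs_path.lower().startswith(prefix.lower()):
        raise ValueError(f"not below {NAMESPACE_ROOT}: {dfs_path}")
    relative = dfs_path[len(prefix):]
    for folder, targets in FOLDER_TARGETS.items():
        # Match the folder itself or anything below it, ignoring case.
        same = relative.lower() == folder.lower()
        below = relative.lower().startswith(folder.lower() + "\\")
        if same or below:
            return targets[0] + relative[len(folder):]
    raise KeyError(dfs_path)


print(resolve(r"\\example.com\dfs\marketing\Munich\plans\q3.docx"))
```

The user only ever sees paths below the namespace root. If the Munich share moves to another server, only the entry in the mapping changes, and the path the user works with stays the same.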

Figure: DFS Implementation - DFS Links

You should be careful with the naming of the folders. Department codes are not recommended, since each reorganization in the company would require an adjustment of the DFS tree.

Let us have a look at DFS-R. In our example, the IT infrastructure of the sites is to be consolidated. The focus of the consolidation is the local backup servers. DFS-R will be used to replicate the data to the central office, where it is backed up. The most important design parameter here is the expected rate of change. A good starting point is an evaluation of the most recent daily backups.
Rough estimate: replicating 500 GB of data with a daily change rate of 3% and RDC savings of 50% requires a long-term average bandwidth of about 0.7 Mbit/s.
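
The rough estimate can be reproduced with a few lines of Python. The sketch assumes decimal gigabytes and that the daily changes are spread evenly over 24 hours. Both are simplifications, so peak demand during working hours will be higher.

```python
def average_bandwidth_mbit(data_gb: float, daily_change_rate: float,
                           rdc_saving: float) -> float:
    """Long-term average bandwidth needed to replicate the daily changes."""
    bytes_per_day = data_gb * 1e9 * daily_change_rate * (1 - rdc_saving)
    seconds_per_day = 24 * 60 * 60
    return bytes_per_day * 8 / seconds_per_day / 1e6


# Values from the example: 500 GB, 3% daily change, 50% saved by RDC.
estimate = average_bandwidth_mbit(500, 0.03, 0.5)
print(f"{estimate:.2f} Mbit/s")
```

In this example 15 GB change per day, of which 7.5 GB actually has to be transferred. Spread over a day, that comes to just under 0.7 Mbit/s.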

Figure: DFS Implementation - Backup Links disabled

The employees in the marketing department have transparent access to all data of their department, regardless of where they are located. In addition, the availability of the file service is increased: the data is replicated to the headquarters, and the DFS links there are activated in the event of a failure. Finally, you can save on backup infrastructure at the remote sites, because the data is replicated via DFS-R to the headquarters for backup.

©2016 FirstAttribute AG - All rights reserved.