Distributed File System (DFS)
Many companies are introducing Microsoft's Distributed File System (DFS) these days, usually as part of migration and standardization projects.
Many people think that DFS is just a simple file service that requires no great conceptual work. But without planning and a sound concept, you will hardly achieve the result you expect from DFS.
You can rely on the experience of our specialists, gained in many projects with intercontinental DFS installations.
Basics of DFS-N and DFS-R
DFS consists of two independent components:
- DFS-N: a virtual tree view of multiple file services
- DFS-R: data replication between multiple file servers
You can use both components independently. A DFS tree does not require DFS replication. More importantly, DFS replication can take place even without a DFS tree.
The DFS tree is the original function of DFS and predates additional functions such as replication. The DFS tree combines distributed file service resources into one abstracted view.
Planning a Distributed File System tree requires a good understanding of how people in the company work. The DFS tree is visible to end users, so it will influence their daily work, hopefully in a positive way.
The interface between the DFS tree and the file service is the file system share. A folder in the DFS tree has one or more folder targets, which in turn correspond to shares on file servers. These shares do not necessarily have to be provided by Windows servers. Almost any CIFS (Common Internet File System) based share can be included in the DFS tree, for example various NAS filers, Samba shares or CIFS-enabled NetWare services. However, this applies only to the DFS tree. Distributed File System Replication requires a Windows server, because it runs as a Microsoft service installed there.
An essential function of the DFS tree is switching between DFS folder targets. A Distributed File System folder can have one or multiple shares as its targets (referrals). These target servers may reside in different locations. The Distributed File System client can choose the best folder target based on the Sites and Services information in Active Directory. You can influence the choice by configuring preferred folder targets. This setting instructs all clients to use the same folder target. Only when the primary target is not reachable does the client switch to the secondary target.
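The selection logic described above can be sketched in a few lines of Python. This is a simplified model, not the actual DFS referral algorithm: the class `FolderTarget`, the functions `order_referrals` and `pick_target`, and the server names are hypothetical, and real site costs are reduced to "same site or not".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderTarget:
    unc_path: str            # share behind the DFS folder (hypothetical names)
    site: str                # Active Directory site of the file server
    preferred: bool = False  # configured as preferred folder target
    reachable: bool = True   # whether the client can currently reach it


def order_referrals(targets, client_site):
    """Return the reachable targets, best candidate first.

    Assumption: preferred targets come first for all clients,
    then targets in the client's own site, then everything else.
    """
    def rank(target):
        # False sorts before True, so "preferred" and "same site" win.
        return (not target.preferred, target.site != client_site)

    return sorted((t for t in targets if t.reachable), key=rank)


def pick_target(targets, client_site):
    """Return the target a client would use, or None if none is reachable."""
    ordered = order_referrals(targets, client_site)
    return ordered[0] if ordered else None


if __name__ == "__main__":
    targets = [
        FolderTarget(r"\\fs-ham01\marketing", "Hamburg", preferred=True),
        FolderTarget(r"\\fs-fra01\marketing", "Frankfurt"),
    ]
    # A Frankfurt client is still sent to the preferred Hamburg target.
    print(pick_target(targets, "Frankfurt").unc_path)
```

If the preferred target is marked as not reachable, the same call returns the Frankfurt share, which corresponds to the fallback to the secondary target.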
At this point, we need to talk about DFS-R replication.
Switching between folder targets only makes sense if the targets hold identical data. DFS-R replication can synchronize data between multiple Windows servers and is based on a highly efficient, WAN-optimized protocol.
Unfortunately, there are a few limitations on dynamic switching between folder targets. DFS-R replication depends on the available bandwidth and the rate of change in the file system. Changes are put into a queue (backlog queue) and processed sequentially. This means that replication cannot be guaranteed within a certain time. An automatic switch between folder targets can therefore cause clients to access different versions of the data. Unfortunately, there is no clear support statement from Microsoft about this. A few good notes on DFS Replication can be found here:
In one of our current projects, Microsoft refused to support a scenario with two active folder targets on two servers synchronized by DFS-R. In that project, the inactive folder target was disabled and is activated manually in case of a failure.
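Why the backlog queue rules out a guaranteed replication time can be illustrated with a small simulation. This is only a toy model under stated assumptions: the function `simulate_backlog` and all figures are made up for illustration, and the real DFS-R backlog counts files, not megabytes.

```python
def simulate_backlog(changes_mb_per_hour, link_mb_per_hour):
    """Return the backlog (in MB) at the end of each hour.

    changes_mb_per_hour: data changed on the source server in each hour
    link_mb_per_hour:    data the WAN link can replicate per hour
    """
    backlog = 0.0
    history = []
    for changed in changes_mb_per_hour:
        # New changes are queued, then the link drains what it can.
        backlog = max(0.0, backlog + changed - link_mb_per_hour)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical day: a burst of changes in the second hour.
    print(simulate_backlog([100, 500, 50, 0], link_mb_per_hour=200))
```

As long as the backlog is above zero, a client that is switched to the other folder target may see older data than on the original target.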
Data synchronization between Windows file servers can be done with a special transfer protocol: DFS-R. DFS-R replication does not require a DFS tree. It is independent of any existing Distributed File System structure. A directory replicated by DFS-R may even contain subdirectories that are linked into various DFS trees.
WAN Optimization for DFS-R
DFS-R has been highly optimized for use in WAN environments. The original TCP specification limits the TCP window size to 64 KB. On WAN connections, which usually have higher latency, the 64 KB TCP window is a problem.
Data packets need to be acknowledged before further packets can be sent. At high latency, this leads to very low throughput.
RFC 1323 introduced the “TCP Window Scale” option. This allows the TCP window to be increased to a maximum of 16 MB. With the TCP window scale option, DFS-R is no longer very sensitive to latency. You can still use up to 80% of the bandwidth at latencies of 500 ms.
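The effect of the window size can be estimated with the usual rule of thumb that a single TCP connection cannot send more than one window per round trip. The following sketch applies that rule; the function names are hypothetical, and packet loss and protocol overhead are ignored.

```python
def max_throughput_mbit(window_bytes, rtt_seconds):
    """Upper bound for one TCP connection: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def required_window_bytes(link_mbit, rtt_seconds):
    """Window needed to keep a link busy (bandwidth-delay product)."""
    return link_mbit * 1_000_000 * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.5  # 500 ms round-trip time, as in the text
    print(max_throughput_mbit(65_535, rtt))             # classic 64 KB window
    print(max_throughput_mbit(16 * 1024 * 1024, rtt))   # 16 MB scaled window
    print(required_window_bytes(100, rtt))               # window for 100 Mbit/s
```

With a 64 KB window and 500 ms latency the bound is about 1 Mbit/s, no matter how fast the line is. A 16 MB window raises the bound to roughly 268 Mbit/s.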
The DFS-R protocol includes the compression algorithm RDC (Remote Differential Compression) to save transmission bandwidth. RDC detects changes in Office documents and transmits only the changes. As soon as one of the two replication partners runs the Enterprise Edition of Windows Server, the advanced feature Cross File RDC is used automatically. If parts of the data already exist in other files on the target server, Cross File RDC uses these parts locally to create the replicated files.
Replicating ordinary Office documents with Cross File RDC can save 50% to 80% of the data to be transferred.
Example: a Word document
While you edit, Word saves changes in a temporary file. When you save your work, a new file is created. To the RDC protocol, this is a new file. Changes cannot be detected, and the file is transferred completely.
Only Cross File RDC, which is active with the Enterprise Edition of the server, can help here. Cross File RDC compares the file with already replicated data and assembles the file on the target server from the replicated parts.
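The idea behind Cross File RDC can be illustrated with a strongly simplified sketch. It is not the actual RDC algorithm: fixed-size chunks and SHA-256 signatures are assumptions made for brevity, and all function names are hypothetical. The sender transmits only chunks that do not already exist in any file on the target server.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, chosen only to keep the demo readable


def chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signature(chunk):
    """Identify a chunk by its hash."""
    return hashlib.sha256(chunk).hexdigest()


def plan_transfer(new_file, files_on_target):
    """Return (recipe, chunks_to_send) for a file that looks new to the target."""
    known = {signature(c) for f in files_on_target for c in chunks(f)}
    recipe, to_send = [], {}
    for chunk in chunks(new_file):
        sig = signature(chunk)
        recipe.append(sig)
        if sig not in known and sig not in to_send:
            to_send[sig] = chunk  # only unknown data crosses the WAN
    return recipe, to_send


def rebuild(recipe, to_send, files_on_target):
    """Assemble the file on the target from local chunks plus received ones."""
    store = {signature(c): c for f in files_on_target for c in chunks(f)}
    store.update(to_send)
    return b"".join(store[sig] for sig in recipe)


if __name__ == "__main__":
    old_version = b"AAAABBBBCCCC"   # already replicated under another name
    new_version = b"AAAAXXXXCCCC"   # saved by the editor as a new file
    recipe, to_send = plan_transfer(new_version, [old_version])
    print(len(to_send), "of", len(recipe), "chunks transferred")
    print(rebuild(recipe, to_send, [old_version]) == new_version)
```

In this toy example only one of three chunks has to be transferred, although the target has never seen a file with that name.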
DFS-R is a multi-master replication. Changes can be made on all replication partners. Only the file lock status of the files is not replicated.
If a Word document is changed on both replication partners, the file with the most recent timestamp becomes the new version (last writer wins). Therefore, it is not recommended to work with several replication partners with write access. A good example of read-only replication is the Sysvol directory of an Active Directory domain. Multiple replication partners are available, but only with read access. There can be no change conflicts, as Sysvol is read-only.
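The "last writer wins" rule can be expressed as a short sketch. The class `FileVersion`, the function `resolve_conflict` and the server names are hypothetical. The sketch only models the timestamp comparison described above, not the complete DFS-R conflict handling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileVersion:
    server: str         # replication partner where the change was made
    modified: datetime  # last-modified timestamp of that version
    content: str


def resolve_conflict(a, b):
    """Return (winner, loser): the version with the newer timestamp wins."""
    return (a, b) if a.modified >= b.modified else (b, a)


if __name__ == "__main__":
    hamburg = FileVersion("fs-ham01", datetime(2024, 5, 6, 9, 15), "edit from Hamburg")
    munich = FileVersion("fs-muc01", datetime(2024, 5, 6, 9, 20), "edit from Munich")
    winner, loser = resolve_conflict(hamburg, munich)
    print("kept:", winner.content)
    print("overridden:", loser.content)
```

The changes in the overridden version do not appear in the replicated document, which is why write access on several replication partners is risky.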
Other examples are hub-and-spoke implementations used for data backup. The spoke server is enabled for write access; the hub server receives the changes and does not even have an active folder target.
The most common case is to use a DFS tree to simplify end users' access to distributed file services. DFS-R replication is often used for data backups at remote sites that don’t have a local backup infrastructure.
Designing a DFS tree needs some preparation. It is important to know how the end users work.
A good example is a DFS tree for a company that has several sites:
Without DFS, network drives are connected to the file server at each site. The necessary UNC paths are difficult for most end users to understand and to remember. A DFS tree can combine these UNC paths into an abstract tree view and hide the underlying servers from the end user. The next design step combines related data into virtual nodes.
The marketing department of our example company operates in various locations: Hamburg, Frankfurt and Munich. A virtual DFS folder “marketing” has subfolders named after the locations. These subfolders point to the file servers at the sites, which is completely transparent to the user. That way, all data of the marketing department is neatly combined in one node.
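The marketing example can be modeled as a simple lookup table from DFS folders to folder targets. The namespace root `\\example.com\dfs` and the server names are assumptions for illustration, and the function `resolve` only mimics what a DFS client does when it follows a referral.

```python
# Hypothetical namespace root and folder targets for the example company.
NAMESPACE_ROOT = r"\\example.com\dfs"

FOLDER_TARGETS = {
    r"marketing\Hamburg": [r"\\fs-ham01\marketing"],
    r"marketing\Frankfurt": [r"\\fs-fra01\marketing"],
    r"marketing\Munich": [r"\\fs-muc01\marketing"],
}


def resolve(dfs_path):
    """Translate a path in the DFS tree into the UNC path of its first target."""
    prefix = NAMESPACE_ROOT + "\\"
    if not dfs_path.lower().startswith(prefix.lower()):
        raise ValueError("path is not inside the DFS namespace")
    relative = dfs_path[len(prefix):]
    for folder, targets in FOLDER_TARGETS.items():
        lowered = relative.lower()
        if lowered == folder.lower() or lowered.startswith(folder.lower() + "\\"):
            # Keep whatever follows the DFS folder and append it to the share.
            return targets[0] + relative[len(folder):]
    raise KeyError("no DFS folder matches " + dfs_path)


if __name__ == "__main__":
    print(resolve(r"\\example.com\dfs\marketing\Munich\plans\q3.docx"))
```

The user only ever sees the path below the namespace root. If a file server is replaced, only the entry in the table changes.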
You should be careful with the naming of the folders. Department codes are not recommended, since each reorganization in the company would require an adjustment of the DFS tree.
Let us have a look at DFS-R. In our example, the IT infrastructure of the sites is to be consolidated. The consolidation focuses on the local backup servers. DFS-R will be used to replicate the data to the central office, where it is backed up. The most important design element here is the expected rate of change. A good starting point may be an evaluation of the latest daily backups.
Rough estimate: replicating 500 GB of data with a daily change rate of 3% and an RDC saving of 50% would require a long-term average bandwidth of 0.7 Mbit/s.
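The rough estimate can be reproduced with a short calculation. The function name and the optional parameter `replication_hours` are hypothetical additions. The sketch assumes 1 GB = 10^9 bytes and an evenly loaded link.

```python
def required_bandwidth_mbit(data_gb, daily_change_rate, rdc_saving,
                            replication_hours=24.0):
    """Average bandwidth in Mbit/s needed to replicate one day of changes.

    data_gb:           size of the replicated data in GB (10**9 bytes)
    daily_change_rate: fraction of the data that changes per day, e.g. 0.03
    rdc_saving:        fraction saved by RDC, e.g. 0.5
    replication_hours: hours per day in which replication may run
    """
    changed_gb = data_gb * daily_change_rate
    transferred_bits = changed_gb * (1.0 - rdc_saving) * 1e9 * 8
    return transferred_bits / (replication_hours * 3600) / 1e6


if __name__ == "__main__":
    # Values from the text: 500 GB, 3% daily change, 50% RDC saving.
    print(round(required_bandwidth_mbit(500, 0.03, 0.5), 2))
    # Same data, but replication restricted to an 8-hour night window.
    print(round(required_bandwidth_mbit(500, 0.03, 0.5, replication_hours=8), 2))
```

The first call yields about 0.69 Mbit/s, which matches the 0.7 Mbit/s above. If replication is limited to an 8-hour window, the requirement triples to about 2.08 Mbit/s.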
The employees in the marketing department have transparent access to all data of their department, regardless of where they are located. In addition, the availability of the file service is increased. The data is replicated to the headquarters, and the DFS links there are activated in the event of a failure. Finally, you can save backup infrastructure at external locations, because the data is replicated to the headquarters using DFS-R for backup.