Question: How would you solve building a folder synchronization program?

NukDi · Member
Joined: May 18, 2019 · Messages: 5 · Programming Experience: Beginner
I am interested in setting up a folder synchronization program with C#.

What the program would do is connect to a server and, based on synchronization flags in a configuration file, sync the local folder with the remote one.

How would you recommend doing this? What technology would be the best fit if not using frameworks?

Would you make an object for each file, or just an object for the configuration file with all the flags, for example flags for skipping certain folders or extensions, or deleting certain files? How would you do the searching on the remote server so it doesn't need to be searched each time a certain flag is checked? What would be the best layout for the config file?

I am new to programming, which is why I have all these questions. I have chosen this kind of project since it will require me to learn both how to set up a connection and how to handle files. It seems like an all-around project to learn the most :)

I am thankful for any answer.
 
Is this going to be only 1 desktop machine synching to a single set of folders on the server? Or are you going to support the Dropbox model where there can be multiple desktops/devices updating the server?

If you are targeting Windows only with just one desktop and one server where all updates just come from the single desktop, follow the KISS principle. Clear the Archive bit on the Windows file after uploading. Next time around you scan for changed files, just check to see if the Archive bit is set again. That means the file has changed since you last saw it, and so you should upload it again.
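With that approach, both the check and the reset are one call each on `System.IO.File`. A minimal sketch (the file name is just for illustration, and this relies on NTFS semantics, so it is Windows-only):

```csharp
using System;
using System.IO;

class ArchiveBitScan
{
    // True if the Archive bit is set, i.e. the file changed since we last cleared it.
    public static bool HasChanged(string path) =>
        (File.GetAttributes(path) & FileAttributes.Archive) != 0;

    // Clear the Archive bit after a successful upload so the next scan skips the file.
    public static void MarkUploaded(string path) =>
        File.SetAttributes(path, File.GetAttributes(path) & ~FileAttributes.Archive);

    static void Main()
    {
        string file = "sample.txt";        // hypothetical file
        File.WriteAllText(file, "hello");  // on NTFS, writing sets the Archive bit

        if (HasChanged(file))
        {
            // ... upload the file here ...
            MarkUploaded(file);
        }
    }
}
```

Since Windows sets the Archive bit on every write, this check costs one attribute read per file instead of hashing file contents.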

For filtering files, I suggest using regular expressions to determine the filename patterns. For the folder filters, just serialize a tree structure of the folders that should be scanned. The alternative to support Unix style glob patterns would look simpler on the surface, but can potentially be more difficult to implement. It's still doable -- you'll just have to do extra parsing and conversions to convert Unix glob patterns to .NET Framework regular expressions.
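The glob-to-regex conversion mentioned above is only a few lines if you restrict yourself to `*` and `?`; anything fancier (character classes, `**`) needs more parsing. A sketch:

```csharp
using System;
using System.Text.RegularExpressions;

class GlobFilter
{
    // Convert a simple glob pattern (* and ? only) to an anchored .NET regex.
    public static Regex GlobToRegex(string glob) =>
        new Regex("^" + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
                  RegexOptions.IgnoreCase);

    static void Main()
    {
        var filter = GlobToRegex("*.txt");
        Console.WriteLine(filter.IsMatch("notes.txt"));   // True
        Console.WriteLine(filter.IsMatch("image.png"));   // False
    }
}
```

`Regex.Escape` neutralizes every other regex metacharacter first, so only the two wildcards you deliberately translate have special meaning.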

Why do you need a config file that the user can play with? If you are writing a Windows application, you can write out your settings any which way you want. It need not be human readable.
 
Hello Skydiver, thank you for your answer.

It is going to be the contrary: multiple desktops getting their updates from one server, hence the need for a configuration file, so it can easily be downloaded from the server with the latest flag settings for what to download and what not. The config file doesn't have to be human readable; it would work with some kind of XML hierarchy.

What I am trying to do is:

1. Create an interface with the sole function of letting the user create those configuration files with all the flags
2. Have a console application that can be installed on multiple desktops and will update the local folders whenever it is run
3. The configuration file will also be on the server, and the console application will first download it and then run the sync based on the flags and settings inside the configuration file
4. I will probably use Task Scheduler to run the console application at startup
At startup these would probably be the steps:
- exe is run from Task Scheduler
- the application sets up a connection with the server and downloads the latest config file if any changes have been made
- parses the config file with all flags
- runs the sync
- closes
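If the config file from step 3 ends up as XML, `XmlSerializer` can both write it from the admin tool and parse it in the console application. A sketch; the class shape and flag names here are hypothetical, not a recommendation of a final schema:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape for the synchronization flags; names are illustrative only.
public class SyncConfig
{
    public string ServerUrl { get; set; }
    public string[] SkipFolders { get; set; }
    public string[] SkipExtensions { get; set; }
    public bool DeleteLocalExtras { get; set; }
}

class ConfigDemo
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(SyncConfig));

        var config = new SyncConfig
        {
            ServerUrl = "https://example.com/sync",   // hypothetical server
            SkipFolders = new[] { "temp", "cache" },
            SkipExtensions = new[] { ".tmp", ".bak" },
            DeleteLocalExtras = false
        };

        // The admin tool would write this file and upload it to the server.
        using (var writer = new StreamWriter("sync.config.xml"))
            serializer.Serialize(writer, config);

        // The console app downloads the file, then parses it like this.
        using (var reader = new StreamReader("sync.config.xml"))
        {
            var loaded = (SyncConfig)serializer.Deserialize(reader);
            Console.WriteLine(loaded.SkipFolders.Length); // 2
        }
    }
}
```

Sharing the same `SyncConfig` class between the admin GUI and the console app keeps the two from ever disagreeing about the file layout.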

Any suggestions on what technology to use to achieve this? I am completely new to C#, so I would be very thankful for all the terminology and, if possible, links to it.

Thank you once again for answering my question.
 
First of all, let me remind you of the risks involved in making such a program. The first risk is computer security: if a file gets infected or corrupted on one machine, you'll very quickly propagate that infected or corrupted file to the other machines. The second risk is potential copyright issues. The act of copying from one machine to another is considered distribution under the DMCA. Although we now have a lot of FOSS software, and some MP3s and MPEGs may have liberal copyrights on them, not all of them do.

Personally, I would use the right tool for the job. Use a Windows service instead of fiddling with the Task Scheduler. A Windows service is essentially a console program: it can be configured to run at startup, and it can run with a specific identity or as the system. Most people have learned over the years to write their Windows service so that it can be run either as a console program or as a service. They usually run it as a console app to do debugging, diagnostics, or configuration, but otherwise run it as a service. The downside of a Windows service is that it can't have any GUI, so you'll have to write yet another app to either talk to the service or to update its configuration. Again, this leads back to why people have learned to write their services to also run in a console.

If you are wondering why Google Drive and Dropbox chose not to be a Windows service, but rather an application: they had an additional requirement. To penetrate the market better, they needed to be installable by non-administrators, since UAC tended to scare casual users, and on most business machines people are not admins and wouldn't be able to install something that needs to run as a Windows service. Add to that the fact that casual users won't want to deal with a console UI or editing config files, and hence you get these apps that take advantage of the Windows ClickOnce infrastructure.

I recommend making sure that you always treat dates and times in UTC if you need to compare file timestamps to determine which file is newer. Never use local time. What if the machines are in different time zones?
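In .NET that means using the `*Utc` variants of the file-time APIs. A small sketch of a newer-than check, where `serverUtc` stands in for whatever timestamp your server reports:

```csharp
using System;
using System.IO;

class TimestampCompare
{
    // Decide whether the server copy is newer, comparing in UTC only.
    public static bool ServerIsNewer(DateTime serverUtc, string localPath)
    {
        if (!File.Exists(localPath)) return true;   // nothing local yet: download
        // GetLastWriteTimeUtc sidesteps time-zone and DST ambiguity entirely.
        return serverUtc > File.GetLastWriteTimeUtc(localPath);
    }

    static void Main()
    {
        File.WriteAllText("local.dat", "v1");                  // hypothetical local file
        DateTime serverStamp = DateTime.UtcNow.AddHours(1);    // pretend server copy is newer
        Console.WriteLine(ServerIsNewer(serverStamp, "local.dat")); // True
    }
}
```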

I recommend thinking long and hard about how you will deal with machines whose clocks drift. Will you run your own NTP server on your server, or pick one of the globally available public ones to force the various machines' clocks to synchronize? What if the user prefers another NTP server and that server is out of sync with your NTP server of choice?

Or perhaps synchronized machine clocks may not be the approach. Perhaps using Lamport timestamps would be better?
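For reference, a Lamport clock is just a counter with two rules: tick on every local event, and on receiving a remote timestamp take the maximum of the two clocks plus one. A minimal sketch:

```csharp
using System;

// Minimal Lamport clock: orders events across machines without synchronized
// wall clocks. Each machine keeps one instance.
class LamportClock
{
    public long Time { get; private set; }

    // Local event (e.g. a file was modified on this machine).
    public long Tick() => ++Time;

    // A message (e.g. a sync record) arrived carrying another machine's time.
    public long Receive(long remoteTime)
    {
        Time = Math.Max(Time, remoteTime) + 1;
        return Time;
    }
}

class LamportDemo
{
    static void Main()
    {
        var a = new LamportClock();
        var b = new LamportClock();

        a.Tick();                  // a = 1: machine A changes a file
        b.Receive(a.Time);         // b = max(0, 1) + 1 = 2
        Console.WriteLine(b.Time); // 2
    }
}
```

The resulting numbers give you a causal order ("B saw A's change before making its own"), which is often what a sync conflict rule actually needs, rather than accurate wall-clock time.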

I recommend thinking long and hard about file conflicts. What happens if two different machines change the same file? What happens if they change the same file at exactly the same time? What happens if the clocks drift?

For the file downloading/uploading, consider that Windows has BITS built in. Although it may add complexity to your project to use it, it could also save your users a lot of bandwidth and battery power since BITS can be very efficient.

When synchronizing files, consider if you will be synchronizing only the main data stream of the file, or if you will also synchronize the other data streams.
 
No, the machines won't be able to upload to the server; they will only download from it. The files there will be put in place manually by the admin using another service. So the purpose of this program is only to sync the local machines by downloading from the server.

A Windows service seems to fit better, because the console application run on the machines won't have any GUI; it will only run in the background, using the config file flags for what folders to download, what files to skip, and how to search the server. The config file is created by the admin using a separate GUI, and it is also uploaded to the server by the admin. So the only thing the console application does is download that config file first, then parse it to see what folders to download from the server locally, by checking which files are new or not.

How would you recommend doing the searching so it is only required once per run? And what would be the proper design of the classes to handle the flags (configs)?
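One way to make the search a single pass per run: enumerate the tree once into an in-memory list, then evaluate every flag as a predicate over that list, so no flag check touches the disk (or the server) again. A sketch with hypothetical folder names; on the server side you would build the same kind of index from a directory listing or a manifest file. `Path.GetRelativePath` assumes .NET Core 2.0 or later:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class FileIndex
{
    // Walk the tree exactly once and cache relative paths; all subsequent
    // flag checks run against this in-memory list.
    public static List<string> Build(string root) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                 .Select(p => Path.GetRelativePath(root, p))
                 .ToList();

    static void Main()
    {
        Directory.CreateDirectory("demo/sub");          // hypothetical layout
        File.WriteAllText("demo/a.txt", "");
        File.WriteAllText("demo/sub/b.tmp", "");

        var index = Build("demo");

        // Each flag is just a predicate over the cached index.
        var toSync = index.Where(p => !p.EndsWith(".tmp")).ToList();
        Console.WriteLine(toSync.Count); // 1
    }
}
```

For the class design, this pushes the flags toward being small filter objects (or plain predicates) that a single `SyncEngine`-style class applies to the index in one pass.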
 
So it's one way: from server to local machine. What happens if a local machine updates a file after it has been downloaded? Will the file get updated from the server again within some timespan, or will it remain whatever the local version is until the next logon or reboot?

Part of me feels like what you are trying to set up is some kind of learning lab where you want the machines to be in some set state at the beginning of a session. The typical approach taken in these cases is to use VMs. The VMs are configured to be essentially read-only: yes, the user can write locally, but when the VM is next rebooted, it's back to its initial state.
 
The files will not be changed, but if for some reason they are changed, they are overwritten at the next sync at reboot. No, it's not a learning lab, just a sync with the ability to set criteria for what is downloaded from a set server.

Look at this one for reference of what I am trying to do on my own: Synchronicity - A Folder Synchronizing Application
 
You do realize that, besides rsync or robocopy, you could simply use git or mercurial with one of their web front ends to sync to the server, right?

Are you trying to re-invent the wheel because of the challenge, or is this some kind of final course project?
 
There will be a new position in C# at my job in 3 months, and this is the kind of thing they work with, so a colleague in that group recommended this kind of project as a way to learn the basics I would need if I want to work there.
 