Couldn't scan folders with whitespace and special characters like $

fihovi

Member
Joined
Nov 26, 2019
Messages
7
Programming Experience
Beginner
Hello,

I want to scan a drive in my app, but I can't scan $Recycle.Bin or folders with whitespace in their names.

I can't find any resolution for these errors. I also tried to find NuGet packages that would resolve these issues for me.
I'm using System.IO.

Program.cs:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace FileHandler
{
    class Program
    {
        private static void Main(string[] args)
        {
            GetAllFilesFromFolder(@"E:\", true);
        }

        private static List<string> GetAllFilesFromFolder(string root, bool searchSubfolders)
        {
            Queue<string> folders = new Queue<string>();
            List<string> folderCount = new List<string>();
            List<string> files = new List<string>();
            folders.Enqueue(root);
            while (folders.Count != 0){
                string currentFolder = folders.Dequeue();
                try {
                    string[] filesInCurrent = Directory.GetFiles(currentFolder, "*.*", System.IO.SearchOption.TopDirectoryOnly);
                    files.AddRange(filesInCurrent);
                }
                catch
                {
                    //Console.WriteLine("Error: " + currentFolder);
                    // Do Nothing
                }
                try{
                    if (searchSubfolders){
                        string[] foldersInCurrent = Directory.GetDirectories(currentFolder, "*.*", System.IO.SearchOption.TopDirectoryOnly);
                        // Record the subfolders once, then queue each of them for scanning.
                        folderCount.AddRange(foldersInCurrent);
                        foreach (string _current in foldersInCurrent){
                            folders.Enqueue(_current);
                        }
                    }
                }
                catch{
                    // Report the folder that failed; the exception details are discarded here.
                    Console.WriteLine("Error: " + currentFolder);
                }
            }

            List<string> distinct = folderCount.Distinct().ToList(); // Remove duplicates from scan
            Console.WriteLine("Number of folders AFTER: " + distinct.Count);
            Console.WriteLine("Number of files is: " + files.Count);
       
            Console.ReadLine();
            return files;
        }
    }
}
Thanks a lot
 

Skydiver

Well-known member
Joined
Apr 6, 2019
Messages
644
Location
Virginia Beach, VA
Programming Experience
10+
Directory.GetDirectories() and Directory.GetFiles() return files and folders with spaces in their names. GetDirectories() also finds "C:\$Recycle.Bin":
C#:
using System;
using System.IO;

public class Test
{
    static void Main()
    {
        foreach(var f in Directory.GetDirectories(@"C:\"))
        {
            Console.WriteLine(f);
        }
    }
}
Capture.png
 

Sheepings

Senior Programmer
Joined
Sep 5, 2018
Messages
652
Location
UK
Programming Experience
10+
One thing you seem to be missing is that the Recycle Bin you see in Explorer is a virtual shell folder rather than an ordinary one, and there is a reason it earned its name as a bin. What it shows is backed by a real, hidden system directory on each drive, $Recycle.Bin, which holds one subfolder per user. So when you delete a file, it's never really removed from your PC. Instead, it's moved into your subfolder and renamed to $R plus a random string, and a companion $I file records its original path, size and deletion time, which is what lets Windows show and restore it under its original name.

A file that has been "deleted" this way disappears from its original location but stays visible through the bin until we clear it out. Emptying the bin deletes those files for real: the file system marks their space as free, and the data stays on the disk until it is eventually overwritten. That is why some data can still be recovered from a hard disk even after something has been deleted. Anyway...

This is what the files look like as I iterate over them in the recycle bin, after they've been moved there. The long S-1-5-21-... folder name is the security identifier (SID) of the user who deleted them:
C:\$Recycle.Bin\S-1-5-21-3775181533-2454628510-745607798-1001\$RYWMCQR.xml
C:\$Recycle.Bin\S-1-5-21-3775181533-2454628510-745607798-1001\$RZWSJHT.txt
If you look up the directory info on the location of the recycle bin, you will find out more about it once you step into the debugger.

Screenshot_48.jpg

Take note that it's a hidden system directory, and reading other users' subfolders requires elevated permissions. Lastly, whatever you plan on doing with the files in the recycle bin will likely be very hard to do. Also, look up KNOWNFOLDERID on MSDN, as it clarifies some of what I've said above.

Edit
Fixed a typo
 

fihovi

Thank you for your insight. Why can't my code do the same? Instead, it does the following.
This is from the catch block. All of these folders are directly under E:\, not under "E:\[Alt+0160]\" (tested with my code).
wtf.png

Then the same with more folders..
wtf2.png

For instance, as you'd guess, this is in the folder "E:\[Alt+0160]\", but not in "E:\[Alt+0160]\[Alt+0160]\" (sorry for the mess with the whitespace).

When I recreate it in your code, it is still a problem:
E:\test\ \tt (No-BreakSpace is folder itself)
I get output: "E:\test\ \ \tt"
Thank you for your insight as well. I'm aware of how the recycle bin works. I was confused by the non-breaking space and thought my code couldn't handle any special characters.
So my code doesn't work at all, and I couldn't find anything suspicious that would explain why an extra non-breaking space segment is added between E:\ and the folder itself on the drive.
 

Skydiver

This is from the catch block
Say what? If that's in the catch block, then that means an exception is being thrown. What is the exception? Perhaps the message text in the exception will tell you what is failing.
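For example, here is a minimal sketch of surfacing the exception instead of swallowing it. The names ScanDiagnostics and TryGetDirectories are made up for illustration; they are not part of the original code or of System.IO. It only assumes that the listing failure you care about is an IOException (which includes DirectoryNotFoundException) or an UnauthorizedAccessException.

```csharp
using System;
using System.IO;

static class ScanDiagnostics
{
    // Hypothetical helper: returns the subfolders of `folder`, or an empty array
    // plus the exception that explains why the listing failed.
    public static string[] TryGetDirectories(string folder, out Exception error)
    {
        error = null;
        try
        {
            return Directory.GetDirectories(folder, "*", SearchOption.TopDirectoryOnly);
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            error = ex; // keep the exception so the caller can log its type and message
            return new string[0];
        }
    }

    static void Main(string[] args)
    {
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        string[] subfolders = TryGetDirectories(folder, out Exception error);
        if (error != null)
            Console.WriteLine(folder + "|" + error); // path, then exception type, message and stack trace
        else
            Console.WriteLine(subfolders.Length + " subfolders found in " + folder);
    }
}
```

Printing the whole exception object rather than just the folder name shows the exception type and the exact path the runtime tried to open, which is usually enough to see where the path went wrong.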
 

fihovi

Say what? If that's in the catch block, then that means an exception is being thrown. What is the exception? Perhaps the message text in the exception will tell you what is failing.
E:\test\ \ \tt|System.IO.DirectoryNotFoundException: Could not find a part of the path 'E:\test\ \ \tt'.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileSystemEnumerableIterator`1.CommonInit()
at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)
at System.IO.Directory.GetDirectories(String path, String searchPattern, SearchOption searchOption)
at FileHandler.Program.GetAllFilesFromFolder(String root, Boolean searchSubfolders) in D:\Projects\FileScan\FileHandler\Program.cs:line 34
 

Skydiver

Looks like the .NET Framework has a bug when dealing with the non-breaking space... It's supposedly fixed in .NET Core, but I'm guessing that you are using .NET Framework 4.x.

See the comments below: Get-ChildItem and non-breaking space
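If it is unclear which runtime a project actually runs on, a small check like the following can confirm it. This is only a sketch: RuntimeCheck and IsNetFramework are hypothetical names, and it assumes RuntimeInformation is available, which is the case on .NET Core and on .NET Framework 4.7.1 and later.

```csharp
using System;
using System.Runtime.InteropServices;

static class RuntimeCheck
{
    // True when the description string names the classic .NET Framework.
    public static bool IsNetFramework(string frameworkDescription)
    {
        return frameworkDescription.StartsWith(".NET Framework", StringComparison.Ordinal);
    }

    static void Main()
    {
        // FrameworkDescription is a human-readable name and version of the running runtime.
        string description = RuntimeInformation.FrameworkDescription;
        Console.WriteLine("Running on: " + description);
        if (IsNetFramework(description))
            Console.WriteLine("Classic .NET Framework: the non-breaking space problem may apply.");
    }
}
```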
 

fihovi

You're right. I tried .NET Framework 4.7.2 and it did not work:
OUTPUT: E:\test\ \ \tt
When I put the same code into .NET Core, I got the following output:
E:\test\ \tt --> which is the one I needed.

Thanks a lot!
 

Sheepings

Btw, it must be something in your code that isn't right. I tried it on 4.7.2 using code I wrote myself and experienced no problems. I tested further on 4.8 and also experienced no problems.
 

fihovi

I couldn't identify the problem. My code is all up there, nothing more or less. If you could post your code here so I can compare, that would be nice. When I copied mine into .NET Core with no edits, it worked.

Well, the only change I made to the .NET Framework project was the ClickOnce security setting (or so), to bypass problems and scan folders and files with an administrative account. But that makes no sense to me.
 

Sheepings

to bypass problems and scan folders and files with an administrative account.
Administrative rights are required for that, as I already said above, so that's likely why it wasn't working. The bin is owned by the system.

Can you first try checking whether it runs on 4.7.2 since that change?

The only difference from your code is that I am using the Shell API from system32 with an interface instead.
 

fihovi

E:\ \ \test|System.IO.DirectoryNotFoundException: Could not find a part of the path 'E:\ \ \test'.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileSystemEnumerableIterator`1.CommonInit()
at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)
at System.IO.Directory.GetDirectories(String path, String searchPattern, SearchOption searchOption)
at frameworktest.Program.GetAllFilesFromFolder(String root, Boolean searchSubfolders) in C:\Users\krogi\source\repos\frameworktest\frameworktest\Program.cs:line 55
instead of E:\ \test (\ \ => \[Alt+0160]\)

Error line
Program.cs:
                        string[] foldersInCurrent = Directory.GetDirectories(currentFolder, "*.*", SearchOption.TopDirectoryOnly);
Block of code is available in the main (first) post.

So it's still not working. This is a plain new project: I copied the WORKING code from .NET Core to .NET Framework 4.7.2 and commented out Npgsql (the PostgreSQL driver for C#), because I neither need nor use a database for scanning.
 

Sheepings

Well, that's weird. I will need to run your code and debug it myself when I find a little more time later tonight. I'd love to do it now, but I'm currently under pressure to finish something I was meant to finish for work last week and completely forgot about. Anyway, I will update you once I find my feet and delve into this. At a glance, it does actually look like you have a problem with how the paths are iterated over, but I will check this for you later. Thanks for trying it. ;)
 

Skydiver

It's not just the code. The environment with the directory whose only character is a non-breaking space is needed to replicate the problem being seen by the OP.
 

Skydiver

I reproduced the problem with .NET Framework 4.8:
C#:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;

class Program
{
    const string RootTestDir = "C:\\TestingNBSPDir";
    static string ChildTestDir = Path.Combine(RootTestDir, "\x00A0\\Leaf");

    static bool EnsureDirectoryExists(string path)
    {
        try
        {
            Console.WriteLine($"Ensuring directory exists: {path}");
            if (!Directory.Exists(path))
                Directory.CreateDirectory(path);
            return true;
        }

        catch (Exception ex)
        {
            Console.Error.WriteLine($"Couldn't access or create {path}");
            Console.Error.WriteLine(ex);
        }
        return false;
    }

    static void RecursivelyListDirectories(string root)
    {
        Console.WriteLine($"Enumerating directories starting at: {root}");
        var queue = new Queue<string>();
        queue.Enqueue(root);
        while (queue.Count != 0)
        {
            var current = queue.Dequeue();
            Console.WriteLine($"Working on '{current}'");

            try
            {
                foreach (var dir in Directory.EnumerateDirectories(current, "*", SearchOption.TopDirectoryOnly))
                    queue.Enqueue(dir);
            }

            catch (Exception ex)
            {
                Console.Error.WriteLine(ex);
            }
        }
    }

    static void Main()
    {
        if (EnsureDirectoryExists(ChildTestDir))
            RecursivelyListDirectories(RootTestDir);
    }
}
Results in:
Capture.PNG.png


And .NET Core 3.0 works fine:
Capture2.PNG.png
 

Skydiver

Even though I used EnumerateDirectories() above, GetDirectories() also fails/works the same way.
 

Skydiver

Also another data point: Letting .NET Framework 4.8 do the recursion itself works correctly, but that provides cold comfort when you actually want to walk the directories yourself.
C#:
static void RecursivelyListDirectories(string root)
{
    Console.WriteLine($"Enumerating directories starting at: {root}");
    try
    {
        foreach (var dir in Directory.GetDirectories(root, "*", SearchOption.AllDirectories))
            Console.WriteLine($"Working on '{dir}'");
    }

    catch (Exception ex)
    {
        Console.Error.WriteLine(ex);
    }
}
 

Sheepings

There aren't really a lot of options. If I have read your topic properly, you mentioned ClickOnce. You won't be able to use elevated rights with ClickOnce, so disable it: open your project properties, go to the Security tab, and disable the ClickOnce security settings. Then add an app.manifest file and edit it so that the asInvoker entry is replaced with: <requestedExecutionLevel level="requireAdministrator" uiAccess="false" />. After you decide how to handle the whitespace issue, you may still get access denied errors for some of the root files/folders in C:\ or any other drives you search, but we can deal with that next in a separate topic if you need to.

The only options that I can see are as follows. The first is preventing paths that use 0160 (the best option)! Can you prevent the folders from being named like that?

Most likely not, and I wouldn't advise renaming any files or folders unless your application is responsible for creating them (assuming your app is some kind of backup program). You could also record the paths which contain 0160 and correct them in your application without touching the original files. And if your app is meant to back up every file, you could simply replace 0160 or remove it by calling Trim(), and then mark that path as needing to be renamed with 0160 should you ever need to restore it under its original name.

You could use a dictionary or an external file to keep track of file/folder paths both as you found them (with 0160) and in their cleaned (non-0160) version, assuming your application creates a backup or records all file/folder paths. If you ever needed to restore a file/directory, you would simply check your records to see whether the file/folder was formerly named using 0160, so you could recreate the correct path on the OS. Does that make sense?
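The record-keeping idea above could be sketched roughly like this. Everything here is an assumption for illustration: the PathLedger class, its method names, and the rule of replacing 0160 with an underscore. The underscore is swapped in for Trim() because trimming a folder whose whole name is 0160 would leave an empty name, and the sketch does not handle the case where the cleaned name collides with an existing folder.

```csharp
using System;
using System.Collections.Generic;

static class PathLedger
{
    const char Nbsp = '\u00A0'; // the Alt+0160 character

    // Hypothetical naming rule: swap every non-breaking space for an underscore.
    public static string Sanitize(string path)
    {
        return path.Replace(Nbsp, '_');
    }

    // Maps each cleaned path back to the original path that contained 0160.
    public static Dictionary<string, string> BuildRestoreMap(IEnumerable<string> originalPaths)
    {
        var restoreMap = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string original in originalPaths)
        {
            if (original.IndexOf(Nbsp) >= 0)
                restoreMap[Sanitize(original)] = original;
        }
        return restoreMap;
    }

    static void Main()
    {
        var scanned = new[] { "E:\\test\\\u00A0\\tt", "E:\\test\\plain" };
        foreach (KeyValuePair<string, string> entry in BuildRestoreMap(scanned))
            Console.WriteLine(entry.Key + " was originally " + entry.Value.Replace("\u00A0", "[Alt+0160]"));
    }
}
```

Only paths that actually contain 0160 end up in the map, so a restore step would look up the cleaned path first and fall back to the path as stored when there is no entry.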

An alternative to renaming is skipping those files and folders. You could adapt your foreach statement with LINQ to check whether the file or folder path contains 0160. The line below skips every path that contains 0160:
C#:
foreach (string _current in foldersInCurrent.Where(s => !s.Contains("\u00A0")))
Simply use .Where(s => !s.Contains("\u00A0")) wherever you want to exclude paths containing 0160.
I am sure you're aware you can also call Trim() on the path to remove any such leading or trailing whitespace. You should also consider constructing your foreach with DirectoryInfo/FileInfo to try to avoid issues like this. Use:
foreach (DirectoryInfo _current etc... this will mean rewriting some of your code.
Using a plain string is not a good use of object-oriented code. You can get better functionality from DirectoryInfo/FileInfo, and work on your paths from there. Hope these suggestions help.
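As a rough illustration of the DirectoryInfo suggestion, here is a queue-based walk that keeps DirectoryInfo objects instead of strings. DirectoryInfoWalker and ListAllDirectories are hypothetical names, and it has not been checked here whether this version behaves any differently from the string version for a folder named with 0160 on .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DirectoryInfoWalker
{
    // Breadth-first walk that queues DirectoryInfo objects rather than path strings.
    public static List<DirectoryInfo> ListAllDirectories(DirectoryInfo root)
    {
        var found = new List<DirectoryInfo>();
        var pending = new Queue<DirectoryInfo>();
        pending.Enqueue(root);
        while (pending.Count != 0)
        {
            DirectoryInfo current = pending.Dequeue();
            try
            {
                foreach (DirectoryInfo child in current.EnumerateDirectories("*", SearchOption.TopDirectoryOnly))
                {
                    found.Add(child);
                    pending.Enqueue(child);
                }
            }
            catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
            {
                // Report the folder that could not be listed and carry on with the rest.
                Console.Error.WriteLine(current.FullName + "|" + ex.Message);
            }
        }
        return found;
    }

    static void Main(string[] args)
    {
        var root = new DirectoryInfo(args.Length > 0 ? args[0] : Directory.GetCurrentDirectory());
        foreach (DirectoryInfo dir in ListAllDirectories(root))
            Console.WriteLine(dir.FullName);
    }
}
```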
 

Sheepings

Just to add, it is 4 AM and I am a little overworked after a long day, so I hope that made sense. If there is something that isn't clear, I will follow up with you tomorrow on any questions you might have (if I can wake up).

Good night.
 

Skydiver

As a quick reminder, on .NET Framework 4.6.1 and higher the non-breaking space (160) is only an issue when it is the trailing character. If it (or another whitespace character) is not the trailing character, there is no issue. Another way to bypass the issue is to move to .NET Core 3.0.
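To spot the affected paths before handing them to Directory.GetDirectories(), a check along these lines could be used. It is only a sketch: TrailingWhitespaceCheck and EndsWithWhitespace are made-up names, and it assumes that "trailing character" means the last character of the path once any trailing separators are ignored.

```csharp
using System;

static class TrailingWhitespaceCheck
{
    // True when the last character of the path, ignoring trailing separators,
    // is whitespace such as the non-breaking space (U+00A0, Alt+0160).
    public static bool EndsWithWhitespace(string path)
    {
        string trimmed = path.TrimEnd('\\', '/');
        return trimmed.Length > 0 && char.IsWhiteSpace(trimmed[trimmed.Length - 1]);
    }

    static void Main()
    {
        string[] samples =
        {
            "C:\\TestingNBSPDir\\\u00A0",       // folder named with 0160 only
            "C:\\TestingNBSPDir\\\u00A0\\Leaf", // 0160 in the middle of the path
            "C:\\TestingNBSPDir\\a\u00A0b"      // 0160 inside a folder name
        };
        foreach (string sample in samples)
            Console.WriteLine(EndsWithWhitespace(sample) + "  " + sample.Replace("\u00A0", "[Alt+0160]"));
    }
}
```

Paths flagged by the check could then be logged, skipped, or handled separately, depending on what the scan is for.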
 