Resolved HttpRequest download PDF files

Gekidow (Member, joined Apr 20, 2023; 12 messages; programming experience: Beginner)
Hello, I have a problem. My program reads an ODT file and downloads the links inside it; these links point to PDF files available on an intranet. Instead of 128 PDF files, I get 128 files whose names match what I expect, but which have no extension and are all 18 KB in size.

Why do I get files without an extension, as in the screenshot, instead of PDF files? Is it a redirect problem?

I also tried the DownloadFile method and got the same result. System.Diagnostics.Process.Start(link); works, but then I can't rename the files, because the program only opens the links and the browser does the downloading.

PS: I'm on .NET 3.5 and Visual Studio 2010.

résultatfichiers.png


Here is my code:
C#:
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

namespace PrepareDocForExternalUse
{
    class Program
    {
        static void Main(string[] args)
        {
            // Prompts the user for the absolute path to an ODT file
            Console.WriteLine("Please enter the absolute path of an ODT file:");
            string odtFilePath = Console.ReadLine();

            // Read the contents of the ODT file
            byte[] content = File.ReadAllBytes(odtFilePath);
            MemoryStream ms = new MemoryStream();
            ms.Write(content, 0, content.Length);
            ZipFile zf = new ZipFile(ms);
            zf.UseZip64 = UseZip64.Off;
            zf.IsStreamOwner = false;
            ZipEntry entry = zf.GetEntry("content.xml");
            Stream s = zf.GetInputStream(entry);

            // Convert stream to string
            StreamReader reader = new StreamReader(s);
            string contentXml = reader.ReadToEnd();

            // Search for all links that start with "applnet.test.fr"
            string pattern = @"http://applnet\.test\.fr/GetContenu/Download\.aspx\?p1=.*?;p2=.*?;p5=.*?;p6=NOPUB";
            Regex regex = new Regex(pattern);
            MatchCollection matches = regex.Matches(contentXml);

            Directory.CreateDirectory(Path.GetDirectoryName(odtFilePath));

            // Process each link found
            foreach (Match match in matches)
            {
                string link = match.Value;
                string[] parts = link.Split(new string[] { "aspx?" }, StringSplitOptions.None);
                string queryString = parts[parts.Length - 1];


                // Download the corresponding intranet document
                string folderName = Path.GetFileNameWithoutExtension(odtFilePath);
                string subFolderName = "PJ - " + folderName;
                string fileName = queryString;
                string localFilePath = "C:/PiecesJointes/" + fileName;
                string onlineFilePath = "https://com.test.fr/files/test/test/" + queryString;

                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
                request.AllowAutoRedirect = false;
                request.Method = "GET";
                request.ContentType = "application/pdf";
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                Stream stream = response.GetResponseStream();
                byte[] buffer = new byte[4096];
                int bytesRead = 0;
                FileStream fileStream = new FileStream(localFilePath, FileMode.Create);

                do
                {
                    bytesRead = stream.Read(buffer, 0, buffer.Length);
                    fileStream.Write(buffer, 0, bytesRead);
                } while (bytesRead > 0);

                fileStream.Close();
                response.Close();


                // Replace the link with the path of the downloaded document
                string newLink = localFilePath.Replace("\\", "/");
                contentXml = contentXml.Replace(link, onlineFilePath);
            }

            // Updates the content.xml in the initial ZIP file
            byte[] contentXmlBytes = System.Text.Encoding.UTF8.GetBytes(contentXml);
            ms = new MemoryStream();
            zf.BeginUpdate();

            // Add updated content to ZIP file
            ZipOutputStream zos = new ZipOutputStream(ms);
            zos.UseZip64 = UseZip64.Off;
            zos.IsStreamOwner = false;

            // Add entry for content.xml file
            zos.PutNextEntry(new ZipEntry(entry.Name));
            StreamUtils.Copy(new MemoryStream(contentXmlBytes), zos, new byte[4096]);

            // Processes each entry from the original ODT file
            foreach (ZipEntry origEntry in zf)
            {
                // Ignore the entry for the content.xml file because it has already been added
                if (origEntry.Name == entry.Name) continue;

                // Add entry to new ZIP file
                zos.PutNextEntry(new ZipEntry(origEntry.Name));
                StreamUtils.Copy(zf.GetInputStream(origEntry), zos, new byte[4096]);
            }

            zos.Close();

            // Finish updating the ZIP file
            zf.CommitUpdate();
            zf.Close();


            // Renames and saves the updated ODT file
            Guid g = Guid.NewGuid();
            string updatedFilePath = Path.Combine(Path.GetDirectoryName(odtFilePath), g + "_" + Path.GetFileName(odtFilePath));
            using (FileStream stream = new FileStream(updatedFilePath, FileMode.Create))
            {
                ms.Position = 0;
                ms.WriteTo(stream);
            }
            Console.WriteLine("The ODT file has been successfully updated and saved as: " + updatedFilePath);
            Console.ReadLine();
        }
    }
}
 
I tried this. The program downloads one file (again 18 KB, this time with the authentication code added), does not download the others, and then the request times out.

C#:
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

namespace PrepareDocForExternalUse
{
    class Program
    {

        static void Main(string[] args)
        {
            // Prompts the user for the absolute path to an ODT file
            Console.WriteLine("Please enter the absolute path of an ODT file:");
            string odtFilePath = Console.ReadLine();


            // Read the contents of the ODT file
            byte[] content = File.ReadAllBytes(odtFilePath);
            MemoryStream ms = new MemoryStream();
            ms.Write(content, 0, content.Length);
            ZipFile zf = new ZipFile(ms);
            zf.UseZip64 = UseZip64.Off;
            zf.IsStreamOwner = false;
            ZipEntry entry = zf.GetEntry("content.xml");
            Stream s = zf.GetInputStream(entry);

            // Convert stream to string
            StreamReader reader = new StreamReader(s);
            string contentXml = reader.ReadToEnd();

            // Search for all links that start with "applnet.fiducial.fr"
            string pattern = @"http://applnet\.fiducial\.fr/GetContenu/Download\.aspx\?p1=.*?;p2=.*?;p5=.*?;p6=NOPUB";
            Regex regex = new Regex(pattern);
            MatchCollection matches = regex.Matches(contentXml);


            Directory.CreateDirectory(Path.GetDirectoryName(odtFilePath));

            // Process each link found
            foreach (Match match in matches)
            {
                string link = match.Value;
                string[] parts = link.Split(new string[] { "aspx?" }, StringSplitOptions.None);
                string queryString = parts[parts.Length - 1];



                // Download the corresponding intranet document
                string folderName = Path.GetFileNameWithoutExtension(odtFilePath);
                string subFolderName = "PJ - " + folderName;
                string fileName = queryString;
                string localFilePath = "C:/PiecesJointes/" + fileName;
                string onlineFilePath = "https://com.fiducial.fr/files/fiducial/banque/" + queryString;

                string uriString = "http://applnet.fiducial.fr/GetContenu/PageLogin.aspx";


                string username = "username";
                string password = "password";

                CookieContainer cookieContainer = new CookieContainer();
                string postData = "username=" + username + "&password=" + password;
                byte[] postDataBytes = Encoding.UTF8.GetBytes(postData);

                HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(uriString);
                httpWebRequest.Method = "POST";
                httpWebRequest.ContentType = "application/x-www-form-urlencoded";
                httpWebRequest.ContentLength = postDataBytes.Length;
                httpWebRequest.CookieContainer = cookieContainer;

                Stream requestStream = httpWebRequest.GetRequestStream();
                requestStream.Write(postDataBytes, 0, postDataBytes.Length);
                requestStream.Close();

                HttpWebResponse httpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse();

                CookieCollection cookies = httpWebResponse.Cookies;


                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
                request.Timeout = 30000;
                request.CookieContainer = new CookieContainer();
                request.CookieContainer.Add(cookies);

                request.AllowAutoRedirect = true;
                request.Method = "GET";
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                
                Stream stream = response.GetResponseStream();
                byte[] buffer = new byte[4096];
                int bytesRead = 0;
                FileStream fileStream = new FileStream(localFilePath, FileMode.Create);

                do
                {
                    bytesRead = stream.Read(buffer, 0, buffer.Length);
                    fileStream.Write(buffer, 0, bytesRead);
                } while (bytesRead > 0);


                fileStream.Close();
                response.Close();


                // Replace the link with the path of the downloaded document
                string newLink = localFilePath.Replace("\\", "/");
                contentXml = contentXml.Replace(link, onlineFilePath);
            }

            // Updates the content.xml in the initial ZIP file
            byte[] contentXmlBytes = System.Text.Encoding.UTF8.GetBytes(contentXml);
            ms = new MemoryStream();
            zf.BeginUpdate();



            // Add updated content to ZIP file
            ZipOutputStream zos = new ZipOutputStream(ms);
            zos.UseZip64 = UseZip64.Off;
            zos.IsStreamOwner = false;

            // Add entry for content.xml file
            zos.PutNextEntry(new ZipEntry(entry.Name));
            StreamUtils.Copy(new MemoryStream(contentXmlBytes), zos, new byte[4096]);


            // Processes each entry from the original ODT file
            foreach (ZipEntry origEntry in zf)
            {
                // Ignore the entry for the content.xml file because it has already been added
                if (origEntry.Name == entry.Name) continue;

                // Add entry to new ZIP file
                zos.PutNextEntry(new ZipEntry(origEntry.Name));
                StreamUtils.Copy(zf.GetInputStream(origEntry), zos, new byte[4096]);
            }

            zos.Close();

            // Finish updating the ZIP file
            zf.CommitUpdate();
            zf.Close();


            // Renames and saves the updated ODT file
            Guid g = Guid.NewGuid();
            string updatedFilePath = Path.Combine(Path.GetDirectoryName(odtFilePath), g + "_" + Path.GetFileName(odtFilePath));
            using (FileStream stream = new FileStream(updatedFilePath, FileMode.Create))
            {
                ms.Position = 0;
                ms.WriteTo(stream);
            }
            Console.WriteLine("The ODT file has been successfully updated and saved as: " + updatedFilePath);
            Console.ReadLine();
        }

    }
}
 
What are the values of link on line 85? For which values does the request time out?

Not related to your problem, but lines 94-107 could simply be replaced with
C#:
using (var stream = response.GetResponseStream())
using (var fileStream = new FileStream(localFilePath, FileMode.Create))
{
    stream.CopyTo(fileStream);
}

From what I can see, you still aren't doing anything to give the filename a file extension, which was the issue in your original post. Your latest post, however, seems to have run into a different issue: the request timing out.
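On the missing extension: the local name in the posted code is just the query string, so nothing ever adds `.pdf`. Below is a minimal sketch of one way to pick a name from the response headers. `DownloadNaming` and `ChooseFileName` are made-up names for illustration, and whether this intranet actually sends a `Content-Disposition` header is an assumption that the thread does not confirm. The parsing is deliberately simple and ignores the `filename*=` form.

```csharp
using System;
using System.IO;

static class DownloadNaming
{
    // Hypothetical helper: chooses a local file name for a downloaded document.
    // contentDisposition and contentType are raw header values and may be null.
    public static string ChooseFileName(string contentDisposition, string contentType, string fallbackBaseName)
    {
        // 1. Prefer the name the server suggests, e.g. attachment; filename="report.pdf"
        if (!string.IsNullOrEmpty(contentDisposition))
        {
            const string marker = "filename=";
            int index = contentDisposition.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
            if (index >= 0)
            {
                string name = contentDisposition.Substring(index + marker.Length);
                int end = name.IndexOf(';');
                if (end >= 0) name = name.Substring(0, end);
                name = name.Trim().Trim('"');
                // GetFileName drops any directory part the server might have sent.
                if (name.Length > 0) return Path.GetFileName(name);
            }
        }

        // 2. Otherwise append ".pdf" only when the server says the body is a PDF.
        if (contentType != null &&
            contentType.StartsWith("application/pdf", StringComparison.OrdinalIgnoreCase))
        {
            return fallbackBaseName + ".pdf";
        }

        // 3. Unknown content (for example an HTML page): leave the name unchanged.
        return fallbackBaseName;
    }
}
```

With the variables from the posted code it could be called as `DownloadNaming.ChooseFileName(response.Headers["Content-Disposition"], response.ContentType, fileName)` before the FileStream is opened. If the content type comes back as `text/html`, the 18 KB files may well be a login or error page rather than PDFs.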
 
- The value of link on line 85 is the string value of the match found by the regular expression pattern on line 47 (these are the links to the files that I have to download).
- I set the timeout so that the previous request, HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);, doesn't stay stuck for too long.
- The CopyTo method seems to be unavailable on my .NET version (3.5).
- Yes indeed, I did not deal with that problem because of the authentication: I figured I could not recover the extension of a file that I cannot even download yet.
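Stream.CopyTo was only added in .NET 4, so on 3.5 a small helper has to do the same job. Here is a minimal sketch; `StreamCopy` and `CopyStream` are illustrative names, not framework APIs.

```csharp
using System.IO;

static class StreamCopy
{
    // Copies everything from source to destination and returns the byte count.
    // Uses only members that exist in .NET 3.5.
    public static long CopyStream(Stream source, Stream destination)
    {
        byte[] buffer = new byte[4096];
        long total = 0;
        int bytesRead;

        // Read returns 0 at the end of the stream, which ends the loop
        // without issuing a final zero-length write.
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }
}
```

Wrapped in the same two `using` statements as the CopyTo suggestion, `StreamCopy.CopyStream(stream, fileStream);` would replace the do/while loop. `StreamUtils.Copy` from SharpZipLib, which the posted code already calls when rebuilding the ZIP, should work here too.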
 
- The value of link on line 85 is the string value of the match found by the regular expression pattern on line 47 (these are the links to the files that I have to download).

Yes, I understand that. But since we don't have access to your initial "content.xml", we have no idea what the actual matches end up looking like. So please show us what those values end up being.
 
- I set the timeout so that the previous request, HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);, doesn't stay stuck for too long.

Yes, that was pretty obvious. Which URLs does it time out on? That was the point of the second question: for which values of link does it time out?
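One way to answer that from inside the program is to catch the WebException and print the URL together with the failure status. This is only a sketch; `DownloadDiagnostics` and `Describe` are hypothetical names.

```csharp
using System;
using System.Net;

static class DownloadDiagnostics
{
    // Builds a one-line description of a failed request for logging.
    public static string Describe(string url, WebException ex)
    {
        // Status is e.g. Timeout, ProtocolError or NameResolutionFailure.
        string text = ex.Status + " for " + url;

        // When the server did answer (ProtocolError), the HTTP status is available too.
        HttpWebResponse http = ex.Response as HttpWebResponse;
        if (http != null)
        {
            text += " (HTTP " + (int)http.StatusCode + " " + http.StatusDescription + ")";
        }
        return text;
    }
}
```

In the download loop, the GetResponse call would go inside `try { ... } catch (WebException ex) { Console.WriteLine(DownloadDiagnostics.Describe(link, ex)); }`, so the console shows exactly which values of link fail and how.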
 
(Link before processing):
http://applnet.fiducial.fr/GetContenu/Download.aspx?p1=35fa172c-2e3a-449a-a156-d40653952325&p2=4&p5=1&p6=NOPUB
(Link after processing):
https://com.fiducial.fr/files/fiducial/banque/p1=35fa172c-2e3a-449a-a156-d40653952325&p2=4&p5=1&p6=NOPUB
 
It seems to be blocked on this link: http://applnet.fiducial.fr/GetConte...e3a-449a-a156-d40653952325&p2=4&p5=1&p6=NOPUB

(It's the first link in the document, so I think they will all be blocked.)
 
So you are saying that it is timing out downloading this URL:
http://applnet.fiducial.fr/GetContenu/Download.aspx?p1=35fa172c-2e3a-449a-a156-d40653952325&p2=4&p5=1&p6=NOPUB

or on
http://applnet.fiducial.fr/GetConte...e3a-449a-a156-d40653952325&p2=4&p5=1&p6=NOPUB
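It may also be worth checking which string the program really requests. The regular expression expects `;p2=`, `;p5=` and `;p6=`, which suggests (this is an assumption, since content.xml has not been posted) that the links are stored XML-escaped, with `&amp;` between parameters. If so, `match.Value` still contains `&amp;`, and the URL passed to WebRequest.Create is not the one shown above. A minimal sketch of decoding it first, with `LinkCleanup` as a made-up name:

```csharp
using System;

static class LinkCleanup
{
    // Turns the XML-escaped ampersands of a link taken from content.xml
    // back into plain '&' so the query string reaches the server intact.
    // Only &amp; is handled; other entities are left untouched.
    public static string DecodeXmlAmpersands(string rawLink)
    {
        return rawLink.Replace("&amp;", "&");
    }
}
```

The decoded value would be used only for the request; the original `link` should stay as the search text for `contentXml.Replace`, because that is what actually appears in the XML.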
 
So essentially the only difference between your original post and the new code in post #11 is that you are passing in cookies that you got back from the login page. It's strange that the newer code would time out. I'm not quite sure how to help you. Have you asked the owner of the site what is the best way to get those files? Perhaps they actually have an API instead of you having to essentially do some screen scraping.
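One thing that might explain the timeout, though nothing in the thread confirms it: the newer code logs in again for every link and never closes the login response. By default a .NET Framework client application allows only two simultaneous connections per host (ServicePointManager.DefaultConnectionLimit), so responses left open can leave the next request waiting until it times out. The sketch below logs in once, shares one CookieContainer and disposes every response. `IntranetSession` and its methods are illustrative names, and the `username`/`password` form fields are copied from the posted code; whether the login page accepts them is unknown.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class IntranetSession
{
    // Builds a GET request that shares the given cookie container.
    public static HttpWebRequest CreateDownloadRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Timeout = 30000;
        request.AllowAutoRedirect = true;
        request.CookieContainer = cookies;
        return request;
    }

    // Posts the credentials once; cookies set by the server end up in the shared container.
    public static void Login(string loginUrl, string username, string password, CookieContainer cookies)
    {
        byte[] body = Encoding.UTF8.GetBytes(
            "username=" + Uri.EscapeDataString(username) +
            "&password=" + Uri.EscapeDataString(password));

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies;

        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        // Disposing the response releases the connection for later requests.
        using (WebResponse response = request.GetResponse())
        {
        }
    }

    // Downloads one link to disk, closing the response and both streams when done.
    public static void Download(string url, string localPath, CookieContainer cookies)
    {
        HttpWebRequest request = CreateDownloadRequest(url, cookies);
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (FileStream file = new FileStream(localPath, FileMode.Create))
        {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = body.Read(buffer, 0, buffer.Length)) > 0)
            {
                file.Write(buffer, 0, bytesRead);
            }
        }
    }
}
```

The program would create one `CookieContainer` before the foreach loop, call `IntranetSession.Login` once, and then call `IntranetSession.Download(link, localFilePath, cookies)` for each match.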
 