Is anyone here using C# for Data Analysis?

DznFa

Member
Joined
Dec 1, 2019
Messages
5
Programming Experience
1-3
Hello,

I was wondering if anyone here using C#/.NET for data analysis ? i would like to hear their best practices. I know there are some more popular (or better?) options like Python or R out there but i would like to use C# to improve my knowledge on that more instead of learning something new. Currently i'm preparing a report (data reading and making some calculations) with C# but i m not sure if i am doing things correctly.
So far, i created a class based on data column names and created objects and store them in list of that class. Later i will apply the calculations.

Thanks in advance.
 

Sheepings

Senior Programmer
Joined
Sep 5, 2018
Messages
652
Location
UK
Programming Experience
10+
We won't know if you're doing it right. Unless you show us
 

DznFa

Member
Joined
Dec 1, 2019
Messages
5
Programming Experience
1-3
Thanks for answering. Below i am sharing my code. The code is not yet completed and is in progress. I am wondering if this could be a good way of data analysis in C#.
Few questions would be:
  1. What would be the data type to store table like data considering each columns will have different datatype? In my code, i consider each column as a field in a class that later i would create a list of it.
  2. In the code, i was planning to first make the necessary datatype conversions and then store them in DateTime dictionary or int array based on columns. Is it a good idea to split the data like this ?
Code is working without any issue so far even though it does not do much things.

Here is my code:

1) This is the class i use to store all data inside a List<LifeOfSData>. All fields represent the columns' names:

C#:
public class LifeOfSData

    {

        public int ID { get; set; }

        public string Department { get; set; }

        public string Stage { get; set; }

        public string RequestID { get; set; }

        public string Status { get; set; }

        public string AS { get; set; }

        public DateTime ATS { get; set; }

        public DateTime CTS { get; set; }

        public DateTime PTS { get; set; }

        public DateTime ITS { get; set; }

        public DateTime MTS { get; set; }

        public DateTime MPSSTS { get; set; }

        public DateTime DTS { get; set; }

        public DateTime DPSS { get; set; }

        public DateTime VPTS { get; set; }

        public DateTime VPPSSTS { get; set; }

        public DateTime MDTS { get; set; }

        public DateTime DDTS { get; set; }

        public DateTime UnderImplementationTS { get; set; }

        public string Site { get; set; }

        public DateTime Created { get; set; }

    }
2) A class to get data from MS Access DB and store it in dictionary with column names being the keys:

C#:
public static class GetDataInDictionary
    {
        private static string userName;
        private static string dataSourcePath;
        private static string query;
        private static DataSet dataSet;
        private static Dictionary<string, string[]>  dictData = new Dictionary<string, string[]>();

        public static Dictionary<string, string[]> GetData()
        {
            userName = Environment.UserName;
            dataSourcePath = @"Provider = Microsoft.ACE.OLEDB.12.0; Data Source = C:\Users\" + userName + @"PATH TO DB";
            query = "MY QUERY";
            dataSet = new DataSet();

           [B][U] //AccessDatabase is a DLL i made to use it for later projects if the type of data source is Access.[/U][/B]
            dataSet = AccessDatabase.RetrieveDataFromAccessDB.GetData(dataSourcePath, query);
            FillDictionaryFromDataSet();
            return dictData;
        }

        private static void FillDictionaryFromDataSet()
        {
            string[] data = new string[dataSet.Tables[0].Rows.Count];
            string columnName = "";
            long iRow = 0;
            foreach (DataTable table in dataSet.Tables)
            {
                foreach (DataColumn column in table.Columns)
                {
                    columnName = column.ToString();
                    foreach (DataRow row in table.Rows)
                    {
                        data[iRow] = row[columnName].ToString();
                        iRow += 1;
                    }
                    string[] arrData = new string[data.Length];
                    Array.Copy(data, arrData, data.Length);
                    dictData.Add(columnName, arrData);
                    iRow = 0;
                }
            }
        }
    }
3) This class is converting data based on columns into related data types (DateTime, int etc) to be able to make calculations based on business logic:

C#:
public class Analysis
    {
        private Dictionary<string, string[]> _data = new Dictionary<string, string[]>();
        public List<LifeOfSData> LifeOfSDataobjList { get; private set; } = new List<LifeOfSData>();
        private string[] _dateTimeColumns = new string[]
        {
            "ATS",
            "CTS",
            "PTS",
            "MTS",
            "MSS TS",
            "DTS",
            "DPSS",
            "VPTS",
            "Under Implementation TS"
        };
        private Dictionary<string, DateTime[]> _convertedDateTimeColumns = new Dictionary<string, DateTime[]>();
        private string _idColumnName = "ID";
        private int[] _idColumn;

        public Analysis(Dictionary<string, string[]> DictLifeOfSData)
        {
            this._data = DictLifeOfSData;
        }

        public void ConvertDataColumns()
        {
            foreach (KeyValuePair<string,string[]> item in this._data)
            {
                if (Array.IndexOf(_dateTimeColumns, item.Key) > -1) //If bigger then -1, then value is exist in the array
                {
                    //string ColumnName = item.Key
                    //string[] values = DataAnalysisUtility.ConvertStringToDateTime(item.Value)
                    _convertedDateTimeColumns.Add(item.Key, DataAnalysisUtility.ConvertStringToDateTime(item.Value));
                }
                if (item.Key == _idColumnName)
                {
                    this._idColumn = new int[item.Value.Length];
                    int i = 0;

                    foreach (var idNumber in item.Value)
                    {
                        this._idColumn[I] = Convert.ToInt32(idNumber);
                        i += 1;
                    }
                }
            }
        }
[/I]

4) Class to store small functions that will do one thing like conversion of datatypes. I plan to add more functions. For now it is just one function that is converting from string to Datetime:
C#:
public static class DataAnalysisUtility
    {

        public static DateTime[] ConvertStringToDateTime(string[] dateTime)
        {
            DateTime[] returnedArray = new DateTime[dateTime.Length];
            for (int i = 0; i < dateTime.Length; i++)
            {
                returnedArray[I] = Convert.ToDateTime(String.IsNullOrEmpty(dateTime[I]) ? "1/1/1900": dateTime[I]);
            }
            return returnedArray;
        }
    }[I]
[/I][/I][/I][/I]

5) The main class:

C#:
class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, string[]> dictLifeOfSData = GetDataInDictionary.GetData();
            Analysis analysis = new Analysis(dictLifeOfSData);
            analysis.ConvertDataColumns();
        }
    }[I][I][I][I]
[/I][/I][/I][/I]
 
Last edited:

Sheepings

Senior Programmer
Joined
Sep 5, 2018
Messages
652
Location
UK
Programming Experience
10+
Code tags please?
 

Skydiver

Well-known member
Joined
Apr 6, 2019
Messages
644
Location
Virginia Beach, VA
Programming Experience
10+
There's a couple of things that has me puzzled:

  1. The data is coming from an Access database. Presumably the columns which contain date/time values would have had columns using a date/time type. Why are you pulling the data as a string, and then converting the string back to date/time?
  2. Why do you need to convert from a DataSet over to dictionary keyed by column names? If it is only to be able to get to each column by name, the DataTable in the DataSet already lets you do that. If you need to "flatten" the DataSet to look like a single table, some simple wrappers to do this on demand should be sufficient.
 

DznFa

Member
Joined
Dec 1, 2019
Messages
5
Programming Experience
1-3
Hi, thanks for reviewing the code and the comment.
Here is my explanation:
  1. I wanted to store the data in string regardless of datatype stored in access so that in the future i would not have to worry about the type changes made in the database. Because this data is actually coming to database from a website that we don't have any control of. So i did not want to trust the types stored in the Access. In addition, i just double checked the DB for some column's type and they were stored as string even though they are date time (but the format of values is as follows: 9/20/2018T12:35).
  2. I wanted to store the data in a dictionary because it is easier for me to follow what is happening to the data when it is changed, manipulated etc (as a result of being a visual person). And, what do you mean by "simple wrappers" ? How can i apply that ?
 

Skydiver

Well-known member
Joined
Apr 6, 2019
Messages
644
Location
Virginia Beach, VA
Programming Experience
10+
I wanted to store the data in a dictionary because it is easier for me to follow what is happening to the data when it is changed, manipulated etc
Is that why you declared a LifeOfSData class (which is essentially a row by row view instead of a column by column view) and never actually use it?
 

DznFa

Member
Joined
Dec 1, 2019
Messages
5
Programming Experience
1-3
Yes, It is part of the reason. But also i was planing to utilize LINQ to query data object based on business logic.
In addition, class is not yet used because the code is not completed. Today i will create another function to create objects from that class and store them in
C#:
public List<LifeOfSData> LifeOfSDataobjList { get; private set; } = new List<LifeOfSData>();
 

Skydiver

Well-known member
Joined
Apr 6, 2019
Messages
644
Location
Virginia Beach, VA
Programming Experience
10+
And, what do you mean by "simple wrappers" ? How can i apply that ?
You could define a simple object that takes a DataSet in its constructor and just holds on to a reference to it. Later you could call a method on it to get the column data you are interested in. Something like:
DataSetWrapper:
class DataSetWrapper
{
    DataSet _ds;
    
    public DataSetWrapper(DataSet ds)
    {
        _ds = ds;
    }
    
    public IEnumerable<T> EnumerateDataSetColumnValues<T>(string columnName)
    {
        foreach(DataTable table in _ds.Tables)
        {
            foreach(DataColumn column in table.Columns)
            {
                if (column.ColumnName == columnName)
                {
                    foreach(DataRow row in table.Rows)
                        yield return (T)Convert.ChangeType(row[column], typeof(T));
                }
            }
        }
    }

    public T[] GetDataSetColumnValues<T>(string columnName)
    {
        return EnumerateDataSetColumnValues<T>(columnName).ToArray();
    }
}
 

Skydiver

Well-known member
Joined
Apr 6, 2019
Messages
644
Location
Virginia Beach, VA
Programming Experience
10+
Yes, It is part of the reason. But also i was planing to utilize LINQ to query data object based on business logic.
In addition, class is not yet used because the code is not completed. Today i will create another function to create objects from that class and store them in
C#:
public List<LifeOfSData> LifeOfSDataobjList { get; private set; } = new List<LifeOfSData>();
I'm still not seeing the value of getting columnar data, but maybe that's just because I've always been trained to look at tabular data with a row representing an object, and the columns of the table being the various attributes/properties of the object.

Personally, my recommendation is to load the data from the database directly into objects of type LifeOfSData and insert those objects into a list, or better yet expose them as an IEnumerable<LifeOfSData> which is only loaded on demand. If ever there is a need to just get all the values of a specific column, it's as simple as writing:
C#:
IEnumerable<string> allDepartments = LifeOfSDataobjlist.Select(losd => losd.Department);
IEnumerable<DateTime> allATS = LifeOfSDataobjlist.Select(losd => losd.ATS);
 

DznFa

Member
Joined
Dec 1, 2019
Messages
5
Programming Experience
1-3
Thanks very much for the recommendation. i will try to do that and share the code later on.
 
Top Bottom