Question SQL Injection Detection: help fix this code to work?

Veli

New member
Joined
Feb 15, 2023
Messages
3
Programming Experience
Beginner
SQLInjectionDetection:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Numpy;
using Python.Runtime;
using Keras;
using Keras.Layers;
using Keras.Models;
using Keras.Optimizers;
using Keras.losses;

namespace SQLInjectionDetection
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load CSV file
            var trainData = File.ReadAllLines("tokens.csv")
                .Select(l => l.Split(','))
                .Select(s => new { Token = s[0], Label = int.Parse(s[1]) })
                .ToList();

            // Shuffle trainData
            var random = new Random();
            trainData = trainData.OrderBy(d => random.Next()).ToList();

            // Split trainData into training and validation sets
            var splitIndex = (int)(trainData.Count * 0.8);
            var trainDataSubset = trainData.Take(splitIndex).ToList();
            var testDataSubset = trainData.Skip(splitIndex).ToList();

            // Define vocabulary and tokenize trainData
            var vocabulary = new HashSet<char>(trainDataSubset.SelectMany(d => d.Token).Distinct());
            var tokenToIndex = vocabulary.Select((c, i) => new { Token = c, Index = i }).ToDictionary(t => t.Token, t => t.Index);
            var maxSequenceLength = trainDataSubset.Max(d => d.Token.Length);
            var trainTokenized = Tokenize(trainDataSubset, tokenToIndex, maxSequenceLength);
            var testTokenized = Tokenize(testDataSubset, tokenToIndex, maxSequenceLength);

            // Build RNN model
            using (Py.GIL())
            {
                dynamic keras = Py.Import("keras");
                dynamic np = Py.Import("numpy");
                var input = new Input(shape: 1000);
                var embedding = new Embedding(vocabulary.Count, 32).Apply(input);
                var lstm = new LSTM(32).Apply(embedding);
                var output = new Dense(1, activation: keras.activations.sigmoid).Apply(lstm);
                var model = new Model(inputs: input, outputs: output);
                model.Compile(optimizer: new Adam(), loss: new BinaryCrossentropy(), metrics: new[] { "accuracy" });

                // Train model
                var trainX = trainTokenized.Item1;
                var trainY = trainTokenized.Item2;
                var testX = testTokenized.Item1;
                var testY = testTokenized.Item2;
                model.Fit(trainX, trainY, batchSize: 32, epochs: 10, validationData: (testX, testY));

                // Take user input and make prediction
                Console.Write("Enter user input: ");
                var userInput = Console.ReadLine();
                var inputTokenized = TokenizeInput(userInput, tokenToIndex, maxSequenceLength);
                var prediction = model.Predict(inputTokenized).GetData<float>()[0, 0];
                Console.WriteLine($"Prediction: {(prediction > 0.5 ? "Malicious" : "Safe")} (Score: {prediction:F4})");

                // Evaluate model
                var testMetrics = model.Evaluate(testX, testY);
                Console.WriteLine($"Test loss: {testMetrics[0]:F4}");
                Console.WriteLine($"Test accuracy: {testMetrics[1]:F4}");
            }
        }

        private static (NDarray, NDarray) Tokenize(List<dynamic> data, Dictionary<char, int> tokenToIndex, int maxSequenceLength)
        {
            var numExamples = data.Count;
            var X = np.zeros((numExamples, maxSequenceLength));
            var Y = np.zeros((numExamples, 1));
            for (var i = 0; i < numExamples; i++)
            {
                var tokens = data[i].Token;
                var label = data[i].Label;
                Y[i] = label;
                for (var j = 0; j < tokens.Length; j++)
                {
                    var token = tokens[j];
                    var index = tokenToIndex[token];
                    X[i, j] = index;
                }
            }
            return (X, Y);
        }

    }
}
 
What problem are you running into?
 
Also, perhaps I am missing something, but why train an AI to detect SQL injection? Why not just prevent the SQL injection from happening at all by following best practices to validate your inputs?
 
Also, perhaps I am missing something, but why train an AI to detect SQL injection? Why not just prevent the SQL injection from happening at all by following best practices to validate your inputs?

The ML model is just part of a hybrid technique. The technique also makes use of validation and sanitization routines, and parameterized queries. I'm facing some challenges with code lines 38, 39, 47-51, 58 and 63. I was hoping someone here could help me fix them.
The whole idea is to load a CSV file that contains a dataset of safe and malicious tokens, build and train an RNN model and use the train model to classify user input as safe or malicious.
 
What challenges? What errors are you getting?

Not all of us will have access to the various libraries/assemblies that you are using.
 
Last edited:
The more I think about it, the more I am confused. If you are already using parameterized queries, how else would an attacker get a SQL injection attack to work? It's not like they have control over the rest of your query... unless you are giving the attacker an opportunity to modify the query. But if you are allowing the attacker to modify the query, then you aren't following security best practices.

In other words, the input given by the user on line 62 should always be the hardcoded query you already have in your code. E.g. SELECT LastLoginTime FROM User = @user AND PasswordHash = @hashedPassword. The attacker would only have control over what gets passed into the parameters @user and @hashedPassword.
 
I'm having some difficulty understanding your confusion. I mentioned that the C# code I posted is not working, and I need assistance in fixing it.

While parameterized queries are effective in preventing SQL injection attacks, there are still some limitations to their effectiveness. For instance, not all database drivers support parameterized queries. Additionally, it's possible to write code that doesn't utilize parameterized queries correctly. Although parameterized queries protect against SQL injection attacks on individual query parameters, they don't offer protection against attacks on the query structure itself. Furthermore, SQL injection attacks can occur through other areas of the application, such as stored procedures or dynamic SQL, which may not be safeguarded by parameterized queries. While parameterized queries are an essential tool in mitigating SQL injection attacks, they cannot be relied on as a standalone solution.

Regarding my request for help, I would appreciate anyone who could assist me in resolving my C# code issue. I firmly believe that it's vital to approach SQL injection mitigation in a comprehensive way, and I am committed to adopting a security strategy that includes parameterized queries and other protective measures.

Thank you.
The more I think about it, the more I am confused. If you are already using parameterized queries, how else would an attacker get a SQL injection attack to work? It's not like they have control over the rest of your query... unless you are giving the attacker an opportunity to modify the query. But if you are allowing the attacker to modify the query, then you aren't following security best practices.

In other words, the input given by the user on line 62 should always be the hardcoded query you already have in your code. E.g. SELECT LastLoginTime FROM User = @user AND PasswordHash = @hashedPassword. The attacker would only have control over what gets passed into the parameters @user and @hashedPassword.
 
Okay. But it is still very hard to help you if you do not tell us what errors you are getting is what "challenges" you are facing. It's is like going to the doctor and saying "I don't feel well", but not telling her what your symptoms are.
 
Back
Top Bottom