Problem concatenating a number with Hebrew text

orentu (New member · Joined Nov 4, 2019 · Messages: 2 · Programming Experience: 5-10)
Hi,
When I try to concatenate several strings into one, the order changes.
For example:
str1 = number
str2 = Hebrew
str3 = number
str3 ends up next to str1 and not after str2, I guess. (If I concatenate with "," as a separator it should be OK, but I want fixed-length fields.)

Some explanation:
I loop over a dictionary to get each field's length and its tag <> (from the XML).
Then I loop over the XML (treating it like plain text) and look for each tag from the dictionary. If a line contains the tag, I take the value between the opening and closing tag and append it to the output, and I continue until the end of the dictionary.

The problem:
The concatenated fields do not appear in order, I guess because Hebrew is RTL.
Any suggestions?
Thanks
C#:
for (int index = 0; index < dict.Count; index++)
{
    var item = dict.ElementAt(index);
    var itemKey = item.Key;
    var itemValue = item.Value;
    // int x = Int32.Parse(itemKey);
    StringBuilder builder = new StringBuilder(itemValue);
    builder.Replace("<", "</");
    int lengthh = builder.Length;
    StringBuilder builderWO = new StringBuilder(itemValue);
    builderWO.Replace(">", "/>");
    foreach (string line in lines)
    {
        int theFirstLen = line.Trim().IndexOf(itemValue);
        int theLastLen = line.Trim().IndexOf(builder.ToString());
        int theLastLenWO_OPEN = line.Trim().IndexOf(builderWO.ToString());
        if (theFirstLen >= 0 && theLastLen > 0 || theLastLenWO_OPEN >= 0)
        {
            if (theLastLenWO_OPEN >= 0) //mean that we need to put spaces only
            {
                SR_LEFT = SR_LEFT + "{" + i + ",-" + itemKey.Substring(0, itemKey.IndexOf(".")) + "}";
                Console.WriteLine(itemKey.Length - itemKey.IndexOf("."));
                SR_RIGHT = SR_RIGHT + new string(' ', Int32.Parse(itemKey.Substring(0, itemKey.IndexOf("."))));
                i += 1;
                break;
            }
            else
            {
                // Console.WriteLine(line.Trim().Substring(theFirstLen + lengthh, theLastLen - theFirstLen - lengthh));
                SR_LEFT = SR_LEFT + "{" + i + ",-" + itemKey.Substring(0, itemKey.IndexOf(".")) + "}";
                SR_RIGHT = SR_RIGHT + line.Trim().Substring(theFirstLen + lengthh, theLastLen - theFirstLen - lengthh);
                i += 1;
                break;
            }
        }
    }
}
using (StreamWriter sw = new StreamWriter("C:\\TST.TEXT", false))
{
    sw.WriteLine(SR_LEFT, SR_RIGHT);
}
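
One possible way to keep fixed-width fields in logical order is to pad each value first and then surround any Hebrew field with U+200E (LEFT-TO-RIGHT MARK), so that the digits that follow are no longer pulled into the right-to-left run. This is only a minimal sketch under assumptions: the names FixedWidthBidiSketch, Field and IsHebrew are hypothetical and not part of the posted code, and the Hebrew test covers only the Hebrew block and the Hebrew presentation forms.

```csharp
using System;
using System.Linq;

public static class FixedWidthBidiSketch
{
    // U+200E LEFT-TO-RIGHT MARK: invisible and strongly left-to-right.
    private const char Lrm = '\u200E';

    // Hypothetical helper: Hebrew block or Hebrew presentation forms.
    private static bool IsHebrew(char ch)
        => (ch >= '\u0590' && ch <= '\u05FF') || (ch >= '\uFB1D' && ch <= '\uFB4F');

    // Pads the value to a fixed number of chars. If the value contains Hebrew,
    // the padded field is wrapped in LRM so neighbouring numbers keep their place.
    public static string Field(string value, int width)
    {
        string padded = value.PadRight(width);
        return value.Any(IsHebrew) ? Lrm + padded + Lrm : padded;
    }

    public static void Main()
    {
        // "\u05E9\u05DC\u05D5\u05DD" is a four-letter Hebrew word written with escapes.
        string record = Field("123", 5) + Field("\u05E9\u05DC\u05D5\u05DD", 10) + Field("456", 5);
        Console.WriteLine(record);
        Console.WriteLine(record.Length); // 22: 5 + (10 + 2 marks) + 5
    }
}
```

Note that each wrapped field is two chars longer than its nominal width, which matters if whatever reads the file counts chars per field. PadRight also counts Hebrew vowel points as chars, so a pointed word may look narrower than its field.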
 
I noticed that some chars slip through the net and pass for non-Hebrew/Arabic, but I am unsure why that happens. For example, str2 in the following snippet contains the words כַּף סוֹפִית, but the letter וֹ (waw in Arabic, vav in Hebrew) seems to slip through the filter. Maybe I am missing additional char ranges for Hebrew vowels?
Are you sure it is being missed? Your HasHebrew() is returning true for it:
[Screenshot: Capture.png]


Here's the code I used to test your method:
C#:
        static void Main(string[] args)
        {
            var sb = new StringBuilder();
            var str2 = "כַּף סוֹפִית";
            foreach (var ch in str2)
                sb.AppendLine($"{ch}: {BidiHelper.HasHebrew(ch, false, false, CharRange)}");
            MessageBox.Show(sb.ToString(), str2);
        }

Full code below:
C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace TestBidi
{
    /// <summary>
    /// The BidiHelper class detects Hebrew characters in strings and concatenates
    /// three strings so that the second item of the tuple, which may be Hebrew,
    /// stays between the first and third items when displayed.
    /// </summary>
    public static class BidiHelper
    {
        /// <summary>
        /// The character range (CharRange) holds two inclusive ranges as min/max pairs: the Hebrew block (U+0590 to U+05FF)
        /// and the Hebrew presentation forms (U+FB1D to U+FB4F). Arabic letters (U+0600 to U+06FF) are not covered.
        /// See for more info : https://en.m.wikipedia.org/wiki/Hebrew_(Unicode_block)
        /// </summary>
        public static readonly char[] CharRange = { (char)0x0590, (char)0x05ff, (char)0xfb1d, (char)0xfb4f };
        /// <summary>
        /// The code mark (CodeMark) is U+200E, the left-to-right mark. See for more info : http://unicode.org/reports/tr9/#Directional_Formatting_Codes
        /// </summary>
        public static readonly string CodeMark = "\u200E";
        /// <summary>
        /// Joins the three strings of the tuple, placing codeToPoint on both sides of the second string
        /// when its last character falls inside charRange.
        /// </summary>
        /// <param name="charRange">The char array holding the two ranges as min/max pairs, normally CharRange.</param>
        /// <param name="codeToPoint">This parameter is responsible for the directional order of the text and normally takes its value from CodeMark.
        /// See the summary for CodeMark for additional info.</param>
        /// <param name="tupleOfStrings">The tuple holds the three values we want to concatenate together, in order.</param>
        /// <param name="separator">The separator is used to add an optional symbol for separation. To use none, use string.Empty</param>
        /// <returns>The concatenated string of the three values, or string.Empty if the second value is empty.</returns>
        public static string GetHebrewConcat(char[] charRange, string codeToPoint, Tuple<string, string, string> tupleOfStrings, string separator)
        {
            string[] col_OfConcatValues = { };
            var charArr = tupleOfStrings.Item2.ToCharArray();
            int spins = 0;
            foreach (char eChar in charArr)
            {
                switch (HasHebrew(eChar, false, false, charRange))
                {
                    case true:
                        spins++;
                        if (tupleOfStrings.Item2.Length.Equals(spins))
                        { return string.Join(separator, tupleOfStrings.Item1, string.Concat(codeToPoint, tupleOfStrings.Item2, codeToPoint), tupleOfStrings.Item3); }
                        break;
                    case false:
                        spins++;
                        if (tupleOfStrings.Item2.Length.Equals(spins))
                        { return string.Join(separator, tupleOfStrings.Item1, tupleOfStrings.Item2, tupleOfStrings.Item3); }
                        break;
                }
            }
            return string.Empty;
        }
        /// <summary>
        /// This function checks whether a single char falls inside either of the two inclusive ranges in charRange, which include the Hebrew vowel points.
        /// </summary>
        /// <param name="eChar">The char to test, supplied by the calling method as it iterates a char array.</param>
        /// <param name="hasHebrewChar">Overwritten inside the method: true if eChar is within the first range (charRange[0] to charRange[1]). The value passed in is ignored.</param>
        /// <param name="hasOtherHebrewChar">Overwritten inside the method: true if eChar is within the second range (charRange[2] to charRange[3]). The value passed in is ignored.</param>
        /// <param name="charRange">The char array holding the two ranges as min/max pairs.</param>
        /// <returns>True if eChar is within either range; otherwise false.</returns>
        private static bool HasHebrew(char eChar, bool hasHebrewChar, bool hasOtherHebrewChar, char[] charRange)
        {
            hasHebrewChar = eChar >= charRange[0] && eChar <= charRange[1];
            hasOtherHebrewChar = eChar >= charRange[2] && eChar <= charRange[3];
            if (hasHebrewChar) { return true; }
            else if (hasOtherHebrewChar) { return true; }
            else { return false; }
        }

        static void Main(string[] args)
        {
            var sb = new StringBuilder();
            var str2 = "כַּף סוֹפִית";
            foreach (var ch in str2)
                sb.AppendLine($"{ch}: {BidiHelper.HasHebrew(ch, false, false, CharRange)}");
            MessageBox.Show(sb.ToString(), str2);
        }
    }
}
 
No, I'm not sure. I originally had that function written completely differently yesterday, and I can't remember how I wrote it, so maybe it was a short-circuit issue, who knows. Curious though: the symbol that comes back false, is that a space or what? I've had enough Bidi this week to last me a few months. haha

But I'm glad it worked for you. (y)
 
Yes, it is the space that comes back as false.
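
The space is U+0020, which lies outside both Hebrew ranges, so a per-char check reports false for it even inside a Hebrew phrase. A small sketch of one way to judge a whole string instead of single chars, skipping whitespace, is shown here. The names HebrewCheckSketch, ContainsHebrew and IsAllHebrew are hypothetical and not part of the code above.

```csharp
using System;
using System.Linq;

public static class HebrewCheckSketch
{
    // Hypothetical helper: Hebrew block or Hebrew presentation forms.
    private static bool IsHebrew(char ch)
        => (ch >= '\u0590' && ch <= '\u05FF') || (ch >= '\uFB1D' && ch <= '\uFB4F');

    // True if at least one char of the string is Hebrew.
    public static bool ContainsHebrew(string s) => s.Any(IsHebrew);

    // True if the string has at least one non-whitespace char
    // and every non-whitespace char is Hebrew.
    public static bool IsAllHebrew(string s)
    {
        char[] visible = s.Where(c => !char.IsWhiteSpace(c)).ToArray();
        return visible.Length > 0 && visible.All(IsHebrew);
    }

    public static void Main()
    {
        // Two Hebrew words separated by a space, written with escapes and without vowel points.
        string phrase = "\u05DB\u05E3 \u05E1\u05D5\u05E4\u05D9\u05EA";
        Console.WriteLine(IsHebrew(' '));          // False
        Console.WriteLine(IsAllHebrew(phrase));    // True
        Console.WriteLine(ContainsHebrew("A12"));  // False
    }
}
```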
 
I know that all that code above was written while you were in a code-compile-debug writing fugue, but the following:
C#:
private static bool HasHebrew(char eChar, bool hasHebrewChar, bool hasOtherHebrewChar, char[] charRange)
{
    hasHebrewChar = eChar >= charRange[0] && eChar <= charRange[1];
    hasOtherHebrewChar = eChar >= charRange[2] && eChar <= charRange[3];
    if (hasHebrewChar) { return true; }
    else if (hasOtherHebrewChar) { return true; }
    else { return false; }
}
should probably be written as:
C#:
private static bool HasHebrew(char eChar, char[] charRange)
{
    return (charRange[0] <= eChar && eChar <= charRange[1]) || (charRange[2] <= eChar && eChar <= charRange[3]);
}

Or better yet:
C#:
static (char Min, char Max)[] HebrewCharRanges = new[] { ('\u0590', '\u05ff'), ('\ufb1d', '\ufb4f') };

static bool IsHebrew(char ch)
    => HebrewCharRanges.Any(r => r.Min <= ch && ch <= r.Max);
 
I know that all that code above was written while you were in a code-compile-debug writing fugue
Nope, that has nothing to do with the way I wrote it. I just wrote it that way as I always do when I know it's going on a forum. Sure, I could have written the foreach as a for instead, used Linq a bit more, used inline variable declarations and whatnot, but I always choose to write it in the way that's simplest for new users to read and follow. But as always, contributions for more advanced users are welcome. Something for everybody ;)
 