Article written by

A passionate pythonist geek looking for problems, to solve :P

44 responses to “Python Arabic Text Reshaper”

  1. Khaled

    God bless you, brother Abdullah.
    I hope this will help me with the many things that do not support Arabic.
    I would like us to get in touch by email.

  2. Louis

    Hello Abd,
    Thank you for this *extremely* valuable port. A quick question regarding “single letters”.
    Your algorithm reshapes an isolated letter, such as ض (\u0636), into a shaped one: ﺿ (\uFEBF).
    I don’t think this is correct (?). I am considering adding a line of code at the very top of the function “get_reshaped_word” to exclude 1-letter words. Would that make sense?

    def get_reshaped_word(unshaped_word):
        if len(unshaped_word) == 1: return unshaped_word  ### <--- New
        unshaped_word = replace_lam_alef(unshaped_word)
        decomposed_word = DecomposedWord(unshaped_word)
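
For reference, the two code points mentioned above can be inspected with Python's standard unicodedata module. This is only a sketch for looking at the characters; it does not touch arabic_reshaper, and the helper name describe is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

# The plain letter, the form reported in the comment, and the isolated form.
for ch in (u"\u0636", u"\uFEBF", u"\uFEBD"):
    print(describe(ch))
```

Unicode names U+FEBF as an initial form, which fits the observation that a letter standing alone probably should not be mapped to it.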

  3. waleed

    God bless you.
    It works perfectly for me.
    Thanks.

  4. Cüneyt Sina Koca

    Thanks for this project. I just wanted to let you know that my problem with this error:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

    was solved by putting the following lines in arabic_reshaper.py:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
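
Note that this workaround only exists on Python 2: Python 3 has no built-in reload and no sys.setdefaultencoding. A more portable habit is to encode explicitly at the point where text leaves the program. The sketch below assumes the output is a binary stream; the helper name write_utf8 is hypothetical and not part of arabic_reshaper.

```python
import io

def write_utf8(stream, text):
    """Encode text as UTF-8 and write the bytes to a binary stream."""
    stream.write(text.encode("utf-8"))

buffer = io.BytesIO()  # stands in for a file opened in binary mode
write_utf8(buffer, u"\u0645\u0631\u062d\u0628\u0627")  # five Arabic letters
data = buffer.getvalue()
print(len(data))  # each of these letters takes two bytes in UTF-8
```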

  5. Mohamed LICHOURI

    First, thanks for sharing this code, but I have a problem with the example that you provided.

    pass_arabic_text_to_render(bidi_text)
    NameError: name 'pass_arabic_text_to_render' is not defined

  6. Razan

    Assalam Alykum

    Thank you, brother, for your great effort and for sharing it. Now I can finally use beautiful Arabic fonts on Linux for OpenERP Arabic reports.
    arabic_reshaper.py was suggested as part of a solution for OpenERP Arabic reports in https://github.com/barsi/openerp-rtl

    I have noticed a vertical alignment problem when generating the reports: the data is not well aligned vertically. Is this issue related to the reshaper, or to Reportlab's rendering of the Arabic font?

    Note that before I used the solution in the link [ https://github.com/barsi/openerp-rtl ], some fonts were well aligned but had the square-box issue; now the glyphs are fine but not well aligned vertically!

  7. Razan

    Thanks in advance :)

  8. Marek

    Hello,

    I’m using your module together with bidi, and the Arabic text itself is clearly correct and well wrapped, whether in the console or in a text editor. However, I need to render Arabic text properly as a Paragraph in Reportlab, and I’m facing a problem with word wrap: RTL text is wrapped, but each new line appears above the previous one, not under it. How did you get around this?

    best regards and thanks for your effort
    Marek

  9. egamal

    Peace be upon you.
    A problem appears when printing a long sentence that spans more than one line:
    https://www.dropbox.com/s/foadw5ykw4n7m8m/Screenshot%20from%202013-11-27%2019%3A39%3A05.png

  10. Amine

    Thank you so much, really a wonderful job, thanks thanks thanks

  11. Josh

    Thank you for this extremely valuable port, which helped generate printed registration rolls for over a million voters in Libya.

    There is a minor bug with the lam-alef glyphs, which appears to be from the original Java package, as I have noted in GitHub issue #2.

    We have also mirrored the RTL branch of reportlab to GitHub, in case others would like to use it without installing mercurial.
    https://github.com/hnec-vr/reportlab-rtl

    1. Marek

      Hi Josh & Abd Allah,

      I am still confused about how to break a block of Arabic text into lines in a reportlab Paragraph. Starting from the right side of the page, the text should run to the left margin and continue on a new line below, again from the right. This is not what happens when I run the code against the reportlab-rtl branch. In the PDF I got this:
      ‫و المتغيرات البينية السنوية و تلك على المدى الطويل إضافة إلى عدم دقة القياسات والحسابات المتبعة‬
      ‫إذا أخذنا بعين الإعتبار طبيعة تقلب المناخ‬
      instead of this:
      إذا أخذنا بعين الإعتبار طبيعة تقلب المناخ و المتغيرات البينية السنوية و تلك على المدى الطويل إضافة إلى عدم دقة القياسات والحسابات المتبعة

      This is the complete code (using reportlab-rtl, python-bidi and Abd Allah’s reshaper):

      #encoding:UTF-8
      from reportlab.lib.pagesizes import A4
      from reportlab.platypus.doctemplate import SimpleDocTemplate
      import arabic_reshaper # Abd Allah’s code
      from bidi.algorithm import get_display # python_bidi
      from reportlab.pdfbase import pdfmetrics
      from reportlab.pdfbase.ttfonts import TTFont
      from reportlab.lib.styles import ParagraphStyle
      from reportlab.lib.enums import TA_RIGHT
      from reportlab.platypus.para import Paragraph

      pdf_file = open('disclaimer_arabic.pdf', 'wb')  # binary mode, since a PDF is not text
      pdf_doc = SimpleDocTemplate(pdf_file, pagesize=A4)
      arabic_text = u'إذا أخذنا بعين الإعتبار طبيعة تقلب المناخ و المتغيرات البينية السنوية و تلك على المدى الطويل إضافة إلى عدم دقة القياسات والحسابات المتبعة'
      arabic_text = arabic_reshaper.reshape(arabic_text) # join characters
      arabic_text = get_display(arabic_text) # change orientation by using bidi
      #english_text = 'If we take into account the nature of climate variability and inter-annual variability and those on long-term addition to the lack of accuracy of measurements and calculations used'
      pdfmetrics.registerFont(TTFont('Arabic-normal', 'KacstOne.ttf'))
      style = ParagraphStyle(name='Normal', fontName='Arabic-normal', fontSize=12, leading=12. * 1.2)
      style.alignment=TA_RIGHT
      pdf_doc.build([Paragraph(arabic_text, style)])
      pdf_file.close()

      best
      Marek

      1. Josh

        Marek, in your ParagraphStyle make sure you set wordWrap='RTL'. Otherwise, reportlab-rtl will act as if it’s LTR text.

        1. Marek

          I have tried it already (it looks very promising :-)), but unfortunately it has no effect, at least with my code…
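
One way to picture the symptom described in this thread: if a whole paragraph is converted to visual order first and only then broken into lines, the first line ends up holding the end of the sentence. The sketch below uses only the standard library. Plain string reversal stands in for reshaping plus get_display, English placeholder words stand in for Arabic, and all function names are hypothetical. It is an illustration of the ordering problem, not a description of how reportlab-rtl wraps text.

```python
import textwrap

def to_visual(logical):
    """Stand-in for reshape + get_display: reverse a purely RTL string."""
    return logical[::-1]

def wrap_after_reordering(logical, width):
    """Reorder the whole paragraph first, then wrap it."""
    return textwrap.wrap(to_visual(logical), width)

def wrap_before_reordering(logical, width):
    """Wrap in logical order first, then reorder each line on its own."""
    return [to_visual(line) for line in textwrap.wrap(logical, width)]

text = "one two three four"
print(wrap_after_reordering(text, 10))   # end of the sentence lands on line 1
print(wrap_before_reordering(text, 10))  # start of the sentence lands on line 1
```

With the second approach each line is reordered separately, so the lines stay in reading order from top to bottom.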

  12. Yashar Bazli

    Hi Bro.
    How can I install it?
    Thank you

  13. Marek

    Hi Josh & Abd Allah,
    I was trying the reportlab-rtl branch with the reshaper and bidi. Reportlab’s Paragraph doesn’t seem to be RTL enabled, because a block of Arabic text is not broken into lines properly. Text running from the right side is expected to continue on a new line below, again starting from the right. Instead, the new line appears above. Is this feature missing in the reportlab Paragraph class for RTL text? It works for LTR.
    all the best

  14. Samir Sabri

    Salam,

    I have rewritten your library in Haxe so that I can port it to PHP, JavaScript, C#, C++ and Java, but I didn’t rewrite the get_display method.
    My question is: why should I use get_display to reverse the text? I could simply reverse it by iterating over the letters in a loop, right?

    Also, I have tried it, and in the result the ALEF looks like a LAM. Why is that? Please see here:
    https://drive.google.com/file/d/0BwzBTCo1-KJBSHA4c25GRXZzNkE/edit?usp=sharing
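
On the get_display question: reordering for display is not the same as reversing, because Unicode gives each character a bidirectional class, and digits or Latin letters are not meant to be flipped along with the Arabic letters. The classes can be listed with the standard unicodedata module. This is a small sketch for inspection only; the helper name bidi_classes is made up and is not part of python-bidi.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Three Arabic letters, a space, a year, a space, and a Latin word.
sample = u"\u0639\u0627\u0645 2013 PDF"
print(bidi_classes(sample))
```

Arabic letters report AL, European digits EN, Latin letters L and spaces WS, so a mixed string contains runs that need different treatment.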

  15. Samir Sabri

    Here is the result after using a full Unicode font (Traditional Arabic):
    https://drive.google.com/file/d/0BwzBTCo1-KJBUzNnNkQ3Z0lPVUE/edit?usp=sharing

    There is an empty space under the shadda. What do you recommend?

  16. Muhannad

    Man awesome, worked like magic! THUMBS UP

  17. faisal

    First, thank you for the good effort,
    but I did not understand why you need the python-bidi library.
    You can do without it by adding this code
    RTL = ""
    for letter in reshaped_text:
        RTL = letter + RTL

    at the end.

    Here is the complete program:

    # -*- coding: utf-8 -*-

    import arabic_reshaper
    reshaped_text = arabic_reshaper.reshape(u'اللغة العربية رائعة')
    RTL = ""
    for letter in reshaped_text:
        RTL = letter + RTL
    print(RTL)

    Thank you
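
A plain reversal like the loop above works for text that is purely Arabic, but it also flips any numbers or Latin words in the string. The sketch below shows the effect on a short mixed string and a rough patch that flips ASCII runs back. Both function names are hypothetical, and the patch is only an illustration; it is far from the full Unicode Bidirectional Algorithm that python-bidi implements.

```python
import re

def naive_reverse(text):
    """Reverse every character, as the letter-by-letter loop does."""
    return text[::-1]

def reverse_keeping_ascii_runs(text):
    """Reverse the string, then restore the order inside ASCII letter/digit runs."""
    reversed_text = text[::-1]
    return re.sub(r"[0-9A-Za-z]+", lambda m: m.group(0)[::-1], reversed_text)

sample = u"\u0639\u0627\u0645 2013"  # an Arabic word followed by a year
print(naive_reverse(sample))               # the year comes out as 3102
print(reverse_keeping_ascii_runs(sample))  # the year stays 2013
```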
