AlexandriaX-2026

Motivation

Dialectal Arabic MT needs context, diagnosis, and broader coverage.

Arabic dialectal machine translation remains difficult because of regional linguistic variation, limited high-quality parallel data, non-standardized orthography, and translation choices that depend on social and conversational context.

The AlexandriaX-2026 Shared Task evaluates systems that preserve meaning while adapting lexical, morphological, pragmatic, and sociolinguistic choices to the requested Arabic dialect. It also covers MT evaluation with interpretable error detection, so teams can see not only which systems score well, but also where and why they fail.

Dialogue-aware translation

Source turns are provided with previous context, domain information, and speaker-addressee gender metadata.

Broad dialectal coverage

The MT subtasks span broad Arabic varieties instead of treating dialectal Arabic as one label.

Cross-dialect MT

Short financial-domain queries are translated from one Arabic variety into another.

Diagnostic MT evaluation

Span-level LQM-inspired annotations support error detection and classification.

Dialect Coverage

AlexandriaX centers country-level and sub-dialect variation across Arabic-speaking regions.

Subtask 1: Dialogue Translation, 13 Arabic varieties

Jordanian, Lebanese, Palestinian, Syrian, Saudi, Omani, Yemeni, Egyptian, Sudanese, Libyan, Moroccan, Mauritanian, Tunisian

Subtask 2: Cross-Dialect Arabic MT, 11 Arabic varieties

Modern Standard Arabic, Palestinian, Moroccan, Tunisian, Egyptian, Algerian, Lebanese, Yemeni, Omani, Saudi, Libyan

Subtask 3: MT Error Detection, 5 Arabic varieties

Egyptian, Emirati, Mauritanian, Moroccan, Palestinian

Shared Task Subtasks

Three complementary routes into dialectal Arabic MT.

Participants can build context-aware translation systems, cross-dialect Arabic MT systems, diagnostic error-analysis systems, or any combination of the three.

Subtask 1: Context-Aware English-to-Dialectal Arabic Dialogue Translation

Constrained: provided data only, <=5B parameters
Unconstrained: external resources allowed, no restrictions on model usage

Objective

Participants need to build a machine translation system that translates selected English dialogue turns into a specified dialectal Arabic variety. The system must use the provided context, such as previous dialogue turns, target country/dialect label, domain, speaker/addressee gender, and persona information when available. The output should preserve the original meaning while sounding natural and appropriate in the target dialect, including correct lexical, morphological, pragmatic, and sociolinguistic choices.
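The context fields listed above could be gathered into a single model input along these lines. This is a minimal sketch, not the official data format: the class names, field names, and prompt layout are all hypothetical, and the example text is taken from the Subtask 1 data example on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    # Hypothetical field names; the released data may use different keys.
    speaker_gender: str
    addressee_gender: str
    role: str
    text: str


@dataclass
class TranslationRequest:
    target_dialect: str
    domain: str
    source: Turn                                        # the turn to translate
    context: List[Turn] = field(default_factory=list)   # previous dialogue turns
    persona: Optional[str] = None                       # only when available


def describe(turn: Turn) -> str:
    """Render one turn together with its speaker/addressee metadata."""
    return f"[{turn.speaker_gender} -> {turn.addressee_gender}, {turn.role}] {turn.text}"


def build_prompt(req: TranslationRequest) -> str:
    """Assemble a plain-text prompt that exposes all available context."""
    lines = [f"Target dialect: {req.target_dialect}", f"Domain: {req.domain}"]
    if req.persona:
        lines.append(f"Persona: {req.persona}")
    if req.context:
        lines.append("Previous turns:")
        lines.extend("  " + describe(t) for t in req.context)
    lines.append("Turn to translate:")
    lines.append("  " + describe(req.source))
    return "\n".join(lines)


request = TranslationRequest(
    target_dialect="Moroccan",
    domain="Agriculture and farming",
    context=[Turn("Female", "Male", "Wholesale Buyer",
                  "For me, Hajj, this is more than just paper. I want to build "
                  "a long-term partnership with you, one that is based on trust.")],
    source=Turn("Male", "Female", "Farmer",
                "You speak the truth, my daughter. A contract is just ink, but "
                "trust is what matters. May God bless our work together."),
)
prompt = build_prompt(request)
print(prompt)
```

Keeping the gender and role metadata next to each turn lets a model pick gender-marked forms and appropriate forms of address, which the dialect label alone would not convey.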

Participants may enter either or both tracks:

Constrained track: use only the provided Alexandria dataset, with models up to 5B parameters.
Unconstrained track: external data, pretrained models, and larger models are all allowed, with no parameter limit.

Data

Training and development data for model building.
Private test set with hidden gold translations.

Evaluation

spBLEU is the primary metric; chrF++ is also reported.
Scores are reported as the overall average across all countries.
Constrained and unconstrained submissions will be separated later after participants submit their system descriptions. For now, both tracks will submit to the same CodaBench competition.
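One way to read "overall average across all countries" is an unweighted mean of per-country scores, so that every dialect counts equally regardless of test-set size. The sketch below assumes that reading; the official scripts may aggregate differently, and the scores shown are made-up placeholders rather than real results.

```python
from statistics import mean
from typing import Dict


def overall_average(per_country: Dict[str, float]) -> float:
    """Unweighted (macro) mean of per-country scores.

    Assumption: each country contributes equally to the overall score.
    """
    if not per_country:
        raise ValueError("need at least one per-country score")
    return mean(per_country.values())


# Placeholder spBLEU values for illustration only.
example_scores = {"EG": 20.0, "MA": 10.0, "SD": 15.0}
print(overall_average(example_scores))  # 15.0
```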

Data Split Sizes (in turns)

Split	EG	JO	LB	LY	MA	MR	OM	PS	SA	SD	SY	TN	YE
train	3108	5501	8906	0	2573	5515	6280	14933	8470	0	6071	2034	3089
dev	1113	1113	1118	0	1110	1114	1109	1110	1110	0	1119	1116	1118
Public Test	1118	1107	1106	1109	1115	1112	1118	1109	1113	1106	1114	1109	1106
Private Test (Test Phase)	1113	1109	1110	1309	1111	1119	1107	1111	1114	915	1114	1114	1113

Libya and Sudan do not include train or development sets, but they are still included in the test phase to evaluate how well participant systems generalize to these dialects.
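The held-out dialects can be read directly off the split table. The short sketch below copies the train and dev rows from the table above and lists the countries with no training or development turns; the variable names are illustrative.

```python
# Country codes and split sizes copied from the table above.
COUNTRIES = ["EG", "JO", "LB", "LY", "MA", "MR", "OM", "PS", "SA", "SD", "SY", "TN", "YE"]
TRAIN = [3108, 5501, 8906, 0, 2573, 5515, 6280, 14933, 8470, 0, 6071, 2034, 3089]
DEV = [1113, 1113, 1118, 0, 1110, 1114, 1109, 1110, 1110, 0, 1119, 1116, 1118]

# Countries that appear only at test time (no train and no dev turns).
zero_shot = [c for c, tr, dv in zip(COUNTRIES, TRAIN, DEV) if tr == 0 and dv == 0]
print(zero_shot)  # ['LY', 'SD']
```

Because no development data exists for these two countries, any per-dialect tuning has to rely on related dialects or on the public test set.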

Subtask 1 currently has two test splits: the public test set and the private test set. The public test set is already available on Hugging Face and can be used by participants to evaluate current systems. The private test set is reserved for the test phase of the subtask and will be used to evaluate final participant systems; evaluation on this split will be available only through CodaBench.

Subtask 2: Cross-Dialect Arabic Machine Translation

Arabic-to-Arabic MT · Financial-domain queries

Objective

Participants need to build a system that translates text from one Arabic variety into another Arabic variety. The input is a short financial-domain query written in a source Arabic variety, plus the required target dialect or variety. The system must generate a translation that keeps the same meaning, uses appropriate financial terminology, and sounds fluent and natural in the target dialect.

The task will test both familiar dialect pairs and unseen dialect-pair settings to measure robustness across Arabic varieties.
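For local robustness analysis, test items can be bucketed by whether their dialect pair occurs in the training data. The sketch below assumes that "unseen" means the ordered (source, target) pair never appears in training; the organizers may define the setting differently, and the pairs shown are invented for illustration.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source variety, target variety)


def split_by_seen(train_pairs: Iterable[Pair],
                  test_pairs: Iterable[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Partition test pairs into those seen in training and those not seen.

    Assumption: direction matters, so (A, B) seen does not make (B, A) seen.
    """
    seen_set = set(train_pairs)
    seen: List[Pair] = []
    unseen: List[Pair] = []
    for pair in test_pairs:
        (seen if pair in seen_set else unseen).append(pair)
    return seen, unseen


# Invented pairs; the real pair inventory comes from the released data.
train = [("MSA", "Moroccan"), ("MSA", "Saudi")]
test = [("MSA", "Moroccan"), ("Saudi", "Moroccan"), ("MSA", "Tunisian")]
seen, unseen = split_by_seen(train, test)
print(seen)    # [('MSA', 'Moroccan')]
print(unseen)  # [('Saudi', 'Moroccan'), ('MSA', 'Tunisian')]
```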

Data

Short financial-domain queries in source Arabic varieties.
Target dialect or variety labels for each input.

Evaluation

Target-side translation quality and dialectal naturalness.
Seen and unseen dialect-pair settings for robustness analysis.
Official metrics and ranking details will be released with the evaluation scripts.

Subtask 3: Dialectal Arabic MT Error Detection and Classification

Span prediction · Error classification · LQM-inspired typology

Objective

Participants need to build an evaluation system that analyzes machine translation outputs and identifies translation errors. The system must do two things: first, predict the exact word-level span in the translated text where an error occurs; second, assign an error category to each span using the provided LQM-inspired typology. Error classification categories are graphetics, morphosyntax, orthography_writing_conventions, pragmatics, semantics, and sociolinguistics.
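A prediction for this subtask pairs a word-level span with one of the six categories. The sketch below shows one possible in-memory representation. It assumes whitespace tokenization and inclusive word indices, neither of which is confirmed by the task description, and the `ErrorSpan` class and `span_text` helper are hypothetical. The placeholder tokens stand in for a real MT output.

```python
from dataclasses import dataclass
from typing import List

# Category names as listed in the task description.
CATEGORIES = {
    "graphetics",
    "morphosyntax",
    "orthography_writing_conventions",
    "pragmatics",
    "semantics",
    "sociolinguistics",
}


@dataclass(frozen=True)
class ErrorSpan:
    start: int      # index of the first word in the span (assumed inclusive)
    end: int        # index of the last word in the span (assumed inclusive)
    category: str   # one of CATEGORIES

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown error category: {self.category}")
        if not 0 <= self.start <= self.end:
            raise ValueError("span indices must satisfy 0 <= start <= end")


def span_text(tokens: List[str], span: ErrorSpan) -> str:
    """Return the words covered by a span."""
    if span.end >= len(tokens):
        raise IndexError("span extends past the end of the translation")
    return " ".join(tokens[span.start:span.end + 1])


tokens = "w0 w1 w2 w3 w4".split()       # placeholder for a tokenized MT output
span = ErrorSpan(1, 2, "semantics")
print(span_text(tokens, span))          # w1 w2
```

Validating the category at construction time catches label typos before submission rather than at scoring time.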

Data

Training, development, and test data for MT error detection and classification.
Gold span and error-class labels are withheld for official evaluation.

Evaluation

Exact Match F1.
Overlap F1.
Error Class Macro-F1.
Overall Score: average of Exact Match F1 and Error Class Macro-F1.
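The span-level metrics can be approximated locally as follows. This is a sketch under stated assumptions, not the official scorer: it treats spans as (start, end) word-index pairs with inclusive ends, counts a predicted span as an overlap hit if it shares at least one word with any gold span, and scores two empty sets as 1.0. Only the final averaging step is taken directly from the description above; Error Class Macro-F1 is passed in as a precomputed value.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) word indices, assumed inclusive


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match_f1(pred: Set[Span], gold: Set[Span]) -> float:
    """F1 where a predicted span counts only if it equals a gold span."""
    if not pred and not gold:
        return 1.0  # assumption: nothing to find and nothing predicted
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    return f1(tp / len(pred), tp / len(gold))


def overlaps(a: Span, b: Span) -> bool:
    """True if two inclusive spans share at least one word."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_f1(pred: Set[Span], gold: Set[Span]) -> float:
    """F1 where a span counts if it overlaps any span on the other side."""
    if not pred and not gold:
        return 1.0
    if not pred or not gold:
        return 0.0
    precision = sum(any(overlaps(p, g) for g in gold) for p in pred) / len(pred)
    recall = sum(any(overlaps(g, p) for p in pred) for g in gold) / len(gold)
    return f1(precision, recall)


def overall_score(exact_f1: float, class_macro_f1: float) -> float:
    """Average of Exact Match F1 and Error Class Macro-F1, as described above."""
    return (exact_f1 + class_macro_f1) / 2


gold = {(0, 1), (4, 6)}
pred = {(0, 1), (5, 6)}
print(exact_match_f1(pred, gold))  # 0.5
print(overlap_f1(pred, gold))      # 1.0
print(overall_score(0.5, 1.0))     # 0.75
```

The example shows why both span metrics are reported: a prediction that is off by one word scores zero under exact match but still counts under overlap.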

Data Distribution

Split	Overall	ENG_EGY	ENG_MAU	ENG_MOR	ENG_PAL	ENG_UAE
Train	1125	263	244	169	218	231
Dev	138	32	30	21	27	28
Test	145	34	31	22	28	30

Data Examples

Data Examples from AlexandriaX-2026 subtasks.

Subtask 1

Context-Aware Dialogue Translation

Domain: Agriculture and farming · Dialect: Moroccan Standard Darija Dialect

English Conversation

Female -> Male · Wholesale Buyer · Turn 1

For me, Hajj, this is more than just paper. I want to build a long-term partnership with you, one that is based on trust.

Male -> Female · Farmer · Turn 2

You speak the truth, my daughter. A contract is just ink, but trust is what matters. May God bless our work together.

Dialectal Translation

Female -> Male · Wholesale Buyer · Turn 1

بنسبة ليا، لحاج، هادشي كتر من غير ورقة. بغيت نبني معاك شراكة long-term، تكون مبنية على ثقة.

Male -> Female · Farmer · Turn 2

عندك لحق ا بنتي. ل contrat غير مداد، ولكن ثقة هي لي مهمة. الله يبارك فخدمتنا.

Subtask 2

Cross-Dialect Arabic Machine Translation

Domain: Financial services · Intent: card arrival · Source variety: Modern Standard Arabic

MSA Source

ماذا افعل إن لم أستلم بطاقتي الجديدة؟

Moroccan Translation

شنو ندير إلا ماوصلتنيش لاكارط جديدة ديالي؟

Saudi Translation

وش أسوي إذا ما استلمت بطاقتي الجديدة؟

Tunisian Translation

آش نعمل كان ما جاتنيش كارطتي الجديدة؟

Subtask 3

MT Error Detection and Classification

Direction: English to Moroccan dialect · Output: highlighted span | error type

Source

Okay, I'll close my mouth and sew it up with needle and thread. I will not speak at all.

MT Prediction with Tagged Errors

Important Dates

AlexandriaX-2026 shared-task timeline.

Task Launch

Release task website, documentation, and registration form.

May 16, 2026

Training and Development Release

Release task training/development data, baseline code, and evaluation scripts.

June 1, 2026

Registration Deadline and Blind Test Release

Teams register and receive blind test inputs for final evaluation.

July 20, 2026

Final System Output Deadline

Submission deadline for final predictions.

July 25, 2026

Final Results Released

Official rankings and diagnostic breakdowns shared with participants.

July 30, 2026

System Description Papers Due

Camera-ready system descriptions are due.

August 22, 2026

Shared Task Overview Paper Due

Overview paper describing the task, data, evaluation, and official results is due.

September 1, 2026

Conference Camera-ready Deadline

Final camera-ready materials due for the conference proceedings.

September 10, 2026

ArabicNLP / EMNLP Presentation Period

Shared task overview and participant systems presented during the conference period.

October 24-29, 2026

Participation

Multiple entry points for teams with different resources.

Teams may participate in any subset of the three subtasks.

Registration Process

Register as an individual or team.
Select the subtask and track you plan to participate in.
Join the Google Group for updates, clarifications, and evaluation announcements.
Register on the CodaBench competition page for official submissions.

Submission Requirements

Submit system outputs for the blind test set in the specified format.
Use only permitted data and model sizes for the constrained MT track.
Include enough system details for result verification.
Prepare a system description paper following ArabicNLP instructions.

Contact

Questions and announcements

Organizer Contact

For task questions, registration issues, or release coordination, contact the organizing team.

alexandriax2026@gmail.com

Updates

Join the Google Group for task updates, clarifications, and evaluation announcements.

alexandriax-2026 Google Group

Organizing Committee

AlexandriaX-2026 organizers

Abdellah El Mekki, The University of British Columbia (Canada)
AbdelRahim A. Elmadany, The University of British Columbia (Canada)
Samar M. Magdy, The University of British Columbia (Canada)
Saad Ezzini, King Fahd University of Petroleum and Minerals (Saudi Arabia)
Mo El-Haj, Lancaster University (United Kingdom) and VinUniversity (Vietnam)
Mustafa Jarrar, Hamad Bin Khalifa University (Qatar)
Samhaa El-Beltagy, Newgiza University (Egypt)
Mourad Abbas, High Council of Arabic (Algeria)
Fadi Zaraket, American University of Beirut (Lebanon)
Salim Al Mandhari, Lancaster University (United Kingdom)
Zaid Alyafeai, King Abdullah University of Science and Technology (Saudi Arabia)
Bernard Ghanem, King Abdullah University of Science and Technology (Saudi Arabia)
Muhammad Abdul-Mageed, The University of British Columbia (Canada)

AlexandriaX‑2026

Context-Aware Dialectal Arabic MT and MT Evaluation

Registration is open for AlexandriaX-2026.
