Evaluation Phase

Submission Rules

General Rules

  • Submit final outputs through the official CodaBench competition pages.
  • Use the provided IDs exactly as released in the blind test files.
  • Do not manually inspect or infer hidden gold labels during the evaluation phase.
  • Teams may submit to one or more subtasks and tracks.
  • Final rankings are produced only from official submissions made before the deadline.

Subtask 1 Output

  • Provide one dialectal Arabic translation for each required English turn.
  • Include the source ID, the target dialect or country label, and the predicted translation in every output record.
  • Constrained submissions must use only released Alexandria training and development data.
  • Constrained models must not exceed 5B total parameters.
  • Outputs are scored with spBLEU and chrF++, both overall and per country.
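
A minimal sketch of assembling Subtask 1 output records, assuming a JSON Lines layout. The field names "id", "dialect", and "translation" and both helper functions are hypothetical; the released blind test files and the CodaBench page define the real schema.

```python
import json


def build_records(source_rows, translations):
    """Pair each blind-test row with its predicted translation.

    source_rows: list of dicts with hypothetical keys "id" and "dialect".
    translations: dict mapping id -> predicted translation string.
    """
    records = []
    for row in source_rows:
        rid = row["id"]
        if rid not in translations:
            raise KeyError(f"missing translation for id {rid!r}")
        records.append({
            "id": rid,                     # copied unchanged from the test file
            "dialect": row["dialect"],     # target dialect or country label
            "translation": translations[rid],
        })
    return records


def to_jsonl(records):
    # ensure_ascii=False keeps Arabic script as-is instead of \u escapes.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"
```

Iterating over the source rows rather than over the predictions keeps the released IDs and their order intact and fails loudly when a turn has no translation.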

Subtask 2 Output

  • TBD

Subtask 3 Output

  • Provide predicted word-level error spans in the translated text.
  • Assign one error category from the official MQM-inspired typology to each span.
  • Use the span-indexing convention specified in the evaluation scripts.
  • Represent translations with no predicted error spans using the official no-error format.
  • Outputs are ranked by Labeled Span F1 with additional dialect and category breakdowns.
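
For local sanity checks, Labeled Span F1 can be approximated as below. This sketch assumes micro-averaging and exact matching on (start, end, category) triples; the official evaluation scripts may use a different matching rule or indexing convention, and the function name and category labels are hypothetical.

```python
def labeled_span_f1(gold, pred):
    """Micro-averaged F1 over exact (start, end, category) matches.

    gold, pred: dict mapping example id -> iterable of (start, end, category).
    """
    tp = fp = fn = 0
    for ex_id in set(gold) | set(pred):
        g = set(gold.get(ex_id, ()))
        p = set(pred.get(ex_id, ()))
        tp += len(g & p)   # same span boundaries and same category
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but not predicted
    if tp + fp + fn == 0:
        # Assumption: no gold spans and no predicted spans counts as perfect.
        return 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this definition a span with correct boundaries but the wrong category counts as both a false positive and a false negative.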

Verification Checklist

  • Run the released validation script before uploading predictions.
  • Confirm UTF-8 encoding and valid JSON/CSV formatting.
  • Check that every test example has exactly the expected number of predictions.
  • Keep a copy of the submitted system outputs and configuration for the paper.
  • Email the organizers before the deadline if the platform rejects a valid file.
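
The released validation script is the authoritative check. As an illustration of what the checklist covers, the sketch below tests UTF-8 decoding, per-line JSON validity, and ID coverage for a JSON Lines file. It assumes one prediction per test ID and a hypothetical "id" field; `check_submission` is not part of any official tooling.

```python
import json
from collections import Counter


def check_submission(raw_bytes, expected_ids, id_key="id"):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    problems = []
    seen = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or id_key not in record:
            problems.append(f"line {lineno}: no {id_key!r} field")
            continue
        seen.append(record[id_key])

    counts = Counter(seen)
    expected = set(expected_ids)
    for missing in sorted(expected - set(counts), key=str):
        problems.append(f"missing prediction for id {missing!r}")
    for extra in sorted(set(counts) - expected, key=str):
        problems.append(f"unexpected id {extra!r}")
    for dup, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        if n > 1:
            problems.append(f"duplicate id {dup!r} appears {n} times")
    return problems
```

Reading the file as bytes and decoding explicitly surfaces encoding problems before any parsing happens, which matters for Arabic-script output.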