Evaluation Phase
Submission Rules
General Rules
- Submit final outputs through the official CodaBench competition pages.
- Use the provided IDs exactly as released in the blind test files.
- Do not manually inspect or infer hidden gold labels during the evaluation phase.
- Teams may submit to one or more subtasks and tracks.
- Final rankings are produced only from official submissions made before the deadline.
Subtask 1 Output
- Provide one dialectal Arabic translation for each required English turn.
- Keep the source ID and the target dialect or country label unchanged, and include the predicted translation field.
- Constrained submissions must use only released Alexandria training and development data.
- Constrained models must not exceed 5B total parameters.
- Outputs are evaluated with spBLEU and chrF++, both overall and per country.
Subtask 2 Output
Subtask 3 Output
- Provide predicted word-level error spans in the translated text.
- Assign one error category from the official MQM-inspired typology to each span.
- Use the span-indexing convention specified in the evaluation scripts.
- Empty predictions should be represented using the official no-error format.
- Outputs are ranked by Labeled Span F1, with additional per-dialect and per-category breakdowns.
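For intuition about the ranking metric, here is a minimal sketch of a labeled span F1 computation. It is not the official scorer. It assumes that a predicted span counts as correct only when its start index, end index, and category all match a gold span exactly, and that scores are micro-averaged over all examples. The tuple layout, the function name `labeled_span_f1`, and the handling of the all-empty case are illustrative assumptions; the released evaluation scripts define the actual convention.

```python
from typing import Sequence, Set, Tuple

# Hypothetical span representation: (start_word, end_word, category).
# The official indexing convention is defined by the evaluation scripts.
Span = Tuple[int, int, str]


def labeled_span_f1(gold: Sequence[Set[Span]], pred: Sequence[Set[Span]]) -> float:
    """Micro-averaged F1 over exact (start, end, category) matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same examples")

    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)   # exact labeled matches
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but not predicted

    if tp + fp + fn == 0:
        # Assumption: no spans on either side counts as perfect agreement.
        return 1.0

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, an example with no errors is represented by an empty set on both sides and contributes nothing to the counts, so a system is not penalized for correctly predicting that a translation is clean.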
Verification Checklist
- Run the released validation script before uploading predictions.
- Confirm UTF-8 encoding and valid JSON/CSV formatting.
- Check that every test example has exactly the expected number of predictions.
- Keep a copy of the submitted system outputs and configuration for the paper.
- Email the organizers before the deadline if the platform rejects a valid file.
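The encoding, formatting, and count checks above can be sketched as a small local pre-check. This does not replace the released validation script. It assumes a submission stored as a JSON array of objects, each carrying a string `id` field; the file layout, the field name, and the function name `check_predictions` are hypothetical and should be adapted to the official format.

```python
import json
from collections import Counter
from typing import List


def check_predictions(raw: bytes, expected_ids: List[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    # 1. The file must decode as UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    # 2. The file must parse as JSON (assumed layout: a list of objects).
    try:
        records = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(records, list):
        return ["top-level JSON value is not a list"]

    # 3. Count predictions per ID ("id" is a hypothetical field name).
    problems = []
    counts = Counter()
    for i, rec in enumerate(records):
        if not isinstance(rec, dict) or not isinstance(rec.get("id"), str):
            problems.append(f"record {i} has no string 'id' field")
            continue
        counts[rec["id"]] += 1

    # 4. Every expected ID needs exactly one prediction, and no extras.
    expected = set(expected_ids)
    for ex_id in expected_ids:
        if counts[ex_id] != 1:
            problems.append(f"{ex_id}: expected 1 prediction, found {counts[ex_id]}")
    for got in counts:
        if got not in expected:
            problems.append(f"{got}: unknown ID")
    return problems
```

A typical use would read the prediction file in binary mode, pass its bytes along with the IDs from the blind test file, and print any reported problems before uploading.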