The Tajik language, the official and national language of Tajikistan, serves as a cultural cornerstone for its speakers. Beyond Tajikistan’s borders, it is spoken in parts of Uzbekistan, Kazakhstan, and Kyrgyzstan, as well as among Tajik communities in Afghanistan and China.
Tajik belongs to the southwestern branch of the Iranian languages, which form part of the Indo-European family. It evolved from Old Persian and from various Eastern Iranian dialects that were widespread across Mawarannahr and Khorasan in the 9th and 10th centuries. In medieval texts from the 9th to the 11th centuries, the language was known as «Farsi Dari», «Farsi», or simply «Dari». This medieval language is the historical foundation of three modern literary languages: Tajik (in Tajikistan), Persian (in Iran), and Dari (in Afghanistan).
The development of the Persian Dari language can be divided into three main stages:
Old Persian (circa 9th century BCE – 4th century BCE): This early form of Persian was used during the Achaemenid Empire and was recorded in cuneiform script. The inscriptions, found on rock faces, walls, and columns, have been preserved with remarkable fidelity and are invaluable for historical research.
Middle Persian (3rd century BCE – 7th century CE): After the fall of the Achaemenid Empire, and especially with the rise of the Sassanian state, Middle Persian became the language of administration, literature, and the Zoroastrian religion. Written in Pahlavi script, it was instrumental in the cultural and literary development of the region.
New Persian (Dari) (9th century CE – present): During the Islamic era, Persian evolved under the influence of Arabic. The 9th century marked the beginning of literary Persian, which reached its zenith in the 10th century under Samanid rule. The earliest Persian texts written in Arabic script date to the latter half of the 9th century, and historical accounts from the period describe these early literary works.
The «Farsi Dari» language took shape under the influence of Eastern Iranian languages, Sogdian among them, particularly in cities such as Bukhara, the capital of the Samanid state. These contacts enriched it with features drawn from several Eastern Iranian languages.
During the 9th and 10th centuries, the Persian literary language developed alongside modifications to the Arabic script that made it a better fit for Persian phonology. In this period the language became entrenched in official and literary use, with the dialect of Bukhara playing a significant role in its development.
Despite significant regional dialectal differences, medieval literary Persian remained largely uniform. Until the 16th century, for instance, the literary languages of Mawarannahr and Khorasan were difficult to tell apart, because Tajik and Persian shared a single literary language. Persian later spread across Northern India, Eastern Turkestan, the Caucasus, Turkey, and Kurdistan, maintaining a consistent lexical and grammatical framework across these regions.
Arabic influence, which began with the Arab conquest and the spread of Islam, persisted long afterward and left a lasting imprint on Persian and, later, Tajik vocabulary. In the 19th century, the literary language began to align more closely with Tajik dialects, though these changes did not significantly alter established literary norms.
In the 20th century, the Persian language spoken in Central Asia was officially renamed Tajik. This change was linked to political developments in the early 1900s. The Tajik Soviet Socialist Republic was established in 1929, granting Tajik the status of a national language. The language underwent several script reforms, first adopting a Latin-based alphabet in 1929 and then switching to a Cyrillic alphabet in 1939. Today’s Tajik Cyrillic alphabet includes 35 letters tailored to the language’s phonetic structure.
Following Tajikistan’s independence, the Tajik language reclaimed its status as the state language, reasserting its importance a millennium after the fall of the Samanid state. The 1994 Language Law officially recognized Tajik as the state language of Tajikistan.
Currently, Tajik is spoken in four main dialect groups. These dialects vary significantly by region, so the Tajik of Samarkand is quite different from that spoken in Afghanistan.
Grammatically, Tajik is an analytic language with no grammatical case or gender. Relationships between words are expressed mainly through prepositions, postpositions, and word order, and agreement occurs primarily between the subject and the verb.
Tajik is also the native or sole language of several non-Tajik ethnic groups, including Central Asian Roma («Jugi»), Bukharan Jews, and Central Asian Arabs. Despite this wide usage, only about 8 million of the 20 million Tajiks worldwide are proficient in the language, which points to considerable linguistic diversity within the Tajik community.