Part-of-Speech Tagging Issues in the Nepali Language

  1. English words transliterated into Devanagari are sometimes hard to tag because their status is often unclear, e.g., जोन (NNP) एंड (CC) कम्पनी (NN). Currently, when several such words together name a single entity, each constituent word is tagged separately.
  2. Some words, like पछि, have multiple functions, which makes them confusing to tag. In such cases, the tag is chosen from the context in which the word occurs. In घरपछि, पछि is a postposition (POP), and in चार वर्ष पछि, पछि is an adverb (RBO).
  3. The symbol “/” is problematic to tag, as in the numeric expression १/२ and the name pair राम/श्याम.
  4. The symbol “-” is problematic to tag, e.g., घर-परिवार.
  5. There is no provision for tagging negation words like न…न, meaning “neither…nor”.
  6. Postpositions like ले, लाइ, बाट, etc. are written attached to the word they follow, with no space in between, so they are handled morphologically: the word is split and each part gets its own tag. E.g., रामले is tagged as राम (NNP) ले (PLE).
  7. Compound verbs are tagged according to the last verb component they contain. E.g., खानुभायो <vbx>. Here the tag is chosen on the basis of भायो rather than खानु.
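
Points 2 and 6 above can be illustrated with a small Python sketch. This is only a rough illustration under stated assumptions, not the tagger actually used: the function names and the two lookup tables are hypothetical, and only the tags PLE, POP, and RBO and the three example phrases come from the list above. It assumes that a postposition attached to a stem is split off and tagged separately, and that पछि written as its own token is an adverb.

```python
# Hypothetical lookup tables. Only the tags and example words come from the
# post above; the tables themselves are assumptions made for illustration.
ATTACHED_TAGS = {"ले": "PLE", "पछि": "POP"}  # tag when written attached to a stem
STANDALONE_TAGS = {"पछि": "RBO"}             # tag when written as its own token


def segment_and_tag(token):
    """Split an attached postposition off a token and tag it.

    Returns a list of (surface, tag) pairs. The tag is None for any part
    that this sketch cannot tag (for example, the noun stem).
    """
    # A word standing alone gets its standalone tag (point 2).
    if token in STANDALONE_TAGS:
        return [(token, STANDALONE_TAGS[token])]
    # Otherwise look for an attached postposition, longest suffix first (point 6).
    for suffix in sorted(ATTACHED_TAGS, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            stem = token[: -len(suffix)]
            return [(stem, None), (suffix, ATTACHED_TAGS[suffix])]
    return [(token, None)]


def tag_sentence(sentence):
    """Apply segment_and_tag to every whitespace-separated token."""
    result = []
    for token in sentence.split():
        result.extend(segment_and_tag(token))
    return result


if __name__ == "__main__":
    for text in ["रामले", "घरपछि", "चार वर्ष पछि"]:
        print(text, "->", tag_sentence(text))
```

With these tables, रामले is split into राम plus ले (PLE), घरपछि into घर plus पछि (POP), and the free-standing पछि in चार वर्ष पछि is tagged RBO. Plain suffix matching will also split ordinary words that merely happen to end in the same letters, so a real system would probably need a lexicon or a morphological analyzer to confirm the stem.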
This question is marked "community wiki".

asked 09 Jun '20, 00:02

mani-rai ♦♦

wikified 09 Jun '20, 02:14
