2022 nllp-1 13
et al., 2016) as the annotation framework. Each legal expert assigned one of the 13 Rhetorical roles to each document sentence. Note that we initially experimented with different levels of granularity (e.g., phrase level, paragraph level), and based on the pilot study, we decided to go for sentence-level annotations as it maintains the balance (from the perspective of topical coherence) between too short (having no labels) and too long (having too many labels) texts. Legal experts pointed out that a single sentence can sometimes represent multiple rhetorical roles (although this is not common). Each expert could also assign secondary and tertiary rhetorical roles to a single sentence to handle such scenarios (also App. B.4). As an example, suppose a

[Figure: row-normalized confusion matrices over the rhetorical role labels (FAC, ARG, PRE, STA, ROD, RLC, RPC, DIS, NON); panel (a) covers the IT domain.]
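Since a sentence may carry up to three roles in priority order (primary, secondary, tertiary; App. B.4), a per-sentence annotation record can be kept as an ordered list. A minimal sketch, where the coarse label set and the record layout are illustrative assumptions and not the authors' annotation tooling:

```python
# Minimal sketch of a sentence-level record allowing a primary role
# plus optional secondary and tertiary roles (App. B.4). The coarse
# label set and the record layout are illustrative assumptions.
ROLES = {"FAC", "ARG", "STA", "PRE", "ROD", "RLC", "RPC", "DIS", "NON"}

def annotate(sentence, *roles):
    """Attach one to three rhetorical roles, in priority order."""
    if not 1 <= len(roles) <= 3:
        raise ValueError("a primary role plus at most two extra roles")
    unknown = set(roles) - ROLES
    if unknown:
        raise ValueError("unknown roles: %s" % sorted(unknown))
    return {"sentence": sentence, "roles": list(roles)}
```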
(LSP), that aims to model the relationship between [...]

[...] the full model at the inference time, the true label [...]

[Figure: the LSP BiLSTM-CRF architecture (sentence encoders E1 and E2 over s1 ... sn feeding BiLSTM layers topped by a CRF), shown alongside the plot of macro F1 (0.56 to 0.60) against λ (0.1 to 0.9) for the IT and IT+CL settings referenced in App. D.5 (Figure 4).]
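The CRF layer on top of the BiLSTM in the architecture above decodes the most likely role sequence with Viterbi over emission and transition scores. A minimal pure-Python sketch in log-space; the scores here are hand-set toy values, whereas the paper's models learn them:

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one {label: score} dict per sentence (log-space).
    transitions: {(prev_label, label): score} pairwise scores.
    Toy stand-in for CRF decoding; real scores come from the BiLSTM.
    """
    # Initialize with the first sentence's emission scores.
    best = {lab: (emissions[0][lab], [lab]) for lab in labels}
    for emit in emissions[1:]:
        step = {}
        for lab in labels:
            # Extend the best path ending in each previous label.
            score, path = max(
                (best[prev][0] + transitions[(prev, lab)] + emit[lab],
                 best[prev][1])
                for prev in labels
            )
            step[lab] = (score, path + [lab])
        best = step
    return max(best.values())[1]
```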
Joseph L Fleiss, Bruce Levin, and Myunghee Cho Paik. 2013. Statistical methods for rates and proportions. John Wiley & Sons.

Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, and Maosong Sun. 2018. Few-shot charge prediction with discriminative legal attributes. In Proceedings of the 27th International Conference on Computational Linguistics, pages 487-498, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Vijit Malik, Rishabh Sanjay, Shubham Kumar Nigam, Kripabandhu Ghosh, Shouvik Kumar Guha, Arnab Bhattacharya, and Ashutosh Modi. 2021. ILDC for CJPE: Indian legal documents corpus for court judgment prediction and explanation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4046-4062, Online. Association for Computational Linguistics.

Marie-Francine Moens, Caroline Uyttendaele, and Jos Dumortier. 1999. Abstracting of legal cases: the potential of clustering based on the selection of representative objects. Journal of the American Society for Information Science, 50(2):151-161.

National Judicial Data Grid. 2021. National judicial data grid statistics. https://www.njdg.ecourts.gov.in/njdgnew/index.php.

Isar Nejadgholi, Renaud Bougueng, and Samuel Witherspoon. 2017. A semi-supervised training method for semantic search of legal facts in Canadian immigration cases. In JURIX, pages 125-134.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.

M Saravanan, Balaraman Ravindran, and S Raman. 2008. Automatic identification of rhetorical roles using conditional random fields for legal document summarization. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I.

Jaromir Savelka and Kevin D Ashley. 2018. Segmenting US court decisions into functional and issue specific parts. In JURIX, pages 111-120.

Stavroula Skylaki, Ali Oskooei, Omar Bari, Nadja Herger, and Zac Kriegman. 2021. Legal entity extraction using a pointer generator network. In 2021 International Conference on Data Mining Workshops (ICDMW), pages 653-658.

Adam Wyner, Raquel Mochales-Palau, Marie-Francine Moens, and David Milward. 2010. Approaches to text mining arguments from legal cases. In Semantic processing of legal texts, pages 60-79. Springer.

Adam Z Wyner, Wim Peters, and Daniel Katz. 2013. A case study on legal case annotation. In JURIX, pages 165-174.

Qizhe Xie, Minh-Thang Luong, Eduard Hovy, and Quoc V Le. 2020. Self-training with noisy student improves ImageNet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10687-10698.

Wenmian Yang, Weijia Jia, Xiaojie Zhou, and Yutao Luo. 2019. Legal judgment prediction via multi-perspective bi-feedback network. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 4085-4091. International Joint Conferences on Artificial Intelligence Organization.

Hai Ye, Xin Jiang, Zhunchen Luo, and Wenhan Chao. 2018. Interpretable charge predictions for criminal cases: Learning to generate court views from fact descriptions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1854-1864, New Orleans, Louisiana. Association for Computational Linguistics.
Appendix

A Ethical Considerations

The proposed corpus and methods do not have direct ethical consequences to the best of our knowledge. The corpus is created from publicly available data from a public resource: www.indiankanoon.org. The website allows free downloads, and no copyrights were violated. With the help of law professors, we designed a course project centered around RR annotations for the student annotators. The students voluntarily participated in the annotations as a part of the course project. Moreover, annotators were curious about learning about AI technologies and further contributing towards their progress. There was no compulsion to take part in the annotation activity.

The cases were selected randomly to avoid bias towards any entity, situation, or laws. Any meta-information related to individuals, organizations, and judges was removed so as to avoid any introduction of bias. Regarding the application of the corpus to the judgment prediction task, we are not the first to do judgment prediction. For the task, we took all the steps (name anonymization and removal of meta-information) outlined in the already published work of Malik et al. (2021). The focus of this paper is rhetorical role prediction, and the task of judgment prediction is only a use case. Moreover, in this paper we focus mainly on IT and CL cases, where facts and scenarios are more objective and there are fewer biases compared to other types of cases (e.g., criminal and civil cases). As also described by Malik et al. (2021), we do not believe that the task could be fully automated, but rather that it could augment the work of a judge or legal practitioner to expedite the legal process in highly populated countries.

Legal-NLP is a relatively new area; we have taken all the steps to avoid any direct and foreseeable ethical implications; however, a lot more exploration is required by the research community to understand implicit ethical implications. For this to happen, resources need to be created, and we are making initial steps and efforts towards it.

B Dataset and Annotations

B.1 Data Collection and Preprocessing

The IT and CL cases come from the Supreme Court of India and the Bombay and Kolkata High Courts. For CL cases, we use the cases from the tribunals of NCLAT (National Company Law Appellate Tribunal) [2], CCI (Competition Commission of India) [3], and COMPAT (Competition Appellate Tribunal) [4]. Since the IT laws are 50 years old and relatively dynamic, we stick to certain sections of the IT domain only, whereas we use all the sections for the CL domain. We restrict ourselves to IT cases that are based on Section 147, Section 92C and Section 14A only, to limit the subjectivity in cases. We randomly select 50 cases from the IT and CL domains each to be annotated. We used regular expressions in Python to remove the auxiliary information in the documents (for example: dates, appellant and respondent names, judge names, etc.) and filter out the main judgment of the document. We use the NLTK [5] sentence tokenizer to split the document into sentences. The annotators were asked to annotate these sentences with the rhetorical roles.

B.2 Annotators Details

With the help of law professors, we designed a course project centered around RR annotations for the student annotators. The students voluntarily participated in the annotations as a part of the course project. Moreover, annotators were curious about learning about AI technologies and further contributing towards their progress. There was no compulsion to take part in the annotation activity. The 6 annotators come from an Indian Law University. Three of them specialize in the Income Tax domain and the other three specialize in the Competition Law domain.

B.3 Rhetorical Roles

We provide the definition of each Rhetorical Role in the main paper. Examples for each RR are given in Table 15. Figure 5 provides the number of sentences for each label in the IT and CL datasets. Note that the representation of both domains is similar, with the exception of the DIS label.

B.4 Secondary and Tertiary Annotation Labels

Legal experts pointed out that a single sentence can sometimes represent multiple rhetorical roles (although this is not common). Each expert could also assign secondary and tertiary rhetorical roles to a single sentence to handle such scenarios and

[2] https://nclat.nic.in/
[3] https://www.cci.gov.in/
[4] http://compatarchives.nclat.nic.in
[5] http://www.nltk.org/
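The cleanup and sentence splitting described in App. B.1 can be sketched as follows. The paper does not give its actual regular expressions, so the patterns below are illustrative assumptions, and a naive regex splitter stands in for the NLTK sentence tokenizer:

```python
import re

# Illustrative cleanup in the spirit of App. B.1. The paper's actual
# regular expressions are not given, so these patterns are assumptions.
DATE_RE = re.compile(r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}\b")
HEADER_RE = re.compile(r"^(Appellant|Respondent|Judge)s?\s*:.*$", re.MULTILINE)

def clean_judgment(text):
    """Strip auxiliary metadata: dates and party/judge header lines."""
    text = HEADER_RE.sub("", text)
    text = DATE_RE.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", text).strip()

def split_sentences(text):
    """Naive stand-in for the NLTK sentence tokenizer used in the paper."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
```

In practice `nltk.sent_tokenize` handles abbreviations far better than the naive splitter above; the sketch only shows the pipeline shape.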
[Figure 5: Distribution of RR labels in IT and CL documents. Bar chart of the number of sentences per label (FAC, ARG, STA, PRE, ROD, RLC, RPC, NON, DIS) for the IT and CL datasets.]

[Figure 6, panel (a): confusion matrix between annotators A1 and A2 for the IT domain (raw sentence counts over the role labels).]
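Per-label counts like those plotted in Figure 5 can be tallied directly from the annotated corpus. A minimal sketch, assuming sentences are stored as (sentence, primary_role) pairs (the storage layout is an assumption):

```python
from collections import Counter

def label_distribution(annotated):
    """Tally sentences per rhetorical role, as plotted in Figure 5.

    `annotated` is a list of (sentence, primary_role) pairs.
    """
    return Counter(role for _, role in annotated)
```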
motivate future research. On average, annotators assigned a secondary role in 5-7% of cases and a tertiary role in 0.5-1% of cases.

We report the label-wise inter-annotator agreement (averaged pairwise macro F1 between annotators) upon the 13 fine-grained labels in Table 9. Also, we provide the pairwise confusion matrices of annotators (A1, A2) and (A2, A3) for both the IT and CL domains in Figure 6.

F1: 0.73 0.88

Table 9: Label-wise inter-annotator agreement for all 13 fine-grained labels.

[Figure 6: Confusion matrix between Annotators for IT and CL domains. Remaining panels: (b) and (d), between annotators A2 and A3.]
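The agreement numbers in Table 9 are averaged pairwise macro F1 scores between annotators, treating one annotator's labels as the reference and the other's as predictions. A self-contained sketch of that computation (the toy label set in the usage below is illustrative):

```python
from itertools import combinations

def macro_f1(gold, pred, labels):
    """Macro-averaged F1 of `pred` against `gold` over a fixed label set."""
    per_label = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_label.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(per_label) / len(per_label)

def pairwise_agreement(annotations, labels):
    """Average macro F1 over all annotator pairs (one side taken as
    reference, the other as prediction; F1 is symmetric under the swap)."""
    scores = [macro_f1(a, b, labels) for a, b in combinations(annotations, 2)]
    return sum(scores) / len(scores)
```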
D.2 Single Sentence Classification Baselines

We train single sentence classification models for the task of rhetorical role labelling. We use BERT-base-uncased and Legal-BERT models and fine-tune them on the sentence classification task. We also try a variant that uses context sentences (the left and the right sentence) along with the current sentence to make the classification; we call this method BERT-neighbor. We use CrossEntropyLoss as the criterion and Adam as the optimizer. We use a batch size of 32 with a learning rate of 2e-5 and fine-tune for 5 epochs in all our experiments. Refer to Tables 10, 12 and 11 for results and more information about the hyperparameters.

D.3 Sequence Classification Baselines

[...] we used shift embeddings for the shift component and BERT embeddings for the RR component. We used the NLL loss in both components of the MTL model, weighted by the hyperparameter λ. We use the Adam optimizer for training. We provide dataset-wise hyperparameters and results in Tables 10, 12 and 11.

D.5 Hyperparameter λ

We tuned the hyperparameter λ of the MTL loss function on the validation set. We trained the MTL model with λ ∈ [0.1, 0.9] in strides of 0.1 and show the performance of our method on the IT and IT+CL datasets in Figure 4. λ = 0.6 performs best for the IT domain and also performs competitively on the combined domains.
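The λ-weighted combination of the two NLL components can be sketched in plain Python. The text says only that both components are weighted by λ, so the convex combination below is an assumption about the exact form; λ = 0.6 is the value reported to work best on the IT domain:

```python
import math

def nll(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

def mtl_loss(shift_probs, shift_target, rr_probs, rr_target, lam=0.6):
    """Combine the label-shift and RR losses with weight lambda.

    The convex mix below is an assumed form of the lambda weighting;
    the paper states only that both NLL components are weighted by it.
    """
    return lam * nll(shift_probs, shift_target) + (1 - lam) * nll(rr_probs, rr_target)
```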
Model | Hyperparameters | IT+CL (Macro F1)
(E = epochs, LR = learning rate, BS = batch size, Dim = embedding dimension, E1 = embedding dimension (Shift), E2 = embedding dimension (RR), H = hidden dimension)

BiLSTM-CRF (sent2vec) | LR=0.01, BS=40, Dim=200, H=100, E=300 | 0.65
BiLSTM-CRF (BERT) | LR=0.01, BS=40, Dim=768, H=384, E=300 | 0.63
LSP-BiLSTM-CRF (BERT-SC) | LR=0.005, BS=20, Dim=2304, H=1152, E=300 | 0.67
MTL-BiLSTM-CRF (BERT-SC) | LR=0.005, BS=20, E1=2304, E2=768, H=1152 (Shift), H=384 (RR), E=300 | 0.70
MTL-BiLSTM-CRF (BERT-SC) | LR=0.005, BS=20, E1=2304, E2=2304, H=1152 (Shift), H=384 (RR), E=300 | 0.68
MTL-BiLSTM-CRF (BERT-SC) | LR=0.005, BS=20, E1=768, E2=768, H=1152 (Shift), H=384 (RR), E=300 | 0.65
Model | Hyperparameters | CL (Macro F1)
(E = epochs, LR = learning rate, BS = batch size, Dim = embedding dimension, E1 = embedding dimension (Shift), E2 = embedding dimension (RR), H = hidden dimension)

BERT | LR=2e-5, BS=32, E=5 | 0.52
BERT-neighbor | LR=2e-5, BS=32, E=5 | 0.51
Legal-BERT | LR=2e-5, BS=32, E=5 | 0.53
CRF (handcrafted) | LR=0.01, BS=40, Dim=172, E=300 | 0.52
BiLSTM (sent2vec) | LR=0.01, BS=40, Dim=200, H=100, E=300 | 0.54
BiLSTM-CRF (handcrafted) | LR=0.01, BS=40, Dim=172, H=86, E=300 | 0.56
BiLSTM-CRF (sent2vec) | LR=0.01, BS=40, Dim=200, H=100, E=300 | 0.61
BiLSTM-CRF (BERT emb) | LR=0.01, BS=40, Dim=768, H=384, E=300 | 0.63
BiLSTM-CRF (MLM emb) | LR=0.01, BS=40, Dim=768, H=384, E=300 | 0.60
LSP (SBERT) | LR=0.005, BS=40, Dim=2304, H=1152, E=300 | 0.63
LSP (BERT-SC) | LR=0.005, BS=40, Dim=2304, H=1152, E=300 | 0.68
MTL (MLM emb) | LR=0.005, BS=20, E1=2304, E2=768, H=1152 (Shift), H=384 (RR), E=300 | 0.67
MTL (BERT-SC) | LR=0.005, BS=20, E1=2304, E2=768, H=1152 (Shift), H=384 (RR), E=300 | 0.69
MTL (BERT-SC) | LR=0.005, BS=20, E1=2304, E2=2304, H=1152 (Shift), H=384 (RR), E=300 | 0.67
MTL (BERT-SC) | LR=0.005, BS=20, E1=768, E2=768, H=1152 (Shift), H=384 (RR), E=300 | 0.64

Table 14: Judgment Prediction results using predicted ROD & RPC
Label | Sentence

Fact | It has also been alleged that the copies of the notices were also sent, inter alia, to the principal officer of the said company and also to the ladies as mentioned herein before, who has sold the immovable property in question.
Fact | For executing this contract, the assessee entered into various contracts -Offshore Supply contract and Offshore Service Contracts.
Ruling by Lower Court | But the words inland container depot were introduced in Section 2(12) of the Customs Act, 1962, which defines customs port.
Ruling by Lower Court | We may also mention here that the cost of superstructure was Rs. 2,22,000 as per the letter of the assessee dated 28-11-66 addressed to the ITO during the course of assessment proceedings.
Argument | Such opportunity can only be had by the disclosure of the materials to the court as also to the aggrieved party when a challenge is thrown to the very existence of the conditions precedent for initiation of the action.
Argument | In this connection, it was urged on behalf of the assessee(s) that, for the relevant assessment years in question, the Assessing Officer was required to obtain prior approval of the Joint Commissioner of Income Tax before issuance of notice under Section 148 of the Act.
Statute | In the meantime, applicant has to pay the additional amount of tax with interest without which the application for settlement would not be maintainable.
Statute | On the other hand, interest for defaults in payment of advance tax falls under section 234B, apart from sections 234A and 234C, in section F of Chapter XVII.
Ratio of the Decision | The State having received the money without right, and having retained and used it, is bound to make the party good, just as an individual would be under like circumstances.
Ratio of the Decision | Therefore, the Department is right in its contention that under the above situation there exists a Service PE in India (MSAS).
Ruling by Present Court | For these reasons, we hold that the Tribunal was wrong in reducing the penalty imposed on the assessee below the minimum prescribed under Section 271(1)(iii) of the Income-tax Act, 1961.
Ruling by Present Court | Hence, in the cases arising before 1.4.2002, losses pertaining to exempted income cannot be disallowed.
Precedent | Yet he none the less remains the owner of the thing, while all the others own nothing more than rights over it.
Precedent | I understand the Division Bench decision in Commissioner of Income-tax v. Anwar Ali, only in that context.
None | Leave granted.
None | There is one more way of answering this point.
Dissent | Therefore a constructive solution has to be found out.
Dissent | In the light of the Supreme Court decision in the case of CCI vs SAIL (supra) this issue has to be examined.

Table 15: Examples for each of the Rhetorical Roles.