



Biologically Inspired Cognitive Architectures xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Biologically Inspired Cognitive Architectures


journal homepage: www.elsevier.com/locate/bica

Research article

CIT: Integrated cognitive computing and cognitive agent technologies based cognitive architecture for human-like functionality in artificial systems
A. Chandiok⁎,1, D.K. Chaturvedi⁎⁎,2
Faculty of Engineering, Dayalbagh Educational Institute, Dayalbagh, Agra, Uttar Pradesh 282005, India

A R T I C L E  I N F O

Keywords:
Cognitive agent
Cognitive computing
Architecture
Human-like functionality
Experiential information
Decision

A B S T R A C T

The paper proposes a novel cognitive architecture that combines cognitive computing and cognitive agent technologies for performing human-like functionality. The system architecture is known as CIT (Cognitive Information Technology). This design takes advantage of cognitive computing to handle Experiential Information (EI) using audio processing, computer vision, natural language processing, text mining, and data mining techniques. The CIT architecture includes human-like cognitive agent functionality comprising attention, learning, memory, action selection, and action to handle human-like individual and distributed knowledge bases to create rational decisions. The work shows a practical implementation of the CIT architecture through the "CIT framework" developed in C# and Python. For validating the system performance, the paper shows a CIT based Object Recognition and Question Answering System. This framework is anticipated to advance the quality of artificial intelligent agent based decision-making using human-like perception, comprehension and action skills, reducing real world business errors and assuring correct, accurate, knowledgeable and well-timed human-like decisions.

Introduction

In today's world, researchers are doing considerable work in developing artificial systems, but present designs still show the absence of a necessary and sufficient illustration of human-like functionality (Samsonovich, 2012). The central focus of Artificial Intelligence is on creating human-like "Super Intelligent Systems" that can help us in making life better in terms of economic and social values. So, researchers selected "Agent theory" to achieve the target of creating super intelligent systems (Seidita & Chella, 2017). With the continual growth of agent concepts inside the modern intelligent system era, a vital need is felt for "Cognitive Architectures" to handle such complex agent based systems that provide human mind like abilities (Arsene & Dumitrache, 2017). Placing this work in the current state of the art in the area of artificial cognitive systems, it is essential to concentrate on the fact that there is an urgent requirement for scientific research targeted at developing human-like sense, comprehend and act abilities using "Cognitive Computing" technologies. It is necessary to nurture a new way of architectural development that leads to substantial progress of specific cognitive agents with the skills to serve in an uncertain environment (Tweedale, 2014). So, the paper proposes the CIT (Cognitive Information Technology) architecture for developing artificial systems based on Cognitive Computing and Cognitive Agent technology that can handle real world data, link it, and assimilate it with knowledge to solve problems like humans.

Objective of the work

The purpose of writing this research paper is:

• Firstly, to have a comparative state of the art review of previous architecture development works focusing on their advantages, as well as limitations, to improve the proposed CIT architecture.
• Secondly, to address human-like functionality and its necessity in a cognitive design to have better perception ability, deliberative knowledge handling, and higher level cognition and meta cognition to make the best action selection and its implementation.
• Thirdly, to propose and define the CIT (Cognitive Information Technology) architecture to remove the problems of previous cognitive system architectural frameworks.


⁎ Principal corresponding author.
⁎⁎ Corresponding author.
E-mail address: achandiok@gmail.com (A. Chandiok).
1 Ashish Chandiok is working in the field of artificial cognitive systems.
2 D.K. Chaturvedi is working in the field of soft computing, cognitive and conscious systems.

https://doi.org/10.1016/j.bica.2018.07.020
Received 18 January 2018; Received in revised form 28 July 2018; Accepted 31 July 2018
2212-683X/ © 2018 Elsevier B.V. All rights reserved.


• Fourthly, to validate and test the proposed design. So, the paper presents a point to point emphasis on the experimental work on a real-world application, taking a case study of a Cognitive Computing Agent-based Humanoid named "CITHCog" utilizing the power of the CIT architecture and its software framework. The application developed is an Object Recognition and Question/Answering system. Lastly, the authors look at a general discussion on the prospects of such designs.

Cognitive computing agent based systems have human expert like sensing, comprehending and acting abilities to assist in best decision making through timely access to digital and electronically stored experiential knowledge (Modha et al., 2011). These computing agents help to improve social and business services. A cognitive computing agent interacts with expert users and stored electronic records as well as knowledge bases to obtain facts as input and delivers clues, warnings, or advice for problem solving, long-term care scheduling, and other facets (Sancin, Dobravc, & Dolšak, 2010). A cognitive computing system needs the knowledge and information stored in data and knowledge bases for cognitive decision-making (Peña-Ayala & Mizoguchi, 2012). This paper tries to build a general cognitive computing agent based architecture having human-like functionality. The proposed archetype will take an instruction query and initial situation inputs from the user and will offer a decision in a reasonable form founded on existing knowledge. It will assimilate consistent knowledge bases from domain experts and Business Practice Guidelines (BPG) knowledge with practical information that is extracted continuously from EDR databases and deliver appropriate decision support (Miller, McGuire, & Feigh, 2017).

Review of past cognitive architectures

Regardless of comprehensive research in the area of artificial intelligence over many years, no cognitive system has been built that can autonomously attain skills and the superiority of human-like sensing, brainpower, understanding and decision making (Goertzel, 2014). In the current era, designing and realizing an artificial cognitive system is a tedious task (Thórisson & Helgason, 2012). The main reason behind such complex work is that the system must have human-like unified functionality representing sensing, attention, motivation, social interaction, emotion, actuation, communication, self/other knowing, learning, memory, knowledge development, planning, goal-based reasoning, decision making as well as creation/building and quantitative operations (Adams et al., 2012; Lieto, Chella, & Frixione, 2017; Samsonovich, 2014). These features are realized in artificial systems with cognition skills only by using cognitive architectures and not by individually creating artificial intelligence tools. Researchers are working hard and developing various cognitive architectures and frameworks to attain human-like real-world problem-solving abilities (Kirk, Mininger, & Laird, 2016; Vernon, Metta, & Sandini, 2007). So, it is necessary to have a state of the art review of the previously proposed cognitive architectures. While performing the review process, the primary concentration is to find functionality that is missing in the handful of stable and active architectures, among which are SOAR, ACT-R, NARS, ICARUS, CLARION, EPIC, and LIDA as stated in Duch, Oentaryo, and Pasquier (2008). The focus also goes to ongoing research works in current cognitive architectures like MDB (Bellas, Duro, Faina, & Souto, 2010), CogPrime (Goertzel et al., 2013), Sigma (Pynadath, Rosenbloom, & Marsella, 2014), MLECOG (Starzyk et al., 2017), eBICA (Samsonovich, 2013), and ECA (Georgeon, Marshall, & Manzotti, 2013). The primary emphasis is to improve the design of CIT (Cognitive Information Technology) concentrating on the lacunas of previous cognitive principle philosophies in constructing such types of systems.

A cognitive architecture is a blueprint design to build an artificial cognitive agent by endorsing a decision process like a human. A cognitive architecture proposes the approach to create a framework to practically model the human-like behavioral and structural functions of the cognitive system.

According to Newell (1990) and Anderson and Lebiere (2003), a cognitive architecture is a particular scope of universal domain design for modeling computational cognition, by assembling the necessary configuration and procedure of mind to be used for creating general linguistic skills, plastic behavior, real-time process, brain-like comprehension, rationality, a sizable symbolic knowledge base, human-like learning, development, and self-awareness.

Sun (2004) claims the desiderata should be more expansive for explaining cognitive functions, including ecological realism, bio-evolutionary realism, cognitive realism, modularity, adaptation, routines, and synergism, as well as interaction, also describing these measures and applying them to a variety of cognitive architectures. Ron Sun focuses on the absence of evidently distinct cognitive conventions and procedural methodologies. Also, Samsonovich (2010), while reviewing 26 cognitive architectures, claims that a cognitive architecture is an approach to the planning of infrastructure for a higher level computational system, principally for mimicking or modeling human cognition inside artificial structures.

Laird (2012) deliberates on the difference between cognitive architectures and software systems. Both cognitive architectures and software include a proper process for data representation, memory storage, control components, as well as input/output devices. The main variation comes where the former delivers merely a static model for specific computation, while, on the other hand, cognitive architecture based systems autonomously transform over mind like anatomy expansion and resourcefully use knowledge to achieve different tasks rather than fixed problem solving (Tenorth & Beetz, 2017).

Researchers worked hard in the past proposing cognitive architectures and are still developing numerous designs for completing one unified mission, which is to build human-like cognitive abilities inside artificial systems.

The pioneer work in creating a cognitive system based on architectures points to SOAR (Laird, 2008), denoting a symbolic design, which consists of various cognitive modular functional blocks like long-term memories (semantic, procedural, and episodic), short-term (working) memory, and cognitive processing of symbolic input data (Persiani, Franchi, & Gini, 2018). For every decision making cycle the processor searches long-term memories for relevant knowledge based on goals. SOAR implements production rules while attaining information from the long-term memories in achieving the best choice. Under conditions when it has inadequate data to tackle a real-world problem, it generates smaller goals known as sub-goals. The SOAR development process has been going on for the last 35 years focusing on the symbolic handling of knowledge, and its present version is SOAR9 (Gudwin et al., 2017; Laird, Kinkade, Mohan, & Xu, 2012). SOAR lags in an efficient attention mechanism for doing perception. Therefore it performs processing in the abstract based on rules and symbolic knowledge in every decision cycle. Though it looks perfect to validate cognitive systems based on knowledge rules and abstract knowledge, humans do not take decisions entirely on the intangible stored rules. So, in CIT, based on cognitive computing and agent technologies, the decision is based on Experiences rather than procedures. Also, in CIT the real-world inputs are through Attention.

ACT-R is currently going with its latest version ACT-R 7, which represents a hybrid cognitive architecture. ACT-R performs its cognitive abilities by programming facts inside the framework to complete any task (Anderson, 2005). ACT-R has three main components known as modules (perceptual-motor and memory module), buffers, and a pattern matching mechanism. The perceptual module does visual and motor skills, while memory modules store individual facts (declarative) like "Delhi is the capital of India" and procedures such as "3 + 4 = 7". The buffer helps individual modules to obtain a particular state. Pattern matching helps to determine the appropriate production which matches perfectly to the query state buffer. ACT-R uses both symbolic and sub-symbolic production systems to perform applications.


The major problem in ACT-R is in implementing productions based on programming by third parties. Humans do not make decisions based on programs. So, in CIT knowledge is formally stored in the form of conceptual experiences rather than applications which are coded.

CLARION (Connectionist Learning with Adaptive Rule Induction Online) is a hybrid cognitive architecture (Sun, Merrill, & Peterson, 2001). It acquires real-world audio and visual sensory perception and stores it in the form of symbolic information representation using neural networks, higher level declarative knowledge and reinforcement learning. CLARION also uses an additional motivational system for performing perception, cognitive skills, and action. Action is decided in the external world or performs internal management of memories and goals. Advanced level motivations in the CLARION architecture are pre-trained using a back propagation neural network. Sun suggested that some motives may derive from end to end training (Gray, 2007). Though, no general indication is presented of the way motivations are imitated by the CLARION cognitive agent, nor any applied instance in which the learning helps the agent in developing its motives. This approach to "consequential drives" is similar to humans'; yet the CLARION agent expends subjectively fixed factors to regulate the extreme power of the motivation (Sun, 2009). On the other hand, in CIT no such random set is used to motivate particular activities. Instead, it uses inner confidence as a motivation factor to select hypothesis based decisions.

LIDA (Learning Intelligent Distribution Agent) is a hybrid architecture working on the principles of global workspace theory (Franklin, Strain, McCall, & Baars, 2013). It has various types of memory block functionalities and works in cognitive cycles (Franklin, Madl, D'Mello, & Snaider, 2014). In every sequence, the cognitive system performs understanding for creating decisions, attention and doing actions based on data. Sensory information from the real world stimulates perception memory during comprehension, while the local workspace is used to develop relations among episodic and declarative memory. Small codes of "Attention" compete for choices inside the global workspace and, when nominated, are broadcast all over the system. The disseminated information reaches artificial memories, as well as triggering learning and action. Attention lets the architecture accomplish a real-time operation, decreasing the extent of evidence handled. CIT's working is somewhat inspired by the LIDA cognitive agent-based processing. However, the CIT architecture executes a cognitive computing approach for treating experiences. It uses subjective skills and motivations to cope with goal based decisions and arouses the agent's artificial mental behavior. So, the CIT cognitive process starts after doing attention behavior and when the data reaches the sensory memories and creates features for further processing.

EPIC stands for Executive-Process/Interactive Control and is a hybrid architecture. The explicit objective of EPIC is to design, implement and validate a cognitive structure for human-like information handling that precisely accounts for the "full timing process" of human perceptual, cognitive, and motor action (Kieras et al., 1997). EPIC offers a framework for building models of human-system interfaces that are accurate and sufficiently complete to be beneficial for real-world intentional drives. EPIC signifies an advanced production of outcomes on human perceptual/motor acts, cognitive modeling methods, and task inquiry methodology, applied in the form of a computer imitation system. EPIC replicates human performance in a real-world task by programming the cognitive processor using production rules prearranged as procedures for achieving tasks based on goals. The EPIC model is then executed in collaboration with an imitation of the external peripheral system and achieves the same job as a human would do. The prototype creates events (e.g., spoken dialogues, eye actions, and keystrokes) which have perfect timing relative to human enactment. CIT also has embodied and enactive features like EPIC but uses advanced technologies like deep learning for audio and computer vision, natural language processing for dialogue understanding, and the semantic web for knowledge representation. Also, it uses a reward and motivation based system to create actions.

The MDB (multilevel Darwinist brain) architecture (Bellas et al., 2010) proceeds with an evolutionary approach in the direction of progress. It allows for natural variation of goals or motivations by presenting a satisfaction model and allowing fast reactive behavior. The MDB structure makes sure of conserving deliberative features and allowing for the choice of acts as a replacement for modest actions. In the MDB model, motivation occurs to control the agent's behavior grounded on the rating of satisfaction of the drive, using both the internal as well as external perceptions of the agent. It is similar to the way CIT levers motivations and its action choices centered on values such as Confidence and Rewards. Moreover, MDB holds long-term (LTM) and short-term (STM) memories. Though, it presently has no attention switching mechanism to support both perception and memory handling.

For summarizing all the cognitive architectures, the main criterion taken is how efficiently these designs fulfill human-like functionality. So, it is necessary to decide which are the best functional parameters on which these architectures must be evaluated, considering the main competency areas in artificial general intelligence as explained in Adams et al. (2012). The abilities are Perception, Attention, Learning, Memory, Reasoning, Action, and Emotions. Considering the above competencies and adding or altering the abilities, as well as their respective modalities, the new proposed classification parameters for evaluating individual architectures are:

1. Perception: Vision (V), Smell (S), Touch (To), Taste (Ta), Audio (A), Cross-modal (C), Proprioception (P), Data Input (D), Other Sensors (O).
2. Attention: Visual (V), Auditory (A), Behavioural (B), Social (S).
3. Learning: Declarative (De), Perceptual (Pe), Procedural (Pr), Associative (A), Non-Associative (NA), Priming (Pm), Imitation (I), Dialog (Di), Media-Oriented (MO), Reinforcement (R), Experimentation (E).
4. Memory: Sensory (Se), Working (W), Episodic (Ep), Semantic (Sm), Procedural (P), Implicit (I), Explicit (Ex), Global (G).
5. Reasoning: Induction, Deduction, Abduction, Physical, Causal, Associational.
6. Action Selection: conditions for action selection are Relevance (R), Utility (U), Emotions (E); the categories of action selection are Winner-take-all (WTA), Probabilistic (Po), Predefined (Pr).
7. Action: Robotic (R), Computer Vision (V), Natural Language Processing (N), Psychological Experiments (P), Virtual Agents (VA), Human Robot Interface/Human Computer Interface (H), Quantitative (Q), Creation/Building (B), Categorization and Clustering (C), Decision-making (D), Games and Puzzles (G), Human Performance Modeling (HPM).

Table 1 shows that past cognitive architectures are missing specific features to exhibit human-like functionalities. Firstly, very few designs (LIDA, EPIC, MDB, and MLECOG) are using Audio (A)/Visual (V) attention. Secondly, no structure is focusing towards Dialog (D), Media-Oriented (MO), and Reinforcement Learning (R). Thirdly, no architecture is using machine learning, utility, and reward based action selection. Lastly, past designs' action based applications are solving toy problems of classical Artificial Intelligence, building robots and doing mechanical works like humans. They are not handling human-like natural language processing to perform cognitive level tasks like Question/Answering, language understanding, sentiment analysis, summarizing dialogues or documents, understanding images and answering questions, and showing human-like expert decision making. So, to overcome the above problems according to the literature review of past work, this paper proposes a "CIT Architecture" based "Cognitive Computing Agent" to perform human-like mental actions as mentioned above.


Table 1
Comparative analysis of cognitive architectures based on (P: Perception, A: Attention, L: Learning, M: Memory, AS: Action Selection, Ac: Action).

Architecture | Perception (Sense) | Attention (Sense) | Learning (Comprehend) | Memory (Comprehend) | Action Selection (Action) | Action (Action)
SOAR | A,O,P,D,V | N/A | De,Pr,A | Se,W,Sm,P,Ep | P,R,U,E,L | R,G,V,C
ACT-R | A,V | N/A | De,Pr,A,Pm | Se,W,Sm,P,Ep | Pr,R,U,E,L | P,G,N,C
NARS | N/A | N/A | De,Pr,A,Pm | W,Sm,P,Ep,G | U | P,G
CLARION | D,O | N/A | De,Pr,A,Pm | W,Sm,P,Ep | P,U,E,L | R,P
LIDA | D,V | V | Pe,Pr,A,NA,Pm | Se,W,Sm,P,Ep | W,R,Re | R,P,C
EPIC | To,A,P,V | V | N/A | Se,W,Sm,P | R | HPM
MDB | To,A,O,P,D,V | V | De,Pe,Pr | W,Se,P | U,L | R
COGPRIME | A,P,D,V | N/A | De,Pr | W,Se,P,Ep,G | Pr,U,E,L | C,R,V,N
eBICA | N/A | N/A | D,Pe,Pr | W,Se,P,Ep | U,L | V
MLECOG | V | V | A,Pm | W,Sm,P,Ep | U,L | G

1 Perception: A-Audio, V-Visual, To-Touch, P-Proprioception, D-Data, O-Others.
2 Attention: V-Visual.
3 Learning: De-Declarative, Pe-Perceptual, Pr-Procedural, A-Associative, NA-Non-Associative, Pm-Priming.
4 Memory: Se-Sensory, W-Working, Sm-Semantic, P-Procedural, Ep-Episodic.
5 Action Selection: R-Relevance, U-Utility, E-Emotions, L-Learning, P-Probabilistic, Pr-Predefined.
6 Action: R-Robotic, G-Games and Puzzles, V-Computer Vision, N-Natural Language Processing, HPM-Human Performance Modelling, C-Categorization and Clustering.

This paper is planned as follows. Section "Background of Cognitive Information Technology (CIT) architecture model" deliberates the background focusing on the principles of artificial intelligent agents, challenges in building artificial systems using them, and the solution of improving the system by using cognitive computing and agent technologies merged inside a cognitive architecture. Section "Proposal of CIT architecture and framework" elucidates the method of CIT (Cognitive Information Technology) to introduce human-like functionality. In Section "Sensing process inside CIT architecture for perception", we explain a humanoid system named CITHCog based on CIT principles performing human-like functionality. The last two sections focus on Discussion and Conclusion.

Background of Cognitive Information Technology (CIT) architecture model

Cognitive information technology structure

Cognitive Information Technology (CIT) is a state of the art architecture model for developing and managing cognitive and information handling skills inside an agent to conduct internal information flows and artificial general intelligence. Fig. 1 shows that the CIT structure develops and encapsulates the following components inside the cognitive agents.

1. Concept Model (CM), which represents that a cognitive agent builds the meaning information about the real world in terms of human semantics and ontologies.
2. Experience Model (EM), which represents that a cognitive agent builds its learning information about the real world in terms of human expert knowledge behavior.
3. Knowledge Model (KM), which represents that a cognitive agent builds its memory information about the real world in terms of human natural intelligence pattern facts.
4. Interface Model (IM), which represents that a cognitive agent exhibits and builds its skills for the real world in terms of human natural perception and action skills.

Definition 0.1 (Cognitive Information Technology). The cognitive information technology structure of an agent (a_qi) is defined as the set of information technologies using cognitive computing for developing human-like intelligence and solving real world problems. Eq. (1) shows the cognitive information technology structure CIT(a_qi) for an agent a_qi.

CIT(a_qi) ◁ (CM(a_qi) ∪ EM(a_qi) ∪ KM(a_qi) ∪ IM(a_qi))    (1)

where CM represents the Concept Model, EM signifies the Experience Model, KM denotes the Knowledge Model, and IM indicates the Interface Model. Therefore, Eq. (1) shows that the cognitive structure CIT(a_qi) of an agent (a_qi) is the union of its Concept, Experience, Knowledge and Interface information.

Concept model in cognitive information technology structure

Definition 0.2 (Concept). A concept (C) denotes a semantic and ontological graph model as a cognitive element to represent associations between real world entities and natural language type fact information. A concept, according to a graph theoretic approach, is defined as:

• A group of master nodes N_m representing similar entities E = (e_1, e_2, e_3, …).
• A group of edge nodes N_e representing individual facts F = (f_1, f_2, f_3, …).
• A group of labels L representing associations A = (a_1, a_2, a_3, …) between N_m and N_e.

Definition 0.3 (Entity). An entity E = [(e)] is the representative of a particular instant of a concept.

Definition 0.4 (Fact). A fact F = [(f)] is the sub level concept that illustrates the features of the instantaneous concept entity.

Definition 0.5 (Association). An Association A = [(a)] is the relationship between the combinations of an entity E and a fact F. Therefore, a concept over a finite group of similar entities E, a finite group of association labels A and a finite group of facts F is a graph having a set of nodes as shown in Eq. (2).

Co ≜ Ω(E, F, A): { E_i × [ ∑_{j=1}^{m} F_j ∪ ( ∑_{k=1}^{n} A_{jk} ) ] }    (2)
structure CIT (aqi) for an agent aqi .


Fig. 1. General background of the CIT Architecture Based Cognitive Computing Agent Model for human-like functionality. The figure shows that the "CIT Architecture" block uses the Cognitive Computing Technologies block and the Cognitive Agent block to build the Concept, Experience, Knowledge, and Interface Models. The Cognitive Computing block comprises all the latest technologies like Machine Learning, Deep Learning, NLP, Knowledge and Data Mining, Robotics, IoT etc. The Cognitive Agent block comprises Perception, Memory, Learning, Reasoning, Knowledge, Action and Action Selection to create agent features. The complete model performs human-like functionality like Question Answering, dialogue based talking, robotic action, (image, object, and human) recognition, speech understanding and generation, and text applications.

Eq. (3) shows the total function of the Concept element in terms of the association A of two concepts C ⊎ C′.

C ⊎ C′ → ( E ⊕ ⋃_{i=1}^{n} F ⊕ ⋃_{i=1}^{n} A ) ∪ [ A(⊗) ∨ A(⊙) ] ∪ ( E′ ⊕ ⋃_{i=1}^{n} F′ ⊕ ⋃_{i=1}^{n} A′ )    (3)

In the equation, ⊕ denotes the simple combination of Entity, Facts, and Associations to construct a concept. On the other hand, A(⊗) represents a function to interconnect two concepts C and C′ where both of them share common associations. If C and C′ do not share common associations, then the associations are represented by A(⊙).

Theorem 0.1. The general concept model [CM(a_qi)] for a cognitive agent states that a concept model is a dynamic graphical structure following vector mathematics that possesses the association vector A_v ≜ {A(⊙) ∧ A(⊗) | (⊙, ⊗) ∈ v⃗}, giving the association capability to link up a concept C_1 to other concepts C_n either using A(⊙) or by A(⊗) respectively.

Theorem 0.2. A general Concept model [CM(a_qi)] for a cognitive agent (a_qi) states that it is a dynamic set of models (M) developed by applying different cognitive computing technology prototypes (Ω_i).

CM(a_qi) ≜ ∑_{i=1}^{m} Ω_i(Co), ∀ i ⇒ E, F, A    (4)

Eq. (4) shows the general Concept model [CM(a_qi)] for a cognitive agent (a_qi).

Experience model in cognitive information technology structure

Definition 0.6 (Experience). An Experience (Ex) is a human expert decision d_i for an instant (i) of a concept model (C) representing a case for a situation in the real world based on an Entity E having Facts F linked with Associations A. Therefore, the instant experience pattern is given by Eq. (5).

Ex ≜ Ψ(C): { d_i ∩ [ E_i × ( ∑_{j=1}^{m} F_j ∪ ( ∑_{k=1}^{n} A_{jk} ) ) ∈ C_i ] }    (5)

Theorem 0.3. A general experience model [EM(a_qi)] for a cognitive agent (a_qi) states that an experience model structure dynamically combines different patterns of instants Ex, considering every possible C_i, creating the experience behavior pattern based on learning from a human expert.

EM(a_qi) = ∑_{i=1}^{n} Ψ_i(Ex), ∀ i ⇒ C_i    (6)

Eq. (6) shows a general experience model [EM(a_qi)].

Knowledge model in cognitive information technology structure

Definition 0.7 (Knowledge). Knowledge (K) denotes a cognitively generated graphical, rule based or computational model representation of evidence inside the memory, developed by the agent using a cognitive computing technique specific methodology (Φ) applied on the experience pattern model (EM), as shown in Eq. (7). So, the knowledge is a ready to go prototype for inference to solve real world problems.

Kn = { Φ(EM) | EM ⊂ (Ex[CM]) ∧ CM ⊂ (E, F, A) }    (7)

Φ represents a function to convert the experience model into knowledge for inference.

Theorem 0.4. A general Knowledge model [KM(a_qi)] for a cognitive agent (a_qi) states that it is a dynamic set of models (M) developed by applying different cognitive computing technology prototypes (Φ_i).

KM(a_qi) ≜ ∑_{i=1}^{m} Φ_i(Kn), ∀ i ⇒ EM    (8)

Eq. (8) shows the general Knowledge model [KM(a_qi)] for a cognitive agent (a_qi).

Interface model in cognitive information technology structure

Definition 0.8 (Interface). An interface (I) denotes a combination of hardware (H_c) and software (S_c) components to perform a particular perception skill (P_s) and action skill (A_s) by the cognitive agent using the individual knowledge model (KM(a_qi)), as shown by Eq. (9).

In ≜ Θ(H_c, S_c, KM): { (H_c ∪ S_c) ⇒ P_s ∧ (H_c ∪ S_c ∪ KM) ⇒ A_s }    (9)

Theorem 0.5. A general Interface model [IM(a_qi)] for a cognitive agent (a_qi) states that it is a dynamic interface model to interact with the real world, generating a decision D_m for specific perception P_s or action A_s, comprising hardware and software components for performing audio and natural language processing, computer vision, robotics, sensor acquisition and data mining. The interface model acts as a bridge between the cognitive world and the real world.

IM(a_qi) ≜ { ( ∑_{i=1}^{m} Θ_i(In), ∀ i ⇒ EM ) → D_m(P_s, A_s) }    (10)

Eq. (10) shows a general Interface model [IM(a_qi)] for a cognitive agent (a_qi).
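As a rough illustration of Definitions 0.6 and 0.7, the sketch below (an assumption-laden toy example, not the published CIT implementation) treats experiences as (situation, expert decision) cases and a methodology Φ as a function that compiles them into a ready-to-infer knowledge model, here a naive nearest-case matcher that also reports a confidence value:

```python
# Illustrative sketch only: experiences are (situation, expert decision) cases;
# "phi" turns the experience model into an inferable knowledge model (Eq. (7),
# heavily simplified).
def phi(experiences):
    """Build a knowledge model Kn from an experience model EM."""
    cases = list(experiences)

    def knowledge_model(situation):
        # Score each stored case by feature overlap and return the best decision.
        def overlap(case):
            return len(set(case["situation"]) & set(situation))
        best = max(cases, key=overlap)
        confidence = overlap(best) / max(len(situation), 1)
        return best["decision"], confidence

    return knowledge_model

experience_model = [
    {"situation": ["query:capital", "entity:India"], "decision": "answer:New Delhi"},
    {"situation": ["query:greeting"], "decision": "action:say_hello"},
]
km = phi(experience_model)
print(km(["entity:India", "query:capital"]))   # -> ('answer:New Delhi', 1.0)
```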


Fig. 2. Cognitive Information Technology (CIT) modular architecture. The figure shows a three layer (Interface, Enactive, and Cognitive layer) modular architecture. The Interface Layer performs the "Reflexive Process" using the Perception Module and Action Module. The Enactive Layer performs the "Deliberative Process" using the Knowledge Module and Decision Module. The Cognitive Layer performs the "Reflective Process" using the Cognitive Module and Meta Cognitive Module. Input to the Interface Layer is perceptions (Audio, Visual, Textual, and Multimedia) from the Environment, which generate a "Reaction", and the output is Action (Motor, Speech, Textual). Input to the Enactive Layer is Attributes (from Perception), which create "Facts" and generate a Decision (for a specific Action). Input to the Cognitive Layer is Experience (from a combination of facts), which triggers "behavior", according to which the Meta Cognitive Module develops Adaptive Plans that turn on the respective decision and perform a certain Action to reach the goal.

Fig. 3. Cognitive Information Technology (CIT) architecture information flow in Cognitive Computing Agents. The figure shows cognitive computing information handling inside the CIT architecture that performs three skills (Sensing, Comprehend, Action). Sensing skills are attention based perception of Audio, Visual, Sensor and Textual information. Comprehend skills develop Experience Information from Teacher, Web, and Social Interaction. Action skills create Plans and Models information; the best plan/model is chosen, taking a decision to reach the goal. In the figure, CIT handles "Query Information" (Questions, Commands, Statements, Images, Sensor Data). The Input Interface (Computer Vision, Audio Processing, Sensor Processing) gives "Dialog Information". The Cognitive Processor applies Natural Language and Image Processing and represents "Knowledge Information" stored in the Information Storage. The Experience Handler takes "Experience Information" and produces "Results Information". The Cognitive Processor gives filtered results and provides "Best Action Information" to reach the goal. Using "Feedback Information", CIT updates the stored experience for better future performance.

Proposal of CIT architecture and framework

Proposed cognitive computing agent architecture: CIT

The CIT cognitive architecture, as shown in Fig. 2, has three layers and performs reactive, deliberative and reflective functionality like humans by developing Concept, Experience, Knowledge and Interface models. The architecture modules work on the principles of Cognitive Computing and Artificial Agents, which are currently among the most important areas of research in the field of Artificial Intelligence. Combining both fields leads to a new hybrid approach known as "Cognitive Computing Agent" technology, as shown in Fig. 3. A "Cognitive Computing Agent" is an entity having the artificial skills of executing and completing specific tasks in a similar way as we humans do. CIT architecture based Cognitive Computing Agents can be Expert Systems (software agents) or Humanoids (robotic agents). Agent technology can provide Humanoids with the human like properties of autonomy, perception, ability to learn, intelligence, and cooperation. On the other hand, Cognitive Computing technologies give the humanoids speaking and listening power using Audio Processing, visual power with Computer Vision, understanding of linguistics through Natural Language Processing, creation and storage of knowledge using the Semantic Web and Networks, and reasoning using an Inference Engine having certain rules or models. So, combining cognitive computing and agent technologies into a single system will provide a better ability to solve real world problems in comparison to previous humanoid systems.

The central goal of cognitive agents is developing and detecting an experience based knowledge base in a specific state framework and then assisting in the procedure of creating a decision on the action by experts in their particular area. The next step is performing that action on the environment.

Many researchers have thought that both cognitive agents and expert systems are the same because these two objects have a knowledge base as the main component. The main difference between them is how they develop and use the knowledge base. Expert systems use pre-programmed rule logic in every situation, while cognitive agents act more like humans by using past experiences and, more significantly, discover the outcome according to a human searching plan, accepting a certain level of prospect rather than finding a perfect result (Duris, 2018).

The Cognitive Computing agent mimics the human-like cognitive skills for sensing, comprehending and taking decisions on answering queries, as shown in Table 2. Fig. 4 presents how the interface model introduces these skills artificially by taking the help of the latest technologies, like Computer Vision for sensing objects under view and Audio Processing for hearing the queries put by users regarding the object which the humanoid is seeing. Secondly, for performing comprehending skills, the cognitive humanoid uses Natural Language Processing for understanding the user question linguistically, and uses a semantic knowledge model written in XML for answering questions about the object. Lastly, the humanoid utilizes an inference query engine for getting answers regarding the object.


Table 2
Cognitive skills performed by the Cognitive Computing Agent using the Interface Model inside the CIT architecture. Interface Model: Object.

Cognitive skill | Cognitive Computing Technique | Purpose
Sensing | Speech Processing | Listening and replying to queries from users regarding a particular situation
Sensing | Computer Vision | Focusing on and identifying objects, faces and things under view; learning features and then recognizing the entity
Sensing | Sensor Processing | Accumulating sensor information and processing it for external device controls like IoT (Internet of Things)
Comprehending | Natural Language Processing | Understanding user queries and generating answers regarding the queries
Comprehending | Knowledge Processing | Creating knowledge from experiences and storing Knowledge Models (Semantic, Computational, Rules) to answer the queries given by users
 | Inference Engine | Getting answers from a fixed knowledge base of human experiences about the query using rules/model inference or knowledge extraction
 | Rules, Machine and Deep Learning Inference | Inferring knowledge models (machine learning, deep learning, and rules) to provide human-like expert decisions

Sensing process inside CIT architecture for perception

Speech processing

Fig. 5 shows the speech model followed inside the CIT architecture for Automatic Speech Recognition (ASR), having "Speech to Text" and "Text to Speech" conversion. "The primary aim of speech recognition and speech response is to have natural interrogation among humans and CIT based cognitive agent systems through speech dialogue, where natural denotes resemblance to the approach people interact with each other" (Ding & Shi, 2017; Motamed, Setayeshi, & Rabiee, 2017). ASR provides CIT based artificial systems the capability to increase the communication understanding between users and the agent systems.

The CIT automatic speech recognition (ASR) and text-to-speech (TTS) systems do fairly well in trying to match the human speech process due to their speech understanding algorithm. The CIT system consumes state-of-the-art technology based on a "Hybrid Model based on Personal Grammar with Deep Learning Model", functioning at an error of less than 10 percent.

Due to the increase in the demand for ASR based artificial systems, a lot of past research has been done by companies and academic research centers using feature extraction based template matching and probabilistic models, but none of them succeeded in giving better results. So, it became tough to identify which of them we require for developing the CIT speech recognition system. So, the work created a new hybrid speech feature template and deep learning model approach for speech recognition. The first step is a speech feature template expressed in terms of MFCC (Mel Frequency Cepstral Coefficient) to create a feature vector, and then each feature vector is given to an RNN (Recurrent Neural Network) trained to get the recognition (Wang & Chen, 2018). Therefore, this speech processing technology in CIT depends on certain components: the quality of the acoustic model, the depth of the language model for understanding the human voice, and the number of instances in the speech dictionary. Therefore the CIT architecture ingests both technologies, for Automatic Speech Recognition (ASR) and Text to Speech (TTS) conversion respectively, as shown in Fig. 6.
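A hedged sketch of the MFCC-plus-RNN pipeline described above is given below, assuming the librosa and TensorFlow/Keras libraries; the sampling rate, layer sizes and phrase-class output are illustrative assumptions rather than the CIT system's actual configuration:

```python
# Sketch of an MFCC feature extractor feeding a recurrent classifier; it is
# illustrative of the approach described in the text, not the exact CIT code.
import librosa
import tensorflow as tf

def mfcc_features(wav_path, n_mfcc=13):
    """Mel Frequency Cepstral Coefficient feature vectors for one utterance."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T                                # shape: (time_steps, n_mfcc)

n_phrases = 300                                  # the paper reports roughly 300 test phrases
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 13)),
    tf.keras.layers.LSTM(128),                                     # recurrent acoustic model
    tf.keras.layers.Dense(n_phrases, activation="softmax"),        # phrase classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(padded_mfcc_batches, phrase_labels, epochs=20)   # training step (sketch)
```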
Evaluation of audio perception

The first evaluation performed is to select a speech recognition tool for the CIT based cognitive agent to do ASR (Automatic Speech Recognition). The primary algorithm technologies chosen for evaluation are the feature template (MFCC), the probabilistic model (HMM) and the deep learning model (RNN).

Fig. 4. CIT Based Cognitive Computing Agent technologies for application development. The figure shows "Speech Recognition" technology for Speech to Text and Text to Speech conversion; "Computer Vision" for image and visual recognition; "Natural Language Processing" for natural language understanding and generation; "Knowledge Processing" for knowledge representation; "Information Retrieval and Extraction" for document and answer extraction; "Rules, Machine Learning and Computational Models" for decision inference; and "Robotics, Motors and Speech" for actions.


Among these implementations, the primary objective is to select the best tool for the purpose. The first step performed is creating phrases and recording them to store as voice records. As the cognitive agent is a general purpose humanoid, the voice phrases were general sayings used in everyday speaking (questions, commands, and statements). The short sentences are "bring me a glass of water.", "what is your name?", and "the capital of India is New Delhi.". There are around 300 different phrases for this test, compared across diverse situations like gender (M/F), age (young, adult and old person) and contextual noise environments (having noise and no noise), as well as additional measures with devices like (microphone, mikes, etc.).

The subsequent sequential steps performed for testing ASR performance:

• Provide the recorded audio file dataset having short phrases to the respective software and API services.
• Obtain the recognized text from these individual ASR (automatic speech recognition) software services.
• Estimate quality measures by comparing the recognized text with the actual phrases.

Two quality measures are taken:

• WER (Word Error Rate): a metric to measure ASR performance, considering the minimum number of word edits due to Substitutions (S), Insertions (I) and Deletions (D) during the conversion of a reference transcription phrase into another hypothesis phrase.
  – The general EDIT distance is given by ED = (S + I + D) / N (11), and WER is given by WER = ED × 100% (12).
  – For performing ASR, all the parameters are weighted alike, but they can have different weights.
  – The general rule is that a lower count of EDITs denotes phrases that are more similar to each other, indicating good speech recognition quality.
• TRP (True Recognized Phrases): a measure of the total percentage of exact phrase matching, TRP = EPR / N (13).
  – It is a measure denoting the accuracy of the performance. The greater the value, the better the performance of the system.
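The two measures defined in Eqs. (11)–(13) can be computed as in the following sketch, which uses a standard word-level edit distance with all edit types weighted equally, as stated above:

```python
# Minimal sketch of WER and TRP (Eqs. (11)-(13)) over word sequences.
def edit_distance(reference, hypothesis):
    """Minimum substitutions + insertions + deletions between word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)], len(ref)

def wer(reference, hypothesis):
    edits, n = edit_distance(reference, hypothesis)
    return 100.0 * edits / max(n, 1)               # Eq. (12): ED x 100%

def trp(references, hypotheses):
    exact = sum(r == h for r, h in zip(references, hypotheses))
    return 100.0 * exact / max(len(references), 1)  # Eq. (13): exact phrases / N

print(wer("bring me a glass of water", "bring a glass of water"))  # ~16.7
```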
After doing the test, the most appropriate tool found is the "Deep Learning model" based on deep learning techniques. The result does not improve on or match the industry self-proclaimed 8% WER using a deep learning model, but using our test data on the proposed method, the outcomes are still inspiring. The proposed ASR, as shown in Table 3, attained 17.4% WER and 76.2% TRP. The classical probabilistic model known as the Hidden Markov Model (HMM) and the feature template represented by the Mel Frequency Cepstral Coefficient (MFCC) have the worst percentages of WER, 72.2% and 78.9% respectively, while the TRP results are 26.8% and 19.6%.

Table 3
Selection of tool based on performance for speech recognition, using WER and TRP as evaluation parameters.

Algorithm | WER | TRP
(RNN) Deep Learning Model | 17.4% | 76.2%
(HMM) Probabilistic Model | 72.8% | 26.2%
(MFCC) Template Model | 78.9% | 19.6%

Fig. 5. Speech Processing functionality in CIT based Cognitive Computing Agents. The figure shows a "Query Speech Signal" as input and a "Response" speech signal as output. The CIT framework is enabled with Speech to Text and Text to Speech conversion.

Fig. 6. CIT complete structure system for Speech Recognition to Speech Translation. The figure shows a speech "Attention" block for speech focus, an "ASR" block for automatic speech recognition, "NLP" for processing natural language, a "Semantic Extraction" block for extracting Subject, Predicate, and Object, an "Inference" block for deciding on a particular action, an "Action" block for doing the motor/audio task, and a "TTS" block for text to speech conversion.

Computer vision

Fig. 7 shows the vision processing model using the CIT architecture. Using computer vision, the cognitive computing based CIT understands the principles of human vision and uses them as inspiration to advance artificial machine vision systems. Computer vision in the CIT architecture helps to provide visual attention, doing human-like visual searches, object discovery, object and person tracking, image captioning, scene categorization and providing localization and mapping (Boukezzoula, Coquin, Nguyen, & Perrin, 2018; Zhang & Liu, 2014). The CIT computer vision method is shown in Fig. 8. The CIT architecture has the modular capability of attention, in which it focuses on a particular aspect of the real world and domain-specific environment. In CIT the first stage is to implement pre-attentive based parallel processing to find the regions of interest.


Table 4
Selection of tool based on performance for computer recognition using Accuracy (A) as the evaluation parameter.

Algorithm | Accuracy (A)
Proposed Feature Vector Model | 79%
CNN Deep Learning Model (Caltech-256 dataset) | 81%
CNN Deep Learning Model (Imagenet dataset) | 83%

Fig. 7. Computer Vision functionality in CIT based Cognitive Computing Agents. The figure shows the CIT architecture's ability to recognize objects using "Computer Vision" and give an audio and motor "Response" to the user query.

Fig. 8. CIT complete structure system for Object Recognition to Action. The figure shows a "Visual Attention" block for focusing on real world images. The "Pre Attentive" block helps in finding various Regions of Interest (ROI). The "Post Attentive" block selects a particular Region of Interest. The "Image Element" block extracts image features. The "Object Recognition" block uses an artificial intelligence tool to recognize the object. The "CIT" block decides the necessary action.

Secondly, the system does the final post-attentive stage, performing focused, serial processing by taking an individual region of interest one at a time and identifying and selecting each entity. Attention helps to prioritize and decide what and where to look in a complex scene and which thing to process first. It is especially necessary for performing a real-time vision system process. Through attention, the CIT architecture decides whether any action is needed or not and chooses the next steps, like recognizing the object, knowing what is present, grasping the object, where to move and what to do. It is essential in humanoids and robotics. Attention also helps human-robot interaction by giving the humanoid and the human user the ability to have combined attention, focusing on the same entity. In the CIT architecture for computer vision, first, the pre-attentive implementation computes a salience map based on colors. Secondly, it finds connected components to determine the salient blobs. In the next, post-attentive stage it selects an individual salient blob and fits a rectangular boundary. It improves the shapes of the blobs using a segmentation process to get the best output. Using stereo vision and depth processing helps to create blobs of 3D map object models to detect real-world objects by matching features stored for individual objects in the knowledge base.
for individual objects in the knowledge base. Fig. 9. Natural Language Processing Functionality in CIT Based Cognitive
Computing Agents. Figure shows that CIT architecture has ability of “Natural
Evaluation of computer vision tool in CIT architecture Language Understanding” for (Commands, Statements, and Questions). The
The proposed CIT based method utilizes two technologies to com- “Knowledge Search and Inference Block” performs information retrieval and
pare the performance. First is the proposed Hybrid feature vector extraction to get Answers.


Fig. 10. Natural Language Processing structure inside the CIT architecture. The "NLP Parser" block performs parts of speech tree development and identifies (Subject, Predicate, Object). The Sentence Analyzer decides the sentence type as (Command, Sentence, Question). The "Relationship Finder" block finds relationships between entities to create Knowledge. "Fact Label" processing helps in retrieving and extracting the answer to do a specific action.

Fig. 11. Question Analysis procedure inside the CIT architecture. In the figure, the main work is done by the "Question Analysis" block. Initially it takes the Query and determines the Entity, Question Type, and Predicted Answer Type. The "Question Analysis" block also performs a semantic representation of the query; it combines Named Entity, Associations, and Parts of Speech for understanding the query. The "Semantic Relation" block creates knowledge in the form of (Subject, Predicate, Object). The Query Analysis block also creates a semantic query to get the answer. The CIT architecture always provides multiple answers, from which the best answer is taken according to its reward and confidence.

In this work, using Natural Language Processing (NLP), the system performs language analysis as shown in Fig. 10 for deciding the command, question, and statement type and takes the necessary action. The first step for Natural Language Processing in the CIT cognitive architecture is to implement and create the parts-of-speech tree. The tree is passed to the sentence analyzer, which concludes whether the sentence type is command based (imperative), query-based (interrogative), or statement/notification (declarative). After the sentence type is known, specific rule logic is used to get the concept of the phrase for further processing by the CIT system. The sentence analyzer takes the parts-of-speech trees given by the NLP framework using a Natural Language (NL) parser to decide which type of statement the user input fits best. The analyzer determines the sentence type as follows: if the user input comprises a noun phrase at the start of the sentence, then it is considered declarative; if the sentence contains question labels denoting (W/H) or Yes/No type, then it is interrogative; the rest of the user input sentences are imperative.
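The sentence-analyzer rule just described can be approximated by the following simplified sketch, which substitutes keyword heuristics for a real parts-of-speech tree; it is illustrative only:

```python
# Simplified sketch of the declarative / interrogative / imperative decision.
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
YES_NO_AUX = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
              "will", "would", "shall", "should", "has", "have", "had", "may", "might"}
PRONOUNS_AND_DETERMINERS = {"i", "you", "he", "she", "it", "we", "they",
                            "the", "a", "an", "this", "that", "these", "those"}

def sentence_type(text):
    words = text.strip().rstrip(".?!").lower().split()
    if not words:
        return "unknown"
    first = words[0]
    if first in WH_WORDS or first in YES_NO_AUX or text.strip().endswith("?"):
        return "interrogative"            # routed to question answering
    if first in PRONOUNS_AND_DETERMINERS:
        return "declarative"              # stored as knowledge about the entity
    return "imperative"                   # routed to the action system

print(sentence_type("The capital of India is New Delhi"))   # declarative
print(sentence_type("What is the capital of India?"))       # interrogative
print(sentence_type("Bring me a glass of water."))          # imperative
```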
Classifying Commands As Declerative Sentence: In Natural Language needed to visit India?”, “How many times India fought wars?”)
Processing (NLP) declarative sentences considered as syntactically 7. What are Type Question (Lists) (e.g. (i) “What is the cultural city of
complex sentences. A declarative phrase always denotes a statement as India?”, (ii) “What are the qualities of temples in India?”, (iii) “What
(Command/Information). In a declarative clause, generally the subject are the dishes prepared in India?”)
comes before the verb, and it practically finishes with a period. CIT 8. Specific Type Questions (Regarding India description) (e.g. “India with
architecture propose to understand the user inputs which include large population size. What are the religion in India?”)
commands in the form of statement (i.e. “I want to be a good re- 9. Multi Type Questions (Chained) (e.g. (i) “What are occupations in
searcher.”) or sentences that simply comprise information (i.e. “The India and what is scope of business?” (ii) “What are the rivers in
capital of India is New Delhi”) So, it is necessary that the declarative India? How the rivers provide water? What are the cities built on
sentences must be treated differently, depending on their function. CIT river bank?”)
architecture keeps information based declarative clauses inside the
knowledge memory relating to the entity. On the other hand, the Query Analysis and Semantic Translation Procedure Fig. 11 shows the
command based sentences passed to the CIT action system for further general method inside CIT architecture and the approach followed in
decisions. this work for question analysis involving: (i) mining question char-
Classifying Question as Interrogative sentences: The CIT architecture acteristics such as entities and its semantic associations and (ii) con-
has the power to understand and answer questions. The work proposes structing equivalent Question triplet represents subject, predicate, object,
a practical classification of natural language questions for answering subject attribute, predicate attribute, and object attribute for performing
questions focusing on general knowledge on the topic (India) into fol- LINQ and SPARQL queries to get answers from annotated Documents/
lowing categories for determining the question type: Experience. The paper focus on question related to India: Factoid, Yes/
No, Reason, Condition, Definition and List for implementing Question
1. Yes/No Type Questions (Truth): (e.g. “Can India be the next super- Answering System based on CIT Framework.
power?”) The procedure performs the given representation of the extracted
2. Why Type Questions (Explanation and Reason): (e.g. “Why India information to obtain Triplet reformulations of the user queries with its
focus on yoga?”) attributes. The proposed method involves:
3. When Type Questions (Condition): (e.g.“When India got
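A minimal sketch of the rule logic described above, written in Python; the word lists and the imperative fallback are illustrative assumptions standing in for the CIT framework's actual rules.

    # Hypothetical rule-based sketch of the CIT sentence/question classification.
    WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
    YES_NO_STARTERS = {"is", "are", "was", "were", "do", "does", "did",
                       "can", "could", "will", "would", "shall", "should", "has", "have"}
    NOUN_PHRASE_STARTERS = {"the", "a", "an", "this", "that", "i", "we", "you", "he", "she", "it", "they"}

    def sentence_type(sentence: str) -> str:
        """Classify user input as declarative, interrogative, or imperative."""
        tokens = sentence.strip().rstrip("?.!").split()
        if not tokens:
            return "unknown"
        first = tokens[0].lower()
        if sentence.strip().endswith("?") or first in WH_WORDS or first in YES_NO_STARTERS:
            return "interrogative"          # question labels (W/H) or Yes/No type
        if first in NOUN_PHRASE_STARTERS:
            return "declarative"            # leading noun phrase => statement/information
        return "imperative"                 # verb-initial commands such as "Bring me the watch"

    def question_type(question: str) -> str:
        """Map an interrogative sentence onto the coarse CIT question categories."""
        first = question.strip().split()[0].lower()
        mapping = {"why": "Reason", "when": "Condition", "how": "Way",
                   "where": "Factoid-B", "who": "Factoid-B"}
        if first in YES_NO_STARTERS:
            return "Yes/No"
        if first == "what":
            return "List" if question.lower().startswith("what are") else "Definition/Factoid"
        return mapping.get(first, "Factoid")

    # Illustrative use:
    print(sentence_type("The capital of India is New Delhi."))    # declarative
    print(sentence_type("Bring me the watch from the bedroom."))  # imperative
    print(question_type("Can India be the next superpower?"))     # Yes/No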


Table 5
General Question Analysis – Examples (QT: Question Type, PAT: Predictable Answer Type, T: Target, S: Subject, P: Predicate, O: Object, SQ: Semantic Query).

Analysis | Definition | Y/N | Factoid (WH)
User Question¹ | What is India? | Can India become superpower? | How to visit cities in India?
Question Type (QT)² | What is | Can | How
Predictable Answer Type (PAT)³ | Definition | Reduce (Boolean) | Preparation
Question Simplification (T)⁴ | India | India, become Superpower | India, visit cities
Semantic Representation (S,P,O)⁵ | (India, definition, Answer) | (India, become superpower, Answer) | (India, visit places, Answer)
Semantic Query (SQ)⁶ | (India) with PAT = Definition | become (India, superpower) with PAT = Boolean | Visit (India, places) with PAT = description

¹ User enters the queries about a Subject.
² Determining the Question type, e.g. (What, Which, Where, When, …).
³ Knowing the expected answer type, e.g. visit, establish, business.
⁴ Extract the target from the query, e.g. capital of India.
⁵ Prepare (Subject, Predicate, Object) for the query.
⁶ Evaluate the RDF/XML query.
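As a rough illustration of the analysis summarised in Table 5, the following sketch turns a simplified question into a (Subject, Predicate, Object) triple plus a predictable answer type; the naive subject/predicate heuristics are assumptions for demonstration, not the framework's actual extraction rules.

    # Hypothetical sketch of the question-to-triple analysis of Table 5.
    QT_TO_PAT = {"what": "Definition", "can": "Boolean", "how": "Preparation"}
    FUNCTION_WORDS = {"is", "are", "the", "a", "an", "to", "of", "in"}

    def analyse(question: str) -> dict:
        words = question.rstrip("?").split()
        qt = words[0]
        pat = QT_TO_PAT.get(qt.lower(), "Factoid")
        content = [w for w in words[1:] if w.lower() not in FUNCTION_WORDS]
        subject = content[0] if content else None
        predicate = " ".join(content[1:]) if len(content) > 1 else "definition"
        return {"QT": qt, "PAT": pat, "triple": (subject, predicate, "Answer")}

    print(analyse("What is India?"))
    # -> {'QT': 'What', 'PAT': 'Definition', 'triple': ('India', 'definition', 'Answer')}
    print(analyse("Can India become superpower?"))
    # -> {'QT': 'Can', 'PAT': 'Boolean', 'triple': ('India', 'become superpower', 'Answer')}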

The procedure performs the given representation of the extracted information to obtain triplet reformulations of the user queries together with their attributes. The proposed method involves:

1. Identifying the question type (e.g. What, Why, When, Where, Yes/No, Definition, …).
2. Determining the Predictable Answer Type for WH and Yes/No questions.
3. Recasting the question in a new simplified form.
4. Entity Recognition based on the new simplified form of the question.
5. Extraction of semantic relations as a Question triplet (subject, predicate, object) based on the new question form.
6. Construction of the SPARQL or LINQ query for answering the question.

Table 5 presents how the questions are recognized using linguistic natural language understanding rules to determine the categories. Yes/No Type Entity Queries: these queries are recognized by a Natural Language Understanding (NLU) rule that identifies the absence of a WH pronoun and finds the definite structure of Yes/No questions. Simple logic transforms the question into the triplet form. Question analysis consists of determining the entity, extracting the relationship and constructing the LINQ/SPARQL query (Zhu & Iglesias, 2018). Definition Type Queries for Entity: this type of question is detected with Natural Language Understanding patterns such as “What is Y?” or “What does Y mean?”, where Y denotes a noun phrase. In this case, Y is the focus of the question, and the simple form of the query will be Y. Question analysis consists only of recognizing India as the entity and creating the query. Factoid Questions for Entity: this type of question is recognized by means of a few simple rules on the user query, such as checking the First Word of the Question (FWQ): FWQ(Question) ∈ (What/How/Which/When/Why/Where) indicates that the question is a WH question; the system then determines the predictable answer type, checks whether the query is a list-type question, and constructs the simple favourable form for answering. The next steps consist of entity recognition by applying tokenization, Parts of Speech (POS) tagging, parsing and Named Entity Recognition (NER). Then the system identifies the semantic relationship as an Association and, at last, constructs the LINQ/SPARQL query. For Factoid type questions, the predictable answer type is determined by matching the NL user question against manually constructed lexical arrays. A set of patterns is made for every question type. These patterns use interrogative pronouns, syntax-based analysis and standard words in order to recognize a set of corresponding questions.

Answering Questions Regarding an Entity: Table 6 shows the approach the CIT based cognitive computing agent uses for searching answers based on the Semantic Ontology. For each user question, the cognitive computing agent builds semantic queries based on the predictable answer types. The queries are in the form of a triple (Subject, Predicate, Object) obtained from question analysis. Each query corresponds to a specific illustration of the important features of the question (i.e. entities, semantic association, information about the entity such as India, etc.). In the answer search phase, the cognitive computing agent relaxes each semantic RDF query into less precise queries comprising fewer constraints (e.g. only entities and associations, or only entities; such queries are known as relaxed forms of the initial query). These simpler question forms are constructed dynamically by dropping one more triple element of the user query at every stage (see the sketch after Table 6).

Classifying Commands as Imperative Sentences: The imperative sentence configuration is used firmly for passing commands. Imperatives are a simple type of phrase regarding syntax; their grammatical structure can be characterized as beginning with a verb. Imperative sentences can be as simple as “Go!” or as complex as “Bring me the watch from the bedroom.” Imperative sentences are always categorized as command phrases for the CIT architecture. In most conditions, the action and the object of the imperative sentence have a direct-object relationship. For sentences that only contain single-word instructions (i.e., “Go.” or “Do”), the CIT architecture only focuses on the action object. Many kinds of commands do not comprise a direct action object; instead, the instruction depends on a prepositional expression, such as “in the bedroom” or “to the kitchen.” For these kinds of instructions, the preposition relationship helps the CIT architecture to obtain the correct action object for the command.

Table 6
Entity Answer Searching Models According to Question (E: Entity (e.g. India, Russia), AR: Association Relationship (e.g. visits, capital, aircraft)).

Question Type | Question Model | Simple Form | Answer Type
Definition¹ | What is E? | E | E is Sentence…
Yes/No² | Can E1 AR E2 | E1 AR E2 | Yes or No
Factoid³ | How Can AR from E1 | (E1, AR, Answer) | Information
List⁴ | What are AR of E1 | (E1, AR, Answers) | A list of entities

¹ Definition type question, returns an answer typed as a sentence.
² Boolean type question, returns an answer typed as Yes or No.
³ Factoid type question, returns an answer typed as information.
⁴ List type question, returns answers as a set of entities.
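A minimal, self-contained sketch of the relaxed-query idea described above, using an in-memory list of triples instead of a real RDF/SPARQL store; the sample facts and the relaxation order (drop the object, then the predicate) are assumptions for illustration only.

    # Toy triple store standing in for the annotated Documents/Experience.
    FACTS = [
        ("India", "capital", "New Delhi"),
        ("India", "commercial capital", "Mumbai"),
        ("India", "become superpower", "possible with sustained growth"),
    ]

    def match(facts, subject=None, predicate=None, obj=None):
        """Return every triple compatible with the non-None constraints."""
        return [t for t in facts
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]

    def relaxed_search(facts, subject, predicate, obj=None):
        """Try the full query first, then progressively relaxed forms."""
        for constraints in ((subject, predicate, obj),    # full query
                            (subject, predicate, None),   # entity + association
                            (subject, None, None)):       # entity only
            hits = match(facts, *constraints)
            if hits:
                return hits
        return []

    print(relaxed_search(FACTS, "India", "capital"))
    # -> [('India', 'capital', 'New Delhi')]
    print(relaxed_search(FACTS, "India", "national animal"))
    # falls back to the entity-only relaxed form and returns every fact about India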


Evaluation of natural language understanding

For evaluating question-understanding ability, the unstructured TREC-8 to TREC-11 question benchmark database is chosen for judging Question Classification, i.e. knowing the Question Type (QT) and Predictable Answer Type (PAT). The same database is also used for assessing the semantic understanding of Subject (S), Subject Attribute (SA), Predicate (P), and Predicate Attribute (PA). In TREC QA there is a division between ‘factoid’, ‘definition’ and ‘list’ questions. TREC-QA (8 to 11) has the following numbers of questions: TREC 8 – 200, TREC 9 – 693, TREC 10 – 500, and TREC 11 – 500, so the total number of questions in TREC QA is around 1893. From TREC-QA, 150 questions were selected for evaluation. The query items are denoted as (when, what, why, where, how and who).

Table 7
Number of Correct Question Understanding and Semantic Understanding by the CIT framework for the Selected Questions (TREC 8 to 11).

Question | Question Numbers | Question Understanding | Semantic Understanding
What | 30 | 27 | 24
When | 25 | 24 | 23
Where | 25 | 24 | 22
Why | 25 | 22 | 21
How | 25 | 23 | 22
Who | 20 | 19 | 17
Total | 150 | 139 | 129

The instances are from open domains and are asked in everyday life, so they test the ability of the system to analyze what the user is asking. For evaluating the Natural Language Understanding (NLU) part of the CIT based CITHCog robot, the metrics are obtained from the confusion matrix, and the classification problem is divided into two parts. Firstly, Question Classification (Question Type (QT), Predictable Answer Type (PAT)): when the robot understands the question and provides the correct output (QT, PAT), the classifier gets the label 1, otherwise 0. Secondly, if the robot recognizes the semantics (Subject (S), Predicate (P), Object (O)) and their respective attributes (Subject Attribute (SA), Predicate Attribute (PA)), the classifier gets the label 1, else it is assigned a 0 value. Table 7 shows that the accuracy of question understanding and semantic understanding is 92.6% and 86% respectively. The CIT framework shows a good capability in handling questions posed by users.

Evaluation of CIT cognitive computing agent in humanoid for question answering ability

In general, an information retrieval (IR) system utilizes precision, recall, and f-measure to evaluate performance. For a question answering (Q/A) system, another type of evaluation metric is needed, because user effort, time, utility, judgments, and influence must also be taken as essential criteria. So, the assessment of system performance is done by adapting TREC-QA quantitative metrics. The metrics are FHS (First Hit Success), FARR (First Answer Reciprocal Rank), FARWR (First Answer Reciprocal Word Rank), TRR (Total Reciprocal Rank), TRWR (Total Reciprocal Word Rank), MRR (Mean Reciprocal Rank), and P (Precision); a small computational sketch of these metrics follows the list below.

• FHS (First Hit Success): FHS is equal to 1 if the first answer given by the system is correct, else FHS is assigned the value 0. The expert user focuses on the first response offered by the system. If only the initial answer is given and taken as the topmost priority, and the system provides all of the first solutions correctly, then the average value of FHS denotes the RR (Recall Ratio).
• FARR (First Answer Reciprocal Rank): The value of FARR varies from 1 to 0. If the first answer is correct then FARR = 1/1 = 1. If the fourth answer is right then FARR = 1/4 = 0.25; if none of the responses returned by the system is appropriate then FARR = 0. So, a higher FARR signifies that user effort is lower, and vice versa.
• FARWR (First Answer Reciprocal Word Rank): FARWR reflects the number of words an expert user must read to reach the correct answer word. If in a reply the fifth word carries the right answer, then FARWR = 1/5. Humans read according to saccades, each covering a few words at an instant; for a short answer, an expert user can understand and interpret the response in a single saccade. If FARWR is high, the human user has to make less effort, and vice versa.
• TRR (Total Reciprocal Rank): TRR is a metric for evaluating multiple correct answers instead of just the first right response. If the third and fifth answers are correct then TRR = 1/3 + 1/5 = 8/15. TRR shows diminishing value according to a utility function denoting the likelihood that the active user retrieves the accurate response.
• MRR (Mean Reciprocal Rank): The mean reciprocal rank is a statistical quantity for assessing any method that creates a list of probable answers to a set of queries, ordered by the probability of correctness.
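A minimal sketch of the rank-based metrics just defined, with the worked values from the text used as checks; the function signatures are illustrative assumptions rather than the framework's evaluation code.

    def fhs(correct_ranks):
        """First Hit Success: 1 if the top-ranked answer is correct."""
        return 1 if 1 in correct_ranks else 0

    def farr(correct_ranks):
        """First Answer Reciprocal Rank: 1/rank of the first correct answer."""
        return 1.0 / min(correct_ranks) if correct_ranks else 0.0

    def farwr(word_position_of_answer):
        """First Answer Reciprocal Word Rank: 1/position of the answering word."""
        return 1.0 / word_position_of_answer if word_position_of_answer else 0.0

    def trr(correct_ranks):
        """Total Reciprocal Rank: sum of 1/rank over every correct answer."""
        return sum(1.0 / r for r in correct_ranks)

    def mrr(per_query_correct_ranks):
        """Mean Reciprocal Rank over a set of queries."""
        return sum(farr(r) for r in per_query_correct_ranks) / len(per_query_correct_ranks)

    # Checks against the examples in the text:
    print(farr([4]))        # 0.25         (first correct answer at rank 4)
    print(trr([3, 5]))      # 8/15 ≈ 0.533 (third and fifth answers correct)
    print(farwr(5))         # 0.2          (fifth word carries the answer)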

Table 8
First Hit Success (FHS), Fast Answer Reciprocal Rank (FARR), First Answer Reciprocal Word Rank (FARWR), Total Reciprocal Rank (TRR) and Reciprocal Rank (RR)
for Evaluating Question Answering based on CIT framework from 50 Selected Questions as Test Input.
Q.No. FHS FARR FARWR TRR RR Q.No. FHS FARR FARWR TRR RR

1 1 1 1/7 1.83 1 26 1 1 1/7 1.53 1


2 1 1 1/4 1.53 1 27 1 1 1/5 1.83 1
3 1 1 1/8 2.08 1 28 1 1 1/6 1.75 1
4 1 1 1/6 1.75 1 29 1 1 1/7 2.28 1
5 0 0 0 0 0 30 1 1 1/6 1.83 1
6 1 1 1/6 1.50 1 31 1 1 1/7 1.75 1
7 1 1 1/5 0.83 1 32 1 1 1/7 2.28 1
8 1 1 1/6 1.53 1 33 1 1 1/6 1 1
9 1 1 1/7 1.33 1 34 1 1 1/7 1.5 1
10 1 1 1/8 1.25 1 35 0 0 0 0 0
11 1 1 1/6 1.83 1 36 1 1 1/7 2.28 1
12 1 1 1/5 1.83 1 37 1 1 1/7 2.08 1
13 1 1 1/7 1.75 1 38 1 1 1/7 1.75 1
14 1 1 1/8 2.28 1 39 1 1 1/7 1.83 1
15 1 1 1/7 1.25 1 40 1 1 1/6 1 1
16 1 1 1/6 2.08 1 41 1 1 1/6 1.33 1
17 1 1 1/5 2.08 1 42 1 1 1/7 1.53 1
18 1 1 1/8 2.28 1 43 1 1 1/8 1.75 1
19 1 1 1/9 1.53 1 44 1 1 1/8 2.08 1
20 1 1 1/7 1.25 1 45 1 1 1/7 1.50 1
21 1 1 1/6 2.08 1 46 1 1 1/6 2.28 1
22 1 1 1/7 2.28 1 47 1 1 1/7 1.83 1
23 1 1 1/7 1.75 1 48 1 1 1/6 1.83 1
24 1 1 1/9 1.83 1 49 1 1 1/7 1.75 1
25 1 1 1/8 2.08 1 50 1 1 1/6 2.08 1


In Table 8, the First Hit Success (FHS) for the CIT based Question Answering framework is around 98%, so the system shows excellent performance in getting the first answer correct. The CIT system has confidence- and reward-based judgment; therefore the action also yields a First Answer Reciprocal Rank (FARR) equal to 1. The framework also provides multiple rank-based correct answers, whose correctness is measured using the Total Reciprocal Rank (TRR). The TRR shows that, generally, the CIT has a value greater than 1, which signifies that apart from getting the first answer right, the system also has further correct answers at greater depths in the ranking. The Mean Reciprocal Rank (MRR) is around 0.9, exhibiting the average number of times the first response is relevant.

Fig. 12. Knowledge Processing Functionality Inside the Cognitive Information Technology (CIT) Architecture. The figure shows the creation of the Knowledge base inside the CIT cognitive architecture. The “Knowledge Base” block stores the unstructured information as Subject, Subject Attribute, Predicate, Predicate Attribute, Object, and Object Attribute.

Knowledge representation

Fig. 12 shows the Knowledge Representation used in the CIT architecture. The raw information is stored in (subject, predicate, object) format in the knowledge base as XML files. Knowledge representation and storage are essential for autonomous and cognitive computing agents to make decisions (Kumar, Schmidt, & Köhler, 2017; Lieto, Lebiere, & Oltramari, 2018; Wu et al., 2018). So, the primary objective of the CIT architecture is to use the approach represented in Fig. 13 to build and store concepts (CO) representing a human-like knowledge model (KM) comprising entities C(EO) and Attributes C(ATO) as objects and physical characteristics respectively, and Associations C(AO) as relationships between entities and entities, or entities and attributes, to construct facts as knowledge triplets (Chen, Argentinis, & Weber, 2016). By definition, a triplet is an arrangement of three words in which the first word represents the ⟨subject⟩, the second word denotes the ⟨predicate⟩ and the third word is the ⟨object⟩, so a triplet looks like ⟨subject, predicate, object⟩. In order to better understand this, consider the example “Rose is of red colour”. The triplet from this piece of object information will be: Rose (Subject), colour (Predicate), red (Object). Many researchers create knowledge from web documents or unstructured text and store it in this triplet format.

Fig. 13. CIT Complete Structure System for Knowledge Learning, Representation and Storage to Perform Action. The figure shows that the architecture learns and stores knowledge from Social, Web, and Teacher Learning. The unstructured sentences are represented in a standard semantic representation (S, P, O) showing Subject, Predicate, Object and their Attributes. The “Knowledge Store” block stores the knowledge in the form of an XML/RDF file. The XML file is called by the cognitive processor (CIT) to give a decision and make a certain Action. The Knowledge Structure in CIT is the central unit for managing, storing and retrieving information by the Cognitive Computing Agent.

Evaluation of experience retrieval

This evaluation concerns experience-based concept retrieval according to confidence value (rank), as done by the CIT framework for question answering (Q/A). The three main performance parameters are based on binary relevance considering ranks and are denoted Precision@K (P@K), Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR); a short computational sketch follows Table 9.

• Precision@K: In Precision@K, it is necessary to set the rank threshold K and compute the percentage of relevant experiences among the top K documents/experiences, while ignoring the documents/experiences ranked below the threshold K. For example, let K = 5. If, among the top 5, the relevant documents/experiences are at positions 1, 3 and 5, then P@5 = 3/5, while P@3 = 2/3.
• Mean Average Precision (MAP): Consider the rank positions of each relevant experience K1, K2, …, Kp. Computing Precision@K for each K1, K2, …, Kp gives the average precision (AP), which is equal to the average of these P@K values. For example, if the 1st, 3rd and 5th experiences are relevant then the average precision is 1/3(1/1 + 2/3 + 3/5) ≈ 0.76. Mean Average Precision (MAP) is the average precision over multiple queries/rankings; for example, if the Average Precision (AP) of query 1 is 0.76 and of query 2 is 0.82, then MAP = 1/2(0.76 + 0.82).
• Mean Reciprocal Rank (MRR): For calculating the Mean Reciprocal Rank, consider the rank position K of the first relevant experience and determine the Reciprocal Rank (RR) score = 1/K. MRR is the mean of RR across multiple queries. For example, MRR = 1/3(1/2 + 1 + 1/3) when three queries have their first relevant document at the 2nd, 1st and 3rd place and therefore have individual Reciprocal Ranks (RR) of 1/2, 1 and 1/3 respectively.

Table 9
Precision (P@5), Average Precision (AP) and Reciprocal Rank (RR) for Experience retrieval by the CIT framework from 50 Selected Cases as Test Input to validate the Knowledge Model in CIT.

Question | P@5 | AP | RR | Question | P@5 | AP | RR
1 | 0.6 | 0.76 | 1 | 26 | 0.8 | 0.95 | 1
2 | 0.8 | 0.95 | 1 | 27 | 0.4 | 0.8 | 1
3 | 0.6 | 0.8 | 1 | 28 | 0.4 | 1 | 1
4 | 0.4 | 0.59 | 0.5 | 29 | 1 | 1 | 1
5 | 0 | 0 | 0 | 30 | 0.8 | 0.89 | 1
6 | 0.8 | 0.89 | 1 | 31 | 0.8 | 0.89 | 1
7 | 0.8 | 0.89 | 1 | 32 | 0.6 | 1 | 1
8 | 0.6 | 1 | 1 | 33 | 0.8 | 1 | 1
9 | 0.8 | 1 | 1 | 34 | 1 | 1 | 1
10 | 1 | 1 | 1 | 35 | 0 | 0 | 0
11 | 0.6 | 0.8 | 1 | 36 | 0.6 | 0.76 | 1
12 | 0.4 | 0.8 | 1 | 37 | 0.4 | 0.8 | 1
13 | 0 | 0 | 0 | 38 | 0.2 | 1 | 1
14 | 0.2 | 0.50 | 0.5 | 39 | 1 | 1 | 1
15 | 1 | 1 | 1 | 40 | 0.4 | 1 | 1
16 | 1 | 1 | 1 | 41 | 0.4 | 0.5 | 0.5
17 | 0.4 | 0.59 | 0.5 | 42 | 1 | 1 | 1
18 | 0.6 | 0.76 | 1 | 43 | 0.2 | 0.58 | 0.5
19 | 0.8 | 0.95 | 1 | 44 | 0.4 | 0.8 | 1
20 | 0.4 | 0.8 | 1 | 45 | 0.8 | 0.89 | 1
21 | 0.8 | 0.89 | 1 | 46 | 1 | 1 | 1
22 | 1 | 1 | 1 | 47 | 0.6 | 0.8 | 1
23 | 0.6 | 0.8 | 1 | 48 | 0.8 | 0.95 | 1
24 | 0.8 | 0.95 | 1 | 49 | 1 | 1 | 1
25 | 1 | 1 | 1 | 50 | 0.4 | 0.8 | 1
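A compact sketch of the retrieval metrics above, with the worked example (relevant items at ranks 1, 3 and 5) reproduced as a check; the list-based inputs are an assumption for illustration.

    def precision_at_k(relevant_ranks, k):
        """Fraction of the top-k results that are relevant."""
        return sum(1 for r in relevant_ranks if r <= k) / k

    def average_precision(relevant_ranks):
        """Mean of P@K taken at each relevant rank position."""
        if not relevant_ranks:
            return 0.0
        return sum(precision_at_k(relevant_ranks, r) for r in relevant_ranks) / len(relevant_ranks)

    def mean_average_precision(per_query_relevant_ranks):
        return sum(average_precision(r) for r in per_query_relevant_ranks) / len(per_query_relevant_ranks)

    def mean_reciprocal_rank(per_query_relevant_ranks):
        return sum(1.0 / min(r) if r else 0.0 for r in per_query_relevant_ranks) / len(per_query_relevant_ranks)

    ranks = [1, 3, 5]                               # relevant experiences at positions 1, 3 and 5
    print(precision_at_k(ranks, 5))                 # 0.6
    print(round(average_precision(ranks), 2))       # 0.76 = 1/3(1/1 + 2/3 + 3/5)
    print(mean_reciprocal_rank([[2], [1], [3]]))    # (1/2 + 1 + 1/3) / 3 ≈ 0.61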


Fig. 14. Action Functionality Inside the Cognitive Information Technology Architecture. The figure shows that the action functionality starts by taking inputs as “Command, Statement, Question” and produces a certain “Reward Based Action” according to the input type.

Table 9 shows the evaluation of the system for retrieving experiences from the knowledge corpus based on user queries. For the evaluation, 50 questions were put to the system. The experiences obtained from the experience corpus are multiple documents, so it is necessary to evaluate them by rank, and the quantitative metrics relate to rank-based judgment. The Precision@5 (P@5) of 11 queries is 1, while 12 queries have 0.8; the remaining queries have 0.6, 0.4, 0.2 or 0. Around 50% of user queries have P@5 equal to 0.8 or 1. The Mean Average Precision (MAP) for the system is 0.82, representing a good average precision for relevant experience retrieval. Also, the MRR of the test queries is 0.85, a high value. From these metrics, it can be concluded that the system's experience-retrieving ability shows promising outcomes for answering human questions.

Action process inside CIT architecture for completing real world problems

Fig. 14 shows the action model used in the CIT architecture. The CIT architecture decides the action by selecting a group of candidate decisions based on Confidence Value and rewards, as shown in Fig. 15. The confidence value is the average word-frequency match against the unstructured text or semi-structured data. The successful candidate answers are then ranked and given as decisions. Another approach is to use reward-dependent answering from the social cognitive agent world: the decision from the cognitive agent having the highest reward, based on the User Rewards, is selected as the best decision. So, whichever decision has the highest ranking, either by confidence or by rewards, is taken as the best decision and is shown to the human user as the best answer to the query (a small selection sketch is given below).

Fig. 15. CIT Complete Structure System for Decision Making Based on Confidence and Reward to Perform Action.

Table 10
CIT Cognitive Information Technology Architecture – Human-like Functionalities.

Cognitive Agent Functionalities | Function Modalities | Check | Cognitive Computing technologies
Perception | Audio (A) | ✓ | Audio processing and Deep learning
Perception | Vision (V) | ✓ | Computer Vision and Deep learning
Perception | Touch (To) | ✗ | Not yet implemented
Perception | Smell (S) | ✗ | Not yet implemented
Perception | Taste (Ta) | ✗ | Not yet implemented
Perception | Proprioception (P) | ✓ | Information technology
Perception | Data Input (D) | ✓ | Text and Data mining, IoT
Attention | Audio | ✓ | Digital Signal Processing
Attention | Visual | ✓ | Digital Image Processing
Learning | Perceptual (Pe) | ✓ | Sensor technology
Learning | Declarative (De) | ✓ | XML/RDF
Learning | Procedural (Pr) | ✓ | Robotics
Learning | Associative (A) | ✓ | Machine Learning and NLP
Learning | Non-associative (NA) | ✓ | Confidence Value Updation
Learning | Priming (Pm) | ✓ | NLP
Memory | Sensory (Se) | ✓ | Buffers
Memory | Working (W) | ✓ | Buffers
Memory | Semantic (Sm) | ✓ | XML/RDF
Memory | Procedural (P) | ✓ | Robotics
Memory | Episodic (E) | ✓ | Semantic Web and NLP
Memory | Global (G) | ✗ | Not implemented
Action Selection | Planning (P) | ✓ | Strategic Codes
Action Selection | Winner Take All (W) | ✓ | Selection Codes
Action Selection | Probabilistic (Pr) | ✓ | Random Decision Codes
Action Selection | Predefined (Pr) | ✗ | Not yet implemented
Action Selection | Relevance (R) | ✓ | Selection Codes
Action Selection | Utility (U) | ✓ | Utility Codes
Action Selection | Emotion (E) | ✓ | Emotion Codes
Action Selection | Reactive (Ra) | ✓ | Rule Based Codes
Action Selection | Learning (L) | ✓ | Machine Learning Codes and Models
Action | Robotics (R) | ✓ | Robotics, NLP, ML, Audio and Vision Processing
Action | Computer Vision (V) | ✓ | Audio/Image Processing, NLP, Computer Vision and Deep Learning
Action | NLP (N) | ✓ | Audio and NLP
Action | Psychological (P) | ✓ | Computing Techniques
Action | HRI/HCI (H) | ✓ | All techniques
Action | Quantitative (Q) | ✓ | Arithmetic and Logical Processing
Action | Creation/Building (B) | ✓ | Robotics and Knowledge Processing
Action | Categorization/Cluster (C) | ✓ | NLP and ML
Action | Decision-making (D) | ✓ | ML, Semantic Web, NLP, Inference Engine
Action | Games and Puzzles (G) | ✓ | Logical Computing and Deep Learning
Action | Human Performance (HPM) | ✓ | All

Notes:
1. Functionalities are chosen as per human abilities.
2. Many Action Selection codes are under development in the CIT framework, but they are proposed for future use.
3. Cognitive computing technologies for Actions are a tentative plan and may vary as per user requirements.
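A minimal sketch of the confidence- and reward-based selection described above; the word-overlap confidence score and the candidate structure are illustrative assumptions, not the framework's exact implementation.

    def confidence(answer: str, query: str) -> float:
        """Average word-frequency match between the query and a candidate answer."""
        query_words = query.lower().split()
        answer_words = answer.lower().split()
        if not query_words:
            return 0.0
        return sum(answer_words.count(w) for w in query_words) / len(query_words)

    def select_best(query, candidates, social_rewards=None):
        """Rank candidates by confidence, or by user reward if that scores higher."""
        scored = [(confidence(ans, query), ans) for ans in candidates]
        best_conf, best_by_conf = max(scored)
        if social_rewards:                          # rewards from other agents/users, scaled 0..1
            best_by_reward, reward = max(social_rewards.items(), key=lambda kv: kv[1])
            if reward > best_conf:
                return best_by_reward
        return best_by_conf

    answers = ["The capital of India is New Delhi", "India is a country in Asia"]
    print(select_best("What is the capital of India", answers,
                      social_rewards={"New Delhi is the capital of India": 0.9}))
    # -> the reward-backed candidate wins because its reward exceeds the best confidence score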


Fig. 16. Method for building the Object Recognition Concept Model.

based on the statement and command obtained from the inference engine. The actions are either audio or motor movements. In an audio action, the agent speaks the statement as a result through the speakers by converting text to speech (TTS). On the other hand, given a command, the agent searches the action rule base and implements the proper head, hand or leg movements.

So, Table 10 presents a summary of the CIT architecture with its complete human-like functionality.

Case study on cognitive agent based object recognition using cognitive information technology (CIT) architecture

Method: cognitive information technology (CIT) power for object recognition

Method for developing the concept model
In the cognitive agent based humanoid system performing object recognition, the first stage is to develop concepts H(COo) and the concept model H(COM) as shown in Fig. 16. A humanoid object concept H(COo) builds its instant entities H(EO) signifying object categories. These categories connect with the set of association tags H(AO) indicating relationships to attributes. The set of selected attributes bears a resemblance to the object, denoting its name. Groups of similar names signify the finite group of facts H(FO) for the distinct category of object.

The concepts in CHORS (the CIT based Humanoid Object Recognition System) help the humanoid cognitive agent understand queries for object recognition problems. For understanding the particular object query information, the concept model accumulates the relation of object category, name, and association as an XML or SQL based semantic representation in the form of entities H(EO) representing the category, facts H(FO) representing the object name, and associations H(AO) denoting the attributes that create the relationship between the entity and the facts. Eq. (14) shows the Concept H(COo) and Eq. (15) shows the Concept Model H(COM) for object recognition.

H(COo) ≜ Ω(H(EO), H(FO), H(AO)): {H(EOi) × [∑_{j=1}^{m} H(FOj) ∪ (∑_{k=1}^{n} H(AOjk))]}   (14)

H(COM) ← ∑_{i=1}^{n} H(COo) | H(COo) ≜ Ω(H(EO), H(FO), H(AO), H(AOM))   (15)

Fig. 17. Method for Building the Object Recognition Experience Model.

Method for developing the experience model
After developing the concept model for objects, the next phase in the strategy of cognitive decision information skill for object recognition is creating experiences H(EOx) and the experience model H(EOM), as shown in Fig. 17.

An Experience H(EOx) signifies an expert decision on object information H(dOi) for a case (i) of an object concept model H(COM). So, COi represents a case for an object under query having a category as Entity (EO) and consisting of object names as Facts (FO) linked with Associations (AO). The instantaneous experience design for the object is given by Eq. (16) below.

The experiences provided by the human expert (EOx) in the scenario of the CIT Based Humanoid Object Recognition System (CHORS) help the humanoid cognitive agent gain knowledge about the object under query in an unknown situation. For learning the object attributes, the experience model gets the human decision H(dOi) from the teacher using the function Ψ, which converts the object concepts into an agent experience for creating object knowledge. Eq. (16) shows the Experience H(EOx) and Eq. (17) shows the Experience Model H(EOM) for object recognition.

H(EOx) ≜ Ψ(H(COM)): {H(dOi) ∩ [COi]}   (16)

H(EOM) ≜ ∑_{i=1}^{n} H(EOx) | H(EOx) ≜ Ψ(H(COo)i, H(DOx)i) → H(COo)i ∩ H(DOx)i   (17)

Method for developing the knowledge model
When the object learning framework has created the Experience model, the next step is to insert the decision inside the agent as a learnt model using data mining techniques. The cognitive-technique based learning model creates pattern information behaviour for the various objects, as shown in Fig. 18. The knowledge model H(KOM), in the case of CHORS, aids the cognitive agent in storing full facts about different instants in the artificial memory. Eq. (18) shows the Knowledge H(KOn) and Eq. (19) shows the Knowledge Model H(KOM) for object recognition.
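A small data-structure sketch of the concept and experience models formalised in Eqs. (14)–(17); the class and field names are illustrative assumptions chosen to mirror the entity/fact/association notation, not the CHORS source code.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Concept:                       # H(COo): entity category + facts linked by associations
        entity: str                      # H(EO), e.g. "Fruit"
        facts: List[str]                 # H(FO), e.g. object names such as "orange"
        associations: Dict[str, str]     # H(AO), e.g. {"HasColour": "orange", "HasShape": "round"}

    @dataclass
    class Experience:                    # H(EOx): a teacher decision attached to a concept case
        concept: Concept
        decision: str                    # H(dOi), the expert's label/answer for this case

    @dataclass
    class ExperienceModel:               # H(EOM): the collection of experiences (Eq. 17)
        experiences: List[Experience] = field(default_factory=list)

        def teach(self, concept: Concept, decision: str) -> None:
            """Ψ: convert a concept case plus an expert decision into a stored experience."""
            self.experiences.append(Experience(concept, decision))

    model = ExperienceModel()
    model.teach(Concept("Fruit", ["orange"], {"HasColour": "orange", "HasShape": "round"}),
                decision="orange")
    print(len(model.experiences))        # 1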


Fig. 18. Method for building the Object Knowledge Model.

H(KOn) ≜ Φ(M(EOM)): {H(EOx) → KOn ∈ (HOi)}   (18)

H(KOM) ← ∑_{i=1}^{n} H(KOn) | H(KOn) ≜ Φ(H(EOx))   (19)

Method for developing the interface model
The last stage in the decision-making framework is to generate an Interface Model H(IOM) having human-like decision skills to recognize the Object and answer queries about it, as shown in Fig. 19.

The interface H(IOp) is denoted by H(I[O]). The humanoid cognitive agent provides the object recognition decision action H(AOp) by using the current object under perception H(SOp) and the individual humanoid knowledge model H(KOM). Eq. (20) shows the Interface H(IOp) and Eq. (21) shows the Interface Model H(IOM) for object recognition.

H(IOp) ≜ Θ(H(SOp), H(KOM), H(∐)): [OO]   (20)

H(IOM) ← ∑_{i=1}^{n} H(IOp) | H(IOp) ≜ Θ(H(SOp), H(KOp)) → H(AOp)   (21)

Fig. 19. Method for building the Object Recognition Interface Model.

Fig. 20. Humanoid (CITHCog) Hardware Structure for the Cognitive Information Technology (CIT) Architecture. The figure shows that CITHCog has a human-like physical structure, with human-like audio and visual apparatus to handle information.

Experimental design and results

The primary focus of the research is to create human-like computing systems. So, this work proposes a humanoid named CITHCog based on the CIT (Cognitive Information Technology) Architecture and its framework developed in C# and Python.

The Cognitive Computing Agent-based humanoid system uses audio processing and computer vision to grab information from the real world and bring it into the system. This cognitive intelligence can occur when the system has human-like physical sensors and a body, so the cognitive computing agent must have human-like embodied parts. The body of such an artificial cognitive computing system, for performing audio and visual perception, must have eye-like apparatus for seeing, ears for hearing, and a mouth for speaking. Therefore, the “Cognitive Computing agent Humanoid (nickname CITHCog)” shown in Fig. 20 uses cameras for visuals, microphones as hearing aids, and speakers to speak statements. The agent is capable of synchronizing the hardware using intelligent software. All the connections transfer data to the cognitive agent software, which performs the (Perception-Learning-Action) cycle.

CITHCog, built on the CIT architectural framework, is a cognitive system that combines capabilities in speech processing, computer vision, NLP, semantic analytics, and machine learning techniques. CITHCog advances its insights and becomes more intelligent with every user interaction and by adding new evidence at each particular instant.

The latest computing technologies in the cognitive system are as follows: NLP, dynamic (active, web, and social) learning, generating hypotheses (answer patterns) and evaluating solutions (based on rewards and confidence). CITHCog is anticipated to assist experts by creating beliefs (best option candidates) from experience data, quickening results, and deciding the accessibility of supporting evidence to solve real-world problems. CITHCog is a system to improve decisions by empowering humans to interact with artificial machines in a natural approach.

In the current scenario, experts and users are accustomed to using sophisticated web-based search engines or database query systems to find information to support decision-making. CITHCog also expedites the data-driven search, but uses a different method based on the CIT architecture. In principle, CITHCog powers human-like abilities (perception, understanding, learning, and decision-making) using the latest computing and advanced analytics techniques.
16
A. Chandiok, D.K. Chaturvedi Biologically Inspired Cognitive Architectures xxx (xxxx) xxx–xxx

Fig. 21. CITHCog's Vision Knowledge Building for a Query. The figure shows real-time image learning to recognize objects, illustrating the ability of the CIT architecture to learn individual items as humans do.

CITHCog's human like computer vision

To build a Cognitive Computing agent based humanoid system for autonomously performing Visual Question Answering (VQA), it is essential to actively generate visual perception power inside it for viewing objects under focus, and also to construct a feature representation for them. Therefore, the humanoid uses the Computer Vision service developed inside the framework to complete the objective shown in Fig. 21. The Cognitive Computing agent uses both a hardware setup and a software framework, as explained below.

Hardware: The humanoid comprises a stereo-vision RGB camera pair fixed on the top of its body. Both cameras are attached to pan-tilt servomotors giving human-like neck motion for visual attention, and are placed at a height of around 0.7 meters, pointing downward at an angle of 30 to 120 degrees according to the perspective. The two cameras have the same high-definition resolution for grabbing objects, as we do with our two eyes.

Software: The computer vision software is developed in the C# language for Windows Forms. The software framework uses the AForge.NET library for the visual camera interface working as eyes. The camera captures the image of the object, and self-made computer vision code determines the colour, shape and texture features. These features are compared to the templates inside the object knowledge model to ascertain the name of the sample under view.

CITHCog decision model

When the humanoid sees numerous objects in the real world, its main aim must be to know the names of these objects, their qualities, purpose and use (Gharaee, Gärdenfors, & Johnsson, 2017). On the other hand, an individual teacher interacting with the humanoid must give it knowledge about each object. So, it is necessary to design a human–humanoid interaction system capable of performing human-like decision making (Serrano, Iglesias, & del Castillo, 2017). At this point, a human-like humanoid system is needed to give the robot the required cognitive skills. On this path, the research work designs a novel CIT Based Humanoid System for object recognition (CHORS). The CIT based Humanoid Object Recognition System (CHORS) is designed as a case study using the following parameters:

1. The experiment design considers a Humanoid Cognitive Agent system HAq having a single cognitive agent given by Haqi: HAq = {Haqi}.
2. The humanoid goal HGq is Object Recognition (OR) and answering questions regarding the objects (OQA) under consideration. The goals are assigned to the humanoid cognitive agent aqi; thus HGq ← {Hgq1 = OR, OQA}.
3. Therefore the work develops cognitive skills inside the humanoid cognitive agents, denoted by HCSaq. The first skill (HCSaq1) is determining the type (TO) and name (NO) of the object under observation from its image. The second skill (HCSaq2) is answering questions (QA) about the Object (O): HCSaq = {HCSaq(1) = OR(NO, TO), HCSaq(2) = QA(O)}.

So, the design creates the individual cognitive skills HCSaq of the humanoid cognitive agents HAq. The development of the humanoid skills starts by creating the concept model.

Design of concept model
The object concept model is designed with Food as the entity class, having fruits and vegetables as subclasses H(EO). The names of the vegetables (potato, onion, tomato, …) and fruits (apple, banana, mango, …) are considered sub-entities. A Fact H(FO) shows the detailed attribute value. The relationships represent associations H(AO) for each entity with a detailed attribute fact; an association can be shared or unique for each fact, as shown in Fig. 22. The concept model for objects is a vocabulary by which the cognitive agent is able to recognize various items in view. The Concept Model is a semantic knowledge representation of a specific domain. The cognitive agent uses the concept model as a grammar for objects, in the same manner as humans do for recognition (Table 11).

Design of experience model
The humanoid dynamically learns the food object model from a human teacher, via the teacher's experience, using the model shown in Fig. 23. The system obtains the relationship between the entity and the attributes for a particular food item by visualizing the object's appearance using computer vision technology to get the image, its name and answers for the different attributes of the object, as shown in Table 12. The attributes represent colour, shape, texture, and the food object category. The humanoid learns the food item features autonomously, with the help of human experience support, so the system can recognize and answer queries about objects of the same class, as we humans do (a sketch of such an experience record follows).
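A minimal sketch of how such an experience record might combine self-extracted visual features with teacher-provided semantic attributes, mirroring Table 12; the dictionary layout and the feature values are illustrative assumptions.

    def build_experience(name, visual_features, teacher_statements):
        """Merge humanoid (self) feature learning with teacher (semantic) learning."""
        record = {"object": name, "attributes": {}}
        for attr, value in visual_features.items():
            record["attributes"][attr] = {"value": value, "type": "Features", "learning": "Humanoid (Self)"}
        for attr, value in teacher_statements.items():
            record["attributes"][attr] = {"value": value, "type": "Semantic", "learning": "Teacher"}
        return record

    orange = build_experience(
        "Orange",
        visual_features={"Colour": "orange (RGB)", "Shape": "round", "Texture": "rough"},
        teacher_statements={"Tastes": "sweet and citrus", "Prepares": "juice, fruit salads",
                            "Class": "citrus fruits", "Type": "fruit"},
    )
    print(orange["attributes"]["Tastes"]["learning"])   # Teacher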


Fig. 22. Concept Model For Object


Recognition. Figure shows human like concept
building using Entity, Fact, Association, and
Attributes. “Entity” block shows the super and
sub class domains of Object. “Fact” block shows
the detail about the entity objects. “
Associations” shows the relation between the
entity and facts. The attributes helps to create
the “Fact” about the Object.

Table 11 Table 12
Association Types for Food Objects. Experience Model for Cognitive Computing Agent Based Humanoid System.
Object Category: Food Food: Fruits

Associations Object Association Attributes Type Learning

Association Names HasColour HasShape HasTexture Orange Colour Orange Features Humanoid
HasSize HasTaste HasUse (RGB) (Self)
HasNutrition HasClass HasCost Shape Round Feature Humanoid
HasHardness IsGrownAt IsAvailableAt (Numeric) (Self)
IsTypeof IsCombinedWith Texture Rough Features Humanoid
(Spectral) (Self)
Size Medium Semantic Teacher
Tastes Sweet and citrus Semantic Teacher
Prepares Juice,Fruit salads Semantic Teacher
Nutritions Excellent Source of Semantic Teacher
Vitamin C
Class Citrus Fruits Semantic Teacher
Costs Low Semantic Teacher
Hardness Soft Semantic Teacher
Grown Moderate temperature Semantic Teacher
15 to 30 degree
Available Fruit Markets Semantic Teacher
Type Fruit Semantic Teacher

Fig. 23. Experience Model For Food Object recognition and Answering Queries.
Figure shows that “Concept Model” block is passed to the Teacher. The Teacher
teaches the CITHCog various objects by creating a combination of (Entity, Fact,
and Association with attributes) to form an Experience. A set of Experience
develops Experience Model.

currently accept as one of the best tools for representing data for var-
ious types of applications. The advantages are its portable light-weight
system; natural data formation and representation, and support for Fig. 24. Knowledge Model For Food recognition and Answering Queries. The
“Experience Model” for each Object is curated, Processed, and Modeled by Data
every programming languages. It provides general rule-based instruc-
Scientist/Humanoid to generate Knowledge Model.
tions for generating a detailed structure of an information document.
These structures can nest inside one another so that one element can
have relationships with other elements. Elements may consist of attri- linguistic of the food domain, i.e., the terms for describing the parti-
butes, which are assigned food item values. The Food knowledge model cular item. This style builds food concepts having entities, attributes,
as XML document contains two portions shown in Fig. 25: Food on- and association. Experience Domain models, on the other hand, are
tology and Experience domain models. Food semantics considers as the intelligible assortments of numeric values and teacher experience
fundamental component of Food Knowledge Model. It describes the statements about the food item that represent particular perspectives on
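A brief sketch of what such an XML knowledge entry could look like when generated programmatically; the element names below are assumptions made for illustration and are not the exact schema of Fig. 25.

    import xml.etree.ElementTree as ET

    def knowledge_entry(entity, fact, associations):
        """Build a nested XML element for one object, in the spirit of Fig. 25."""
        obj = ET.Element("Object")
        ET.SubElement(obj, "Entity").text = entity            # e.g. category: Fruit
        ET.SubElement(obj, "Fact").text = fact                 # e.g. object name: Orange
        assoc_root = ET.SubElement(obj, "Associations")
        for predicate, value in associations.items():
            a = ET.SubElement(assoc_root, "Association", name=predicate)
            a.text = value
        return obj

    entry = knowledge_entry("Fruit", "Orange",
                            {"HasColour": "orange", "HasTaste": "sweet and citrus",
                             "HasNutrition": "excellent source of Vitamin C"})
    print(ET.tostring(entry, encoding="unicode"))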


Fig. 25. Example of Object knowledge representation in CIT Architecture. The XML file shows the complete Knowledge template for an Object.

Fig. 26. CITHCog’s Concept Building Regarding Query. Figure shows the pictorial view of Concept creation as (Subject, Predicate, Object).

the food domain knowledge that is useful for answering queries. In this Design of interface model
work, rather than gathering information from the web world and then The conventional robot systems use the pre-programmed instruction
crawl the web document to create knowledge inside the cognitive chips to perform reactive actions on perceiving information from the
computing agent. Teachers provide this type of knowledge information environment. Some intelligent robots use stored rule base knowledge to
in unstructured text creating human like experiences shown in Fig. 26. conduct a deliberative performance in the world. On the other hand,
The advantage of teacher based learning is that, the information is proposed Cognitive computing based humanoid system can perform
given by a human which makes knowledge more authenticated. Sec- human-like sensing, comprehending and decision making under un-
ondly, the processing and storage of info inside the agent will be fast. certain situation by using stored data knowledge. So, such humanoid
Thirdly, the agent gets knowledge without any web data connection. system needs an active interface model to do such type of actions as
Lastly, the teacher provides the correct triplet for representing knowl- shown in Fig. 28. The focus of real-time implementation of CIT archi-
edge, instead of applying automatic algorithms to extract subject, pre- tecture based Question/Answering system as shown in Fig. 29. The
dicate and object from sentences. The Teacher input the experiences design structure of CITHCog includes both hardware and software ar-
inside the agent. A set of experiences thus creates the knowledge model chitecture for building humanoid system and a procedure to research,
and stores it in the XML/RDF form as shown in the Fig. 27. improve, and incorporate algorithmic methods as services into the
system. Though power consumption and speed are essential elements of


Fig. 27. CITHCog's Experience Building Regarding a Query.

Fig. 28. CIT powered Inference Model for Food Recognition. In the figure, the “Sense” block grabs the image and the query. The “Knowledge Model” block stores the feature information. The “Action” block performs the object-name decision task and the “Answer” block provides the end outcome for the query. The “Cognitive Computing Agent” block manages the knowledge, the Action and the understanding of the query.

Fig. 29. CIT powered Overall Question Answering System. In the figure, the “Sense” block grabs the speech and image query information. The “NLP Analysis” block applies natural language processing to obtain features such as “Question Type, Answer Type, Subject, Predicate, Object and the Attributes” for knowledge and answer finding. The “Cognitive Computing agent” block plays the central role of managing natural language analysis and the retrieval and extraction of information from the “Knowledge Model” block. The “Action” block decides documents/Experiences, judges ranks for the selected documents/Experiences, and suggests the most appropriate answers for the query.

Though power consumption and speed are essential elements of CITHCog, the architecture design primarily focused on attaining correctness and confidence of decision; if the system were deprived of these features, speed would be worthless. So, an important strategic component comprises creating algorithms for evaluating and increasing accuracy. The hardware and software technologies incorporated in CITHCog through the CIT (Cognitive Information Technology) architectural framework include the following human-like functionality for the Question Answering System, as shown in Fig. 30:


Fig. 30. CIT Powered Cognitive Computing Agent Question Answering System. The Figure shows that User enters the query. “Stop Words” block removes the English
stop words and punctuation’s. “Tokenize” block breaks the sentence into words. From each Words “Question Type, Subject and Predicate” is determined by respective
blocks. “Get Answer” block grabs multiple answers and choose the best answer from the system. “Add Score” blocks add the score between 0 to 10, by the user when
satisfied by the answer given. The “Get Statistics” block shows the rewards of all the selected answer.

vision) Question (Q/A)- Question Type: What,


2. Comprehending Ability: Subject: Object(underView),
(a) Query understanding (Using Natural Language Processing) Predicate: hasColour.
(b) Query parsing and classification (Using Natural Language Step 6: Apply Computer Vision to Understand Image: Feature
Processing) (Colour,Shape, texture)
(c) Query decomposition (Using Natural Language Processing) Step 7: Image Recognition: Object: Apple (Got Answer)
(d) Automatic source attainment and evaluation (Using Active, Step 8: Insert Object Question: What is the colour of an apple?
Web, and Social learning as Well as Computing) Step 9: Query Understanding: Apply Natural Language Processing?
(e) Entity and its relation recognition (Using Linguistic and se- Tokenize: What|is|the|colour|of|an|apple
mantic computing) Step 10: Semantic Question Understanding: Question Type:
(f) Generating Logical form or query (Using Semantic Ontologies: Factoid(What)
Subject (S), Predicate (P), Query Form (QF), Question Type Answer Type: Colour.
(QT), Answer Type (AT) Subject Attribute (SA), Predicate Subject: Apple.
Attribute (Predicate Attribute)) Predicate: HasColour.
(g) Knowledge representation (Using XML, OWL, and Text) The Fig. 32 shows the answer procedure.
(h) Reasoning (Probabilistic Based Reasoning) Step 11: Create Question Template:
3. Action Ability: Answer(Apple,hasColour)
(a) The system is generating answer Candidates for a particular si- Step 12: Search Semantic Knowledge
tuation. (Hypotheses Generation of solution patterns) Step 13: Gets Answer based on Confidence.
(b) Confidence and Reward Based Answer Selection (Statistics Step 14: Select Best Answer.
based on User feedbacks and importance of the answer.) Step 15: Give Answer:
(c) The system is implementing the decision as audio or motor ac- The colour of an apple is red.
tion. (Motor and sensor implementation) Step 16: Get Satisfaction Reward from User.

A case model of object recognition by cognitive humanoid object recognition The quantitative result for is given as follow in Table 13:
system at interface
Understanding the complete case of object detection in CIT frame-
work, the work explains a case study which explains each step in the Evaluation results for cognitive functionality in CITHCog
process of decision.
CASE CIT FOR OBJECT RECOGNITION: In addition to the qualitative and conceptual analysis described in
Fig. 31 show the visual application for object recognition. the preceding sections, CIT framework based humanoid performing
Question/ Answering to do decision-making analyzed through a
Step 1: Let take an input Image quantitative exploration. So, the first test is on selecting the efficacy of
(I ) : = An Apple. using speech processing method and platform independent and general
Step 2: Input the queries through speech: grammars in contrast to specific grammars. Next, the evaluation is for
Query (Object) = “ What is this object”. the performance of the NLU module bearing in mind only its outcome
Query (Q/A) = “What is the colour of the object”. and not the connection with the humanoid stage. As a final validation, a
Step 3: Apply Speech Processing to understand Queries. comprehensive test of the system, beginning from the spoken query and
Step 4: Apply Natural Language Processing to understand Query? ending with the humanoid executing the desired action, was performed
Tokenize: What|is|this|Object|?. considering its cognitive functionality based on CIT framework for
Tokenize: What|is|the|colour|of|the|object|?. Question Answering. The benefits acquired through the use of the re-
Step 5: Query Understanding: commended approach to build humanoid compared with other cogni-
Question (Object)- tive robots is explained as follows.
Question Type: What, Subject: Focus Image, Predicate: IsObject.
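A compact end-to-end sketch of steps 9–15 of the case above, in the style of the earlier sketches; the tiny fact table, the fixed token positions and the answer template are assumptions that hold only for this exact question form.

    FACTS = {("Apple", "HasColour"): "red",
             ("Apple", "HasTaste"): "sweet"}

    def understand(question):
        """Steps 9-10: tokenize and extract (QT, AT, Subject, Predicate)."""
        tokens = question.rstrip("?").split()      # What|is|the|colour|of|an|apple
        return {"QT": "Factoid", "AT": tokens[3].capitalize(),        # "colour" -> "Colour"
                "Subject": tokens[-1].capitalize(),                    # "apple"  -> "Apple"
                "Predicate": "Has" + tokens[3].capitalize()}           # "HasColour"

    def answer(question):
        """Steps 11-15: build the template, search the knowledge, return the best answer."""
        q = understand(question)
        value = FACTS.get((q["Subject"], q["Predicate"]))              # Answer(Apple, HasColour)
        if value is None:
            return "I do not know yet."
        return f"The {q['AT'].lower()} of an {q['Subject'].lower()} is {value}."

    print(answer("What is the colour of an apple?"))
    # -> The colour of an apple is red.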


Evaluation results for cognitive functionality in CITHCog

In addition to the qualitative and conceptual analysis described in the preceding sections, the CIT framework based humanoid performing Question Answering for decision-making is analyzed through a quantitative exploration. The first test concerns the efficacy of the speech processing method and of platform-independent, general grammars in contrast to specific grammars. Next, the evaluation addresses the performance of the NLU module, bearing in mind only its outcome and not its connection with the humanoid stage. As a final validation, a comprehensive test of the system, beginning with the spoken query and ending with the humanoid executing the desired action, was performed considering its cognitive functionality based on the CIT framework for Question Answering. The benefits acquired through the use of the recommended approach to building a humanoid, compared with other cognitive robots, are explained as follows.

Fig. 31. CIT Powered Cognitive Computing Agent Object Recognition System. The figure shows a live demonstration of object recognition using the proposed CIT architecture. The user speaks and the CIT enabled system recognizes the input as (Command, Statement, Question) and performs an Action. The figure shows that the system has a “Speaker Interface” for speech-to-text and text-to-speech, shown by the “Audio Inference Phrase and Speak” block. The “Image Interface” block grabs the real-world image and recognizes the object on being given the command. The “Audio Inference Action” block shows the Action codelet performed. The “CIT Answer” block gives the end outcome.

Fig. 32. CIT Powered Cognitive Computing Agent Object Question Answering in a Real World Problem. The figure shows the “Question Entry” used to enter questions. The CIT powered interface processes the query and prints the answer in the “Answer Block”.

Table 13
Result of the Case Study.

Tools | Answer | Reward
Object Recognition | Apple | 8.9
Question Answer | Apple:Colour:Red | 9.2

Evaluation of computer vision skill for image understanding
The result of image understanding is based on two categories: images whose object name is Food correctly classified (positive cases) and images of other object names correctly classified (negative cases). The response given by the Humanoid Cognitive Agent Haqi is shown in Table 14, and the summary statistics for object image recognition are shown in Table 15 (a small sketch of these summary statistics is given below). Plotting the receiver operating characteristic (ROC) requires the observed operating points; Fig. 33 shows that the fitted ROC curve is degenerate, as indicated by the values in Table 16. The empiric ROC area is 0.89. From the curve, the interpretation is as follows: it shows the trade-off between sensitivity (TPR) and specificity (FPR) (an increase in sensitivity shows a decrease in specificity). Since the curve touches the left-hand side first and then follows the top line of the ROC plot space, the image category name H(ED) understanding test is accurate. According to the concept of ROC, if the curve lies at 45 degrees then the test is inaccurate; the curve is not on the diagonal, so the test is accurate.

Table 14
Response for Object Category.

Category | Not Food | Food
Actually Not Food | 23 | 2
Actually Food | 1 | 24

The count of Actual Negative Cases (Not Food) = 25. The count of Actual Positive Cases (Food) = 25.

Table 15
Statistics Summary for Object Image Category.

Parameters | Values
Number of Image Object Cases | 50
Number of Correct Objects Classified | 47
Accuracy of Recognition | 94%
Sensitivity | 96.0%
Specificity | 92%
Food Images Missed | 1
Not-Food Images Missed | 2

Fig. 33. ROC curve for Object Recognition. The curve shows that the object image finding is accurate.

Table 16
Observed Operating Points.

Parameters | Value 1 | Value 2 | Value 3
False Positive Fraction | 0.00 | 0.18 | 1.00
True Positive Fraction | 0.00 | 0.94 | 1.00
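A tiny sketch reproducing the summary statistics of Table 15 from the confusion matrix of Table 14; the variable names are illustrative.

    # Confusion matrix from Table 14 (positive class = Food).
    tp, fn = 24, 1          # actually Food: 24 classified as Food, 1 missed
    tn, fp = 23, 2          # actually Not Food: 23 classified as Not Food, 2 missed

    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate

    print(f"accuracy={accuracy:.0%}, sensitivity={sensitivity:.0%}, specificity={specificity:.0%}")
    # -> accuracy=94%, sensitivity=96%, specificity=92%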


Since the value of the ROC area is high, the object image category H(ED) findings for the test cases are accurate.

Evaluation of CIT based cognitive computing agent performance for object recognition and question answering

For evaluating the performance of the CIT (Cognitive Information Technology) based architecture for object recognition, the evaluation parameters are errors and reward. These indicators show that the cognitive computing agents exhibit better skills for improving the decision, as shown in Table 17.

Table 17
Evaluation of Cognitive Functionality Inside the CIT Cognitive Architecture.

Function | Cases | Correct | Wrong | E1 | E2 | E3 | Reward (10)
Object Recognition | 30 | 27 | 3 | 1 | 1 | 1 | 9.2
Question Answer | 30 | 26 | 4 | 1 | 2 | 1 | 8.6

Let us see this through the sample results shown in Table 17. The cognitive agent (CIT(Haqi)) shows its result for a test case. Once the cognitive agent receives the problem, it provides decisions on object recognition and question answering based on the best performance. The work evaluates the correct answers and the type of error. The three types of errors are (E1, E2, E3):

1. E1 – The cognitive agent does not understand the query.
2. E2 – The cognitive agent gives answers but the user is not satisfied.
3. E3 – The cognitive agent does not provide the correct answer.

Table 18
Comparison of Humanoid Functionality Based on Individual Cognitive Architecture. (The table compares the Kismet, Cog, iCub and Reem robots with the proposed CITHCog across architecture, functionality, audio and visual sensors, audio and visual features, core audio and visual algorithms, audio and visual skills, attention, learning, knowledge representation, action selection and action. CITHCog is built on the CIT architecture; its functionality spans NLP, computer vision, decision-making, HRI/HCI, categorization and clustering, and semantic Q/A, with XML/RDF (Semantic Web) knowledge representation, experiential, web and social learning, and answer generation based on confidence and reward selection.)

An approach to appreciating this unique system is to consider how CITHCog, as a cognitive computing system, differs from current web-based decision-making systems that depend on deterministic answer search engines. With a search engine, the user enters keywords and acquires results on a topic centered on a suitable ranking. Some decision-making systems may accept a precise inquiry and return ranked outcomes, but they never create a dialogue to continue improving the results. A standard search-engine based decision-making system uses algorithms to rank results founded on the importance of the keywords. At this stage, humans interact with the list of outcomes and evaluate which answers or links best fit the query. With CITHCog, the user gets a focused result – either an answer to the request or a follow-up question to help elucidate the user's problem resolution. Therefore, the machine is proposed to act more like a human expert when giving decisions. For instance, the user may ask CITHCog, “What is the best fruit for anemia?” or “What is the best way to increase blood level in the body?” If CITHCog has sufficient data and adequate contextual knowledge associated with the domain, the system can apprehend the semantics behind the query. This deep level of comprehending power motivates the use of statistical analysis and procedures for evolving predictive models. CITHCog does not only look for keywords as a search engine would; it also powers NLP methods that can break a question down into parts and assess each constituent for likely answers and results. This competency of getting expressive, correct, and timely answers to a direct question is the essential difference between most decision-making search engines and the question-answering procedure of cognitive systems like CITHCog based on the CIT architecture.

To accomplish these objectives, CITHCog is designed as a humanoid having the ability to do Question Answering, understand statements and follow commands. The system uses consistently learned knowledge, acquired through actual, web and social learning, to decide answers to questions and actions for other statements. It obtains reward and confidence scores associated with those queries to get the best solution. CITHCog comprehends the context of a sentence by analyzing each component of the phrase, matching those rudiments against earlier consumed

23
A. Chandiok, D.K. Chaturvedi Biologically Inspired Cognitive Architectures xxx (xxxx) xxx–xxx

information, and creating implications as to meaning. the experimentation) to achieve various decisions as a blend of cogni-
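One plausible way to maintain such reward and confidence scores is an incremental update per stored query-answer pair, as in the Python sketch below. The update rule and the class layout are assumptions made for illustration and are not taken from the CIT implementation.

class AnswerRecord:
    """One learned answer for a query, together with its running reward statistics."""
    def __init__(self, answer):
        self.answer = answer
        self.trials = 0
        self.confidence = 0.0       # running average of rewards in [0, 1]

    def update(self, reward):
        """Incremental mean: the confidence moves toward the observed reward."""
        self.trials += 1
        self.confidence += (reward - self.confidence) / self.trials

def best_answer(records):
    """Select the candidate whose learned confidence is currently highest."""
    return max(records, key=lambda r: r.confidence)

# Example: two candidate answers to the same query, rewarded by user feedback.
a, b = AnswerRecord("answer A"), AnswerRecord("answer B")
for reward in (1.0, 1.0, 0.0):
    a.update(reward)                # mostly positive feedback
b.update(0.0)                       # negative feedback
print(best_answer([a, b]).answer)   # -> answer A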
To make certain that replies are highly accurate, CITHCog concurrently generates numerous hypotheses (answer candidates) in a social and web computing environment. These hypotheses must be produced in a way that yields an adequately comprehensive search, so that the best reply is among the choices, yet not one so expensive that large quantities of spurious responses interfere with the overall efficiency of the process. Refined procedures then rank the candidates and determine a confidence level for each answer. Advances in technology have helped make this methodology a reality: the CIT architecture supports speech processing, computer vision, NLP, semantic knowledge representation, and machine learning to carry out a broad investigation and to improve CITHCog’s cognitive competences continuously.
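The generate-and-rank step can be pictured as in the sketch below. The candidate sources, the thread-pool fan-out, and the confidence heuristic are illustrative assumptions; they only convey the shape of the procedure, not the concurrency model actually used in the CIT framework.

from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate sources; real ones would query local, web and social knowledge bases.
def local_kb(query):
    return ["answer from the local knowledge base"]

def web_search(query):
    return ["answer extracted from a web page", "second web answer"]

def social_feed(query):
    return ["answer suggested by another agent"]

SOURCES = (local_kb, web_search, social_feed)

def confidence(query, candidate):
    """Toy confidence score: term overlap between the query and the candidate."""
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / (len(q) or 1)

def answer(query):
    """Fan out to all sources concurrently, then rank every returned candidate."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        batches = pool.map(lambda source: source(query), SOURCES)
    candidates = [c for batch in batches for c in batch]
    ranked = sorted(candidates, key=lambda c: confidence(query, c), reverse=True)
    return ranked[0], confidence(query, ranked[0])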
Table 18
Comparison of Humanoid Functionality Based on Individual Cognitive Architecture. The table compares Kismet, Cog, iCub, and Reem with respect to Architecture, Functionality, Audio Sensor, Visual Sensor, Audio Features, Visual Features, Audio Skill, Visual Skills, Core Audio Algorithms, Visual Algorithm, Action Selection, Attention, Learning, Knowledge Representation, Semantic Understanding, and Action.

Discussion
Table 18 shows that past research in cognitive robotics such as Kismet, Cog, iCub, and Reem has conventionally emphasized low-level perceptual sensing and control tasks, comprising the handling of audio and visual sensors for a certain level of cognitive function in order to demonstrate HRI/HCI skills. In contrast, the proposed CIT architecture based robot CITHCog takes cognitive robotics research to the next level. It is concerned with providing computer systems, humanoids, and expert agents with an advanced level of cognitive functions that enable them to perceive and make inferences linguistically, using speech processing, computer vision, natural language processing, and machine learning (deep learning), so that they can act in varying, incompletely identified, and uncertain situations. Using its cognitive computing and agent technologies molded inside the CIT framework, CITHCog is able to perform human-like skills. It reasons about its own goals and the necessary action selection, about when to perceive (Audio, Visual, and Textual) information, and about what to look for in its knowledge base, using linguistic inputs from the user (Questions, Commands, and Statements).
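A highly simplified view of this sense-comprehend-act dispatch is sketched below. The input categories follow the text above, while the classification rule and the agent methods (say, answer, execute, remember) are hypothetical names introduced only for illustration.

def classify_input(utterance):
    """Crude classification of a user utterance into question, command or statement."""
    text = utterance.strip().lower()
    if not text:
        return "statement"
    first = text.split()[0]
    if text.endswith("?") or first in {"what", "who", "which", "how", "why", "where"}:
        return "question"
    if first in {"move", "stop", "look", "bring", "show", "go"}:
        return "command"
    return "statement"

def handle(utterance, agent):
    """Sense (classify), comprehend (consult knowledge), act (answer, execute or store)."""
    kind = classify_input(utterance)
    if kind == "question":
        agent.say(agent.answer(utterance))      # query the knowledge base and speak the result
    elif kind == "command":
        agent.execute(utterance)                # trigger the corresponding action skill
    else:
        agent.remember(utterance)               # store the statement as new knowledge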
In social respects, CITHCog can also take into account the cognitive conditions of other agents, time, cooperative task implementation, and so on. In short, CITHCog is concerned with integrating sense, comprehend, and act within a unified theoretical and implementation framework, the CIT architecture, using both cognitive computing technologies (providing human-like computing of information) and agent technologies (providing human-like perception, learning, memory, and action selection). CIT can build both software machines (softbots) and robotic objects for everyday life that perform Question Answering (Q/A), decision-making, sentiment analysis, summarization, categorization, clustering, case-based reasoning, service robotics, computer vision tasks, semantic knowledge understanding, and much more. CIT-based applications will be on the upsurge and will provide progressively more illustrations of their use in society, with commercial products in the future; some are already under research. As the interface with humans grows, so does the demand for sophisticated system applications like CIT, with skills associated with deliberation and high-level reflective cognitive functions. CIT thus combines traditional robotics and expert-system domains with those of cognitive computing and agent theory, and it will continue to be central to research in artificial cognitive robotics and expert systems.

This paper aims to bring together researchers working on all facets of the theory and implementation of artificial cognitive systems, to initiate current work and future directions towards Cognitive Computing Agents. The work is a foundation platform for a novel hybrid technique for designing application systems utilizing modern AI methods.

Conclusion

The work introduced a cognitive architecture known as CIT (Cognitive Information Technology) for creating general decision-making systems using cognitive computing and agent technologies. This design allows querying the agent with (Commands, Questions, Statements, Sensors, and Image data) through a cognitive humanoid agent (CITHCog in the experimentation) to achieve various decisions as a blend of cognitive skills. A critical point is that how the basic cognitive skills should be combined in order to solve the task is not stated in advance. The entire method avoids the classical AI agent problem-solving technique and uses in its place a cognitive computing agent approach (CIT) based on answering the queries faced by the artificial agent. By solving the current query problem skill by skill, the humanoid finally reaches the decision (if it is reachable). Given a goal and a set of cognitive technologies, CIT itself will carry out the necessary phases to accomplish the target using those skills (or at least attempt to reach the goal). Therefore, CIT can quickly adapt to new queries. If, at the start, the CIT architecture is unable to decide on the requested problem, it will keep trying and will return experiential answers in the meantime. In the application process, the set of queries that the user can ask the cognitive computing agent is restricted by the speech and text recognition system; the system confirms that all recognized textual and vocal questions are accessible to a CIT implementation. The architecture is completely general purpose: any decision problem can be accommodated by adding cognitive services to the interface. Moreover, adding and removing services, concepts, and experiences becomes as simple as defining their outcomes. This characteristic makes the architecture particularly adaptable and easy to implement in any decision support system. Initial experiments claim success and point the way to more realistic decision-making environments. We observed that the most limiting factors are still the hardware, the governing algorithms, and the speech recognition needed for understanding natural language, which require further research. These additions will permit the cognitive computing agent to evaluate the practicability of the provided queries and, even though this does not assure that the cognitive computing agent will always perform as anticipated, the decision is based on human-like thinking.
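The claim that new services are added simply by defining their outcomes can be visualized with a small registry sketch. The ServiceRegistry class and the example service below are hypothetical and only illustrate the intended plug-in style, not the actual interface of the CIT framework.

class ServiceRegistry:
    """Minimal plug-in registry: a cognitive service is a named callable that
    maps a query to an outcome, or to None when it cannot contribute."""
    def __init__(self):
        self._services = {}

    def add(self, name, service):
        self._services[name] = service

    def remove(self, name):
        self._services.pop(name, None)

    def decide(self, query):
        """Ask every registered service and keep the first non-empty outcome."""
        for name, service in self._services.items():
            outcome = service(query)
            if outcome is not None:
                return name, outcome
        return None, None

# Example: registering a trivial question-answering service and querying it.
registry = ServiceRegistry()
registry.add("qa", lambda q: "an answer" if q.endswith("?") else None)
print(registry.decide("What is the best fruit for anemia?"))   # -> ('qa', 'an answer')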