
SNSW Unit-5


Social Networks

&
Semantic Web

CO 6: Evaluation of Web-Based Social Network Extraction


Introduction
• Social Network
• A network of social interactions and personal relationships.
• A social structure made up of a set of social actors (such as individuals or
organizations), sets of dyadic ties, and other social interactions between
actors.
• Computer scientists tend to look at social networks in the first way and
study online networks as the equivalents of real-world networks.
• On the other hand, network analysts with a social science background
apply extreme caution and most often still treat electronic
communication networks and online communities as a separate field of
investigation.
Introduction
• Nevertheless, surprisingly little is known about the exact relationship
between real world networks and their online reflections.
• To what extent do electronic data obtained from the Web reveal social
structures?
• Different forms of electronic data could serve as a source for
obtaining information about different types of social relations.
• For electronic data such as email traffic, which represent message exchanges
between individuals, a higher degree of correspondence with social network
data is plausible.
• For others, such as web logs, a more distant perspective seems warranted.
• Social network mining from the Web based on co-occurrence is an
interesting method as it is likely to produce evidence of ties of different
strengths based on the variety of the underlying data.
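The co-occurrence idea can be illustrated with a small sketch: estimate tie strength from search-engine hit counts using a Jaccard-style ratio. The hit counts below are hypothetical stand-ins for real queries, not data from the study.

```python
# Sketch of co-occurrence-based tie strength. The hit counts are
# hypothetical; in practice they would come from search-engine queries
# for each name and for the pair of names together.
def jaccard_strength(hits_a, hits_b, hits_ab):
    """Jaccard-style strength: co-occurring pages over all pages
    mentioning either person."""
    union = hits_a + hits_b - hits_ab
    return hits_ab / union if union else 0.0

# Toy numbers: pages mentioning person A, person B, and both.
strength = jaccard_strength(hits_a=1200, hits_b=800, hits_ab=150)
```

Ties backed by many co-occurring pages receive higher strengths, which is what makes the method sensitive to relationships of different intensity.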
Introduction
• Bearing in mind the limitations, in this unit, we are going to evaluate web-based social
network extraction using members of a research organization as the subjects of our
study.
• The authors chose to evaluate electronic data extraction against the results of a
survey method, which is the dominant approach to data collection for network analysis.
• Standard questionnaires are preferred in theory testing for their fixed structure, which
guarantees the comparability of results among test subjects and across studies.
• Various forms of asking for one’s social relations have been tested through the years for
reliability.
• The fixed structure of questionnaires also allows researchers to directly extract
relational data and attributes for processing with network analysis tools and
statistical packages.
• Questionnaires are also minimally invasive and can be easily mailed, emailed or
administered online.
Differences between survey methods and
electronic data extraction
• Differences in what is measured
• What is not on the Web cannot be extracted from the Web, which limits the
scope of extraction. Also, these data can be biased in case part of the
community to be studied is better represented on the Web than other parts.
• Our network extraction method is likely to find evidence for different kinds
of relationships, resulting in what is called a multiplex network. These
relationships are not easily disentangled, although some progress can be made
by applying machine learning to disambiguate relationships.
• The authors measure a number of relationships in their survey and use these data
to understand the composition of relationships found on the Web.
• The equivalent problem in survey methods is the difficulty of precisely
formulating those questions that address the relationship the researcher
actually wants to study.
Differences between survey methods and
electronic data extraction
• Errors introduced by the extraction method
• There are errors that affect the extraction of particular cases. Homonymy affects
common names (e.g. J. Smith or Xi Li), but can be reduced somewhat by adding
disambiguation terms to queries. Synonymy presents a problem whenever a person
uses different variations of his or her name. Different variations of first names (e.g.
James Hendler vs Jim Hendler), different listing of first and middle names, foreign
accentuation, different alphabets (e.g. Latin vs. Chinese) etc. can all lead to different
name forms denoting the same person.
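The name variations and disambiguation terms discussed above can be sketched as small query-building helpers. The function names and the affiliation term below are illustrative, not part of the study's actual implementation.

```python
def name_variants(first, last, nicknames=()):
    """Generate common name forms that may denote the same person
    (synonymy): full name, initial form, and known nicknames."""
    variants = {f"{first} {last}", f"{first[0]}. {last}"}
    variants.update(f"{nick} {last}" for nick in nicknames)
    return variants

def disambiguated_query(name, context_term):
    """Append a context term (e.g. an affiliation) to a name query to
    reduce homonymy for common names."""
    return f'"{name}" "{context_term}"'
```

For example, `name_variants("James", "Hendler", ("Jim",))` yields the "James Hendler" vs "Jim Hendler" forms mentioned above, and a query for a common name such as "J. Smith" can be narrowed with an affiliation term.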
• They addressed this problem by experimenting with various measures that could
predict if a particular name is likely to be problematic in terms of extraction. We can
test such measures by detecting whether the personal network of such persons is
in fact more difficult to extract than the networks of other persons.
Differences between survey methods and
electronic data extraction
• Errors introduced by survey data collection
• Unlike network mining from the Web, surveys almost never cover a network
completely. Although a response rate lower than 100% is not necessarily an
error, it does require some proof that either the non-respondents are not
significantly different from the respondents with respect to the survey or that
the collected results are so robust that the response from the non-
respondents could not have affected it significantly.
• The respondents are not likely to be equally co-operative either. There are
most likely differences in the level of cooperativeness and fatigue.
• The mere fact of observation can introduce a bias. At best this bias affects all
subjects in an equal manner.
• Not only the type of relationship that is considered by the respondent but
also the recall of contacts is affected by the way a question is formulated.
Context of the empirical study
• The researchers used network data on the social networks of the 123
researchers working at the Department of Computer Science of the Vrije
Universiteit, Amsterdam in September 2006.
• The Department is organized in six Sections of various sizes, which are in
decreasing order of size: Computer Systems (38), Artificial Intelligence (33),
Information Management and Software Engineering (22), Business
Informatics (17), Theoretical Computer Science (9) and Bioinformatics (4).
• The Sections are further divided internally into groups, each led by a
professor. Researchers in each section include part- or full-time PhD
students, postdocs, associate and full professors, but the study excluded
master students, support staff (scientific programmers) and administrative
support.
Context of the empirical study
• The authors chose this community as the subject of their study because
one of them is a member of the Business Informatics group of the
Department.
• Some participants (9) felt that this study should not have been carried
out by “one of their own” as they did not feel comfortable with
providing personal information to a colleague.
• From the remaining 114 researchers, 79 responses were collected
(a response rate of 64%), with above-average response rates in the BI
group (88%) and the closely interlinked AI group (79%).
Data Collection
• The authors used both electronic data extraction and the survey
approach in order to compare them.
• We collected personal and social information using a custom-built
online survey system.
• An online survey offers several advantages compared to a paper
questionnaire:
• Easy accessibility for the participants.
• Greater flexibility in design, allowing for a better survey experience. Using an
electronic survey it is possible to adapt questions presented to the user based
on the answers to previous questions.
• Easier processing for the survey administrator. Our system recorded electronic
data directly in RDF using the FOAF-based semantic representations.
Data Collection
• The survey is divided over several pages.
• The first page asks the participant to enter basic personal information: his or her full-time or
part-time status, age, years at the organization, name of the direct supervisor and research
interests.
• The second and third pages contain standard questions for determining the level of self-
monitoring and the extent someone identifies with the different levels of the organization.
• The fourth page asks the participant to select the persons he or she knows from a
complete list of Department members. This question is included to pre-select those persons
the participant might have any relationship with.
• The next page asks the participant to specify the nature of the relationship with the
persons selected. In particular, the participant is suggested to consider six types of
relationships and asked to specify for each person which type of relationship applies to that
person. The six types of relations we surveyed were advice seeking, advice giving, future
cooperation, similarity perceptions, friendship, and adversarial ties.
• Upon completion of the last page, the survey software stored the results in a Sesame RDF
store.
Preparing Data
• The number of nodes in the network is 79.
• Six networks were constructed, one for each of the six surveyed relations.
• Directionality was removed from both the survey networks and the web-based network.
Optimizing goodness of fit
• The nodes and edges of the web-based network also had to be filtered
before it could be compared with the survey-based networks.
• Filtering the web-based network requires specifying cut-off values
for two parameters:
• The minimal number of pages one must have on the Web to be included
(page count)
• The minimal strength of the relationships (used to filter out ties with too little
support).
• Filtering is either carried out before removing directionality or one
needs to aggregate the weights of the edges going in different
directions before the edges can be filtered.
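The aggregate-then-filter option described above can be sketched as follows. The data structures are hypothetical: a dictionary of page counts per person and a dictionary of directed, weighted edges.

```python
def filter_and_symmetrize(page_count, edges, min_pages, min_strength):
    """Keep only people with enough web pages, aggregate the weights of
    the edges going in both directions, then drop ties whose combined
    strength falls below the threshold."""
    nodes = {v for v, c in page_count.items() if c >= min_pages}
    undirected = {}
    for (u, v), w in edges.items():
        if u in nodes and v in nodes:
            key = tuple(sorted((u, v)))  # remove directionality
            undirected[key] = undirected.get(key, 0.0) + w
    return {e: w for e, w in undirected.items() if w >= min_strength}
```

With a toy network, `filter_and_symmetrize({"a": 10, "b": 5, "c": 1}, {("a","b"): 0.3, ("b","a"): 0.4, ("a","c"): 0.9}, min_pages=2, min_strength=0.5)` drops the poorly covered person "c" and keeps the aggregated a-b tie.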
Histograms representing the number of web pages and the strength of relationships

[Figure 1: Histogram of the number of web pages per individual (x-axis on a
logarithmic scale), and histogram of the strength of relationships based on
the web extraction method.]
Optimizing goodness of fit
• Finding the appropriate parameters for filtering can be considered as
an optimization task where we would like to maximize the similarity
between our survey networks and the extracted network.
• We can consider relationship extraction as an information retrieval
task and apply well-known measures from the field of information
retrieval.
• Let’s denote our graphs to be compared as G1(V1, E1) and G2(V2, E2).
• Precision, recall and the F-measure are common measures in
information retrieval, while the Jaccard-coefficient is also used for
example in UCINET
Optimizing goodness of fit

Similarity measures for graphs based on edge sets (with E1 the survey
network and E2 the extracted network):
• Precision: |E1 ∩ E2| / |E2|
• Recall: |E1 ∩ E2| / |E1|
• F-measure: 2 · precision · recall / (precision + recall)
• Jaccard-coefficient: |E1 ∩ E2| / |E1 ∪ E2|
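These edge-set measures are straightforward to compute; a minimal sketch, treating the survey edges as the gold standard:

```python
def edge_similarities(e1, e2):
    """Precision, recall, F-measure and Jaccard-coefficient between two
    edge sets, with e1 the gold standard (survey) and e2 the extraction."""
    inter = len(e1 & e2)
    precision = inter / len(e2) if e2 else 0.0
    recall = inter / len(e1) if e1 else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    jaccard = inter / len(e1 | e2) if e1 | e2 else 0.0
    return precision, recall, f, jaccard
```

For instance, comparing `{(1,2),(2,3),(3,4)}` with `{(1,2),(2,3),(4,5)}` gives precision and recall of 2/3 and a Jaccard-coefficient of 0.5.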
Optimizing goodness of fit
Once we have chosen a measure,
we can visualize the effect of the
parameters on the similarity using
surface plots.
These figures show the changes in
the similarity between the advice
seeking network and the network
obtained from the Web using the
F-measure. The similarity (plotted
on the vertical, z axis) depends on
the value of the two parameters of
the algorithm.
The plots show how the network obtained from web mining changes as we
vary the page count and strength thresholds.
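The optimization itself can be sketched as an exhaustive grid search over the two cut-offs, scoring each candidate network with the F-measure. The data structures and grids below are illustrative, not the study's actual values.

```python
from itertools import product

def f_measure(gold, extracted):
    """Harmonic mean of precision and recall between two edge sets."""
    inter = len(gold & extracted)
    if not gold or not extracted or not inter:
        return 0.0
    p, r = inter / len(extracted), inter / len(gold)
    return 2 * p * r / (p + r)

def best_thresholds(gold, page_count, weighted_edges, page_grid, strength_grid):
    """Exhaustive search over the two cut-offs, maximizing the F-measure
    against the survey (gold) network."""
    best = (0.0, None, None)
    for min_pages, min_strength in product(page_grid, strength_grid):
        edges = {e for e, w in weighted_edges.items()
                 if w >= min_strength
                 and page_count.get(e[0], 0) >= min_pages
                 and page_count.get(e[1], 0) >= min_pages}
        score = f_measure(gold, edges)
        if score > best[0]:
            best = (score, min_pages, min_strength)
    return best
```

Evaluating the F-measure at every grid point is exactly what the surface plots visualize: the vertical axis is the score, the two horizontal axes are the thresholds.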
Optimizing goodness of fit
• The F-measure, which is the harmonic mean of precision and recall, has a
single highest peak (optimum) and a second peak representing a different
trade-off between precision and recall.
• In general, we note that it seems easier to achieve high recall than high
precision. This suggests a two-stage acquisition process where we first
collect a social network using web mining and then apply a survey in which
we ask respondents to remove the incorrect relations and add the few
missing ones.
• Such a pre-selection approach can be particularly useful in large networks
where listing all names in a survey would result in an overly large table.
• Further, subjects are more easily motivated to correct lists than to provide
lists themselves.
Comparison across methods and networks
• Our Standard survey data also allows a direct comparison of methods
for social network mining.
• In this case they compare the best possible results obtainable by the two
methods, i.e. they choose the parameters for each method separately
such that the chosen similarity measure is optimized.
• They subjected to this test the benchmark method of co-occurrence
analysis and the method based on average precision.
• The results confirm our intuition that the average precision method
produces higher precision, but lower recall resulting in only slightly
higher F-measure values.
Comparison across methods and networks
• It is likely that the relationships we extract from the Web reflect a
number of underlying relationships, including those we asked in our
survey and possibly others we did not.
• To measure to what extent each of our surveyed relationships is
present on the Web it would be possible to perform a p* analysis,
where we assume that the Web-based network is a linear
combination of our survey networks.

• The analysis would then return the coefficients of this linear equation.
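A simplified stand-in for such an analysis is an ordinary least-squares fit of the web network's edge indicators on the indicators of two survey networks. A real p* (exponential random graph) analysis is more involved; this sketch, with hypothetical networks, only illustrates the idea of recovering the coefficients of the linear combination.

```python
def fit_linear_combination(web, survey_a, survey_b, all_edges):
    """Least-squares coefficients (alpha, beta) such that
    web ≈ alpha * survey_a + beta * survey_b over the candidate edges.
    Networks are given as edge sets; indicators are 0/1 per edge."""
    y = [1.0 if e in web else 0.0 for e in all_edges]
    x1 = [1.0 if e in survey_a else 0.0 for e in all_edges]
    x2 = [1.0 if e in survey_b else 0.0 for e in all_edges]
    # Normal equations for two predictors (no intercept).
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    alpha = (s22 * s1y - s12 * s2y) / det
    beta = (s11 * s2y - s12 * s1y) / det
    return alpha, beta
```

Large coefficients would indicate survey relationships that are strongly reflected in the web-based network.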
Predicting the goodness of fit
• Many personal factors influence the success of extracting social networks
from the Web.
• For example, as we already saw the amount of information available about
a person or the commonality of someone’s name on the Web.
• The author collected attribute data from the survey and used it to
investigate whether some of these factors can be reliably linked to the
success of obtaining a person's social network from the Web.
• If we find measures that help us to predict when the extraction is likely to
fail we can exclude those individuals from the Web-based extraction and
try other methods or data sources for obtaining information about their
relations.
Predicting the goodness of fit
• We have to measure the similarity between personal networks from
the survey and the Web and correlate it with attributes of the
subjects.
• The attributes the author considered are those from the survey:
• The number of relations mentioned (surveydegree).
• The age of the individual.
• The number of years spent at the VU (variables age and entry).
• The author also looks at Web-based indicators such as
• The number of relations extracted (miningdegree).
• The number of pages for someone’s name, which we recode based on its
distribution by taking the logarithm of the values (pagecount).
Predicting the goodness of fit
• Lastly, the author experimented with measures for name ambiguity
based on the existing literature on name disambiguation using
clustering methods.
• The author used two measures
• NC1: Jaccard-coefficient between the first name and the last
name.
• NC2: The ratio of the number of pages for a query that includes
the full name and the term Vrije Universiteit divided by the
number of pages for the full name only.
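The NC2 measure, and the log-recoded page count mentioned earlier, amount to simple ratios; a sketch with hypothetical page counts:

```python
from math import log10

def nc2(pages_with_affiliation, pages_full_name):
    """NC2: pages for the full name plus the affiliation term, divided by
    pages for the full name alone. Values near 1 suggest an unambiguous
    name; low values hint that the name is shared by many people."""
    return pages_with_affiliation / pages_full_name if pages_full_name else 0.0

def log_pagecount(pages):
    """Recode the raw page count on a log scale, as done for the
    pagecount attribute."""
    return log10(pages) if pages > 0 else 0.0
```

For example, if only 80 of a person's 400 pages also mention the affiliation, NC2 = 0.2, signalling a potentially ambiguous name.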
Predicting the goodness of fit
• Findings:
• None of the survey attributes has a direct influence on the result.
• The NC1 measure has no significant effect.
• The NC2 measure has a negative effect on the F-measure.
Evaluation through analysis
• If the network extraction is optimized to match the results of the survey it
will give similar results in analysis.
• However, we will see that a 100% match is not required for
obtaining relevant results in applications: most network measures are
statistical aggregates and thus relatively robust to missing or incorrect
information.
• Group-level analysis, for example, is typically insensitive to errors in the
extraction of specific cases.
• The macrolevel social structure of our department can be retrieved by
collapsing this network to show the relationships between groups using
the affiliations or by clustering the network.
• The two networks reveal the same underlying organization: two of the
groups (the AI and BI sections) built close relationships with each
other and with the similarly densely linked Computer Systems group.
Evaluation through analysis
• Our experiments also show the robustness of centrality measures
such as degree, closeness and betweenness.
• For example, if we compute the list of the top 20 nodes by degree,
closeness and betweenness we find an overlap of 55%, 65% and 50%,
respectively.
• While this is not very high, it signifies a higher agreement than would
have been expected: the correlation between the values is 0.67, 0.49,
0.22, respectively. The higher correlation of degrees can be explained
by the fact that we optimize on degree when we adjust our networks
on precision/recall.
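The top-20 overlap comparison can be sketched as follows; `degrees` and `top_k_overlap` are illustrative helpers, and the toy example uses k = 2 rather than 20.

```python
def degrees(edges, nodes):
    """Degree of each node in an undirected edge list."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def top_k_overlap(score_a, score_b, k):
    """Fraction of shared members among the top-k nodes of two
    centrality rankings (e.g. survey-based vs web-based degree)."""
    top_a = set(sorted(score_a, key=score_a.get, reverse=True)[:k])
    top_b = set(sorted(score_b, key=score_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k
```

Computing `top_k_overlap` for degree, closeness and betweenness scores from the two networks yields the 55%, 65% and 50% figures reported above.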
