SNSW Unit-5
&
Semantic Web
• The analysis would then return the coefficients of this linear equation.
Predicting the goodness of fit
• Many personal factors influence the success of extracting social networks
from the Web.
• Examples we have already seen are the amount of information available
about a person and how common that person’s name is on the Web.
• The author collected attribute data in the survey and used it to
investigate whether any of these factors can be clearly linked to the
success of obtaining a person’s social network from the Web.
• If we find measures that help us predict when the extraction is likely to
fail, we can exclude those individuals from the Web-based extraction and
try other methods or data sources for obtaining information about their
relations.
• We have to measure the similarity between personal networks from
the survey and the Web and correlate it with attributes of the
subjects.
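As a rough illustration of the similarity step, the sketch below compares one person's survey ties with the ties mined from the Web using precision, recall and the F-measure. The function name and the example names are hypothetical; the slides only state that an F-measure and precision/recall are used, not how they are computed per person.

```python
def precision_recall_f(survey_alters, mined_alters):
    """Compare one person's survey ties (treated as ground truth)
    with the ties extracted from the Web for the same person."""
    survey, mined = set(survey_alters), set(mined_alters)
    hits = len(survey & mined)                      # ties found by both methods
    precision = hits / len(mined) if mined else 0.0
    recall = hits / len(survey) if survey else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 4 ties named in the survey, 3 ties mined from the Web.
survey_ties = {"ann", "bob", "carl", "dina"}
mined_ties = {"bob", "carl", "eve"}
p, r, f = precision_recall_f(survey_ties, mined_ties)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

The per-person F value obtained this way is the kind of quantity that can then be correlated with the attributes listed next.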
• The attributes the author considers from the survey are:
• The number of relations mentioned (surveydegree).
• The age of the individual and the number of years spent at the VU
(variables age and entry).
• He also looks at Web-based indicators such as:
• The number of relations extracted (miningdegree).
• The number of pages returned for someone’s name, recoded based on its
distribution by taking the logarithm of the values (pagecount).
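To make the recoding concrete, here is a minimal sketch that log-transforms page counts and fits a straight line relating the recoded value to a per-person F-measure. The base-10 logarithm, the helper names and all numbers are assumptions for illustration only; they are not the study's data, and the slides do not say which regression procedure the author ran.

```python
import math


def log_recode(page_counts):
    """Recode raw page counts by taking the logarithm.
    Base 10 is an assumption; counts must be positive."""
    return [math.log10(c) for c in page_counts]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy values, one entry per person.
raw_pagecount = [10, 100, 1000, 10000]
f_scores = [0.8, 0.6, 0.4, 0.2]

pagecount = log_recode(raw_pagecount)
slope, intercept = fit_line(pagecount, f_scores)
print(f"F = {intercept:.2f} + {slope:.2f} * pagecount")
```

The logarithm compresses the very long tail of page counts, so a few extremely well-indexed names do not dominate the fit.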
• Last, the author experimented with measures for name ambiguity based
on the existing literature on name disambiguation using
clustering methods.
• The author used two measures:
• NC1: Jaccard-coefficient between the first name and the last
name.
• NC2: The ratio of the number of pages for a query that includes
the full name and the term Vrije Universiteit divided by the
number of pages for the full name only.
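The two measures can be sketched from search-engine page counts as follows. Reading NC1 as a Jaccard coefficient over the sets of pages matching the first name and the last name is an assumption, since the slides only name the coefficient; the function and parameter names and the counts are hypothetical.

```python
def nc1(hits_first, hits_last, hits_both):
    """Jaccard coefficient between first name and last name,
    estimated from page counts: |A and B| / |A or B|."""
    union = hits_first + hits_last - hits_both
    return hits_both / union if union else 0.0


def nc2(hits_full_name_and_vu, hits_full_name):
    """Share of the pages for the full name that also mention
    the term 'Vrije Universiteit'."""
    return hits_full_name_and_vu / hits_full_name if hits_full_name else 0.0


# Hypothetical page counts for one person.
nc1_value = nc1(hits_first=1000, hits_last=500, hits_both=300)
nc2_value = nc2(hits_full_name_and_vu=40, hits_full_name=200)
print(nc1_value, nc2_value)
```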
• Findings:
• None of the survey attributes has a direct influence on the result.
• The NC1 measure has no significant effect.
• The NC2 measure has a negative effect on the F-measure.
Evaluation through analysis
• If the network extraction is optimized to match the results of the survey,
it will give similar results in analysis.
• However, we will see that a 100% match is not required for
obtaining relevant results in applications: most network measures are
statistical aggregates and thus relatively robust to missing or incorrect
information.
• Group-level analysis, for example, is typically insensitive to errors in the
extraction of specific cases.
• The macrolevel social structure of our department can be retrieved by
collapsing this network to show the relationships between groups using
the affiliations or by clustering the network.
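Collapsing a person-level network to group level can be sketched as counting ties within and between groups. The member identifiers and ties below are invented; only the group labels (AI, BI, Computer Systems) come from the slides, and the function name is hypothetical.

```python
from collections import Counter


def collapse_by_group(edges, group_of):
    """Count ties between groups (and within a group) for an
    undirected person-level network."""
    counts = Counter()
    for a, b in edges:
        # Sort the pair so (AI, BI) and (BI, AI) are counted together.
        key = tuple(sorted((group_of[a], group_of[b])))
        counts[key] += 1
    return counts


# Invented people and ties for illustration.
group_of = {"p1": "AI", "p2": "AI", "p3": "BI", "p4": "Computer Systems"}
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4")]

group_ties = collapse_by_group(edges, group_of)
for (g1, g2), n in sorted(group_ties.items()):
    print(g1, "-", g2, ":", n)
```

Because each group-level count sums over many individual ties, a few missing or spurious ties change the totals only slightly, which is why this level of analysis tolerates extraction errors.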
• The two networks reveal the same underlying organization: two of the
groups (the AI and BI sections) built close relationships with each other
and with the similarly densely linked Computer Systems group.
• Our experiments also show the robustness of centrality measures
such as degree, closeness and betweenness.
• For example, if we compute the list of the top 20 nodes by degree,
closeness and betweenness, we find an overlap of 55%, 65% and 50%,
respectively.
• While this is not very high, it signifies a higher agreement than would
have been expected: the correlation between the values is 0.67, 0.49 and
0.22, respectively. The higher correlation of degrees can be explained
by the fact that we optimize on degree when we adjust our networks
on precision/recall.
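The two comparisons above, top-k overlap and correlation of centrality values, can be sketched as follows. The degree values are invented and the helper names are hypothetical; the slides do not state which correlation coefficient was used, so Pearson's is an assumption here.

```python
import math


def top_k(scores, k):
    """Nodes with the k highest scores (ties keep insertion order)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def top_k_overlap(scores_a, scores_b, k):
    """Fraction of the top-k nodes shared by the two rankings."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k


def pearson(scores_a, scores_b):
    """Pearson correlation over the nodes present in both networks."""
    nodes = sorted(set(scores_a) & set(scores_b))
    xs = [scores_a[n] for n in nodes]
    ys = [scores_b[n] for n in nodes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented degree values for the same five people in both networks.
survey_degree = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
web_degree = {"a": 4, "b": 5, "c": 1, "d": 3, "e": 2}

overlap = top_k_overlap(survey_degree, web_degree, k=3)
corr = pearson(survey_degree, web_degree)
print(f"top-3 overlap={overlap:.2f} correlation={corr:.2f}")
```

The same two functions apply unchanged to closeness or betweenness scores, as long as both dictionaries are keyed by the same node identifiers.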