The blog phenomenon is universal. Blogs are characterized by their evaluative use: they enable Internet users to express an opinion on a given subject. This makes them an ideal resource for building an annotated sentiment analysis corpus that relates each subject to the opinion expressed about it. This paper presents Blogoscopy, a French-language corpus built from personal thematic blogs. The annotation was governed by three principles: theoretical, since opinion is grounded in a linguistic theory of evaluation; practical, since every opinion is linked to an object; and methodological, since annotation rules and successive phases are defined to ensure quality and thoroughness.
Translation of « un type de site web composé essentiellement de billets (ou d’actualités) publiés au fil de l’eau et apparaissant selon un ordre anté-chronologique (les plus récents en haut de page), le plus souvent enrichis de liens hypertextes externes ».
Evan Williams launched Pyra Labs in 1999. This company created the first platform that allowed people to create their own blog (Blogger.com).
http://www.useit.com/alertbox/20001001_comments.html.
Enunciation is also regarded as constitutive of the act of taking the elements of the language and putting them into discourse. Within the framework of a “textual linguistics”, we do not use this sense of the term.
Over-Blog is a blog platform, that is, a tool for creating blogs. The platform is managed by the company JFG network, the industrial software partner of the Blogoscopy project, which was responsible for extracting the textual data.
For a complete typology of the modal zones, consult Galatanu (2002, pp. 17–32).
It should be noted, however, that the set of tags used to annotate [Blogoscopy] does not differentiate between sarcastic and ironic uses and groups both under the same attribute, irony. In cases where the blogger employs metaphor to express an evaluation, the form is simply tagged according to the category of evaluation to which it belongs.
Appendix: Example of an annotated post
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page SYSTEM "../pagev4.dtd">
<page mes_blog_rank="84" mes_mediametrie="" tags_blog="xbox old-gen" thematique="Wii" url="http://www.hoaxgames.net/">
<billet age="" auteur="Olivier &amp; Maxence" id_b="B1020329120" profession="" url="http://www.hoaxgames.net/article-13982542.html" orthographe="standard" syntaxe="correcte">
<date>2007-11-21 21:28:00</date>
<titre><IA cc="console">Wii</IA> LE <CA cc="C1">CADEAU</CA> <Appreciation type="PIA" forme="cadeau, Wii">LE PLUS EN VOGUE</Appreciation> A <IA cc="C2">NOEL</IA></titre>
<partie organisation="narratif">
[Techno.branchez-vous.com] La <CC id_c="C1">console</CC> Wii de <IA cc="console">Nintendo</IA> serait encore une fois, cette année, le <CA cc="C1">cadeau de Noël</CA> le plus demandé. Le président de <IA cc="console">Nintendo America</IA> prévoit même des <CA cc="C1, Wii">ruptures de stock</CA> aux <IA cc="régions du monde">États-Unis</IA> et dans d’autres <CC id_c="C3">régions du monde</CC>. Même si on peut encore trouver la Wii dans plusieurs <CA cc="C1, Wii">magasins</CA>, <Opinion type="Medium_Supposition_Certitude" forme="Wii">elle risque vite</Opinion> de devenir <Appreciation type="PIA" forme="Wii">introuvable</Appreciation> d’ici le temps des <CC id_c="C2">fêtes</CC> même si Nintendo en produit 1,8 million par mois. Selon <IA cc="analyste">Gerrick Johnson</IA>, un <CA cc="C4">analyste</CA> de l’<CC id_c="C4">industrie du jouet</CC> chez <IA cc="industrie du jouet">BMO Capital Markets</IA>, plus personne n’achète de <CA cc="C4">jouets</CA> aux États-Unis en raison des <CA cc="C4">multiples rappels</CA>. En effet, des milliers de jouets ont été rappelés à cause de la présence de <CA cc="C4">peinture au plomb</CA>. À cause de cela, les <CA cc="C4">gens</CA> <Appreciation type="PIA" forme="jouets">ont perdu confiance</Appreciation> dans les jouets. […]
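As an illustration of how an annotated post of this kind might be read programmatically, the following minimal sketch uses Python's standard `xml.etree.ElementTree` module on a shortened fragment adapted from the example above. The fragment is trimmed, and its closing `</partie>` tag is added only so that it parses. The function names `extract_evaluations` and `concept_index` are hypothetical and not part of the Blogoscopy project. The sketch also assumes, from what the example shows, that the `forme` attribute names the object an evaluation bears on.

```python
import xml.etree.ElementTree as ET

# Shortened fragment adapted from the annotated post above. The closing
# </partie> tag is an addition made only so that the fragment is well-formed.
FRAGMENT = """<partie organisation="narratif">
La <CC id_c="C1">console</CC> Wii de <IA cc="console">Nintendo</IA> serait le
<CA cc="C1">cadeau de Noël</CA> le plus demandé. Même si on peut encore trouver
la Wii, <Opinion type="Medium_Supposition_Certitude" forme="Wii">elle risque
vite</Opinion> de devenir <Appreciation type="PIA" forme="Wii">introuvable</Appreciation>
d'ici le temps des <CC id_c="C2">fêtes</CC>.
</partie>"""

# The two evaluation tags visible in the example; the full tag set may be larger.
EVALUATION_TAGS = ("Opinion", "Appreciation")


def extract_evaluations(xml_text):
    """Return (tag, type, forme, text) for each evaluation element, in document order."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        if element.tag in EVALUATION_TAGS:
            # Collapse line breaks and repeated spaces inside the annotated span.
            text = " ".join("".join(element.itertext()).split())
            results.append(
                (element.tag, element.get("type"), element.get("forme"), text)
            )
    return results


def concept_index(xml_text):
    """Map each id_c identifier to the text of the CC element that carries it."""
    root = ET.fromstring(xml_text)
    return {cc.get("id_c"): cc.text for cc in root.iter("CC")}


for row in extract_evaluations(FRAGMENT):
    print(row)
print(concept_index(FRAGMENT))
```

Each evaluation is returned together with its `forme` value, so the opinion and the object it appears to target stay paired. The sketch only collects `IA`, `CA` and `CC` attributes as they appear and does not interpret them.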
Daille, B., Dubreil, E., Monceaux, L. et al. Annotating opinion—evaluation of blogs: the Blogoscopy corpus. Lang Resources & Evaluation 45, 409–437 (2011). https://doi.org/10.1007/s10579-011-9154-z