Leonhardt J, Anand A and Khosla M. Boilerplate Removal using a Neural Sequence Labeling Model. Companion Proceedings of the Web Conference 2020. (226-229).

Uzun E. (2020). A Novel Web Scraping Approach Using the Additional Information Obtained From Web Pages. IEEE Access. 10.1109/ACCESS.2020.2984503. 8. (61726-61740).

https://ieeexplore.ieee.org/document/9051800/

Jiang Z, Yin H, Wu Y, Lyu Y, Min G and Zhang X. (2019). Constructing Novel Block Layouts for Webpage Analysis. ACM Transactions on Internet Technology. 19:3. (1-18). Online publication date: 31-Aug-2019.

https://doi.org/10.1145/3326457

Alarte J, Silva J and Tamarit S. (2019). What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors. ACM Transactions on the Web. 13:2. (1-19). Online publication date: 31-May-2019.

https://doi.org/10.1145/3316810

Chen Y and Yao Z. (2019). Multi-layer Filtering Webpage Classification Method Based on SVM. Human Centered Computing. 10.1007/978-3-030-37429-7_56. (554-559).

http://link.springer.com/10.1007/978-3-030-37429-7_56

Vogels T, Ganea O and Eickhoff C. (2018). Web2Text: Deep Structured Boilerplate Removal. Advances in Information Retrieval. 10.1007/978-3-319-76941-7_13. (167-179).

https://link.springer.com/10.1007/978-3-319-76941-7_13

Uçar E, Uzun E and Tüfekci P. (2017). A novel algorithm for extracting the user reviews from web pages. Journal of Information Science. 43:5. (696-712). Online publication date: 1-Oct-2017.

https://doi.org/10.1177/0165551516666446

Omari A, Kimelfeld B, Yahav E and Shoham S. Lossless Separation of Web Pages into Layout Code and Data. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (1805-1814).

https://doi.org/10.1145/2939672.2939858

Yuan P, Li Y, Jin H and Liu L. Self-Adaptive Extracting Academic Entities from World Wide Web. Proceedings of the 2015 IEEE Conference on Collaboration and Internet Computing (CIC). (270-277).

https://doi.org/10.1109/CIC.2015.33

Madaan A and Chu W. (2015). In-depth querying of web-based medical documents. International Journal of Computational Science and Engineering. 11:3. (284-296). Online publication date: 1-Oct-2015.

https://doi.org/10.1504/IJCSE.2015.072650

AL-Ghuribi S and Alshomrani S. (2015). Bi-languages Mining Algorithm for Extraction Useful Web Contents (BiLEx). Arabian Journal for Science and Engineering. 10.1007/s13369-014-1530-8. 40:2. (501-518). Online publication date: 1-Feb-2015.

http://link.springer.com/10.1007/s13369-014-1530-8

Freire de Amorim E. HTML Segmentation for Different Types of Web Pages. The Evolution of the Internet in the Business Sector. 10.4018/978-1-4666-7262-8.ch005. (98-119).

http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-4666-7262-8.ch005

Wang J, Wu J, Zhang Y and He G. Content Information Extraction of Theme Web Pages Based on Tag Information. Proceedings of the 2014 Seventh International Symposium on Computational Intelligence and Design - Volume 01. (501-504).

https://doi.org/10.1109/ISCID.2014.257

Uzun E, Serdar Güner E, Kılıçaslan Y, Yerlikaya T and Agun H. (2014). An effective and efficient Web content extractor for optimizing the crawling process. Software—Practice & Experience. 44:10. (1181-1199). Online publication date: 1-Oct-2014.

https://doi.org/10.1002/spe.2195

Soska K and Christin N. Automatically detecting vulnerable websites before they turn malicious. Proceedings of the 23rd USENIX conference on Security Symposium. (625-640).

https://dl.acm.org/doi/10.5555/2671225.2671265

Kurmi R and Jain P. (2014). Text summarization using enhanced MMR technique. 2014 International Conference on Computer Communication and Informatics (ICCCI). 10.1109/ICCCI.2014.6921769. 978-1-4799-2352-6. (1-5).

http://ieeexplore.ieee.org/document/6921769/

Gao B and Fan Q. (2014). Multiple Template Detection Based on Segments. Advances in Data Mining. Applications and Theoretical Aspects. 10.1007/978-3-319-08976-8_3. (24-38).

http://link.springer.com/10.1007/978-3-319-08976-8_3

Fan Q, Yan C, Huang L and Huang L. (2014). Discovering Informative Contents of Web Pages. Web-Age Information Management. 10.1007/978-3-319-08010-9_20. (180-191).

http://link.springer.com/10.1007/978-3-319-08010-9_20

Hachenberg C and Gottron T. Locality sensitive hashing for scalable structural classification and clustering of web documents. Proceedings of the 22nd ACM international conference on Information & Knowledge Management. (359-368).

https://doi.org/10.1145/2505515.2505673

Schäfer R and Bildhauer F. (2013). Web Corpus Construction. Synthesis Lectures on Human Language Technologies. 10.2200/S00508ED1V01Y201305HLT022. 6:4. (1-145). Online publication date: 19-Jul-2013.

http://www.morganclaypool.com/doi/abs/10.2200/S00508ED1V01Y201305HLT022

Uzun E, Agun H and Yerlikaya T. (2013). A hybrid approach for extracting informative content from web pages. Information Processing and Management: an International Journal. 49:4. (928-944). Online publication date: 1-Jul-2013.

https://doi.org/10.1016/j.ipm.2013.02.005

Geraci F and Maggini M. (2013). A Fast Method for Web Template Extraction via a Multi-sequence Alignment Approach. Knowledge Discovery, Knowledge Engineering and Knowledge Management. 10.1007/978-3-642-37186-8_11. (172-184).

http://link.springer.com/10.1007/978-3-642-37186-8_11

Hu F, Li M, Zhang Y, Peng T and Lei Y. (2013). A Non-Template Approach to Purify Web Pages Based on Word Density. Proceedings of the International Conference on Information Engineering and Applications (IEA) 2012. 10.1007/978-1-4471-4847-0_27. (221-228).

https://link.springer.com/10.1007/978-1-4471-4847-0_27

Ly P, Pedrinaci C and Domingue J. Automated information extraction from web APIs documentation. Proceedings of the 13th international conference on Web Information Systems Engineering. (497-511).

https://doi.org/10.1007/978-3-642-35063-4_36

Pappas N, Katsimpras G and Stamatatos E. Extracting informative textual parts from web pages containing user-generated content. Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies. (1-8).

https://doi.org/10.1145/2362456.2362462

Uzun E, Agun H and Yerlikaya T. (2012). Web content extraction by using decision tree learning. 2012 20th Signal Processing and Communications Applications Conference (SIU). 10.1109/SIU.2012.6204476. 978-1-4673-0056-8. (1-4).

http://ieeexplore.ieee.org/document/6204476/

D’souza R, Kulkarni A and Mirza I. (2012). Automatic Link Generation for Search Engine Optimization. International Journal of Information and Education Technology. 10.7763/IJIET.2012.V2.163. (401-403).

http://www.ijiet.org/show-32-229-1.html

Madaan A, Chu W and Bhalla S. VisHue. Proceedings of the 7th international conference on Databases in Networked Information Systems. (89-108).

https://doi.org/10.1007/978-3-642-25731-5_9

Mukund S, Indurkhya N and Sundaresan N. Segmenting eBay item descriptions into coherent sections. Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data. (1-8).

https://doi.org/10.1145/2034617.2034625

Kim C and Shim K. (2011). TEXT. IEEE Transactions on Knowledge and Data Engineering. 23:4. (612-626). Online publication date: 1-Apr-2011.

https://doi.org/10.1109/TKDE.2010.140

Seo J, Diaz F, Gabrilovich E, Josifovski V and Pang B. Generalized link suggestions via web site clustering. Proceedings of the 20th international conference on World wide web. (77-86).

https://doi.org/10.1145/1963405.1963420

Spengler A and Gallinari P. Document structure meets page layout. Proceedings of the 10th ACM symposium on Document engineering. (151-160).

https://doi.org/10.1145/1860559.1860590

Kang J, Yang J and Choi J. (2010). Repetition-based web page segmentation by detecting tag patterns for small-screen devices. IEEE Transactions on Consumer Electronics. 56:2. (980-986). Online publication date: 1-May-2010.

https://doi.org/10.1109/TCE.2010.5506029

Kohlschütter C, Fankhauser P and Nejdl W. Boilerplate detection using shallow text features. Proceedings of the third ACM international conference on Web search and data mining. (441-450).

https://doi.org/10.1145/1718487.1718542

Tsuruta M and Masuyama S. (2010). An Extraction Method of an Informative DOM Node from a Web Page by Using Layout Information. Transactions of the Japanese Society for Artificial Intelligence. 10.1527/tjsai.25.742. 25. (742-756).

https://doi.org/10.1527/tjsai.25.742

Guo W, Kim Y and Kang B. (2010). Webpage Segments Classification with Incremental Knowledge Acquisition. U- and E-Service, Science and Technology. 10.1007/978-3-642-17644-9_9. (79-87).

http://link.springer.com/10.1007/978-3-642-17644-9_9

Román P, Dell R and Velásquez J. (2010). Advanced Techniques in Web Data Pre-processing and Cleaning. Advanced Techniques in Web Intelligence - I. 10.1007/978-3-642-14461-5_2. (19-48).

http://link.springer.com/10.1007/978-3-642-14461-5_2

Vineel G. Web page DOM node characterization and its application to page segmentation. Proceedings of the 3rd IEEE international conference on Internet multimedia services architecture and applications. (325-330).

https://dl.acm.org/doi/10.5555/1812598.1812659

Vineel G. (2009). Web page DOM node characterization and its application to page segmentation. 2009 3rd International Conference on Internet Multimedia Services Architecture and Application (IMSAA). 10.1109/IMSAA.2009.5439444. 978-1-4244-4792-3. (1-6).

http://ieeexplore.ieee.org/document/5439444/

Laber E, de Souza C, Jabour I, de Amorim E, Cardoso E, Rentería R, Tinoco L and Valentim C. A fast and simple method for extracting relevant content from news webpages. Proceedings of the 18th ACM conference on Information and knowledge management. (1685-1688).

https://doi.org/10.1145/1645953.1646204

Li H, Vukovic M, Pingali G and Lee W. SolutionFinder. Proceedings of the 2009 IEEE International Conference on Services Computing. (73-80).

https://doi.org/10.1109/SCC.2009.64

Vieira K, Costa Carvalho A, Berlt K, Moura E, Silva A and Freire J. (2009). On Finding Templates on Web Collections. World Wide Web. 12:2. (171-211). Online publication date: 1-Jun-2009.

https://doi.org/10.1007/s11280-009-0059-3

Nam S, Na S, Lee Y and Lee J. DiffPost. Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval. (791-795).

https://doi.org/10.1007/978-3-642-00958-7_87

Kohlschütter C and Nejdl W. A densitometric approach to web page segmentation. Proceedings of the 17th ACM conference on Information and knowledge management. (1173-1182).

https://doi.org/10.1145/1458082.1458237

Wang Y, Fang B, Cheng X, Guo L and Xu H. (2008). Incremental Web Page Template Detection by Text Segments. 2008 IEEE International Workshop on Semantic Computing and Systems (WSCS). 10.1109/WSCS.2008.17. (174-180).

https://ieeexplore.ieee.org/document/4570835

Bing L, Wang Y, Zhang Y and Wang H. (2008). Primary content extraction with Mountain Model. 2008 8th IEEE International Conference on Computer and Information Technology (CIT). 10.1109/CIT.2008.4594722. 978-1-4244-2357-6. (479-484).

http://ieeexplore.ieee.org/document/4594722/

Wang Y, Fang B, Cheng X, Guo L and Xu H. Incremental web page template detection. Proceedings of the 17th international conference on World Wide Web. (1247-1248).

https://doi.org/10.1145/1367497.1367749

Chakrabarti D, Kumar R and Punera K. A graph-theoretic approach to webpage segmentation. Proceedings of the 17th international conference on World Wide Web. (377-386).

https://doi.org/10.1145/1367497.1367549

Punera K and Ghosh J. Enhanced hierarchical classification via isotonic smoothing. Proceedings of the 17th international conference on World Wide Web. (151-160).

https://doi.org/10.1145/1367497.1367518

Gottron T. Clustering template based web documents. Proceedings of the IR research, 30th European conference on Advances in information retrieval. (40-51).

https://dl.acm.org/doi/10.5555/1793274.1793284

Urvoy T, Chauveau E, Filoche P and Lavergne T. (2008). Tracking Web spam with HTML style similarities. ACM Transactions on the Web. 2:1. (1-28). Online publication date: 1-Feb-2008.

https://doi.org/10.1145/1326561.1326564

Gottron T. Clustering Template Based Web Documents. Advances in Information Retrieval. 10.1007/978-3-540-78646-7_7. (40-51).

http://link.springer.com/10.1007/978-3-540-78646-7_7

Muwanguzi P. Globalisation of Financial Markets; Overcoming the Hurdles of Liberalisation. SSRN Electronic Journal. 10.2139/ssrn.1019290.

http://www.ssrn.com/abstract=1019290