DOI: 10.1145/2783258.2788628
research-article
Open access

An Architecture for Agile Machine Learning in Real-Time Applications

Published: 10 August 2015

Abstract

Machine learning techniques have proved effective in recommender systems and other applications, yet teams working to deploy them lack many of the advantages that those in more established software disciplines now take for granted. The well-known Agile methodology advances projects through a chain of rapid development cycles, with subsequent steps often informed by production experiments. Support for such a workflow in machine learning applications remains primitive.
The platform developed at if(we) embodies a specific machine learning approach and a rigorous data architecture constraint, thereby allowing teams to work in rapid iterative cycles. We require models to consume data from a time-ordered event history, and we focus on facilitating creative feature engineering. We make it practical for data scientists to use the same model code in development and in production deployment, and to collaborate on complex models.
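To make the event-history constraint concrete, the following is a minimal sketch and not the platform's actual code. The `Event` schema, the `ActivityFeatures` class, and the `replay` helper are hypothetical names chosen for illustration. The idea it shows is that if features are updated one event at a time, the same update code can be driven by replaying stored history in development or by a live stream in production.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """One entry in a time-ordered event history (hypothetical schema)."""
    timestamp: float
    user: str
    kind: str  # e.g. "view" or "message"


class ActivityFeatures:
    """Feature state that is updated one event at a time.

    Replaying stored history and consuming live events both go through
    `update`, so the feature code is the same in both settings.
    """

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, str], int] = {}
        self.last_seen: Dict[str, float] = {}

    def update(self, event: Event) -> None:
        key = (event.user, event.kind)
        self.counts[key] = self.counts.get(key, 0) + 1
        self.last_seen[event.user] = event.timestamp

    def vector(self, user: str, now: float) -> List[float]:
        """Return [view count, message count, seconds since last event]."""
        views = self.counts.get((user, "view"), 0)
        messages = self.counts.get((user, "message"), 0)
        last = self.last_seen.get(user)
        since = now - last if last is not None else -1.0  # -1.0 means "never seen"
        return [float(views), float(messages), since]


def replay(history: Iterable[Event],
           features: ActivityFeatures) -> List[Tuple[Event, List[float]]]:
    """Replay history in time order, producing one training row per event."""
    rows = []
    for event in sorted(history, key=lambda e: e.timestamp):
        # Snapshot the features before applying the event, so each row
        # reflects only what was known at that moment (no future leakage).
        rows.append((event, features.vector(event.user, event.timestamp)))
        features.update(event)
    return rows
```

In this sketch a production service would call `update` on each incoming event and `vector` at request time, while offline training would call `replay` over the stored history; both paths exercise the same feature logic.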
We deliver real-time recommendations at scale, returning top results from among 10,000,000 candidates with sub-second response times and incorporating new updates in just a few seconds. Using the approach and architecture described here, our team can routinely go from ideas for new models to production-validated results within two weeks.
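The abstract does not say how the top results are selected, so the following is only a generic sketch of top-k retrieval using a bounded heap from the Python standard library. The function name `top_k` and the `score` callable are assumptions made for illustration.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_k(candidates: Iterable[str],
          score: Callable[[str], float],
          k: int) -> List[Tuple[float, str]]:
    """Return the k highest-scoring candidates, best first.

    heapq.nlargest keeps at most k items in memory, so the candidate
    stream can be far larger than k without being fully sorted.
    """
    return heapq.nlargest(k, ((score(c), c) for c in candidates))
```

For example, with scores `{"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}` and `k = 2`, the call returns `[(0.9, "b"), (0.7, "d")]`. A scan like this costs roughly O(n log k) time; a system serving millions of candidates at sub-second latency would likely also need sharding or precomputation, which this sketch does not attempt.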

    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN: 9781450336642
    DOI: 10.1145/2783258
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. agile
    2. machine learning
    3. recommender systems

    Qualifiers

    • Research-article

    Conference

    KDD '15

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 186
    • Downloads (Last 6 weeks): 23
    Reflects downloads up to 13 Jan 2025

    Citations

    Cited By

    • (2024) Teaching Machine Learning as Part of Agile Software Engineering. IEEE Transactions on Education 67:3 (377-386). DOI: 10.1109/TE.2023.3337343. Online publication date: Jun-2024
    • (2024) Architecting ML-enabled systems: Challenges, best practices, and design decisions. Journal of Systems and Software 207 (111860). DOI: 10.1016/j.jss.2023.111860. Online publication date: Jan-2024
    • (2023) The pipeline for the continuous development of artificial intelligence models: Current state of research and practice. Journal of Systems and Software 199:C. DOI: 10.1016/j.jss.2023.111615. Online publication date: 22-Mar-2023
    • (2023) Artificial Intelligence Enables Agile Software Development Life Cycle. Agile Software Development (325-343). DOI: 10.1002/9781119896838.ch17. Online publication date: 8-Feb-2023
    • (2022) DESIGN AN AGILE OF MACHINE LEARNING TO PREDICTIVE HOUSE PRICING AND TARGETING SEGMENTED MARKET. Proceedings of the 2022 International Conference on Engineering and Information Technology for Sustainable Industry (1-8). DOI: 10.1145/3557738.3557856. Online publication date: 21-Sep-2022
    • (2022) Software Engineering for AI-Based Systems: A Survey. ACM Transactions on Software Engineering and Methodology 31:2 (1-59). DOI: 10.1145/3487043. Online publication date: 1-Apr-2022
    • (2021) Software Project Management Using Machine Learning Technique: A Review. Applied Sciences 11:11 (5183). DOI: 10.3390/app11115183. Online publication date: 2-Jun-2021
    • (2021) AgileML: A Machine Learning Project Development Pipeline Incorporating Active Consumer Engagement. 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) (1-7). DOI: 10.1109/CSDE53843.2021.9718470. Online publication date: 8-Dec-2021
    • (2020) Spatiotemporal Evolution of Urban Expansion Using Landsat Time Series Data and Assessment of Its Influences on Forests. ISPRS International Journal of Geo-Information 9:2 (64). DOI: 10.3390/ijgi9020064. Online publication date: 21-Jan-2020
    • (2020) Design and Development of Machine Learning Technique for Software Project Risk Assessment - A Review. 2020 8th International Conference on Information Technology and Multimedia (ICIMU) (354-362). DOI: 10.1109/ICIMU49871.2020.9243459. Online publication date: 24-Aug-2020
