Text2App - A Framework For Creating Android Apps From Text Descriptions
Masum Hasan1*, Kazi Sajeed Mehrab1*, Wasi Uddin Ahmad2, and Rifat Shahriyar1
1 Bangladesh University of Engineering and Technology (BUET)
2 University of California, Los Angeles (UCLA)
1 masum@ra.cse.buet.ac.bd, 1505025.ksh@ugrad.cse.buet.ac.bd, rifat@cse.buet.ac.bd
2 wasiahmad@cs.ucla.edu
{ "STRING0": "Speak" }
.apk
MIT App Inventor MIT App Inventor SAR Compiler
Ready to Use! backend Compiler Source (.scm, .bky)
Figure 2: Text2App Prediction Pipeline. A given text is formatted and passed to a seq2seq network to be translated into SAR. Using a SAR Compiler, it is converted to an App Inventor project, which can be built into an application.
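To make the stages concrete, the pipeline can be viewed as a simple composition of functions. The sketch below is only an illustration: the stage functions (format_text, nl_to_sar, compile_sar, build_apk) are hypothetical placeholder names, not the actual Text2App implementation.

```python
from typing import Callable

# Minimal sketch of the Figure 2 pipeline as a chain of stages; the stage
# functions here are hypothetical placeholders, not the actual Text2App code.
def text_to_app(description: str,
                format_text: Callable[[str], str],
                nl_to_sar: Callable[[str], str],
                compile_sar: Callable[[str], str],
                build_apk: Callable[[str], str]) -> str:
    formatted = format_text(description)  # e.g. string literals replaced by placeholders such as string0
    sar = nl_to_sar(formatted)            # seq2seq translation from NL to SAR
    project = compile_sar(sar)            # SAR compiler emits App Inventor source (.scm, .bky)
    return build_apk(project)             # MIT App Inventor backend builds the installable app
```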
Figure 3: Automatic synthesis of NL and SAR parallel corpus. Bold-italic indicates text is selected stochastically.
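As a toy illustration of this synthesis idea, the sketch below stochastically samples components and phrasings to emit one (NL, SAR) pair. The component inventory, templates, and SAR tags are simplified assumptions drawn from the examples shown later in this paper, not the actual Text2App synthesizer.

```python
import random

# Toy illustration of the stochastic NL/SAR synthesis idea in Figure 3; the
# component inventory, phrasings, and SAR tags are simplified assumptions.
COMPONENTS = {
    "textbox": ("a textbox", "<textbox>"),
    "button": ('a button named "{name}"', "<button> {name} </button>"),
    "label": ("a label", "<label> label1 </label>"),
}
CREATE_PHRASES = ["Create an app that has", "Make an app with"]

def synthesize_pair(rng: random.Random) -> tuple[str, str]:
    """Sample components and phrasings to emit one (NL, SAR) training pair."""
    picks = rng.sample(list(COMPONENTS), k=rng.randint(1, 3))
    nl_parts, sar_parts = [], []
    for comp in picks:
        nl_tmpl, sar_tmpl = COMPONENTS[comp]
        name = rng.choice(["go", "tweet", "+"])  # only used by templates with a {name} slot
        nl_parts.append(nl_tmpl.format(name=name))
        sar_parts.append(sar_tmpl.format(name=name))
    nl = f"{rng.choice(CREATE_PHRASES)} {', '.join(nl_parts)}."
    sar = f"<complist> {' '.join(sar_parts)} </complist>"
    return nl, sar

print(synthesize_pair(random.Random(0)))
```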
Original: Create an app that has an audio player with source string0, a switch. If the switch is flipped, play player.
Augmentation: Create an app that has an *external* player with source string0, a switch. If the switch *gets* flipped, play player.

Table 1: BERT mask filling based data augmentation method. Mutated words are marked with asterisks.

list is selected with a descending weighted probability. That means the top predictions are more likely to be selected; however, all 10 predictions have a chance. This augmentation technique introduces contextually correct unseen vocabulary to the dataset and familiarizes the sequential model with realistic natural language noise. Table 1 presents an example of our BERT-based augmentation.
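For concreteness, a minimal re-implementation of this augmentation step could look as follows. This is our own sketch rather than the released Text2App code; the model choice and the exact descending weighting scheme are illustrative.

```python
import random
from transformers import pipeline

# Minimal re-implementation sketch of the mask-filling augmentation described
# above; the model choice and the exact descending weights are illustrative.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def mutate(sentence: str, rate: float = 0.01, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    for i in range(len(words)):
        if rng.random() >= rate:
            continue  # only roughly `rate` of the tokens are mutated
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        candidates = fill_mask(masked, top_k=10)  # top-10 BERT predictions for the masked slot
        # Descending weights: the top prediction is most likely to be picked,
        # but every one of the 10 candidates keeps a non-zero chance.
        weights = [len(candidates) - rank for rank in range(len(candidates))]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        words[i] = pick["token_str"].strip()
    return " ".join(words)

print(mutate("create an app that has an audio player", rate=0.3))
```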
3.6 NL to SAR Translation using Seq2Seq Networks

We generate 50,000 unique NL and SAR parallel examples using our data synthesis method and mutate 1% of the natural language tokens. We split this dataset into train-validation-test sets in an 8:1:1 ratio and train three different models.

Pointer Network: We train a Pointer Network (See et al., 2017) consisting of a randomly initialized bidirectional LSTM encoder with hidden layers of size 500 and 250. As our output vocabulary is fixed, we do not use the copy mechanism.

Transformer with pretrained encoders: We create two sequence-to-sequence Transformer (Vaswani et al., 2017) networks, each having 12 encoder layers and 6 decoder layers. Every layer has 12 self-attention heads of size 64, and the hidden dimension is 768. The encoder of one model is initialized with RoBERTa base (Liu et al., 2020) pretrained weights, and the other with CodeBERT base (Feng et al., 2020) pretrained weights. RoBERTa is pretrained on natural language with MLM objectives and has shown excellent results in numerous NLU tasks. CodeBERT is pretrained similarly on an amalgam of natural language and source code, thereby making it more familiar with programming structure and terminology.
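One way to realize the pretrained-encoder setup is sketched below with the Hugging Face EncoderDecoderModel: a 12-layer RoBERTa-base encoder with pretrained weights and a randomly initialized 6-layer decoder. This is an assumption about how such a model can be assembled, not the actual Text2App training code.

```python
from transformers import (AutoConfig, AutoModel, AutoModelForCausalLM,
                          EncoderDecoderModel, RobertaTokenizerFast)

# Sketch of the "Transformer with pretrained encoders" setup (an assumption of
# how it can be built with Hugging Face; the actual training code may differ).
encoder = AutoModel.from_pretrained("roberta-base")  # 12 layers, hidden size 768, 12 heads
# Swap "roberta-base" for "microsoft/codebert-base" to obtain the CodeBERT variant.

decoder_config = AutoConfig.from_pretrained("roberta-base")
decoder_config.num_hidden_layers = 6                 # 6 decoder layers, randomly initialized
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True
decoder = AutoModelForCausalLM.from_config(decoder_config)

model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```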
3.7 Simplifying Abstract Natural Language Instructions using GPT-3

In our survey sessions (Section 3.1), we found that users often provide highly abstract instructions which require external knowledge to understand (e.g., "Create a photo editor app" expects knowledge of how a photo editor looks and works). Large pretrained language models (LMs), such as GPT-3 (Brown et al., 2020), have been shown to understand abstract natural language concepts and even explain them in simple terms (Mayn, 2020). We incorporate this capability of GPT-3 to enable our model to create applications from highly abstract instructions. We provide GPT-3 with nine abstract app concepts and their corresponding NL descriptions, phrased so that they can be generated using our system (Appendix C). We then instruct it to describe an unseen abstract concept, and the description is sent to our seq2seq network to produce SAR. Table 3 shows some successful (1, 2) and unsuccessful (3, 4) app descriptions generated by GPT-3. Although the LM generates plausible sequences, it fails to limit its predictions to our supported functionalities. With more features and functionalities added to Text2App, LM-based explanations can be a viable method of creating apps from abstract specifications.
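The prompting step can be sketched with the legacy OpenAI Completions API that was available at the time. The prompt below contains only two illustrative in-context examples (taken from Table 3) rather than the nine concept/description pairs used in the paper, and the instruction line is our own wording.

```python
import openai  # legacy openai<1.0 client; openai.api_key must be set beforehand

# Illustrative few-shot prompt; the paper uses nine concept/description pairs (Appendix C).
FEW_SHOT_PROMPT = """\
Describe each app using only textboxes, buttons, labels, switches, and players.

Concept: number adding app
Description: make an app with a textbox, a textbox, and a button named "+".

Concept: twitter app
Description: make an app with a textbox, a button named "tweet", and a label. When the button is pressed, set the label to textbox text.

Concept: {concept}
Description:"""

def simplify(concept: str) -> str:
    response = openai.Completion.create(
        engine="davinci",
        prompt=FEW_SHOT_PROMPT.format(concept=concept),
        max_tokens=64,
        temperature=0.3,
        stop="\n",
    )
    # The returned description is then passed to the seq2seq model to produce SAR.
    return response["choices"][0]["text"].strip()
```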
Model           #Epoch   Test           BERT Mutation 2%   BERT Mutation 5%   BERT Mutation 10%   Unseen Pair
                         BLEU    EM     BLEU    EM         BLEU    EM         BLEU    EM          BLEU    EM

Unmutated Training Data
PointerNet      13.6     94.64   79.24  94.16   72.06      91.80   56.14      88.96   40.78       96.75   82.91
RoBERTa init    3        97.20   77.80  96.83   73.20      94.97   61.68      92.86   48.06       98.11   79.66
CodeBERT init   8        97.42   80.02  97.18   76.02      95.37   64.38      93.29   51.24       98.47   83.50

Training with 1% Mutation
PointerNet      23.2     95.03   81.40  94.85   79.46      93.85   72.04      92.53   63.68       96.68   83.33
RoBERTa init    3        97.66   81.76  97.60   80.66      96.91   76.16      96.04   70.10       98.64   84.68
CodeBERT init   7        97.64   81.66  97.51   80.20      96.74   74.98      95.71   67.58       98.62   84.51

Table 2: Comparison between the Pointer Network and seq2seq Transformers with encoders initialized with RoBERTa and CodeBERT pretrained weights. BLEU indicates BLEU-4, and Exact Match (EM) is shown in percent.
1. number adding app - make an app with a textbox, a textbox, and a button named "+".
SAR: <complist> <textbox> <textbox> <button> + </button> </complist>

2. twitter app - make an app with a textbox, a button named "tweet", and a label. When the button is pressed, set the label to textbox text.
SAR: <complist> <textbox> <button> tweet </button> <label> label1 </label> </complist> <code> <button1clicked> <label1> <textboxtext1> </label1> </button1clicked> </code>

3. browser app - create an app with a textbox, a button named "go", and a button named "back". When the button "go" is pressed, go to the url in the textbox. When the button "back" is pressed, go back to the previous page.

4. Google front page - make an app with a textbox, a button named "google", and a button named "search". When the button "google" is pressed, search google. When the button "search" is pressed, search the web.

Table 3: Abstract instructions to simpler app descriptions using GPT-3. Unsupported functionalities in red and italic.
4 Evaluation

In this section, we evaluate the three seq2seq networks mentioned in Section 3.6: the Pointer Network, the seq2seq Transformer initialized with RoBERTa, and the seq2seq Transformer initialized with CodeBERT. We evaluate the models in three different settings: first, on a held-out test set; second, with an increasing amount of mutation in the test set (2%, 5%, 10%) (Section 3.5); and finally, on data containing specific combinations of components (<button1clicked>, <text2speech>) that were excluded during training. These settings establish the models' ability to handle unstructured NL instructions and to generalize beyond the patterns they were trained on. Model checkpoints are selected based on validation BLEU score.

From Table 2 we can see that adding as little as 1% mutation to the training data notably improves all models' ability to handle noisy input (by up to 22.04%). We also see that the RoBERTa-initialized model performs best in all evaluation categories. Note that all predictions reported in Table 2 are in valid SAR format.
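As a reference point, the two reported metrics can be computed as in the minimal sketch below, using sacrebleu for corpus-level BLEU-4 and a simple string comparison for exact match; this is our own sketch, not the authors' evaluation script.

```python
from sacrebleu.metrics import BLEU

# Minimal sketch of the two reported metrics (not the authors' evaluation
# script): corpus-level BLEU-4 over SAR strings and exact match in percent.
def evaluate(predictions: list[str], references: list[str]) -> tuple[float, float]:
    bleu = BLEU(max_ngram_order=4).corpus_score(predictions, [references]).score
    exact_match = 100.0 * sum(p == r for p, r in zip(predictions, references)) / len(references)
    return bleu, exact_match

preds = ["<complist> <textbox> <button> tweet </button> </complist>"]
refs = ["<complist> <textbox> <button> tweet </button> </complist>"]
print(evaluate(preds, refs))  # (100.0, 100.0)
```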
5 Future Work

The core contribution of our project lies in the development endeavour of building the SAR, the SAR compiler, and the SAR-NL parallel data synthesizer. Although we are working on adding new features to Text2App, the implemented functionality is a small fraction of what is possible in our system. For each new component added to the system, the possible app space grows exponentially. With more development effort, we can expect notably more utility from Text2App. We invite both the software development and NLP communities to contribute to this project and turn it into a general-purpose app development platform. Our short-term goal with Text2App is to add more functionalities and app components. In the long term, we would like to build SAR compilers for native application development platforms such as Android and iOS, and for cross-platform frameworks like Flutter and Ionic.

6 Conclusion

In this paper, we explore creating functional mobile applications from natural language text descriptions using seq2seq networks. We propose Text2App, a novel framework for natural language to app translation with the help of a simpler intermediate representation of the application.
The intermediate formal representation allows an app to be described with a significantly smaller number of tokens than native app development languages. We also design a data synthesis method, guided by a human survey, that automatically generates fluent natural language app descriptions and their formal representations. Our AI-aware design approach for a formal language can guide the development of future programming languages and frameworks, from which source code generation work can benefit.

Acknowledgement

We thank Prof. Zhijia Zhao from UCR for proposing the problem that inspired this project idea. We also thank OpenAI, Google Colaboratory, Hugging Face, the MIT App Inventor community, the survey participants, and Prof. Anindya Iqbal for feedback regarding modularity. This project was funded under the 'Innovation Fund' by the ICT Division, Government of the People's Republic of Bangladesh.

References

Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. 2019. Generative code modeling with graphs. In International Conference on Learning Representations.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547, Online. Association for Computational Linguistics.

Vanita Jain, Piyush Agrawal, Subham Banga, Rishabh Kapoor, and Shashwat Gulyani. 2019. Sketch2code: Transformation of sketches to UI in real-time using deep neural network.

K. Kolthoff. 2019. Automatic generation of graphical user interface prototypes from unrestricted natural language requirements. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1234–1237.

Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. CoRR, abs/2102.04664.

Andrew Mayn. 2020. OpenAI API alchemy: Summarization – @andrewmayne. https://andrewmayneblog.wordpress.com/2020/06/13/openai-api-alchemy-summarization/. (Accessed on 03/22/2021).

K. Moran, C. Bernal-Cárdenas, M. Curcio, R. Bonett, and D. Poshyvanyk. 2020. Machine learning-based prototyping of graphical user interfaces for mobile apps. IEEE Transactions on Software Engineering, 46(2):196–221.

Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. 2017. Neuro-symbolic program synthesis. In Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France.
Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract syntax networks for code generation and semantic parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1139–1149, Vancouver, Canada. Association for Computational Linguistics.

Alex Robinson. 2019. Sketch2code: Generating a website from a paper mockup.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada. Association for Computational Linguistics.