
Group 1 - Assignment 3

March 6, 2025

Name P. Number Contribution (25% for each if all contributed equally)


Filippo Muscherà 991219T591 30%
Harsha Vardhan Devulapalli 010623T251 25%
Mandela Ogunleye 910412T877 25%
Maria Styliani Schistocheili 021126T540 20%

Table 1: Project Contribution Table

1 Scenario 1
1.1 Answer the following questions related to the design of this experiment
• 1. What are the objects, subjects, treatments, and factors used in the four
designs listed above for this experiment?
Objects: The programming tasks
Subjects: 12 students
Treatments: Two different IDEs
Factors: IDE-A or IDE-B and Prog-1 or Prog-2
• 2. How would you describe Design-A and Design-D in terms of a standard
design type, e.g., one factor or two treatments?
Design-A is one factor with two treatments, while Design-D is a paired
comparison design.

• 3. What are the benefits and limitations of using Design-B instead of Design-A?
A benefit of this design is that each student uses both IDEs, which reduces variability, since it minimizes the impact of individual differences on the data.

This design also has limitations: there is a threat of maturation, because the second time a student implements Program-1, it might be faster not because of the IDE, but because it is the second time implementing that specific program, and experience might have been gained from the first implementation, even if it was done with a different IDE.

• 4. What problems/mistakes can you identify in Design-C?
This design assumes “that tasks Prog-1 and Prog-2 are different but equally difficult”. This is a strong assumption, because if it is not true it might undermine the experiment’s validity. If this assumption turns out to be false, all subsequent data analysis and interpretation would be based on data that do not reflect reality: the efficiency an IDE allows students to achieve may be related to the complexity of the program, and not to the quality of the IDE itself.
• 5. Does Design-D solve the problems you have identified in Design-B and
Design-C?
It does: students will not implement the same program twice, so there is no threat of maturation. There is also no assumption about the difficulty of each program.
• 6. What are the benefits and limitations of the designs Design-A and
Design-D? e.g. one is easier to analyze.
Design-A is easier to analyze since only one program (Prog-1) is used in
the experiments, so there are fewer variables to consider in the experiment.
Design-D is more complete because it tests all the possible combinations
of IDEs and tasks. But this also means that it is more difficult to analyze
since there are more groups and it involves more variables.

• 7. What variables must be controlled in Design-A to increase the validity of the experiment? e.g. previous experience/familiarity of subjects to IDEs?
The familiarity of each subject with both IDEs must be taken into account. It would also be important to consider the familiarity of each student with Prog-1 and Prog-2. These factors might influence the time students take to complete the task, and thus influence the results. The previous experience of each student must also be controlled, to make sure that students’ backgrounds will not affect the results.

1.2 Answer the following questions related to the analysis of an experiment with Design-A (as shown in Table 1) and the results in Table 5
• 1. State the null and alternative hypothesis for this investigation

Null Hypothesis H0: There is no difference in the average implementation time between IDE-A and IDE-B.
Alternative Hypothesis HA: There is a difference in the average implementation time between IDE-A and IDE-B.
• 2. Use descriptive statistics and visualize the data in Table 5 using e.g. box plots, histograms, and scatter plots. Which visualization tool helped you develop some insights into the data? What were the insights, e.g. any interesting patterns or trends in the data, a clear difference in efficiency between the two IDEs, or outliers?
We used RStudio to visualize the data and run the data analysis.

Figure 1: Box Plot for Design-A data, as shown in Table 5

We can see from the box plot that IDE-A has a lower median than IDE-B; still, the difference does not look large. The box plot also shows that the data for IDE-A have a higher variance than those for IDE-B. Finally, the plot suggests that there are no outliers in the given data.
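The analysis was run in RStudio; an equivalent check of the descriptive statistics in Python (with placeholder values standing in for the Table 5 times, which are not reproduced here) could look like:

```python
import statistics

# Placeholder implementation times in minutes (NOT the real Table 5
# values): six students per group, as in Design-A
ide_a = [33, 40, 29, 45, 38, 31]
ide_b = [40, 36, 45, 41, 44, 38]

for name, times in [("IDE-A", ide_a), ("IDE-B", ide_b)]:
    print(name,
          "median:", statistics.median(times),
          "mean:", round(statistics.mean(times), 1),
          "variance:", round(statistics.variance(times), 1))
```

With these placeholder values the output mirrors the pattern described above: IDE-A shows a lower median and a higher variance than IDE-B.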
• 3. Choose and justify your choice of a parametric/nonparametric test
for analyzing the given data (document the steps you undertook and the
results).
To analyze the data we chose a parametric test. Specifically, we ran the Student’s t-test for differences between population means, as described in Section 6.6.2.2 of Fenton, N., & Bieman, J. (2014), Software Metrics: A Rigorous and Practical Approach.

We ran this test since it is used to compare two independent groups. Before running it, we needed to check that the given data respected the test’s assumptions, as described in the same section of Fenton and Bieman. Specifically, we checked the distributions of the data in both groups, since each group has fewer than 30 subjects. To do so, we produced a Q-Q plot and ran the Shapiro-Wilk normality test.

Figure 2: Q-Q Plot for Design-A data

The Shapiro-Wilk normality test produced a p-value of 0.3661, so the null hypothesis that the sample comes from a normal distribution cannot be rejected. The Q-Q plot also gives visual confirmation that the data come from a normal distribution: it plots the quantiles of the observed data against the quantiles of a theoretical normal distribution, and if the observed variable is normally distributed the points fall along the diagonal running from bottom-left to top-right. As we can see in the plot, the points for both IDEs tend to lie on this diagonal, supporting the normality of the distributions.
Then, to be able to run the t-test, we also had to check that the variances of the two groups are equal (homogeneous). To do that, we used the F-test, whose null hypothesis states that the variances of the two datasets are homogeneous. The p-value obtained from this test is 0.6085, so this null hypothesis cannot be rejected.
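We ran these checks in RStudio (shapiro.test and var.test); a comparable sketch in Python with SciPy, using placeholder data in place of the Table 5 values, would be:

```python
import statistics
from scipy import stats

# Placeholder data standing in for the Table 5 times (minutes); the
# actual analysis was run in RStudio on the real values
ide_a = [33, 40, 29, 45, 38, 31]
ide_b = [40, 36, 45, 41, 44, 38]

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
for name, times in [("IDE-A", ide_a), ("IDE-B", ide_b)]:
    w, p = stats.shapiro(times)
    print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# F-test for equal variances (R's var.test); SciPy has no direct
# equivalent, so compare the variance ratio to the F distribution
f_ratio = statistics.variance(ide_a) / statistics.variance(ide_b)
df1, df2 = len(ide_a) - 1, len(ide_b) - 1
p_f = 2 * min(stats.f.cdf(f_ratio, df1, df2), stats.f.sf(f_ratio, df1, df2))
print(f"F = {f_ratio:.2f}, p = {p_f:.3f}")
```

On the real Table 5 data these tests gave p = 0.3661 (Shapiro-Wilk) and p = 0.6085 (F-test); with the placeholder data the printed p-values will differ.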
• 4. Run the statistical method and report if you can reject the null hypoth-
esis. Please interpret your results, what does this imply for the objective
of the study?
We ran the Student’s t-test, obtaining a p-value of 0.4129. This value is well above the threshold of 0.05, and for this reason the null hypothesis cannot be rejected. This means that the difference in the data is not significant, so with these data we cannot conclude that either IDE leads to a significant difference in development times.
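The test itself was run in RStudio; the equivalent call in Python with SciPy (again with placeholder data, so the printed p-value will differ from the 0.4129 obtained on the real data) is:

```python
from scipy import stats

# Placeholder data standing in for the Table 5 times (minutes)
ide_a = [33, 40, 29, 45, 38, 31]
ide_b = [40, 36, 45, 41, 44, 38]

# Two-sample Student's t-test with equal variances assumed, as
# supported by the F-test on the real data
t_stat, p_value = stats.ttest_ind(ide_a, ide_b, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Cannot reject H0")
```

Note that `equal_var=True` selects the Student's (pooled-variance) t-test; with `equal_var=False` SciPy would run Welch's test instead, which does not assume homogeneous variances.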

• 5. Based on the results, would you be confident to recommend either IDE-A or IDE-B for use in your company? Why or why not?
We would not be confident recommending either IDE-A or IDE-B, since from the current data it cannot be stated that one of the two is better than the other in terms of efficiency. This is because, as discussed in question 4, the null hypothesis cannot be rejected.

2 Scenario 2
• A. Describe the approach that you will follow to analyze the given data
(i.e. the three papers identified in Section 2.2). Please read Chapter 18
of C. Robson, K. McCartan, Real world research: A resource for social
scientists and practitioner-researchers. Fourth Edition. Wiley, 2016, to
make an informed decision about your approach and the steps you take.
For example, the analysis approach you will use (a. Quasi-statistical ap-
proach, b. thematic coding approach or c. grounded theory approach).
Also describe your mechanism for coding the data. Also explain why you
chose the approach over other alternatives.
In this scenario, the approach chosen for qualitative data analysis is thematic coding analysis. We chose this approach mainly because of its flexibility. Indeed, as reported in Chapter 18 of Robson and McCartan, Real World Research, thematic analysis is “very flexible, can be used with virtually all types of qualitative data.” Moreover, it is a convenient instrument for summarizing the key features of any volume of qualitative data and can be used across a wide range of fields and disciplines. We will use it to compare findings across multiple papers in a structured way and to find common key concepts. We also chose this approach because it can be used for exploratory and descriptive studies, unlike approaches such as grounded theory analysis, where the focus is on “generating a theory to explain what is central in the data”. Such an approach is stricter and more time-consuming. Our mechanism for coding the data involves carefully reading the papers under analysis several times and assigning a code to each identified unit. We carried out this approach manually, as we would have done working on paper, highlighting the individual segments and annotating the corresponding code in the margin. Then, we grouped similar ideas together to make the information easier to understand; this part of the process is the one that allows us to produce themes.
• B. Please describe the coding procedure that you followed. For each step,
please provide an example of how you coded the information in the papers
Specifically, we segmented the data into smaller units (sentences) and labeled each unit with a code. We used an iterative approach to review and refine our coding, improving segmentation and label assignment. To identify our codes, we focused mainly on “Consequences”, “Strategies, Practices or Tactics”, and “Conditions or Constraints”, to try to identify the core information about collaboration between researchers and industry. These codes were then grouped to form themes where we identified similar units. To define themes, we tried to identify codes that discuss the same macro-topic. This helped us to discover patterns and understand the main ideas in the papers.
Here we provide an example of our coding mechanism:

Figure 3: Example 1: coding

Figure 4: Example 1: coding - cont.

After coding, similar segments can be grouped to form themes. In our example, we could group “Lack of pull”, “Internal Motivation” and “Financial commitment as solution” together to form a theme, which could be named “Organizational commitment by the industry”. Similarly, “Management support importance” and “Reduction of Instability” can be grouped under the theme “Management Support Role”. Finally, for our example, we can group “long/short-term conflict”, “Time allocation discussion” and “Workload balance” into the “Differences in Industry and Academic needs” theme.
• C. Answer the following questions by citing examples from your analysis
of the three studies (provided in Section 2.2):

1. Which challenges or impediments for industry-academia collaborations have been raised by the papers?
Management support emerges as a crucial point for successful collaboration between industry and academics, and its lack can determine the failure of the collaboration. In the first paper, the author identifies the lack of management support and the lack of “pull” among the reasons that caused the collaboration to fail with respect to its original plans; this also means that goals for the collaboration must be defined jointly. In the second paper, management support is highlighted as a point of strength in the collaboration.
Another challenge is the flexibility required of both personnel and researchers. This is a common key point in all of the papers analyzed: they agree that “the ability by the researcher to adjust to different cultures, including time perspective is another key to success” (P. Runeson) and that “The flexibility of both teams was vital for proper coordination” (S. Martínez-Fernández et al.).
In the papers, there is also a strong focus on the benefits that industry and academia can gain from their collaboration. It is emphasized that there needs to be a push on both sides for the collaboration to benefit both worlds: academia can benefit from the real-world relevance of its research, while industry can benefit from concrete support for innovation. Finally, a lack of commitment on either side can lead to minimal or even no technology transfer, which can mean failure for the collaboration itself.
2. What patterns have been proposed for industry-academia collaborations?
Successful industry-academia collaborations need technology transfer models: for example, Gorschek proposed seven steps for a successful collaboration, as shown below.

Figure 5: Gorschek model

This pattern is introduced in Gorschek’s paper, and it is used in both of the other two papers. However, in Martínez-Fernández’s paper, the authors state that “Although we did not follow the full Gorschek et al.’s process, we applied the first steps.”
Runeson’s experience also suggests a balanced approach between the push of research ideas and the pull of solutions by the industry. Another pattern that we found in the qualitative data we analyzed is the need for continuous, two-way communication and interaction between researchers and practitioners, based on frequent meetings and on-site presence. It is also important to adapt the solutions, and the collaboration in general, to the specific situation, context, and type of industry; this process is sometimes referred to as “tailoring”.
3. What should be avoided during industry-academia collaborations?
When universities and companies work together, some problems can make the collaboration difficult. Poor communication can lead to misunderstandings and delays. In addition, absent or inadequate management support should be avoided, and there must always be a “champion” acting as a reference for the researcher within the industry; ideally, to ensure stability, this figure should not change over time. Too little interaction should also be avoided, both in terms of communication and of the researcher’s on-site presence at the company.
Finally, it is important not to impose a rigid solution that does not consider the specific characteristics of the situation in which it is applied. Failing in this aspect can lead to solutions that are poorly applicable in practice; as stated by Gorschek et al., based on their experience, “from our point of view the value of these results were directly linked to usability in industry.”
