Having established the importance of realistic 3D product display for immersive e-commerce, as well as the resource constraints faced by smaller businesses, we now study scanning methods that seek a balance between quality, cost and real-time rendering performance on VR headsets.
In the following subsections we present in detail the software and hardware used to digitise physical products, the characteristics and quality metrics used to compare the digitisation techniques and, finally, the experiments carried out on a set of sample products.
3.1. Software Tools Selection Based on Scanning Techniques
In
Section 2, state-of-the-art 3D scanning techniques were introduced, including photogrammetry, LiDAR mapping and NeRF. In the current section, we focus on choosing low-cost tools and technologies that support these techniques and meet the needs of small businesses. We would like to point out that all the selected software tools have a user interface very similar to that of the camera application integrated into a mobile phone. In addition, they usually include tooltips and visual guides for performing scans in video mode. This results in a gentle learning curve for people who are not experts in scanning technologies [28], which is an incentive to recommend these applications over a dedicated 3D scanner.
LiDAR sensors were first integrated into consumer mobile devices with the 4th-generation iPad Pro and the iPhone 12 Pro (2020). Several authors have conducted studies showing that these sensors produce accurate models. For example, ref. [29] reported an accuracy of ±1 cm for small objects with a side length greater than 10 cm, the detection limit being around 5 cm. This suggests that LiDAR sensors might be more accurate, and therefore preferable, for large objects. Even though these devices are considerably cheaper than professional 3D scanners, they may not be affordable for every small business owner. It is therefore necessary to also explore scanning tools available on Android devices, as this mobile operating system holds a 71.44% global market share (https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/ accessed on 7 July 2024). Consequently, we searched for scanning applications available on either Android or iOS to select a set of tools that allow us to study the three technologies mentioned above. Even though some applications had both Android and iOS versions, the features offered differed in some cases. We therefore explored scanning applications from various sources, including the literature, to identify those most used by researchers.
Regarding the current state of photogrammetry applications, the image acquisition phase is easily handled by the cameras of current mobile devices [30]. These cameras have far better specifications than those of earlier handsets, especially on high-end devices, making it possible to obtain high-quality images from which to generate the 3D models. In addition, the remaining phases (image processing, point cloud generation, mesh generation and post-processing) are performed by the more powerful processors of current mobile devices or offloaded to a cloud-based computing service. We therefore explored free or low-cost applications available for Android devices, considering users' ratings and reviews as well as published results obtained with such applications when deciding whether to include them. Since all of them are based on photogrammetry, we also considered additional features made public by their developers, such as in-app editing tools for the obtained model, the estimated time for generating the model, subscription prices, export formats and the guides offered for scanning.
With respect to NeRF, we believe that this novel technique can offer high-quality results, even when compared with professional scanners. For instance, the authors of [31] showed encouraging results: compared with Agisoft Metashape 2.1.0, accuracy comparisons performed over different datasets gave differences of less than 1 cm, averaging around 0.5 cm. Due to the novelty of this technique, we chose only Luma AI, since it is free and accessible to small retailers, aligning with our goal of identifying low-cost solutions, and it is the most widely used in the literature, demonstrating its effectiveness and reliability in generating high-quality 3D models with the NeRF technique.
Table 1 shows the relevant features of the scanning applications found, with respect to the economic resources of small businesses.
The ability to export models is essential both to evaluate their characteristics and to use them in further VR applications. Note that the pricing of these applications is very similar; hence, pricing is not a differentiating factor. However, after conducting a test scan of one of the selected objects under the same conditions with each application, we decided to use Polycam as the application for evaluating the photogrammetry and LiDAR techniques. Polycam was selected for its user-friendly interface, extensive usage in the literature and comprehensive feature set. It supports both photogrammetry and LiDAR scanning, which is crucial for our comparative analysis. Additionally, Polycam offers multiple export formats and editing tools, making it a versatile option for small businesses aiming to digitise their product catalogues for VR shopping environments.
Therefore, we selected Polycam and Luma AI as the scanning applications for this study, the latter being the only application using NeRF technology at the time we carried out the selection process. Once the selection was complete, we began the scanning process and defined the characteristics to be measured.
3.4. Scanning of Basic 3D Primitives
Because of the above, we decided to build, scan and model the objects shown in Figure 1. The cube, pyramid and sphere are basic primitives used in computer graphics software and are easy to model in a program like Blender 3.5 (https://www.blender.org/ accessed on 7 July 2024) with little experience, as are the letters 'A', 'I' and 'R', in this case in Times New Roman font. After measuring the expanded polystyrene objects and modelling them in Blender, we obtained models close to the real ones with a significantly reduced polygon count, which allowed us to calculate the error of the technology used as a reference in the comparisons. Thus, even when using the most detailed model from one of the tools as a reference, we know its inherent error in advance.
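As an illustration of how little effort this modelling step requires, the following minimal sketch uses Blender's Python API (bpy) to generate the three primitives programmatically. The 30 cm dimensions and output file name are placeholders, not the exact measurements of our polystyrene objects.

```python
# Minimal Blender (bpy) sketch: generate low-poly reference primitives
# at real-world scale (the 0.3 m dimensions are illustrative only).
import bpy

# Cube with a 0.3 m side
bpy.ops.mesh.primitive_cube_add(size=0.3, location=(0, 0, 0.15))

# A four-sided cone approximates a pyramid with a square base
bpy.ops.mesh.primitive_cone_add(
    vertices=4, radius1=0.15, depth=0.3, location=(0.5, 0, 0.15))

# UV sphere; segments/rings control the polygon count of the reference
bpy.ops.mesh.primitive_uv_sphere_add(
    radius=0.15, segments=32, ring_count=16, location=(1.0, 0, 0.15))

# Export the scene as a single OBJ reference model (Blender 3.x exporter)
bpy.ops.wm.obj_export(filepath="reference_primitives.obj")
```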
Following the scanning process defined in
Section 3.5, these figures were scanned with Luma AI, exporting the resulting high-poly models, as well as with Polycam on both devices. We then used the 3D models designed in Blender as reference models to obtain the MSDM2 (Mesh Structural Distortion Measure 2) metric [36] and the accuracy of the 3D models (see Section 3.3) produced by the scans. In this way, we quantified the inherent errors of these scanning applications. The philosophy behind MSDM2 is very similar to that of CMDM (see Section 3.3), which aims to score the subjective visual quality, as perceived by human vision, of a 3D model with respect to a reference model. However, unlike MSDM2, CMDM also considers colour-related features, so we decided not to use it here: the models created with Blender have no textures, which could compromise the accuracy of the resulting score.
Table 2 shows the results obtained for these metrics. The RMS values obtained from the comparisons in CloudCompare 2.9.3 were 0.00425 ± 0.00253 m. That is, on average we can expect an error of approximately 0.42 cm in point-to-point distances between the real object and the generated 3D model. Depending on the dimensions of the object, 0.42 cm could be critical; however, considering the dimensions of the real expanded polystyrene objects (around 30 cm; they can be consulted in the following repository: https://github.com/AIR-Research-Group-UCLM/PDIVR-ZOCO accessed on 7 July 2024), the average relative error is 1.4%. With respect to the MSDM2 values obtained, the mean is 0.112375. A value of 0, as in CMDM, indicates that the meshes are identical. Furthermore, in the work of the authors of the metric, graphical examples show that a value of 0.14 is barely noticeable to the human eye [36]. With these results, we can anticipate the error introduced by the scans when using this type of model as a reference for comparisons.
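For readers wishing to reproduce this kind of accuracy figure, the following sketch computes a cloud-to-cloud RMS in the spirit of CloudCompare's distance tool, here using the Open3D library; the file names, sample counts and 30 cm object size are illustrative assumptions, not our exact setup.

```python
# Sketch: cloud-to-cloud RMS between a scanned model and the Blender
# reference, analogous to CloudCompare's distance computation.
import numpy as np
import open3d as o3d

reference = o3d.io.read_triangle_mesh("cube_reference.obj")      # placeholder
scanned = o3d.io.read_triangle_mesh("cube_luma_high_poly.obj")   # placeholder

# Sample dense point clouds from both meshes
ref_pcd = reference.sample_points_uniformly(number_of_points=100_000)
scan_pcd = scanned.sample_points_uniformly(number_of_points=100_000)

# Nearest-neighbour distances from the scanned cloud to the reference
distances = np.asarray(scan_pcd.compute_point_cloud_distance(ref_pcd))
rms = np.sqrt(np.mean(distances ** 2))

# Relative error for an object of a given size (0.30 m here):
# e.g. 0.00425 m / 0.30 m ≈ 1.4%, matching the figure reported above.
object_size = 0.30
print(f"RMS: {rms:.5f} m ({100 * rms / object_size:.1f}% of object size)")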
3.5. Scanning of Objects Selected for the Study
We have followed best practice guidelines provided by Polycam 1.3.10 (
https://learn.poly.cam/about accessed on 7 July 2024) and Luma 1.3.8 (
https://docs.lumalabs.ai/MCrGAEukR4orR9 accessed on 7 July 2024), including ensuring good lighting, moving slowly while capturing and avoiding objects with transparent materials or complex reflections. There is a set of well-known materials for which it can be hard to obtain good-quality scan results: transparent materials (e.g., glass or plastic), reflective materials (e.g., mirror-like or metallic surfaces), smooth and even surfaces (e.g., a white wall) and furry materials (e.g., a carpet). It can be seen in Figure 2b that the mirror-like material of the object is badly represented, with holes in the 3D model and the bottom part left unfilled. Therefore, we tried to avoid objects with such materials in this work in order to obtain high visual quality for the later VR experimentation, in which a high level of realism is important. However, we did select two objects made of one of the materials listed above, in order to research the impressions they cause on users in further studies.
Although the guidance interface in Luma is more comprehensive than that of Polycam, it required, on average, 4 min more to complete the capture rings. We placed each object on a round table with a radius of 50 cm and moved around it to capture the different points of view. We did not take the processing time for generating the model into account, since both applications rely on cloud computing for this step. However, the use of LiDAR sensors in Polycam allows for local processing, generating the model in between 1 min 30 s and 3 min, depending on the product scanned.
We note that the scanning time with Polycam was around 4 min, while with Luma it was around 8 min. The model generation time, on the other hand, depends on the workload of the cloud servers at a given moment: models generated with Polycam took around 3–5 min, while those generated with Luma AI could take up to 10 min. These times were measured at the time of the study and may change in the near future.
It is important to consider the various formats these applications offer for the generated 3D models. Ordered from the least to the most detailed model, Luma AI offers low-poly, medium-poly and high-poly, while Polycam provides optimized, medium, full and raw. Regarding Polycam, the last two options are better suited to VFX and professional workflows, whereas the first two are designed for game engines [37]. The optimised option is the most convenient for fast loading and real-time rendering, which matches our target of allowing potential customers to interact with the 3D models of products in the VR environment. Although it might seem that, among the options provided by Luma AI, low-poly is the best for our use case, we explored every format, since there is no existing research specifically focused on this technology for our use case.
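Where an application's low-poly export is not sufficient, a high-poly model can also be decimated offline to a chosen polygon budget. The following sketch illustrates this with Open3D's quadric decimation; it is not part of either application's pipeline, and the file names and the 20,000-triangle target are assumptions.

```python
# Sketch: reduce a high-poly export to a VR-friendly polygon budget
# with quadric decimation, as an alternative to the apps' own low-poly
# exports. The 20k-triangle target is an assumption, not a recommendation
# from either application.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("sneakers_luma_high_poly.obj")  # placeholder
print(f"Input triangles: {len(mesh.triangles)}")

simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=20_000)
simplified.compute_vertex_normals()  # recompute shading normals after decimation

o3d.io.write_triangle_mesh("sneakers_vr_ready.obj", simplified)
print(f"Output triangles: {len(simplified.triangles)}")
```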
We selected five objects (see
Figure 3) made of different materials and of different sizes that could potentially be showcased in a VR e-commerce platform. In addition, we considered objects large enough for LiDAR sensors to detect them effectively, since these sensors commonly give worse results with small objects, such as LEGO blocks [
19]. Therefore, we selected objects sized between 20 and 100 cm in height or width. It was also important to select objects with different colours, as LiDAR sensors seem to return more 3D data points when lighter coloured objects are scanned [
6,
19].
Figure 4 shows the 3D models, in .obj format, exported as low-poly models from Luma AI.
The selected objects represent a spectrum of surface types and interactions with light, which is essential for assessing scanning effectiveness and detail accuracy. We include organic materials and textiles, which predominantly absorb and scatter light, as examples of diffuse surfaces. Ceramics were chosen for their semi-specular properties, providing a balanced mix of light absorption and moderate reflection. Metallic objects, with their highly reflective, specular surfaces, were included to test the applications' ability to handle intense reflections and mirroring effects.
3.6. 3D Model Quality Evaluation
Table 3 shows the measurements taken for each 3D model obtained from the scanning process, for each of the features described above. On the one hand, it is noteworthy that the polygon counts of Luma AI's high-poly models are between 9 and 10 times higher than those of the medium-poly models, and approximately 50 times higher than those of the low-poly models. This seems to indicate that high-poly models will not be the most feasible to use in VR environments, in terms of their impact on VR headset performance. This hypothesis will be tested in
Section 4.
As can be seen, the Luma AI high-poly files are very large compared to the other types, as are their textures, which have a higher resolution. The scanning applications do not offer control over the desired texture resolution when exporting. Polycam generates a single texture file: 4096 × 4096 pixels in the Android version (photogrammetry), while in the iOS version (LiDAR sensors) the file is 4096 pixels wide with a variable height depending on the scanned object. Luma AI, on the other hand, generates several texture files in all three formats in a much more flexible way. For example, for the sneakers in high-poly format, 78 files were generated, the highest resolution being 4096 × 4096, while for the octopus 56 files were generated, the highest resolution being 2048 × 2048. The same applies to the rest of the objects, with 3–4 files generated for low-poly models and between 10 and 30 for medium-poly ones. This may indicate that Luma AI's algorithms adapt the textures and their resolution to the scanned object more intelligently than Polycam's, since the process and conditions under which the scanning was performed were identical. However, a downside of these models is the need to apply more than one material when rendering them in a graphics engine, which may worsen performance.
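These texture statistics can be gathered automatically; the sketch below counts the texture files in an exported model folder and reports the highest resolution, using Pillow. The folder names are placeholders for the applications' export directories.

```python
# Sketch: report texture count and resolutions for an exported model
# directory (folder names are placeholders for the apps' export folders).
from pathlib import Path
from PIL import Image

def texture_stats(folder: str) -> None:
    """Print the number of texture files in a folder and the largest size."""
    files = sorted(Path(folder).glob("*.jpg")) + sorted(Path(folder).glob("*.png"))
    sizes = []
    for f in files:
        with Image.open(f) as img:
            sizes.append(img.size)  # (width, height) in pixels
    print(f"{folder}: {len(files)} texture file(s)")
    if sizes:
        print(f"  largest texture: {max(sizes)}")

texture_stats("sneakers_luma_high_poly")   # e.g. 78 files, up to 4096x4096
texture_stats("octopus_luma_high_poly")    # e.g. 56 files, up to 2048x2048
```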
Regarding the quality metrics NR-3DQA and MM-PCQA, a low score does not necessarily indicate low visual quality. These metrics are regression-based: they are trained on features extracted from the models in the WPC database [33] and on the mean opinion scores (MOS) obtained from the volunteer study associated with it. Since the training data include not only original models but also distorted versions of them, the results must also be interpreted relative to the group of models of the same object.
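To clarify how such learning-based no-reference metrics operate, the sketch below shows the general pattern: hand-crafted features regressed onto MOS labels. The feature set, the SVR regressor and the random placeholder data are purely illustrative; this does not reproduce the actual NR-3DQA or MM-PCQA pipelines (the latter uses a neural network).

```python
# Conceptual sketch of a regression-based no-reference quality metric:
# features are mapped to MOS values by a regressor trained on a
# subjective-study database such as WPC. Illustrative only; NOT the
# actual NR-3DQA or MM-PCQA implementation.
import numpy as np
from sklearn.svm import SVR

def extract_features(vertices: np.ndarray, colors: np.ndarray) -> np.ndarray:
    """Toy geometry/colour statistics standing in for the real features."""
    return np.concatenate([
        vertices.std(axis=0),   # spatial spread per axis
        colors.mean(axis=0),    # mean colour
        colors.std(axis=0),     # colour variance
    ])

# Training: features of original + distorted models with their MOS labels
train_features = np.random.rand(100, 9)   # placeholder training data
train_mos = np.random.rand(100) * 5       # placeholder MOS in [0, 5]
regressor = SVR(kernel="rbf").fit(train_features, train_mos)

# Scoring a new scan: the predicted MOS is only meaningful relative to
# other models of the same object, as discussed above.
features = extract_features(np.random.rand(5000, 3), np.random.rand(5000, 3))
score = regressor.predict(features.reshape(1, -1))
print(f"predicted MOS: {score[0]:.2f}")
```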
Analysing the results, we observed discrepancies between the metrics, due to the characteristics of the models used in their algorithms. In the case of NR-3DQA, the best-performing Luma models were the medium-poly ones, except for the sport shoes, for which the Polycam Android model scored marginally higher. It can also be seen that, apart from the sport shoes, Luma's models outperform Polycam's in both versions. Calculating the percentage difference between the top-performing models and those with inferior results, we find the following. The greatest difference is seen in the trophy, where Polycam's models are outperformed by 45% by Luma's medium-poly model, followed by the octopus (39% for the iOS version and 20% for the Android version) and the ceramic mushroom (Polycam's Android model) with a 17% difference. The smallest difference is found for the burner, with just a 6% gap between the best Luma model and Polycam's; it is the object for which the quality of all models is most similar.
The situation with the MM-PCQA results is slightly different: here the high-poly models yield the best results, except for the burner, where the low-poly model scores best. These outcomes continue to demonstrate good visual quality for both the Luma and the Polycam models, with the gap narrowing compared to the NR-3DQA results. For Polycam's Android models, the results (from top to bottom in Table 3) are 4%, 16%, 1%, 19% and 5% lower than the highest-scoring model. Once again, the burner is the object whose models were generated most similarly by both applications. For the iOS models, the percentage differences are even greater in some cases, with two models standing out for their low visual quality: the plush octopus (74% lower) and the ceramic mushroom (46% lower).
Based on these results, we can determine that Luma AI's models offer the highest visual quality, with percentage differences ranging from 1% (MM-PCQA of the burner) to 74% (MM-PCQA of the plush octopus). Calculating the mean and median of the percentage differences between Luma AI's top-scoring model and Polycam's models gives a more general picture of Luma AI's superior visual quality. For the two metrics, we obtain the following: (i) NR-3DQA: mean 17.78% and median 17.72% better results compared to Polycam's photogrammetric models (Android); mean 22.73% and median 15.18% better results compared to the models obtained with Polycam's LiDAR sensors (iOS); (ii) MM-PCQA: mean 9.43% and median 5.2% better results compared to Polycam's photogrammetric models; mean 28.19% and median 9.21% better results compared to the models obtained with Polycam's LiDAR sensors.
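For transparency, the percentage differences and their mean/median aggregates follow this simple computation; the example scores below are placeholders, not values from Table 3.

```python
# Sketch: percentage difference between the top-scoring model and a
# competing model, plus mean/median aggregation across objects.
import numpy as np

def pct_below_best(best: float, other: float) -> float:
    """How far 'other' falls below 'best', as a percentage of 'best'."""
    return 100.0 * (best - other) / best

# Placeholder per-object scores: (best Luma model, Polycam model)
pairs = [(0.85, 0.80), (0.90, 0.74), (0.78, 0.77), (0.88, 0.71), (0.82, 0.78)]

diffs = [pct_below_best(best, other) for best, other in pairs]
print(f"mean: {np.mean(diffs):.2f}%, median: {np.median(diffs):.2f}%")
```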
In
Figure 5, we can observe a comparison of models in which a colour scale visually indicates the distances between the points of the two models' point clouds. In the parts of the compared model where the colours shift towards red or blue hues, the distances are greater, which translates into lower precision in the compared model.
Figure 6a,b present a comparison of the meshes of the high-poly and low-poly 3D models. It can be noted that the number of vertices and edges in the high-poly model is significantly greater than in the low-poly one, which implies much more computation time and effort for the game engine rendering such a model. Moreover, Figure 7 shows a detailed comparison between the results of the different technologies used and the real scanned object.
Furthermore, the accuracy of the generated 3D models is presented in the remainder of this subsection. As stated before, given the absence of a ground-truth model, the high-poly model of each object was employed as the reference model, as the data obtained confirm its ability to capture the most intricate details. The process implemented in the CloudCompare 2.9.3 tool is comprehensively outlined in its documentation. In addition, this tool provides the mean and standard deviation of the distances calculated during the comparison.
Table 4 shows the RMS given by CloudCompare 2.9.3 for the comparisons performed, as well as the CMDM values for each comparison, obtained with the MEPP2 0.15.1 platform (https://github.com/MEPP-team/MEPP2 accessed on 7 July 2024). When the polygon counts differ greatly, the application's manual recommends sampling points on the compared model. Therefore, points were sampled to match the number of vertices in the reference model.
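This sampling step can be approximated as follows with Open3D: the compared mesh is sampled to match the reference model's vertex count before nearest-neighbour distances are computed. File names are placeholders, and this is a sketch of the idea rather than CloudCompare's exact implementation.

```python
# Sketch: sample the compared mesh so its point count matches the number
# of vertices in the reference model, mirroring the sampling step
# recommended by the CloudCompare manual when polygon counts differ widely.
import numpy as np
import open3d as o3d

reference = o3d.io.read_triangle_mesh("trophy_luma_high_poly.obj")    # placeholder
compared = o3d.io.read_triangle_mesh("trophy_polycam_optimized.obj")  # placeholder

# Match the compared model's sample count to the reference's vertex count
n_points = len(reference.vertices)
compared_pcd = compared.sample_points_uniformly(number_of_points=n_points)

# Distances from the sampled points to the reference vertices
ref_pcd = o3d.geometry.PointCloud(reference.vertices)
distances = np.asarray(compared_pcd.compute_point_cloud_distance(ref_pcd))
print(f"RMS: {np.sqrt(np.mean(distances ** 2)):.5f} m, "
      f"mean ± std: {distances.mean():.5f} ± {distances.std():.5f} m")
```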
Analysing the RMS results, Luma's medium and low-poly models are, as expected, the most precise, maintaining millimetre-level accuracy relative to the reference model. For the remaining models it can be observed that, except in one case, the models obtained with Polycam's photogrammetry are more precise than those obtained with its LiDAR sensors. For the models obtained by photogrammetry, the average of the RMS results indicates a 2.36 cm difference in the measured distances.
Inspecting the CMDM results, the medium-poly models achieved the best results overall, with an average perceived distortion of 13.22% across all comparisons. Only for the mushroom did another comparison achieve better results, namely the one between the two Polycam models, with the Android version as the reference; however, this was the model that produced the least realistic results, due to the material the object is made of. Again, the object with the best results in both applications was the burner. It is interesting to note that the difference in perceived distortion between the medium-poly and low-poly results is 3.4% on average, which indicates very good quality for the low-poly models as well. This metric also shows the overall low capability of LiDAR sensors to provide good visual quality when scanning moderately sized objects.
Having analysed the data, we can conclude that LiDAR sensors with the Polycam application should not be the preferred option for obtaining realistic 3D models to showcase in VR shopping environments: the various metrics employed yielded the worst visual quality results for these models, both in comparisons using a reference model and in those without. For obtaining a realistic 3D model, the preferred option is therefore NeRF, specifically in the form of a mobile application such as Luma AI, when available on the device. The data from the metrics show that the models from Luma AI have the highest visual quality, except for NR-3DQA in the case of the sports shoes. For high-poly models, values between 5 and 20% higher were obtained compared to the photogrammetry application used; for low-poly models, this range narrows to between 2 and 15%. Given the visual results of the photogrammetry application, and considering that the differences are not very large, this option is also viable for obtaining high visual quality 3D models for use in VR.
Discussion
Our evaluation revealed that the difference in visual quality between Luma AI models and those obtained through photogrammetry was generally minor, ranging from 10% to 20%. Specifically, the root mean square (RMS) values for Luma AI's medium-poly models were as low as 0.00041, while Polycam's photogrammetry models showed higher RMS values of approximately 0.01269 for the optimised models. The difference with respect to the models obtained using LiDAR sensors was significantly greater, with some cases showing RMS differences up to 70% higher, such as the RMS value of 0.05976 for low-poly models.
This substantial disparity leads us to consider the models produced by Luma AI as having the highest visual quality among the technologies we tested. As such, we recommend Luma AI for product digitisation when the highest visual fidelity is required. Additionally, Luma AI provides comprehensive support for NeRF technology, which is critical for achieving high-quality 3D models.
It is crucial to remember that we are focusing on technologies that can be easily used by individuals without advanced technical knowledge, such as small business owners, who may not have access to expensive hardware and software resources. Therefore, it is important to find solutions that offer a good balance between quality and accessibility. Polycam stands out in this regard due to its user-friendly interface and extensive usage in the literature. It supports both photogrammetry and LiDAR scanning technologies, which are crucial for our comparative analysis. Polycam’s average NR-3DQA score for optimised models was 0.8111, indicating good quality while maintaining accessibility for non-experts.
Conversely, our results indicate that models obtained using LiDAR sensors are less suitable for 3D object scanning in the context of product digitisation. The lower visual quality and higher error rates observed with LiDAR make it less ideal for creating detailed product models. For example, the CMDM values for LiDAR models reached up to 0.318699, significantly higher than those of photogrammetry models. However, LiDAR can be more appropriate for scanning larger environments, such as entire stores or shopping spaces, where the focus is on capturing the overall layout rather than fine details.
Further limitations of these scanning technologies were detected. Products made of transparent materials or with specular surfaces require a specific scanning environment in which light sources do not degrade the results. Moreover, objects with very small dimensions are quite difficult to scan with the proposed technologies; they would instead require a professional camera whose optical zoom allows for quality photos, together with photogrammetry software to which those photos can be transferred.
In
Section 4, we will analyse the performance of these models on various VR headsets currently on the market, including the Meta Quest 2, Quest Pro and Quest 3. This analysis aims to determine the practical feasibility of using these models in VR environments and to provide guidelines for their optimal use in different VR shopping scenarios.