File management

From Wikiversity

This lesson teaches the management of digital files and directories.

While there exists no ideal method of file management, this resource documents possibly helpful practices and inspiring ideas.

This lesson was created under the impression that the average computer and mobile phone user still struggles to keep track of files in the long term, despite all the tools at their disposal.

Descriptive file names

Every file manager has a "rename" feature, letting you change the names of files and folders.

Don't let this essential feature "rust in the junk yard". Use it to make files easily searchable, for instance by adding a short description after a picture or video file name. If you recorded using a smartphone, you can rename the file on the device itself right afterwards. For example:

  • VID_20241213_102241 trampoline jumping.mp4

It is recommended to add the short description after the file name rather than before, to prevent interference with alphanumerical sorting. Even though file managers can still sort items by date and time, files' time stamp attribute might not always be retained throughout transfers, for example when transferring files through MTP, between a smartphone's internal memory and memory card, or uploading files to a cloud storage service. A time stamp at the beginning of the file name allows alphanumerical sorting to act as chronological sorting even if the date and time attribute is lost.

Not every single file needs to be given a name manually, since that would take much effort. If your file manager lacks range selection and bulk renaming, meaning the selection and renaming of many items at once, adding a descriptive file name to the first file of similar files will still facilitate finding those files, since the title of the first file in the group covers the remaining files. For example, all four of these imaginary files depict strawberries, but only the first was manually named:

  • IMG_20230914_090112 strawberries.jpg
  • IMG_20230914_090116.jpg
  • IMG_20230914_090119.jpg
  • IMG_20230914_090124.jpg
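
Where a file manager lacks bulk renaming, a short shell loop can append the same description to a whole group of files. The following is a minimal sketch, assuming a POSIX shell and an mv that supports -n; the helper name append_description and the example file names are made up for illustration:

```shell
# Hypothetical helper: append a description to every file in the current
# directory whose name starts with the given prefix. The time stamp stays
# at the start of the name and the extension stays at the end.
append_description() {
    prefix=$1        # e.g. "IMG_20230914_"
    description=$2   # e.g. "strawberries"
    for file in "$prefix"*; do
        [ -f "$file" ] || continue                 # nothing matched, or not a regular file
        case "$file" in
            *.*) new="${file%.*} $description.${file##*.}" ;;
            *)   new="$file $description" ;;       # file without extension
        esac
        mv -n -- "$file" "$new"                    # -n prevents accidental overwriting
    done
}

# Demonstration in a throwaway directory with empty example files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch IMG_20230914_090112.jpg IMG_20230914_090116.jpg IMG_20230914_090119.jpg
append_description IMG_20230914_ strawberries
ls
```

Running the helper twice would append the description twice, so it is best used once per group of files.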

Directories

Make use of directories. Categorize files into logically structured directories to facilitate finding them later.

Some inexperienced computer users might have a habit of throwing all files into the root directory of a device such as a flash drive. As files accumulate, they become increasingly difficult to find, especially multimedia content, which cannot simply be searched for text strings using tools like `grep` but only through metadata.

Create a mind map by asking yourself where you would look for these files whenever necessary.

Below are two example directory structures for video projects, where the project files of the first project are directly in the project folder, while those of the second are in a subfolder; a matter of preference.

  • Video projects
    • Example project 1
      • Assets
    • Example project 2
      • Project files
      • Assets

Structure files in the order your mind recalls them. For example, the type and source of files are more memorable than the date. For more examples of directory structures, see the /directory structures sub page. Feel free to add examples to it.

For directories where a sufficiently descriptive directory name would be considered too long, consider creating a descriptive text file inside named description.txt, info.txt, or similar. Comments about specific files may be noted in a text file with .meta, .meta.txt, or a similar suffix appended to its name.

If files within the same category have wildly varying sizes, such as command line outputs, you might want to move files beyond a size threshold such as 1 MB or 5 MB into a separate folder whose name can be the same with an added "-large" suffix. This allows minimizing the size of compressed snapshots of the directory.
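
Separating out large files can be done with find. This is a minimal sketch, assuming GNU find (for -maxdepth) and a 1 MiB threshold; the folder names logs and logs-large are examples only:

```shell
# Demonstration in a throwaway directory.
work=$(mktemp -d)
mkdir "$work/logs"
printf 'short output\n' > "$work/logs/small.txt"
head -c 2000000 /dev/zero > "$work/logs/big.txt"     # about 2 MB of filler

# Move every regular file larger than 1 MiB from "logs" into "logs-large".
mkdir -p "$work/logs-large"
find "$work/logs" -maxdepth 1 -type f -size +1024k \
    -exec mv -n {} "$work/logs-large/" \;

ls "$work/logs" "$work/logs-large"
```

The -maxdepth 1 option limits the move to files directly inside the folder, leaving sub-directories untouched.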

Avoid putting sub-directories in directories populated with many files. They become a nuisance when you want to view the latest file but the file manager lists sub-directories at the top, and they can be difficult to find between the files when needed. Navigation is further slowed by the long loading times of the Media Transfer Protocol (MTP), which is commonly used to connect mobile devices to desktop/laptop computers, and by file managers that try to determine file types by reading internal file information.[1] Therefore, consider creating a separate folder that only contains subfolders, such as Downloads_2.

You don't need to spend much effort coming up with good names for files and folders, as you can change them later at any time if you happen to come up with a better idea.

Revision history

When spending much time on a project, write-up, email draft, etc., consider saving a revision history by selecting "Save as" (also accessible through the Ctrl+Shift+S keyboard shortcut in some software) and changing the file name through numbering or time-stamping, or alternatively by creating renamed copies externally using a file manager or command terminal. For possible file naming schemes and variations of numbering and time-stamping, see File naming.

This enables you to revert to an earlier revision in case of error, prevents total loss in case of failed writes caused by a power outage or software crash, and later facilitates comprehending the work progression, while consuming only marginal space compared to common data storage capacities and being efficiently compressible due to redundancy.

A new revision does not necessarily have to be created upon every saving, but whenever the changes since the last revision are major enough at your discretion. Optionally, a short comment summarizing the changes can be added to the file name.

For programming code, suffixes like -stable and -unstable may be added after the time stamp, for example: UserScriptName-revision-20241213102241-stable.js. Separating the number or time stamp allows convenient double-click selection in the file saving dialogue.

Revisions can optionally routinely be moved into a separate subfolder.
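
Creating renamed copies from a terminal can be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name save_revision is hypothetical, and the naming pattern merely follows the example above:

```shell
# Hypothetical helper: save a time-stamped copy of a file next to the original
# and print the name of the copy.
save_revision() {
    file=$1
    dir=$(dirname -- "$file")
    name=$(basename -- "$file")
    stamp=$(date +%Y%m%d%H%M%S)
    case "$name" in
        ?*.*) copy="$dir/${name%.*}-revision-$stamp.${name##*.}" ;;
        *)    copy="$dir/$name-revision-$stamp" ;;     # file without extension
    esac
    [ -e "$copy" ] && return 1                         # never overwrite an existing revision
    cp -p -- "$file" "$copy" && printf '%s\n' "$copy"  # -p keeps the original time stamp
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'first draft\n' > "$demo_dir/UserScriptName.js"
revision=$(save_revision "$demo_dir/UserScriptName.js")
printf 'saved %s\n' "$revision"
```

A suffix such as -stable could be added by hand afterwards, once the revision has been tested.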

Dumping ground

For files and folders you are unsure where to put, consider creating a directory on your device named dump, sandbox, or similar.

You may wish to categorize those into text, compressed archives, drawings, or by whichever task, or dated folder names such as 2024-12-13.

When managing files and directories in a command line terminal, an item can temporarily be given a simple and short name such as 1 to facilitate typing in the commands, that will be changed shortly after. Examples are:

Output from multiple commands into one log file
$ ls [path] -alR >>1 # Write names and attributes of that folder's content into the file "1".
$ find [path] >>1 # Write a list of bare file paths into the file "1".
$ mv -n 1 [desired file name] # Rename "1" to desired file name. "-n" prevents accidental overwriting.

The above commands could also be combined into a single line with (ls [path] -alR; find [path]) >>[desired file name]. The step-by-step approach may still be preferred when the target directory contains a high number of files, since each command starts running as soon as ⏎ Return (also known as "Enter") is pressed, or when one wishes to enter commands before thinking of an output file name. The single-line form requires typing in the whole command, including the file name, first.

Moving files from multiple sources into a directory
$ mkdir 1 # Create a folder named "1".
$ mv -n *.mp4 *.mkv 1 # Move mp4 and mkv files into it.
$ mv -n 1 Videos-$(date +%Y-%m-%d) # Renaming folder to intended name.

Should it be necessary to print the absolute (full) path of a file, for example for quick copying to the clipboard, use the command readlink -f [target file]. For information about the current mount point, use the findmnt -T . command.

Finished tasks

Files whose task has finished, such as a posted message that was initially drafted offline in a text file, can be moved into a subfolder of the current directory named done, or alternatively into one big shared folder for this purpose.

Storage types

Auxiliary data which is frequently accessed can reside on the operating system drive, on its own partition or a separate one. This especially applies to portable computers and operating systems installed on external media such as a USB flash drive.

Secondary and external storage

On desktop computers, large files can reside on a large secondary hard disk drive or solid state drive, the former of which costs less per unit of capacity. For laptops, stationary hard drives located at home or portable external hard drives or solid state drives can be used. While working with a laptop as a passenger in a moving vehicle such as a bus or tram, solid state memory is preferable due to its sturdiness, as constant physical movement subjects hard disk drives to mechanical wear. If portable storage is necessary, files can be stored on a constantly inserted memory card, which does not compromise ergonomics, as it does not physically protrude, or at most a little. In addition, it can occasionally be removed when the data is needed on a different device. SD cards with 1 TB have existed since at least 2017, though they are expensive.[2]

A home server adds the convenient benefit of access from all devices at home, and even through the internet if set up by the user ("private cloud"), but typically has higher latency (access delays) than physically attached storage, and lacks the mass storage access that may be necessary for some programs to work properly. In such a case, the files would have to be downloaded first, worked with locally, and uploaded after finishing.

Incubate work on flash storage

If, for example, your computer has a solid state drive for the operating system and a hard disk as expansion storage, a project may be worked on using the flash storage, and changes can be applied to the hard drive at the end of the day.

If the hard drive is set to spin with no or a long timeout such as an hour or more, this may not be as necessary, but for short timeouts, frequent spin-ups would cause mechanical wear and tear in addition to annoying delays.

Using flash memory can be particularly useful on battery-powered laptops for power efficiency. Hard drives long served in laptops as a cost-saving measure, but they are increasingly being displaced by solid state memory, mainly due to its physical robustness and because it has become increasingly affordable since the mid-2010s. External USB hard drives may still be used on the go.

Hard drives' purposes

After purchasing a hard drive, choose whether to dedicate it to either auxiliary storage or archival (cold storage).

Use as auxiliary storage such as for a workstation or server demands the drive be in constant operation, which wears it down over time, making it unsuitable for long-term archival.

Use as archival storage demands only sporadic (rare) operation to add or retrieve data, which induces far less mechanical wear.

Data retrieved from an archive drive should be copied over to auxiliary storage. This avoids needless wear on the archive drive when that data is recalled again in the near future, as can be expected, and it further duplicates files that have proven to be useful, since archival is commonly done without knowing which files will be useful in future. A functional archive drive may later be repurposed for auxiliary storage after moving the archived data over to a new device.

Make a list of data you wish to retrieve the next time you access your archive drive. If you can afford it, mirror the archive to a readily available hard disk or network-attached storage. If your live hard disk or network attached storage runs out of disk space, delete the files that were not accessed for the longest time if they are already backed up somewhere else. To find them, file managers are equipped with the ability to sort files and directories by "last accessed" (or similar) in either direction.
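
On a command line, the same sorting by last access can be approximated with find. This is a minimal sketch, assuming GNU find (for -printf); note that many systems update access times only lazily or not at all, so the result should be treated as a rough guide. The file names and dates are invented for the demonstration:

```shell
# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'a\n' > "$demo_dir/old-report.txt"
printf 'b\n' > "$demo_dir/recent-notes.txt"
touch -a -t 202001010000 "$demo_dir/old-report.txt"     # pretend: last accessed in 2020
touch -a -t 202401010000 "$demo_dir/recent-notes.txt"   # pretend: last accessed in 2024

# Print access time (seconds since 1970) and path, least recently accessed first.
oldest_first=$(find "$demo_dir" -type f -printf '%A@ %p\n' | sort -n)
printf '%s\n' "$oldest_first"

# The first line names the best candidate for deletion, provided it is backed up elsewhere.
least_recent=$(printf '%s\n' "$oldest_first" | head -n 1 | cut -d ' ' -f 2-)
```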

Labels

Consider physically labeling your storage media to facilitate finding data. For example, a date or a short summary of the contents can be written on a label sticker that is applied onto the casing of an external data storage device.

For optical discs, a disc marker (also interchangeably known as "CD marker", "DVD marker", etc.) can be used to write notes directly on the disc. If more space is necessary, label stickers on its containing jewel case or spindle can be used. Many jewel cases include a paper sheet for notes. Do not attach stickers directly onto the disc, as they could disintegrate during high rotation speeds, risking damage to the optical drive's internal components.

It is also recommended to change the device's file system label to a year or a short summary of the contents so it can quickly be seen in the file manager's device list (usually a side bar on the left) when plugged in.

On data storage devices whose life expectancy appears to be nearing its end, as indicated by S.M.A.R.T. data on hard drives and performance loss on solid state memory, an "expired" or "EOL" (end of life) label can be added. Such devices should at most be used for temporary purposes such as testing.

Data storage size

It is recommended to get a larger storage capacity than one intends to use. This leaves some spare room in the case that more data than expected is created, such as vacation video recordings.

On portable devices such as mobile phones and digital cameras, a larger storage capacity lowers the required frequency of file transfers into one's archive.

Another benefit of larger storage media is that more data can be readily accessible at a given time, and a needed file can be found faster since fewer devices have to be searched. However, a central file list and labeling, as described in other sections, can facilitate finding files as well.

If your storage device has, for example, 128 GB, realistically count on about 100 GB. This is not only because of space reserved for file system overhead and possibly operating system files, but also because the last part of those 128 GB is less useful: one might want to put files there that are slightly larger than the remaining space, and one would like to have all those files in one location to avoid confusion and the annoyance of changing media.

In other words, if one would like to put 11 GB where only 10 GB are available, those 10 GB are as useless as 0 GB.

Partitioning

User data may be stored on the same partition as the operating system or on a separate one.

The benefit of a separate partition for user data is that possible file system corruption on the operating system partition would not spill over to user data, though modern file systems such as NTFS and ext4 protect themselves from damage by journaling, which allows the file system driver to recover quickly after an unexpected termination of write access caused by an operating system crash or unexpected removal. Infrequently accessed files that may be necessary in near future can be stored there as well, or moved to a secondary drive if their size is significant.

Another benefit of a separate user data partition is the smaller backup size of the operating system partition and facilitated recovery in case of a malfunctioning operating system where other means of repair have failed or would be too difficult. Because operating systems are subject to corruption and can at worst become unbootable, it is good practice to back up their partition regularly into a disk image. A smaller operating system partition can be imaged more quickly and the routine induces less wear on the backup media. Should the operating system malfunction, it can be imaged and then restored more quickly from the functional previous disk image, with less work to merge desirable changes since the last backup.

Packing and archival

Perpetual streams of new files such as web downloads, photos and videos from digital cameras and cell phones, and screen captures can be packed by renaming their parent folder into a uniquely identifying name, such as with date stamp: Camera-2024-12-13, after which they can be moved to an archive drive at the next backup appointment. The folder's name may also contain a location, device type, and/or short description. If packed more than once on the same day, time stamps or part numbers can be added to the names, or the files can be merged into the same directory.

If you no longer intend to change the contents of an archived folder, a file count and byte size can optionally be added to the name to provide a quick overview in file lists and to facilitate verifying whether all data has been transferred without having to navigate back to the source device and wait for loading to view the folder's byte size. An example name is Camera-2024_12_13-12147367878b-2093items. It is recommended to use the exact byte number to prevent confusion between file size units that are powers of two (KiB, MiB, GiB) and powers of ten (KB, MB, GB).
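
The byte size and file count for such a name can be computed on the command line. This is a minimal sketch, assuming GNU find (for -printf); the helper name packed_name is hypothetical, and only regular files are counted:

```shell
# Hypothetical helper: print "<folder name>-<bytes>b-<count>items" for a directory.
packed_name() {
    dir=$1
    # Sum the exact byte sizes of all regular files; %.0f avoids exponent notation.
    bytes=$(find "$dir" -type f -printf '%s\n' | awk '{ s += $1 } END { printf "%.0f\n", s }')
    # Count regular files by printing one dot per file.
    count=$(find "$dir" -type f -printf '.' | wc -c | tr -d ' ')
    printf '%s-%sb-%sitems\n' "$(basename -- "$dir")" "$bytes" "$count"
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/Camera-2024-12-13"
printf '12345'   > "$demo_dir/Camera-2024-12-13/a.jpg"    # 5 bytes
printf '1234567' > "$demo_dir/Camera-2024-12-13/b.jpg"    # 7 bytes
new_name=$(packed_name "$demo_dir/Camera-2024-12-13")
printf '%s\n' "$new_name"                                  # Camera-2024-12-13-12b-2items
```

Running the same helper on the copy on the archive drive and comparing the two printed names is a quick way to check that nothing was left behind.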

Alternatively to renaming on the source device, the directory with uniquely identifying name can be created on the archive drive first, and files can be moved there out of the source directory.

When to pack files is the end user's decision, though it is recommended to do so before exhausting free space on the source device. Renaming is not as necessary if new folders are created automatically, as some digital camera firmware does every 999 or 1000 pictures. All filled folders can be considered eligible for archival.

Larger storage space provides the benefit of more buffer until the next file transfer becomes necessary to clear space, thus it needs to be done less frequently.

Individual files needed for a specific purpose such as an impending project can be copied or moved into a separate directory.

Write protection

Write protection may be desirable to remove the worry of accidentally modifying data.

A simple way of achieving write protection in Linux-based operating systems is to mount or re-mount a device or partition as read-only with this command which requires superuser privileges: mount -o remount,ro [device or mountpoint].

If write-protection is not supported by the operating system, an SD card with write protection switch feature can be used. The switch relies on the SD card reader to obey it and deny writing access to the operating system. Some memory card readers, both built-in ones and USB adapters, might not obey the write protection switch.

Another way to achieve write protection is finalized write-once optical media or a read-only optical drive with insufficient laser beam power to write data, as described in § Sensitive environment.

File listing

Searches within file lists inside a text file are significantly faster than searches through a file system.

See this guide on how to create file lists.

File index

As explained in File puzzling § Orphaned directories, some file systems store directories that comprise file paths as "linked lists", meaning distributed over the entire space rather than one index of "nodes" at the beginning, which has both benefits and disadvantages, the latter of which is slower file searching.

A searchable file index stored in a text file named "index" created using the find >index command can facilitate finding files, as it contains a list of paths to all files at one place. The index can be updated by running the command again to overwrite the existing one. If the working directory is not the root of the file system, it should be changed to it or the paths need to be specified.

The "index" file can be searched with ease using grep -i "searched file name" index, which is typically much faster than directly searching the file system. -i may be left out for case-sensitive search. These commands have additional options, but these are outside the scope of this section.
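
The two commands can be tried out safely in a scratch directory. This is a minimal sketch; the folder and file names are invented for the demonstration:

```shell
# Build a small example tree in a throwaway directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/Pictures/2023" "$demo_dir/Documents"
touch "$demo_dir/Pictures/2023/IMG_20230914_090112 strawberries.jpg" \
      "$demo_dir/Documents/Invoice.txt"
cd "$demo_dir" || exit 1

find . > index                               # build (or rebuild) the index
matches=$(grep -i "STRAWBERRIES" index)      # case-insensitive search of the index
printf '%s\n' "$matches"
```

Because the shell creates the index file before find starts, the index also lists itself as ./index, which is harmless.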

Time stamp preservation

Some methods of file transfer might discard the date and time stamp file attribute(s), resetting them to the current time. Examples are copying within or onto mobile phone storage, the cp command without the -p ("preserve") option, and copying into a directory on Unix/Linux that is not owned by the current user.
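
The effect of the -p option can be observed directly. This is a minimal sketch using a throwaway directory; the file names and the 2020 date are arbitrary:

```shell
demo_dir=$(mktemp -d)
printf 'holiday notes\n' > "$demo_dir/original.txt"
touch -t 202001011200 "$demo_dir/original.txt"                   # give it an old time stamp

cp    "$demo_dir/original.txt" "$demo_dir/plain-copy.txt"        # time stamp reset to now
cp -p "$demo_dir/original.txt" "$demo_dir/preserved-copy.txt"    # time stamp kept

ls -l "$demo_dir"                                                # compare the dates shown
```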

To preserve last-modified time stamps over FTP, downloading is preferred. Uploading while preserving them requires both client and server to support the MFMT (Modify Fact: Modification Time) command, which is not widely supported.

High numbers of small files

With an increasing number of files, file searches slow down. High numbers of small files also restrict portability, as they demand more file operations for file transfers, slowing the process down.[3] Additionally, in combination with larger cluster sizes, they waste more space to cluster overhead (unused reserved space).

If you happen to have a high number of currently unneeded small files, such as tens of thousands, consider packing them into one big archive file for improved portability.

Compression may be considered where efficient, such as in human-readable text files and code, and/or where more necessary, such as online file sharing. Compression ratios of 100 may be achievable by strong compression algorithms on text documents and code. However, it should be taken into consideration that damage magnifies enormously over compressed archives, as demonstrated in Backup § Compressed archives. Therefore, it is recommended to store compressed archives on at least two devices.

Text inside compressed archives can be searched through directly without extracting using tools such as zgrep, xzgrep, bzgrep, and for 7-Zip, 7z e -so -bd [path] |grep [query].
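
Packing a folder and searching the packed text can be sketched with tar alone. This is a minimal example, assuming a tar with gzip support (-z) and the -O option, as found in GNU tar; the folder and file names are invented:

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/notes"
printf 'buy strawberries\n' > "$demo_dir/notes/shopping.txt"
printf 'call the bank\n'    > "$demo_dir/notes/todo.txt"

# Pack the folder into one gzip-compressed tar archive.
tar -czf "$demo_dir/notes.tar.gz" -C "$demo_dir" notes
tar -tzf "$demo_dir/notes.tar.gz"            # list the archive's contents

# Search the packed text without extracting to disk:
# -O writes the contents of the archived files to standard output.
hits=$(tar -xzOf "$demo_dir/notes.tar.gz" | grep -c "strawberries")
printf 'matching lines: %s\n' "$hits"
```

This form of search reports matching lines but not which archived file they came from.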

If frequent modification of small files is necessary, an alternative to packed archive files (such as .tar, .zip) are file system containers (virtual disks) with a small cluster size. File system containers are manually generated disk image files that can be mounted like usual drives.

Avoid exhausting the operating system's partition

Exhausted storage space should be avoided, especially on an operating system partition, as it could lead to erratic behaviour by software not designed to handle such a condition. For example, a failed write while saving could blank the target file, causing the loss of work and a reset of configuration. A web browser might automatically delete early browsing history entries to make space for new ones. Even seemingly basic features such as command line parameter completion could malfunction.[4]

On an operating system partition, keeping a safety margin of free storage such as 5% at any time is recommended, and at least 1% on secondary expansion storage. On archival media, a controlled exhaustion of space is less critical, though the readability of the final written files should be verified.
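
Such a margin can be checked from a script. This is a minimal sketch, assuming the POSIX output format of df -P; the helper name free_percent is hypothetical and the 5% threshold follows the suggestion above:

```shell
# Hypothetical helper: print the approximate percentage of free space
# on the file system that holds the given path.
free_percent() {
    # Line 2, column 5 of "df -P" is the used capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print 100 - $5 }'
}

threshold=5
free=$(free_percent /)
if [ "$free" -lt "$threshold" ]; then
    printf 'Warning: only %s%% free on /\n' "$free"
else
    printf '%s%% free on /\n' "$free"
fi
```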

Should you still find yourself with 100% exhausted space, first look for a few megabytes of files to move out, perhaps temporarily, to improve system stability. This gives you time to calmly search for more and/or larger files to move out.

File system repairs

Tools such as CHKDSK on Microsoft Windows and fsck on Linux promise repairing damaged file systems. Logical file system errors may be caused by unexpected power outages or unpluggings.

Be careful with file system repairs. It is recommended to back up any device (either to a full-disk image or by copying all files) prior to running one, in order to be able to revert with ease in case of unwanted collateral damage.

Detected file system errors may be caused by file names that are incompatible across operating systems. For example, on NTFS, Linux allows characters in file and folder names that Windows considers invalid, such as a colon (:) and a pipe character (|), as well as names that differ only in letter case. Upon detection of invalid characters, the CHKDSK tool moves and renames such items, which leads to the loss of file names and paths. In particular, CHKDSK moves files and folders with invalid names into a directory located at the file system's root named found.000, and renames them to generic names like file00000000.chk and dir_00000000.chk, where the number is hexadecimal and incremented.

Disk usage analysis

Disk usage analyzers facilitate finding directories with the largest content size. Some illustrate the directory structure graphically.

Disk usage analyzers calculate the size of directories on any selected path, allowing the user to easily discover directories which occupy the most space. Large folders not currently needed can be moved over to an archive drive, which clears the most space on the source device.

Popular tools for desktop operating systems include Baobab for Linux (pre-installed on some popular distributions) and Xinorbis for Windows, both with sophisticated graphical user interfaces. Linux is also equipped with the command-line tool du, which allows outputting results directly into a text file.
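
Writing du results into a text file might look as follows. This is a minimal sketch in a throwaway directory; the folder names are invented, and random filler is used so that the example file really occupies space:

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/videos" "$demo_dir/notes"
head -c 3000000 /dev/urandom > "$demo_dir/videos/clip.bin"   # about 3 MB of filler
printf 'hello\n' > "$demo_dir/notes/a.txt"

# Disk usage of each top-level folder in KiB, largest last, written to a text file.
du -sk "$demo_dir"/*/ | sort -n > "$demo_dir/usage.txt"
cat "$demo_dir/usage.txt"

# The last line names the folder that occupies the most space.
largest=$(tail -n 1 "$demo_dir/usage.txt" | cut -f 2)
```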

For mobile (Android OS), ES File Explorer is equipped with such functionality, though that application has been subject to controversy and has developed into adware.

Deduplication

Several tools to automatically deduplicate files exist: rdfind, fdupes, jdupes, rmlint, dupeGuru, FSlint.

Storing one duplicate of files may be desirable in certain situations, such as compressed archives intended for long-term preservation, where even the slightest damage can render any data after the point of damage unreadable. For accessing the same file from different locations, hard links or symbolic links can be used.
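
Both kinds of link are created with ln. This is a minimal sketch in a throwaway directory; the folder and file names are invented:

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/projects" "$demo_dir/shortcuts"
printf 'shared manual\n' > "$demo_dir/projects/manual.txt"

# Symbolic link: a pointer to a path; it breaks if the target is moved or deleted.
ln -s "$demo_dir/projects/manual.txt" "$demo_dir/shortcuts/manual-symlink.txt"

# Hard link: a second name for the same data; only possible within one file system.
ln "$demo_dir/projects/manual.txt" "$demo_dir/shortcuts/manual-hardlink.txt"

# Equal inode numbers in the first column reveal hard links to the same data.
ls -li "$demo_dir/projects" "$demo_dir/shortcuts"
```

Neither link stores a second copy of the data, so links save space but offer no protection against damage to the file.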

You may want to deduplicate files across two different storage devices without accidentally deleting any files that do not exist on the storage device you want to keep the files on. A manual deduplication may be necessary if you wish to free up space from a device after copying files to a different device or after creating an archive file from those files without accidentally deleting any files that were not copied or archived from the source device, or after a file copying operation was interrupted and you need to clean up without deleting any files that were not copied.

The quickest way to accomplish this would be to compare the total count and size of those files in both the source and the target device by selecting the files and opening the "Properties" window. If the byte counts match, the source folder can be safely deleted. However, if the selection contains subdirectories too, differences in file systems mean that the total size might not match due to differences in how directories are stored. Some file managers also don't show the exact byte count.

If you want to deduplicate files between two different computers or between a computer and a smartphone, an additional problem is that file managers count files and folders differently. Some file managers only show the number of "items", meaning both files and folders, whereas others may only show the number of "files" without counting the folders. If possible, use the same file manager on both devices, and one that shows the exact byte count.

A more sophisticated way to manually deduplicate files is creating a temporary script that contains a list of files to be deleted. In Linux, this can be accomplished by first changing the working directory to the target folder (where the files are supposed to stay) using cd and then creating a temporary script by running:

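# list files to be deduplicated (the script itself is excluded from the list)
# note: file names containing a single quote (') need manual correction in the script
find . -type f ! -name temporary.sh |sed -r "s/(.*)/rm -v '\1'/" >>temporary.sh
# list directories to be removed, deepest first so each is empty when rmdir reaches it
find . -mindepth 1 -depth -type d |sed -r "s/(.*)/rmdir -v '\1'/" >>temporary.sh
# script deletes itself after work done
echo "rm temporary.sh" >>temporary.sh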

No files have been deleted up to this point. Only the disposable script to do the work has been created. This script contains a list of files to be deleted from the source device, but the list is generated from the target device so that it does not contain the name of any file that has not been copied. A temporary script is recommended over commands involving an |xargs pipe, because running |xargs in combination with deletion is dangerous: you do not see a list of files before the deduplication process starts.

The -v flags are optional and serve to later show the names of the files deleted in the deduplication process in the terminal.

Now move the temporary.sh file to the original directory on the source device from which you want to delete the duplicate files. Then change your terminal's working directory to that directory using cd [path]. Before running the script, open it in a text editor and glance over it to verify that it doesn't contain any files you don't intend to delete. Then run it using sh temporary.sh.

Do not run the script before changing your working directory to the directory the script resides in, because the script contains relative paths rather than full (absolute) paths, so the path of your terminal's working directory will be presumed to be in front of the relative paths in the script.

Temporary folder

Programs may use temporary folders such as /tmp/ and ~/.cache/ on Linux and %temp% on Windows to store data such as preview thumbnails and data from the web to reduce loading times.

Should the way your operating system or file manager handles its temporary or trash folder not suit your needs (e.g. if retention span is unchangeable or limited), you may wish to manually operate such in your user home folder (e.g. ~/tmp/, ~/trash/). Some software may allow changing the path for the temporary folder. Furthermore, such folders can be used for frequent short-term backups of projects, which can be deleted when free space becomes necessary.
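
A self-managed trash folder can be as simple as a small shell function. This is a minimal sketch, assuming a POSIX shell and an mv that supports -n; the helper name to_trash and the TRASH_DIR variable are hypothetical:

```shell
# Hypothetical helper: move items into a self-managed trash folder instead of
# deleting them. TRASH_DIR is an assumed variable name; it defaults to ~/trash.
# Items are grouped by the date on which they were trashed.
to_trash() {
    trash=${TRASH_DIR:-$HOME/trash}/$(date +%Y-%m-%d)
    mkdir -p -- "$trash" || return 1
    mv -n -- "$@" "$trash"/          # -n prevents overwriting an earlier item of the same name
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
TRASH_DIR=$demo_dir/trash
printf 'old draft\n' > "$demo_dir/draft.txt"
to_trash "$demo_dir/draft.txt"
ls -R "$TRASH_DIR"
```

Since the dated sub-folders sort chronologically, the oldest ones are easy to spot and delete when free space becomes necessary.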

Unlike in a traditional "Trash" folder implemented by the operating system or file manager, files in such a self-managed folder can be opened directly as usual. A file manager may disallow directly opening files sitting in a "Trash" folder and show file properties instead, as Windows Explorer does, whereas, for example, the Nemo file manager for Linux allows directly opening files located in the trash directory.

Trash bin (recycle bin)

When deleting files, you might want to consider using the trash bin (or "recycle bin" and other names) feature, which stores files in a temporary location until they are automatically deleted. This allows reversing unwanted deletions, for example if you selected a file or folder you didn't mean to delete. It lifts the emotional burden and the need for extreme care during deletion, since you know you can undo the deletion for the time being. Without it, deletions are "walks on eggshells", metaphorically speaking.

A recycle bin also makes file moving (with copying and deletion from source) safer, because during the deletion step, you have to be less careful to select the exact same files and folders that you moved. If your trash bin does automatic deletion after a certain time, you don't have to bother with cleaning it up later anymore, while you can still bring back the file if necessary in the near future. The files still take up space on your device while residing in the trash bin, but it doesn't matter if you have enough space free.

Mobile

Memory cards

Smartphone with inserted memory card (located below camera lens)

Some smartphones and tablet computers allow the expansion of storage capacity using memory cards, typically MicroSD, which significantly facilitates file management and is user-friendly.

Memory cards can be moved between devices and used immediately without any file transfer, and data stored on a memory card is not at risk from a technical defect of the mobile device, as the card can be ejected and read externally. Mass storage access from an external computer may also allow recovering some files immediately after a deletion accident caused by faulty software[5] and/or human error.

For huge file transfers, ejecting the memory card and transferring directly to the PC through mass storage may save time compared to MTP (Media Transfer Protocol) through the phone or tablet, as the latter does not handle high counts of files within a directory well. USB-OTG (On-The-Go) storage, connected to the mobile device through an adapter, may be used as well, though it might not preserve the date and time attribute. Tablet computers with a desktop operating system are widely equipped with at least one full-size USB-A port.

Additionally, using a memory card takes stress off the device's non-replaceable internal memory, preserving its limited rewrite cycles, which is especially beneficial for repeated heavy tasks such as high-resolution filming and mobile FTP server hosting.

Between computer and mobile


Media Transfer Protocol


File management on mobile phones and tablet computers with mobile operating systems is more restricted than on desktop/laptop computers and mass storage devices such as USB sticks and memory cards. The Media Transfer Protocol (MTP), which is used to access files on a mobile device from a computer, lists files slowly, which is problematic when loading directories with high counts of files.

As such, it is recommended to manage such directories on the device itself. If files are to be transferred to or from a desktop/laptop computer, handle them in an empty or sparsely populated directory.

MTP file listing can be sped up by not loading preview thumbnails. Depending on the file manager used, this may be done by deactivating preview thumbnails in the settings or by choosing the "detail" view mode, where files appear in a list instead of a grid, which forgoes the loading of preview thumbnails. Files can also be dropped into a directory without opening it, by pasting them through the right-click context menu of the target directory.

A benefit of MTP is that it is not prone to file system corruption from unexpected removal, meaning disconnection without being "safely unmounted" through the client operating system (i.e. on the desktop/laptop computer). It operates through an abstraction layer, and the file system remains under the control of a driver on the battery-powered host device (i.e. smartphone/tablet).

Only a selection of files, not directories, should be moved away from the device, because users have reported files on MTP not being listed properly.[6][7][8] If a directory is moved away from the device, the computer might delete it from the mobile device without all content having been transferred. Instead, the directory should be copied and its byte size compared on both the computer and the smartphone itself. A match indicates a successful transfer, meaning the directory can now be deleted from the mobile device. The only exception where moving folders out is safe is when the number of files within is easy to oversee, i.e. fewer than ten, so that all files are clearly listed in the computer's file manager.
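On the computer side, the file count and byte size of the copied directory can be determined with a short Python sketch like the one below. The function name directory_size is an assumption for illustration; the figures for the phone side would still have to be read from the file manager on the phone itself.

```python
import os


def directory_size(path):
    """Return (file count, total size in bytes) of all files below `path`."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
            count += 1
    return count, total
```

If both numbers match what the phone reports for the source directory, the copy is likely complete. Comparing sizes does not detect corrupted content of equal length, so this is a plausibility check rather than a proof.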

Windows Explorer additionally displays files while listing is in progress, which can be of use when moving files out, since the loading of the file list can be interrupted to allow moving out the displayed files chunk-wise, reducing the number of remaining ones each time.

Transferring files onto the device through MTP may discard their date and time attribute.

If a file is newly created on the smartphone while it is connected to the computer through MTP, the computer's file manager could misreport the file size as too small because it loaded the directory listing at a moment when the file was not yet complete. Moving the file away from the phone could then leave it truncated (incomplete) at the target path while deleting it from the source, since the file manager might wrongly assume that the file has been fully transferred.

File Transfer Protocol


An alternative to MTP is FTP (File Transfer Protocol) over the local network.

On the desktop computer, a dedicated and sophisticated FTP client such as FileZilla (open-source) may be used to handle high numbers of files, though FTP is widely supported by file managers and web browsers.

FileZilla does not support moving files out of an FTP server, meaning downloading and then deleting them automatically, whereas moving within a server is supported through the standard rename command. To move files out of an FTP server, the selection of files on the server needs to be deleted manually after the transfer, once it has been verified that all files were transferred successfully, meaning there are no new entries in the "Failed transfers" list. For peace of mind, try downloading the selection again while skipping existing local files. If no new files are downloaded, all files had already been transferred. This might apply to other software as well.

FTP server applications for mobile devices may handle file listing differently. Some do not report the year of a file, only the day and month. The FTP client then inserts the current year, except for files whose date lies later in the year than the current date, for which it inserts the previous year instead. Another distinction between FTP server apps is whether they list file and directory names starting with a dot, which are considered hidden on Unix and Linux-based operating systems, including Android OS, the most popular mobile operating system.
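The year-guessing rule described above can be illustrated with a minimal Python sketch. The function name infer_year is hypothetical, and real FTP clients may apply different or more refined rules; this only shows the logic as described here.

```python
from datetime import date


def infer_year(month, day, today):
    """Guess the year of a listing entry that only reports month and day.

    The current year is assumed unless the date would then lie later in
    the year than `today`, in which case the previous year is used.
    """
    if (month, day) > (today.month, today.day):
        return today.year - 1
    return today.year
```

With today set to 10 February 2022, an entry dated 5 January is assigned to 2022, whereas an entry dated 24 December is assigned to 2021. A file that is actually several years old would be misdated under this rule, which is why a server that omits the year is a drawback.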

FTP server applications typically allow the user to select a specific directory to share, rather than the entire storage. This feature has been proposed for MTP, but has not been implemented there so far.

Two open-source FTP server apps for Android OS are the integrated FTP server of "Amaze File Manager", and the more sophisticated "primitive FTPd", only the latter of which reports files and folders with names starting with a dot.

Alternatively, files may be uploaded in the opposite direction, from the mobile device to an FTP server hosted by a home computer on the local network, though as of 2021, no mobile file manager's FTP client supports preserving files' date and time stamps upon uploading.

Handling invalid file names


Since the most popular mobile operating system, Android OS, is Linux-based, it supports characters in file names that are unsupported by the most popular desktop operating system, Windows, and by some file systems. These characters include a colon (:), a back slash (\), a vertical pipe (|), a question mark (?), and an asterisk (*). Additionally, file names are case-sensitive, meaning files named "file" and "File" and "FILE" can co-exist within a directory on Linux and Android OS, but not on Windows.

While some mobile phone apps disallow the creation of such files and automatically replace characters that Windows considers invalid with a substitute character such as an underscore (_), other apps might have created files with names containing the aforementioned characters.
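The substitution that such apps perform can be sketched in Python as follows. The function name sanitize is an assumption for illustration. The character set used is the set of printable characters that Windows reserves in file names, which is slightly larger than the list given above; other Windows restrictions such as control characters, reserved device names, and trailing dots or spaces are not handled here.

```python
# Printable characters that Windows does not allow in file names.
WINDOWS_INVALID = '<>:"/\\|?*'


def sanitize(name, substitute="_"):
    """Replace every character Windows considers invalid with `substitute`."""
    return "".join(substitute if ch in WINDOWS_INVALID else ch for ch in name)
```

For example, sanitize("Screenshot 10:22:41?.png") returns "Screenshot 10_22_41_.png".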

When copying or moving such files from a mobile device through MTP (Media Transfer Protocol) using Windows, all characters before and including the last invalid character are discarded from the file name at the target location to make the file openable. Directories on a mobile device which contain such files therefore need to be moved out in two passes to retain the file names. First, an archive file such as a ZIP file should be created on the mobile device and moved out, which preserves the original names inside the archive. Then the files themselves can be moved out.

To consume less space, the archive file can optionally be created after isolating the files with invalid characters in a separate directory. That can be a difficult task on a mobile device, however, due to limited file management software and users' unfamiliarity with the Linux terminal, which can be accessed on Android OS through third-party applications such as Termux or Jack Palevich's "Android Terminal" app.

Another option would be to only create the archive file and not transfer the bare files to the archival media. However, this requires files to be extracted before being opened, which adds a delay for larger files and does not allow for preview thumbnails without extracting the entire archive file.

On-device management


File access on the most popular mobile operating system, Android OS, has been restricted significantly over time, and to varying degrees per storage type (internal, memory card, and USB-OTG).

Such restrictions affect third-party applications installed by the user, including file managers. Pre-installed file managers are usually unaffected, though these tend to be functionally restricted, such as lacking range selection, where only two entries need to be tapped for all entries in between to be marked.

Options to deactivate these restrictions at user discretion were not officially provided, leaving so-called rooting as the only possibility of bypassing them. This is a process in which the user unlocks administrative access over the operating system.

The operating system vendor claims that the aforementioned file access restrictions serve user security. However, the vendor also sells cloud storage, which suggests a commercial interest that conflicts with end users' desire for freedom. The restrictions may also encourage users to unlock root access, which vendors recommend against and where inexperienced tampering can lead to malfunction.

Other ideas


Archival queue


New files from portable devices which are currently unneeded can be moved into a buffer directory of files ready for archival, which means moving them to a large and stationary hard drive at the next connection to the computer.

External flash storage such as USB sticks and solid state drives can also be used to store data temporarily, for example for the duration of a trip or vacation, after which the data can be moved to an archive hard drive upon arriving at home.

Temporary redundant retention after archival


Files that have already been moved to a larger stationary archive drive may be redundantly kept on the smaller portable data storage such as a mobile phone or USB stick, but in a directory in which any file is eligible for deletion, for example when space runs out.

This would serve as a short-term backup, from which files could be retrieved in case anything goes wrong with the archive drive before it has been backed up itself.

This increases file fragmentation on the portable device, though that does not noticeably affect performance on flash storage.

Partition for small files


If your computer setup has no secondary drive and/or partition, you may create a small partition (e.g. 4 GB) with a low cluster size for more efficient storage of small files.
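The effect of the cluster size can be estimated with a little arithmetic, sketched here in Python. The function name allocated_size and the example numbers are assumptions for illustration. The calculation ignores file system metadata and the fact that some file systems store very small files inline, so it is only a rough upper bound.

```python
def allocated_size(file_size, cluster_size):
    """Bytes occupied on disk when space is handed out in whole clusters."""
    clusters = -(-file_size // cluster_size)  # integer division rounded up
    return clusters * cluster_size


# Hypothetical example: 10,000 small files of 300 bytes each.
files = 10_000
large_clusters = files * allocated_size(300, 4096)  # 40,960,000 bytes
small_clusters = files * allocated_size(300, 512)   # 5,120,000 bytes
```

In this example, the same three megabytes of actual data occupy roughly 41 MB with 4096-byte clusters but only about 5 MB with 512-byte clusters.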

Additionally, if storage space happens to be exhausted on the main partition, with software arbitrarily attempting to write to it, files can still be added on the secondary partition without interference.

Sensitive environment


Inside a sensitive environment, data may be exchanged through rewritable optical media such as DVD±RW and BD-RE. Because the storage controller is located in the drive rather than on the medium, the media itself cannot contain malicious hardware such as so-called rubber duckies, which simulate keystrokes from a USB keyboard.

Additionally, finalized write-once media and/or read-only (ROM) optical drives can ensure write protection where necessary, for example in a malware-infested environment.

Copies of description files


Description files such as text files describing the contents of a folder may be stored inside the folder itself, with a copy kept in a central location along with other description files for easier discovery. The name of that file should contain the name of the folder it is describing.

It is recommended to store such a description file inside the folder it is describing rather than next to it in the parent directory. A file manager might list files and folders separately regardless of which sorting method (by name, by size, by last modified) is chosen, and the items in the parent directory might not be sorted alphabetically at all, so the description file would not reliably appear next to the folder it is describing.

Observations and tips


Spare directories


When space on a device or partition is exhausted, no new directories can be created, even though they could be helpful for organizing files. For example, on a mobile phone, moving files out of a highly populated directory (i.e. one with many files, such as a download folder) into a new directory avoids having to open the populated folder directly through MTP (Media Transfer Protocol), which notoriously handles long file listings poorly.

Prepare for such a situation by creating a reserve of spare empty directories inside one dedicated directory. The spare directories can be moved out of the reserve, and be renamed as necessary, even without space left, which allows organizing files on the go to be able to move them elsewhere immediately when arriving home.
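Where a scripting environment is available, such a reserve could be created in one go with a Python sketch like the following. The function name create_spare_directories, the naming scheme spare_01, spare_02 and so on, and the default count of 20 are assumptions for illustration.

```python
from pathlib import Path


def create_spare_directories(reserve, count=20):
    """Create `count` empty, numbered directories inside `reserve`."""
    reserve = Path(reserve)
    reserve.mkdir(parents=True, exist_ok=True)
    created = []
    for number in range(1, count + 1):
        spare = reserve / f"spare_{number:02d}"
        spare.mkdir(exist_ok=True)  # existing spares are left untouched
        created.append(spare)
    return created
```

On a phone without such an environment, the same reserve can be created by hand in the file manager while space is still available.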

File move behaviour


When the aim is to bring files to an archive, moving files rather than copying them and deleting them afterwards has the convenience of acting like a checklist: files that have been transferred disappear from the source, instead of leaving duplicates that would later have to be sorted out without accidentally deleting files that were not copied. Moving also immediately frees space on the source device.

Moving instead of copying and then deleting files also removes the psychological barrier that may come with the deletion step, which feels like a destructive action even though it leads to the same result as moving between storage devices. Another barrier is the uncertainty about whether a file that was not copied has inadvertently been selected for deletion.

When moving files, some file managers delete each file individually after it has been transferred, while others delete the selected files from the source only after the last file has finished transferring.

Windows Explorer uses the former method for mass storage devices, but the latter with Media Transfer Protocol. The Linux file manager Nemo always uses the former method.

With the latter method, any interruption would cancel the file transfer without having freed up any space on the source device.
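The difference between the two methods can be illustrated with a Python sketch. The function names are hypothetical, and the code only models the behaviour described above; it is not how any of the named file managers is actually implemented.

```python
import os
import shutil


def move_individually(files, target_dir):
    """Former method: delete each source file right after its own copy."""
    for source in files:
        shutil.copy2(source, target_dir)  # copy2 also copies time stamps
        os.remove(source)                 # space is freed file by file


def move_in_batch(files, target_dir):
    """Latter method: delete sources only after every copy has finished."""
    for source in files:
        shutil.copy2(source, target_dir)
    for source in files:                  # not reached if a copy fails
        os.remove(source)
```

If the transfer is interrupted halfway, move_individually has already freed the space of the files transferred so far, whereas move_in_batch leaves every source file in place.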

Escape auto-closure


Some file managers such as Windows Explorer close themselves when detecting the removal (unmounting and/or physical unplugging) of a storage device, while others jump to the starting directory. Some file managers might do nothing.

The first case may be perceived as annoying, as it forces users to re-open the file manager and navigate all the way back to the previous directory.[9]

This can be prevented by opening a different device in the file manager before unplugging. After plugging the device back in, the previously opened directory can be navigated back to immediately through the navigation history, using the on-screen button or Alt+← on the keyboard.

Using the command line


Command-line file operations can be logged for later reference by using the "verbose" switch --verbose, or its shorthand -v, on the cp and mv commands in Linux and redirecting the output into a text file by appending >>path/to/textfile.txt. On Windows, file operations are printed by default, meaning no "verbose" switch is necessary. On Linux, appending | tee -a path/to/textfile.txt instead prints the output in the terminal in real time (as usual) while simultaneously logging it into a text file.
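The same idea of showing each operation on screen while appending it to a log file can be sketched in a script, here in Python. The function name logged_move and the format of the log line are assumptions for illustration.

```python
import shutil
import time


def logged_move(source, target, log_path):
    """Move a file, print what was done, and append it to a log file."""
    result = shutil.move(source, target)  # returns the final destination
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = f"{stamp} moved '{source}' -> '{result}'"
    print(line)                                          # visible right away
    with open(log_path, "a", encoding="utf-8") as log:   # "a" appends
        log.write(line + "\n")
    return result
```

Because the log file is opened in append mode, repeated calls accumulate in the same file, similar to >> and tee -a.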

Autocompletion using the ↹Tab key is widely supported on both Microsoft Windows and Linux/Unix-based operating systems and facilitates navigation by selecting file and folder names.[10]

Shortcuts


Don't hesitate to use bookmarks to frequently accessed directories in your file manager and file picker dialogue (also known as "Open" and "Save as"). If they become unnecessary, they can be removed at any time.

For command-line use, familiarize yourself with environment variables and similar shortcuts, as they allow quicker navigation through your directories. They include ~ on Linux and %userprofile% on Windows (the user home directory, e.g. C:\Users\Username\), as well as %temp% (C:\Users\Username\AppData\Local\Temp\).

Environment variables are also typically recognized by file pickers.

Additional shortcuts for command-line use can be specified as variables in the script that runs when starting the terminal, which in Linux is typically located at ~/.bashrc. For example, $dl can be set to refer to the download folder by appending dl=~/Downloads to that file.
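Scripts can resolve the same shortcuts. The following Python sketch expands both ~ and environment variables in a path; the function name expand is an assumption for illustration, and a variable such as DL only resolves if it has been exported to the environment.

```python
import os


def expand(path):
    """Expand environment variables and a leading ~ in `path`."""
    # expandvars understands $NAME and ${NAME}; on Windows also %NAME%.
    return os.path.expanduser(os.path.expandvars(path))
```

For example, expand("~/Downloads") yields the download folder inside the user home directory. Note that a plain assignment such as dl=~/Downloads in ~/.bashrc creates a shell variable that is only visible to other programs if it is exported.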

Spaces in file names


Keep in mind that space characters in file and path names can become a nuisance when entering paths, creating variables, selecting text, and navigating and auto-completing in a command prompt/terminal using the ↹Tab key.
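When a script builds shell commands, paths containing spaces need to be quoted so that they are not split into several arguments. A minimal Python sketch using the standard shlex module follows; the function name shell_command is hypothetical, and the quoting applies to POSIX shells such as Bash, not to the Windows command prompt.

```python
import shlex


def shell_command(program, *paths):
    """Build a command line in which every path is safely quoted."""
    return " ".join([program] + [shlex.quote(path) for path in paths])
```

For example, shell_command("mv", "VID_20241213_102241 trampoline jumping.mp4", "archive/") wraps the first path in single quotes, so the shell treats it as one argument instead of three.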

Cleanup


"Cleanup" may refer to moving files scattered around the storage and desktop in folders to improve overview, or deleting files like duplicates to clear free space. In the former case, refer to /Directory structures and consider moving files you are unsure where to move into a dumping ground. In the latter case, as with lost backups, first weigh effort against price, meaning here first consider whether the time and effort spent searching files to clean up really outweighs the storage space price.

It is not worth trying to search and eliminate duplicate files for the sake of it if no significant space is cleared. The time and effort necessary to do it might outweigh any benefit from the saved space. However, duplicates may be deleted for organizational purposes, such as duplicate music tracks in the same folder.
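Where finding duplicates is worthwhile, it can be automated. The following Python sketch groups files with identical content by comparing sizes first and SHA-256 digests second; the function names are assumptions for illustration, and the sketch only reports duplicates without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Return lists of paths below `folder` that have identical content."""
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Grouping by size first avoids reading files that cannot have a duplicate, which keeps the effort low on large folders.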

Note that cleaning does not necessarily speed up the computer noticeably, except if the partition was nearly full, which should be avoided anyway, as described in § Avoid exhausting the operating system's partition. Rather, closing tasks that stress the CPU and take much RAM is effective in speeding up the system. Clearing caches is mostly unnecessary and even disadvantageous, as they provide faster recall of information.

For file types that are vulnerable to damage, such as PNG and TIFF images and compressed archives, duplicates are beneficial as long as the files have not been backed up yet.

Experiences by participants


This section expands on the idea that "the average computer and mobile phone user still struggles to keep track of files in the long term despite of all the tools at their disposal". Here, participants in or visitors to this learning resource can add why they need a resource such as this to learn from and/or interact with.

Moved to Talk:File management.

References

  1. nemo significantly slower opening folder content #1907 - GitHub (Nemo file manager reads the header of each file with unknown extension
  2. SanDisk outs the 'world's first' 1TB SD card (2016-09-20)
  3. Related: Answer on Quora to "Why is copying 1,000 1MB files so much slower than copying 1 1GB file, given that the same amount of data is being copied?" by Franklin Veaux, on January 14th, 2020
  4. "Tab completion errors: bash: cannot create temp file for here-document: No space left on device". Unix & Linux Stack Exchange. 2016-04-18. Retrieved 2022-02-10.
  5. Users report Android devices' entire internal user storage being deleted instantly, caused by poor software design – "I just deleted a random folder in my internal storage and it wiped my internal storage. What the heck just happened?" – Reddit.com/r/Android (2013-01-11)
  6. Files show up on Nexus 5 but not in Windows 7 – Android Stack Exchange – February 11th, 2016
  7. Not all files are visible over MTP – Android Stack Exchange – May 29th, 2013 (209,502 views as of September 21st, 2022)
  8. Nexus 4 not showing files via MTP – StackOverflow – December 6th, 2012 (67,504 views as of September 21st, 2022)
  9. How to stop Windows Explorer closing for removable Disks! – SevenForums
  10. Use tab to autocomplete commands in the command line (ComputerHope, 2020-12-31)

See also
