Compare Internal Storage Types: Hard Disk vs. SSD


By Richard Katz


In this article, I will talk about two technologies that are commonly used to store information after a computer is powered off. While a computer is running, the data being used is held in volatile memory that is constantly being refreshed. When a computer is shut down or powered off, any information in volatile memory is lost. To keep data “permanently”, it must be written to some sort of non-volatile device. By the way, permanently does not really mean forever. After I describe the storage types, I will talk about how long data really lasts.


Hard Disk Drives


A hard disk drive (HDD) is a mechanism that allows data to be remembered after the computer is turned off or restarted. The data is stored magnetically with very small particles on the disk’s surface being magnetized or demagnetized. Magnetized particles are used to represent ones and demagnetized particles represent zeroes. The order of the ones and zeroes can be converted into characters. The magnetized particles remain magnetized after the power is removed. The disk is deemed non-volatile or permanent.


Inside a traditional spinning hard drive, there are one or more platters that turn on a spindle. Data can be read and written on both sides of each platter. The read-write heads sit very close to the surface of the spinning platter and wait for the desired data location to come around. All the data that can be read or written from a platter surface with the head at a fixed location is collectively referred to as a track. Data is stored on these tracks in logical blocks called sectors. Some number of sectors fit on a track, and there is a small space between adjacent sectors. Tracks are laid out as concentric circles on the surface of the platter. The heads can move in (toward the spindle) and out (toward the edge) so that one head can access all the tracks on a surface. The mechanism that moves the heads is the head actuator (a stepper motor in early drives; modern drives use a voice coil). The time it takes for the head to move from track to track is called the seek time. After the head moves to be over a track (or under the track, for the bottom surface), it is necessary to wait until the drive rotates to the point where the desired data is adjacent to the head. Sometimes the data happens to be there as soon as the head arrives; sometimes it takes a full revolution. On average, a drive needs to spin half a revolution to get to the correct spot. On a 7200 RPM drive that amounts to 1/14400 of a minute, or about 4.2 milliseconds, which is referred to as rotational latency. Obviously, a faster-spinning drive means less waiting time.
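The latency arithmetic above can be sketched in a few lines of Python:

```python
# Average rotational latency: on average, the platter must spin half a
# revolution before the requested sector reaches the head.
def avg_latency_ms(rpm: int) -> float:
    ms_per_rev = 60_000 / rpm  # one full revolution, in milliseconds
    return ms_per_rev / 2      # half a revolution on average

for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {avg_latency_ms(rpm):.2f} ms")
```

Running this shows why drive makers chase rotation speed: a 15,000 RPM drive has less than half the average latency of a 7200 RPM drive.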


Every time a sector is read, the data is moved somewhere and something is done with it. Years ago, this transfer was slower and the processors dealing with the data were slower, so the machine would still be doing things with the first sector when the second one reached the head. By the time the process was ready to read the next sector, it would be long gone, and the process would have to wait until the platter spun around to the correct location again. Smart people figured out how to arrange the data on the track so that the sector you wanted would likely be at the head at the time you wanted it. As an example, take a hypothetical drive with 10 sectors fitting on a track. Perhaps it is determined that two more sectors would pass the head after the first sector was read, before the process was ready for the second one. Our 10 physical sector positions on the platter are 1 through 10; after position 10, we are back to position 1. We want to read logical sector 1 followed by 2, and so on. Therefore, we would store the data as follows: 1, 8, 5, 2, 9, 6, 3, 10, 7, 4. We would read sector 1, then process it while sectors 8 and 5 passed by, and then be ready for sector 2, which would be correctly positioned. While we were processing sector 2, sectors 9 and 6 would pass, leaving us positioned to read sector 3. If our processing took a little longer than expected, we would need to wait for a full revolution. The arrangement I described is called a 1:3 interleave: you add 3 to the physical location of one logical sector to get the location of the next. We could read all 10 sectors in 3 revolutions.
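The interleave layout is easy to generate programmatically. This short Python sketch reproduces the 10-sector example above; note that the step size must share no common factor with the sector count, or two logical sectors would land in the same slot:

```python
# Build the physical layout for an interleave where consecutive logical
# sectors are placed `step` physical slots apart.
def interleave_layout(n_sectors: int, step: int) -> list[int]:
    layout = [0] * n_sectors
    pos = 0
    for logical in range(1, n_sectors + 1):
        layout[pos] = logical          # place this logical sector
        pos = (pos + step) % n_sectors  # skip ahead for the next one
    return layout

print(interleave_layout(10, 3))  # [1, 8, 5, 2, 9, 6, 3, 10, 7, 4]
```

Each logical sector costs three sector-times on the platter (one read, two skipped), so 10 sectors take 30 sector-times, or exactly 3 revolutions, matching the text.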


A cylinder is defined as the set of tracks that can be read without the heads moving. Therefore, on a one-platter drive, there are two tracks per cylinder (top surface and bottom surface). If two platters are stacked together, there are four tracks per cylinder. All the heads move as a unit, so when I move the heads on a two-platter drive, there are four tracks that can be read before seeking again. Whether I take a 500GB single-platter drive or a 1000GB two-platter drive, there are 250GB per surface. Modern drives and systems can process the data as fast as the drive spins, without needing an interleave. In reality, there are a lot more than 10 sectors per track. If I write on both surfaces of a single-platter drive, I can read two full tracks of data in two revolutions without moving the heads. As density increases, the amount of data that can be processed in one revolution goes up, and it becomes more likely that the file I want will fit on a single track or, failing that, on the tracks that make up a cylinder. If that happens, there is only one seek per file. If I use a two-platter drive, I can read four tracks of data without doing a seek. Fewer seeks translate to faster operation. More sectors per track translate to more data per revolution. You can see that with denser data, the rotation speed is less important.
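The geometry arithmetic above can be written out as a toy Python function:

```python
# Tracks per cylinder and capacity per surface, from the platter count.
def drive_geometry(platters: int, capacity_gb: int):
    surfaces = platters * 2         # data on both sides of each platter
    tracks_per_cylinder = surfaces  # one head, and thus one track, per surface
    gb_per_surface = capacity_gb / surfaces
    return tracks_per_cylinder, gb_per_surface

print(drive_geometry(1, 500))   # (2, 250.0)  single-platter 500GB drive
print(drive_geometry(2, 1000))  # (4, 250.0)  two-platter 1000GB drive
```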


Sometimes reality complicates a nice, logical picture. When drives are manufactured, the surfaces are tested for defects. Normally, there are a small number of tracks, or parts of tracks, with permanent or intermittent errors. Drives are always manufactured with spare sectors. A table stored in a rewritable chip on the drive controller card maps bad tracks and sectors to the alternate locations to use. This map is built at the factory. When the drive is in use, all the remapping goes on behind the scenes and is invisible to the user, but if you are reading a file that spans two tracks on the same cylinder and one of the bad sectors or tracks is involved, there is an unexpected wait while the heads and platters are repositioned. If too many bad locations are found during manufacturing, the drive will not be used. Defects can be added to the table by tools like CHKDSK, but if many defects are remapped, it is easy to see why throughput would decline. When a drive is lightly loaded (lots of free space), the system can place files so that they can be read most efficiently. As the drive fills up, it is less likely that files can be placed so they can be read in as few revolutions as possible. Files can become fragmented, with two or more parts placed in non-contiguous locations. Modern operating systems defragment disks in their “spare time”, but they can do a better job if a drive has more free space. I start thinking about replacing a drive whenever it gets more than about 2/3 full.
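Conceptually, the defect map behaves like a lookup table consulted on every access. The sketch below is purely illustrative: the names and sector numbers are invented, and real drive firmware is far more involved.

```python
# Hypothetical defect map: bad physical sector -> spare sector.
# All values here are made up for illustration.
DEFECT_MAP = {1042: 900001, 77310: 900002}

def resolve_sector(requested: int) -> int:
    """Return the sector the drive will actually use for a request."""
    return DEFECT_MAP.get(requested, requested)

print(resolve_sector(500))    # 500     (healthy sector, used as-is)
print(resolve_sector(1042))   # 900001  (remapped to a spare; extra seek likely)
```

The remap is invisible to the host, but the spare sector may sit on a distant track, which is exactly the unexpected seek-and-rotate delay described above.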


Everything I have said about single vs. dual-platter drives and fragmentation and the like is completely different when you consider the boot-up process. The system is reading thousands of mostly small files that are unlikely to be fragmented and are not likely to be adjacent to whatever you will be reading next. If you are booting from a spinning drive, the fastest-spinning drive will be better than one that spins more slowly. The fastest burst transfer rate will not help you if you need to read a 200-byte .ini file and then wait while the heads move and the drive spins to the correct spot. I prefer to put my boot partition on a relatively small drive with the fastest possible access time, and I keep a large data drive elsewhere. You should note that selecting a boot drive size based solely on the size of the operating system is not practical. No matter how hard you try to avoid it, some applications locate parts of themselves on your boot drive.


Solid State Drives


A solid-state drive (SSD) contains memory chips that are different from the main memory chips in a computer. In general, computers keep information that is currently being used in volatile memory, which means that if the computer is restarted or turned off, the information in the memory is lost. Memory chips in an SSD are non-volatile, so the data remains even after power is removed. There are various types of NAND flash memory chips. NAND refers to the type of floating-gate transistors (NOT-AND) that store the information. I am not going to try to explain how the information is actually stored; if you want to do circuit design, any simplistic explanation I can provide will not be a good place to start. If you are not planning to do circuit design, there is no reason to care why data stored in a computer’s main memory turns bits on for ones and off for zeros, while NAND flash memory chips use an uncharged state (off) to represent ones and a charged state to represent zeros. What is important for the user is that computers use DRAM (dynamic random-access memory) chips for main memory. That memory needs to be electrically refreshed many times per second or the information is lost. NAND flash memory is non-volatile, so no refresh is needed. DRAM is much faster than flash memory, but flash memory is much faster than any spinning disk since it doesn’t rely on anything mechanical. Beyond that, I think of SSDs as relying on magic.

When data is read from or written to a drive, it travels to the computer over some sort of connection. Modern hard drives are connected using a SATA-type interface. SATA (Serial AT Attachment) has been around since about 2003 and replaced PATA (substitute Parallel for Serial). The “AT” refers to the IBM PC-AT from the mid-1980s. The original SATA had a maximum speed of 1.5 Gb/second (Gb is billions of bits). It was replaced by SATA II, which doubled the speed, followed by SATA III, which is twice as fast as SATA II. Data is retrieved in bytes, and the resulting throughput (often called bandwidth) tops out at about 600 MB/sec; MB is millions of bytes.
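The 600 MB/sec figure follows from SATA’s 8b/10b line encoding, in which every data byte costs 10 bits on the wire. A quick Python check:

```python
# SATA line rate to usable throughput: with 8b/10b encoding, each data
# byte is transmitted as 10 line bits.
def sata_throughput_mb_s(line_rate_gb_s: float) -> float:
    bits_per_second = line_rate_gb_s * 1e9
    return bits_per_second / 10 / 1e6  # 10 line bits per byte, MB = 10^6 bytes

for gen, rate in (("SATA I", 1.5), ("SATA II", 3.0), ("SATA III", 6.0)):
    print(f"{gen}: {sata_throughput_mb_s(rate):.0f} MB/s")
```

This reproduces the familiar 150 / 300 / 600 MB/s numbers for the three SATA generations.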


Conclusions


Before I start, I want to state that I will include drive and software brands and models. I am not recommending a particular product, but simply including the information to provide a context.


An SSD’s biggest speed advantage is that nothing moves. There is no latency while the platters spin into position or seek time while the heads move. This advantage is most apparent when one reads a lot of tiny files at one time. For every file, it is necessary to get the heads into position. On larger files, it may be possible to read an entire cylinder’s worth of data after positioning the heads once. The obvious next question would be “how much data is that?”, but the answer is less obvious. I have a Seagate ST4000DM000 4TB drive in my desktop machine. Seagate uses 1TB platters, which is 500GB per surface. Using simple arithmetic, it seems obvious that there would be four platters, or eight surfaces, per drive. That would imply eight read-write heads and eight tracks per cylinder. Physical sectors are 4KB, so arithmetic would suggest roughly 122 million sectors per surface, and dividing by the number of tracks on a surface would give the sectors per track.
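The “obvious” arithmetic can be written out explicitly (using decimal terabytes and 4,096-byte sectors):

```python
# Naive geometry for a 4 TB drive built from 1 TB platters with 4 KB sectors.
capacity = 4 * 10**12           # 4 TB, decimal bytes
platter = 10**12                # 1 TB per platter
platters = capacity // platter  # 4 platters
surfaces = platters * 2         # 8 surfaces, hence 8 heads
sectors_per_surface = (capacity // surfaces) // 4096
print(platters, surfaces, sectors_per_surface)  # 4 8 122070312
```

As the next paragraph explains, these tidy numbers bear little resemblance to what the drive actually reports.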


The physical layout of the drive is entirely different from those numbers. There is a Logical Block Addressing (LBA) translation that occurs between the physical data on the drive and what is read by a program. If you have insomnia, I suggest studying a detailed description of the LBA process at bedtime. In essence, there are limits on the sizes of the data fields in the ATA specification that restrict what the operating system can send to the drive to specify what to read or write. Modern drives report a fictitious legacy geometry: 512 bytes per logical sector, 63 sectors per track, and 255 tracks (one per head) per cylinder. Arithmetic seems to imply that there are 486,401 cylinders.
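Here is where a number like 486,401 comes from. The total sector count below is the industry-standard LBA count for a 4 TB drive; it is an assumption on my part, since the article does not state it.

```python
# Legacy CHS geometry implied by a 4 TB drive's LBA count.
total_sectors = 7_814_037_168  # standard LBA count for a 4 TB drive (assumed)
sectors_per_track = 63         # legacy ATA maximum
tracks_per_cylinder = 255      # legacy ATA maximum (heads)
cylinders = total_sectors // (sectors_per_track * tracks_per_cylinder)
print(cylinders)  # 486401
```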


Actually, the drive accepts and returns data in 512-byte logical sectors while storing it internally in 4KB physical sectors, and the reported geometry numbers have nothing to do with physical reality. The operating system talks to the drive as if it had 63 sectors per track and 255 tracks per cylinder, and the drive keeps the data someplace, but will give it back when requested. I accept what is going on by attributing the process to magic. Once magic becomes the explanation, all things become possible. All I care about is the fact that when I write something to a drive and then read it back, I get the same data.


In real life, I don’t really care about performance-testing software results. I care only about my perceived speed. On my desktop machine, I use a Samsung 950 NVMe x4 drive as my boot disk. I copied the boot disk to a Western Digital 7200 RPM spinning disk and tried booting from both images. Booting Windows 10 Professional from the SSD took about 27 seconds. Booting from the spinning drive took 7 minutes and 27 seconds. That is a difference I notice. I tried loading the types of software applications I use, and I noticed no difference between load times from the SSD vs. the spinning drive, but there were differences depending on what else was running in the system. As I type this, I have Microsoft Word running, a browser window open with 4 tabs, and Microsoft Outlook running in the background. The Task Manager shows 102 background processes running. I am sure they are all doing something.


I tried to do a test that gave times for copying large files from a spinning drive to another spinning drive, from an SSD to another SSD, from a spinning drive to an SSD, and so on. I could never get consistent results. For normal-sized files, the biggest variation seemed to be what else was running in the system. I also tried a Windows ISO image, about 7GB, and the SSD-to-SSD transfer was certainly faster, but if I ran the test multiple times, there were huge variations. I don’t often copy 7GB files anyway, and a variation of a few seconds seems irrelevant.


For me, reliability is the most important factor in choosing storage devices. I back my data up every night, but losing a day’s work can be a problem. Also, if a single file is corrupted or lost, I may not notice for some time, and then finding a backup copy can be challenging. My current Samsung boot drive has a published Mean Time Between Failures (MTBF) of 1.5 million hours, or about 171 years. My Seagate ST4000DM000’s MTBF is “only” half that long. An 80GB drive was considered big just 10 years ago.
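Converting a published MTBF into years is simple arithmetic, with the caveat that MTBF is a statistical average over a large population of drives, not a lifespan promise for any single unit:

```python
# Published MTBF (hours) converted to years of continuous operation.
HOURS_PER_YEAR = 24 * 365.25

def mtbf_years(mtbf_hours: float) -> float:
    return mtbf_hours / HOURS_PER_YEAR

print(f"{mtbf_years(1_500_000):.0f} years")  # 171 years (1.5M-hour MTBF)
print(f"{mtbf_years(750_000):.0f} years")    # 86 years  (half that)
```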


In reality, I have seen spinning drives fail and I have seen SSDs fail. Maybe, my desk is a different environment from wherever the statistics are calculated. In my experience, most spinning drives fail slowly, with a few bad sectors or read errors. I can usually recover or copy almost all the data elsewhere before replacing the drive. In the SSD failures I have seen, the drive is suddenly, totally dead. The data is gone forever.


If I need to recover the operating system, I can install a fresh copy. It can be time-consuming to reinstall applications, but it can be done. I keep my operating system on a fast SSD, which makes starting and stopping the operating system much faster.


I keep copies of the application installation files along with all my data on a spinning drive. Spinning drives are cheap and fairly reliable. I back everything up every day.

