Home Here you can view your subscribed threads, work with private messages and edit your profile and preferences Registration is free! Frequently Asked Questions Search

HardwareDude.com Forum Index -> Hardware, Software and Organizations Reviews

INFO ABOUT FILE COMPRESSION


Post new topic   Reply to topic

  Author    Thread
sri
Moderator
Moderator


Joined: 28 Jan 2006
Posts: 379
Location: Hyderabad , India


 Reply with quote  
INFO ABOUT FILE COMPRESSION

FILE COMPRESSION NOTES :
If you download many programs and files off the Internet, you've probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users, because it lets you reduce the overall number of bits and bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on a disk. Once you download the file, your computer uses a program such as winzip or stuffit to expand the file back to its original size. If everything works correctly, the expanded file is identical to the original file before it was compressed.
At first glance, this seems very mysterious. How can you reduce the number of bits and bytes and then add those exact bits and bytes back later? As it turns out, the basic idea behind the process is fairly straightforward.
Finding Redundancy
Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original program.
As an example : "Ask not what your country can do for you -- ask what you can do for your country."
Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote. To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation.
Lets see how file-compression systems accomplish this.
Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. "LZ" refers to Lempel and Ziv, the algorithm's creators, and "dictionary" refers to the method of cataloging pieces of data.
goin back to example :
So, if this is our dictionary >>>
ask what your country can do for you
Our sentence now reads >>>
"1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4"
If you knew the system, you could easily reconstruct the original phrase using only this dictionary and number pattern. This is what the expansion program on your computer does when it expands a downloaded file. You might also have encountered compressed files that open themselves up. To create this sort of file, the programmer includes a simple expansion program with the compressed file. It automatically reconstructs the original file once it's downloaded.
But how much space have we actually saved with this system? "1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4" is certainly shorter than "Ask not what your country can do for you; ask what you can do for your country;" but keep in mind that we need to save the dictionary itself along with the file
In an actual compression scheme, figuring out the various file requirements would be fairly complicated; but for our purposes, let's go back to the idea that every character and every space takes up one unit of memory. We already saw that the full phrase takes up 79 units. Our compressed sentence (including spaces) takes up 37 units, and the dictionary (words and numbers) also takes up 37 units. This gives us a file size of 74, so we haven't reduced the file size by very much
But this is only one sentence! You can imagine that if the compression program worked through the rest of the text .
In our example, we picked out all the repeated words and put those in a dictionary. To us, this is the most obvious way to write a dictionary. But a compression program sees it quite differently: It doesn't have any concept of separate words -- it only looks for patterns. And in order to reduce the file size as much as possible, it carefully selects which patterns to include in the dictionary.
If we approach the phrase from this perspective, we end up with a completely different dictionary.
If the compression program scanned " our example " , , the first redundancy it would come across would be only a couple of letters long. In "ask not what your," there is a repeated pattern of the letter "t" followed by a space -- in "not" and "what." If the compression program wrote this to the dictionary, it could write a "1" every time a "t" were followed by a space. But in this short phrase, this pattern doesn't occur enough to make it a worthwhile entry, so the program would eventually overwrite it.
The next thing the program might notice is "ou," which appears in both "your" and "country." If this were a longer document, writing this pattern to the dictionary could save a lot of space -- "ou" is a fairly common combination in the English language. But as the compression program worked through this sentence, it would quickly discover a better choice for a dictionary entry: Not only is "ou" repeated, but the entire words "your" and "country" are both repeated, and they are actually repeated together, as the phrase "your country." In this case, the program would overwrite the dictionary entry for "ou" with the entry for "your country."
The phrase "can do for" is also repeated, one time followed by "your" and one time followed by "you," giving us a repeated pattern of "can do for you." This lets us write 15 characters (including spaces) with one number value, while "your country" only lets us write 13 characters (with spaces) with one number value, so the program would overwrite the "your country" entry as just "r country," and then write a separate entry for "can do for you." The program proceeds in this way, picking up all repeated bits of information and then calculating which patterns it should write to the dictionary. This ability to rewrite the dictionary is the "adaptive" part of LZ adaptive dictionary-based algorithm
So how good is this system? The file-reduction ratio depends on a number of factors, including file type, file size and compression scheme.
In most languages of the world, certain letters and words often appear together in the same pattern. Because of this high rate of redundancy, text files compress very well. A reduction of 50 percent or more is typical for a good-sized text file. Most programming languages are also very redundant because they use a relatively small collection of commands, which frequently go together in a set pattern. Files that include a lot of unique information, such as graphics or MP3 files , cannot be compressed much with this system because they don't repeat many patterns
This efficiency also depends on the specific algorthm used by the compression program. Some programs are particularly suited to picking up patterns in certain types of files, and so may compress them more succinctly. Others have dictionaries within dictionaries, which might compress efficiently for larger files but not for smaller ones. While all compression programs of this sort work with the same basic idea, there is actually a good deal of variation in the manner of execution. Programmers are always trying to build a better system
The type of compression we've been discussing here is called lossless compression, because it lets you recreate the original file exactly. All lossless compression is based on the idea of breaking a file into a "smaller" form for transmission or storage and then putting it back together on the other end so it can be used again


- SRIKANTH DHANWADA

Post Tue Jan 31, 2006 1:30 am 
 
willowpu
HT Member


Joined: 27 Nov 2007
Posts: 15


 Reply with quote  

Hello, This is such a mind-blowing information on you have provided here. Its too beneficial for me. I hope this will loved by all who land up here. Keep sharing………..thanks!!
_______________
seo companies

Post Wed Apr 20, 2011 4:52 am 
 
  Display posts from previous:      



Post new topic   Reply to topic
Page 1 of 1



Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 



-Your Link Here-
Networking hardware and software solutions | web hosting directory
Powered by phpBB: © 2008 HardwareDude.com |Privacy | Terms of Service