Length of Human DNA in bytes


Recommended Posts

It's a very valid question and I have a very good reason for asking it. I will post more about my reasoning in another tread later on.

I had a look on Yahoo Answers and I read this: http://answers.yahoo.com/question/index?qi...25135324AApHMlU

I wonder if the Human Genome Project have the full file for us to download? If they RAR it up, the file should compress nicely for about 50MB.

Why would you need 12 bytes per base pair? There can be only four types of pairing (AT, TA, GC, CG). So, one pair can be encoded in two bits. [say, 00=AT, 01=TA, 10=GC, 11=CG]. Since a male human has 3080 Million base pairs in his DNA, it translates to 6160000000 bits or a little more than 734 MB. And for the human female with 3022 base pairs in her DNA, its 6044000000 bits or a little more than 720 MB.

That's of course assuming writing it in binary. Writing it in ASCII will be 8 times larger, and using Unicode will be two or four times larger than the ASCII text file. I don't know how well it will get compressed.

Edit: Wikipedia gets a big larger file size.

Edit 2: While I was calculating, virtorio beat me with the WP link.

Also, does that sequence include the RNA sequence too?

Central dogma of genetics: DNA -> RNA -> Protein. The short story is that RNA is transcribed from DNA (the genome).

So what are you asking for exactly? The entire genome as in all DNA contained in a human cell's chromosomes? Total coding regions? Total protein-encoding genes? Total DNA and products such as mRNA, siRNA, miRNA, tRNA, rRNA?

Say, 00=AT, 01=TA, 10=GC, 11=CG

If you're willing to simplify and assume Watson-Crick base-pairing, then you may as well take only the total genome length (as opposed to total genome length x 2 to account for dsDNA), since A "always" pairs with T, and C "always" base pairs with G. If you want to get more complicated, you will have modified bases, ranging from methylation, acetylation, and a few other base analogues, as well as non-W/C pairing.

Also note the presence of mobile DNA elements such as (retro)transposons. They can move or replicate and insert themselves back into the genome. You also get all sorts of other crazy events like site-specific recombination that can result in duplication or excision of a stretch of DNA from a replicating chromosome, so really, the take-home message is that it is a freaking miracle that anything manages to live.

the take-home message is that it is a freaking miracle that anything manages to live.

lol, this^

I think there are quite large issues with the HGP (or anyone else) publishing somebody's genome, I doubt they'd allow any Joe Bloggs to download it.

edit: "All genome sequence generated by the Human Genome Project has been deposited into GenBank, a public database freely accessible by anyone with a connection to the Internet."

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • PotPlayer 260622 by Razvan Serea PotPlayer is an extremely light-weight multimedia player for Windows. It feels like the KMPlayer, but is in active development. Supports almost every available video formats out there. PotPlayer contains internal codecs and there is no need to install codecs manually. Other key features include WebCam/Analog/Digital TV devices support, gapless video playback, DXVA, live broadcasting. Distinctive features of the player is a high quality playback, support for all modern video and audio formats and a built DXVA video codecs. A wide range of subtitles are supported and you are also able to capture audio, video, and screenshots. A comprehensive video and audio player, that also supports TV channels, subtitles and skins. Its been described on the Internet as The KMPlayer redux, and it pretty much is. Daum PotPlayer 260622 (1.7.22963) changelog: Removed Kakao TV Added pause function when navigating via the navigation bar Significantly improved internal stability Fixed an issue where colors appeared strange during RGB24 processing Improved playback for some HTTP streams Improved sync processing for the built-in audio renderer Fixed an issue where certain MP4 files behaved abnormally during playback Download: Daum PotPlayer (64-bit) | 54.7 MB (Freeware) Download: Daum PotPlayer (32-bit) | 61.1 MB View: Daum PotPlayer Home Page | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • Tixati 3.44 is out.
    • Speccy 1.34.084 by Razvan Serea Speccy will give you detailed statistics on every piece of hardware in your computer. Including CPU, Motherboard, RAM, Graphics Cards, Hard Disks, Optical Drives, Audio support. Additionally Speccy adds the temperatures of your different components, so you can easily see if there's a problem! Processor brand and model Hard drive size and speed Amount of memory (RAM) Graphics card Operating system At first glance, Speccy may seem like an application for system administrators and power users. It certainly is, but Speccy can also help normal users, in everyday computing life. If you need to add more memory to your system, for example, you can check how many memory slots your computer has and what memory's already installed. Then you can go out and buy the right type of memory to add on or replace what you've already got. Download: Speccy 1.34.084 | 20.5 MB (Freeware) View: Speccy Website | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • ImgDrive 2.2.7 by Razvan Serea ImgDrive is a CD/DVD/BD emulator - a tool that allows you to mount optical disc images by simply clicking on them in Windows Explorer. If you have downloaded an ISO image and want to use it without burning it to a blank disc, ImgDrive is the easiest way to do it. ImgDrive features: One-click mounting of iso, cue, nrg, mds/mdf, ccd, isz images Runs on 32-bit and 64-bit Windows versions Mount ape, flac, m4a, wav, wavpack, tta file as AUDIO CD (16-bit/44.1kHz) Mount a folder as DVD/BD Mount images in command line Does not require rebooting after installation Support up to 7 virtual drives at the same time Support multi session disc image (ccd/mds/nrg) A special portable version is available Translated to more than 10 languages Support File Type: .ccd - CloneCD image files .cue - Cue sheets files of ape/flac/m4a/tta/wav/wv/bin .iso - Standard ISO image files .isz - Compressed ISO image files .nrg - Nero image files .mds - Media descriptor image files ImgDrive 2.2.7 changelog: Added command line parameter to set number of drives Added AACS-Auth support for HD DVD Bumped kernel driver version to 2.2.7 Download: ImgDrive 2.2.7 | 692 KB (Freeware, paid upgrade available) Download: ImgDrive Portable 535 KB View: ImgDrive Home Page | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • AnyDesk 9.7.7 by Razvan Serea AnyDesk is a fast remote desktop system and enables users to access their data, images, videos and applications from anywhere and at any time, and also to share it with others. AnyDesk is the first remote desktop software that doesn't require you to think about what you can do. CAD, video editing or simply working comfortably with an office suite for hours are just a few examples. AnyDesk is designed for modern multi-core CPUs. Most of AnyDesk's image processing is done con­currently. This way, AnyDesk can utilize up to 90% of modern CPUs. AnyDesk works across multiple platforms and operating systems: Windows, Linux, Free BSD, Mac OS, iOS and Android. Just 7 megabytes - downloaded in a glimpse, sent via email, or fired up from your USB drive, AnyDesk will turn any desktop into your desktop in se­conds. No administrative privileges or installation needed. AnyDesk 9.7.7 fixes: Fixed an issue that prevented users from creating meetings without an active license Download: AnyDesk 9.7.7 | 8.0 MB (Free for private use, paid upgrade available) Links: AnyDesk Home Page | Other platforms | Release History | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
  • Recent Achievements

    • Dedicated
      tuben earned a badge
      Dedicated
    • Week One Done
      mnsgroup earned a badge
      Week One Done
    • Conversation Starter
      sumytbe earned a badge
      Conversation Starter
    • One Year In
      B4dM1k3 earned a badge
      One Year In
    • One Year In
      DarkWun earned a badge
      One Year In
  • Popular Contributors

    1. 1
      +primortal
      524
    2. 2
      +Edouard
      199
    3. 3
      PsYcHoKiLLa
      94
    4. 4
      Michael Scrip
      82
    5. 5
      Steven P.
      67
  • Tell a friend

    Love Neowin? Tell a friend!