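Once we have aligned NGS reads to a reference sequence, by whatever method, how do we represent the alignment results? The answer is the familiar SAM file. SAM stands for Sequence Alignment/Map, a format introduced by Heng Li et al. in a 2009 paper in Bioinformatics. The standard SAM format lets us describe the result of every alignment. A SAM file is plain text and can be opened in any text editor. The format has the following advantages: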
- Is flexible enough to store all the alignment information generated by various alignment programs;
- Is simple enough to be easily generated by alignment programs or converted from existing alignment formats;
- Is compact in file size;
- Allows most operations on the alignment to work on a stream without loading the whole alignment into memory;
- Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
- http://samtools.sourceforge.net/
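
Because SAM is tab-delimited plain text, the streaming advantage listed above is easy to illustrate. The following is a minimal sketch, not part of SAMtools: `SamRecord` and `iter_sam_records` are hypothetical names chosen for illustration, and the sample lines are made up. It assumes the eleven mandatory SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) and that header lines start with `@`.

```python
from typing import Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory fields of one SAM alignment line (hypothetical helper type)."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def iter_sam_records(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield one record at a time so the whole file never has to sit in memory."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines, which start with '@'.
        if not line or line.startswith("@"):
            continue
        f = line.split("\t")
        # Optional tags (column 12 onward) are ignored in this sketch.
        yield SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                        f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Made-up example data: two header lines and one alignment line.
example_lines = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:ref\tLN:45\n",
    "read1\t0\tref\t7\t30\t8M\t*\t0\t0\tTTAGATAA\t*\n",
]

records = list(iter_sam_records(example_lines))
for rec in records:
    print(rec.qname, rec.rname, rec.pos, rec.cigar)
```

Since an open file object is itself an iterable of lines, the same function could be called as `iter_sam_records(open(path))` on a real SAM file, where `path` is whatever file you have at hand.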