UTF-8 character encoding

Unicode supports virtually all existing character sets. The most practical way to encode the Unicode character set is UTF-8. It offers compatibility with ASCII, resistance to data corruption, efficient processing and ease of use. But first things first.

Encoding forms

Computers handle numbers not as abstract mathematical objects but as combinations of fixed-size storage units: bytes and 32-bit words. An encoding standard has to take this into account when deciding how characters are represented as numbers.

In computer systems, integers are stored in memory units of 8 bits (1 byte), 16 bits or 32 bits. Each Unicode encoding form maps every character to a unique sequence of such units, called code units. The standard defines three encoding forms, which use 8-, 16- and 32-bit code units and are named UTF-8, UTF-16 and UTF-32 respectively. UTF stands for Unicode Transformation Format. All three are equally legitimate representations of Unicode characters, and each has advantages in different environments.

Each encoding form can represent every character in the Unicode standard, so the choice between them depends on the needs of a particular application. Any of the three can be converted unambiguously into either of the others without loss of data.
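To make the lossless round trip concrete, here is a minimal Python sketch (Python is used only for illustration, and the sample string is arbitrary). It counts the code units each form needs and converts UTF-8 to UTF-16 to UTF-32 and back. The big-endian codec names `utf-16-be` and `utf-32-be` are chosen so that Python does not prepend a byte order mark.

```python
def code_units(text: str) -> dict:
    """Count the code units needed for `text` in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units = bytes
        "UTF-16": len(text.encode("utf-16-be")) // 2,  # 16-bit units
        "UTF-32": len(text.encode("utf-32-be")) // 4,  # 32-bit units
    }


def round_trip(text: str) -> str:
    """Convert UTF-8 -> UTF-16 -> UTF-32 and back to a string."""
    utf8 = text.encode("utf-8")
    utf16 = utf8.decode("utf-8").encode("utf-16-be")
    utf32 = utf16.decode("utf-16-be").encode("utf-32-be")
    return utf32.decode("utf-32-be")


sample = "A\u00e9\u20ac\U0001F600"  # A, e-acute, euro sign, grinning-face emoji
print(code_units(sample))           # {'UTF-8': 10, 'UTF-16': 5, 'UTF-32': 4}
print(round_trip(sample) == sample)  # True
```

The emoji is the only character that needs two UTF-16 units, which is why UTF-16 uses 5 units for 4 characters.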

The non-overlap principle

Each Unicode encoding form is designed around the principle of non-overlap. Consider, by contrast, Windows code page 932, in which a character is encoded as one or two bytes. The length of a sequence is determined by its first byte, so the values of lead bytes (the first byte of a two-byte sequence) and of single bytes are disjoint. However, single bytes and trail bytes can have the same values. This means that a search for the character "D" (code 44) may mistakenly find it as the second half of the two-byte character "Д" (code 84 44). To find out which interpretation is correct, a program has to take the preceding bytes into account.
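The code page 932 problem can be reproduced with Python's built-in `cp932` codec. This sketch, which relies only on the standard library, searches the raw bytes of "Д" for ASCII "D", first in code page 932 and then in UTF-8:

```python
# "Д" (Cyrillic capital De, U+0414) is one character, but in code page 932
# its second (trail) byte is 0x44, the same value as ASCII "D".
cp932 = "\u0414".encode("cp932")
utf8 = "\u0414".encode("utf-8")

print(cp932.hex(" "))    # 84 44
print(cp932.find(b"D"))  # 1  -> false match inside the two-byte character
print(utf8.hex(" "))     # d0 94
print(utf8.find(b"D"))   # -1 -> UTF-8 never reuses ASCII byte values
```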

The problem becomes harder when lead and trail bytes can also coincide. Then, to resolve the ambiguity, a program may have to scan backward all the way to the beginning of the text or of a known character boundary. This is not only inefficient but also fragile: a single corrupted byte can make the entire rest of the text unreadable.

The Unicode encoding forms avoid this problem because the values of lead, trail and single code units never coincide. As a result, searching and comparing Unicode text never produces false matches caused by one code sequence overlapping part of another. Following the non-overlap principle is what distinguishes these encoding forms from other East Asian multibyte encodings.

Another aspect of non-overlap in the Unicode encodings is that every character has a clearly defined boundary, so there is no need to scan back over an unknown number of preceding characters. This property is sometimes called self-synchronization. A corrupted code unit damages only one character, and the surrounding characters remain intact. In the 8-bit form, if a pointer lands on a byte of the form 10xxxxxx (in binary), finding the start of the character takes one to three steps back.
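A minimal sketch of the resynchronization step, assuming the input is well-formed UTF-8: from any byte offset, step back over continuation bytes until a lead or single byte is reached. The second part deletes one byte to show that the damage stays local; the exact replacement characters produced depend on Python's error handler, so no output is claimed for that line.

```python
def char_start(data: bytes, i: int) -> int:
    """Back up from byte index i to the first byte of the character containing it.

    Continuation bytes have the bit pattern 10xxxxxx, i.e. (b & 0xC0) == 0x80.
    In well-formed UTF-8 this loop steps back at most three times.
    """
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # offsets: a=0, e-acute=1, euro=3, emoji=6
print(char_start(data, 9))  # 6  (last byte of the 4-byte emoji)
print(char_start(data, 4))  # 3  (middle byte of the euro sign)

# Drop one byte from the middle of the euro sign: only that character is lost,
# while "a", the e-acute and the emoji still decode correctly.
damaged = data[:4] + data[5:]
repaired = damaged.decode("utf-8", errors="replace")
print(repaired)
```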

Key point

The Unicode Consortium fully supports all three encoding forms. It is important not to set UTF-8 against Unicode: all of the transformation forms are equally valid implementations of the Unicode character encoding standard.

Byte orientation

Representing a character in UTF-32 takes one 32-bit code unit, whose value equals the Unicode code point. UTF-16 uses one or two 16-bit units, and UTF-8 uses one to four bytes.
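A short worked example for the emoji U+1F600 (chosen arbitrarily). The helper `utf16_code_units` is written here for illustration and follows the standard surrogate-pair arithmetic; the other two lines check the UTF-32 and UTF-8 claims against Python's own codecs.

```python
def utf16_code_units(cp: int) -> list:
    """Split a code point into one or two UTF-16 code units."""
    if cp < 0x10000:
        return [cp]                  # Basic Multilingual Plane: one 16-bit unit
    v = cp - 0x10000                 # 20-bit value
    return [0xD800 + (v >> 10),      # high (lead) surrogate
            0xDC00 + (v & 0x3FF)]    # low (trail) surrogate


emoji = 0x1F600
print([hex(u) for u in utf16_code_units(emoji)])                   # ['0xd83d', '0xde00']
print(hex(int.from_bytes(chr(emoji).encode("utf-32-be"), "big")))  # 0x1f600
print(len(chr(emoji).encode("utf-8")))                             # 4
```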

UTF-8 was designed to be as byte-oriented as possible so that it would work well in ASCII-based environments. Much existing software and information-technology practice has long relied on representing characters as sequences of bytes. Many protocols depend on ASCII values staying the same and use or avoid particular control characters. The easiest way to adapt to Unicode was therefore an 8-bit encoding of Unicode characters in which ASCII characters and control characters keep their ASCII byte values. UTF-8 was designed for exactly this purpose.

Variable length

UTF-8 is a variable-length encoding built from 8-bit code units whose high-order bits indicate which part of a code sequence each byte belongs to. One range of values is reserved for the first byte of a sequence, another for the bytes that follow it. This is what makes the encoding non-overlapping.
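The bit patterns can be summarized as a small classifier. `byte_role` is a hypothetical helper for illustration; it looks only at the high-order bits of a single byte and does not validate whole sequences.

```python
def byte_role(b: int) -> str:
    """Classify a byte by its high-order bits (bit pattern only; some values,
    such as 0xC0, 0xC1 and 0xF5-0xF7, never occur in well-formed UTF-8)."""
    if b < 0x80:
        return "single"        # 0xxxxxxx: ASCII, a complete character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: never starts a sequence
    if b < 0xE0:
        return "lead-2"        # 110xxxxx: starts a 2-byte sequence
    if b < 0xF0:
        return "lead-3"        # 1110xxxx: starts a 3-byte sequence
    if b < 0xF8:
        return "lead-4"        # 11110xxx: starts a 4-byte sequence
    return "invalid"           # 11111xxx: not used by UTF-8


print([byte_role(b) for b in "a\u20ac".encode("utf-8")])
# ['single', 'lead-3', 'continuation', 'continuation']
```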

ASCII

UTF-8 is fully compatible with ASCII (0x00-0x7F): the Unicode characters U+0000-U+007F are encoded as the single bytes 0x00-0x7F and are therefore indistinguishable from ASCII. Moreover, to avoid ambiguity, the values 0x00-0x7F are never used anywhere else in the byte representation of Unicode characters. Characters from U+0080 to U+07FF, which cover most non-ideographic scripts beyond ASCII (Latin letters with diacritics, Greek, Cyrillic, Hebrew, Arabic and others), are encoded as two bytes. Characters U+0800-U+FFFF take three bytes, and code points above U+FFFF require four bytes.
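The length ranges above translate directly into an encoder. This is a sketch for learning purposes (`utf8_encode` is a hypothetical name); in practice Python's `str.encode("utf-8")` does the same job, and the loop checks the sketch against it.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the UTF-8 length ranges."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:      # U+0000..U+007F: 1 byte, identical to ASCII
        return bytes([cp])
    if cp < 0x800:     # U+0080..U+07FF: 2 bytes, 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # U+10000..U+10FFFF: 4 bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0x416, 0x4E2D, 0x1F600):      # A, Cyrillic Zhe, CJK "middle", emoji
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with Python's codec
    print(f"U+{cp:04X} -> {encoded.hex(' ')}")
```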

Areas of use

UTF-8 is usually the preferred encoding in HTML and similar protocols.

XML was among the first standards to support UTF-8 fully, and standards bodies also recommend it. The problem of non-ASCII characters in URLs was solved when the W3C and the IETF agreed that such characters are first encoded in UTF-8, after which the resulting bytes are percent-escaped.
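Python's `urllib.parse.quote` follows this convention: it encodes non-ASCII characters as UTF-8 by default and then percent-escapes each byte. The path below is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/wiki/\u0414\u0435\u043d\u044c"  # hypothetical path "/wiki/День"
encoded = quote(path)                     # characters -> UTF-8 bytes -> %XX escapes
print(encoded)                            # /wiki/%D0%94%D0%B5%D0%BD%D1%8C
print(unquote(encoded) == path)           # True
```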

Compatibility with ASCII eases the transition to new software. UTF-8 is supported by most text editors, such as jEdit, Emacs, BBEdit, Eclipse and Windows Notepad. No other form of Unicode encoding enjoys such broad tool support.

Another advantage of the encoding is that it is simply a sequence of bytes, so UTF-8 strings are easy to handle in C and other programming languages. Because it is a byte-based encoding form, it needs neither a byte order mark (BOM) nor, in XML, an encoding declaration.
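Two consequences of the byte-oriented design can be checked in Python: UTF-8 contains no zero bytes except for U+0000 itself (which is why NUL-terminated C strings keep working), and an XML parser assumes UTF-8 when a document has no declaration. The sample strings are arbitrary.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9 \u4e2d"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# UTF-8 has a zero byte only for U+0000; UTF-16 puts one inside every ASCII letter.
print(0 in utf8, 0 in utf16)  # False True

# A UTF-8 XML document needs neither a BOM nor an encoding declaration.
root = ET.fromstring("<t>caf\u00e9</t>".encode("utf-8"))
print(root.text)              # café
```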

Self-synchronization

In an environment that uses 8-bit characters, UTF-8 has the following advantages over other multibyte encodings:

  • The first byte of a code sequence indicates how long the sequence is, which makes forward parsing efficient.
  • Finding the start of a character is easy, because lead bytes use only a strictly limited range of values.
  • Byte values of different kinds never overlap.
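The first point can be illustrated by a forward scanner that reads each sequence length from the lead byte alone. `split_chars` is a hypothetical helper that assumes well-formed input; it does not check the continuation bytes.

```python
def split_chars(data: bytes) -> list:
    """Split well-formed UTF-8 into per-character byte slices, reading each
    sequence length from its lead byte only (no full decoding needed)."""
    chunks, i = [], 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1
        elif lead >= 0xF0:
            n = 4
        elif lead >= 0xE0:
            n = 3
        elif lead >= 0xC0:
            n = 2
        else:
            raise ValueError(f"continuation byte at offset {i}")
        chunks.append(data[i:i + n])
        i += n
    return chunks


data = "a\u00e9\u20ac\U0001F600".encode("utf-8")
print([c.hex() for c in split_chars(data)])  # ['61', 'c3a9', 'e282ac', 'f09f9880']
```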

Comparative advantages

UTF-8 is fairly compact. However, for East Asian text (Chinese, Japanese and Korean, which use Han ideographs or Hangul syllables) it needs 3-byte sequences. UTF-8 is also less efficient to process than the other encoding forms. On the other hand, a binary sort of UTF-8 strings produces the same order as a binary sort of Unicode code points.
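The size and ordering claims can be checked quickly in Python. The sample words are arbitrary; U+FF61 and U+10000 are picked because UTF-16 encodes supplementary characters as surrogates (D800-DFFF), which sort before BMP characters above U+DFFF and so break code point order.

```python
words = ["\u4e2d\u6587", "\ud55c\uad6d\uc5b4"]  # Chinese and Korean sample words
sizes = {w: (len(w.encode("utf-8")), len(w.encode("utf-16-le"))) for w in words}
print(sizes)  # {'中文': (6, 4), '한국어': (9, 6)}

# Byte-wise order of UTF-8 matches code point order; UTF-16 does not.
samples = ["\uff61", "\U00010000"]  # U+FF61 < U+10000
by_code_point = sorted(samples)     # Python compares str by code point
by_utf8 = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(samples, key=lambda s: s.encode("utf-16-be"))
print(by_utf8 == by_code_point, by_utf16 == by_code_point)  # True False
```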

Character encoding scheme

A character encoding scheme combines a character encoding form with a method of serializing its code units into bytes. To identify the encoding scheme, the Unicode standard provides an initial byte order mark (BOM).

When a BOM appears in UTF-8, its only function is to signal that this encoding form is in use. UTF-8 has no byte order problems, because its code unit is a single byte. Using a BOM with this encoding form is therefore neither required nor recommended. A BOM can appear in text converted from other encodings that use a byte order mark, or as a signature marking text as UTF-8. It is the three-byte sequence EF BB BF (hexadecimal).
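Python exposes the signature as `codecs.BOM_UTF8` and offers a `utf-8-sig` codec that strips it when decoding. The `strip_utf8_bom` helper below is a hypothetical convenience function, not part of the standard library.

```python
import codecs

data = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(data[:3].hex(" "))               # ef bb bf
print(repr(data.decode("utf-8")))      # '\ufeffhello'  (BOM kept as U+FEFF)
print(repr(data.decode("utf-8-sig")))  # 'hello'        (signature stripped)


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 signature, if present."""
    return raw[len(codecs.BOM_UTF8):] if raw.startswith(codecs.BOM_UTF8) else raw
```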

How to set the UTF-8 encoding

In HTML, UTF-8 is declared with the following tag in the page head:

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

In PHP, UTF-8 is set with the header() function at the very beginning of the file, after the error reporting level has been set:

<?php

error_reporting(-1);

header('Content-Type: text/html; charset=utf-8');
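On the receiving side, the declared charset can be read back from the header value. This sketch uses Python's `email.message.Message` purely as a convenient MIME header parser; a real HTTP client would take the value from the server response rather than from a hard-coded string.

```python
from email.message import Message

# Parse the same Content-Type value that the PHP header() call sends.
msg = Message()
msg["Content-Type"] = "text/html; charset=utf-8"
print(msg.get_content_type())     # text/html
print(msg.get_content_charset())  # utf-8
```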

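When connecting to a MySQL database, the UTF-8 encoding is set as follows:

<?php
// Hypothetical credentials; mysqli replaces mysql_set_charset(), removed in PHP 7.
$mysqli = new mysqli('localhost', 'user', 'password', 'database');
$mysqli->set_charset('utf8mb4'); // full UTF-8; MySQL's legacy 'utf8' stops at 3 bytes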

In a CSS file, the UTF-8 encoding is declared on the very first line as follows:

@charset "utf-8";

When saving files of any type, choose UTF-8 without a BOM; otherwise the site may not work correctly (in PHP, for example, the BOM bytes are sent to the browser as output before header() runs, which triggers a "headers already sent" warning). In Dreamweaver, choose the menu item "Modify > Page Properties > Title/Encoding" and change the encoding to UTF-8. Then, after reloading the page, clear the "Include Unicode Signature (BOM)" checkbox and apply the change. If text in a file or a database was previously stored in another encoding, it has to be re-saved or re-encoded. When working with regular expressions, be sure to use the u (UTF-8) modifier.
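Re-encoding existing text amounts to decoding it with its original encoding and encoding it again as UTF-8. The sketch below assumes the source encoding is known in advance; `cp1251` (Windows-1251) and the helper name `reencode_to_utf8` are hypothetical choices for illustration. In Python, regular expressions applied to decoded `str` objects are Unicode-aware by default, so no equivalent of PHP's u modifier is needed.

```python
import re


def reencode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode legacy text as UTF-8 without a BOM.

    `source_encoding` must be known in advance; guessing it wrongly corrupts the text.
    """
    text = raw.decode(source_encoding)
    return text.lstrip("\ufeff").encode("utf-8")  # Python's "utf-8" codec writes no BOM


legacy = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251")  # "Привет" in Windows-1251
converted = reencode_to_utf8(legacy, "cp1251")
print(converted.decode("utf-8"))                              # Привет
print(bool(re.fullmatch(r"\w+", converted.decode("utf-8"))))  # True
```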

You can also save files as UTF-8 in Windows Notepad: choose "File > Save As...", select UTF-8 in the encoding list and save the file. Note that older versions of Notepad always add a BOM when saving as UTF-8.

In the Notepad++ editor, if an encoding other than UTF-8 is set, choose the menu item "Encoding > Convert to UTF-8 without BOM" to convert the text, then save it as UTF-8.

There is no alternative

In an era of globalization, when national and linguistic boundaries are fading, character sets limited to particular regions are of little use. Unicode is a universal character set that supports all localizations, and UTF-8 is an implementation of Unicode that:

  • is supported by a wide range of tools and is compatible with ASCII;
  • is resistant to data corruption;
  • is simple and efficient to process;
  • is platform independent.

With the arrival of UTF-8, arguments about which encoding form or character set is better have become pointless.

Copyright © 2018 hmn.birmiss.com. Theme powered by WordPress.