Deserializing OpenStreetMap data with protobuf-net
10-10-2019
Question
For the life of me, I cannot deserialize the protobuf file from OpenStreetMap.
I am trying to deserialize the following extract: http://download.geofabrik.de/osm/north-america/us-northeast.osm.pbf to get the nodes, and I am using http://code.google.com/p/protobuf-net/ as the library. I have tried deserializing a bunch of different objects, but they all come back null.
The proto files are here: http://trac.openstreetmap.org/browser/applications/utils/export/osm2pgsql/protobuf
Any suggestions?
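Solution
Right; the problem is that this is not just protobuf. It is a hybrid file format (defined in its own specification) that includes protobuf among several formats internally. It also incorporates compression (although that looks to be optional).
I have pulled apart what I can from the spec, and I have a C# reader here that uses protobuf-net to handle the chunks. It happily reads that file through to the end, and I can tell you there are 4515 blocks (BlockHeader). When it gets to the Blob, I am a bit confused about how the spec demarcates OSMHeader and OSMData. I am open to suggestions here! I have also used ZLIB.NET to handle the zlib compression that is being used. Until I get my head around this, I have settled for processing the zlib data and validating it against the claimed size, to check that it is at least sane.
If you can figure out (or ask the author) how OSMHeader and OSMData are separated, I will happily crank something else in. I hope you don't mind that I have stopped here, but it has been a few hours ;p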
using System;
using System.IO;
using OpenStreetMap; // where my .proto-generated entities are living
using ProtoBuf; // protobuf-net
using zlib; // ZLIB.NET

class OpenStreetMapParser
{
    static void Main()
    {
        using (var file = File.OpenRead("us-northeast.osm.pbf"))
        {
            // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
            //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
            //The format is a repeating sequence of:
            //int4: length of the BlockHeader message in network byte order
            //serialized BlockHeader message
            //serialized Blob message (size is given in the header)
            int length, blockCount = 0;
            while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
            {
                // I'm just being lazy and re-using something "close enough" here
                // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                // actually need the other way around (network byte order):
                uint len = (uint)length;
                len = ((len & 0xFF) << 24) | ((len & 0xFF00) << 8) | ((len & 0xFF0000) >> 8) | ((len & 0xFF000000) >> 24);
                length = (int)len;

                BlockHeader header;
                // again, v2 has capped-streams built in, but I'm deliberately
                // limiting myself to v1 features
                using (var tmp = new LimitedStream(file, length))
                {
                    header = Serializer.Deserialize<BlockHeader>(tmp);
                }
                Blob blob;
                using (var tmp = new LimitedStream(file, header.datasize))
                {
                    blob = Serializer.Deserialize<Blob>(tmp);
                }
                if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                using (var ms = new MemoryStream(blob.zlib_data))
                using (var zlib = new ZLibStream(ms))
                {
                    // at this point I'm very unclear how the OSMHeader and OSMData are packed - it isn't clear
                    // read this to the end, to check we can parse the zlib
                    int payloadLen = 0;
                    while (zlib.ReadByte() >= 0) payloadLen++;
                    if (payloadLen != blob.raw_size) throw new FormatException("Screwed that up...");
                }
                blockCount++;
                Console.WriteLine("Read block " + blockCount.ToString());
            }
            Console.WriteLine("all done");
            Console.ReadLine();
        }
    }
}
abstract class InputStream : Stream
{
    protected abstract int ReadNextBlock(byte[] buffer, int offset, int count);

    public sealed override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead, totalRead = 0;
        while (count > 0 && (bytesRead = ReadNextBlock(buffer, offset, count)) > 0)
        {
            count -= bytesRead;
            offset += bytesRead;
            totalRead += bytesRead;
            pos += bytesRead;
        }
        return totalRead;
    }

    long pos;

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotImplementedException();
    }
    public override void SetLength(long value)
    {
        throw new NotImplementedException();
    }
    public override long Position
    {
        get
        {
            return pos;
        }
        set
        {
            if (pos != value) throw new NotImplementedException();
        }
    }
    public override long Length
    {
        get { throw new NotImplementedException(); }
    }
    public override void Flush()
    {
        throw new NotImplementedException();
    }
    public override bool CanWrite
    {
        get { return false; }
    }
    public override bool CanRead
    {
        get { return true; }
    }
    public override bool CanSeek
    {
        get { return false; }
    }
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotImplementedException();
    }
}
class ZLibStream : InputStream
{   // uses ZLIB.NET: http://www.componentace.com/download/download.php?editionid=25
    private ZInputStream reader; // seriously, why isn't this a stream?

    public ZLibStream(Stream stream)
    {
        reader = new ZInputStream(stream);
    }
    public override void Close()
    {
        reader.Close();
        base.Close();
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // OMG! reader.Read is the base-stream, reader.read is decompressed! yeuch
        return reader.read(buffer, offset, count);
    }
}
// deliberately doesn't dispose the base-stream
class LimitedStream : InputStream
{
    private Stream stream;
    private long remaining;

    public LimitedStream(Stream stream, long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (stream == null) throw new ArgumentNullException("stream");
        if (!stream.CanRead) throw new ArgumentException("stream");
        this.stream = stream;
        this.remaining = length;
    }
    protected override int ReadNextBlock(byte[] buffer, int offset, int count)
    {
        // never read past the end of the current message
        if (count > remaining) count = (int)remaining;
        int bytesRead = stream.Read(buffer, offset, count);
        if (bytesRead > 0) remaining -= bytesRead;
        return bytesRead;
    }
}
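The byte swap in the reader above works around the fact that PrefixStyle.Fixed32 reads a little-endian value while the file stores the length in network byte order. As an alternative, the prefix can be read directly as big-endian with nothing but the base class library. This is only a minimal sketch: BigEndianPrefix and TryReadInt32BigEndian are hypothetical names, not part of protobuf-net.

```csharp
using System;
using System.IO;

static class BigEndianPrefix
{
    // Hypothetical helper (not a protobuf-net API): reads a 4-byte
    // big-endian (network byte order) integer from the stream.
    // Returns false if the stream ends cleanly before the first byte.
    public static bool TryReadInt32BigEndian(Stream source, out int value)
    {
        var buffer = new byte[4];
        int filled = 0;
        while (filled < 4)
        {
            int n = source.Read(buffer, filled, 4 - filled);
            if (n <= 0) break; // end of stream
            filled += n;
        }
        value = 0;
        if (filled == 0) return false;
        if (filled < 4) throw new EndOfStreamException("Truncated length prefix");
        // most significant byte comes first
        value = (buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];
        return true;
    }
}
```

With this helper, the loop condition could become `while (BigEndianPrefix.TryReadInt32BigEndian(file, out length))` and the manual swap would no longer be needed.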
Other tips
After the outline set up by Marc, I figured out the last part by looking at http://git.openstreetmap.nl/index.cgi/pbf2osm.git/tree/src/main.c?h=35116112EB0066C7729A963B292FAA608DDC8AD7
Here is the final code.
using System;
using System.Diagnostics;
using System.IO;
using crosby.binary;
using OSMPBF;
using PerlLLC.Tools;
using ProtoBuf;
using zlib;

namespace OpenStreetMapOperations
{
    class OpenStreetMapParser
    {
        static void Main()
        {
            using (var file = File.OpenRead(StaticTools.AssemblyDirectory + @"\us-pacific.osm.pbf"))
            {
                // from http://wiki.openstreetmap.org/wiki/ProtocolBufBinary:
                //A file contains a header followed by a sequence of fileblocks. The design is intended to allow future random-access to the contents of the file and skipping past not-understood or unwanted data.
                //The format is a repeating sequence of:
                //int4: length of the BlockHeader message in network byte order
                //serialized BlockHeader message
                //serialized Blob message (size is given in the header)
                int length, blockCount = 0;
                while (Serializer.TryReadLengthPrefix(file, PrefixStyle.Fixed32, out length))
                {
                    // I'm just being lazy and re-using something "close enough" here
                    // note that v2 has a big-endian option, but Fixed32 assumes little-endian - we
                    // actually need the other way around (network byte order):
                    length = IntLittleEndianToBigEndian((uint)length);

                    BlockHeader header;
                    // again, v2 has capped-streams built in, but I'm deliberately
                    // limiting myself to v1 features
                    using (var tmp = new LimitedStream(file, length))
                    {
                        header = Serializer.Deserialize<BlockHeader>(tmp);
                    }
                    Blob blob;
                    using (var tmp = new LimitedStream(file, header.datasize))
                    {
                        blob = Serializer.Deserialize<Blob>(tmp);
                    }
                    if (blob.zlib_data == null) throw new NotSupportedException("I'm only handling zlib here!");

                    HeaderBlock headerBlock;
                    PrimitiveBlock primitiveBlock;
                    using (var ms = new MemoryStream(blob.zlib_data))
                    using (var zlib = new ZLibStream(ms))
                    {
                        // the BlockHeader type string says what the decompressed blob contains
                        if (header.type == "OSMHeader")
                            headerBlock = Serializer.Deserialize<HeaderBlock>(zlib);
                        if (header.type == "OSMData")
                            primitiveBlock = Serializer.Deserialize<PrimitiveBlock>(zlib);
                    }
                    blockCount++;
                    Trace.WriteLine("Read block " + blockCount.ToString());
                }
                Trace.WriteLine("all done");
            }
        }

        // 4-byte number
        static int IntLittleEndianToBigEndian(uint i)
        {
            return (int)(((i & 0xff) << 24) + ((i & 0xff00) << 8) + ((i & 0xff0000) >> 8) + ((i >> 24) & 0xff));
        }
    }
    // InputStream, ZLibStream and LimitedStream are identical to the classes
    // in the solution above and are not repeated here.
}
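Both readers above reject any Blob whose zlib_data is empty and depend on ZLIB.NET. As a rough sketch of an alternative, the payload could be extracted with only System.IO.Compression, also accepting the uncompressed raw field. BlobPayload and Decode are hypothetical names. The sketch assumes the zlib stream uses no preset dictionary, so its header is exactly two bytes, and it does not verify the trailing Adler-32 checksum.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class BlobPayload
{
    // Hypothetical helper: returns the uncompressed bytes of a Blob, given its
    // raw and zlib_data fields (either may be null) and its raw_size.
    public static byte[] Decode(byte[] raw, byte[] zlibData, int rawSize)
    {
        if (raw != null) return raw; // payload was stored uncompressed
        if (zlibData == null || zlibData.Length < 2)
            throw new NotSupportedException("Only raw and zlib blobs are handled in this sketch");

        // Assumption: no preset dictionary, so the zlib header is 2 bytes.
        // DeflateStream stops at the end of the deflate data, which means the
        // 4-byte Adler-32 trailer is ignored rather than checked.
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            byte[] result = output.ToArray();
            if (result.Length != rawSize)
                throw new InvalidDataException("Decompressed size does not match raw_size");
            return result;
        }
    }
}
```

The returned array could then be wrapped in a MemoryStream and handed to Serializer.Deserialize, in place of the ZLibStream wrapper used above.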
Yes, it came from protogen, in fileformat.cs (based on the OSM fileformat.proto file; see the code below).
package OSM_PROTO;

message Blob {
    optional bytes raw = 1;
    optional int32 raw_size = 2;
    optional bytes zlib_data = 3;
    optional bytes lzma_data = 4;
    optional bytes bzip2_data = 5;
}

message BlockHeader {
    required string type = 1;
    optional bytes indexdata = 2;
    required int32 datasize = 3;
}
Here is the declaration of BlockHeader in the generated file:
public sealed partial class BlockHeader : pb::GeneratedMessage<BlockHeader, BlockHeader.Builder> {...}
where pb comes from: using pb = global::Google.ProtocolBuffers;
(ProtocolBuffers.dll), which came with this package.
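When it is unclear which generated classes match the data, the BlockHeader message is small enough to decode by hand as a sanity check. The following is only an illustrative sketch of the protobuf wire format for that one message, using the field numbers from the .proto above. BlockHeaderProbe is a hypothetical name, and it only handles the varint and length-delimited wire types that this message uses.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical hand-rolled decoder, for inspection only.
static class BlockHeaderProbe
{
    public static void Parse(byte[] message, out string type, out int dataSize)
    {
        type = null;
        dataSize = 0;
        int pos = 0;
        while (pos < message.Length)
        {
            // each field starts with a varint key: (field number << 3) | wire type
            ulong key = ReadVarint(message, ref pos);
            int field = (int)(key >> 3);
            int wireType = (int)(key & 7);
            if (wireType == 0) // varint value
            {
                ulong v = ReadVarint(message, ref pos);
                if (field == 3) dataSize = (int)v; // datasize
            }
            else if (wireType == 2) // length-delimited value
            {
                int len = (int)ReadVarint(message, ref pos);
                if (len < 0 || len > message.Length - pos)
                    throw new InvalidDataException("Truncated field");
                if (field == 1) type = Encoding.UTF8.GetString(message, pos, len); // type
                pos += len; // other fields (such as indexdata) are skipped
            }
            else
            {
                throw new NotSupportedException("Unexpected wire type " + wireType);
            }
        }
    }

    // Reads a base-128 varint: 7 payload bits per byte, least significant group first.
    static ulong ReadVarint(byte[] data, ref int pos)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            if (pos >= data.Length) throw new InvalidDataException("Truncated varint");
            byte b = data[pos++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new InvalidDataException("Varint too long");
    }
}
```

If the bytes read after the 4-byte length prefix decode to a type of "OSMHeader" or "OSMData" and a plausible datasize, the framing is being read correctly and any remaining problem lies in the generated classes or the library used with them.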
Have you tried a smaller area, such as us-pacific.osm.pbf?
In the end, it would help to post the error messages.