[.NET, C#, Data Engineering] One of the ways to dynamically deserialize a part of a JSON document with an unknown structure
Автор
Сообщение
news_bot ®
Стаж: 6 лет 9 месяцев
Сообщений: 27286
In this topic, I will tell you how to dynamically parse and deserialize only part of the whole JSON document. We will create an implementation for .NET Core with C# as a language.Thanks dbcfor this idea!An issue descriptionRecently, at work, I was faced with the following task: to develop some tool to generate flexible reports and configure a schedule for sending these reports to the customers. Not a particularly original task. One of the biggest issues I encounter during the development process was dynamically deserializing a part of provided JSON document with the data for the report. At the compile step, we don't know the structure of this JSON and a list of fields that we need to take for the report. For example, we have the next JSON as a data source for the report. Notice that we will get this JSON in the runtime and at the compile step we don't know the structure of this document:Original JSON
{
"ResponseCode" : 200,
"ResponseMessage" : "Data was received",
"Data":[
{
"Id" : 147,
"FirstName" : "Bob",
"MiddleName" : "Lawrence",
"LastName" : "Welch",
"Age" : 66,
"Profession" : "musician",
"Score" : 10,
"Address": "Hollywood, Los Angeles, California, USA"
},
{
"Id" : 18,
"FirstName" : "Steven",
"MiddleName" : "Paul",
"LastName" : "Jobs",
"Age" : 56,
"Profession" : "businessman",
"Score" : 9,
"Address": "San Francisco, California, USA"
},
{
"Id" : 594,
"FirstName" : "Hanna",
"MiddleName" : "",
"LastName" : "Holasava",
"Age" : 23,
"Profession" : "developer",
"Score" : 10,
"Address": "Gomel, Belarus"
},
{
"Id" : 363,
"FirstName" : "Stephen",
"MiddleName" : "William",
"LastName" : "Hawking",
"Age" : 76,
"Profession" : "scientist",
"Score" : 7,
"Address": "Oxford, United Kingdom"
}
]
}
And imagine that for the report we need only the next fields:
- FirstName
- LastName
- Age
- Score
We need to answer two questions: how to parse only part of the unknown JSON document and how to allow users to tell us which data we need to extract for the report?One of the solutionsI want to suggest one of the solutions. I know that there can be some other ways to solve my problem. I will be happy if you will suggest me something different in the comments.We will use Newtonsoft.Json.Schema NuGet package. It allows us to build custom schemas of the JSON documents and validate that some JSON corresponds to this schema. The main idea is:
- Allow users to define JSON schema with only fields needed for the report
- Mark all subnodes with "additionalProperties" : false property.
- Validate provided JSON document. After validation, we will get a list of JSON tokens with an error.
- Remove all tokens that have an error.
To extract JSchema from some JSON you can use the next tool. For our example using this tool we will get the next schema:JSchema that was generated for our data
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"ResponseCode": {
"type": "integer"
},
"ResponseMessage": {
"type": "string"
},
"Data": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"Id": {
"type": "integer"
},
"FirstName": {
"type": "string"
},
"MiddleName": {
"type": "string"
},
"LastName": {
"type": "string"
},
"Age": {
"type": "integer"
},
"Profession": {
"type": "string"
},
"Score": {
"type": "integer"
},
"Address": {
"type": "string"
}
},
"required": [
"Id",
"FirstName",
"MiddleName",
"LastName",
"Age",
"Profession",
"Score",
"Address"
]
},
{
"type": "object",
"properties": {
"Id": {
"type": "integer"
},
"FirstName": {
"type": "string"
},
"MiddleName": {
"type": "string"
},
"LastName": {
"type": "string"
},
"Age": {
"type": "integer"
},
"Profession": {
"type": "string"
},
"Score": {
"type": "integer"
},
"Address": {
"type": "string"
}
},
"required": [
"Id",
"FirstName",
"MiddleName",
"LastName",
"Age",
"Profession",
"Score",
"Address"
]
},
{
"type": "object",
"properties": {
"Id": {
"type": "integer"
},
"FirstName": {
"type": "string"
},
"MiddleName": {
"type": "string"
},
"LastName": {
"type": "string"
},
"Age": {
"type": "integer"
},
"Profession": {
"type": "string"
},
"Score": {
"type": "integer"
},
"Address": {
"type": "string"
}
},
"required": [
"Id",
"FirstName",
"MiddleName",
"LastName",
"Age",
"Profession",
"Score",
"Address"
]
},
{
"type": "object",
"properties": {
"Id": {
"type": "integer"
},
"FirstName": {
"type": "string"
},
"MiddleName": {
"type": "string"
},
"LastName": {
"type": "string"
},
"Age": {
"type": "integer"
},
"Profession": {
"type": "string"
},
"Score": {
"type": "integer"
},
"Address": {
"type": "string"
}
},
"required": [
"Id",
"FirstName",
"MiddleName",
"LastName",
"Age",
"Profession",
"Score",
"Address"
]
}
]
}
},
"required": [
"ResponseCode",
"ResponseMessage",
"Data"
]
}
And we need to modify this schema. We will make the next manipulations:
- Remove all required properties.
- Remove unnecessary fields definition (Like ResponseCode, ResponseMessage, Id, MiddleName etc.)
- For all nodes with type "object" add the "additionalProperties": false property
- For the array definition change "items": [ { ... } ] to "items": { ... }
As a result, we will have the next correct schema:Correct JSchema
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"additionalProperties": false,
"properties": {
"Data": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"FirstName": {
"type": "string"
},
"LastName": {
"type": "string"
},
"Age": {
"type": "integer"
},
"Score": {
"type": "integer"
}
}
}
}
}
}
Then we need to validate our JSON document with built schema:
var data = JObject.Parse(jsonString);
var isValid = data.IsValid(schema, out IList<ValidationError> errors);
Definitions for jsonString and schema variables you can find here. The errors variable will contains all validation error:
Validation errorsAnd each validation error contains a lot of info about an error. For us important is thePath and ErrorTypefields. We can use this field value to select the unnecessary tokens and ErrorType we can use to ensure that this token is superfluous.
Full info about an error!! Note that in array definition should be "items": { ... } not "items": [ { ... } ]. Otherwise, the validator will check the only first item in the provided array.To remove some node from the Json three we will use the next code:
public static JToken SearchNodeByName(this JToken parentNode, string nodeName)
{
if (parentNode == null || parentNode.Children().Count() == 0)
{
return null;
}
if (parentNode is JProperty && string.Equals(((JProperty)parentNode).Name, nodeName))
{
return parentNode;
}
foreach (JToken children in parentNode.Children())
{
var node = SearchNodeByName(children, nodeName);
if (node != null)
{
return node;
}
}
return null;
}
And this is the way how we will process our validation errors:
var isValid = data.IsValid(schema, out IList<ValidationError> errors);
if (!isValid)
{
foreach (var error in errors)
{
if (error.ErrorType == ErrorType.AdditionalProperties)
{
data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent();
}
}
}
The full code of the program you can find in this spoiler or in my Git repository. Thank you for your attention! Fill free to left any notes in the comment. Full text of the program
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
namespace JSchemaTester
{
class Program
{
static void Main(string[] args)
{
string jsonString = @"{
'ResponseCode' : 200,
'ResponseMessage' : 'Data was received',
'Data':[
{
'Id' : 147,
'FirstName' : 'Bob',
'MiddleName' : 'Lawrence',
'LastName' : 'Welch',
'Age' : 66,
'Profession' : 'musician',
'Score' : 10,
'Address': 'Hollywood, Los Angeles, California, USA'
},
{
'Id' : 18,
'FirstName' : 'Steven',
'MiddleName' : 'Paul',
'LastName' : 'Jobs',
'Age' : 56,
'Profession' : 'businessman',
'Score' : 9,
'Address': 'San Francisco, California, USA'
},
{
'Id' : 594,
'FirstName' : 'Hanna',
'MiddleName' : '',
'LastName' : 'Holasava',
'Age' : 23,
'Profession' : 'developer',
'Score' : 10,
'Address': 'Gomel, Belarus'
},
{
'Id' : 363,
'FirstName' : 'Stephen',
'MiddleName' : 'William',
'LastName' : 'Hawking',
'Age' : 76,
'Profession' : 'scientist',
'Score' : 7,
'Address': 'Oxford, United Kingdom'
}
]
}";
JSchema schema = JSchema.Parse(@"{
'$schema': 'http://json-schema.org/draft-04/schema#',
'type': 'object',
'additionalProperties' : false,
'properties': {
'Data': {
'type': 'array',
'items':
{
'type': 'object',
'additionalProperties' : false,
'properties': {
'FirstName': {
'type': 'string'
},
'LastName': {
'type': 'string'
},
'Age': {
'type': 'integer'
},
'Score': {
'type': 'integer'
}
}
}
}
}
}");
var data = JObject.Parse(jsonString);
var isValid = data.IsValid(schema, out IList<ValidationError> errors);
if (!isValid)
{
foreach (var error in errors)
{
if (error.ErrorType == ErrorType.AdditionalProperties)
{
data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent();
}
}
}
JToken dataNode = data.SearchNodeByName("Data");
var dataTable = JsonConvert.DeserializeObject<DataTable>(dataNode.First.ToString());
dataTable.PrintToConsole();
Console.ReadKey();
}
}
public static partial class JsonExtensions
{
public static JToken RemoveFromLowestPossibleParent(this JToken node)
{
if (node == null)
{
return null;
}
// If the parent is a JProperty, remove that instead of the token itself.
var property = node.Parent as JProperty;
var contained = property ?? node;
if (contained.Parent != null)
{
contained.Remove();
}
// Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should
if (property != null)
{
property.Value = null;
}
return node;
}
public static JToken SearchNodeByName(this JToken parentNode, string nodeName)
{
if (parentNode == null || parentNode.Children().Count() == 0)
{
return null;
}
if (parentNode is JProperty && string.Equals(((JProperty)parentNode).Name, nodeName))
{
return parentNode;
}
foreach (JToken children in parentNode.Children())
{
var node = SearchNodeByName(children, nodeName);
if (node != null)
{
return node;
}
}
return null;
}
}
public static class DataTableExtensions
{
public static void PrintToConsole(this DataTable data)
{
string[] columnNames = data.Columns
.Cast<DataColumn>()
.Select(column => column.ColumnName)
.ToArray();
Console.WriteLine(string.Join("|", columnNames.Select(c => c.PadRight(15, ' '))));
Console.WriteLine(string.Empty.PadRight(15 * data.Columns.Count, '-'));
foreach (DataRow row in data.Rows)
{
string[] fields = row.ItemArray
.Select(field => field.ToString())
.ToArray();
Console.WriteLine(string.Join("|", fields.Select(c => c.PadRight(15, ' '))));
}
}
}
}
===========
Источник:
habr.com
===========
Похожие новости:
- [Open source, Big Data, Машинное обучение, Kotlin] KotlinDL 0.2: Functional API, зоопарк моделей c ResNet и MobileNet, DSL для обработки изображений
- [Сетевые технологии, Исследования и прогнозы в IT] Отнимаем и делим — исследуем целостность Рунета
- [Apache, Big Data, Hadoop, Data Engineering] Архитектура непрерывной потоковой доставки в Cloudera Flow Management (перевод)
- [Python, Программирование, Работа с 3D-графикой, Учебный процесс в IT] 3D teeth instance segmentation. В темноте, но не один
- [Машинное обучение] Топ 6 библиотек Python для визуализации: какую и когда лучше использовать? (перевод)
- [Big Data] Location Intelligence
- [Программирование, .NET, ASP, C#] 6 типов кода, которого не должно быть внутри контроллеров .NET (перевод)
- [Информационная безопасность, Облачные сервисы] Если не хватает NSX Edge: как клиенты нашего облака переезжают в сервис NGFW
- [Big Data, DevOps] Новый KubernetesExecutor 2.0 в Airflow 2.0 (перевод)
- [Data Mining, Big Data, Аналитика мобильных приложений, Управление продуктом] Хочу всё знать о клиенте! Или как обогатить сухие факты DWH цифровыми путями и свойствами клиента из Amplitude
Теги для поиска: #_.net, #_c#, #_data_engineering, #_json, #_data, #_parsing, #_newtonsoft, #_deserialization, #_.net, #_.net, #_c#, #_data_engineering
Вы не можете начинать темы
Вы не можете отвечать на сообщения
Вы не можете редактировать свои сообщения
Вы не можете удалять свои сообщения
Вы не можете голосовать в опросах
Вы не можете прикреплять файлы к сообщениям
Вы не можете скачивать файлы
Текущее время: 22-Ноя 16:06
Часовой пояс: UTC + 5
Автор | Сообщение |
---|---|
news_bot ®
Стаж: 6 лет 9 месяцев |
|
In this topic, I will tell you how to dynamically parse and deserialize only part of the whole JSON document. We will create an implementation for .NET Core with C# as a language.Thanks dbcfor this idea!An issue descriptionRecently, at work, I was faced with the following task: to develop some tool to generate flexible reports and configure a schedule for sending these reports to the customers. Not a particularly original task. One of the biggest issues I encounter during the development process was dynamically deserializing a part of provided JSON document with the data for the report. At the compile step, we don't know the structure of this JSON and a list of fields that we need to take for the report. For example, we have the next JSON as a data source for the report. Notice that we will get this JSON in the runtime and at the compile step we don't know the structure of this document:Original JSON {
"ResponseCode" : 200, "ResponseMessage" : "Data was received", "Data":[ { "Id" : 147, "FirstName" : "Bob", "MiddleName" : "Lawrence", "LastName" : "Welch", "Age" : 66, "Profession" : "musician", "Score" : 10, "Address": "Hollywood, Los Angeles, California, USA" }, { "Id" : 18, "FirstName" : "Steven", "MiddleName" : "Paul", "LastName" : "Jobs", "Age" : 56, "Profession" : "businessman", "Score" : 9, "Address": "San Francisco, California, USA" }, { "Id" : 594, "FirstName" : "Hanna", "MiddleName" : "", "LastName" : "Holasava", "Age" : 23, "Profession" : "developer", "Score" : 10, "Address": "Gomel, Belarus" }, { "Id" : 363, "FirstName" : "Stephen", "MiddleName" : "William", "LastName" : "Hawking", "Age" : 76, "Profession" : "scientist", "Score" : 7, "Address": "Oxford, United Kingdom" } ] }
{
"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "ResponseCode": { "type": "integer" }, "ResponseMessage": { "type": "string" }, "Data": { "type": "array", "items": [ { "type": "object", "properties": { "Id": { "type": "integer" }, "FirstName": { "type": "string" }, "MiddleName": { "type": "string" }, "LastName": { "type": "string" }, "Age": { "type": "integer" }, "Profession": { "type": "string" }, "Score": { "type": "integer" }, "Address": { "type": "string" } }, "required": [ "Id", "FirstName", "MiddleName", "LastName", "Age", "Profession", "Score", "Address" ] }, { "type": "object", "properties": { "Id": { "type": "integer" }, "FirstName": { "type": "string" }, "MiddleName": { "type": "string" }, "LastName": { "type": "string" }, "Age": { "type": "integer" }, "Profession": { "type": "string" }, "Score": { "type": "integer" }, "Address": { "type": "string" } }, "required": [ "Id", "FirstName", "MiddleName", "LastName", "Age", "Profession", "Score", "Address" ] }, { "type": "object", "properties": { "Id": { "type": "integer" }, "FirstName": { "type": "string" }, "MiddleName": { "type": "string" }, "LastName": { "type": "string" }, "Age": { "type": "integer" }, "Profession": { "type": "string" }, "Score": { "type": "integer" }, "Address": { "type": "string" } }, "required": [ "Id", "FirstName", "MiddleName", "LastName", "Age", "Profession", "Score", "Address" ] }, { "type": "object", "properties": { "Id": { "type": "integer" }, "FirstName": { "type": "string" }, "MiddleName": { "type": "string" }, "LastName": { "type": "string" }, "Age": { "type": "integer" }, "Profession": { "type": "string" }, "Score": { "type": "integer" }, "Address": { "type": "string" } }, "required": [ "Id", "FirstName", "MiddleName", "LastName", "Age", "Profession", "Score", "Address" ] } ] } }, "required": [ "ResponseCode", "ResponseMessage", "Data" ] }
{
"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "additionalProperties": false, "properties": { "Data": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "FirstName": { "type": "string" }, "LastName": { "type": "string" }, "Age": { "type": "integer" }, "Score": { "type": "integer" } } } } } } var data = JObject.Parse(jsonString);
var isValid = data.IsValid(schema, out IList<ValidationError> errors); Validation errorsAnd each validation error contains a lot of info about an error. For us important is thePath and ErrorTypefields. We can use this field value to select the unnecessary tokens and ErrorType we can use to ensure that this token is superfluous. Full info about an error!! Note that in array definition should be "items": { ... } not "items": [ { ... } ]. Otherwise, the validator will check the only first item in the provided array.To remove some node from the Json three we will use the next code: public static JToken SearchNodeByName(this JToken parentNode, string nodeName)
{ if (parentNode == null || parentNode.Children().Count() == 0) { return null; } if (parentNode is JProperty && string.Equals(((JProperty)parentNode).Name, nodeName)) { return parentNode; } foreach (JToken children in parentNode.Children()) { var node = SearchNodeByName(children, nodeName); if (node != null) { return node; } } return null; } var isValid = data.IsValid(schema, out IList<ValidationError> errors);
if (!isValid) { foreach (var error in errors) { if (error.ErrorType == ErrorType.AdditionalProperties) { data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent(); } } } using Newtonsoft.Json;
using Newtonsoft.Json.Linq; using Newtonsoft.Json.Schema; using System; using System.Collections.Generic; using System.Data; using System.Linq; namespace JSchemaTester { class Program { static void Main(string[] args) { string jsonString = @"{ 'ResponseCode' : 200, 'ResponseMessage' : 'Data was received', 'Data':[ { 'Id' : 147, 'FirstName' : 'Bob', 'MiddleName' : 'Lawrence', 'LastName' : 'Welch', 'Age' : 66, 'Profession' : 'musician', 'Score' : 10, 'Address': 'Hollywood, Los Angeles, California, USA' }, { 'Id' : 18, 'FirstName' : 'Steven', 'MiddleName' : 'Paul', 'LastName' : 'Jobs', 'Age' : 56, 'Profession' : 'businessman', 'Score' : 9, 'Address': 'San Francisco, California, USA' }, { 'Id' : 594, 'FirstName' : 'Hanna', 'MiddleName' : '', 'LastName' : 'Holasava', 'Age' : 23, 'Profession' : 'developer', 'Score' : 10, 'Address': 'Gomel, Belarus' }, { 'Id' : 363, 'FirstName' : 'Stephen', 'MiddleName' : 'William', 'LastName' : 'Hawking', 'Age' : 76, 'Profession' : 'scientist', 'Score' : 7, 'Address': 'Oxford, United Kingdom' } ] }"; JSchema schema = JSchema.Parse(@"{ '$schema': 'http://json-schema.org/draft-04/schema#', 'type': 'object', 'additionalProperties' : false, 'properties': { 'Data': { 'type': 'array', 'items': { 'type': 'object', 'additionalProperties' : false, 'properties': { 'FirstName': { 'type': 'string' }, 'LastName': { 'type': 'string' }, 'Age': { 'type': 'integer' }, 'Score': { 'type': 'integer' } } } } } }"); var data = JObject.Parse(jsonString); var isValid = data.IsValid(schema, out IList<ValidationError> errors); if (!isValid) { foreach (var error in errors) { if (error.ErrorType == ErrorType.AdditionalProperties) { data.SelectToken(error.Path)?.RemoveFromLowestPossibleParent(); } } } JToken dataNode = data.SearchNodeByName("Data"); var dataTable = JsonConvert.DeserializeObject<DataTable>(dataNode.First.ToString()); dataTable.PrintToConsole(); Console.ReadKey(); } } public static partial class JsonExtensions { public static JToken RemoveFromLowestPossibleParent(this JToken node) { if (node == null) { return null; } // If the parent is a JProperty, remove that instead of the token itself. var property = node.Parent as JProperty; var contained = property ?? node; if (contained.Parent != null) { contained.Remove(); } // Also detach the node from its immediate containing property -- Remove() does not do this even though it seems like it should if (property != null) { property.Value = null; } return node; } public static JToken SearchNodeByName(this JToken parentNode, string nodeName) { if (parentNode == null || parentNode.Children().Count() == 0) { return null; } if (parentNode is JProperty && string.Equals(((JProperty)parentNode).Name, nodeName)) { return parentNode; } foreach (JToken children in parentNode.Children()) { var node = SearchNodeByName(children, nodeName); if (node != null) { return node; } } return null; } } public static class DataTableExtensions { public static void PrintToConsole(this DataTable data) { string[] columnNames = data.Columns .Cast<DataColumn>() .Select(column => column.ColumnName) .ToArray(); Console.WriteLine(string.Join("|", columnNames.Select(c => c.PadRight(15, ' ')))); Console.WriteLine(string.Empty.PadRight(15 * data.Columns.Count, '-')); foreach (DataRow row in data.Rows) { string[] fields = row.ItemArray .Select(field => field.ToString()) .ToArray(); Console.WriteLine(string.Join("|", fields.Select(c => c.PadRight(15, ' ')))); } } } } =========== Источник: habr.com =========== Похожие новости:
|
|
Вы не можете начинать темы
Вы не можете отвечать на сообщения
Вы не можете редактировать свои сообщения
Вы не можете удалять свои сообщения
Вы не можете голосовать в опросах
Вы не можете прикреплять файлы к сообщениям
Вы не можете скачивать файлы
Вы не можете отвечать на сообщения
Вы не можете редактировать свои сообщения
Вы не можете удалять свои сообщения
Вы не можете голосовать в опросах
Вы не можете прикреплять файлы к сообщениям
Вы не можете скачивать файлы
Текущее время: 22-Ноя 16:06
Часовой пояс: UTC + 5