Managed Schema Changes


The Schema API lets you build a web service that uses an HTTP API to control many aspects of your application's schema. The Schema API relies on the ManagedIndexSchemaFactory class, which is the default schema factory in modern Solr; see the section Schema Factory Definition in SolrConfig for detailed instructions on selecting your schema factory. The API provides read access to all schema elements and write access to a growing subset of them: fields, dynamic fields, field types, and copy-field rules can be added, removed, or replaced. Future Solr releases are expected to extend write access to additional schema elements.
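
As a minimal sketch, the command below adds a date field through the Schema API. The host, port, and collection name (mycollection) are assumptions, so adjust them for your installation.

# Add a "sell_by" date field to the managed schema (host and collection are assumptions)
curl -X POST -H 'Content-type:application/json' \
  --data-binary '{"add-field": {"name": "sell_by", "type": "pdate", "stored": true}}' \
  http://localhost:8983/solr/mycollection/schema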

Online Schema Evolution with ksqlDB Migration Assistant

The Schema API exposes the managed schema as a web service, so many aspects of the Solr schema can be inspected and modified over HTTP. See the section Schema Factory Definition in SolrConfig for detailed instructions on selecting your schema factory. Because changes can be applied to a copy of the configuration, the API is also useful for testing changes in a sandbox before promoting them to production, and it keeps analyzer configuration in files separate from the data itself, which makes language analyzers easier to review and control. The major benefits of this functionality include:

- Automatic detection of all available index structures
- Support for dynamic fields
- Update management
- Shared update log management
- Automatic detection of language analyzers
- Read/write access to the schema of every Solr collection
- One-click readability of schema elements, including fields, dynamic fields, and copy-field rules
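
Read access works the same way over plain GET requests. A minimal sketch, again assuming a local Solr and a collection named mycollection:

# List every field defined in the managed schema
curl "http://localhost:8983/solr/mycollection/schema/fields"
# Retrieve the whole schema as JSON
curl "http://localhost:8983/solr/mycollection/schema"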

The Solr schema itself is stored as a set of configuration files, with schema.xml (or the managed schema file) describing the index structures. Schema development tools can be used to minimize manual typing or to generate new structures from scratch. The schema evolution tooling includes options for choosing between XML and JSON formats and can export data in CSV format for compatibility with spreadsheets such as Microsoft Excel and OpenOffice Calc (see article: Solr Admin UI: Schema Debugging Tools). For offline workflows, see the section Offline Schema Development below.


Offline Schema Development

Schema development tools simplify the process of writing complex schema definitions by reducing them to a few simple edits. The development tool ships with several example configuration files to use as starting points: copy an existing example file to an .xml file with the same name as your index structure (e.g., schema-example-config/segments_4x2_0.dtd becomes schema-example-config/segments_4x2_0.xml) and edit the copy before applying your changes.
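
As a concrete offline workflow, the sketch below copies one of Solr's bundled example configsets, edits the schema, and uploads it to ZooKeeper. The directory paths and the configset name myconfig are assumptions, and the upconfig step applies only to SolrCloud installations.

# Copy the bundled _default configset as a starting point (paths assume a standard Solr install)
cp -r server/solr/configsets/_default server/solr/configsets/myconfig
# Edit the schema offline with your editor of choice
vi server/solr/configsets/myconfig/conf/managed-schema
# Upload the edited configset to ZooKeeper (SolrCloud only; ZooKeeper address is an assumption)
bin/solr zk upconfig -n myconfig -d server/solr/configsets/myconfig/conf -z localhost:9983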

Even in an offline workflow, the Schema API relies on the ManagedIndexSchemaFactory class, the default schema factory in modern Solr (see the section Schema Factory Definition in SolrConfig for how to select a schema factory). Working against a copy of the configuration lets you test schema changes in a sandbox before promoting them to production, while keeping analyzer configuration in files separate from the data itself.
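
The factory is selected in solrconfig.xml. A minimal sketch of that declaration is shown below, written to a scratch file so it can be merged into the <config> section by hand; the mutable flag and resource name shown are Solr's documented defaults.

# Write the schemaFactory declaration to a scratch file, then merge it into solrconfig.xml
cat > schema-factory-snippet.xml <<'EOF'
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
EOF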

Managed Schema Changes with KSQLMigrate Migrator Service


The Schema API is implemented as a Solr plugin that appears in the Administration Console under an "Update" menu item, which is wired to a controller servlet defined by the appropriate context-path attribute for your installation (see article: Schema API Web Service). The tool gives access to all fields, indexes, and analyzers associated with the selected collection, and its major benefits include automatic detection of all available index structures, dynamic field support, and update management.

Managed changes through the KSQLMigrate migrator service let you update your Solr database outside production hours to test changes, or even to apply them to production. Details on updating the Solr database are covered in the tutorial section Solr Tutorials: How To Change Your Schema Offline. The main advantage of KSQLMigrate is that you don't need to shut Solr down (see article: Managing Schema Evolution with Migrator Tools) or put it into recovery mode (see article: Recovering from DDL Changes).

Managed schema file changes made through migrator services are not applied in a guaranteed order, which means that some fields added by a migrator may be updated while others were already updated earlier and therefore do not follow all of the rules described above. In addition, the tool does not support updating dynamic fields in either the XML or the JSON format.

A migrator service is a web service that manages the Solr database schema by making direct changes to Solr's underlying storage mechanisms. These services are implemented as servlets that receive information about data types and structure from clients, then interact directly with Solr's indexes or commit logs to build or update fields. The major benefits of this functionality include:

- Update management and offline-mode support
- Automatic detection of all available index structures
- Support for dynamic fields
- Read/write access to the contents of all collections
- One-click readability of schema elements, including fields, dynamic fields, and copy-field rules
- Shared update log management
- Updates in both XML and JSON formats
- No need to shut down Solr

Modify the Schema API

To modify the Schema API, you need to provide a servlet mapping for it. To do this, create a file called "context.xml" in the solrconf/webapp directory and add a reference to the controller servlet:

The controller servlet, com/sales/solr/admin/migratorapi, provides:

- Access to all fields, indexes, and analyzers associated with the selected collection
- One-click readability of schema elements, including fields, dynamic fields, and copy-field rules
- Update management and offline-mode support
- Automatic detection of all available index structures
- Read/write access to the contents of all collections
- Shared update log management
- Updates in both XML and JSON formats
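
A sketch of what such a mapping might look like follows. The servlet class and URL pattern are purely hypothetical, inferred from the path above, and standard web.xml servlet-mapping syntax is used because a Tomcat context.xml does not itself declare servlets; merge the snippet into whichever descriptor your installation actually uses.

# Hypothetical servlet mapping for the migrator API controller (class name and URL pattern are illustrative only)
cat > servlet-mapping-snippet.xml <<'EOF'
<servlet>
  <servlet-name>migratorapi</servlet-name>
  <servlet-class>com.sales.solr.admin.MigratorAPI</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>migratorapi</servlet-name>
  <url-pattern>/migratorapi/*</url-pattern>
</servlet-mapping>
EOF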

The file solr/conf/schema.xml is the DDL (data definition language) for Solr's schema. It describes every field available to be indexed, the data type each field uses, how it will be indexed, and other properties used when processing your data, including analyzers, tokenizers, character filters, and so on. To add new fields, or to change existing fields or their types, you must modify elements in this file; you can also delete existing fields (though deleting them all is probably not useful). Each element in this file has an "id" attribute: if the value is greater than zero, the element refers to an index in the "solr/conf/indexes.xml" file; if the value is less than or equal to zero, the element defines a field in the "solr/conf/fields.xml" file. You will need to update both files when making schema changes through migrator services.
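
For reference, field definitions in schema.xml look roughly like the sketch below. The field names are hypothetical, but the elements and attributes shown (field, dynamicField, copyField, name, type, indexed, stored) are the standard ones.

# Illustrative schema.xml field definitions, written to a scratch file (field names are hypothetical)
cat > schema-fields-snippet.xml <<'EOF'
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="all_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
<copyField source="title" dest="all_text"/>
EOF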

Migrating collections

To manage your collections with KSQLMigrate, you must first tell KSQL which Solr database you want it to use. This can be done either with environment variables set at start-up time (see article: Setting Up Environment Variables) or with a configuration option (via the command line or the web interface). For example, if two Solr 4.x servers are running on localhost, one with the collection "my_solr_1" and the other with "my_solr_2", you can either set the environment variable KSQLSERVER(COLLECTION) to the collection name you want,

or use the following option in the MSConfig.properties file:

ksql.collectionName=my_solr

Otherwise, if your Solr instance does not run on port 8081 (the default), you will need to use the ksql.port parameter to indicate which port it is using (e.g.: ksql.port=8983). Note that there must be no spaces between words or between colons and numbers (e.g.: ksql.collectionName=secondSolr). This parameter is case-sensitive.

You may set the KSQLMigrate Collection Option in "msconfig.properties":

ksql.collectionName=my_solr
# Complete collection name, e.g.: ksql.collectionName=solr1kafka,solr2kafka,solr3kafka

Alternatively, you can use the environment variables KSQLCOLLECTION(name) or KSQLCOLLECTIONS(names). Set the environment variable before running KSQLMigrate (the default port is 8081):

export ABSOLUTEURL=/ksqlmigrate/_collection/secondSolr

You can also specify multiple collections in KSQLCOLLECTIONS:

export ABSOLUTEURL=/ksqlmigrate/solr1kafka,solr2kafka,solr3kafka
# or, for Windows:
export KSQLCOLLECTIONS=solr1kafka,solr2kafka,solr3kafka

In the web interface, you will need to set the option for this collection under "Administration > Collection". The setting is used by both KSQLMigrate and Migrator Services. After a few minutes (or a full indexing cycle), you can search your collection directly from the command line using the KSQLCLI tool. For example:

curl "http://localhost:8081/solr/my_solr_1/select?q=*:*"

You can also use this tool in conjunction with the HTTP Console to look at what's going on under the hood.

How it works

KSQLMigrate uses a combination of tools and techniques to efficiently handle complex schema changes, such as dropping fields and indexing. It does not depend on any other Solr extension - instead, it makes use of basic standard Solr features to achieve its goals - there is no need for things like JMX or SOLR-4918 which are designed for specific purposes. To make KSQLMigrate versatile enough to cater to a broad range of applications, we have decided to support all the different ways of indexing (i.e.: via KSQL Server or direct HTTP POST/PUT requests), and we allow other tools to hook in their custom functions for schema management.

Migrator Services let you use Solr's own "update-xml" feature, which can be used for data manipulation while keeping the indexes intact (this is unlike the Data Import Handler approach). You can set up your logic or integrate it with our Dataflow plugin. Essentially, instead of using update queries directly, Migrator Services will act as an intermediary between the client program and Solr's replicas. It automatically generates "update"-type queries that are then sent by one or more clients under its control - this greatly simplifies the process of synchronizing multiple shards.
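
Solr's XML update format itself is simple. A minimal sketch that posts one document to the update handler, assuming a local Solr, a collection named mycollection, and illustrative field names:

# Post a single document using Solr's XML update format (collection and field names are assumptions)
curl -X POST -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="id">1</field><field name="title">example</field></doc></add>' \
  "http://localhost:8983/solr/mycollection/update?commit=true"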

Migrator Services also support KSQL Server's ksql.phases option, which can be used to switch between different phases during your migration/integration workflows: you simply tell Migrator Services which ksql.phase you are using for a particular operation, and it will construct the appropriate "update"-type queries and send them to Solr via HTTP POST requests to one or more endpoints of your choice. The HTTP requests generated by this tool are multipart, so they are composed of several distinct elements:

These parts are transmitted individually as part of an HTML form submittal by whichever client is under its control (e.g.: command-line client with curl or HTTP console). Once all the parts are collected, they are joined back together and sent as a single request to Solr. The "update" command is used by default, but Migrator Services can also be used for direct indexing (i.e.: using the HTTP POST/PUT requests directly).

The KSQLMigrate tool then acts as a gateway between Solr and the program that processes these updates on behalf of one or more clients: it accepts Solr's responses from Migrator Services and forwards them to your application via its standard output stream. So you may use any language capable of reading STDOUT to connect to the web interface or to interact with KSQL Server directly; all you need to do is send it some XML via STDIN, and it will do the rest.

The basics of ksqlDB migrations

Before you can make use of KSQLMigrate, there are some basic concepts that you need to understand. First and foremost - this tool works with ksqlDB (i.e.: a database table) and not a real Solr database - the latter would be extremely impractical considering the amount of data we usually want to process in these situations. For example: if you have a collection containing 1 million documents, then indexing will take about 10 seconds on my laptop even without replication enabled. Then think about adding all those extra indexes during your migration workflows - it is clear that using Solr directly for such purposes is neither practical nor efficient from a performance point of view.

To create or update an existing table with ksqlDB using KSQLMigrate, you simply send XML to KSQL Server via STDIN, and it will generate the appropriate SQL commands. From that point onward, all further actions run on Solr's side. You can also do other things while Migrator Services is working; for example, indexing three parallel shards in two different collections while deploying a new set of indexes using ElasticSearch. It makes use of all available computing resources, so you get your work done much faster!
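
For comparison, a stock ksqlDB server accepts SQL statements over its REST interface rather than XML on STDIN. A minimal sketch, assuming a ksqlDB server listening on localhost:8088 and a hypothetical documents_example table:

# Submit a CREATE TABLE statement to ksqlDB's /ksql REST endpoint (server address and table are assumptions)
curl -X POST http://localhost:8088/ksql \
  -H "Accept: application/vnd.ksql.v1+json" \
  -d "{\"ksql\": \"CREATE TABLE documents_example (id BIGINT PRIMARY KEY, name STRING, description STRING) WITH (KAFKA_TOPIC='documents_example', VALUE_FORMAT='JSON', PARTITIONS=1);\", \"streamsProperties\": {}}"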

A brief demonstration will better illustrate these concepts. The following image shows an excerpt from KSQL Server's main menu that enables entry into shell mode (complete instructions are found in its README file). As you can see, there are several options for launching various sub-shells, which are very useful for exploring their capabilities.

Launch the shell sub-shell using the ksql cmd command.

The first thing you need to do is switch to the built-in SQL console by typing /1, which brings up a list of available commands. Being somewhat impatient, I usually jump right in and run the show_schema_versions command (which takes one argument specifying which version number should be shown); it always shows me which databases are currently available, including their respective Solr managed schema versions. As expected, it lists all the databases that can be connected to via KSQL Server Console v0.2.1.

Here's an example screenshot where I used another terminal window to index some data and then switched back to the console:

It shows there are three databases, but they are all still at version 0.0.0, as expected (since no migrations have been run).

Let's say now we want to create a new ksqlDB documentation_example database by sending this XML directly to KSQL Server:

response = client.send_http_get(host, path + '/ksql/create').read()

You can use HTTP GET or POST, whichever you prefer; just be mindful of its limitations and of what your web browser sends back when making HTTP requests. In any event, running the show_schema_versions command again should give us something like this:

As expected, version 1.0.0 shows up when the database was created (or replicated), and its new schema has been automatically updated to reflect all table creation/deletion events (i.e.: migrations). So you can now connect to this database using ksql console v1.0.0, which is done by running another command in a separate terminal window - there are no restrictions on what kind of shell commands you run or how many items are nested within each other - KSQL Server will process everything accordingly so you get your desired results, regardless of complexity!

After connecting to the new database using ksql v1.0.0, let's say you want to index some documents for testing purposes. This can be done by running the following KSQL statement, which I created beforehand using ksql v0.2.1 (which is still active - so don't freak out):

CREATE TABLE documents_example (id BIGINT, name STRING, description STRING);

You'll see it being processed as XML sent directly to STDIN; the Solr server processes the request and stores it in its transaction log. After you're done indexing your data, run the show_schema_versions command again and you should see that two tables have been indexed into the database:

Let's now switch back to KSQL v1.0.0 shell mode for a moment to take a closer look at how to create new tables. If you choose to define all of your table DDL (data definition language) schema changes manually using KSQL, this needs to be done beforehand, because no changes can be made until the initialization process is complete - so if you need a temporary table for testing purposes, create it before starting up the server! In any event, let's say we want to add some columns to the documents_example table used earlier:

ALTER TABLE documents_example ADD COLUMNS (id BIGINT NOT NULL DEFAULT 0, name STRING, description STRING);

Optionally, the default values may be defined explicitly as well; no matter how large or complex your tables become over time, there will always be little bits and pieces of unpredictable data you need to add to make your testing scenarios more realistic.

Adding new columns is as simple as running a KSQL statement - this works for any kind of DDL change you like (e.g.: alter table statement... etc.). Since we're using this table for testing purposes, let's also do some cleanup and remove the documents_example table we created earlier:

DROP TABLE documents_example;

Now that we've gone through all the basic explanations and seen what it looks like in action in KSQL Server Console v0.2.1 and v1.0.0, here is what KSQL Server Console v2.1 will look like when connecting to the ksql server:

As you can see, it has become a full-blown REST console (so you can run KSQL Server commands directly from your web browser) that supports all the features of its predecessors! This is thanks to some great work by my co-worker Filip, who is currently putting together this super awesome GUI using Angular 4.4.3 on top of Bootstrap 3.2.0 - so bookmark our brand new GitHub page and follow us so you don't miss any future announcements :]


Tutorial: ksqlDB migration basics



First things first, let's update our ksql-server version to v1.0.0+ by running this command in a separate terminal window:

$ . ./ksql-server/bin/activate && ./ksql-server/bin/ksql-server upgrade

Now that our server is updated to support KSQL v1.0 features, let's start by creating a new table using ksql-cli!

CREATE TABLE my_new_table (id int, name string);

Notice how this time we don't need to use the --source parameter, because all KSQL Server requests are executed as standard SQL - you can even copy and paste everything from an existing statement! The only difference is that variables are passed not as ?, but as :parameterName - so :myVariable would be replaced with its value at runtime. No more manual parsing of raw text data before you can work with it in your queries! In addition, you no longer have to worry about manually closing every connection and running every query through ksql-cli - KSQL Server will take care of everything!

Create a ksqlDB staging environment

In order to follow along with this blog post, you will need a KSQL Server database called "ksqlDB". This database is configured by default in every ksql-server Docker image for easy access, but if you're using an older version (e.g.: 0.2.0) - use the following commands instead:

$ curl -L https://raw.githubusercontent.com/appscode/ksql-server/v0.2.0/scripts/docker-entrypoint.sh \
    | sudo tee /usr/local/bin/docker-entrypoint \
    && sudo chmod +x /usr/local/bin/docker-entrypoint
$ mkdir ./db
$ docker run --rm -it --name ksql-server \
    -v ./db:/var/lib/ksql/data \
    -p 8080:8080 \
    appscode/ksql-server:0.2.1 bash -c 'source /usr/local/bin/docker-entrypoint; tail -f /logs/*log | awk "/Ready to accept queries/,/"'
$ docker exec ksql-server su postgres -c "createdb ksqlDB"
$ exit

$ ./ksql-server db create

After starting up KSQL Server, use the following commands to access it via the CLI (no changes required for v0.2.0):

$ cd ksql-cli
$ ./KSQL-CLI-linux-x64/ksql-cli -link ksql://localhost:8088

Once you're done, hit CTRL+D twice to exit.

Now that we have the database up & running, let's create a simple table to hold some test data by running this command in our newly created container (enter your password when prompted):

CREATE TABLE my_test_table (id int, name string);

Time to populate the table with sample data so we can inspect it using KSQL! To do so, let's run these statements in psql:

INSERT INTO my_test_table(id, name) VALUES (1, 'KSQL');
INSERT INTO my_test_table(id, name) VALUES (2, 'SQL');
INSERT INTO my_test_table(id, name) VALUES (3, 'Database');
INSERT INTO my_test_table(id, name) VALUES (4, 'Schema');
INSERT INTO my_test_table(id, name) VALUES (5, 'Migrations');

Now that we have the table created and filled with data, let's take a look at how KSQL can be used to fetch records using SELECT statements. First things first, let's connect to our KSQL Server:

$ . ./ksql-cli/KSQL-CLI-linux-x64/client \
    && ksql connect --ksql-db ksqlDB \
    --ksql-user postgres \
    --ksql-password '$1$ncQe/u3y$NpvdOmnsq5TLw2xr8sW.nC'

Connected to ksql://localhost:8088 as user "postgres".

Now that KSQL Server is up & running, it's time to fire our very first SELECT statement! Let's find all test records by running this statement in our container (enter your password when prompted):

SELECT * FROM my_test_table;

If everything went according to plan, you should see this output:

+----+------------+
| id | name       |
+----+------------+
|  1 | KSQL       |
|  2 | SQL        |
|  3 | Database   |
|  4 | Schema     |
|  5 | Migrations |
+----+------------+

As you can see, we now have our data! You can also use the special * field to represent all fields in a table; this is especially useful if you want to access multiple tables at once (e.g.: SELECT * FROM my_test_table, some_other_table). Now that we know how to fetch records by running simple SELECT statements on standard tables, it's time for the fun part: transforming data with stream processing! Let's start by adding another table holding some app configuration information (called app_config):

CREATE TABLE app_config (name string, value string);

Now that we have our new table created, let's run these statements to populate it with data:

INSERT INTO app_config(name, value) VALUES ('database', 'Database Connection String');
INSERT INTO app_config(name, value) VALUES ('username', 'postgres');
INSERT INTO app_config(name, value) VALUES ('password', '$1$ncQe/u3y$NpvdOmnsq5TLw2xr8sW.nC');

We can now use stream processing to take the content of the /app-config endpoint and combine it with SELECT * FROM my_test_table. Here are two examples of how to do it, the second one using the waitForTime() KSQL function:

CREATE STREAM app_config_stream (name string, value string) WITH (KAFKA_TOPIC='ksql-tutorial-app', VALUE_FORMAT='JSON');
CREATE STREAM my_test_table_stream () WITH (KAFKA_TOPIC='ksql-tutorial-my_test_table', VALUE_FORMAT='JSON');
SELECT * FROM app_config_stream WHERE name = 'database' AND value LIKE '%connectionString%';
SELECT * FROM my_test_table WHERE id < 3;

Using waitForTime() makes stream processing much easier because you don't have to worry about timestamp extraction and the like. More details about the waitForTime() function can be found in the official documentation.

Evolving a ksqlDB application

Containerizing KSQL Server

Using Docker, we can automate the process of creating and running our KSQL Servers. We will first create a generic container using the latest version of PostgreSQL, then build a specific image for each table/stream we want to expose. For this tutorial, we will use four different images:

- postgres_latest - the common PostgreSQL image used to run the database server
- ksqlServer_latest - the image containing and running KSQL Server; it includes an extra volume mount used to store generated artifacts such as scripts and files produced by the KSQL CLI
- ts_conf_appConfigStream - an image holding the configuration data required for accessing the app-config stream
- ts_conf_myTestTable - an image holding the configuration data required for accessing the my-test-table stream

Database Configuration

Let's start by creating a database user, database password, and database name. Creating these elements through the command line is not very user friendly; to make things easier, Docker Compose can be used to set up the PostgreSQL server and client connections. The docker-compose file looks like this:

version: '3'
services:
  postgres:
    image: postgres:latest
    hostname: db
    environment:
      POSTGRES_USER: ksqluser
      POSTGRES_PASSWORD: ksqlpass
    volumesFrom:
      - bigdata
    ports:
      - "5432:5432"
    networks:
      - default
  ts_mysqlTable:
    image: postgres:latest
    hostname: mysql-db
    environment:
      POSTGRES_PASSWORD: ksqlpass
    volumesFrom:
      - bigdata
    ports:
      - "3306:3306"
    networks:
      - default
  watchtower-ksqlServerContainer:
    image: jboss/watchtower
    ports:
      - "8080:8080"

The above docker-compose file runs a PostgreSQL server named db, listening on localhost port 5432, and creates a dedicated user named ksqluser with the password ksqlpass; a second database container is also defined, listening on localhost port 3306 for MySQL-style clients. The actual KSQL server is deployed in a container named watchtower-ksqlServerContainer, listening on port 8080 and based on an image hosted in the JBoss repository. The password used to access the KSQL CLI is hard-coded in the docker-compose file; if you want to make things safer, move it into an environment variable and configure your clients accordingly.
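
A minimal sketch of bringing the stack up and checking it, assuming the file above is saved as docker-compose.yml in the current directory:

# Start the services in the background and verify they are running
docker compose up -d
docker compose ps
# Connect to the PostgreSQL container with the credentials from the compose file
docker compose exec postgres psql -U ksqluser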
